Tutorial: Learning to Drive

Check out this thread for performance ideas: DeepMimic-like System With Learning Agents Plugin

BTW, in the beginning going in reverse is not wrong. The agent is randomly testing out the actions to determine what the rewards it will get are.

I would highly encourage you to use a lower tick rate on the manager. A high ticket rate makes it harder for the learning algorithm as you are not allowing much time to pass for the action to impact the game state.