Tutorial: Learning to Drive

Check out this thread for performance ideas: DeepMimic-like System With Learning Agents Plugin

BTW, in the beginning, going in reverse is not wrong. The agent is randomly testing out actions to discover what rewards it will get.

I would highly encourage you to use a lower tick rate on the manager. A high tick rate makes it harder for the learning algorithm, as you are not allowing much time to pass for the action to impact the game state.

Thanks for this @Deathcalibur ! Great plugin, I made a community tutorial I wanted to share.

Unreal Learning Agents Tutorial | Epic Developer Community (epicgames.com)


I got the setup from the tutorial working and it’s great quality!! Vehicles really drive around within minutes :smiley: Thanks for creating this!

Now, can someone tell me the logic for not hitting other cars? From what I understand, I should set this up as observations AND (negative) rewards… But looking at the nodes for observations, all I see is:

    1. Planar Position Observation
    2. Angle Observation
    3. Velocity Observation
    4. Time Observation
      …and more

But none of these, in my mind, can be used to build logic that trains the AI not to crash into other cars/obstacles… Has anyone figured out how to do this?

In order to avoid other cars, you would obviously need to turn collisions on. You might want to add an explicit negative reward for hitting another car. All the agents “train together” right now, so hitting another car will slow down one agent but may give another agent a speed boost, which might cause some weird behavior to emerge if not penalized in some way.
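
For illustration only, here is a rough Python sketch of that idea (not the actual Learning Agents reward nodes; the names and weights are hypothetical):

    # Rough sketch of a per-step reward with an explicit collision penalty.
    # Names and weights are hypothetical; in the tutorial this logic lives in
    # Blueprint reward nodes rather than Python.
    def compute_step_reward(velocity_along_track: float,
                            off_track: bool,
                            hit_other_car: bool) -> float:
        reward = 0.1 * velocity_along_track   # encourage forward progress along the spline
        if off_track:
            reward -= 1.0                      # off-track penalty, as in the tutorial
        if hit_other_car:
            reward -= 5.0                      # explicit penalty so collisions are never "worth it"
        return reward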

In terms of observations, what you would need to do is query, say, the X nearest cars, and then for each car you would probably want to add a position observation and a velocity observation. These observations take in a relative position, where you could plug in the “current” car that is looking at these other cars.

So your observations would sort of look like:
Self_Position, Self_Velocity, … OpponentCar0_Position, OpponentCar0_Velocity, OpponentCar1_Position, OpponentCar1_Velocity, etc.
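
If it helps to see that idea outside of Blueprints, here is a minimal Python sketch of assembling such an observation vector, assuming a hypothetical Car struct and a fixed number of observed opponents:

    import math
    from dataclasses import dataclass

    @dataclass
    class Car:              # hypothetical stand-in for the per-agent state
        pos_x: float
        pos_y: float
        vel_x: float
        vel_y: float

    def make_observation(self_car: Car, all_cars: list[Car], num_opponents: int = 2) -> list[float]:
        # Self state first, then the nearest opponents relative to the "current" car,
        # so the vector always has the same size and layout.
        obs = [self_car.pos_x, self_car.pos_y, self_car.vel_x, self_car.vel_y]
        others = sorted((c for c in all_cars if c is not self_car),
                        key=lambda c: math.dist((c.pos_x, c.pos_y),
                                                (self_car.pos_x, self_car.pos_y)))
        for opp in others[:num_opponents]:
            obs += [opp.pos_x - self_car.pos_x, opp.pos_y - self_car.pos_y,
                    opp.vel_x - self_car.vel_x, opp.vel_y - self_car.vel_y]
        return obs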

I haven’t tried this myself yet, but it should work. One big limitation here is that the weights for Opponent cars will not be shared, and Learning Agents doesn’t currently have a way to do this.

We are working on addressing these shortcomings in the next release :smiley:


I’ve implemented the tutorial almost verbatim; I only had to change line 228 of train_common.py so that training didn’t fail from the timeout. But even after several hours, all vehicles have the same erratic behavior and barely move. The average reward in the log is always negative and near zero; it didn’t seem to improve over time and looked more like a random value.

I have tried removing the brake like other people have suggested, but the result didn’t change much either. Is there something I could debug to figure out what is failing in the training?

Training started to progress after changing the completion mode from Termination to Truncation. Cars are now confidently taking curves and running around the map :smiley:
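
For anyone wondering why this helps: with Termination the episode end is treated as a true end of the task (no future reward), while with Truncation the critic’s value estimate of the final observation stands in for the rewards the agent would have kept collecting. Since the cars are reset by going off track or running out of time rather than actually finishing anything, Truncation gives the less misleading signal. A tiny conceptual sketch of the difference (not Learning Agents source code):

    # How the return target for the final step of an episode differs between modes.
    def final_step_return_target(reward: float,
                                 completed_by_termination: bool,
                                 value_of_final_obs: float,
                                 discount: float = 0.99) -> float:
        if completed_by_termination:
            return reward                               # true end: no future reward expected
        return reward + discount * value_of_final_obs   # truncated: bootstrap from the critic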


Hello guys!

I followed the tutorial but I got stuck at “Reset Episodes” at:
Open the SportsCar_Pawn blueprint. On the left side, add a new function and give it the name “ResetToRandomPointOnSpline”:

But when I add this function, I don’t get the “Spline” option. I copied the Blueprint from the tutorial, but it breaks the first connection to the Spline.

What can I do?
I’m using UE 5.3.1

You should add a SplineComponent input parameter to the function, so that the Track Spline can be supplied when calling it from the trainer.

One big limitation here is that the weights for Opponent cars will not be shared

What does it mean that the weights won’t be shared in this example, and what’s the issue if they’re not shared?

But when I add this function, I don’t get the “Spline” option. I copied the Blueprint from the tutorial, but it breaks the first connection to the Spline.

You need to add an input named Spline, of type Spline Component, to the function.

I’m surprised that nobody has pointed out that this tutorial won’t make AMD GPUs very happy, as the Python backend seems to rely on CUDA (it works fine if you just switch the training device to CPU). Maybe add a little note in the tutorial?


I am facing the same velocity and learning issue. I did everything in the tutorial and my BP manager has the same values as yours, but the project has been running for more than 4 hours and the cars are learning really slowly.

After 4 hours, these are my learning results:

The reward isn’t even positive. I really don’t know how to make it work right :confused:

I have tried this tutorial several times, and the only scenario that worked for me was the following: I created a new vehicle template project in UE 5.2 and, after saving the project, opened it in UE 5.3. Then I followed the tutorial, and the only thing I changed was the Reinitialize Policy Network option in training (toggling it so it becomes false).

Hi Brendan, can you help me with these errors?

I have been running a lot of training sessions now and would like to share my observations and maybe get some feedback from you guys.

So, the training mostly works out quite well so far. The major issue I notice is that the optimum the neural network arrives at is not ideal. Sure, the cars are clever enough to stay on the track and reach somewhat decent speeds (in my experiments about 80-100 km/h). But even adding additional rewards (like forward speed instead of velocity along the track, or reaching higher gears) doesn’t seem to have an effect on the overall velocity of the cars. I tried different scales and weights to prioritize speed over following the spline, played around with different network sizes and episode lengths, reduced the tick interval to 0.01, etc. But the results are always more or less the same.

What I observe is:

  • The cars don’t really stay on the spline (even when reducing the threshold for the off-track penalty to a narrow interval like 50 or so). They tend to start veering when they are already almost off the track.
  • They don’t steer consistently when taking a curve, but rather do it in short bursts, resulting in a wobbly motion.
  • They speed up in unexpected areas of the track, right before a curve etc., and maintain a rather slow speed on straight parts.

For a racing game, it would be expected that they accelerate on straight parts of the track and slow down when taking curves, and that they try to stay very close to the spline (maybe deviating from it a bit to take a more ideal path).

I thought that maybe with time the agents would slowly converge to a more ideal behavior, but even after running the training for 120k steps, the network does not seem to learn anything new. The rewards flatten out after some time.

My suspicion is that the agents lack sufficient observation of the world. Changing direction is a reaction to the observed distance to the spline. So in a sense, the agents only notice that they should change direction when it is already too late. They are not able to “see” a curve in advance like a human driver would. This probably leads to the overall decrease in speed, since it is more optimal to drive slowly and be able to react in time when the spline changes its shape.
So the network either needs a much faster reaction time to small changes in the observations (like the distance along the spline), or the agents need more foresight (maybe with line traces or something similar). What are your thoughts?
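
One way to give the agents that foresight would be to observe a few points on the spline ahead of the car, expressed in the car’s local frame. A hedged Python sketch of the idea (the spline accessor and helper names here are hypothetical stand-ins, not the exact Spline Component API):

    # Hypothetical sketch of a "look-ahead" observation: sample the track spline
    # at a few distances ahead of the car and observe those points relative to
    # the car, so the agent can "see" upcoming curves before it reaches them.
    LOOKAHEAD_DISTANCES = [500.0, 1500.0, 3000.0]   # cm ahead along the spline (made-up values)

    def lookahead_observation(spline, distance_along_spline: float, world_to_car_local) -> list[float]:
        obs = []
        for d in LOOKAHEAD_DISTANCES:
            point = spline.get_location_at_distance_along_spline(distance_along_spline + d)
            local = world_to_car_local(point)   # transform into the car's local frame
            obs += [local.x, local.y]
        return obs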

Also, is it reasonable to assume that you very quickly reach a limit in terms of network size with a single graphics card (e.g. an RTX 3070)? So it would be a hard hardware limitation when you try to train a network with dozens of observations and a more complex reward setup?

I’m having the same problem. My cars quickly learn to accelerate but don’t even attempt to turn. They simply continue straight until they leave the track and their position is reset. Were you able to find a solution for this problem? Thanks in advance.


Question: do we know if any companies are currently trying to use this yet?

Hey, good advice about adding the note!

We are currently limited to what PyTorch supports for training purposes. If they add support on Windows/Mac for AMD GPUs, we will gladly support it in the future.

Inference with Learning Agents is currently and intentionally limited to CPU only, so it’s a non-issue on that side.

@1sirianth1

Question: do we know if any companies are currently trying to use this yet?

Learning Agents is still experimental so I don’t recommend depending on it fully just yet, unless you like living on the bleeding edge. It’s not going anywhere but definitely expect breaking changes in the next release.

I’m having an issue where the cars go 3 km/h no matter how long I train them. How do I fix this?


My agent runs normally, but the log shows that it does not generate any experience:

LogLearning: Display: BP_RLTrainer: Sending / Receiving initial policy...
LogLearning: Display: Training Process: {
LogLearning: Display: Training Process:     "TaskName": "BP_RLTrainer",
LogLearning: Display: Training Process:     "TrainerMethod": "PPO",
LogLearning: Display: Training Process:     "TrainerType": "SharedMemory",
LogLearning: Display: Training Process:     "TimeStamp": "2024-01-03_22-59-44",
LogLearning: Display: Training Process:     "SitePackagesPath": "D:/ue 5.3/UE_5.3/Engine/Plugins/Experimental/PythonFoundationPackages/Content/Python/Lib/Win64/site-packages",
LogLearning: Display: Training Process:     "IntermediatePath": "D:/ue 5.3/project/RL_ship/Intermediate/LearningAgents",
LogLearning: Display: Training Process:     "PolicyGuid": "{12A8D344-4B6A-E2D7-C938-14A1F9EA5BB4}",
LogLearning: Display: Training Process:     "ControlsGuid": "{30308F02-4E8E-0477-5376-6FA70169977F}",
LogLearning: Display: Training Process:     "EpisodeStartsGuid": "{A9790BA8-4060-BD5C-4D82-11AFFA86CA3B}",
LogLearning: Display: Training Process:     "EpisodeLengthsGuid": "{99752603-4D32-8964-88A7-08BF8FE25295}",
LogLearning: Display: Training Process:     "EpisodeCompletionModesGuid": "{8031D284-4106-BFF0-83F0-F2BFBC8155CF}",
LogLearning: Display: Training Process:     "EpisodeFinalObservationsGuid": "{01234C97-493C-D892-A647-E7A722628CD8}",
LogLearning: Display: Training Process:     "ObservationsGuid": "{92463EA6-4D09-CA6A-BCFD-6695828505EA}",
LogLearning: Display: Training Process:     "ActionsGuid": "{89FCCA85-461B-E86B-DD0F-419255892C0E}",
LogLearning: Display: Training Process:     "RewardsGuid": "{30CBE59D-4F20-AB2A-E980-8C84EBDCB994}",
LogLearning: Display: Training Process:     "ObservationVectorDimensionNum": 27,
LogLearning: Display: Training Process:     "ActionVectorDimensionNum": 2,
LogLearning: Display: Training Process:     "MaxEpisodeNum": 5000,
LogLearning: Display: Training Process:     "MaxStepNum": 10000,
LogLearning: Display: Training Process:     "PolicyNetworkByteNum": 82516,
LogLearning: Display: Training Process:     "PolicyHiddenUnitNum": 128,
LogLearning: Display: Training Process:     "PolicyLayerNum": 3,
LogLearning: Display: Training Process:     "PolicyActivationFunction": "ELU",
LogLearning: Display: Training Process:     "PolicyActionNoiseMin": 0.25,
LogLearning: Display: Training Process:     "PolicyActionNoiseMax": 0.25,
LogLearning: Display: Training Process:     "CriticNetworkByteNum": 80968,
LogLearning: Display: Training Process:     "CriticHiddenUnitNum": 128,
LogLearning: Display: Training Process:     "CriticLayerNum": 3,
LogLearning: Display: Training Process:     "CriticActivationFunction": "ELU",
LogLearning: Display: Training Process:     "ProcessNum": 1,
LogLearning: Display: Training Process:     "IterationNum": 1000000,
LogLearning: Display: Training Process:     "LearningRatePolicy": 0.0010000000474974513,
LogLearning: Display: Training Process:     "LearningRateCritic": 0.009999999776482582,
LogLearning: Display: Training Process:     "LearningRateDecay": 0.9900000095367432,
LogLearning: Display: Training Process:     "WeightDecay": 0.0010000000474974513,
LogLearning: Display: Training Process:     "InitialActionScale": 0.10000000149011612,
LogLearning: Display: Training Process:     "BatchSize": 128,
LogLearning: Display: Training Process:     "EpsilonClip": 0.20000000298023224,
LogLearning: Display: Training Process:     "ActionRegularizationWeight": 0.0010000000474974513,
LogLearning: Display: Training Process:     "EntropyWeight": 0.019999999552965164,
LogLearning: Display: Training Process:     "GaeLambda": 0.8999999761581421,
LogLearning: Display: Training Process:     "ClipAdvantages": true,
LogLearning: Display: Training Process:     "AdvantageNormalization": true,
LogLearning: Display: Training Process:     "TrimEpisodeStartStepNum": 0,
LogLearning: Display: Training Process:     "TrimEpisodeEndStepNum": 0,
LogLearning: Display: Training Process:     "Seed": 1234,
LogLearning: Display: Training Process:     "DiscountFactor": 0.9900000095367432,
LogLearning: Display: Training Process:     "Device": "GPU",
LogLearning: Display: Training Process:     "UseTensorBoard": false,
LogLearning: Display: Training Process:     "UseInitialPolicyNetwork": false,
LogLearning: Display: Training Process:     "UseInitialCriticNetwork": false,
LogLearning: Display: Training Process:     "SynchronizeCriticNetwork": false,
LogLearning: Display: Training Process:     "LoggingEnabled": true
LogLearning: Display: Training Process: }
LogLearning: Display: BP_RLTrainingManager_C_1: Resetting Agents [0].
LogLearning: Warning: BP_RLTrainer: Agent with id 0 has completed episode and will be reset but has not generated any experience.

Can you tell what may be happening?
