Tutorial: Learning to Drive

I decided to try it on my M2 Mac.

Getting this error on play:

LogDebuggerCommands: Repeating last play command: Selected Viewport
LogContentBundle: [VehicleExampleMap(Standalone)] Generating Streaming for 0 Content Bundles.
LogWorldPartition: Display: GenerateStreaming for 'VehicleExampleMap' started...
LogWorldPartition: Display: GenerateStreaming for 'VehicleExampleMap' took 1 ms (total: 5 ms)
LogPlayLevel: PlayLevel: No blueprints needed recompiling
PIE: New page: PIE session: VehicleExampleMap (Sep 9, 2023, 3:45:04 AM)
LogPlayLevel: Creating play world package: /Game/VehicleTemplate/Maps/UEDPIE_0_VehicleExampleMap
LogPlayLevel: PIE: StaticDuplicateObject took: (0.164241s)
LogPlayLevel: PIE: Created PIE world by copying editor world from /Game/VehicleTemplate/Maps/VehicleExampleMap.VehicleExampleMap to /Game/VehicleTemplate/Maps/UEDPIE_0_VehicleExampleMap.VehicleExampleMap (0.164266s)
LogUObjectHash: Compacting FUObjectHashTables data took   0.31ms
LogWorldSubsystemInput: UEnhancedInputDeveloperSettings::bEnableWorldSubsystem is false, the world subsystem will not be created!
LogChaos: FPhysicsSolverBase::AsyncDt:-1.000000
LogAIModule: Creating AISystem for world VehicleExampleMap
LogWorldPartition: ULevel::OnLevelLoaded(VehicleExampleMap)(bIsOwningWorldGameWorld=1, bIsOwningWorldPartitioned=1, InitializeForMainWorld=1, InitializeForEditor=0, InitializeForGame=1)
LogWorldPartition: Display: WorldPartition initialize started...
LogWorldPartition: UWorldPartition::Initialize : World = VehicleExampleMap, World Type = PIE, IsMainWorldPartition = 1, Location = V(0), Rotation = R(0), IsEditor = 0, IsGame = 0, IsPIEWorldTravel = 0, IsCooking = 0
LogWorldPartition: UWorldPartition::Initialize Context : World NetMode = Standalone, IsServer = 0, IsDedicatedServer = 0, IsServerStreamingEnabled = 0, IsServerStreamingOutEnabled = 0, IsUsingMakingVisibleTransaction = 0, IsUsingMakingInvisibleTransaction = 0
LogContentBundle: [VehicleExampleMap(Standalone)] Creating new contrainer.
LogWorldPartition: Display: WorldPartition initialize took 9 ms (total: 277 ms)
LogPlayLevel: PIE: World Init took: (0.010491s)
LogAudio: Display: Creating Audio Device:                 Id: 4, Scope: Unique, Realtime: True
LogAudioMixer: Display: Audio Mixer Platform Settings:
LogAudioMixer: Display:     Sample Rate:                          48000
LogAudioMixer: Display:     Callback Buffer Frame Size Requested: 1024
LogAudioMixer: Display:     Callback Buffer Frame Size To Use:    1024
LogAudioMixer: Display:     Number of buffers to queue:           2
LogAudioMixer: Display:     Max Channels (voices):                32
LogAudioMixer: Display:     Number of Async Source Workers:       0
LogAudio: Display: AudioDevice MaxSources: 32
LogAudio: Display: Audio Spatialization Plugin: None (built-in).
LogAudio: Display: Audio Reverb Plugin: None (built-in).
LogAudio: Display: Audio Occlusion Plugin: None (built-in).
LogAudioMixerAudioUnit: Display: Bytes per submitted buffer: 8192
LogAudioMixerAudioUnit: Warning: Error querying Sample Rate: 2003332927
LogAudioMixer: Display: Initializing audio mixer using platform API: 'CoreAudio'
LogAudioMixer: Display: Using Audio Hardware Device Unknown
LogAudioMixer: Display: Initializing Sound Submixes...
LogAudioMixer: Display: Creating Master Submix 'MasterSubmixDefault'
LogAudioMixer: Display: Creating Master Submix 'MasterReverbSubmixDefault'
LogAudioMixer: Display: Output buffers initialized: Frames=1024, Channels=2, Samples=2048, InstanceID=4
LogAudioMixer: Display: Starting AudioMixerPlatformInterface::RunInternal(), InstanceID=4
LogInit: FAudioDevice initialized with ID 4.
LogAudio: Display: Audio Device (ID: 4) registered with world 'VehicleExampleMap'.
LogAudioMixer: Initializing Audio Bus Subsystem for audio device with ID 4
LogLoad: Game class is 'VehicleAdvGameMode_C'
LogWorld: Bringing World /Game/VehicleTemplate/Maps/UEDPIE_0_VehicleExampleMap.VehicleExampleMap up for play (max tick rate 60) at 2023.09.09-10.45.04
LogWorld: Bringing up level for play took: 0.010977
LogOnline: OSS: Created online subsystem instance for: :Context_3
LogSpawn: Warning: SpawnActor failed because no class was specified
r.MotionBlur.Amount = "0"
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1835287986 with id 0.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1829841982 with id 1.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1778131970 with id 2.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1784027972 with id 3.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1848125997 with id 4.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1819803974 with id 5.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1819807975 with id 6.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1825199978 with id 7.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1829836981 with id 8.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1848134999 with id 9.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1829845983 with id 10.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1835293987 with id 11.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1848130998 with id 12.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1825208980 with id 13.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1844606993 with id 14.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1829849984 with id 15.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1780584971 with id 16.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1819799973 with id 17.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1766075969 with id 18.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1844615995 with id 19.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1825204979 with id 20.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1825195977 with id 21.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1835281985 with id 22.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1841329991 with id 23.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1844611994 with id 24.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1835297988 with id 25.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1841320989 with id 26.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1819811976 with id 27.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1841325990 with id 28.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1841333992 with id 29.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_UAID_2CF05DAF8693A2A601_1844619996 with id 30.
LogLearning: Display: BP_RLTrainingManager_C_UAID_2CF05DAF86939DA601_1666759087: Adding Agent SportsCar_Pawn_C_0 with id 31.
PIE: Server logged in
PIE: Play in editor total start time 0.2 seconds.
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogSlate: Updating window title bar state: overlay mode, drag disabled, window buttons hidden, title bar hidden
LogWorld: BeginTearingDown for /Game/VehicleTemplate/Maps/UEDPIE_0_VehicleExampleMap
LogWorld: UWorld::CleanupWorld for VehicleExampleMap, bSessionEnded=true, bCleanupResources=true
LogSlate: InvalidateAllWidgets triggered.  All widgets were invalidated
LogContentBundle: [VehicleExampleMap(Standalone)] Deleting container.
LogPlayLevel: Display: Shutting down PIE online subsystems
LogSlate: InvalidateAllWidgets triggered.  All widgets were invalidated
LogAudio: Display: Audio Device unregistered from world 'None'.
LogAudioMixer: Deinitializing Audio Bus Subsystem for audio device with ID 4
LogSlate: Updating window title bar state: overlay mode, drag disabled, window buttons hidden, title bar hidden
LogUObjectHash: Compacting FUObjectHashTables data took   0.26ms
LogPlayLevel: Display: Destroying online subsystem :Context_3

Edit: I looked in the folder and there is no plain “python” file. There is a “python3” executable and some sort of “python3.9” file.

The event graph in the SportsCar BP has no nodes, just a comment indicating that all the logic is in the parent VehicleAdvPawn BP. When I set up the node snippet from the tutorial on both the SportsCar BP and its parent BP separately, I kept getting an error and didn’t see any learning in the log. Also, the BeginPlay node was already connected to something else and flipped out whenever I disconnected it and reassigned it to the agent snippet.


I was having the same issue as other people here, and this fixed it for me. With it set to 10, 20, or 30, it would error out before I could see the first set of results. When I set it to 40, it got past the first set of results but errored before the second set. Running it at 45 worked fine for a long time, until I ejected from the pawn to move the camera around. I’m guessing that caused the editor to slow down a bit, which probably pushed it past the timeout limit.

EDIT: I upped it to 60 just to be safe and I can eject without it having issues now. I didn’t try any other numbers since it can take a while to wait for training.


M1 Mac. When I start training, I get a Python path error:

LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".
LogLearning: Error: BP_DrivingRLTrainer: Can't find Python executable "../../../Engine/Binaries/ThirdParty/Python3/Mac/bin/python".

The full path to the python exe on my machine is

/Users/Shared/Epic Games/UE_5.3/Engine/Binaries/ThirdParty/Python3/Mac/bin/python3

The full path to my LearningAgents folder is

/Users/Shared/Epic Games/UE_5.3/Engine/Plugins/Experimental/LearningAgents
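Given the paths above, a likely workaround (a sketch, not something verified on every install) is to create a `python` symlink next to the `python3` binary the engine actually ships, so the plugin’s hard-coded lookup succeeds. The `UE_DIR` value below matches this poster’s install location and may differ on your machine:

```shell
# Learning Agents looks for ".../Python3/Mac/bin/python", but the engine
# ships only "python3". A symlink named "python" pointing at "python3"
# should satisfy the lookup. Adjust UE_DIR to your own install location.
UE_DIR="/Users/Shared/Epic Games/UE_5.3"
PYBIN="$UE_DIR/Engine/Binaries/ThirdParty/Python3/Mac/bin"
ln -s "$PYBIN/python3" "$PYBIN/python"
```

After adding the link, restart the editor before retrying training.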


Hi Brendan, Really great tutorial, and it’s been fun to start playing around with this.

Unfortunately, I’ve run into a whole number of issues the moment I try to deviate from the tutorial even a little bit :slight_smile:

For example, I’ve been training my cars to run a non-looping path, and I want to reward them when they reach the end of the path, so I’ve set up a conditional reward as seen here in the “Set Up Rewards” section.

The problem is that this just breaks everything, and I can’t seem to figure out what’s causing it to break.

I assumed that maybe this reward not being “given” anywhere in the graph was making it break, but adding it to “Event Set Rewards” doesn’t help. See below:

Am I just using “Add Rewards” and “Set Rewards” in the wrong fashion?

I’ve taken the exact same approach with a “Completion” node, which works just fine, so long as I don’t have the “Reward” in my graph, as seen below:

Same issue here. I decided to do it in the SportsCar BP. Is that correct?

And then, on this part, “In the ‘Get Actor of Class’ node, set the ActorClass pin to be your ‘BP_SportCarTrackSpline’”, I got confused: which BP should I be working in for this?

I know written tutorials are probably easier to put together, but I would really appreciate a video on this. The topic is very interesting, but I am finding it hard to follow the tutorial.

Given that this is an experimental release, we unfortunately only did full testing on Windows. If you have exact steps to remediate issues with the tutorial, I’m happy to add a section to the tutorial which contains Mac specific steps.

I will work on getting some extra resources so we can do more testing on Mac next go around.

Thanks!


Does your SportsCar_Pawn blueprint not look like the following? Are you on Windows?


Whenever you add new observations/actions/rewards/completions, Learning Agents will try to help you by providing warnings that they are not being used. This seems like a big improvement over silently failing, i.e. you go to train and get no results because something in the code is incorrect.

I would check any upstream branching logic you have. The SetConditionalReward node needs to be called every time the SetRewards event fires. The reward will only be given when the condition is true, so on frames where you don’t want the reward, you simply need to pass false.
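The rule above can be sketched in plain Python (the function names here are illustrative stand-ins for the Blueprint nodes, not the Learning Agents API):

```python
GOAL_RADIUS = 200.0  # hypothetical goal threshold, in cm

def conditional_reward(condition: bool, reward: float) -> float:
    # Mirrors SetConditionalReward: the reward is given only when the
    # condition is true; otherwise the contribution is zero.
    return reward if condition else 0.0

def set_rewards(distance_to_goal: float) -> float:
    # Called unconditionally on every SetRewards tick -- no upstream
    # branch should ever skip the conditional-reward call.
    reached_goal = distance_to_goal < GOAL_RADIUS
    return conditional_reward(reached_goal, 10.0)
```

The key point is that `conditional_reward` runs on every tick with `condition=False` when no reward is due, rather than being skipped entirely by a branch.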

Hi Brendan Mulcahy. I set it up, but the cars keep going forward and backward over and over again, and I’m not sure if they’re learning anything.


The SportsCar_Pawn blueprint only shows the following. I tested on Windows 11 and Ubuntu 22.04 with UE 5.3.

Hi Mellos,

In my experience, this is normal for the first few iterations; the cars seem to just be doing “random inputs”, and whichever results in the highest score gets reinforced. The cars need to “learn” that “forward gets you more points”.


Ah ok I finally see the issue. I made this tutorial using UE5.3 Preview 1 and some small things changed during the transition to the full release.

I have updated this section of the tutorial to better reflect the changes to the SportsCar_Pawn. Basically just taking the same code and applying it to the BeginPlay + calling the parent’s BeginPlay.


It is working for creating the agent, but my problem (and some others’) is that the cars’ speed would not exceed 5 km/h, and they got stuck switching between R and 1st gear. We know that we should do something with these features (the Reverse ratio, which is 4.04 for now, and the “reverse as brake” feature, which is true for now). I tried setting that to false and also changing the ratio, but it is not working.


In my case, I used the default values for the Learning-Agent. After 2.5 hours of training, the car’s speed exceeded 40 km/h but it still couldn’t go through the curves.


I think the different results we are getting depend on the positions of the splines.

Hello, I tried to follow the tutorial, but I’m getting lost at BP_RLTrainingManager. When I make the Setup Policy node, there is a pin for Policy Settings, but I don’t know where to get this reference from. When I copy it from the code, it gives me an error.

Edit: Oops, my bad, I just needed to promote it to a variable, haha.


Hey Brendan, thanks for the reply.

I’ve got a few more, broader questions:

Is there any way to see/visualize things that are happening in the learning? More specifically, is there a way to check whether an observation that has been set is actually being used?
Also, to expand on this: do observations on their own need any extra rewards or anything set up for them, or do observations, once observed, “just work” for the learning?

In my case, I’ve added a few raycasts to the vehicle to probe the walls around the track, assuming that simply providing it with more information about its surroundings would help the vehicle learn.

Should I be feeding some kind of reward to these raycast results, or should the mere act of feeding it the numbers in an observation be enough?

Thanks again for this excellent plugin!

edit:
For future readers:
When adding new observations to the interactor, you MUST reset the associated neural-network asset, or it won’t work and your entities will be braindead.


Thanks, Brendan! It works!


Good questions!

Is there any way to see/visualize things that are happening in the learning? More specifically, is there a way to check whether an observation that has been set is actually being used?
Also, to expand on this: do observations on their own need any extra rewards or anything set up for them, or do observations, once observed, “just work” for the learning?

To use new observations, you simply “declare” they exist in your interactor by calling their “Add X Observation” function during Setup Observations. Then in SetObservations, you fill in their data. If you don’t set the values in an observation, you will get a warning + your agent will not be used during training (look at LogLearning in the Output Log).

One thing that can help with debugging/understanding is Unreal’s Visual Logger (see the Unreal Engine documentation). Learning Agents already logs all the observations and actions to the Visual Logger, using the names you provided during “Add X Obs/Action”. I’ve used this tool a lot to look at the data getting fed to the network.

In my case, I’ve added a few raycasts to the vehicle to probe the walls around the track, assuming that simply providing it with more information about its surroundings would help the vehicle learn.

I think this is unlikely to help significantly, but I never tried it. One thing to be careful of is ensuring that your new observations are normalized correctly. You want all the inputs to live roughly in the range (-1, 1), setting an appropriate Scale to achieve this.
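As a rough illustration of that normalization in plain Python (not the engine API; the raycast range is a made-up example):

```python
def normalize(value: float, scale: float) -> float:
    # Divide by the largest magnitude you expect (the role "Scale" plays),
    # then clamp so the observation lives roughly in (-1, 1).
    return max(-1.0, min(1.0, value / scale))

# e.g. raycast hit distances in the range 0..5000 cm:
# normalize(2500.0, 5000.0) -> 0.5
```

Picking `scale` near the true maximum keeps the useful part of the signal spread across the range instead of being squashed near zero or pinned at the clamp.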

If you want to improve the behavior of the vehicles (i.e. getting proper racing lines to emerge), you probably need to tune the reward function more than anything. The vehicles need to be rewarded based on achieving the fastest time on a fairly big chunk of the track, but I felt like setting this up was too much work for the tutorial.

Should I be feeding some kind of reward to these raycast results, or should the mere act of feeding it the numbers in an observation be enough?

Rewards are not given to individual observations. The agent as a whole gets rewarded. The agent is trying to learn a function (called the policy) that, given a state, produces the best action (State → Action). The combination of all your observations is the “state” and the combination of all the actions is the “action”. The best action is the one which will give the agent the highest return (see “Part 1: Key Concepts in RL” in the Spinning Up documentation).
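For reference, the “return” in the standard RL formulation is the discounted sum of future rewards, which is what the policy is trained to maximize:

```latex
G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad 0 \le \gamma \le 1
```

Every observation influences the chosen action, and every reward contributes to this single scalar; no reward attaches to any one observation.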

Thanks again for this excellent plugin!

Anytime, thanks for providing excellent questions and good feedback.

edit:
For future readers:
When adding new observations to the interactor, you MUST reset the associated neural-network asset, or it won’t work and your entities will be braindead.

If you try to use a model which was trained on different obs/actions, you should see errors every frame like:

LogLearning: Error: BP_DrivingPolicy: Setup not complete.
LogLearning: Error: BP_DrivingPolicy: Setup not complete.
LogLearning: Error: BP_DrivingPolicy: Setup not complete.

And if you scroll further up, you should see:

LogLearning: Error: BP_DrivingPolicy: Neural Network Asset provided during Setup is incorrect size: Inputs and outputs don't match what is required.

This error is trying to tell you that your observations + actions don’t match how the network was previously trained.

Thanks,
Brendan