Tutorial: Learning Agents Introduction

Thanks for the information. I’ll give it some more thought. Maybe it makes more sense to hold off on testing things like this and continue once discrete actions are added.

Another question I have: With a use case such as the Lyra Bots, would it make more sense to use imitation learning here? One goal would be to achieve bots that are not “bot-like” but rather play like humans.

What state is the imitation learning in? Can it be used for such a case right now?

Yes, imitation learning would be important for human-like aiming. We currently provide behavior cloning, which is mostly what you would need for this.

In lieu of a formal tutorial for imitation learning:

  1. Set up an interactor - implement SetupObservations and SetObservations for data collection. Create the actions you want to track during SetupActions and then make the Action object variables public.
  2. Create a blueprint from ULearningAgentsController and implement SetActions - this is like SetObservations but for actions. During SetActions, set the publicly exposed action objects via their SetXAction functions (don’t accidentally try to set the variable itself).
  3. Create a blueprint from ULearningAgentsRecorder - nothing is needed in the BP Event Graph.
  4. Create a Miscellaneous->DataAsset->LearningAgentsRecording and provide this during Recorder->SetupRecorder.
  5. Create a data collection manager from ULearningAgentsManager - during the manager’s Tick, call Interactor->EncodeObservations, Controller->EncodeActions, and Recorder->AddExperience (behind a branch checking Recorder->IsRecording == true). After spawning, or whenever an enemy is found, call Recorder->BeginRecording. Whenever the player gets a kill (or however you want to segment the data), call EndRecording. Every begin/end recording pair will create a Record in the Recording object.
  6. Create empty blueprints from ULearningAgentsImitationTrainer and ULearningAgentsPolicy.
  7. Implement GetActions on the original interactor if not already done.
  8. Create a separate manager for imitation learning with the Policy, Interactor, and ImitationTrainer components attached (it could be on the same manager with some controls added to it). During Tick, call ImitationTrainer->RunTraining and pass in the Recording asset that was populated with data above.

This looks like a lot of steps when written out, but structurally it shouldn’t take long to set up. Deciding on how to do the interactor is the hardest part.
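
To make step 2 a little more concrete, here is a rough C++ sketch of what the controller’s SetActions might look like. Everything here is illustrative: UMyBotController, AMyBot, SteeringAction, FireAction, and the bot getter functions are made-up names, and SetFloatAction / SetBoolAction stand in for whichever SetXAction functions your particular action objects expose - check the action classes you actually created in SetupActions for the real names and signatures.

```cpp
// Sketch only: a controller that records what the bot (or human player) actually
// did this frame, so the recorder can store it as demonstration data for
// behavior cloning. Assumes UMyBotController declares SteeringAction and
// FireAction as the public action objects created in the interactor's SetupActions.
void UMyBotController::SetActions_Implementation(const TArray<int32>& AgentIds)
{
    for (const int32 AgentId : AgentIds)
    {
        // GetAgent hands back the object that was registered with AddAgent on the manager.
        const AMyBot* Bot = Cast<AMyBot>(GetAgent(AgentId, AMyBot::StaticClass()));
        if (Bot == nullptr)
        {
            continue;
        }

        // Write the observed behavior into the publicly exposed action objects.
        SteeringAction->SetFloatAction(AgentId, Bot->GetCurrentSteering());
        FireAction->SetBoolAction(AgentId, Bot->IsFiring());
    }
}
```

The same idea applies in Blueprint: during SetActions, read whatever the human or existing AI did that frame and push it into the exposed action objects.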

The biggest challenge is that aiming in a human-like way takes place over many frames, so you need some kind of trajectory information, i.e. a memory. This can currently be achieved with a time-lagged MLP. For example, if you cared about the target’s position (seems necessary for aiming lol) and you wanted 1 second of history data at 30 FPS, you could feed the time dimension in as different columns: Pos_Time0, Pos_Time1, Pos_Time2, and so on. Managing the observations is a bit cumbersome right now.
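
As a very rough sketch of that idea (none of this is Learning Agents API; FPositionHistory and its members are made up for illustration), you could keep the last N target positions in a small ring buffer and flatten them into one block of observation values each tick:

```cpp
// Sketch: a ring buffer of the last N positions, flattened oldest-to-newest
// into a float array that can be fed in as Pos_Time0 ... Pos_TimeN-1 columns.
#include "CoreMinimal.h"

struct FPositionHistory
{
    explicit FPositionHistory(int32 InNumSteps)
        : NumSteps(InNumSteps)
    {
        Positions.Init(FVector::ZeroVector, NumSteps);
    }

    // Call once per agent tick with the target's current position.
    void Push(const FVector& Position)
    {
        Positions[Head] = Position;
        Head = (Head + 1) % NumSteps;
    }

    // Flatten oldest-to-newest into floats suitable for an observation vector.
    TArray<float> Flatten() const
    {
        TArray<float> Out;
        Out.Reserve(NumSteps * 3);
        for (int32 Index = 0; Index < NumSteps; ++Index)
        {
            const FVector& P = Positions[(Head + Index) % NumSteps];
            Out.Add(static_cast<float>(P.X));
            Out.Add(static_cast<float>(P.Y));
            Out.Add(static_cast<float>(P.Z));
        }
        return Out;
    }

    TArray<FVector> Positions;
    int32 NumSteps = 30; // e.g. 1 second of history at 30 FPS
    int32 Head = 0;
};
```

With 30 steps of 3 floats each, that’s 90 extra values on the observation vector, which gives the network the trajectory information it needs for human-like aiming over time.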

Anyway, hope this helps!

Erm, hello sir. It’s kind of a stupid question, but I’ve already finished the driving example and it’s working fine. Now I want to train a simple scene where an AI chases an object, but I’m stuck on the penalty setup. My reward is based on the distance from the AI to the target and whether it is facing the target or not, and the penalty is also based on the distance to the target. What I want to ask is: how can I set up the penalty? I can only find one penalty function in the plugin, but it’s for position. Thank you.

Hey, amazing work! I was looking forward to seeing machine learning in Unreal.
I followed your tutorial and I have a small doubt: I’m getting this error
“LogLearning: Display: Training Process: RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from Official Drivers | NVIDIA
So my doubt is: is this exclusive to NVIDIA GPUs? Thanks!

Same here. I tried following the “let’s learn to drive” tutorial: Learning Agents - Getting Started | Course

I get these errors. I’m on an AMD CPU and GPU.

LogLearning: Display: BP_RLTrainingManager_C_UAID_3C219C5E395D7FB501_1735601647: Adding Agent SportsCar_Pawn_C_0 with id 0.
PIE: Server logged in
PIE: Play in editor total start time 0.593 seconds.
LogLearning: Display: BP_DrivingRLTrainer: Sending / Receiving initial policy…
LogLearning: Display: Training Process: {
LogLearning: Display: Training Process: “TaskName”: “BP_DrivingRLTrainer”,
LogLearning: Display: Training Process: “TrainerMethod”: “PPO”,
LogLearning: Display: Training Process: “TrainerType”: “SharedMemory”,
LogLearning: Display: Training Process: “TimeStamp”: “2023-10-23_21-37-00”,
LogLearning: Display: Training Process: “SitePackagesPath”: “C:/Program Files/Epic Games/UE_5.3/Engine/Plugins/Experimental/PythonFoundationPackages/Content/Python/Lib/Win64/site-packages”,
LogLearning: Display: Training Process: “IntermediatePath”: “C:/Users/kavis/OneDrive/Documents/Unreal Projects/AiCarTest/Intermediate/LearningAgents”,
LogLearning: Display: Training Process: “PolicyGuid”: “{8640D3B8-41AF-6B36-34DB-51B2478F7EDE}”,
LogLearning: Display: Training Process: “ControlsGuid”: “{B9DCFDAF-44BB-660D-09B5-DDA99411FF7E}”,
LogLearning: Display: Training Process: “EpisodeStartsGuid”: “{63107E00-4D95-F9F3-63BE-7195A699B07A}”,
LogLearning: Display: Training Process: “EpisodeLengthsGuid”: “{62B6E09B-41C2-26DF-2D90-C1BF26A8E0B6}”,
LogLearning: Display: Training Process: “EpisodeCompletionModesGuid”: “{631EEE41-4923-093D-166E-12ABFB2FDB1D}”,
LogLearning: Display: Training Process: “EpisodeFinalObservationsGuid”: “{4E3CB7BA-40B0-D531-6947-32B6586392CB}”,
LogLearning: Display: Training Process: “ObservationsGuid”: “{F49869F2-40E9-2586-EBAA-58A11488C880}”,
LogLearning: Display: Training Process: “ActionsGuid”: “{D2A58163-47E2-9D51-0789-6B93FDF8C207}”,
LogLearning: Display: Training Process: “RewardsGuid”: “{E5369965-42F1-8790-D21D-7EACD8730C89}”,
LogLearning: Display: Training Process: “ObservationVectorDimensionNum”: 8,
LogLearning: Display: Training Process: “ActionVectorDimensionNum”: 2,
LogLearning: Display: Training Process: “MaxEpisodeNum”: 1000,
LogLearning: Display: Training Process: “MaxStepNum”: 10000,
LogLearning: Display: Training Process: “PolicyNetworkByteNum”: 72788,
LogLearning: Display: Training Process: “PolicyHiddenUnitNum”: 128,
LogLearning: Display: Training Process: “PolicyLayerNum”: 3,
LogLearning: Display: Training Process: “PolicyActivationFunction”: “ELU”,
LogLearning: Display: Training Process: “PolicyActionNoiseMin”: 0.25,
LogLearning: Display: Training Process: “PolicyActionNoiseMax”: 0.25,
LogLearning: Display: Training Process: “CriticNetworkByteNum”: 71240,
LogLearning: Display: Training Process: “CriticHiddenUnitNum”: 128,
LogLearning: Display: Training Process: “CriticLayerNum”: 3,
LogLearning: Display: Training Process: “CriticActivationFunction”: “ELU”,
LogLearning: Display: Training Process: “ProcessNum”: 1,
LogLearning: Display: Training Process: “IterationNum”: 1000000,
LogLearning: Display: Training Process: “LearningRatePolicy”: 9.999999747378752e-05,
LogLearning: Display: Training Process: “LearningRateCritic”: 0.0010000000474974513,
LogLearning: Display: Training Process: “LearningRateDecay”: 0.9900000095367432,
LogLearning: Display: Training Process: “WeightDecay”: 0.0010000000474974513,
LogLearning: Display: Training Process: “InitialActionScale”: 0.10000000149011612,
LogLearning: Display: Training Process: “BatchSize”: 128,
LogLearning: Display: Training Process: “EpsilonClip”: 0.20000000298023224,
LogLearning: Display: Training Process: “ActionRegularizationWeight”: 0.0010000000474974513,
LogLearning: Display: Training Process: “EntropyWeight”: 0.009999999776482582,
LogLearning: Display: Training Process: “GaeLambda”: 0.8999999761581421,
LogLearning: Display: Training Process: “ClipAdvantages”: true,
LogLearning: Display: Training Process: “AdvantageNormalization”: true,
LogLearning: Display: Training Process: “TrimEpisodeStartStepNum”: 0,
LogLearning: Display: Training Process: “TrimEpisodeEndStepNum”: 0,
LogLearning: Display: Training Process: “Seed”: 1234,
LogLearning: Display: Training Process: “DiscountFactor”: 0.9900000095367432,
LogLearning: Display: Training Process: “Device”: “GPU”,
LogLearning: Display: Training Process: “UseTensorBoard”: false,
LogLearning: Display: Training Process: “UseInitialPolicyNetwork”: true,
LogLearning: Display: Training Process: “UseInitialCriticNetwork”: false,
LogLearning: Display: Training Process: “SynchronizeCriticNetwork”: false,
LogLearning: Display: Training Process: “LoggingEnabled”: true
LogLearning: Display: Training Process: }
LogLearning: Display: Training Process: Creating Replay Buffer…
LogLearning: Display: Training Process: Creating Networks…
LogLearning: Display: Training Process: Traceback (most recent call last):
LogLearning: Display: Training Process: File “C:\Program Files\Epic Games\UE_5.3\Engine\Plugins\Experimental\LearningAgents\Content\Python\train_ppo.py”, line 361, in <module>
LogLearning: Display: Training Process: train_ppo(config, trainer)
LogLearning: Display: Training Process: File “C:\Program Files\Epic Games\UE_5.3\Engine\Plugins\Experimental\LearningAgents\Content\Python\train_ppo.py”, line 87, in train_ppo
LogLearning: Display: Training Process: actor_network = NeuralNetwork(
LogLearning: Display: Training Process: File “C:\Program Files/Epic Games/UE_5.3/Engine/Plugins/Experimental/PythonFoundationPackages/Content/Python/Lib/Win64/site-packages\torch\nn\modules\module.py”, line 852, in to
LogLearning: Display: Training Process: return self._apply(convert)
LogLearning: Display: Training Process: File “C:\Program Files/Epic Games/UE_5.3/Engine/Plugins/Experimental/PythonFoundationPackages/Content/Python/Lib/Win64/site-packages\torch\nn\modules\module.py”, line 530, in _apply
LogLearning: Display: Training Process: module._apply(fn)
LogLearning: Display: Training Process: File “C:\Program Files/Epic Games/UE_5.3/Engine/Plugins/Experimental/PythonFoundationPackages/Content/Python/Lib/Win64/site-packages\torch\nn\modules\module.py”, line 530, in _apply
LogLearning: Display: Training Process: module._apply(fn)
LogLearning: Display: Training Process: File “C:\Program Files/Epic Games/UE_5.3/Engine/Plugins/Experimental/PythonFoundationPackages/Content/Python/Lib/Win64/site-packages\torch\nn\modules\module.py”, line 552, in _apply
LogLearning: Display: Training Process: param_applied = fn(param)
LogLearning: Display: Training Process: File “C:\Program Files/Epic Games/UE_5.3/Engine/Plugins/Experimental/PythonFoundationPackages/Content/Python/Lib/Win64/site-packages\torch\nn\modules\module.py”, line 850, in convert
LogLearning: Display: Training Process: return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
LogLearning: Display: Training Process: File "C:\Program Files/Epic Games/UE_5.3/Engine/Plugins/Experimental/PythonFoundationPackages/Content/Python/Lib/Win64/site-packages\torch\cuda\__init__.py", line 172, in _lazy_init
LogLearning: Display: Training Process: torch._C._cuda_init()
LogLearning: Display: Training Process: RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from Official Drivers | NVIDIA
LogLearning: Warning: Training Process finished with warnings or errors

I think you can use a scalar reward with a negative scale to make a penalty.
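
For example, as a sketch only (this is not the exact Learning Agents reward API; the function name and scale values are placeholders), you could compute the chase reward yourself with the distance term scaled negatively so it acts as a penalty, then feed the result into a plain scalar/float reward:

```cpp
// Sketch: distance becomes a penalty via the negative scale, and facing the
// target adds a small positive reward. Tune the scales for your scene size.
#include "CoreMinimal.h"

float ComputeChaseReward(const FVector& AgentLocation, const FVector& TargetLocation,
                         const FVector& AgentForward)
{
    const FVector ToTarget = TargetLocation - AgentLocation;
    const float Distance = static_cast<float>(ToTarget.Size());

    // Negative scale turns the raw distance into a penalty: further away = worse.
    const float DistancePenalty = -0.01f * Distance;

    // Dot product of the forward vector and the direction to the target is ~1
    // when facing the target and ~-1 when facing directly away.
    const float Facing = static_cast<float>(
        FVector::DotProduct(AgentForward, ToTarget.GetSafeNormal()));
    const float FacingReward = 0.1f * Facing;

    return DistancePenalty + FacingReward;
}
```

Keep the distance scale small enough that the penalty doesn’t completely swamp the facing term, otherwise the agent has little incentive to aim at the target while closing in.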

Thank you for this first introduction to imitation learning, but do you have a target date for an official tutorial? 🙂

Great job with Learning Agents!

There’s a time dilation console command called “slomo” that can be used to run the simulation slower or faster. By accelerating the simulation with “slomo 5”, are we accelerating learning?

@Deathcalibur Where can we actively follow the development of Learning Agents? Maybe a separate section on the forums could be useful?

Still not working on AMD. Maybe this should be mentioned in the tutorial. @Deathcalibur

LogLearning: Display: Training Process: RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Hey @MetinCelik, I ran into the NVIDIA GPU problem too. Not sure if this will help you, but I just switched to CPU training. (I think GPU is the default.)
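
In case it helps, the switch is just the “Device” option in the training settings passed to the trainer. As a sketch (the struct and enum names below are assumptions from memory and may differ in your engine version; in Blueprint it’s simply the Device field on the trainer’s training settings):

```cpp
// Sketch only: force CPU training when no NVIDIA GPU / CUDA driver is available.
// Type names are assumed, not verified against the plugin source.
FLearningAgentsTrainerTrainingSettings TrainingSettings;
TrainingSettings.Device = ELearningAgentsTrainingDevice::CPU; // GPU is the default
```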

Thanks, now it’s working. But training will take longer without GPU acceleration. I just read that the Learning Agents plugin uses PyTorch, which supports AMD GPUs only on Linux, so we won’t get AMD support anytime soon. @FexotheFCO @fortniterickroll

For URotationAction, I need to use the SetXAction function, but I don’t see it in the code.

Dear Learning Agents Team,

Thanks for the great support and effort in developing such an awesome plugin for UE users. I am a UE beginner and an expert in RL and AI. It would be great if you could create a document about the philosophy behind the plugin and add more examples of how to build RL and IL workflows.

I’ve been getting an invalid object as the Return Value of the Get Agent function in some (but not all, or even most) of the Interactor and Trainer events.

The same thing happens in the Vehicle Template project I did first and in my own project, where I’ve implemented (or tried to implement) the same thing.

When the breakpoint after Is Not Valid is triggered, the Get Agent output looks like this.

And the Array Element in the loop node has no debug data; I have no idea what that means.
(Screenshot of the ForEach loop node)

Any ideas? Is this thread alive?

Did you make sure the objects you’re adding as agents can be cast to the “Agent Class”? If they can’t, then you will get a nullptr which will show up as “Unknown”.
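
For example, a defensive check before registering (sketch only; RegisterAgent is a made-up helper and the include path is assumed, but AddAgent on the manager is the same call described above):

```cpp
// Sketch: only register objects that are compatible with the Agent Class
// configured on the manager, so GetAgent never hands back an "Unknown" nullptr.
#include "LearningAgentsManager.h" // header name assumed

void RegisterAgent(ULearningAgentsManager* Manager, UObject* Candidate, UClass* ConfiguredAgentClass)
{
    if (Candidate != nullptr && Candidate->IsA(ConfiguredAgentClass))
    {
        Manager->AddAgent(Candidate);
    }
    else
    {
        UE_LOG(LogTemp, Warning, TEXT("%s cannot be cast to the configured Agent Class and was not added."),
            *GetNameSafe(Candidate));
    }
}
```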

The missing debug data is just a problem with the debugger. You have to stop somewhere after the data has been used; usually it’s one node further along than you think.

Brendan

32 objects made from the CollectoBot Blueprint class are placed on the map; they call Add Agent on the manager and set themselves as the agent.

When calling GetAgentNum from the Manager, I get 32.

But when going through the ForEach loop, some objects are valid and some are not.

I’m not sure how only some of them could be invalid.

Strange…

  1. Are you doing anything with networking?
  2. Are you somehow destroying the agents?

Huh, this one’s on me.
The agents save their location at BeginPlay and use it in their relocation function for the episode reset. I forgot that the episode reset runs first, so it saved (0,0,0) as the respawn location, bunched all of them there, and kept resetting the episode (not sure why though; is the episode reset if the agent cannot legally spawn because of a collision?).
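
In case anyone else hits the same ordering problem, here is a sketch of one way to guard against it (AMyAgent, RespawnLocation, bHasRespawnLocation, and the reset function name are placeholders for whatever your agent class or blueprint actually uses):

```cpp
// Sketch: cache the respawn location once, and never teleport to an uninitialized
// (0,0,0) location if the episode happens to reset before BeginPlay has run.
// Assumes the header declares: FVector RespawnLocation; bool bHasRespawnLocation = false;
void AMyAgent::BeginPlay()
{
    Super::BeginPlay();
    if (!bHasRespawnLocation)
    {
        RespawnLocation = GetActorLocation();
        bHasRespawnLocation = true;
    }
}

void AMyAgent::HandleEpisodeReset()
{
    if (!bHasRespawnLocation)
    {
        // The reset arrived before BeginPlay: fall back to the current location
        // instead of the default-constructed (0,0,0).
        RespawnLocation = GetActorLocation();
        bHasRespawnLocation = true;
    }
    SetActorLocation(RespawnLocation, /*bSweep=*/ false, nullptr, ETeleportType::TeleportPhysics);
}
```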

Hi there,

We currently can’t do RL/IL with “image data” (screenshots from cameras) using this plugin; is that correct?