Tutorial: Learning Agents Introduction

Hello again,

Great question!

It’s possible to use replays but there are some major challenges. I’ve tried it - not with Lyra though - so this is coming from first-hand experience:

  1. The UE built-in replay system does not directly record ground-truth actions, i.e. user inputs. For example, rotations of the player pawns will be the final result synced over the network, so if you have multiple things manipulating the rotations, it’s challenging to untangle them - e.g. weapon recoil pitching a gun up vs. the user moving the mouse to pitch up. Depending on the problem you’re trying to solve, this may not be an issue for you.
    • You could try to solve this by learning an inverse dynamics model or by pushing up more data into the replays (I’m not an expert on how to do this - yet)
  2. The replays are not a deterministic simulation of the game, so querying the game state can be challenging as well. For example, at game time you might call “IsFiring()” on a Pawn to see if it’s currently firing its weapon, but at replay time this function might always return false because the backing code doesn’t need to run (only the special effects like bullet tracers actually get played back at replay time).
    • You can get creative with how you query game state but it’s a major pain.

How could you do it though?
Assuming you still want to try, you simply spawn a learning agents manager in the replay level, have it query the state you want through the interactor, and save off the data with the LA Data Recorder object. I haven’t done this recently, so it’s possible there is some regression in the LA design that makes this no longer work (please let me know if you try and run into problems).
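
As a very rough sketch of the “query it each tick during playback” part (the actor and file path here are made up for illustration; in practice you’d route the gathered values through your Interactor and save them with the Data Recorder rather than dumping a CSV):

```cpp
// Hypothetical actor you drop into the replay level. Each tick during playback
// it scrapes whatever pawn state survived into the replay and appends it to a
// CSV. Swap the CSV write for your Learning Agents Interactor + Data Recorder
// calls once the queries return what you expect.
#include "CoreMinimal.h"
#include "EngineUtils.h"
#include "GameFramework/Actor.h"
#include "GameFramework/Pawn.h"
#include "HAL/FileManager.h"
#include "Misc/FileHelper.h"
#include "Misc/Paths.h"
#include "DemoScrapingRecorder.generated.h"

UCLASS()
class ADemoScrapingRecorder : public AActor
{
    GENERATED_BODY()

public:
    ADemoScrapingRecorder() { PrimaryActorTick.bCanEverTick = true; }

    virtual void Tick(float DeltaSeconds) override
    {
        Super::Tick(DeltaSeconds);

        // Iterate the pawns that were recorded into the demo.
        for (TActorIterator<APawn> It(GetWorld()); It; ++It)
        {
            const APawn* Pawn = *It;
            const FVector Location = Pawn->GetActorLocation();
            const FRotator Rotation = Pawn->GetActorRotation();

            // Only replicated / replay-recorded state is reliable here; gameplay
            // queries like "IsFiring()" may always return false at replay time.
            const FString Row = FString::Printf(TEXT("%s,%f,%f,%f,%f,%f,%f\n"),
                *Pawn->GetName(),
                Location.X, Location.Y, Location.Z,
                Rotation.Pitch, Rotation.Yaw, Rotation.Roll);

            FFileHelper::SaveStringToFile(Row,
                *(FPaths::ProjectSavedDir() / TEXT("ReplayScrape.csv")),
                FFileHelper::EEncodingOptions::AutoDetect,
                &IFileManager::Get(), FILEWRITE_Append);
        }
    }
};
```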

I’m working on “solving” this issue internally and hope to be able to port the solution into the LA/replay system, but I have absolutely no idea on the time frame or if it’ll ever come!

Hope that answers your question!
Brendan

I did want to let you know we’ve run into issues where the engine will go to sleep because it’s waiting for the subprocess to finish writing, which isn’t great. If it were possible to have this not hitch the editor and instead just buffer (so it could do this async) or stream that information to the subprocess, then that would help a ton!
We were seeing this with SharedMemoryTraining::RecvNetwork btw.
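
To illustrate what I mean by buffering (just a rough sketch in plain C++, not the actual SharedMemoryTraining code): the game thread pushes its data into a queue and returns immediately, and a worker thread does the blocking write to the trainer subprocess, so the editor never stalls.

```cpp
// Minimal producer/consumer sketch of the suggestion. Inside the engine this
// would presumably be built on FRunnable/TQueue, but the idea is the same:
// Enqueue() never blocks; the blocking write happens off the game thread.
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

class AsyncTrainerWriter
{
public:
    AsyncTrainerWriter() : Worker([this] { Run(); }) {}

    ~AsyncTrainerWriter()
    {
        { std::lock_guard<std::mutex> Lock(Mutex); bStop = true; }
        Ready.notify_one();
        Worker.join();
    }

    // Called from the game thread: just buffers the batch and returns.
    void Enqueue(std::vector<float> Batch)
    {
        { std::lock_guard<std::mutex> Lock(Mutex); Queue.push_back(std::move(Batch)); }
        Ready.notify_one();
    }

private:
    void Run()
    {
        for (;;)
        {
            std::vector<float> Batch;
            {
                std::unique_lock<std::mutex> Lock(Mutex);
                Ready.wait(Lock, [this] { return bStop || !Queue.empty(); });
                if (bStop && Queue.empty()) { return; }
                Batch = std::move(Queue.front());
                Queue.pop_front();
            }
            BlockingWriteToSubprocess(Batch); // the slow part, off the game thread
        }
    }

    // Placeholder for whatever actually pushes the data over shared memory.
    void BlockingWriteToSubprocess(const std::vector<float>& /*Batch*/) {}

    std::mutex Mutex;
    std::condition_variable Ready;
    std::deque<std::vector<float>> Queue;
    bool bStop = false;
    std::thread Worker; // declared last so the queue/mutex exist before it starts
};
```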


Async is in the works! How big are your obs/training batches that it’s causing the “engine to sleep”? I’ve never seen that so I’m curious exactly what is happening.

This is occurring on the Learning to Drive 4.2 tutorial even with 1 agent training (I noticed it with the 32 agents and tested with 1, and it would still occur). The project is on an SSD (Sabrent Rocket 4.0 Plus), the GPU is an RTX 3080, and the CPU is an AMD Ryzen 9 5950X; all of the hardware is about three months old, in case there is any concern that it’s a hardware failure.

I was following the tutorial to the letter and this was happening after both the initial implementation and after adding on improved learning (where I’m still having issues with my cars staying on the track after letting it train all night). I ran the training on the GPU and then switched to the CPU to see if it was an issue there; it occurs on both.

Hopefully this helps with diagnosing the issue, let me know if there’s any other info I can provide!


Unrelated to the hitching, I am also having an issue with getting TensorBoard to work with my project. I enabled it within the training settings in the Blueprint defaults and I am able to get it running (with some work around numpy being the incorrect version and dealing with that).


Is there a way I can observe or specify an action where, instead of a float, it will give me an integer? As well as an integer within a min/max range? I found the exclusive discrete actions but I’m only getting zero… this is on 5.4.2.

What exactly are you trying to accomplish with the integer?

You can maybe use one of the existing observations, or you could create your own observation if you have a C++ project.
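
If you do end up going the C++ route, one workaround that doesn’t depend on any particular Learning Agents API (the helper names below are just for illustration) is to either normalize the integer into a float range or expand it into a one-hot vector, which is roughly what an exclusive discrete action represents under the hood:

```cpp
// Sketch: feeding an integer in [Min, Max] through float-based observations/actions.
#include "CoreMinimal.h"

// Normalize the integer into [-1, 1] so it can be passed as a single float.
static float EncodeIntAsFloat(const int32 Value, const int32 Min, const int32 Max)
{
    const int32 Clamped = FMath::Clamp(Value, Min, Max);
    return (Max == Min) ? 0.0f
        : 2.0f * (Clamped - Min) / static_cast<float>(Max - Min) - 1.0f;
}

// One-hot encode the integer: one float per possible value, 1.0 at its index.
static TArray<float> EncodeIntAsOneHot(const int32 Value, const int32 Min, const int32 Max)
{
    TArray<float> OneHot;
    OneHot.Init(0.0f, Max - Min + 1);
    OneHot[FMath::Clamp(Value, Min, Max) - Min] = 1.0f;
    return OneHot;
}

// Decode a continuous action in [-1, 1] back into an integer in [Min, Max].
static int32 DecodeFloatAsInt(const float Action, const int32 Min, const int32 Max)
{
    const float Normalized = (FMath::Clamp(Action, -1.0f, 1.0f) + 1.0f) * 0.5f;
    return Min + FMath::RoundToInt(Normalized * (Max - Min));
}
```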

(Sorry for the delay - been on summer break)

Once the new asset is created, we should give it an appropriate name. This manager will be used for RL training, so let’s name it “BP_RLTrainingManager”. You can open the blueprint graph if you like as we will be returning to build it up as we continue this tutorial.

Finally, place an instance of the manager on the “VehicleExampleMap”. The manager’s location is not important for this tutorial.

I’m using version 5.4.3 and this tutorial is really difficult to follow. For example, I created the Learning Agents manager blueprint class and the parent turns out to be Learning Agents Manager in the blueprint’s Class Settings, although in the dropdown it’s nested under Actor, so even if I change it to Actor Component I can’t do the next step of placing it on the Vehicle Map. Can you clarify the gaps in the instructions?

It was more for testing out different approaches to building out schemas; I was able to get around it using floats for now.

I tried the tutorial with the driving car. It was really well-written and easy to follow. Good work on that!

How would I go about adding some avoidance to it? So that I can turn on collision on different vehicles and they will avoid each other (including the player vehicle).

I assume there needs to be some sort of vision/observation that is done in local space, since the positions of the other vehicles are arbitrary?
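
(Something like transforming each other vehicle’s position into my car’s local frame before it goes into the observation is what I have in mind - a rough sketch:)

```cpp
// Rough sketch of "local space": express the other vehicle's position relative
// to my car's transform, so the observation doesn't depend on where in the
// world either car happens to be.
#include "CoreMinimal.h"
#include "GameFramework/Actor.h"

static FVector GetOtherVehicleInLocalSpace(const AActor* MyCar, const AActor* OtherCar)
{
    // +X is ahead of my car, +Y is to its right, +Z is up.
    return MyCar->GetActorTransform().InverseTransformPosition(OtherCar->GetActorLocation());
}
```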

And another question, @Deathcalibur:
Is it possible to add branching paths within the context of the driving tutorial? Or have multiple splines set up so that the car can take different routes?

The best way to do this currently is to use the Raycast observation and have each of the cars cast some rays in the plane of the ground.
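
If you end up rolling the traces yourself in C++, it looks roughly like this (a sketch with made-up parameter defaults; the resulting distances would then be fed into each car’s observation):

```cpp
// Cast NumRays line traces in a fan in the ground plane around the car and
// return normalized distances (1.0 = nothing hit within MaxDistance).
#include "CoreMinimal.h"
#include "Engine/World.h"
#include "GameFramework/Actor.h"

static TArray<float> GatherRayFanObservation(
    const AActor* Car, const int32 NumRays = 5,
    const float FanHalfAngleDeg = 60.0f, const float MaxDistance = 2000.0f)
{
    TArray<float> Distances;
    Distances.Reserve(NumRays);

    FCollisionQueryParams Params;
    Params.AddIgnoredActor(Car); // don't trace against ourselves

    const FVector Start = Car->GetActorLocation();

    for (int32 i = 0; i < NumRays; ++i)
    {
        // Spread the rays evenly across the fan, relative to the car's yaw.
        const float Angle = FMath::Lerp(-FanHalfAngleDeg, FanHalfAngleDeg,
            NumRays > 1 ? i / static_cast<float>(NumRays - 1) : 0.5f);
        const FVector Direction = Car->GetActorRotation().RotateVector(
            FRotator(0.0f, Angle, 0.0f).RotateVector(FVector::ForwardVector));

        FHitResult Hit;
        const bool bHit = Car->GetWorld()->LineTraceSingleByChannel(
            Hit, Start, Start + Direction * MaxDistance, ECC_Visibility, Params);

        // Normalize so the policy sees values in [0, 1] regardless of MaxDistance.
        Distances.Add(bHit ? Hit.Distance / MaxDistance : 1.0f);
    }
    return Distances;
}
```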

Then you need to add a negative reward whenever a collision is detected. Basically have the collision event set some variable on the car, then have the reward function check for the collisions and then clear the value after reading it.
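
The collision flag pattern in C++ looks something like this sketch (class and function names are illustrative, not from the tutorial):

```cpp
// The physics hit event sets a flag, and the reward step consumes it:
// reads it, applies a penalty once, then clears it for the next step.
#include "CoreMinimal.h"
#include "GameFramework/Pawn.h"
#include "AvoidanceCarPawn.generated.h"

UCLASS()
class AAvoidanceCarPawn : public APawn
{
    GENERATED_BODY()

public:
    // Called by the engine when this pawn physically hits something.
    virtual void NotifyHit(UPrimitiveComponent* MyComp, AActor* Other,
        UPrimitiveComponent* OtherComp, bool bSelfMoved, FVector HitLocation,
        FVector HitNormal, FVector NormalImpulse, const FHitResult& Hit) override
    {
        Super::NotifyHit(MyComp, Other, OtherComp, bSelfMoved,
            HitLocation, HitNormal, NormalImpulse, Hit);
        bCollidedSinceLastReward = true;
    }

    // Called from the reward gathering step: penalize once per collision,
    // then clear the flag so the penalty isn't applied every step afterwards.
    float ConsumeCollisionPenalty(const float Penalty = -10.0f)
    {
        const float Reward = bCollidedSinceLastReward ? Penalty : 0.0f;
        bCollidedSinceLastReward = false;
        return Reward;
    }

private:
    bool bCollidedSinceLastReward = false;
};
```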

I have gotten it to work on my local workstation and was able to play alongside the cars. I will at some point extend the tutorial to be like this because then it goes from “watching cars drive around” → “oh these are actual NPCs I can play alongside”. It’s just a lot of work to juggle in addition to working on the library itself. Once the library is out of experimental, you can expect us to really up the tutorials.

Branching paths:
No idea, I haven’t tried it, but I don’t see why it couldn’t work. You would need to come up with new observations/rewards or something to encourage them to drive on the different paths. I don’t think it would be too difficult, but you never really know how hard something will be until you try it.

Brendan

That worked really well. I added 5 traces and now they are avoiding other cars.
Would you recommend resetting them after they crash into another car (similar to when they go too far away from the spline)? At the moment, I keep them running because they usually recover from the crash.

However, this causes my avg_reward to look a bit odd. Avg_return looks okay though:

Another question: I tried the imitation learning (following the tutorial). I was able to record the data, but running the imitation learning fails after a bit; the last thing shown in the log is this:

Subproccess: Profile | Logging            0ms
Subproccess: Profile | Pushing Policy...  2ms
Subproccess: Profile | Pushing Encoder... 1ms
Subproccess: Profile | Pushing Decoder... 0ms
Subproccess: Profile | Training           371ms

After that, the editor freezes. No crash occurs.

I followed the tutorial; the only thing I changed is the Trainer File Name:


By default, this was set to train_ppo, which causes an error when doing imitation learning. This is on UE 5.5.

Edit: Also my setup for the “Make Imitation Trainer” looks a bit different (due to API changes between 5.5 and 5.4):

You need to play with resetting vs. not resetting for the cars. I’m not sure what will work best. When in doubt, you want the training environment to match as closely as possible the behavior you want at inference time. E.g. if the cars crash and you reset, they will never learn how to recover from a crash. I assume at game time you won’t be resetting cars when a player crashes into the agents (if you were making an actual racing game), so resetting might cause the agents to not learn some skills that are needed. This is merely my intuition: I have not played around with this enough to give you a definitive answer.

BTW reading the GT Sophy paper can help a lot if you are really into the racing agents:
https://www.researchgate.net/publication/358484368_Outracing_champion_Gran_Turismo_drivers_with_deep_reinforcement_learning

For the imitation learning, I think that was an unfortunate oversight which I will have to fix in 5.6 and make note of in the tutorial. Thanks for bringing it to my attention!

I will check out that paper.

For the imitation learning: I found out that the culprit was the “Reset Agents” function call. Leaving that one out prevents the freeze and the imitation learning works now:


Very nice feature!! Great job.

Does inference work in a build/shipping version on Meta Horizon OS?