Hmm, it’s not currently exposed in an easily accessible way. In the past, I have added a counter to the manager’s tick (or wherever you call RunTraining) and then manually saved/loaded snapshots when hitting certain counts.
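The counter idea above can be sketched roughly as follows. This is only an illustration in Python of the counting logic; in a real project it would live in the manager's Blueprint or C++ tick, and `SnapshotCounter` and `save_every` are hypothetical names, not part of Learning Agents.

```python
class SnapshotCounter:
    """Counts ticks and reports when a snapshot should be saved.

    Hypothetical helper for illustration; not a Learning Agents class.
    """

    def __init__(self, save_every):
        if save_every <= 0:
            raise ValueError("save_every must be a positive integer")
        self.save_every = save_every
        self.ticks = 0

    def tick(self):
        """Advance the counter by one tick.

        Returns True on every `save_every`-th tick, which is where the
        caller would trigger its own snapshot save.
        """
        self.ticks += 1
        return self.ticks % self.save_every == 0


def ticks_that_save(total_ticks, save_every):
    """Return the tick numbers at which a save would be triggered."""
    counter = SnapshotCounter(save_every)
    return [n for n in range(1, total_ticks + 1) if counter.tick()]
```

Calling `tick()` once per manager tick, right next to the RunTraining call, and saving whenever it returns True gives snapshots at a fixed interval that you control.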
We’ll think about this and see if we can’t add something.
Hi. This tutorial is really great. I have just one small problem: in the Reset Agent Episode step, when I paste, the function Reset To Random Point On Spline doesn’t appear in the blueprint. I’ve been looking for a while and I don’t know why.
Hello everyone. I’ll start off with thanks for the tutorial. It was great, and I got a lot of information from it. But I seem to be getting lots of LogLearning: Warning: messages. Did I miss a reference object somewhere? And the scene episodes reset every so often; is that normal?
The error is indicating that something got wired up wrong in one of these parts. Make sure that the “Location along spline observation” is being put into the “location” on the map, and the same for the direction one.
Hello! I’m trying to run headless with snapshots but I’m not seeing any in the snapshots folder. I see lots of config json files but no .bin snapshots. I have Save Snapshots set to true in my Trainer Training Settings. Is there another step to save snapshots beyond checking that box? What is the best place to debug why snapshots might not be saving? Do I need to run it for a certain period of time before snapshots start saving?
I also have tensorboard set up. I followed the steps to install it, and I can run it and see the window in my browser connected to localhost, but when I run the game headless with Use Tensorboard set to true, it looks like it can’t find tensorboard:
LogLearning: Display: Training Process: Warning: Failed to Load TensorBoard: No module named ‘tensorboard’. Please add manually to site-packages.
Is there an additional configuration step required for that to recognize tensorboard beyond the local file paths?
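One possible cause of a "No module named" warning like the one above is that tensorboard was installed into a different Python interpreter than the one the training process runs under. That is an assumption here, not something the log confirms. A small check like this, run with the interpreter the trainer actually uses, shows which executable is active and whether the module is importable from it:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be found by the current interpreter."""
    return importlib.util.find_spec(name) is not None


def describe_environment(module_name="tensorboard"):
    """Collect the facts needed to diagnose a missing-module warning."""
    return {
        "interpreter": sys.executable,   # which python is actually running
        "module": module_name,
        "available": module_available(module_name),
        "search_path": list(sys.path),   # where imports are looked up
    }


if __name__ == "__main__":
    info = describe_environment()
    print("Interpreter:", info["interpreter"])
    print(info["module"], "importable:", info["available"])
    for entry in info["search_path"]:
        print("  path:", entry)
```

If the interpreter printed here is not the one you installed tensorboard into, installing the package into that interpreter's site-packages is the step the warning is asking for.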
To get tensorboard to work, follow the tensorboard tutorial here:
Snapshots might be getting saved to a directory you’re not expecting. Have you seen https://www.voidtools.com/ ? A great search tool.
Snapshots save on startup and every 1000 game steps (this needs to be fixed in a later version of LA to give users more control - I guess you could edit the python code pretty easily)
Debugging is best accomplished through reading the log.
Good luck,
Brendan
P.S. let me know if you’re still stuck after trying to figure it out
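For the point above about snapshots landing in a directory you are not expecting, a short script can do the same job as a desktop search tool. This is a minimal sketch that assumes snapshots use the .bin extension mentioned earlier in the thread; the root folder you pass in is up to you.

```python
from pathlib import Path


def find_snapshots(root, pattern="*.bin"):
    """Recursively list files under `root` matching `pattern`.

    Results are sorted newest first so recent snapshots appear at the top.
    """
    root = Path(root)
    files = [p for p in root.rglob(pattern) if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    import sys

    # Search the folder given on the command line, or the current folder.
    search_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_snapshots(search_root):
        print(path)
```

Pointing it at the project folder first, and then at wider locations if nothing turns up, narrows down whether snapshots are being written at all.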
I’ve been looking at and adding logs to train_ppo.py to get a grasp on what is going on with my setup. It seems to receive and create everything and starts training. Then it looks like it pulls the experience when I call StopTraining after about 8 seconds of training, during which my agent is clearly receiving commands, but it doesn’t seem like any of the training is actually being saved:
LogLearning: Display: Training Process: Receiving Policy...
LogLearning: Display: Training Process: Receiving Critic...
LogLearning: Display: Training Process: Receiving Encoder...
LogLearning: Display: Training Process: Receiving Decoder...
LogLearning: Display: Training Process: Creating Optimizer...
LogLearning: Display: Training Process: Creating PPO Policy...
LogLearning: Display: Training Process: Opening TensorBoard...
LogLearning: Display: Training Process: Begin Training...
LogLearning: Display: Training Process: Profile| Pull Experience 2528ms
LogLearning: Display: Training Process: Done!
LogLearning: Display: Training Process: Exiting...
I never see it actually running any of the “push” functionality it looks like it should be… It is as if this code:
Never receives a response until I stop the training even though I am calling the RunTraining function every tick. Should I be seeing the push logs while training is running?
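When reading a log like the one above, it can help to pull out only the profiling lines and see which stages were ever reported. The sketch below parses lines in the "Profile| Pull Experience 2528ms" format shown in the log; the names of any other stages are not assumed, it simply collects whatever appears.

```python
import re

# Matches e.g. "... Training Process: Profile| Pull Experience 2528ms"
PROFILE_RE = re.compile(
    r"Training Process: Profile\|\s*(?P<name>.+?)\s+(?P<ms>\d+)ms\s*$"
)


def profile_entries(log_text):
    """Return a list of (stage name, milliseconds) found in the log text."""
    entries = []
    for line in log_text.splitlines():
        match = PROFILE_RE.search(line)
        if match:
            entries.append((match.group("name"), int(match.group("ms"))))
    return entries


def stage_counts(log_text):
    """Count how many times each profiled stage was reported."""
    counts = {}
    for name, _ms in profile_entries(log_text):
        counts[name] = counts.get(name, 0) + 1
    return counts
```

A stage that shows up exactly once, right before "Done!", suggests the trainer only got through that stage when training was stopped, which matches the behavior described here.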
If you have the version of Learning Agents from UE5-Main (eventually coming to UE 5.6), you can run the python process separately from UE, which means you can easily run train.py from VS Code or your favorite python debugger. This makes it 100% easier to debug compared to adding a bunch of prints.
I think it actually works with 5.5, but I kind of forget. We switched the Python CLI from positional arguments to argparse, so 5.6 will be much better. I wish I could help you more with 5.5, but I’m really busy working on 5.6.
Thanks for pushing through these issues though! I know it’s a pain and can get frustrating (for me at least).
Hello @Deathcalibur and thank you for your tutorial. I followed the 5.4 version and it worked perfectly until I turned on my PC and opened the project again the next day. The cars didn’t move anymore, and in the Output Log I got the error:
That’s odd. I would navigate to that folder and see if the python exe is there. If it’s not, then I would close the editor and re-open it. The editor runs python installation scripts during startup, and it’s possible they failed for whatever reason.
If you close and open the editor, it should fix itself. If it doesn’t, then try deleting the Intermediate directory’s contents after closing the editor, and then open the editor again.
In the future, we are hoping we can put a UI on the python installation process, but it’s not clear when or even if that will be able to come.
It looks like shared_memory_recv_experience_multiprocess is getting hung up here waiting for these controls:
    for controls in processes_controls:
        while not controls[UE_SHARED_MEMORY_EXPERIENCE_SIGNAL]:
            if controls[UE_SHARED_MEMORY_STOP_SIGNAL]:
                controls[UE_SHARED_MEMORY_STOP_SIGNAL] = 0
                return UE_RESPONSE_STOPPED, None, None
            print("Sleeping")
            time.sleep(0.001)
        print("Moving On")
I see it print sleeping until the process stops, and it never moves on. Any idea why the controls don’t have the shared memory experience signal to move on?
Each of the worker processes (UE game process) will write a “1” to the shared memory in the appropriate index when they are done writing experience to the shared mem. So there must be something going on with UE where it’s not writing the data appropriately. I would check the logs on the UE side and step through to see what is going on over there.
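To find out which worker never raises its signal, the wait loop can be given a timeout for debugging. The following is a standalone sketch of the handshake described above, not the Learning Agents code: plain Python lists stand in for the shared memory, and the index constants and their values are hypothetical.

```python
import time

# Hypothetical slot indices; the real values are defined in the trainer code.
EXPERIENCE_SIGNAL = 0
STOP_SIGNAL = 1


def wait_for_experience(processes_controls, timeout_s=5.0, poll_s=0.001):
    """Poll each worker's control block until it signals experience is ready.

    Returns (status, stalled) where status is "ok", "stopped", or "timeout",
    and stalled lists the indices of workers that never signalled in time.
    """
    stalled = []
    for index, controls in enumerate(processes_controls):
        deadline = time.monotonic() + timeout_s
        while not controls[EXPERIENCE_SIGNAL]:
            if controls[STOP_SIGNAL]:
                controls[STOP_SIGNAL] = 0  # acknowledge the stop request
                return "stopped", []
            if time.monotonic() > deadline:
                stalled.append(index)  # this worker never wrote its "1"
                break
            time.sleep(poll_s)
    if stalled:
        return "timeout", stalled
    return "ok", []
```

If every worker index comes back as stalled, the problem is more likely on the UE side as a whole (experience never being written) than in any single game process.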