I am working through Learning Agents in UE 5.4 and running into some areas of confusion.
Trainer:
If I am setting up Completions, Rewards, and Resets, how do I access data at runtime that isn't already exposed within the Trainer?
Ex. I want to reset an agent's episode if it contacts the object I have tagged as ‘reset’.
Reset: If I am crudely looping through all agents (whether or not they have made contact yet), how do I receive the event or condition from the Agent blueprint?
Do I set up an event dispatcher within the Agent and call it from On Component Hit?
Or do I just set a public boolean to true and poll it?
It's not entirely clear what the best practice is here. A rough sketch of the boolean approach is below.
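For context, here is roughly what I mean by the public-boolean version, written as C++ instead of Blueprint. All the names here (the pawn class, bHitResetObject, even the ‘reset’ actor tag check) are just my own placeholders:

```cpp
#include "CoreMinimal.h"
#include "GameFramework/Character.h"
#include "Components/CapsuleComponent.h"
#include "MyLearningAgentPawn.generated.h"

// Agent pawn: remembers that it touched the tagged object so the Trainer
// can poll the flag later when it gathers completions/rewards.
UCLASS()
class AMyLearningAgentPawn : public ACharacter
{
	GENERATED_BODY()

public:
	// Public flag the Trainer reads; cleared again when the episode resets.
	UPROPERTY(BlueprintReadWrite, Category = "Training")
	bool bHitResetObject = false;

protected:
	virtual void BeginPlay() override
	{
		Super::BeginPlay();
		GetCapsuleComponent()->OnComponentHit.AddDynamic(this, &AMyLearningAgentPawn::OnHit);
	}

	UFUNCTION()
	void OnHit(UPrimitiveComponent* HitComponent, AActor* OtherActor,
	           UPrimitiveComponent* OtherComp, FVector NormalImpulse, const FHitResult& Hit)
	{
		// The object I want to end the episode on carries the ‘reset’ actor tag.
		if (OtherActor && OtherActor->ActorHasTag(TEXT("reset")))
		{
			bHitResetObject = true;
		}
	}
};
```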
Completion: Is the completion set as an effect of the Reset Episode call, or is it what determines whether an episode is reset?
Reward: If I want to set a penalty or reward for contacting the object, do the GatherReward(s) function(s) access this event or condition independently of the Completion or Reset? (Rough sketch of what I mean just below.)
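To make that concrete, here is my current guess at the Trainer side in C++. I am not sure these are the right overrides or signatures in 5.4 (the per-agent GatherAgentCompletion / GatherAgentReward / ResetAgentEpisode names, the completion enum values, and GetAgent are all my assumptions), so please treat this purely as a sketch of the data flow I have in mind:

```cpp
// NOTE: the override names, signatures, the ELearningAgentsCompletion values,
// and GetAgent() are my assumptions about the 5.4 API; the flow is the point.
void UMyTrainer::GatherAgentCompletion_Implementation(ELearningAgentsCompletion& OutCompletion, const int32 AgentId)
{
	const AMyLearningAgentPawn* Pawn = Cast<AMyLearningAgentPawn>(GetAgent(AgentId, AMyLearningAgentPawn::StaticClass()));
	OutCompletion = (Pawn && Pawn->bHitResetObject)
		? ELearningAgentsCompletion::Termination
		: ELearningAgentsCompletion::Running;
}

void UMyTrainer::GatherAgentReward_Implementation(float& OutReward, const int32 AgentId)
{
	const AMyLearningAgentPawn* Pawn = Cast<AMyLearningAgentPawn>(GetAgent(AgentId, AMyLearningAgentPawn::StaticClass()));
	// Same flag, read again here: a flat penalty for touching the tagged object.
	OutReward = (Pawn && Pawn->bHitResetObject) ? -1.0f : 0.0f;
}

void UMyTrainer::ResetAgentEpisode_Implementation(const int32 AgentId)
{
	if (AMyLearningAgentPawn* Pawn = Cast<AMyLearningAgentPawn>(GetAgent(AgentId, AMyLearningAgentPawn::StaticClass())))
	{
		Pawn->bHitResetObject = false; // clear the flag for the next episode
		// ...teleport / re-initialize the pawn here...
	}
}
```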
If I am trying to create a reward for a generic velocity similarity using random vectors, can I access the agent's current step or the world tick so that I can set the random vector based on elapsed time?
Should the values that make up the reward (the desired velocity, the actual velocity, the difference, etc.) be managed within the Agent or within the GatherReward(s) function(s)? (See the sketch below for what I mean.)
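For reference, this is the rough shape of the velocity-similarity reward I have in mind. The helper name, the resample interval, and the constants are all placeholders of mine; the open question is where the target vector and timer passed into it should actually live:

```cpp
#include "GameFramework/Character.h"
#include "Engine/World.h"
#include "Math/UnrealMathUtility.h"

// Re-randomizes the desired velocity on a fixed interval of elapsed world time,
// then scores how closely the agent's actual velocity matches it.
static float ComputeVelocityMatchReward(const ACharacter* Pawn, const UWorld* World,
                                        FVector& InOutTargetVelocity, float& InOutNextResampleTime)
{
	const float Now = World->GetTimeSeconds();

	// Re-roll the desired velocity every 5 seconds of elapsed world time (arbitrary interval).
	if (Now >= InOutNextResampleTime)
	{
		const float TargetSpeed = 400.0f; // cm/s, arbitrary
		InOutTargetVelocity = FMath::VRand().GetSafeNormal2D() * TargetSpeed;
		InOutNextResampleTime = Now + 5.0f;
	}

	// Higher reward the closer the actual velocity is to the target; 1.0 == perfect match.
	const float Error = FVector::Dist(Pawn->GetVelocity(), InOutTargetVelocity);
	return FMath::Exp(-Error / 400.0f);
}
```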
At the moment, I don’t have any questions related to the Manager, Interactor, Policy, Critic, Neural Networks, Agent, ImitationTrainer, various Settings, Recorder, or Recording(s). However, if anyone else does, feel free to post them here, and if I can answer, I will.
Thank you ahead of time for any responses, and good luck to my fellow Learning Agents.