Hi @Deathcalibur and everyone else,
I am a complete beginner with Learning Agents and I would like to ask some questions to get some help with my Master’s Thesis Project. I followed and implemented the provided tutorial about the Sports Car RL Agent, which gave me some understanding of the basic concepts. However, a lot of things are still not clear to me, such as the following:
I want to implement an RL Agent that has to reach a particular goal (e.g. “Go to that position”). I tried to apply the same logic as the tutorial, but when the agent gets close to the goal point it starts “gravitating” or swinging around it in order to keep the reward as high as possible. I would like to manually “pause” the agent when it is reasonably close to the goal, and then switch its brain back on later when I need it to move towards the next goal. What is the best practice to achieve this behavior? Do I have to add a boolean action that tells the agent to stop when it is close enough to the goal? Or should I stop the Agent in the manager, by pausing the “Run Inference” mode? If the latter, can I still rely on the reward as a measure of how close I am to the goal, even if the agent is no longer in the training phase? I hope my question is understandable, but if it is not please tell me, so I can explain myself better. Thanks everyone in advance!
I have seen the gravitating behavior before, and I think this is simply a reward function that isn’t written well. You should keep track of the closest distance to the goal and only reward when progress is being made, i.e. the agent got closer than its previous best distance. If your reward is simply based on proximity, then I believe you will get what you are seeing. If what I described is not the case, then I would need more details on your reward function to offer any advice.
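To make that concrete, here is a minimal sketch of the kind of progress reward I mean (plain Unreal C++ with made-up names; you would implement the equivalent in whatever gathers your rewards):

```cpp
// Minimal sketch of a "progress" reward: only reward the agent when it beats
// its previous best distance to the goal. All names here are illustrative.
#include "Math/Vector.h"

float ComputeProgressReward(const FVector& AgentLocation, const FVector& GoalLocation,
                            float& InOutBestDistance, const float RewardScale = 1.0f)
{
    const float Distance = FVector::Dist(AgentLocation, GoalLocation);

    // No progress: the agent is not closer than it has ever been this episode.
    if (Distance >= InOutBestDistance)
    {
        return 0.0f;
    }

    // Reward proportional to how much the best distance improved.
    const float Reward = (InOutBestDistance - Distance) * RewardScale;
    InOutBestDistance = Distance; // remember the new best for the next step
    return Reward;
}
```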
If you want to pause the behavior, you could probably call RemoveAgent and then later call AddAgent again? I think that’s the best bet but it might not work well in your project for reasons that I can’t anticipate.
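Very roughly, something like this (written from memory, so treat the exact header and signatures as assumptions and double-check against ULearningAgentsManager in your engine version):

```cpp
// Hypothetical pause/resume sketch built on the RemoveAgent/AddAgent idea above.
// The exact API may differ between Learning Agents versions.
#include "LearningAgentsManager.h"

void PauseAgent(ULearningAgentsManager* Manager, const int32 AgentId)
{
    // Once removed, the manager stops gathering observations and applying
    // actions for this agent, so it effectively freezes in place.
    Manager->RemoveAgent(AgentId);
}

int32 ResumeAgent(ULearningAgentsManager* Manager, UObject* AgentObject)
{
    // Re-registering resumes inference; note the agent id may change.
    return Manager->AddAgent(AgentObject);
}
```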
Let me know if that helps!
Thank you so much for your answer! I will post all the details about my implementation and goals here so I can ask more specific and understandable questions.
My goal
My project consists of animating chess matches, starting from PGN files and producing 3D replays of the moves. Between one move and the next, I want the camera (my RL Agent) to move to a good position to capture the upcoming scene. However, I’m still trying to figure out basic RL concepts, so for now I have implemented a very simple task consisting of the Agent moving to a Goal Location.
Agent
My RL Agent is a Character with a Camera component and a Collision Capsule (even though for now I have disabled all collisions for the agent to keep the task as simple as possible). Since I am using the Learning to Drive tutorial as my only reference, I allowed the Agent to move only along its forward/back axis (no lateral shifting) and to rotate only around its Z axis (yaw), in order to roughly simulate car movement.
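In C++ terms, my setup is roughly equivalent to the sketch below (the real agent is a Blueprint, so the class and property names here are just illustrative):

```cpp
// Rough C++ equivalent of my Blueprint camera agent (illustrative only).
#pragma once

#include "CoreMinimal.h"
#include "GameFramework/Character.h"
#include "Camera/CameraComponent.h"
#include "Components/CapsuleComponent.h"
#include "CameraAgent.generated.h"

UCLASS()
class ACameraAgent : public ACharacter
{
    GENERATED_BODY()

public:
    ACameraAgent()
    {
        // Camera rigidly attached to the character.
        Camera = CreateDefaultSubobject<UCameraComponent>(TEXT("Camera"));
        Camera->SetupAttachment(RootComponent);

        // All collision disabled for now to keep the task simple.
        GetCapsuleComponent()->SetCollisionEnabled(ECollisionEnabled::NoCollision);
    }

    UPROPERTY(VisibleAnywhere)
    UCameraComponent* Camera = nullptr;
};
```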
Observations
I have tried a few different combinations, but I was not always able to fully understand which observations were useful and which were redundant.
Currently, I am observing the Goal’s location and its distance from the Agent, as well as the Agent’s own location and direction.
- Does this set of observations make sense? Is it redundant to observe the Agent’s own location when the Goal location is already transformed relative to the Agent? Am I missing some key observations?
I plan to add ray casting to the observations in the future, since I read some posts here in the Dev Community saying that it works well.
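To be concrete about what “relative to the Agent” means here, this is a simplified sketch of the kind of values I believe end up being observed (not the actual Learning Agents nodes, just the math):

```cpp
// Simplified sketch of the relative-frame idea behind my observations.
#include "Math/Transform.h"
#include "Math/Vector.h"

struct FAgentObservations // hypothetical plain struct, only for illustration
{
    FVector GoalLocationLocal; // goal position expressed in the agent's frame
    float   GoalDistance;      // scalar distance from the agent to the goal
    FVector AgentForward;      // agent facing direction in world space
};

FAgentObservations GatherObservations(const FTransform& AgentTransform, const FVector& GoalLocationWorld)
{
    FAgentObservations Obs;
    // Express the goal in the agent's local frame, so "in front of me" is the
    // same direction regardless of where the agent is on the board.
    Obs.GoalLocationLocal = AgentTransform.InverseTransformPosition(GoalLocationWorld);
    Obs.GoalDistance      = FVector::Dist(AgentTransform.GetLocation(), GoalLocationWorld);
    Obs.AgentForward      = AgentTransform.GetRotation().GetForwardVector();
    return Obs;
}
```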
Actions
Similarly to Learning to Drive, I used two float actions: one adds movement input along the forward axis and the other adds rotation input around the Z axis. When I watched the training process I found the rotation to be choppy and not smooth at all, even when I tried to scale the float action down by a coefficient (rotation speed).
- Is this due to the Manager Tick Interval being set to 0.1 seconds? Do I have to keep this value when I Run Inference, or is it only needed for the training phase? Are there other ways to make the movement smoother? (A sketch of how I apply the actions follows below.)
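For reference, this is roughly what my action application does, plus the smoothing idea I am considering (a C++ sketch of the Blueprint logic; the interpolation speed is just a guess):

```cpp
// Rough sketch of how the two float actions drive the Character.
// ForwardAction and YawAction are assumed to be roughly in [-1, 1].
#include "GameFramework/Character.h"
#include "Math/UnrealMathUtility.h"

void ApplyActions(ACharacter* Agent, const float ForwardAction, const float YawAction,
                  const float RotationSpeedDegPerSec, const float DeltaTime)
{
    // Forward/backward movement only, no lateral input.
    Agent->AddMovementInput(Agent->GetActorForwardVector(), ForwardAction);

    // Yaw rotation scaled by a rotation-speed coefficient.
    const float YawDelta = YawAction * RotationSpeedDegPerSec * DeltaTime;
    Agent->AddActorWorldRotation(FRotator(0.0f, YawDelta, 0.0f));
}

// One idea for smoothing: low-pass filter the raw action every frame, so a new
// action arriving only every 0.1 s (the manager tick) does not cause visible jumps.
float SmoothAction(const float CurrentSmoothed, const float RawAction,
                   const float DeltaTime, const float InterpSpeed = 5.0f)
{
    return FMath::FInterpTo(CurrentSmoothed, RawAction, DeltaTime, InterpSpeed);
}
```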
Completion
Following the tutorial’s logic, I return a Completion when the Agent gets too far from the Goal.
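In plain terms, the check is just (sketch):

```cpp
// Sketch of the tutorial-style "too far" completion check.
#include "Math/Vector.h"

bool ShouldTerminateEpisode(const FVector& AgentLocation, const FVector& GoalLocation,
                            const float MaxDistanceThreshold)
{
    // End the episode when the agent wanders too far from the goal.
    return FVector::Dist(AgentLocation, GoalLocation) > MaxDistanceThreshold;
}
```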
Reward
For the Reward function I also tried various combinations, but I decided to upload the simplest one. Here I have kept only a reward for location similarity between the Goal and the Agent, and a penalty given when the distance exceeds a threshold (sketched below).
- The threshold is set to the same value as the Completion’s (as in the tutorial). Is that correct? Is the penalty still applied even though the episode ends at the same moment?
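Concretely, the reward I am uploading is roughly equivalent to this sketch (my Blueprint uses the built-in location similarity helper, so the exact falloff and penalty value may differ):

```cpp
// Sketch of my current reward: a location-similarity term, plus a penalty
// when the agent goes beyond the same threshold used for the completion.
#include "Math/UnrealMathUtility.h"
#include "Math/Vector.h"

float ComputeCurrentReward(const FVector& AgentLocation, const FVector& GoalLocation,
                           const float DistanceScale, const float MaxDistanceThreshold,
                           const float Penalty = -10.0f)
{
    const float Distance = FVector::Dist(AgentLocation, GoalLocation);

    // Similarity falls from 1 (at the goal) towards 0 as the distance grows.
    // The real helper node may use a different falloff; this is illustrative.
    float Reward = FMath::Exp(-Distance / DistanceScale);

    if (Distance > MaxDistanceThreshold)
    {
        Reward += Penalty; // large negative reward when too far away
    }
    return Reward;
}
```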
Training
The agents are spawned/reset randomly within a square with a side of 1500 centered on the chessboard’s center (the chessboard is about 1600x1600). The goal location is cell A1 (highlighted in red). All collisions are disabled for the Agent, so the giant chess pieces should not affect training.
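The reset/spawn logic is basically this (sketch; the half-side and height values are assumptions):

```cpp
// Sketch of the random respawn inside a 1500 x 1500 square around the board center.
#include "Math/UnrealMathUtility.h"
#include "Math/Vector.h"

FVector GetRandomSpawnLocation(const FVector& BoardCenter, const float HalfSide = 750.0f,
                               const float SpawnHeight = 0.0f)
{
    const float X = FMath::FRandRange(-HalfSide, HalfSide);
    const float Y = FMath::FRandRange(-HalfSide, HalfSide);
    return BoardCenter + FVector(X, Y, SpawnHeight);
}
```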
After about 2500 iterations this is the result (the video is taken from the training phase, not inference):
The result is far from perfect so I have some doubts.
- Is the number of iterations too low? How many iterations should I expect to wait before achieving a good result?
- Even when the Agent reaches the Goal, it does not stop, but keeps moving back and forth around it (this was the topic of my original question in this forum, but I hope I have given more context now). Does this depend on the reward function? Should I reward the agent when it reaches a new best distance from the Goal? Should this reward be added to the Location Similarity reward, or should it replace it? Also, when the Agent reaches a new best distance, in which function should I update this Best Distance value? I was thinking of updating it in the Agent’s Blueprint, but since it ticks more often than the Manager, I suppose it would not work properly. Should I maybe update the value in the Reward function itself? (A rough sketch of the bookkeeping I have in mind follows this list.)
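To make that last question concrete, this is the kind of bookkeeping I have in mind (rough sketch): the best distance lives on the agent, gets reset together with the episode, and is only updated from the manager-driven reward step, so it stays in sync with the manager tick rather than the agent’s own Tick.

```cpp
// Hypothetical per-agent bookkeeping for "best distance so far".
#include "Math/NumericLimits.h"

struct FGoalProgressState // illustrative helper struct
{
    float BestDistance = TNumericLimits<float>::Max();

    // Called from the episode-reset path, right after the agent is respawned.
    void Reset() { BestDistance = TNumericLimits<float>::Max(); }

    // Called from the reward-gathering step; returns true when a new best is set.
    bool UpdateBest(const float CurrentDistance)
    {
        if (CurrentDistance < BestDistance)
        {
            BestDistance = CurrentDistance;
            return true;
        }
        return false;
    }
};
```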
That’s all. I tried to be as clear as possible, but if I have not explained myself well, please tell me so I can add more information. Any suggestion/answer is welcome; thank you in advance for your help!