How to stop a Learning Agent when it reaches its goal?

Thank you so much for your answer! I will post here all the details about my implementation and goals, so I can ask more specific and understandable questions.

My goal
My project consists of animating chess matches, starting from PGN files and producing 3D replays of the moves. Between one move and the next, I want the camera (my RL Agent) to move to a good position to capture the upcoming scene. However, I'm still trying to figure out basic RL concepts, so for now I have implemented a very simple task: the Agent moving to a Goal Location.

Agent
My RL Agent is a Character with a Camera component and a Collision Capsule (even though for now I have disabled all collisions for the Agent to keep the task as simple as possible). Since my only reference is the Learning to Drive tutorial, I allowed the Agent to move only along its forward/back axis (no lateral shifting) and to rotate only around its Z axis (yaw), in order to simulate a car-like movement.

Observations
I have tried several different combinations, but I was not always able to fully understand which observations were useful and which were redundant.
Currently, I am observing the Goal's location and its distance from the Agent, and the Agent's own location and direction.
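
To make this concrete, here is roughly what I gather each step, written as a C++ sketch (my actual setup is in Blueprints with Learning Agents observation nodes, and the names below are just placeholders):

```cpp
// Sketch of the per-step observations (placeholder names; my real setup uses
// Blueprint Learning Agents observation nodes, not this exact function).
void GatherObservations(AActor* Agent, AActor* Goal)
{
    const FVector AgentLocation = Agent->GetActorLocation();
    const FVector GoalLocation  = Goal->GetActorLocation();

    // Goal position expressed in the Agent's local frame.
    const FVector GoalRelative  = Agent->GetActorTransform().InverseTransformPosition(GoalLocation);

    // Scalar distance between Agent and Goal.
    const float DistanceToGoal  = FVector::Dist(AgentLocation, GoalLocation);

    // The Agent's own world location and facing direction
    // (this is the part I suspect might be redundant).
    const FVector AgentForward  = Agent->GetActorForwardVector();

    // ...each of these values is then added as an observation.
}
```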

  • Does this set of observations make any sense? Is it redundant to observe the Agent's own location when the Goal location is already transformed relative to the Agent? Am I missing some key observations?

I plan to add Ray Casting to the observations in the future, since I read some posts here on the Dev Community saying it works well.


Actions
Similarly to Learning to Drive, I used two float actions: one adds a movement input along the forward axis and the other a rotation input around the Z axis. Watching the training process, I found the rotation to be choppy and not smooth at all, even when I tried to scale the float action down by a coefficient (rotation speed).
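
Roughly, the way I apply the two float actions looks like this (C++ sketch of my Blueprint logic; MoveSpeed and RotationSpeed are coefficient names I made up):

```cpp
// Sketch of how the two float actions are applied on each Manager tick
// (placeholder names; MoveSpeed and RotationSpeed are tunable coefficients).
void ApplyActions(APawn* Agent, float MoveAction, float RotateAction,
                  float MoveSpeed, float RotationSpeed, float DeltaTime)
{
    // Forward/backward input only, along the Agent's forward vector.
    Agent->AddMovementInput(Agent->GetActorForwardVector(), MoveAction * MoveSpeed);

    // Yaw rotation only; even scaled by RotationSpeed it still looks choppy to me.
    const float YawDelta = RotateAction * RotationSpeed * DeltaTime;
    Agent->AddActorLocalRotation(FRotator(0.f, YawDelta, 0.f));
}
```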

  • Is this due to the Manager Tick Interval being set to 0.1 seconds? Do I have to keep this value when I Run Inference, or is it only needed for the training phase? Are there other ways to make the movement smoother?


Completion
Following the tutorial's logic, I return a Completion when the Agent is too far from the Goal.
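
In other words, the completion check is essentially this (sketch; the threshold name is mine):

```cpp
// Sketch of the completion logic: terminate the episode when the Agent
// drifts farther from the Goal than a threshold (as in the tutorial).
bool ShouldComplete(const FVector& AgentLocation, const FVector& GoalLocation,
                    float MaxDistanceThreshold)
{
    return FVector::Dist(AgentLocation, GoalLocation) > MaxDistanceThreshold;
}
```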

Reward
In the Reward function, too, I tried various combinations, but I decided to upload the simplest one. Here I have kept only a reward for the Location Similarity between Goal and Agent, and a penalty given when their distance exceeds a threshold.
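
As far as I understand it, what I am computing corresponds to something like this (sketch with made-up names; the exact similarity function and values are the ones I am still experimenting with):

```cpp
// Sketch of my current reward: a location-similarity term that grows as the
// Agent gets closer to the Goal, plus a flat penalty once the distance
// exceeds the threshold (the same threshold used by the Completion).
float ComputeReward(const FVector& AgentLocation, const FVector& GoalLocation,
                    float DistanceScale, float MaxDistanceThreshold, float Penalty)
{
    const float Distance = FVector::Dist(AgentLocation, GoalLocation);

    // Similarity term: close to 1 on top of the Goal, decaying with distance.
    float Reward = FMath::Exp(-Distance / DistanceScale);

    // Penalty when the Agent wanders past the threshold.
    if (Distance > MaxDistanceThreshold)
    {
        Reward -= Penalty;
    }

    return Reward;
}
```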

  • The threshold is set to the same value as the Completion's (as in the tutorial). Is that correct? Is the penalty still assigned even though the episode ends at the same moment?

Training
The agents are spawned/reset randomly within a square of side 1500 centered on the Chessboard's center (the Chessboard is about 1600x1600). The goal location is cell A1 (highlighted in red). All collisions are disabled for the Agent, so the giant chess pieces should not affect the training.
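
The reset logic is roughly the following (sketch; half-side of 750 because the square has side 1500):

```cpp
// Sketch of the episode reset: respawn the Agent at a random point inside a
// 1500 x 1500 square centered on the chessboard. The Goal stays fixed on A1.
void ResetAgent(AActor* Agent, const FVector& ChessboardCenter)
{
    const float HalfSide = 750.f;
    const FVector NewLocation = ChessboardCenter + FVector(
        FMath::FRandRange(-HalfSide, HalfSide),
        FMath::FRandRange(-HalfSide, HalfSide),
        0.f);

    Agent->SetActorLocation(NewLocation);
}
```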
After about 2500 iterations, this is the result (the video is taken from the training phase, not inference):


The result is far from perfect, so I have some doubts.

  • Is the number of iterations too low? How many iterations should I expect to run before achieving a good result?
  • Even when the Agent reaches the Goal, it does not stop, but keeps moving back and forth around it (this was the topic of my original question in this forum, but I hope I have given more context now). Does this depend on the reward function? Should I reward the Agent when it reaches a new best-so-far distance from the Goal? Should this reward be added to the Location Similarity reward, or should it replace it? Also, when the Agent reaches a new best distance, in which function should I update this Best Distance value? I was thinking of updating it in the Agent's Blueprint, but since it ticks more often than the Manager, I suppose it would not work properly. Maybe I should update the value in the Reward function itself? (See the sketch right after this list for what I have in mind.)
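
To clarify the last point, this is the kind of best-distance bonus I have in mind (sketch with made-up names; BestDistanceSoFar would be stored per agent and reset at episode start):

```cpp
// Sketch of the "best-so-far distance" bonus: keep the best distance per
// agent and hand out an extra reward only when it improves, updating the
// stored value inside the reward computation itself.
float ComputeBestDistanceBonus(float CurrentDistance, float& BestDistanceSoFar,
                               float BonusPerImprovement)
{
    if (CurrentDistance < BestDistanceSoFar)
    {
        BestDistanceSoFar = CurrentDistance;   // update at the Manager's rate
        return BonusPerImprovement;            // one-off bonus for improving
    }
    return 0.f;
}
```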

That's all. I tried to be as clear as possible, but if I have not explained myself well, please tell me so I can add more information. Any suggestion or answer is welcome; thank you in advance for your help!