I followed the tutorial to create my own spaceship control Agent. During the Reset Agent Episode step, because my agent uses PhysicsConstraints and has SimulatePhysics enabled, I cannot complete the agent's position reset within a single tick; the engine needs a few ticks to settle at the desired position.
During this reset, LearningAgents continues to interact with the agent, leading to inconsistent sample data. For example, the observation in GatherObservation is based on the pre-reset position, while the reward in GatherReward is calculated from the post-reset position.
I attempted to implement my own AgentStatus control and added logic to callbacks like Gather Agent Observation to check whether sampling should be performed; for instance, if AgentStatus == PAUSE, these callbacks return early without sampling (see the sketch below). However, ProcessExperience currently requires the iteration counters for Observation, Action, Reward, and Completion to match, and without direct control over those counters this approach frequently produces Non-matching Iteration Number errors.
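Roughly, the gating I attempted looks like this (a minimal C++ sketch: EAgentStatus and AgentStatusByAgentId are my own bookkeeping, not part of LearningAgents, and I've abbreviated the GatherAgentObservation override signature since it varies between engine versions):

```cpp
#include "LearningAgentsInteractor.h"

// My own per-agent bookkeeping, not part of LearningAgents.
enum class EAgentStatus : uint8 { Active, Paused };
TMap<int32, EAgentStatus> AgentStatusByAgentId;

// In my ULearningAgentsInteractor subclass (override signature abbreviated;
// it differs between engine versions):
void USpaceshipInteractor::GatherAgentObservation(/* ... */ const int32 AgentId)
{
    // Skip sampling while this agent is mid-reset...
    if (AgentStatusByAgentId[AgentId] == EAgentStatus::Paused)
    {
        // ...but nothing else advances this agent's observation iteration
        // counter, so it falls behind Action/Reward/Completion and
        // ProcessExperience raises the Non-matching Iteration Number error.
        return;
    }
    // Gather position/velocity observations as usual.
}
```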
I also tried replacing the reset with RemoveAgent/AddAgent, but encountered similar issues during agent initialization (sketched below). Specifically:
If I wait for the agent's position initialization to complete before adding the agent, I cannot guarantee the agent starts at the beginning of a new sampling cycle, which leads to a Non-Matching Iteration Number error.
If I add the agent before the position initialization completes, the sampled data is polluted by the in-flight reset.
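For reference, that second attempt looked roughly like this (a C++ sketch: TeleportToSpawn and the timer delay are my own placeholders for the multi-tick physics reset, while AddAgent/RemoveAgent follow the 5.3/5.4-era ULearningAgentsManager API):

```cpp
#include "LearningAgentsManager.h"

// Sketch of the remove/re-add attempt. TeleportToSpawn and the 0.1 s delay
// are placeholders for my multi-tick physics reset.
void ASpaceship::BeginEpisodeReset()
{
    Manager->RemoveAgent(AgentId); // stop sampling this agent during the reset
    TeleportToSpawn();             // hypothetical: kicks off the physics reset

    // Wait a few ticks for the PhysicsConstraints to settle before re-adding.
    GetWorldTimerManager().SetTimer(ResetTimerHandle,
        FTimerDelegate::CreateLambda([this]()
        {
            // Problem: the manager may be mid-iteration by now, so the
            // re-added agent doesn't start at the beginning of a sampling
            // cycle -> Non-Matching Iteration Number error. Re-adding any
            // earlier samples the half-finished reset instead.
            AgentId = Manager->AddAgent(this);
        }),
        0.1f, /*bLoop=*/ false);
}
```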
In summary, would it be possible to add a control function at the agent level for Gathering/Performing? This would let developers implement simple sampling-control logic directly, such as pausing, starting fresh at the beginning of the next cycle, or discarding the current cycle.
This would greatly help in managing agent interactions.
Thanks, this is valuable feedback as I haven’t experimented with a use-case like this yet.
Have you tried using the "lower-level" API provided by the trainer? E.g. here's a training program I set up in my manager for playing both sides of a Connect Four-like game.
Instead of calling RunTraining/RunInference, you can call functions like:
Begin Training
Gather Completions/Rewards
Process Experience
etc.
Then have your on/off control logic here in the manager instead of inside the GatherObservations function (i.e. the manager should control who is training, not the agents).
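In C++ terms the manual loop might look roughly like this (a sketch against the 5.3/5.4-era ULearningAgentsTrainer / ULearningAgentsPolicy API; bAgentIsResetting is a hypothetical flag your manager maintains, and I've left the BeginTraining settings arguments at their defaults since parameters vary by version):

```cpp
#include "LearningAgentsTrainer.h"
#include "LearningAgentsPolicy.h"

// Rough sketch of a manual training tick in the manager.
void ASpaceshipManager::Tick(float DeltaTime)
{
    Super::Tick(DeltaTime);

    if (!Trainer->IsTraining())
    {
        Trainer->BeginTraining(); // settings arguments left at defaults
        return;
    }

    // The on/off control lives here in the manager, not in GatherObservations:
    // skip the entire gather/process/inference step while a reset is in flight.
    if (bAgentIsResetting)
    {
        return;
    }

    Trainer->GatherCompletions();
    Trainer->GatherRewards();
    Trainer->ProcessExperience();
    Policy->RunInference();
}
```

Because the whole step is skipped together, the Observation/Action/Reward/Completion iteration counters all stand still at the same time, which should avoid the Non-matching Iteration Number error.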
Let me know if (and why) that doesn't work if you can; otherwise I'll think about your use case some more and see if I can't find a way to make it more convenient for you.
I talked to my colleague who also trains agents using physics, and his solution is to use the training settings to trim the first few samples from each episode:
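Roughly like the following sketch (the field names are recalled from a 5.4-era build, so verify the exact names, and which settings struct they live on, in your engine version):

```cpp
#include "LearningAgentsTrainer.h"

// Sketch: drop the first few steps of every episode so samples taken while
// the physics is still settling never reach training. Field names recalled
// from a 5.4-era build; verify against your engine version.
FLearningAgentsTrainerSettings TrainerSettings;
TrainerSettings.NumberOfStepsToTrimAtStartOfEpisode = 5; // ticks the reset needs to settle
TrainerSettings.NumberOfStepsToTrimAtEndOfEpisode = 0;
// Pass TrainerSettings when constructing the trainer (e.g. via MakeTrainer).
```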
A related question: how do you handle invalid moves? (Although perhaps Connect Four doesn't have invalid moves.) There doesn't seem to be support for masking valid moves in the framework, and I'm having a hard time just getting my agent to learn what an invalid move is.
Part of the issue is that the invalid move happens at Run Inference (so at the end of the training chain: completion, reward, process experience, run inference). Now I'm in a situation where the agent tried to make an invalid move, so the game cannot proceed. I would catch this in a completion, but the completion doesn't run until the next time the agent tries to move (and before that the other player, which is just random, also needs to move).
Do you have suggestions on how to deal with this situation in a clean way?
Masking is coming in 5.6. It’s currently available on UE5-Main if you compile from source.
We still need to finish testing it, so it may contain bugs, but it's "code complete" and has passed some preliminary testing.
Not sure what you should do in the meantime. The best you could do with the tools available would be to Terminate the episode with a negative reward as a penalty; the policy should eventually learn not to make invalid moves, but that's not a great solution (IMHO).
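For what it's worth, that workaround might be sketched like this (assuming the 5.4-style GatherAgentReward/GatherAgentCompletion overrides; bInvalidMoveByAgentId and ComputeGameReward are hypothetical, and the enum value names should be verified against your engine version):

```cpp
#include "LearningAgentsTrainer.h"

// Hypothetical flag set by the game when PerformAgentAction produces an
// illegal move; cleared when the episode resets.
TMap<int32, bool> bInvalidMoveByAgentId;

void UConnectFourTrainer::GatherAgentReward(float& OutReward, const int32 AgentId)
{
    OutReward = bInvalidMoveByAgentId[AgentId]
        ? -1.0f                       // flat penalty for the illegal move
        : ComputeGameReward(AgentId); // hypothetical normal win/loss reward
}

void UConnectFourTrainer::GatherAgentCompletion(ELearningAgentsCompletion& OutCompletion, const int32 AgentId)
{
    // Terminate immediately so the agent never keeps acting from the
    // stuck game state the invalid move produced.
    OutCompletion = bInvalidMoveByAgentId[AgentId]
        ? ELearningAgentsCompletion::Termination
        : ELearningAgentsCompletion::Running;
}
```

The key to the "completion only runs on the agent's next turn" problem is to stop advancing the game as soon as the invalid move is detected: leave the board untouched and skip the random opponent's move, so the next trainer step exists only to deliver the penalty and the termination.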