Is it possible to add a control function at the agent level for Gathering/Performing?

I followed the tutorial to create my own spaceship control agent. My agent uses PhysicsConstraints and has SimulatePhysics enabled, so during the Reset Agent Episode step I can't complete the agent’s position reset within a single tick. The engine needs a few ticks to bring the agent to the desired reset position.

During this reset, LearningAgents keeps interacting with the agent, which produces anomalies in the sample data. For example, the data in GatherObservation is based on the pre-reset position, while GatherReward is calculated from the post-reset position.

I tried implementing my own AgentStatus control and added logic in functions like Gather Agent Observation to decide whether sampling should happen. For instance, if AgentStatus == PAUSE, these callback functions would not be called. However, ProcessExperience currently requires the iteration counters for Observation, Action, Rewards, and Completion to be equal. Without direct control over these counters, this approach often leads to Non-matching Iteration Number errors.
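To make the failure mode concrete, here is a minimal, standalone C++ model of the counter check described above. It is not the Learning Agents API: `AgentIterationCounters`, `RunFullStep`, and `RunStepWithSkippedObservation` are hypothetical names, and the sketch only assumes that each stage bumps its own per-agent counter and that processing requires all four counters to match.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the per-agent iteration counters (NOT the engine API).
struct AgentIterationCounters {
    uint64_t observation = 0;
    uint64_t action = 0;
    uint64_t reward = 0;
    uint64_t completion = 0;

    // Mirrors the requirement that all four counters match before experience is processed.
    bool IsConsistent() const {
        return observation == action && action == reward && reward == completion;
    }
};

// A normal step: every stage runs once for the agent.
void RunFullStep(AgentIterationCounters& c) {
    ++c.observation;
    ++c.action;
    ++c.reward;
    ++c.completion;
}

// A "paused" step where only the observation callback is suppressed,
// while the other stages are still driven and keep counting.
void RunStepWithSkippedObservation(AgentIterationCounters& c) {
    ++c.action;
    ++c.reward;
    ++c.completion;
}
```

After one full step the counters agree; a single step with the observation skipped is enough to make them disagree, which is the mismatch the processing step rejects.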

I also tried using RemoveAgent/AddAgent instead of a reset, but ran into similar issues during agent initialization. Specifically:

  • If I wait for the agent’s position initialization to complete before adding the agent, I can't guarantee that the agent starts at the beginning of a new sampling cycle, which leads to a Non-matching Iteration Number error.
  • If I add the agent without waiting for position initialization to complete, anomalies pollute the sampled data.

In summary, would it be possible to add an agent-level control function for gathering/performing? This would let developers implement simple sampling control logic directly, such as pausing, starting from the beginning of the next cycle, or discarding the current cycle. It would greatly help in managing agent interactions.


Thanks, this is valuable feedback as I haven’t experimented with a use-case like this yet.

Have you tried using the “lower-level” API provided by the trainer? For example, I set up a training program in my manager that plays both sides of a Connect Four-like game.

Instead of calling RunTraining/RunInference, you can call the functions like:

  • Begin Training
  • Gather Completions/Rewards
  • Process Experience
  • etc.

Then put your on/off control logic in the manager instead of inside the GatherObs function (i.e., the manager should control who is training, not the agents).
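A minimal sketch of what that manager-side on/off logic could look like, written as plain C++ rather than against the engine. It assumes the manager can run the gather/act/process stages for a chosen subset of agents; `SpaceshipTrainingGate`, `OnAgentReset`, and `settleTicks` are hypothetical names. The point is that the manager, not each agent, decides which agents take part in a tick, and every stage in that tick uses the same list, so the per-stage counters advance together.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical manager-side gate: after a physics reset, an agent is held out
// of training for a fixed number of ticks so the engine can settle it.
class SpaceshipTrainingGate {
public:
    SpaceshipTrainingGate(std::size_t numAgents, int settleTicks)
        : remaining_(numAgents, 0), settleTicks_(settleTicks) {}

    // Called when the manager resets an agent's episode.
    void OnAgentReset(std::size_t agentId) { remaining_[agentId] = settleTicks_; }

    // Called once per manager tick. Returns the agents that should take part in
    // this tick's gather/act/process cycle; all stages use this same list.
    std::vector<std::size_t> Tick() {
        std::vector<std::size_t> active;
        for (std::size_t id = 0; id < remaining_.size(); ++id) {
            if (remaining_[id] > 0) {
                --remaining_[id];  // still settling: skip every stage this tick
            } else {
                active.push_back(id);
            }
        }
        return active;
    }

private:
    std::vector<int> remaining_;  // ticks left before each agent rejoins
    int settleTicks_;
};
```

With `settleTicks = 2`, a reset agent is skipped for two manager ticks and rejoins on the third; the other agents keep training throughout.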

If that doesn't work, please let me know why; otherwise, I'll think about your use case some more and see whether I can make it more convenient for you.

Thanks for the feedback,
Brendan

I talked to a colleague who also trains physics-based agents. His solution is to use the training settings to trim the first few samples from each episode.

He added these settings to make physics-based examples easier to train without having to muddle around with “agent pausing”.
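The trimming idea can be sketched as a simple post-processing step on each recorded episode. The actual Learning Agents setting names were in the colleague's screenshot, which is not reproduced in this thread, so `Step` and `TrimEpisodeStart` below are hypothetical stand-ins that only illustrate the technique: discard the first N steps so samples recorded while physics was settling never reach the trainer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical recorded step; the fields are illustrative only.
struct Step {
    float observation;
    float action;
    float reward;
};

// Drops the first `trimCount` steps of an episode so that samples recorded
// while the physics reset was still settling are discarded.
// If the episode is not longer than trimCount, nothing is kept.
std::vector<Step> TrimEpisodeStart(const std::vector<Step>& episode, std::size_t trimCount) {
    if (episode.size() <= trimCount) {
        return {};
    }
    return std::vector<Step>(episode.begin() + static_cast<std::ptrdiff_t>(trimCount),
                             episode.end());
}
```

Because the trim happens per episode after recording, the agents never need to pause, and the gather/act/process stages stay in lockstep.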

Thanks,
Brendan


Thank you for your response; this completely resolved my current issue.

Thank you very much!


You are very welcome. Thanks for the questions! I learned something too.