I followed the tutorial to create my own spaceship control agent. In the Reset Agent Episode step, because my agent uses PhysicsConstraints and has SimulatePhysics enabled, I cannot finish resetting the agent's position within a single tick; the engine needs a few ticks to reach the desired position.
While the reset is in progress, LearningAgents keeps interacting with the agent, which corrupts the sampled data. For example, the data in GatherObservation is based on the pre-reset position, while GatherReward is calculated from the post-reset position.
I attempted to implement my own AgentStatus control and added logic to functions like Gather Agent Observation to check whether sampling should be performed. For instance, if AgentStatus == PAUSE, these callbacks would return without sampling. However, the current ProcessExperience requires the iteration counters for Observation, Action, Rewards, and Completion to be the same. Because I have no direct control over these counters, this approach often leads to Non-Matching Iteration Number errors.
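To make the failure mode concrete, here is a toy model of what I believe is happening. It is only a sketch: ToyCounters, EToyStatus, and ToyRunCycle are hypothetical names I made up for illustration, not the Learning Agents API, and I am assuming each stage bumps its own counter only when its callback actually samples.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical status flag, standing in for my own AgentStatus.
enum class EToyStatus { Running, Paused };

// Hypothetical per-agent bookkeeping; NOT the real Learning Agents types.
// Index order: Observation, Action, Reward, Completion.
struct ToyCounters {
    std::array<std::uint64_t, 4> Iteration{};

    // Mirrors the requirement described above: all four counters must agree.
    bool Matches() const {
        for (std::size_t i = 1; i < Iteration.size(); ++i) {
            if (Iteration[i] != Iteration[0]) {
                return false;
            }
        }
        return true;
    }
};

// Runs one cycle. StatusAtStage[i] is the status seen when stage i fires;
// a paused stage returns early and does not advance its counter.
inline void ToyRunCycle(ToyCounters& Counters,
                        const std::array<EToyStatus, 4>& StatusAtStage) {
    for (std::size_t i = 0; i < StatusAtStage.size(); ++i) {
        if (StatusAtStage[i] == EToyStatus::Running) {
            ++Counters.Iteration[i];
        }
    }
}
```

In this model, a cycle that is entirely running or entirely paused keeps the counters aligned. The mismatch appears only when the status flips partway through a cycle, for example when the reset starts after the observation has already been gathered.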
I also tried using RemoveAgent/AddAgent to replace the reset but encountered similar issues during agent initialization. Specifically:
- If I wait for the agent’s position initialization to complete before adding the agent, I cannot ensure the agent starts from the beginning of a new sampling cycle, which leads to a Non-Matching Iteration Number error.
- If I add the agent without waiting for the position initialization to complete, data anomalies and pollution occur during sampling.
In summary, would it be possible to add a control function at the agent level for Gathering/Performing? This would allow developers to directly implement simple sampling control logic, such as pausing, starting from the beginning of the next cycle, or discarding the current cycle.
This would greatly help in managing agent interactions.
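As a rough illustration of the kind of control I have in mind, here is a minimal sketch. ToyCycleGate and EToySamplingRequest are hypothetical names, not existing Learning Agents classes, and the sketch assumes the library would call BeginCycle and EndCycle around each sampling cycle. The idea is that a pause or resume request can arrive at any time but only takes effect at the next cycle boundary, so every stage within one cycle sees the same decision.

```cpp
#include <cassert>

// Hypothetical request type; not part of Learning Agents.
enum class EToySamplingRequest { Run, Pause };

// Hypothetical per-agent gate that latches requests at cycle boundaries.
class ToyCycleGate {
public:
    // May be called at any time, e.g. when a physics reset starts or finishes.
    void Request(EToySamplingRequest NewRequest) { Pending = NewRequest; }

    // Assumed to be called once at the start of a cycle: latch the request.
    void BeginCycle() {
        assert(!bInCycle);
        Active = Pending;
        bInCycle = true;
    }

    // Queried by each stage (observation, action, reward, completion).
    // The answer cannot change between BeginCycle and EndCycle.
    bool ShouldSample() const {
        assert(bInCycle);
        return Active == EToySamplingRequest::Run;
    }

    // Assumed to be called once at the end of a cycle.
    void EndCycle() {
        assert(bInCycle);
        bInCycle = false;
    }

private:
    EToySamplingRequest Pending = EToySamplingRequest::Run;
    EToySamplingRequest Active = EToySamplingRequest::Run;
    bool bInCycle = false;
};
```

This covers pausing and starting from the beginning of the next cycle. Discarding the current cycle would need extra support for dropping data that has already been gathered, which this sketch does not attempt.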