I am working through Learning Agents in UE 5.4 and running into some areas of confusion.
Trainer:
If I am setting up Completions, Rewards, and Resets, how do I access data at runtime that isn't already exposed within the Trainer?
Ex. I want to reset an agent's episode if it contacts the object I have tagged as ‘reset’.
Reset: If I am crudely looping through all agents (whether or not they have made contact yet), how do I receive the event or condition from the Agent blueprint?
Do I set up an event dispatcher within the Agent and call it from On Component Hit?
Or do I just set a public boolean to true and poll it?
It's not entirely clear what the best practice is here. A rough sketch of the boolean approach is below.
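For context, here is roughly what I mean by the public-boolean version, written as C++ instead of Blueprint. All the names here (the pawn class, bHitResetObject, even the ‘reset’ actor tag check) are just my own placeholders:

```cpp
#include "CoreMinimal.h"
#include "GameFramework/Character.h"
#include "Components/CapsuleComponent.h"
#include "MyLearningAgentPawn.generated.h"

// Agent pawn: remembers that it touched the tagged object so the Trainer
// can poll the flag later when it gathers completions/rewards.
UCLASS()
class AMyLearningAgentPawn : public ACharacter
{
	GENERATED_BODY()

public:
	// Public flag the Trainer reads; cleared again when the episode resets.
	UPROPERTY(BlueprintReadWrite, Category = "Training")
	bool bHitResetObject = false;

protected:
	virtual void BeginPlay() override
	{
		Super::BeginPlay();
		GetCapsuleComponent()->OnComponentHit.AddDynamic(this, &AMyLearningAgentPawn::OnHit);
	}

	UFUNCTION()
	void OnHit(UPrimitiveComponent* HitComponent, AActor* OtherActor,
	           UPrimitiveComponent* OtherComp, FVector NormalImpulse, const FHitResult& Hit)
	{
		// The object I want to end the episode on carries the ‘reset’ actor tag.
		if (OtherActor && OtherActor->ActorHasTag(TEXT("reset")))
		{
			bHitResetObject = true;
		}
	}
};
```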
Completion: Is the completion set as an effect of the Reset Episode call, or is it what determines whether an episode is reset?
Reward: If I want to set a penalty or reward for contacting the object, do the GatherReward(s) function(s) access this event or condition independently of the Completion or Reset? (Rough sketch of what I mean just below.)
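To make that concrete, here is my current guess at the Trainer side in C++. I am not sure these are the right overrides or signatures in 5.4 (the per-agent GatherAgentCompletion / GatherAgentReward / ResetAgentEpisode names, the completion enum values, and GetAgent are all my assumptions), so please treat this purely as a sketch of the data flow I have in mind:

```cpp
// NOTE: the override names, signatures, the ELearningAgentsCompletion values,
// and GetAgent() are my assumptions about the 5.4 API; the flow is the point.
void UMyTrainer::GatherAgentCompletion_Implementation(ELearningAgentsCompletion& OutCompletion, const int32 AgentId)
{
	const AMyLearningAgentPawn* Pawn = Cast<AMyLearningAgentPawn>(GetAgent(AgentId, AMyLearningAgentPawn::StaticClass()));
	OutCompletion = (Pawn && Pawn->bHitResetObject)
		? ELearningAgentsCompletion::Termination
		: ELearningAgentsCompletion::Running;
}

void UMyTrainer::GatherAgentReward_Implementation(float& OutReward, const int32 AgentId)
{
	const AMyLearningAgentPawn* Pawn = Cast<AMyLearningAgentPawn>(GetAgent(AgentId, AMyLearningAgentPawn::StaticClass()));
	// Same flag, read again here: a flat penalty for touching the tagged object.
	OutReward = (Pawn && Pawn->bHitResetObject) ? -1.0f : 0.0f;
}

void UMyTrainer::ResetAgentEpisode_Implementation(const int32 AgentId)
{
	if (AMyLearningAgentPawn* Pawn = Cast<AMyLearningAgentPawn>(GetAgent(AgentId, AMyLearningAgentPawn::StaticClass())))
	{
		Pawn->bHitResetObject = false; // clear the flag for the next episode
		// ...teleport / re-initialize the pawn here...
	}
}
```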
If I am trying to create a reward for a generic velocity similarity using random vectors, can I access the agent's current step or the world tick so that I can set the random vector based on elapsed time?
Should the values that make up the reward (the desired velocity, the actual velocity, the difference, etc.) be managed within the Agent or within the GatherReward(s) function(s)? (See the sketch below for what I mean.)
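For reference, this is the rough shape of the velocity-similarity reward I have in mind. The helper name, the resample interval, and the constants are all placeholders of mine; the open question is where the target vector and timer passed into it should actually live:

```cpp
#include "GameFramework/Character.h"
#include "Engine/World.h"
#include "Math/UnrealMathUtility.h"

// Re-randomizes the desired velocity on a fixed interval of elapsed world time,
// then scores how closely the agent's actual velocity matches it.
static float ComputeVelocityMatchReward(const ACharacter* Pawn, const UWorld* World,
                                        FVector& InOutTargetVelocity, float& InOutNextResampleTime)
{
	const float Now = World->GetTimeSeconds();

	// Re-roll the desired velocity every 5 seconds of elapsed world time (arbitrary interval).
	if (Now >= InOutNextResampleTime)
	{
		const float TargetSpeed = 400.0f; // cm/s, arbitrary
		InOutTargetVelocity = FMath::VRand().GetSafeNormal2D() * TargetSpeed;
		InOutNextResampleTime = Now + 5.0f;
	}

	// Higher reward the closer the actual velocity is to the target; 1.0 == perfect match.
	const float Error = FVector::Dist(Pawn->GetVelocity(), InOutTargetVelocity);
	return FMath::Exp(-Error / 400.0f);
}
```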
At the moment, I don’t have any questions related to the Manager, Interactor, Policy, Critic, Neural Networks, Agent, ImitationTrainer, various Settings, Recorder, or Recording(s). However, if anyone else does, feel free to post them here, and if I can answer, I will.
Thank you ahead of time for any responses, and good luck to my fellow Learning Agents.