Awesome work Jonathan! You've figured out most of how everything works. A couple of changes are needed in your implementation:
- Your inference loop should call EncodeObservations before EvaluatePolicy; otherwise your observations will be a frame delayed.
- Call the trainer methods (rewards/completions/iterate) before the Agent Type's Encode/Evaluate/Decode, and skip the trainer calls on the first iteration of the loop, since there's no experience to process yet (see the sketch after this list).
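Roughly, the corrected per-tick ordering would look something like this. This is just a sketch in UE-style C++: ALearningManager, the member names, and the trainer/decode method names are placeholders inferred from this message, not confirmed API.

```cpp
// Sketch only: aside from EncodeObservations and EvaluatePolicy, the type
// and method names here are placeholders, not confirmed API.
void ALearningManager::Tick(float DeltaTime)
{
    Super::Tick(DeltaTime);

    // Trainer bookkeeping runs first: it consumes the experience produced by
    // the previous iteration, so skip it the very first time through.
    if (!bIsFirstIteration)
    {
        Trainer->SetRewards();      // placeholder name for the rewards step
        Trainer->SetCompletions();  // placeholder name for the completions step
        Trainer->IterateTraining(); // placeholder name for the iterate step
    }
    bIsFirstIteration = false;

    // Then the Agent Type pass, encoding observations *before* the policy is
    // evaluated so actions come from this frame's data, not last frame's.
    AgentType->EncodeObservations();
    AgentType->EvaluatePolicy();
    AgentType->DecodeActions();     // placeholder name for the decode step
}
```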
Some notes:
- Policies are going to move out of the agent type soon to become their own objects (for more flexibility).
- We're going to remove the notion of AgentTypeComponents; instead, you would compose your inference and training loops explicitly on an object, very similar to your “LearningManager”. We're also going to take away most of the callbacks (OnAgentSetup and OnAgentAdded) because we think the approach you are using is less error prone than using those callbacks. The major change to your implementation would be calling AddAgent(s) on the trainer and calling SetupTrainer manually (see the first sketch after this list).
- You can make your LearningManager Tick less than every frame if you want your actions to have some time to affect the world before you take your next observations (see the second sketch after this list).
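For the AgentTypeComponents note, the manual setup would look roughly like this. Only AddAgent and SetupTrainer are named in this message; everything else (ManagedAgents, the surrounding class) is a hypothetical placeholder.

```cpp
// Sketch only: ManagedAgents and the surrounding class are hypothetical;
// AddAgent and SetupTrainer are the calls referred to above.
void ALearningManager::BeginPlay()
{
    Super::BeginPlay();

    // Instead of relying on OnAgentSetup/OnAgentAdded callbacks, register
    // each agent with the trainer explicitly...
    for (APawn* Agent : ManagedAgents)
    {
        Trainer->AddAgent(Agent);
    }

    // ...and then initialize the trainer manually.
    Trainer->SetupTrainer();
}
```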
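And for the last note, one standard way to Tick less than every frame is to set an actor tick interval; the 0.1s value here is an arbitrary example, and ALearningManager is again a placeholder name.

```cpp
// PrimaryActorTick is standard AActor machinery; the interval value is an
// arbitrary example, and ALearningManager is a hypothetical class name.
ALearningManager::ALearningManager()
{
    PrimaryActorTick.bCanEverTick = true;
    PrimaryActorTick.TickInterval = 0.1f; // observe/act ~10 times per second
}
```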