Hello everyone.
I am a student at a Japanese university and I am doing research using LearningAgents.
Thank you for developing this wonderful plugin.
I have some questions about using Learning Agents and would like to ask for your help.
Could you please elaborate on the difference between the number of steps, iterations and episodes in Learning Agents?
In the tutorial, it was mentioned that “SAC” and “Q-Learning” are supported, but how can I switch to one of them from PPO?
I apologize for the trouble, but I would appreciate your response.
(*I am not good at English, so I used a translation.)
An Episode is a series of states/actions from start to finish for one agent in the game environment.
Steps are the number of state/action pairs the agent encounters in a training episode. This is typically controlled by calling “RunTraining” during your manager’s Tick event: if your manager’s actor and game run at 60 FPS, you will get 60 steps/second. Generally speaking, we suggest ticking slower, perhaps 10 steps/second. In Learning Agents 0.1 for UE 5.3, when the step threshold is hit, the episode will automatically terminate, so adjust it higher if needed.
Iterations are how many repeats of the training process to run. The full training process is: 1) collect episodes of data to fill the replay buffer, and 2) sync to the Python process and run training on randomly sampled batches. An iteration is “filled” when either Max Recorded Episodes Per Iteration or Max Recorded Steps Per Iteration is reached.
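To make the episode/step/iteration relationship concrete, here is a minimal sketch (illustrative only, not the actual Learning Agents source). The threshold names mirror the plugin's settings, and the default values shown are assumptions:

```python
# Hedged sketch: how one training iteration "fills" with experience
# before a training update runs. Episodes are recorded until either
# threshold is crossed, whichever comes first.

MAX_RECORDED_EPISODES_PER_ITERATION = 1000   # assumed default
MAX_RECORDED_STEPS_PER_ITERATION = 10000     # assumed default

def collect_iteration(run_episode):
    """run_episode() -> number of steps taken in one recorded episode."""
    episodes, steps = 0, 0
    # Keep recording whole episodes until either cap is reached.
    while (episodes < MAX_RECORDED_EPISODES_PER_ITERATION
           and steps < MAX_RECORDED_STEPS_PER_ITERATION):
        steps += run_episode()
        episodes += 1
    # Buffer is now "full": sync to the Python process and train.
    return episodes, steps

# Example: if every episode lasts 250 steps, the step cap fills first,
# after 40 episodes (40 * 250 = 10000 steps).
eps, st = collect_iteration(lambda: 250)
```

With long episodes the step cap is usually what triggers the iteration; with very short episodes the episode cap triggers it instead.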
SAC and Q-Learning are NOT currently supported, but they are something we would like to get working before Learning Agents 1.0.
Hello.
Thank you very much for your answer.
In short:
Episode → the full process an agent goes through, from “observation”, “action”, and “reward” to “completion” (the point at which the agent is done)
Steps → the number of actions the agent has taken until completion
Iteration → the number of times the training process is performed
Is this correct?
I have a few more questions,
The average reward in the log appears to be output every 78 iterations, but how are the average reward, average value, etc. calculated? (Please also tell me how the average reward is computed at the moment the log is output.)
Also, could you please elaborate on the relationship between episodes, iterations, and steps?
Key questions we would like you to answer
What is the relationship between the number of episodes, iterations, and steps?
How do you calculate the average reward for log output?
We need to include this information in our paper. Thank you in advance for your time and help in answering these questions.
(Again, I used translation software to ask this question.)
Your definitions for the episode, steps, and iteration look good.
To be clear, a training iteration is triggered when either the max_episodes or max_steps threshold is crossed, whichever comes first. With the default settings, you would most likely be crossing max_steps.
The log is output every training iteration, not every 78. The averages are computed using mean(). See {Your-Unreal-Install-Dir}\Engine\Plugins\Experimental\LearningAgents\Content\Python\train_common.py, line 323.
I think this is explained above. You define the episode/step thresholds in the training settings. I believe the defaults are 1000 episodes or 10000 steps; whichever is crossed first triggers a training iteration.
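For the logged averages, a minimal sketch of what taking the mean over an iteration's recorded episodes looks like (this is illustrative, not the actual train_common.py code; the "avg_episode_length" field is a hypothetical extra statistic):

```python
# Hedged sketch: per-iteration averages over the episodes recorded
# for that iteration, computed with a plain mean().
from statistics import mean

def iteration_averages(episode_rewards, episode_lengths):
    """episode_rewards: total reward of each recorded episode.
    episode_lengths: step count of each recorded episode."""
    return {
        "avg_reward": mean(episode_rewards),          # the logged "average reward"
        "avg_episode_length": mean(episode_lengths),  # hypothetical extra stat
    }

stats = iteration_averages([1.0, 2.0, 3.0], [100, 200, 300])
```

So the "average reward" in the log is simply the mean over whatever episodes were recorded during that one iteration, not a running average across iterations.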
No problem, happy to take a look at any relevant paper that is using Learning Agents (assuming you have an English translation available).
Thank you for your response.
I ask because my professor repeatedly pressed me on the difference between iterations, steps, and episodes, and I could not answer well.
I have an additional question: I would like to know more about how the PPO algorithm is tied into Learning Agents.
Thank you in advance.
Currently, we only support PPO, and the LearningAgentsTrainer is pretty closely tied to it. The PPO implementation is our own, with some features we think are beneficial to game devs.
You can find the PPO code in:
{Your-Workspace}\Engine\Plugins\Experimental\LearningAgents\Content\Python\train_ppo.py and the related files.
Thank you for your response.
Last but not least, there are learning rates (RatePolicy, RateCritic) in TrainerTrainingSetting. What is the difference between these two?
Since it is PPO, we are training both a policy and a critic. Each is a neural network that needs to be trained, so the learning rate for each can be controlled separately.
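A toy sketch of why the two rates exist (plain Python, not the plugin's code): each network's parameters are updated with its own learning rate. The gradient values and rates here are made up for illustration:

```python
# Hedged sketch: PPO trains two networks, so each gets its own
# learning rate. RatePolicy / RateCritic scale the update applied
# to the corresponding network's parameters.

def sgd_step(param, grad, lr):
    # One gradient-descent update on a single (toy, scalar) parameter.
    return param - lr * grad

policy_param = 1.0
critic_param = 1.0
rate_policy = 0.0001   # corresponds to RatePolicy
rate_critic = 0.001    # corresponds to RateCritic

# Same gradient, different rates -> different-sized updates.
policy_param = sgd_step(policy_param, grad=0.5, lr=rate_policy)
critic_param = sgd_step(critic_param, grad=0.5, lr=rate_critic)
```

Keeping the rates independent lets you tune the value-function fit and the policy updates separately, since the two networks optimize different losses.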