About Learning Agents

Hello everyone.
I am a student at a Japanese university, and I am doing research using Learning Agents.
Thank you for developing this wonderful plugin.
I have some questions about using Learning Agents and would like to ask for your help.

  1. Could you please elaborate on the difference between the number of steps, iterations, and episodes in Learning Agents?
  2. The tutorial mentioned that “SAC” and “Q-Learning” are supported, but how can I change the algorithm from PPO?
  3. Lastly, I wrote a short analysis of Learning Agents on Qiita, a Japanese-language site, and I would be grateful if you could review it. (It is written in Japanese, though…)

I apologize for the trouble, but I would appreciate your response.
(*I am not good at English, so I used translation software.)


Hello,

An episode is a series of state/action pairs, from start to finish, for one agent in the game environment.

Steps are the number of state/action pairs the agent will encounter in a training episode. This is typically controlled by calling “RunTraining” during your manager’s Tick event. If you are running your manager’s actor and game at 60 FPS, then you will get 60 steps/second. Generally speaking, we suggest ticking more slowly, perhaps 10 steps/second. In Learning Agents 0.1 for UE 5.3, when the step threshold is hit, the episode will automatically terminate, so raise the threshold if needed.
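As a rough illustration (in Python rather than Blueprint/C++, and not actual Learning Agents code), gating training to ~10 steps per second from a 60 FPS tick might look like:

```python
# Hypothetical sketch: throttling to ~10 training steps/second from a 60 FPS
# tick, mirroring the idea of calling "RunTraining" on a slower cadence.
# This is NOT actual Learning Agents code.
class StepThrottle:
    def __init__(self, steps_per_second=10.0):
        self.interval = 1.0 / steps_per_second  # seconds between training steps
        self.accumulator = 0.0

    def should_step(self, delta_seconds):
        """Call once per tick; returns True when a training step is due."""
        self.accumulator += delta_seconds
        if self.accumulator >= self.interval:
            self.accumulator -= self.interval
            return True
        return False

# Simulate one second of a 60 FPS game loop:
throttle = StepThrottle(steps_per_second=10.0)
steps = sum(throttle.should_step(1.0 / 60.0) for _ in range(60))
print(steps)
```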

Iterations are how many repeats of the training process to run. The full training process is: 1) collect episode data to fill the replay buffer, and 2) sync to the Python process and run training on randomly sampled batches. An iteration is “filled” when either Max Recorded Episodes Per Iteration or Max Recorded Steps Per Iteration is reached.
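A hypothetical sketch of that "fill, then train" loop (the thresholds and the `rollout` helper are illustrative, not the real API):

```python
import random

# Hypothetical sketch of how one training iteration is "filled" with episode
# data before a training pass runs; not actual Learning Agents code.
def rollout(max_len=200):
    """Dummy episode: a list of (state, action, reward) tuples."""
    length = random.randint(50, max_len)
    return [(None, None, 1.0) for _ in range(length)]

def fill_iteration(max_episodes=1000, max_steps=10000):
    buffer, episodes, steps = [], 0, 0
    # Collect episodes until either threshold is crossed, whichever comes first.
    while episodes < max_episodes and steps < max_steps:
        episode = rollout()
        buffer.extend(episode)
        episodes += 1
        steps += len(episode)
    return buffer  # this data is then synced to the Python process for training

data = fill_iteration()
```

With these dummy episode lengths, the step threshold is always the one crossed first, which matches the default behavior described later in the thread.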

As for your second question: SAC and Q-Learning are NOT currently supported, but they are something we would like to get working before Learning Agents 1.0.

Thanks,
Brendan

Hello.
Thank you very much for your answer.
In short:
Episode → the flow from “observation”, “action”, and “reward” through “completion” for a single agent (i.e., until the agent reaches the completed state)
Steps → the number of actions the agent took before completing
Iteration → the number of times the training process is performed
Is this correct?

I have a few more questions.
The average reward in the log is output every 78 iterations in my case. How are the average reward, average value, and so on calculated? (Please also tell me how the average reward is computed at the time the log is output.)
Also, could you please elaborate on the relationship between episodes, iterations, and steps?

Key questions I would like you to answer:
  - The relationship between the number of episodes, iterations, and steps
  - How the average reward in the log output is calculated

I need to include this information in my paper, so… thank you in advance for your time and help in answering these questions.
(Again, I used translation software to ask this question.)


Your definitions for the episode, steps, and iteration look good.

To be clear, a training iteration is triggered when either the max-episodes or max-steps threshold is crossed, whichever comes first. With the default settings, you are most likely crossing max_steps.

The log is output every training iteration, not every 78. The averages are computed using mean(). See {Your-Unreal-Install-Dir}\Engine\Plugins\Experimental\LearningAgents\Content\Python\train_common.py, line 323.
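For illustration only (a guess at the shape of the computation, not the actual code at that line), the per-iteration average reward is plausibly the mean of the total reward collected per recorded episode:

```python
# Hypothetical sketch: averaging rewards over the episodes recorded in one
# training iteration. The real computation lives in train_common.py.
def average_reward(recorded_episodes):
    """recorded_episodes: a list of per-episode reward lists."""
    episode_returns = [sum(rewards) for rewards in recorded_episodes]
    return sum(episode_returns) / len(episode_returns)

# Three episodes with total rewards 3.0, 5.0, and 7.0:
episodes = [[1.0, 1.0, 1.0], [2.0, 3.0], [7.0]]
print(average_reward(episodes))  # 5.0
```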

I think this is explained above: you define the step/episode thresholds in the training settings. I believe the defaults are 1000 episodes or 10000 steps. Whichever is crossed first triggers a training iteration.

No problem, happy to take a look at any relevant paper that is using Learning Agents (assuming you have an English translation available).

Good luck!


Thank you for your response.
I ask because my professor pressed me repeatedly about the difference between iterations, steps, and episodes, and I could not answer well.
I have an additional question: I would like to know more about how the PPO algorithm is integrated with Learning Agents.
Thank you in advance.


What exactly would you like to know?

Currently, we only support PPO, and the LearningAgentsTrainer is fairly closely tied to it. The PPO is our own implementation, with some features we think are beneficial to game developers.

You can find the PPO code in:
{Your-Workspace}\Engine\Plugins\Experimental\LearningAgents\Content\Python\train_ppo.py and the related files.
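The core of any PPO variant is the clipped surrogate objective; a generic textbook sketch (not Epic's implementation from train_ppo.py) looks like:

```python
import math

# Hypothetical sketch of PPO's clipped surrogate loss for one batch of
# (log-prob, advantage) samples; a generic textbook version, NOT the
# implementation in train_ppo.py.
def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    losses = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        # Take the pessimistic (minimum) objective, negated for minimization.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)

# With identical old/new policies the ratio is 1, so the loss reduces to
# the negative mean advantage:
print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # -2.0
```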

Thank you for your response.
Last but not least: there are two learning rates (RatePolicy and RateCritic) in the trainer's training settings. What is the difference between these two settings?


Since it is PPO, we are training both a policy and a critic. The policy and the critic are separate neural networks that each need to be trained, so we let you control the learning rate for each independently.
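As a toy illustration (plain gradient descent with made-up values; RatePolicy and RateCritic are the analogous settings), separate learning rates just mean each network's parameters are updated with its own step size:

```python
# Hypothetical sketch: the policy and critic get independent learning rates,
# analogous to RatePolicy and RateCritic in the trainer settings.
# Values and gradients below are made up for illustration.
def sgd_step(params, grads, lr):
    """One plain gradient-descent update."""
    return [p - lr * g for p, g in zip(params, grads)]

policy_params = [0.5, -0.2]
critic_params = [1.0, 0.3]
policy_grads = [0.1, 0.1]
critic_grads = [0.1, 0.1]

policy_params = sgd_step(policy_params, policy_grads, lr=1e-4)  # "RatePolicy"
critic_params = sgd_step(critic_params, critic_grads, lr=1e-3)  # "RateCritic"
print(policy_params, critic_params)
```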


Thank you for taking so much time to help me.
I am sure I will have more questions, so I hope you will help me again when they come up.
