About Learning Agents

We’re training a policy and a critic since it is PPO. The policy and the critic are both neural networks that need to be trained, so we can control the learning rates for each separately.

2 Likes