We’re training a policy and a critic since it is PPO. The policy and the critic are both neural networks that need to be trained, so we can control the learning rates for each separately.
2 Likes
We’re training a policy and a critic since it is PPO. The policy and the critic are both neural networks that need to be trained, so we can control the learning rates for each separately.