Hi,
While upgrading our engine to 5.5.4 I was reading through the LearningAgents code and noticed that FEpisodeBuffer::GetRewards() appears to use the wrong index when fetching rewards: it reads RewardArrays[RewardId][RewardId] instead of RewardArrays[RewardId][InstanceIdx]. PushRewards() writes to RewardArrays[RewardId][InstanceIdx], and the other Get/Push functions index by InstanceIdx in the same way, which makes this look like a typo rather than intended behavior.
```
const TLearningArrayView<2, const float> FEpisodeBuffer::GetRewards(const int32 RewardId, const int32 InstanceIdx) const
{
    UE_LEARNING_CHECKF(RewardId >= 0 && RewardId < RewardArrays.Num(), TEXT("Reward id invalid!"));
    // Second index below: should be InstanceIdx?
    return RewardArrays[RewardId][RewardId].Slice(0, EpisodeStepNums[InstanceIdx]);
}
```
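To illustrate why this matters, here is a small standalone sketch that does not use any engine types. FMockEpisodeBuffer, GetRewardsAsQuoted, GetRewardsFixed and MakeSampleBuffer are hypothetical names made up for this post, and the std::vector layout is only an assumption about how the per-reward, per-instance storage behaves. It shows that indexing the second dimension by RewardId returns another instance's rewards, while indexing by InstanceIdx returns the expected ones.

```cpp
#include <vector>

// Hypothetical stand-in for FEpisodeBuffer (not the engine type).
// Rewards[RewardId][InstanceIdx] holds one float per step;
// StepNums[InstanceIdx] is the number of valid steps for that instance.
struct FMockEpisodeBuffer
{
    std::vector<std::vector<std::vector<float>>> Rewards;
    std::vector<int> StepNums;

    // Mirrors the quoted code: the second index is RewardId.
    std::vector<float> GetRewardsAsQuoted(int RewardId, int InstanceIdx) const
    {
        const std::vector<float>& Row = Rewards[RewardId][RewardId];
        return std::vector<float>(Row.begin(), Row.begin() + StepNums[InstanceIdx]);
    }

    // Suggested change: the second index is InstanceIdx, matching PushRewards().
    std::vector<float> GetRewardsFixed(int RewardId, int InstanceIdx) const
    {
        const std::vector<float>& Row = Rewards[RewardId][InstanceIdx];
        return std::vector<float>(Row.begin(), Row.begin() + StepNums[InstanceIdx]);
    }
};

// One reward stream (RewardId 0) and two instances with room for 3 steps each.
inline FMockEpisodeBuffer MakeSampleBuffer()
{
    FMockEpisodeBuffer Buffer;
    Buffer.Rewards = { { {1.0f, 1.0f, 1.0f}, {5.0f, 6.0f, 7.0f} } };
    Buffer.StepNums = {3, 2};
    return Buffer;
}
```

With this sample data, asking for RewardId 0 and InstanceIdx 1 through the quoted indexing yields the first two values of instance 0, whereas the InstanceIdx version yields the first two values of instance 1.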
As an aside, the reason we are looking at this at all is that we want to calculate the same experience stats that the Python script does, specifically avg_reward and avg_reward_sum. Ideally this calculation would be done in C++ and passed to Python, so we could retrieve the values instead of redoing the calculation ourselves.
Thanks!