LearningAgents: FEpisodeBuffer::GetRewards() using incorrect index?

anonymous-edc · March 31, 2025, 4:56pm

Hi,

While upgrading our engine to 5.5.4, I was reading through the LearningAgents code and noticed that FEpisodeBuffer::GetRewards() appears to use the wrong index when fetching rewards: it reads RewardArrays[RewardId][RewardId] instead of RewardArrays[RewardId][InstanceIdx]. PushRewards() writes to RewardArrays[RewardId][InstanceIdx], and the other Get/Push functions use InstanceIdx the same way, which makes this look even more like a bug.

`const TLearningArrayView<2, const float> FEpisodeBuffer::GetRewards(const int32 RewardId, const int32 InstanceIdx) const
{
    UE_LEARNING_CHECKF(RewardId >= 0 && RewardId < RewardArrays.Num(), TEXT("Reward id invalid!"));

    //                            vvvvvvvv Should be InstanceIdx?
    return RewardArrays[RewardId][RewardId].Slice(0, EpisodeStepNums[InstanceIdx]);
}`
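
To make the effect of the index mix-up concrete, here is a minimal, self-contained sketch that does not use any engine types. `MiniEpisodeBuffer`, `PushReward`, `AdvanceStep`, and `GetRewardsBuggy` are hypothetical names invented for illustration, and the layout (one scalar reward per step, indexed as `[RewardId][InstanceIdx][Step]`) is an assumption about the shape of the data, not a copy of the real FEpisodeBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for FEpisodeBuffer (not engine code).
struct MiniEpisodeBuffer
{
    // Assumed layout: RewardArrays[RewardId][InstanceIdx][Step]
    std::vector<std::vector<std::vector<float>>> RewardArrays;
    // Number of steps recorded so far for each instance.
    std::vector<std::size_t> EpisodeStepNums;

    MiniEpisodeBuffer(std::size_t RewardNum, std::size_t InstanceNum, std::size_t MaxStepNum)
        : RewardArrays(RewardNum, std::vector<std::vector<float>>(InstanceNum, std::vector<float>(MaxStepNum, 0.0f)))
        , EpisodeStepNums(InstanceNum, 0)
    {
    }

    // Writes the reward for the instance's current step.
    void PushReward(std::size_t RewardId, std::size_t InstanceIdx, float Reward)
    {
        assert(RewardId < RewardArrays.size());
        assert(InstanceIdx < EpisodeStepNums.size());
        assert(EpisodeStepNums[InstanceIdx] < RewardArrays[RewardId][InstanceIdx].size());
        RewardArrays[RewardId][InstanceIdx][EpisodeStepNums[InstanceIdx]] = Reward;
    }

    // Moves the instance on to its next step.
    void AdvanceStep(std::size_t InstanceIdx)
    {
        assert(InstanceIdx < EpisodeStepNums.size());
        ++EpisodeStepNums[InstanceIdx];
    }

    // Mirrors the suspicious indexing: the second index is RewardId.
    std::vector<float> GetRewardsBuggy(std::size_t RewardId, std::size_t InstanceIdx) const
    {
        assert(RewardId < RewardArrays.size());
        assert(RewardId < RewardArrays[RewardId].size());
        const std::vector<float>& Row = RewardArrays[RewardId][RewardId];
        return std::vector<float>(Row.begin(), Row.begin() + static_cast<std::ptrdiff_t>(EpisodeStepNums[InstanceIdx]));
    }

    // Indexes by InstanceIdx, matching how PushReward writes.
    std::vector<float> GetRewards(std::size_t RewardId, std::size_t InstanceIdx) const
    {
        assert(RewardId < RewardArrays.size());
        assert(InstanceIdx < EpisodeStepNums.size());
        const std::vector<float>& Row = RewardArrays[RewardId][InstanceIdx];
        return std::vector<float>(Row.begin(), Row.begin() + static_cast<std::ptrdiff_t>(EpisodeStepNums[InstanceIdx]));
    }
};
```

In this sketch, with a single reward (RewardId 0) and two instances, pushing rewards only for instance 1 and then calling `GetRewardsBuggy(0, 1)` returns instance 0's untouched row, cut to instance 1's step count, while `GetRewards(0, 1)` returns the values that were actually pushed.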

As an aside, the whole reason we are doing this is to be able to calculate the same experience stats that the Python script does, specifically avg_reward and avg_reward_sum. Ideally this calculation would be done in C++ and passed to Python, so we could retrieve it instead of redoing the calculation ourselves.
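
For anyone attempting the same thing, a rough sketch of what we have in mind follows. It assumes that avg_reward is the mean reward over all recorded steps and that avg_reward_sum is the mean of the per-episode reward totals; those definitions are our guess and should be checked against the Python training script. `FRewardStats` and `ComputeRewardStats` are hypothetical names, and the input is a plain list of per-episode reward sequences rather than an engine type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type (not part of LearningAgents).
struct FRewardStats
{
    float AvgReward = 0.0f;     // Assumed: mean reward over all steps.
    float AvgRewardSum = 0.0f;  // Assumed: mean of per-episode reward sums.
};

// EpisodeRewards[Episode][Step] holds one scalar reward per step.
// Episodes with zero steps still count towards the episode average.
FRewardStats ComputeRewardStats(const std::vector<std::vector<float>>& EpisodeRewards)
{
    double TotalReward = 0.0;
    std::size_t TotalSteps = 0;

    for (const std::vector<float>& Episode : EpisodeRewards)
    {
        for (const float Reward : Episode)
        {
            TotalReward += Reward;
        }
        TotalSteps += Episode.size();
    }

    FRewardStats Stats;
    if (TotalSteps > 0)
    {
        Stats.AvgReward = static_cast<float>(TotalReward / static_cast<double>(TotalSteps));
    }
    if (!EpisodeRewards.empty())
    {
        // The sum of per-episode sums equals the total reward.
        Stats.AvgRewardSum = static_cast<float>(TotalReward / static_cast<double>(EpisodeRewards.size()));
    }
    return Stats;
}
```

Each inner vector would be filled from a (corrected) GetRewards() call for one completed episode. Accumulating in double keeps rounding error low when episodes are long.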

Thanks!

Deathcalibur · April 1, 2025, 1:47pm

Great catch. I will push up the fix shortly.

anonymous-edc · April 1, 2025, 3:28pm

Thanks for the quick turnaround. Can you paste the fix git sha or at least part of the commit message? We use git so it’ll make it easier to find.

Deathcalibur · April 1, 2025, 3:50pm

“Learning Agents: Fix a bug with GetRewards” on ue5-main. I don’t know the git sha. It looks like it hasn’t been pushed up to GitHub yet.

anonymous-edc · April 1, 2025, 4:29pm

No problem, thank you!

anonymous-edc · April 1, 2025, 7:30pm

For future reference, it looks like the sha is 7eb4d365154f15fab5de21050d4eef93e4c78bed

https://github.com/EpicGames/UnrealEngine/commit/7eb4d365154f15fab5de21050d4eef93e4c78bed