LearningAgents: Negative reward weights and scales (incorrect Assert)

berriers · October 1, 2023, 5:06pm

I am finding that I cannot assign a negative weight to some rewards without triggering an assert in the LearningRewardObject file. I was hoping to use rewards as penalties simply by giving them a negative weight (though perhaps that is a bad idea and the assert is protecting me from it?).

However, digging through the LearningAgentsRewards code, I see what looks like some mixing up of the Scale and Weight variables, possibly a bug. Reward functions like ScalarSimilarityReward, ScalarPositionSimilarityReward, etc., should all assert when Scale is non-positive, but they sometimes assert when Weight is non-positive (or at least, when the value I passed as the weight while setting up the reward is non-positive).

Here’s my specific situation:

```cpp
float PositionSimilarityReward(
	const FVector Position0, const FVector Position1, const float Scale,
	const float Threshold, const float Weight, const float Epsilon)
{
	// Only Scale is validated here; Weight is passed through unchecked.
	UE_LEARNING_ARRAY_VALUE_CHECK(Scale > 0.0f);
	return Weight * DistanceToReward(
		FMath::Max(FVector::Distance(Position0, Position1) - Threshold, 0.0f) / FMath::Max(Scale, Epsilon));
}
```

The UE_LEARNING_ARRAY_VALUE_CHECK asserts on a non-positive Scale. Weight is a separate variable here and is not checked, so I should be able to set Weight to a negative value to turn this reward into a penalty.

In my code I am doing the following:

```cpp
AgentProximityPenalty = UPositionArraySimilarityReward::AddPositionArraySimilarityReward(
	this, "AgentProximityPenalty", AgentProximityCount, 100.0f, -0.25f);
```

I’m trying to use the Position Array Similarity Reward as a penalty by assigning it a Weight of -0.25, while Scale stays at its default value of 100.0f.

When I do this, it triggers the assert above. I can’t quite unravel all the templates and macros to figure out where Weight and Scale get swapped (or even confirm that this is what is happening), but the assert should only trigger if the SCALE is non-positive, not the Weight. Somewhere along the way, the value I pass as Weight seems to end up as Scale, and I don’t think that is intended.

It might be time for me to build this from source and see if I can definitively diagnose and fix it! Then I could point you to a pull request or something, and you could more easily tell me whether I’m crazy. For now, though, I wanted to document this here.

berriers · October 1, 2023, 5:12pm

Okay, it just occurred to me that I can work around this by swapping the values I pass for ‘Weight’ and ‘Scale’ (which seems harmless enough). I did that, and the assert no longer fires. I still think there’s a bug here, but that’s a workaround for anyone else who runs into it!

No idea if it will still train correctly. I’ll let you know.

anorangeduck · October 3, 2023, 5:29pm

Thanks @berriers, this appears to be a bug: UPositionArraySimilarityReward::AddPositionArraySimilarityReward unintentionally swaps the arguments. Sorry about that, and thanks for the report. We’ll fix it for the next release.

The workaround right now is to swap the arguments like you already have done.