I am finding that I cannot assign a negative weight to some rewards without triggering an assert in the LearningRewardObject file. My hope is that I could use rewards as penalties simply by giving them a negative weight (but perhaps this is not a good idea and the assert is protecting me from that?)
However, digging through the LearningAgentsRewards code, I see some mixing, and possibly bugs, of the Scale vs. Weight variables. The reward functions like ScalarSimilarityReward, ScalarPositionSimilarityReward, etc., should all assert when Scale is non-positive, but they are sometimes asserting when Weight is non-positive (at least, for what was called weight when I supplied it back when I set up the reward).
Here’s my specific situation:
float PositionSimilarityReward(
const FVector Position0, const FVector Position1, const float Scale,
const float Threshold, const float Weight, const float Epsilon
) {
UE_LEARNING_ARRAY_VALUE_CHECK(Scale > 0.0f);
return Weight * DistanceToReward(FMath::Max(FVector::Distance(Position0, Position1) - Threshold, 0.0f) / FMath::Max(Scale, Epsilon));
}
The UE_LEARNING_ARRAY_VALUE_CHECK will assert on a non-positive Scale. Weight is a different variable here and it is not checked, so I should be able to set Weight negative to make this a penalty instead of a reward.
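The expected behavior can be sketched outside the engine. This is a minimal standalone approximation, not the engine source: Vec3, Distance, DistanceToRewardSketch, and PositionSimilarityRewardSketch are hypothetical stand-ins, and the exponential falloff is an assumption, since the body of DistanceToReward is not shown in this post. Any mapping that returns 1 at zero distance and decays toward 0 illustrates the same point.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for FVector so the sketch compiles without Unreal.
struct Vec3 { float X, Y, Z; };

inline float Distance(const Vec3& A, const Vec3& B) {
    const float DX = A.X - B.X, DY = A.Y - B.Y, DZ = A.Z - B.Z;
    return std::sqrt(DX * DX + DY * DY + DZ * DZ);
}

// ASSUMPTION: the real DistanceToReward is not quoted in the post.
// This stand-in maps 0 -> 1 and decays toward 0 as distance grows.
inline float DistanceToRewardSketch(float D) { return std::exp(-D * D); }

inline float PositionSimilarityRewardSketch(
    Vec3 Position0, Vec3 Position1, float Scale,
    float Threshold, float Weight, float Epsilon) {
    // Mirrors the quoted check: only Scale is validated, never Weight.
    assert(Scale > 0.0f);
    const float Excess = std::max(Distance(Position0, Position1) - Threshold, 0.0f);
    return Weight * DistanceToRewardSketch(Excess / std::max(Scale, Epsilon));
}
```

With Scale = 100.0f and Weight = -0.25f, this sketch passes its check and returns a value between -0.25 and 0, which is the penalty behavior the function as quoted appears to allow.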
In my code I am doing the following:
AgentProximityPenalty = UPositionArraySimilarityReward::AddPositionArraySimilarityReward(this,
"AgentProximityPenalty", AgentProximityCount, 100.0f, -0.25
);
I’m trying to use the Position Array Similarity Reward as a penalty instead by assigning a Weight of -0.25 while Scale (in this context) is still the default of 100.0f.
When I do this, it triggers the above assert. I can’t quite unravel all the templates and macros to figure out where Weight and Scale became swapped (or even confirm that is what is happening), but really, the assert should only trigger if the Scale is non-positive, not the Weight. So somewhere, the value of Weight seems to have become Scale (and I don’t think that is intended).
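For illustration only, here is one way such a swap could arise. This is purely hypothetical and is not taken from the engine source: a public function that accepts (Scale, Weight) forwards to an internal helper that expects (Weight, Scale). All names below (FRewardSettingsSketch, MakeInternal, AddRewardBuggy, AddRewardFixed, PassesScaleCheck) are invented for the sketch.

```cpp
#include <cassert>

// Hypothetical settings struct; not an engine type.
struct FRewardSettingsSketch { float Scale; float Weight; };

// Hypothetical internal helper that takes Weight first, then Scale.
inline FRewardSettingsSketch MakeInternal(float Weight, float Scale) {
    return FRewardSettingsSketch{Scale, Weight};
}

// Hypothetical public entry point taking (Scale, Weight), like the call in the post.
// Bug: forwards positionally, so Scale lands in Weight and Weight lands in Scale.
inline FRewardSettingsSketch AddRewardBuggy(float Scale, float Weight) {
    return MakeInternal(Scale, Weight);
}

// Same entry point with the arguments forwarded in the order the helper expects.
inline FRewardSettingsSketch AddRewardFixed(float Scale, float Weight) {
    return MakeInternal(Weight, Scale);
}

// Stand-in for the Scale > 0 check that fires in the post.
inline bool PassesScaleCheck(const FRewardSettingsSketch& Settings) {
    return Settings.Scale > 0.0f;
}
```

In this sketch, calling AddRewardBuggy(100.0f, -0.25f) stores -0.25 as Scale and fails the check, while AddRewardFixed with the same arguments passes. Since both parameters are plain floats, the compiler cannot catch a mistake of this kind, which is one reason it would be easy to miss. Whether anything like this actually happens in the engine would need to be confirmed by stepping through a source build.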
It might be time for me to get this building from source and see if I can definitively diagnose and fix this! Then I could point you to a pull request or something, and you could probably more easily tell me if I’m crazy, but for now, I wanted to document this here.