Unnecessary calculations of motion matrices

(This is a translation of a [Japanese [Content removed] by Kitamura Toshihide.)

Thank you for your continued support.

It seems that there is an issue in UpdateRefToLocalMatricesInner(), which updates motion matrices. The issue is that unnecessary calculations of matrices are performed in the function.

In the function, RequiredBoneIndices and necessary bone indices are passed, and status checks such as Hide are performed. However, ultimately, in the following code, as all the bones included in the skeleton are used, unnecessary matrix multiplications are performed.

`const int32 NumReferenceToLocal = ReferenceToLocal.Num();

for (int32 ThisBoneIndex = 0; ThisBoneIndex < NumReferenceToLocal; ++ThisBoneIndex)

{

VectorMatrixMultiply(&ReferenceToLocal[ThisBoneIndex], &(*RefBasesInvMatrix)[ThisBoneIndex], &ReferenceToLocal[ThisBoneIndex]);

}`

ReferenceToLocal is initialized as the Identity based on the number of RefBasesInvMatrix in UpdateRefToLocalMatrices(). (In UE4, it was not initialized and contained NaNs). I think the final multiplication should be performed only for bones included in the render mesh to be updated. I’d be grateful if you could check this.

Also, I think it is not so necessary to update a matrix when Hide specified. However, since the parent’s matrix has a scale of zero assigned, it is updated.

Additionally, the handling of ExtraRequiredBoneIndices should be a problem. This may include bone indices that are the same as those of LOD.ActiveBoneIndices. In such cases, calculations are redundantly performed.

I have attached modified code for your review.

(For MasterPose, it seems necessary to update all the bones in the skeleton here, so it also copies all the bones.)

再現手順

  • Update the code of UpdateRefToLocalMatricesInner().
  • Place the engine asset Tutorial_Walk_Fwd into the level. (A Tutorial_Walk_Fwd actor will be placed.)
  • Start with “Simulate”, and verify whether the visual appearance remains fine after the change.
  • Pause the simulation, and open the engine’s TutorialTPP.
  • Select TutorialTPP_PhysicsAsset for the Shadow Physics Asset.
  • Enable Capsule Direct Shadow on the Tutorial_Walk_Fwd actor.
  • Now ExtraRequiredBoneIndices is updated. Check for the output of the log “ReferenceToLocal(%d) updated”.

Hi, yes, you’re correct that there are some savings that we could make here. Looking specifically at the calls to VectorMatrixMultiply, we can skip the hidden bones. They just take the parent transform with zero scale, so there’s no need to perform the extra multiplication on them. I’ll make a JIRA ticket about this for the dev team to change.

For cases where ReferenceToLocal contains all bones for the mesh, and not just bones for the current mesh LOD, my understanding is that’s intentional. This data is used to compute the motion vectors, and as such, needs to be valid for multiple frames. If we just stored the bones for the current LOD, this would cause problems when the LOD changes and the number of bones changed. Storing all the bones for the mesh avoids this problem. Having said that, this could likely be done in a more efficient way than this, so I will include the information on the JIRA ticket.

Regarding ExtraRequiredBoneIndices, however, we shouldn’t be performing redundant calls to VectorMatrixMultiply. That’s because, while we loop through all the RequiredBoneSets (including LOD.ActiveBoneIndices and ExtraRequiredBoneIndices), these should be writing to the same bone indices in ReferenceToLocal before we call VectorMatrixMultiply for all the bones. So if there are duplicate bones in LOD.ActiveBoneIndices vs ExtraRequiredBoneIndices, we shouldn’t have duplicate calls to VectorMatrixMultiply.

Let me know if I’ve missed anything.

Hi,

> Regarding the hidden bones, if they are completely not calculated, it would be faster and better. In that case, there would be no need to multiply by 0 scale, just a ZeroMatrix would be enough.

We need to set the hidden bones to take the parent transform (with zero scale) because otherwise the skinning of the character will break on the first hidden bone. So this part of the code is still required, unfortunately:

ReferenceToLocal[ThisBoneIndex] = ReferenceToLocal[ParentIndex].ApplyScale(0.f);> So I guess they might have been kept updated so that the application could still retrieve the coordinates from them. For this reason, I think it would be better to continue updating the hidden bones in the same way as now.

The transforms of the hidden bones aren’t updated, so if a user is getting the transform of one of the hidden bones on the mesh, it will be in the last animated position. I don’t see that there is a use case for that, so I think it would be ok to remove the call to VectorMatrixMultiply on the hidden bones.

> I meant that if there are duplicate bones in LOD.ActiveBoneIndices and ExtraRequiredBoneIndices, the same calculation using ReferenceToLocal[ThisBoneIndex] will be performed twice.

Yes, it’s possible that ReferenceToLocalwill be calculated twice for the same bone in this case. I think that’s relatively minor compared to the calls to VectorMatrixMultiply, but I’ll include it in the information from the dev team. This whole function could potentially be reimplemented to be more efficient, but there are always risks with making those kinds of changes.

> - NeedsUpdateBoneIndices: To do VectorMatrixMultiply only on bones that have ReferenceToLocal updated finally

Your changes to generateNeedsUpdateBoneIndices don’t include bones that are driven by a parent/leader component. This will break if you’re using Leader Pose Component. In that case, you still need to call VectorMatrixMultiply on those bones, even though they are driven by the parent component.

[mention removed]​

(This is a translation of a Japanese post by Kitamura Toshihide.)

Thank you for your continued support.

Let me add my opnions:

> Hi, yes, you’re correct that there are some savings that we could make here. Looking specifically at the calls to VectorMatrixMultiply, we can skip the hidden bones. They just take the parent transform with zero scale, so there’s no need to perform the extra multiplication on them. I’ll make a JIRA ticket about this for the dev team to change.

Regarding the hidden bones, if they are completely not calculated, it would be faster and better. In that case, there would be no need to multiply by 0 scale, just a ZeroMatrix would be enough. However, looking at the code from the past to the present, even when bones were hidden and 0 scales were multiplied, their positions were still being updated. So I guess they might have been kept updated so that the application could still retrieve the coordinates from them. For this reason, I think it would be better to continue updating the hidden bones in the same way as now.

> For cases where ReferenceToLocal contains all bones for the mesh, and not just bones for the current mesh LOD, my understanding is that’s intentional. This data is used to compute the motion vectors, and as such, needs to be valid for multiple frames. If we just stored the bones for the current LOD, this would cause problems when the LOD changes and the number of bones changed. Storing all the bones for the mesh avoids this problem. Having said that, this could likely be done in a more efficient way than this, so I will include the information on the JIRA ticket.

I think there is no issue with the fact itself that ReferenceToLocal contains all bones for the mesh. Bones that are not included in the RequiredBoneIndices are also updated using VectorMatrixMultiply. I think it is possible to optimize this part for better performance.

> Regarding ExtraRequiredBoneIndices, however, we shouldn’t be performing redundant calls to VectorMatrixMultiply. That’s because, while we loop through all the RequiredBoneSets (including LOD.ActiveBoneIndices and ExtraRequiredBoneIndices), these should be writing to the same bone indices in ReferenceToLocal before we call VectorMatrixMultiply for all the bones. So if there are duplicate bones in LOD.ActiveBoneIndices vs ExtraRequiredBoneIndices, we shouldn’t have duplicate calls to VectorMatrixMultiply.

I meant that if there are duplicate bones in LOD.ActiveBoneIndices and ExtraRequiredBoneIndices, the same calculation using ReferenceToLocal[ThisBoneIndex] will be performed twice. So, in the modified code, I use the ReferenceToLocalUpdated flag array to prevent the duplicate calculation.

Additionally, NeedsUpdateBoneIndices and ReferenceToLockalUpdated hold similar information, but there are cases where ReferenceToLocal is not updated for other reasons (e.g. ComponentTransform.IsValidIndex(ThisBoneIndex) ). That’s why I created NeedsUpdateBoneIndices.

Finally, I would like to summarize my modifications:

- ReferenceToLocalUpdated: To prevent duplicate calculation using ReferenceToLocal for the same bone

- NeedsUpdateBoneIndices: To do VectorMatrixMultiply only on bones that have ReferenceToLocal updated finally

- Hidden bones: The intentional operation where only the positions of the bones are updated to the final pose has not been changed.

Thanks.

(This is a translation of a Japanese post by Kitamura Toshihide.)

Thank you very much for your reply.

I have informed you of the main points I wanted to convey, so I would like to leave the modification to you.

Regarding LeaderPose, I have made almost no changes in the code, so I would like to leave this to you, as well.

As for your final comment, I guess that it is still necessary to collect the bones required for the render mesh of the component attached to the Master implemented in the past (or the current LOD’s render mesh if there are different LOD levels) and include them in RequiredBoneIndices.

However, this can also be achieved on the application side using HideBone, so I guess it is ok to leave the engine-side operation as it is.

Thank you.