VectorCrossNoFMA issue

Hello,

I found an issue in the VectorCrossNoFMA function caused by a Clang compiler optimization that ends up generating FMA instructions. As the name suggests, this function should avoid using those instructions. I noticed an intentionally useless add-with-zero was introduced to “trick” the optimizer, but it looks like with newer Clang versions the compiler recognizes it and removes it.

Because of this optimization, the cross product of two identical vectors can produce non-zero values on all three components, and the exact values depend on the magnitude/order of the vector components being multiplied.
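To illustrate the effect outside the engine, here is a minimal scalar sketch. It assumes the compiler fuses only one of the two products in a cross product term; the helper names are hypothetical and are not part of the engine. `std::fma` stands in for the contracted instruction, and `volatile` keeps the other product rounded to float.

```cpp
#include <cmath>

// Hypothetical helper: one cross product term, x*y - z*w, where only the
// first product is fused. This mimics a compiler contracting one side only.
float CrossTermOneSidedFma(float x, float y, float z, float w)
{
    volatile float zw = z * w;    // product is rounded to float before use
    return std::fma(x, y, -zw);   // x*y stays unrounded inside the FMA
}

// Hypothetical helper: the same term with both products rounded the same way.
float CrossTermBothRounded(float x, float y, float z, float w)
{
    volatile float xy = x * y;    // rounded to float
    volatile float zw = z * w;    // rounded to float
    return xy - zw;
}
```

With x = 1 + 2^-23 and y = 1 + 2^-22, the exact product is 1 + 3·2^-23 + 2^-45, and rounding to float drops the last term. CrossTermOneSidedFma(x, y, y, x) therefore returns 2^-45 instead of 0, while CrossTermBothRounded(x, y, y, x) returns exactly 0.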

This might look like a negligible issue, but it can cause major problems. In my case, Sweep collision queries were returning inconsistent results, and the root cause was GJKRaycast2ImplSimd failing due to that cross product.

Since #pragma clang fp contract(off) doesn’t seem to work here, I implemented a workaround by declaring a volatile zero vector:

volatile VectorRegister4Float ZeroVectorV = VectorZero();

and changing these lines:

A = VectorMultiplyAdd(A, Vec1, VectorZero());
B = VectorMultiplyAdd(B, Vec2, VectorZero());

to:

A = VectorMultiplyAdd(A, Vec1, ZeroVectorV);
B = VectorMultiplyAdd(B, Vec2, ZeroVectorV);

This way the compiler can neither eliminate the addition nor fuse the multiplication with the subsequent subtraction.
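A scalar analogue of this workaround, as a rough sketch only: `Vec3`, `GZero`, `RoundedProduct` and `CrossSymmetric` are hypothetical names, and the code below is not the engine's SIMD implementation. Each product goes through an FMA with a volatile zero addend, so the compiler cannot fold the addition away, and the final subtraction has no multiplication left to fuse.

```cpp
#include <cmath>

struct Vec3 { float X, Y, Z; };   // hypothetical scalar stand-in for a SIMD register

// volatile: the value must be loaded at run time, so the compiler cannot
// assume it is zero and drop the addition.
static volatile float GZero = 0.0f;

// Rounds a*b to float through an FMA with the opaque zero addend.
static float RoundedProduct(float a, float b)
{
    return std::fma(a, b, GZero);
}

// Every product is rounded the same way, so swapping the operands of a
// product gives the same value and Cross(V, V) is exactly zero.
Vec3 CrossSymmetric(const Vec3& A, const Vec3& B)
{
    return {
        RoundedProduct(A.Y, B.Z) - RoundedProduct(A.Z, B.Y),
        RoundedProduct(A.Z, B.X) - RoundedProduct(A.X, B.Z),
        RoundedProduct(A.X, B.Y) - RoundedProduct(A.Y, B.X),
    };
}
```

The cost of this design is one extra load of the volatile zero per product, which seems a reasonable trade for a cross product that is symmetric in its rounding.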

Would this be an acceptable fix?

Thanks.

Simone


Hi Simone,

You have understood the intent of that code very well: it resolves a stability issue caused by an asymmetric cross product, where one product is computed with an FMA and the other without, which causes large numerical instability.

The first version of the function intentionally avoided FMA, but I could not manage to force every compiler to avoid it, so instead I force two FMA computations to get a symmetrical cross product.

Your fix is acceptable and tricks the compiler into keeping the result stable. Good job!
