Hello,
I found an issue in the VectorCrossNoFMA function caused by a Clang compiler optimization that ends up generating FMA instructions. As the name suggests, this function should avoid using those instructions. I noticed an intentionally useless add-with-zero was introduced to “trick” the optimizer, but it looks like with newer Clang versions the compiler recognizes it and removes it.
Because of this optimization, the cross product of two identical vectors can produce non-zero values on all three components, and the exact values depend on the magnitude/order of the vector components being multiplied.
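To illustrate the effect outside the engine, here is a minimal scalar sketch of one cross-product component, a.y*b.z - a.z*b.y. It is not the engine code: the function names are hypothetical, and std::fma stands in for the FMA instruction the compiler would emit when it contracts the expression. In the fused version one product is rounded to float while the other stays exact inside the FMA, so the rounding error of the product leaks into the result.

```cpp
#include <cmath>

// Reference behaviour: both products are rounded to float before the
// subtraction. The volatile temporaries only serve to keep the compiler
// from contracting this sketch itself into an FMA.
float CrossComponentUnfused(float ay, float bz, float az, float by)
{
    volatile float p1 = ay * bz;   // rounded product
    volatile float p2 = az * by;   // rounded product
    return p1 - p2;
}

// What a contracted expression effectively computes: az*by is rounded,
// but ay*bz is kept exact inside the fused multiply-add.
float CrossComponentFused(float ay, float bz, float az, float by)
{
    volatile float p2 = az * by;   // rounded product
    return std::fma(ay, bz, -p2);  // exact ay*bz minus rounded az*by
}
```

For example, with all four inputs equal to 1 + 2^-23 (the next float after 1.0), the unfused version returns exactly 0, while the fused version returns 2^-46, which is the rounding error of the product.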
This might look like a negligible issue, but it can cause major problems. In my case, Sweep collision queries were returning inconsistent results, and the root cause was GJKRaycast2ImplSimd failing due to that cross product.
Since #pragma clang fp contract(off) doesn’t seem to work here, I implemented a workaround by declaring a volatile zero vector:
volatile VectorRegister4Float ZeroVectorV = VectorZero();
and changing these lines:
A = VectorMultiplyAdd(A, Vec1, VectorZero());
B = VectorMultiplyAdd(B, Vec2, VectorZero());
to:
A = VectorMultiplyAdd(A, Vec1, ZeroVectorV);
B = VectorMultiplyAdd(B, Vec2, ZeroVectorV);
This way the compiler cannot eliminate the addition, so it can no longer fuse the multiplication with the subsequent subtraction into an FMA.
Would this be an acceptable fix?
Thanks.
Simone