SSE intrinsics may malfunction using Clang 19 with -ffast-math (OR: Recommendation to add -fno-finite-math-only)

anonymous-edc · May 9, 2025, 9:00am

NOTE: I selected 5.5, but this problem does not depend on the Unreal Engine version; it is specific to the Clang compiler, from version 19 onward. Tested with UE5 Main for the time being.

Say I insert code like the following into arbitrary UE runtime code.

{
static VectorRegister4Float v = MakeVectorRegisterFloat(4.0, -3.0, 2.0, -1.0f);
auto f = [](VectorRegister4Float const& x) { return VectorAbs(x); };

static VectorRegister4Float a = f(v);
printf("%f %f %f %f\n", VectorGetComponent(a, 0), VectorGetComponent(a, 1), VectorGetComponent(a, 2), VectorGetComponent(a, 3));
}

When compiled with Clang 19 for any supported SSE-enabled platform with FPSemanticsMode.Imprecise (hence with -ffast-math), I found that the runtime output may NOT be the correct absolute value. It can be zero or some broken float value.

This is due to a behavior change in Clang 19, as filed in LLVM GitHub issue #118152, "[clang] -ffast-math in 19.1.0 prevents function from returning intended __m128 bitmask" (llvm/llvm-project).

As discussed there, constant values used for bitwise intrinsics may be optimized out if their bit pattern is a float NaN, even though they are not intended to be interpreted as floating-point values. This affects numerous operations that use constants defined in UnrealMathVectorConstants.h, including VectorAbs(), which uses the mask 0x7FFFFFFF.
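To make the point about the mask concrete, here is a minimal scalar sketch (not UE code; every function name in it is hypothetical) showing that 0x7FFFFFFF is a NaN bit pattern under IEEE 754 binary32, and that the same mask yields the absolute value when applied as a plain integer AND. It deliberately avoids floating-point classification functions, so its result should not depend on the fast-math setting.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float without any numeric conversion.
static float BitsToFloat(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Reinterpret a float as its raw 32-bit pattern.
static std::uint32_t FloatToBits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Integer-only NaN test for IEEE 754 binary32:
// exponent bits all ones and mantissa non-zero.
static bool IsNaNPattern(std::uint32_t bits) {
    return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0u;
}

// Scalar analogue of a sign-mask abs: clear bit 31 with an integer AND.
static float AbsViaMask(float x) {
    return BitsToFloat(FloatToBits(x) & 0x7FFFFFFFu);
}

// Hypothetical self-check tying the two observations together.
static void CheckSignMask() {
    assert(IsNaNPattern(0x7FFFFFFFu));        // the mask itself "looks like" a NaN
    assert(!IsNaNPattern(FloatToBits(4.0f))); // ordinary values do not
    assert(AbsViaMask(-3.0f) == 3.0f);        // the mask still works as a bit mask
    assert(AbsViaMask(2.0f) == 2.0f);
}
```

The mask is only meaningful as bits. Once the optimizer is allowed to assume that no float value is ever NaN, a vector constant holding this pattern falls into the category of values it may treat as never occurring.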

Because the result depends on the optimizer's interpretation, similar code may work in some cases and not in others, potentially causing a nasty sleeper bug.

The only viable and reliable workaround at the time of writing seems to be adding -fno-finite-math-only whenever -ffast-math is enabled, so that the compiler honors non-finite values. With this option, the above code yielded correct results.

So I would like to suggest pairing -fno-finite-math-only with the -ffast-math option in UBT for the time being.
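As a rough way to check from C++ whether the workaround is actually in effect for a translation unit, one could test the `__FINITE_MATH_ONLY__` macro that GCC and Clang predefine. The sketch below is an assumption-laden illustration, not something UE does: `FiniteMathOnlyEnabled` and `HYPOTHETICAL_REQUIRE_NAN_SAFE_MASKS` are made-up names.

```cpp
#include <cassert>

// __FINITE_MATH_ONLY__ is predefined by GCC and Clang: non-zero when the
// compiler may assume that NaN and infinity never occur, 0 otherwise.
// On compilers that do not define it, this sketch reports false.
constexpr bool FiniteMathOnlyEnabled() {
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    return true;
#else
    return false;
#endif
}

// Hypothetical opt-in guard: fail the build if code that relies on
// NaN-patterned bit masks is compiled with finite-math-only semantics.
#if defined(HYPOTHETICAL_REQUIRE_NAN_SAFE_MASKS)
static_assert(!FiniteMathOnlyEnabled(),
              "finite-math-only is active; consider -fno-finite-math-only");
#endif
```

With -ffast-math alone the function would be expected to return true, and with -ffast-math followed by -fno-finite-math-only it would be expected to return false. I have not verified this on every toolchain.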

Best Regards,

anonymous-edc · May 9, 2025, 9:00am

Steps to Reproduce
`{
static VectorRegister4Float v = MakeVectorRegisterFloat(4.0, -3.0, 2.0, -1.0f);
auto f = [](VectorRegister4Float const& x) { return VectorAbs(x); };

static VectorRegister4Float a = f(v);
printf("%f %f %f %f\n", VectorGetComponent(a, 0), VectorGetComponent(a, 1), VectorGetComponent(a, 2), VectorGetComponent(a, 3));
}`

When the above code is inserted into arbitrary runtime code and compiled using Clang 19 (regardless of platform) with FPSemanticsMode.Imprecise (hence with -ffast-math), the output may NOT be the correct absolute value.

Patrick_Laflamme · May 9, 2025, 2:57pm

Hi,

We fixed this issue with Clang 19 on Sony platforms yesterday (CL 42410628). Are you encountering it on other platforms using Clang 19? Which one?

Regards,

Patrick

anonymous-edc · May 12, 2025, 1:10am

Thank you for addressing the issue quickly! Will take a look later today.

I suspect this may happen on Linux or any other platform that supports SSE intrinsics, inherits from ClangToolchain, and uses Clang version 19 onward, but I cannot confirm it as I do not have a build environment at hand at the moment. Also, since the default in ClangToolchain is FPSemanticsMode.Precise, Imprecise must be specified in the build rule by some means, unlike SonyToolchain, which used that value as its default.

In the case of the Windows target, the current trunk happens to be free of this problem. I managed to reproduce the issue in a Windows VC Clang (19.1.1) environment only after commenting out the following code block in VCToolChain.cs.

if (Target.WindowsPlatform.Compiler.IsClang())
{
	// FMath::Sqrt calls get inlined and when reciprical is taken, turned into an rsqrtss instruction,
	// which is *too* imprecise for, e.g., TestVectorNormalize_Sqrt in UnrealMathTest.cpp
	// TODO: Observed in clang 7.0, presumably the same in Intel C++ Compiler?
	FPSemantics = FPSemanticsMode.Precise;
}

So this seems to be a fortunate side effect. I did find it a bit confusing, however, that FPSemanticsMode.Precise is enforced at this point anyway, regardless of the configuration (after going to the hassle of disabling SharedPCH usage if the mode was modified in the module Build.cs).

Regards,

anonymous-edc · May 12, 2025, 12:30pm

Following up: I confirmed that the latest UE5Main at CL 42475801 and UE5.6 at CL 42465431 have addressed this issue for SonyToolchain platforms by using -fhonor-nans in combination with -fhonor-infinities (together equivalent to -fno-finite-math-only).

anonymous-edc · May 22, 2025, 12:30pm

Double checking: do you plan on addressing this for other platforms? I saw that CL 42347577/42336969 “[Linux] Fix toolchain version check to disallow llvm 19 for now” removed support for Clang 19, which suppresses this issue for the time being as well.

Also, I am wondering whether the use of rsqrtss described in VCToolChain.cs affects other platforms as well, in terms of (lack of) precision.

Patrick_Laflamme · May 22, 2025, 2:38pm

Hi,

Yes, this will likely be fixed. We have a JIRA open for Clang 19 causing issues, and the one you reported is among them. On the JIRA, I can see that a build engineer is going to upgrade the global Clang settings to match those from PS5.

Regards

Patrick