Switching UTF8CHAR back to char8_t now that UE5 requires C++20

This is primarily a question for Steve Robb, as I believe he drove the original change in 2021.

Now that UE5 requires C++20 across all platforms, would it be possible to revisit the definition of UTF8CHAR in GenericPlatform.h?

Today, the type is still declared as:

enum UTF8CHAR : unsigned char {};

UTF8CHAR briefly became char8_t in CL16487510 when __cpp_char8_t was defined, but that branch was removed after an ABI-mismatch report: some modules were still being built in C++17 while others opted into C++20, so their mangled signatures no longer agreed across module boundaries. Today, every shipped toolchain and console SDK compiles the engine itself in C++20 (at least at the language level, from what I can tell), so the mixed-standard situation that forced the rollback shouldn't arise. In the original UDN thread, the plan was to keep the enum until C++20 became the baseline, which it now is.

With that in mind, could we restore (or even make unconditional) the char8_t definition from CL16487510?

#if defined(__cpp_char8_t)
    using UTF8CHAR = char8_t;
#else
    enum UTF8CHAR : unsigned char {};
#endif

For the past few weeks, I’ve been building with UTF8CHAR aliased to char8_t. The only follow-up work was clearing a handful of UTF8CHAR <=> ANSICHAR overload ambiguities, which were straightforward fixes. Making char8_t the engine‑wide type should help prevent future implicit conversion ambiguities from creeping in.
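
For reference, here is a minimal standalone sketch (not engine code; Append and its arguments are hypothetical) of the kind of ambiguity involved. With the old enum there was no implicit conversion from integral types to UTF8CHAR, so the ANSICHAR overload always won; with char8_t, an integral argument can convert to either overload and the call has to be disambiguated:

#include <cstdio>

using ANSICHAR = char;
using UTF8CHAR = char8_t; // proposed alias (previously: enum UTF8CHAR : unsigned char {})

void Append(ANSICHAR Ch) { std::puts("ANSI overload"); }
void Append(UTF8CHAR Ch) { std::puts("UTF-8 overload"); }

int main()
{
    unsigned char Byte = 0x41;
    // Append(Byte);                        // error: ambiguous, both overloads need an integral conversion
    Append(static_cast<ANSICHAR>(Byte));    // fix: state which overload is intended
    Append(UTF8CHAR('A'));                  // or construct the UTF-8 type explicitly
    return 0;
}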

Day-to-day development already feels safer: whenever we pass UTF-8 text to C libraries like SQLite, we can cast u8-prefixed literals to the const char* parameters those APIs expect and be confident that the bytes are genuine UTF-8 and that we aren't relying on undefined behavior.
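
As an illustration of that pattern, a small sketch assuming the stock SQLite3 C API (the table and function below are made up for the example):

#include <sqlite3.h>

bool InsertGreeting(sqlite3* Db)
{
    // u8"" guarantees UTF-8 bytes; under C++20 its type is const char8_t*.
    const char8_t* Sql = u8"INSERT INTO Greetings(Text) VALUES ('こんにちは');";

    char* ErrorMsg = nullptr;
    const int Result = sqlite3_exec(
        Db,
        reinterpret_cast<const char*>(Sql), // well-defined: reinterpreting char8_t bytes as char
        nullptr, nullptr, &ErrorMsg);

    if (Result != SQLITE_OK)
    {
        sqlite3_free(ErrorMsg); // sqlite3_exec allocates the error message
        return false;
    }
    return true;
}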

Are there any remaining target platforms, SDKs, or other concerns that would block this change? If not, would Epic be open to a pull request that reinstates the char8_t feature switch (or simply makes it the default)?

Thank you for looking into this. Please let me know if further details would be helpful!

~ Connor Widtfeldt

Hi Connor, this has already been done and will be available in the 5.7 release. You can see the change here:

https://github.com/EpicGames/UnrealEngine/commit/92c26f9622a822c1f82530d7f6171b6ccedfb8e8

Steve

Oh, this is fantastic--thanks for implementing this!

Somewhat related. Are there any plans to fix the compilation issues when enabling bTCHARIsUTF8?

There are a few comments that seem to suggest that too many assumptions have been made about TCHAR to change it.

// TCHAR has been wide for too long to fix
static_assert(sizeof(TCHAR) == sizeof(WIDECHAR), "TCHAR is expected to be wide");

That could be a Windows issue, though, as I know we redefine TCHAR as ANSI for some platforms without problems.

Hey, sorry, I thought I had replied to this already, but I'm getting pinged about it not being addressed, so my reply must not have been posted.

I am surprised that you say you have redefined TCHAR as ANSI because UE4 shipped with this in Platform.h:

checkAtCompileTime(!PLATFORM_TCHAR_IS_4_BYTES || sizeof(TCHAR) == 4, TypeTests_TCHAR_size);
checkAtCompileTime(PLATFORM_TCHAR_IS_4_BYTES || sizeof(TCHAR) == 2, TypeTests_TCHAR_size);

… which has now evolved into this:

#if PLATFORM_TCHAR_IS_4_BYTES
	static_assert(sizeof(TCHAR) == 4, "TCHAR size must be 4 bytes.");
#elif PLATFORM_TCHAR_IS_UTF8CHAR
	static_assert(sizeof(TCHAR) == 1, "TCHAR size must be 1 byte.");
#else
	static_assert(sizeof(TCHAR) == 2, "TCHAR size must be 2 bytes.");
#endif

This is not to mention engine functions that overload const ANSICHAR* and const TCHAR* and so wouldn't compile if the two were the same type, or functions that pass a TEXT("String") literal to functions taking const WIDECHAR*.
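
To make those failure modes concrete, here is a standalone sketch (not engine code; the aliases and functions below only approximate the engine typedefs) of what stops compiling when TCHAR is remapped to an 8-bit type:

using ANSICHAR = char;
using WIDECHAR = wchar_t;
using TCHAR    = ANSICHAR;       // what an 8-bit TCHAR remapping implies
#define TEXT(x) x                // TEXT() would no longer yield a wide literal

// (1) Overloads on const ANSICHAR* and const TCHAR* collapse into one signature.
void Log(const ANSICHAR* Msg) {}
// void Log(const TCHAR* Msg) {} // error: redefinition of 'Log'

// (2) Call sites that forward TEXT() literals to const WIDECHAR* parameters break.
void LogWide(const WIDECHAR* Msg) {}

void Caller()
{
    // LogWide(TEXT("Hello"));   // error: const char* does not convert to const wchar_t*
    LogWide(L"Hello");           // only an explicitly wide literal still compiles
}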

There is no current plan to make bTCHARIsUTF8 work - we now have FUtf8String and will be transitioning over to using that.

Steve