
Constructing a Unicode FString

I have been trying to construct a Unicode string from a uint32 like this:



FString Uni;
FString Result;
uint32 target = 128512;//unicode 1F600

Result=FString::Printf(TEXT("\\u000%x"),target);//Result will be "1f600"
Uni = TEXT("\u000" + Result);//error C4429: possible incomplete or improperly formed universal-character-name



Anyone know the correct way to do this?

Not sure what you’re trying to do here, but you need another backslash for the 2nd line - that’s why the compiler is angry.



Uni = TEXT("\\u000" + Result);


Slight correction to the code from above.


  
FString Uni;
FString Result;
uint32 target = 128512;//unicode 1F600
Result=FString::Printf(TEXT("\\%x"),target);//Result will be "1f600"
Uni = TEXT("\u000" + Result);//error C4429: possible incomplete or improperly formed universal-character-name


The problem I have is that I am reading data returned from a socket connection, and when it sends Unicode data it appears to be encoded in a non-standard way, so I have to decode the data myself into a uint32.
In the code above, target = 128512, which is Unicode U+1F600, or :).

If I did this



FString Uni = TEXT("\U0001f600");


Uni would equal :).

If I do this



Uni = TEXT("\\u000" + Result);


Uni would contain the literal escape text, something like “\u0001f600”, rather than the character itself.

So how do I take a hex string and turn it into its Unicode equivalent?

If you have the hex codes already in memory as uint32’s, you could likely just do a cast to wchar_t / TCHAR and see what that gets you.
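One caveat with a plain cast: where TCHAR is 16 bits wide, casting 128512 (0x1F600) straight to TCHAR keeps only the low 16 bits (0xF600), because code points above U+FFFF need two UTF-16 code units, a surrogate pair. The sketch below shows that encoding step in standard C++ so it can be compiled outside the engine. `CodePointToUtf16` is a hypothetical helper name, not a UE4 function.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of UE4): encode one Unicode code point
// as UTF-16. Returns an empty string for values that are not valid
// Unicode scalar values.
std::u16string CodePointToUtf16(std::uint32_t cp)
{
    std::u16string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    {
        return out; // out of range, or a lone surrogate value
    }
    if (cp <= 0xFFFF)
    {
        // Basic Multilingual Plane: one code unit, same value.
        out += static_cast<char16_t>(cp);
    }
    else
    {
        // Supplementary plane: split the 20-bit offset into two halves.
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 + (cp >> 10));   // high surrogate
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low surrogate
    }
    return out;
}
```

For U+1F600 this yields the two units 0xD83D and 0xDE00. In UE4, assuming a platform where TCHAR is 16 bits, each unit could then be appended to an FString with `AppendChar`.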

If only it were that easy.
As it turns out, the data stream from the socket is encoded in UTF-8.
UE4’s FString is encoded in UTF-16.
I have searched through the UE4 source expecting to find a UTF-8 to UTF-16 conversion function, but as far as I can tell there is not one.
So I had to create one.
I found some code online to help with that:
https://stackoverflow.com/questions/…-16-stdwstring

I modified it to work with UE4 and to take a TArray<uint8> of the data coming from the socket.



// Requires <stdexcept> and <string>.
FString UMyClass::UTF8_ArrayToUTF16_String(TArray<uint8> data)
{
    // Pass 1: decode the UTF-8 bytes into Unicode code points.
    TArray<uint32> unicode;
    int32 i = 0;
    while (i < data.Num())
    {
        uint32 uni;
        int32 todo; // number of continuation bytes that must follow
        const uint8 ch = data[i++];
        if (ch <= 0x7F)
        {
            // Single byte (ASCII).
            uni = ch;
            todo = 0;
        }
        else if (ch <= 0xBF)
        {
            // A continuation byte cannot start a sequence.
            throw std::logic_error("not a UTF-8 string");
        }
        else if (ch <= 0xDF)
        {
            uni = ch & 0x1F;
            todo = 1;
        }
        else if (ch <= 0xEF)
        {
            uni = ch & 0x0F;
            todo = 2;
        }
        else if (ch <= 0xF7)
        {
            uni = ch & 0x07;
            todo = 3;
        }
        else
        {
            throw std::logic_error("not a UTF-8 string");
        }
        for (int32 j = 0; j < todo; ++j)
        {
            // The input must not end in the middle of a sequence.
            if (i == data.Num())
                throw std::logic_error("not a UTF-8 string");
            const uint8 next = data[i++];
            if (next < 0x80 || next > 0xBF)
                throw std::logic_error("not a UTF-8 string");
            uni <<= 6;
            uni += next & 0x3F;
        }
        // Reject surrogate values and anything beyond U+10FFFF.
        if (uni >= 0xD800 && uni <= 0xDFFF)
            throw std::logic_error("not a UTF-8 string");
        if (uni > 0x10FFFF)
            throw std::logic_error("not a UTF-8 string");
        unicode.Add(uni);
    }

    // Pass 2: encode the code points as UTF-16.
    // This assumes wchar_t is 16 bits wide, as it is on Windows.
    std::wstring utf16;
    for (int32 k = 0; k < unicode.Num(); ++k)
    {
        uint32 uni = unicode[k];
        if (uni <= 0xFFFF)
        {
            utf16 += (wchar_t)uni;
        }
        else
        {
            // Code points above U+FFFF become a surrogate pair.
            uni -= 0x10000;
            utf16 += (wchar_t)((uni >> 10) + 0xD800);
            utf16 += (wchar_t)((uni & 0x3FF) + 0xDC00);
        }
    }

    FString ReturnString = utf16.c_str();
    return ReturnString;
}
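UE4 modules are commonly built with C++ exceptions disabled, so a variant that reports failure through its return value may be easier to drop into a project. The following is a minimal sketch in standard C++ with the same decoding logic as the function above. The name `Utf8ToUtf16` and the use of `std::vector` and `std::u16string` in place of `TArray` and `FString` are assumptions made so the example compiles outside the engine. Like the original, it does not reject overlong encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical non-throwing variant: returns false on malformed UTF-8.
// On failure, 'out' holds whatever was converted before the error.
bool Utf8ToUtf16(const std::vector<std::uint8_t>& in, std::u16string& out)
{
    out.clear();
    std::size_t i = 0;
    while (i < in.size())
    {
        const std::uint8_t lead = in[i++];
        std::uint32_t cp = 0;
        int extra = 0; // continuation bytes expected after the lead byte

        if (lead <= 0x7F)      { cp = lead;        extra = 0; }
        else if (lead <= 0xBF) { return false; } // stray continuation byte
        else if (lead <= 0xDF) { cp = lead & 0x1F; extra = 1; }
        else if (lead <= 0xEF) { cp = lead & 0x0F; extra = 2; }
        else if (lead <= 0xF7) { cp = lead & 0x07; extra = 3; }
        else                   { return false; } // not a valid lead byte

        for (int j = 0; j < extra; ++j)
        {
            if (i == in.size()) return false; // truncated sequence
            const std::uint8_t c = in[i++];
            if (c < 0x80 || c > 0xBF) return false;
            cp = (cp << 6) | (c & 0x3F);
        }

        // Surrogate values and anything above U+10FFFF are invalid.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;

        if (cp <= 0xFFFF)
        {
            out += static_cast<char16_t>(cp);
        }
        else
        {
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return true;
}
```

Feeding it the four bytes F0 9F 98 80 (the UTF-8 form of U+1F600) produces the surrogate pair 0xD83D 0xDE00, and the two bytes C3 A9 produce the single unit 0x00E9.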


This works fine, but I am running into issues when I try to send the converted FString to be displayed in game.
One problem is that when I try to display Unicode U+1F600, which is an emoji character, it ends up displaying 2 other characters from the font.
I first thought it had something to do with the font I was using. But I tried a TrueType font that is supposed to contain that exact Unicode character, and it still displays 2 other characters in UE4.
I know the encoding is working because I can debug the FString it produces in Visual Studio, and it shows up there as the correct character.

I think there is something about how fonts are used in UE4 that I don’t know/understand.
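One possible explanation, offered as an assumption rather than a confirmed description of UE4's text renderer: in UTF-16, U+1F600 is stored as two code units (0xD83D and 0xDE00). Code that looks up a glyph per code unit, instead of combining the surrogate pair first, would draw two glyphs for it. The sketch below, in standard C++ with hypothetical helper names, shows how the pair recombines into a single code point.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; not UE4 functions.
bool IsHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
bool IsLowSurrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Decode UTF-16 code units into code points, combining surrogate pairs.
// Unpaired surrogates are passed through unchanged.
std::u32string Utf16ToCodePoints(const std::u16string& s)
{
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        const char16_t u = s[i];
        if (IsHighSurrogate(u) && i + 1 < s.size() && IsLowSurrogate(s[i + 1]))
        {
            // Rebuild the 20-bit offset and add back 0x10000.
            const char32_t cp = 0x10000
                + (static_cast<char32_t>(u - 0xD800) << 10)
                + static_cast<char32_t>(s[i + 1] - 0xDC00);
            out += cp;
            ++i; // the low surrogate has been consumed
        }
        else
        {
            out += static_cast<char32_t>(u);
        }
    }
    return out;
}
```

A string holding only U+1F600 has a UTF-16 length of 2 but decodes to one code point, which matches the symptom of seeing two glyphs where one is expected.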

If anyone can tell me how to take this FString



 FString Uni = TEXT("\U0001f600");


and display it in UE4 as :slight_smile: from a font, I would be grateful.

Side note:
I am using the smiling face emote above from the list of emotes available in this web page’s interface.
If I paste the actual Unicode smiley face, it messes up the post.
It looks fine in the preview, but when the post is submitted everything from the smiley face onward is truncated.

You can do this using my plugin, Temaran Rich Text, available under Code Plugins on the UE Marketplace.
It uses glyphs instead of Unicode fonts, though, so if you are adamant about using fonts it’s going to be an uphill battle, I’m afraid.

/Temaran

Hi,

I’m trying to convert a string that I receive via HttpRequest and that contains special and accented characters. I managed to store the result in a C++ variable successfully, but when I convert it to an FString using the answers from this forum or others I have found, I still get invalid characters. Can anybody help me?
Thanks.