Windows file name to std::string ?

Hello,

I need to handle zip files in my code and use a cpp library to do so (ZipLib).

My code looks like this:


FString FileFilter = "*";

    // get content
    TArray<FString> DirContent;
    IFileManager::Get().FindFilesRecursive(DirContent, *ThisGS->GetYagModulesDir(), *FileFilter, true, false);

    // +1 because of the last "/"
    int32 ModuleDirNbOfChar = AYagGameState::YagModulesDir.Len() + 1;

    // filling the zip
    for (int i = 0; i < DirContent.Num(); i++)
    {
        // source
        const FString FileOnDisk = DirContent[i];
        const std::string FileOnDiskStdStr = std::string(TCHAR_TO_UTF8(*FileOnDisk));

        // target
        const FString FileInArchive = FileOnDisk.RightChop(ModuleDirNbOfChar);
        const std::string FileInArchiveStdStr = std::string(TCHAR_TO_UTF8(*FileInArchive));

        //GEngine->AddOnScreenDebugMessage(-1, 10.f, FColor::Red, FString::Printf(TEXT("ZipModule file: %s"), *FString(FileOnDiskStdStr.c_str())));
        ZipFile::AddFile(ZipPathStdStr, FileOnDiskStdStr, FileInArchiveStdStr);
    }

ZipLib needs std::string as an input.
This code works well when the file name doesn’t contain special characters.

But as soon as a file name contains a special character, like so:

FileNameSpecial.jpg
I get a crash because the file is not found and can’t be opened:

And it can’t be opened because its name is not read correctly:

Now, std::string(TCHAR_TO_UTF8(*FileOnDisk)) seems to be the standard way to convert an FString to std::string in UE4 (it comes from a Rama tutorial and is the standard answer found on Google), but it doesn’t seem to work well with Windows file name FStrings.

If i enable the debug message here, i get a wrong file name also: “hé hé” becomes “h?? h??” (and using TCHAR_TO_ANSI gives “h? h?”, so each “?” is one wrongly encoded byte apparently).
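The “h??” display is consistent with “é” taking two bytes in UTF-8 (0xC3 0xA9), each shown as its own “?”. Here is a minimal standalone sketch of that effect in plain C++, outside the engine. MaskNonAscii is a hypothetical helper written for this illustration, not a UE4 or ZipLib function, and it only imitates a display that cannot interpret bytes above 0x7F. It says nothing about what TCHAR_TO_ANSI emits internally.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replace every byte outside 7-bit ASCII with '?',
// roughly what a display does when it cannot interpret that byte.
std::string MaskNonAscii(const std::string& bytes)
{
    std::string shown;
    shown.reserve(bytes.size());
    for (char c : bytes)
    {
        const unsigned char b = static_cast<unsigned char>(c);
        shown += (b < 0x80) ? c : '?';
    }
    return shown;
}
```

Feeding it the UTF-8 bytes of “hé hé”, written as "h\xC3\xA9 h\xC3\xA9", gives "h?? h??". So the UTF-8 conversion itself may well be correct, and the problem could lie in how the bytes are interpreted afterwards.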

Any idea of how to handle that ?

As always, i ask in case anyone knows, and will post the answer if i find it.

Thanks
Cedric

The whole idea of IFileManager is to provide a platform-independent way to handle files. It will never return a different character set. Which platforms do you intend to run the code on? PC with Windows only? If it is just Windows, you can bypass IFileManager and use the Windows SDK API to find the files for you, which will take care of the code page (character set) in use. If you want to use the code on any other platform supported by Unreal, you might simply disallow file names in a different character set.

I didn’t know about IFileManager mentioned by @NilsonLima. Thanks for that.

Anyway, using TCHAR_TO_UTF8 or TCHAR_TO_ANSI is strongly discouraged in newer engine versions. StringCast is the way to go.

I have implemented my own Save/Load system, have been using it in other UDK/UE3 projects, and ported the same system over to UE4. Here is the FileSystem implementation: Source/GodsOfDeceitPlatformImpl/Private/GPlatformImpl/GSystemImpl.cpp · master · Seditious Games Studio / GodsOfDeceit · GitLab

I never had any issues with Unicode file names. Just note that the code in that repository is a partial implementation, since I have refactored things a bit and will eventually upload the rest. But if you take a look at my methods, they accept FString as parameters and return values, while the underlying code, implemented using Boost, uses std::string. I did the conversion using StringCast.

You might want to take a look at this AnswerHub question on why using TCHAR_TO_* (or vice versa) instead of StringCast is a bad idea: TCHAR_TO_ANSI limitation to 128 characters? - Character & Animation - Unreal Engine Forums

That explanation at AnswerHub about StringCast<> is great. Nothing better than a good example with some lines explaining. Good find!

Hey,

Thanks for your answers :slight_smile:

@NilsonLima: thanks for the IFileManager info. Yes, my game is for PC/Windows only, so if i don’t find a better way, i might end up using the Windows way to list files, but i’d like to do that with standard UE4 tools, like they’re supposed to^^.

@NuLL3rr0r: StringCast seems to protect me from the risk of a null ptr, but if (big if there^^) i understand correctly, that wasn’t my problem using the macros since i was getting a file name “close enough” to the original one. And indeed, when using StringCast like so:



auto Conv = StringCast<ANSICHAR>(*FileOnDisk);
const std::string FileOnDiskStdStr = Conv.Get();

I get exactly the same result (crash with a wrongly encoded file name, all special chars encoded on 1 byte as expected from ANSI: “hé” becomes “h?”).
And StringCast won’t compile with WIDECHAR, UTF8CHAR, nor UTF16CHAR.
But i believe std::string expects an ANSICHAR anyway, so i guess ANSICHAR should be the right one, although here it’s not working.

So although i see the benefit of using StringCast over macros and thank you for this info, the use of StringCast doesn’t seem to solve my problem unless i’m using it incorrectly.

I take a lot of language precautions because i’m in completely uncharted personal territory here, so i might misunderstand things :slight_smile:

Going on with my research, still open to suggestions, and of course will post any progress :slight_smile:

Thanks
Cedric

Hi, my post is not a solution but a suggestion to rethink the cause of the problem. Instead of storing the special characters in the file name, why not either
a) write without special chars and use a replacement convention, e.g. "whenever I have <é> I will replace it with e_, and <ù> becomes _u", so that you can save the file and then switch back to your chars on read, or
b) save as “Shaman he he.csv” and attach some metadata in a separate “Shaman he he.meta” file where you store character variables, e.g. characterName=“Shaman hé hé”

Peace

Hey AndreiMC81,

Thanks for your clever and original suggestions but they appear a bit impractical in my case and raise other problems:

  • i would have to keep my own map between the official encoding (é) and my own encoding (e_), and given the number of special characters around, maintenance would be impossible: it amounts to creating my own encoding system^^.
  • the files are named after the names of the character they contain and are searched according to this name
  • then, i need my files to be user readable, as i export them in csv to allow users to edit them in a 3rd party software

I wouldn’t want to impose on users a rule like “if your character is named hé, you’ll find it in a file named he_.csv”

Thanks though for the ideas :slight_smile:

Cedric

I’m sure there is some better way to do this, but if you want to try and just brute force convert an FString to a Multi Byte Narrow String, you can try this:




    void FStringToMultiByteString(const FString& inString, std::string& outString)
    {
        for (int i = 0; i < inString.Len(); ++i)
        {
            char mbChar[8] = { 0 };

            int cp = (int)inString[i]; // get the integer code point for our TCHAR.

            if (cp < 0x80) // 1 byte char
            {
                mbChar[0] = cp & 0x7F;
            }
            else if (cp < 0x0800) // 2 byte char
            {
                mbChar[0] = ((cp >> 6) & 0x1F) | 0xC0;
                mbChar[1] = (cp & 0x3F) | 0x80;
            }
            else if (cp < 0x010000) // 3 byte char
            {
                mbChar[0] = ((cp >> 12) & 0x0F) | 0xE0;
                mbChar[1] = ((cp >> 6) & 0x3F) | 0x80;
                mbChar[2] = (cp & 0x3F) | 0x80;
            }
            else // 4 byte char
            {
                mbChar[0] = ((cp >> 18) & 0x07) | 0xF0;
                mbChar[1] = ((cp >> 12) & 0x3F) | 0x80;
                mbChar[2] = ((cp >> 6) & 0x3F) | 0x80;
                mbChar[3] = (cp & 0x3F) | 0x80;
            }

            outString.append(mbChar);
        }
    }
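One caveat on the function above: it treats every TCHAR as a complete code point. On Windows a TCHAR is a 16-bit UTF-16 unit, so characters outside the Basic Multilingual Plane arrive as two surrogate units and the 4 byte branch is never reached for them. A minimal standalone sketch that combines surrogate pairs before encoding could look like the following. It uses std::u16string in place of FString so that it compiles outside the engine, and Utf16ToUtf8 is a hypothetical name, not an engine function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode UTF-16 code units as UTF-8 bytes.
// Valid surrogate pairs are combined; lone surrogates become U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];

        // High surrogate followed by a low surrogate: one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size())
        {
            const char32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        // Anything still in the surrogate range is unpaired.
        if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD;
        }

        if (cp < 0x80) // 1 byte
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800) // 2 bytes
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000) // 3 bytes
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else // 4 bytes
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Appending to a std::string byte by byte also avoids relying on a zero-terminated scratch buffer. For “hé” this yields the three bytes 0x68 0xC3 0xA9.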


Hi ExtraLifeMatt,

Thanks for your answer and super interesting and informative function !

Unfortunately it has the same effect as the TCHAR_TO_UTF8 macro, so i can only confirm that your function works very well :slight_smile:

Here you can see that both variables are defined using each method and are displayed the same way after crash:

I’m not even sure that what the debugger displays is what the AddFile function is seeing, because the debugger has its own way of displaying characters. I get something different with AddOnScreenDebugMessage: “hé” becomes “h?” with TCHAR_TO_ANSI and “h??” with TCHAR_TO_UTF8, and both are different from the characters displayed in the debugger.

So that’s kind of difficult to debug^^

Well, going on with my tests :slight_smile:

Cheers
Cedric

Hmmm, i’m starting to be afraid of something.

I’ve printed the three available translations of FString into std::string:

threeEncodings_src.jpg

And only the first one is correct:

threeEncodings.jpg

The UE4 encoding doc says FStrings are UTF16, which explains the above result: the FString constructor will read UTF8 and ANSI as UTF16, hence the wrong displays.

Now, Windows uses UTF16, so the file name should be encoded in UTF16, so i think the TCHAR_TO_WCHAR macro should be used.

But the AddFile function in the library i use (which leads to the AddEncryptedFile shown below) takes only std::string (so ANSI or UTF8) as an input and uses it in std::ifstream::open():

AddFile.jpg

But it looks like std::ifstream::open() only accepts a const char* or a const std::string&:
http://www.cplusplus.com/reference/fstream/ifstream/open/

But to work with UTF16 i would need a function capable of working with std::wstring, which is not the case here.

So i’m afraid that ZipLib can’t work with UTF16 and is simply not compatible with Windows.
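If the library source can be modified, one possible workaround is to keep UTF-8 in the std::string and widen it again just before the file is opened. MSVC's standard library offers non-standard wchar_t overloads for file streams, and C++17 adds overloads taking std::filesystem::path. The sketch below covers only the widening step, in plain C++. Utf8ToUtf16 is a hypothetical helper, not part of ZipLib or UE4, and it is deliberately lenient: malformed bytes become U+FFFD, and overlong forms are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
std::u16string Utf8ToUtf16(const std::string& utf8)
{
    const char16_t kReplacement = 0xFFFD;
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // continuation bytes expected after the lead

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out += kReplacement; ++i; continue; } // stray or invalid lead

        // The sequence must fit and every trailing byte must look like 10xxxxxx.
        bool ok = (i + extra < utf8.size());
        for (std::size_t k = 1; ok && k <= extra; ++k)
        {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80) { ok = false; }
            else { cp = (cp << 6) | (cont & 0x3F); }
        }
        if (!ok || cp > 0x10FFFF) { out += kReplacement; ++i; continue; }
        i += extra + 1;

        if (cp < 0x10000)
        {
            out += static_cast<char16_t>(cp);
        }
        else // outside the BMP: emit a surrogate pair
        {
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the result could be copied into a std::wstring and handed to a wide-character open call. Whether ZipLib can be patched that way without side effects is an assumption this sketch does not verify.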

I hope it’s not that and i don’t have to go hunting another zip library that i can embed in my code.
Finding this one was nightmarish enough !

Can anyone see a flaw in my reasoning ?
I really hope so :slight_smile:

Thanks
Cedric

Oh i’m more and more afraid:

Quoting:

But of course, ZipLib::AddFile() is not overloaded with wstring/wchar, so it seems all hope is lost and i’m going to have to:

  • either find a zip library compatible with Windows file names
  • or write my own wstring/wchar function and try to blend it with this lib

Well…

Cedric

All right, problem solved :slight_smile:

I found another zip lib:
https://libzip.org/

that has a smarter add function, Unicode-wise:
https://libzip.org/documentation/zip_file_add.html

This was what caught my eye:

It’s a C lib, but very fortunately some clever guys made a VS version of it:

So creating a static lib of it was super easy and i could quickly make a valid zip with that one:

ZipOK.jpg

And this one works properly with UTF8, here are the two main calls in the FindFile loop:


zip_source_t* ThisSourceFile = zip_source_file(ThisArchive, TCHAR_TO_UTF8(*FileOnDisk), 0, -1);
zip_file_add(ThisArchive, TCHAR_TO_UTF8(*FileInArchive), ThisSourceFile, ZIP_FL_ENC_GUESS);

So the trick was to use a library capable of properly dealing with file name encoding :slight_smile:

Well, good experience, it ends well and i learned a thing or two in the process, what more could i ask for !

Thanks to all for your time, i hope this will help anyone trying to zip from UE4.

Cheers
Cedric