FFileHelper::LoadFileToString Unsafe Size Casting

I was attempting to load a file to a string via FFileHelper::LoadFileToString. This function returned true/Success, but the Result was empty. Upon further debugging, I noticed in this code that the file size was turning into a negative value. The file size was actually ~2.7GB.

From inspecting the code, it looks like the root of the issue is the following cast, which takes an int64 value and blindly casts it to an int32 value:

https://github.com/EpicGames/UnrealEngine/blob/6978b63c8951e57d97048d8424a0bebd637dde1d/Engine/Source/Runtime/Core/Private/Misc/FileHelper.cpp#L223

```cpp
int64 Size = Reader.TotalSize();

BufferToString(Result, Ch, (int32)Size); // int64 blindly truncated to int32
```

My expectation is that this function should either handle larger file sizes, or there should be some sort of safeguard in this function that tests the int64 Size value for whether it can actually fit within an int32, then provides a warning and/or failure message accordingly.
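To illustrate the safeguard I'd expect, here is a minimal sketch (illustrative names only, not engine code) of a checked narrowing helper that LoadFileToString could use before the cast:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical guard mirroring the check LoadFileToString could perform
// before narrowing int64 -> int32 (names are illustrative, not engine code).
inline bool CheckedNarrow(int64_t Size, int32_t& Out)
{
    if (Size < 0 || Size > std::numeric_limits<int32_t>::max())
    {
        return false; // size cannot be represented as an int32 count
    }
    Out = static_cast<int32_t>(Size);
    return true;
}

// For a ~2.7GB file, the unchecked cast wraps to a negative value:
//   static_cast<int32_t>(int64_t(2900000000LL)) < 0
// whereas CheckedNarrow refuses it, letting the caller fail cleanly.
```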

I am working on optimizing our data to avoid such a large file as a way to mitigate this issue and quickly unblock our work, but I thought I should also report it here, as it seems like it’s exposing at least one bug that should be addressed in the core code (BufferToString is also a bit suspicious in how it handles negative size values).
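As a guess at why the function reports success with an empty Result (this is an assumption about the behavior, not a reading of the actual engine internals), a stand-in helper that silently clamps a negative count reproduces the symptom:

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a BufferToString-style helper (assumption: the
// real code sizes its destination buffer from the signed count). A negative
// count silently yields an empty string instead of an error, which would
// match the observed "returned true but Result was empty" behavior.
inline std::string BufferToStringLike(const char* Buffer, int32_t Count)
{
    if (Count <= 0)
    {
        return std::string(); // negative/zero count: empty result, no error
    }
    return std::string(Buffer, static_cast<size_t>(Count));
}
```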

Steps to Reproduce
Call FFileHelper::LoadFileToString on a file that has a size over 2,147,483,647.

In my case, I was using a file that was ~2.7GB, and this function returned true/Success, but the Result was empty.

Hi,

Thanks for reporting, I created a JIRA for the issue (UE-313580). I think the function should simply fail when the file is too large; trying to load a >2GB file into an FString is probably not a good idea in general. Were you able to work around this issue for your use case?

Regards,

Patrick

Hi,

Did you try streaming the JSON from the file, something like the following? I checked the code quickly and it will probably work. Depending on how you need to consume the JSON data, that might work for you.

```cpp
TUniquePtr<FArchive> FileReader = IFileManager::Get().CreateFileReader(Filename, Flags);
TSharedRef< TJsonReader<> > JsonReader = TJsonReaderFactory<>::Create(FileReader.Get());
...
```

If you need to load everything at once, I think the following function should get you the whole file in memory. But you will need to split it across several FStrings if you need it in that format.

```cpp
bool FFileHelper::LoadFileToArray(TArray64<uint8>& Result, const TCHAR* Filename, uint32 Flags)
```

Regards,
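The splitting step can be sketched like this (a self-contained illustration, not engine code; real code would load into a TArray64<uint8> first, and a splitter for UTF-8 JSON would also need to avoid cutting a multi-byte character in half):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative sketch: split one large byte buffer into chunks small enough
// for int32-sized containers such as FString / TArray.
std::vector<std::string> SplitIntoChunks(const std::vector<uint8_t>& Data,
                                         size_t MaxChunk)
{
    std::vector<std::string> Chunks;
    for (size_t Offset = 0; Offset < Data.size(); Offset += MaxChunk)
    {
        const size_t Len = std::min(MaxChunk, Data.size() - Offset);
        Chunks.emplace_back(
            reinterpret_cast<const char*>(Data.data()) + Offset, Len);
    }
    return Chunks;
}
```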

Patrick

I tried quite a few more things, including allocating my own buffer for the uncompression to bypass TArray usage by the file-handler-related functions. That seemed to get the file in memory OK, but none of my attempts to stuff that into a reader or FString worked; there were other internal int32 limitations, often via usage of TArray, as well as issues with encoding-conversion macros and functions being int32 limited.

I gave up at that point on making one big file work.

Rather than one file for a month's worth of events, I'm just switching to a file per day's events. There are pros and cons to that approach anyway; in some ways it works better for our usage.

Hi,

Yeah, the engine shows its age. It was updated to handle large data where needed, but FString is still stuck at 2GB. Splitting your file is probably the best workaround for the moment. I updated the JIRA mentioning the use case (very large JSON file). The devs will probably add code to return false if the file is too big to be held in an FString, but general handling of very large strings probably won't happen unless this becomes an important/common issue.

Regards,

Patrick

I’ve tried quite a few things, but nothing has worked yet; I just keep running into cases where UE code uses int32 for size values. It’s very systemic: data counts greater than the max positive int32 value are generally unguarded and handled unsafely or incorrectly. FString, TArray<uint8>, and many other core types all fail in various ways on sizes over the max value of an int32.

I am likely going to end up breaking this data up across multiple files as a way to work around this set of problems. UE just doesn’t seem capable of managing data of this scale.

I’m a bit surprised to discover this rather general limitation in UE, as coming from a game studio where we developed our own engine and tools, we moved beyond int32 limitations like these in the early 2000s.

I can somewhat understand gameside limitations of this nature, but on the editor side, this scale of data is within the scope I consider reasonable.

For more context here, we are loading a bunch of preprocessed telemetry events for usage in the Editor, to overlay a 3D view of that data in the world directly within the viewport, such as to view and inspect performance hotspots in the context of world objects, etc.

Currently we’re transporting a few million event data objects as JSON via this mechanism, which was working great, right up until the file helper started failing as the file size increased beyond 2 GB.

I know of ways we could avoid JSON, such as by using binary data with predetermined structure layouts, or even an actual local database, but JSON is a convenient format for the external processing tools to also utilize, as some of that is Python based. We’re also trying to stay rather generic and allow for quickly and easily handling additional properties in events as well as additional event types. JSON is very friendly for all that.

Anyway, I’m sure I’ll find some solution here, now that I’m aware that UE really isn’t designed to handle the scale of data I’m working with.