TCHAR_TO_ANSI limitation to 128 characters?

Hi,
I ran into a very weird problem: at least on my system, TCHAR_TO_ANSI returns an invalid pointer when the input FString is longer than 128 characters.

You can test it with this small code snippet:

FString oneHundred = "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890";
char* p1 = TCHAR_TO_ANSI(*oneHundred);

FString twoHundred = "12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890";
char* p2 = TCHAR_TO_ANSI(*twoHundred);

In the debugger I see that p1 is valid and holds the correct data, while p2 contains garbage.

What am I doing wrong?

Hi,

There is no limit to TCHAR_TO_ANSI, but the result is a pointer to a temporary.

If you want to extend the lifetime, I recommend you use StringCast instead:

FString twoHundred = ...;
auto twoHundredAnsi = StringCast<ANSICHAR>(*twoHundred);
const char* twoHundredAnsiPtr = twoHundredAnsi.Get();

twoHundredAnsiPtr will be valid for as long as twoHundredAnsi exists.
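The same ownership rule can be shown without Unreal. Below is a minimal standard C++ sketch, assuming a hypothetical `AnsiConversion` class as a stand-in for the object StringCast returns: it owns the converted buffer, and `Get()` only lends out a pointer into it. This is not Unreal's implementation; the "conversion" simply keeps ASCII characters and replaces everything else with '?'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the object StringCast<ANSICHAR> returns:
// it owns the converted characters, and Get() only lends out a pointer.
class AnsiConversion {
public:
    explicit AnsiConversion(const wchar_t* Source) {
        for (; *Source != L'\0'; ++Source) {
            // Simplified conversion: keep ASCII, replace everything else with '?'.
            const bool IsAscii = static_cast<unsigned long>(*Source) < 128u;
            Buffer.push_back(IsAscii ? static_cast<char>(*Source) : '?');
        }
    }

    // The returned pointer is valid only while this object is alive.
    const char* Get() const { return Buffer.c_str(); }
    std::size_t Length() const { return Buffer.size(); }

private:
    std::string Buffer;
};

// Safe pattern: the conversion object is named, so it outlives Ptr.
std::size_t CountDigits(const std::wstring& Wide) {
    AnsiConversion Conv(Wide.c_str());  // destroyed at the end of this function
    const char* Ptr = Conv.Get();       // valid for as long as Conv exists
    std::size_t Digits = 0;
    for (std::size_t I = 0; I < Conv.Length(); ++I) {
        if (Ptr[I] >= '0' && Ptr[I] <= '9') {
            ++Digits;
        }
    }
    return Digits;
}
```

String length plays no part here: a 200-character input is just as safe as a 10-character one, because the pointer never outlives its owner.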

Hope this helps,

Steve


What do you mean by a pointer to a temporary? I’m in debug mode, and as soon as that line executes, *p2 is wrong while *p1 is OK. I tried different string sizes, and it breaks exactly when the string exceeds 128 characters. I tried your code and it works, but I’m not sure what the difference is or why TCHAR_TO_ANSI doesn’t work in my example.

We use char* everywhere in our code, and I don’t think the answer can be that char* is not reliable, right? Otherwise it would mean we cannot use TCHAR_TO_ANSI at all, since it would break for long strings.

By the way, I just tested this line, based on your example, and it also gives me an invalid pointer:

const char* twoHundredAnsiPtr = StringCast<ANSICHAR>(*twoHundred).Get();

The code doing the allocation is in Engine\Source\Runtime\Core\Public\Containers\ContainerAllocationPolicies.h, line 507, where it tries to allocate more memory.

It’s nothing to do with char*, it’s to do with the lifetime of the memory allocation which the pointer is pointing at.

A C++ temporary object: Temporary Objects | Microsoft Docs

The macro basically wraps up the StringCast code above, but it does so by creating a temporary. The macro is handy as an inline conversion when passing a string to a function, but it is dangerous when you store its result in a local variable:

void FuncTakingCharPtr(const char* String);

FString Str = TEXT("Hello");

// This is ok
FuncTakingCharPtr(TCHAR_TO_ANSI(*Str)); // A

// This is bad
const char* Ptr = TCHAR_TO_ANSI(*Str); // B
FuncTakingCharPtr(Ptr);                // C

It’s bad because the macro hides the lifetime of the converted string. On line B, a string object is created to hold the converted string. You then get a pointer to that string, but on the same line B, the string object is destroyed. Now you are holding a pointer to freed memory, which you pass on line C, and everything explodes, maybe (see below).

On line A, the same thing happens, except that the string object isn’t destroyed until after the function has returned. So it’s definitely fine.

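The 128-character boundary is not a real limit. At this point you have undefined behaviour, and anything can happen. It isn’t really ‘not broken’ when the string is shorter than 128 characters; you have just been lucky not to see any visibly negative effects. Such is the nature of undefined behaviour. (The exact threshold most likely reflects an implementation detail: the conversion object appears to keep results of up to 128 characters in an inline buffer and to heap-allocate longer ones, which would match the allocation seen in ContainerAllocationPolicies.h. That explains where the symptoms change, not whether the code is correct.)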
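To see exactly when the hidden object dies, here is a small standard C++ sketch (not Unreal code) in which a hypothetical `Converted` class writes to a log when it is destroyed, and `TakesCharPtr` plays the role of FuncTakingCharPtr. The line A pattern logs the use before the destruction; the line B pattern logs the destruction before the pointer is ever used. The sketch deliberately never dereferences the dangling pointer, since that would itself be undefined behaviour.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records events in order so object lifetimes can be inspected afterwards.
std::vector<std::string>& EventLog() {
    static std::vector<std::string> Log;
    return Log;
}

// Hypothetical stand-in for the hidden object TCHAR_TO_ANSI creates.
struct Converted {
    std::string Text;
    explicit Converted(std::string In) : Text(std::move(In)) {}
    ~Converted() { EventLog().push_back("freed"); }
    const char* Get() const { return Text.c_str(); }
};

// Plays the role of FuncTakingCharPtr.
void TakesCharPtr(const char* String) {
    EventLog().push_back(std::string("used ") + String);
}

// Like line A: the temporary lives until TakesCharPtr has returned.
void PatternA() {
    TakesCharPtr(Converted("Hello").Get());
}

// Like line B: the temporary dies at the end of the declaration,
// so Ptr already dangles on the next line.
void PatternB() {
    const char* Ptr = Converted("Hello").Get();
    EventLog().push_back("Ptr ready");
    // Calling TakesCharPtr(Ptr) here would read freed memory (undefined
    // behaviour), so this sketch deliberately never dereferences Ptr.
    (void)Ptr;
}
```

After PatternA the log reads "used Hello", "freed"; after PatternB it reads "freed", "Ptr ready", i.e. the memory is gone before Ptr can be used.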

Steve


This is an awesome answer!!! <3 This should be added to the documentation of TCHAR_TO_ANSI.

Could you also add the following cases to your example above, saying whether each one is good or bad?

  1. Returning from a function:

char* myFunction() {
    FString hey(TEXT("hello"));
    return TCHAR_TO_ANSI(*hey);
}

  2. Usage within standard C++ functions:

strcpy(dest, TCHAR_TO_ANSI(*sourceFString));

  3. If I need a char pointer to write into the memory of an FString, I was doing something like:

char* p = TCHAR_TO_ANSI(*input);
*p++ = 'x';

I understand this is not good, as the first line will invalidate my data, but what is the right way to do it while keeping a pointer? (I ask because we have some legacy functions that work on char*, and now we need to pass in an FString.)

Same problem here: I’ve had issues converting TCHAR to ANSI. Strings longer than 128 characters come back as garbage (I also tested with StringCast, same issue).
(Version 4.14.3)

Please post the code.

Steve

  1. This is bad. The temporary created inside TCHAR_TO_ANSI is local to the function and is destroyed before the function returns (as is hey itself), so the memory it owns is freed and the caller receives a pointer to freed memory.

  2. This is good, because the temporary created inside TCHAR_TO_ANSI is being used as an argument to strcpy, so it will not be destroyed until strcpy returns. As strcpy is the only user of that pointer, and strcpy has just returned, the pointer cannot be used unsafely.

  3. This is the same situation as your first post, i.e. bad. The memory that p points to is both allocated and freed on the same line on which p is assigned, so p points to destroyed memory. Note also that p would point into the converted ANSI copy, not into the FString’s own TCHAR buffer, so writing through it could never modify the FString anyway.
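Here is a minimal standard C++ sketch of safe alternatives for the three cases. It assumes a hypothetical `Narrow` helper in place of the Unreal conversion (it returns the converted text as an owning std::string, narrowing wide characters naively) and a hypothetical `LegacyUppercase` function standing in for a legacy char* API. Case 1 returns an owning string by value instead of a raw pointer; case 2 is fine as written because the temporary outlives strcpy; case 3 hands the legacy function a writable copy that the caller owns.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical helper: naive wide-to-narrow conversion into an owning string
// (keeps ASCII, replaces anything else with '?'). Not an Unreal API.
std::string Narrow(const std::wstring& Wide) {
    std::string Out;
    for (wchar_t Ch : Wide) {
        Out.push_back(static_cast<unsigned long>(Ch) < 128u ? static_cast<char>(Ch) : '?');
    }
    return Out;
}

// Case 1: return an owning string by value instead of a pointer into a temporary.
std::string MyFunction() {
    std::wstring Hey = L"hello";
    return Narrow(Hey);
}

// Case 2: the temporary std::string lives until strcpy returns, so this is safe.
// Dest must be large enough for the converted text plus the terminator.
void CopyInto(char* Dest, const std::wstring& Source) {
    std::strcpy(Dest, Narrow(Source).c_str());
}

// Hypothetical legacy API that modifies a NUL-terminated string in place.
void LegacyUppercase(char* Str) {
    for (; *Str != '\0'; ++Str) {
        *Str = static_cast<char>(std::toupper(static_cast<unsigned char>(*Str)));
    }
}

// Case 3: hand the legacy function a writable copy that the caller owns.
std::string UppercaseCopy(const std::wstring& Input) {
    std::string Buffer = Narrow(Input);  // named, so it outlives the pointer below
    LegacyUppercase(&Buffer[0]);         // writes go into Buffer, never into Input
    return Buffer;
}
```

If the modified text has to end up back in the original string, the caller must convert it back explicitly; nothing propagates through the pointer automatically.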

Consider this:

void Use(const ANSICHAR* Str);

void FuncA(const FString& Str)
{
    Use(StringCast<ANSICHAR>(*Str).Get()); // A1
}

void FuncB(const FString& Str)
{
    const ANSICHAR* Ptr = StringCast<ANSICHAR>(*Str).Get(); // B1
    Use(Ptr);                                               // B2
}                                                           // B3

void FuncC(const FString& Str)
{
    auto Conv = StringCast<ANSICHAR>(*Str);      // C1
    const ANSICHAR* Ptr = Conv.Get();            // C2
    Use(Ptr);                                    // C3
}                                                // C4

I’m using StringCast here for clarity, but TCHAR_TO_ANSI(x) simply wraps up StringCast<ANSICHAR>(x).Get().

In FuncA, the following happens in this order, on these lines:

  • A1: StringCast returns an object which holds/owns the converted string. As it is not a named variable, this object is a temporary (let’s call it T).
  • A1: Get() is called on T, which returns a pointer (also a temporary - call it P) to the converted string memory held by T.
  • A1: Use() is called with P.
  • A1: P is destroyed.
  • A1: T is destroyed, freeing its converted string memory.

Note that all of this happens on line A1 because of the lifetime of temporaries.

This is important! Temporaries are destroyed at the end of the statement in which they are used, whereas named variables are not destroyed until the end of the scope in which the name resides. T is destroyed after P because temporaries are destroyed in the reverse order they were constructed.
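The destruction order can be observed directly. This is a minimal standard C++ sketch (not Unreal code), assuming C++17 so that returning `Handle()` by value creates exactly one object. Hypothetical `Owner` and `Handle` classes log their construction and destruction and play the roles of T and P; unlike the real P, which is a raw pointer, `Handle` has a destructor so its end of life becomes visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string>& Events() {
    static std::vector<std::string> Log;
    return Log;
}

// Plays the role of P: a temporary produced by calling Get() on T.
struct Handle {
    Handle() { Events().push_back("P constructed"); }
    ~Handle() { Events().push_back("P destroyed"); }
};

// Plays the role of T: the temporary object that owns the converted data.
struct Owner {
    Owner() { Events().push_back("T constructed"); }
    ~Owner() { Events().push_back("T destroyed"); }
    Handle Get() const { return Handle(); }
};

void Use(const Handle&) { Events().push_back("Use called"); }

void OneStatement() {
    Use(Owner().Get());                    // both temporaries live and die here
    Events().push_back("next statement");  // by now T and P are already gone
}
```

The log comes out as T constructed, P constructed, Use called, P destroyed, T destroyed, next statement: both temporaries die at the end of the statement, newest first.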

So in FuncB:

  • B1: StringCast returns an object which owns the converted string. This object is also a temporary (let’s call it T again).
  • B1: Get() is called on T, which returns a pointer, Ptr, to the converted string memory held by T. Ptr is not a temporary, because it has a name.
  • B1: End of statement, so temporary T gets destroyed, freeing its converted string memory. Now Ptr is invalid!
  • B2: Use() is called with Ptr, but Ptr is invalid, so we’re now into undefined behaviour!
  • B3: Ptr is destroyed here, but it’s irrelevant as the program is in a broken state by this point. If we’re lucky, it has already crashed and isn’t actively corrupting memory.

In FuncC:

  • C1: StringCast returns an object holding the converted string; we call it Conv. Conv is not a temporary, because it’s named. This also means that it’s not destroyed at the end of the statement.
  • C2: Get() is called on Conv, which returns a pointer, Ptr, to the converted string memory held by Conv.
  • C3: Use() is called with Ptr.
  • C4: Ptr is destroyed.
  • C4: Conv is destroyed, freeing the converted string memory.

FuncA and FuncC are safe because the pointer to the converted string is always destroyed before the object which owns the memory for that string. FuncB is not safe because the converted string object is temporary and has a shorter lifetime than the pointer which is pointing to its internal string buffer.

And because using TCHAR_TO_ANSI is equivalent to what’s happening in FuncA and FuncB, you can hopefully see when this is going to go wrong. The moral of the story is basically “Only use TCHAR_TO_ANSI as a function argument”.

FuncC isn’t possible with TCHAR_TO_ANSI, because there the StringCast() and the Get() have been separated. This is why StringCast is preferable to TCHAR_TO_ANSI: it lets you keep the conversion object alive, and therefore the pointer valid, for as long as you need.

So basically, the pointer returned from Get() is valid only while the object you called Get() on still exists.
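To illustrate why the macro form cannot be split, here is a minimal standard C++ sketch with a hypothetical `NARROW_PTR` macro, modeled on the way TCHAR_TO_ANSI wraps StringCast(...).Get(), and a hypothetical `Narrowed` class standing in for the StringCast result. The macro yields only a pointer, so no object is left to name; the class can be named directly, which is the FuncC pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the StringCast<ANSICHAR> result object.
class Narrowed {
public:
    explicit Narrowed(const wchar_t* Wide) {
        for (; *Wide != L'\0'; ++Wide) {
            const bool IsAscii = static_cast<unsigned long>(*Wide) < 128u;
            Text.push_back(IsAscii ? static_cast<char>(*Wide) : '?');
        }
    }
    const char* Get() const { return Text.c_str(); }

private:
    std::string Text;
};

// Hypothetical macro modeled on TCHAR_TO_ANSI: it builds a temporary and
// yields only the pointer, so there is no object left for the caller to name.
#define NARROW_PTR(Wide) (Narrowed(Wide).Get())

// FuncA style: safe, because the macro is used directly as an argument.
std::size_t LengthViaMacro(const wchar_t* Wide) {
    return std::strlen(NARROW_PTR(Wide));
}

// FuncC style: possible only without the macro. Conv is named, so Ptr
// stays valid across several statements.
std::size_t TotalLengthTwice(const wchar_t* Wide) {
    Narrowed Conv(Wide);
    const char* Ptr = Conv.Get();
    const std::size_t First = std::strlen(Ptr);   // valid here...
    const std::size_t Second = std::strlen(Ptr);  // ...and still valid here
    return First + Second;
}
```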

As to what the ‘right way’ is for your legacy API, I can’t say, because I don’t know whether your API tries to take ownership of those strings. Assuming that your functions only read from the strings passed to them (possibly to make their own copy) and don’t store those pointers, what I’ve said here should apply.

Finally, I suggest reading up on C++ temporaries, object lifetimes, ownership and RAII in order to get a handle on this stuff. It is dangerous to use C++ pointers in the same way as Java/C# references.

Hope this helps,

Steve

I fixed the problem by going through a std::string:

std::string sfilename(TCHAR_TO_UTF8(*mEditableObject[i]->FileWithPath));
char* filenamewithpath = &sfilename[0];

You said you had a problem with StringCast, but the code you posted doesn’t use StringCast. StringCast will work fine when written like this:

auto Conv = StringCast<ANSICHAR>(*mEditableObject[i]->FileWithPath);
const char* filenamewithpath = Conv.Get();

I’ve used ANSICHAR here since you originally mentioned ANSI, although this code uses TCHAR_TO_UTF8. StringCast doesn’t yet support converting to UTF-8.

Steve

Thanks Steve.
This is exactly the code I used first. It works fine up to 128 characters; above that, the filenamewithpath pointer points to a bad memory area.

I don’t believe it. 🙂 If you have time, please reconfirm. filenamewithpath should be valid as long as Conv is in scope.

Steve