Localization - entries with identical source text

anonymous_user_5cc732ea · July 20, 2016, 2:53pm

I have a little experience with Loc in UE3, but I’m having trouble understanding some important characteristics of Loc in UE4.

Mostly this part (from “Localization Overview for Unreal Engine”, Unreal Engine 5.2 Documentation):

“Because entries in archives do not have keys, all entries from a manifest sharing the same source within a namespace are collapsed into a single archive entry; if text only differs by key, it is assumed they are superficially identical and will use the same translation.”

Seems like a strong assumption to me!

Let’s consider this example dialog:

Has the night fallen?
Yes, it has.
Has the cat returned?
Yes, it has.

Typically, cues 2 and 4 would need different translations (e.g. in French, where the pronoun has to agree with the gender of “night” or “cat”).
But with the current workflow, they’re collapsed in the .manifest, .archive, and .po files (even though all paths/contexts are mentioned in the .manifest and .po).
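
To make the collapsing concrete, here is a minimal Python sketch of the behaviour the quoted documentation describes. It is only a model, not engine code: the `MANIFEST` tuples, the `Dialog` namespace, the `Cue1`..`Cue4` keys, and the `collapse_by_source` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical manifest entries: (namespace, key, source text).
MANIFEST = [
    ("Dialog", "Cue1", "Has the night fallen?"),
    ("Dialog", "Cue2", "Yes, it has."),
    ("Dialog", "Cue3", "Has the cat returned?"),
    ("Dialog", "Cue4", "Yes, it has."),
]


def collapse_by_source(entries):
    """Group keys that share the same (namespace, source) pair.

    This mimics archive entries having no key: text that differs
    only by key ends up in one entry with one translation.
    """
    archive = defaultdict(list)
    for namespace, key, source in entries:
        archive[(namespace, source)].append(key)
    return dict(archive)


archive = collapse_by_source(MANIFEST)

# Four manifest entries become three archive entries;
# Cue2 and Cue4 now share a single translation slot.
for (namespace, source), keys in archive.items():
    print(namespace, repr(source), keys)
```

In this model, cues 2 and 4 can no longer be translated differently unless one of them is moved to another namespace.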

I understand having 2 different namespaces would be a workaround, but it seems really artificial to have different namespaces for 2 cues within a single dialog…

The way I see it, source text repetition is not a problem in itself; it can be handled on the translation side of the workflow, in any translation tool.

I feel there should at least be a collapse/don’t collapse flag for the GatherText commandlets?

So, for example, I’d set it to “don’t collapse” for my VO/dialog target(s), while I might keep it on “collapse” for targets like “Maps” or “UI”, where it might make sense to eliminate repetitions this early in the workflow if we absolutely want to reduce word counts at that point (“might”, because I still feel collapsing is a risk).

NB: of course, that would affect the Loc dashboard, e.g. the translation window, where the translate and context windows would be “merged” in the “don’t collapse” case…

Jamie_Dale · July 21, 2016, 5:21pm

Thanks for the feedback.

We’re already in the process of making changes to the localisation gathering to no longer perform this de-duplication, and instead export all text with a unique identity (namespace + key) as separate entries in the PO file.

This makes the lives of translators easier, as they can perform context-specific translations, like the one you mentioned above, without any back-and-forth with content creators to ensure the text has a unique namespace. They can then rely on the translation memory of their translation tools to handle any automatic translations of identical text.
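
As a rough illustration of that split of responsibilities, the Python sketch below keeps one entry per identity (namespace + key) and lets a toy translation memory pre-fill identical source text, which the translator can then override per entry. Everything here is an assumption made for the example: the `ENTRIES` data, the `export_by_identity` and `prefill` helpers, and the dictionary standing in for a translation tool's memory. It does not reproduce the actual PO export.

```python
# Hypothetical gathered text: (namespace, key, source text).
ENTRIES = [
    ("Dialog", "Cue2", "Yes, it has."),
    ("Dialog", "Cue4", "Yes, it has."),
]


def export_by_identity(entries):
    """Keep one entry per (namespace, key), even if the source text repeats."""
    return {(namespace, key): source for namespace, key, source in entries}


def prefill(exported, memory):
    """Suggest a translation for each entry from a source -> translation memory."""
    return {identity: memory.get(source) for identity, source in exported.items()}


exported = export_by_identity(ENTRIES)

# Toy translation memory, standing in for the translator's own tool.
memory = {"Yes, it has.": "Oui, elle est tombée."}
translations = prefill(exported, memory)

# Both cues start with the same suggestion; the translator then fixes
# the one whose context differs (the cat rather than the night).
translations[("Dialog", "Cue4")] = "Oui, il est revenu."
```

The repetition is still handled automatically, but the decision to reuse a translation moves from the gather step to the translator.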

This work should be part of 4.14.

SlayerGoury · April 20, 2021, 8:46am

I was about to ask what I should do to re-enable that de-duplication (because sometimes you need it).
But then I found the solution myself.
So I’ll put it here in case someone else is looking for it,
because search engines return this thread when you are looking for its polar opposite.

The way to de-duplicate a phrase is to use string tables.
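
The idea can be sketched in Python as a conceptual model (this is not the engine's string table API, and the `DialogCommon` table, the `YesItHas` key, and the `gather` helper are all hypothetical). Several pieces of text reference one table entry, so only that one entry needs translating.

```python
# Hypothetical string table: one ID plus key -> source text entries.
STRING_TABLE = {
    "id": "DialogCommon",
    "entries": {"YesItHas": "Yes, it has."},
}

# Hypothetical usages: (where it is used, table ID, entry key).
REFERENCES = [
    ("Cue2", "DialogCommon", "YesItHas"),
    ("Cue4", "DialogCommon", "YesItHas"),
]


def gather(table, references):
    """Return one translatable entry per referenced table key."""
    used_keys = {key for _, table_id, key in references if table_id == table["id"]}
    return {(table["id"], key): table["entries"][key] for key in sorted(used_keys)}


gathered = gather(STRING_TABLE, REFERENCES)
# Two usages, but a single entry to translate.
```

This only fits text that really should share one translation; for cases like the dialog example at the top of the thread, separate entries are still the safer choice.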