Potential problems with the current localization process from a translator's POV

I took a look at the localization process of UE4, and right now it has several issues that may get in the way of localization work.

1. .archive file lacks context
Previous engines kept translations in .int files that provided a lot of context because of their structure: all lines related to a certain thing (a gametype, a certain menu, a mutator) were grouped together, with the class name working as a heading. The variable name provided additional information about the string's context.
As of now, the archive file contains “source: translation” pairs and namespaces. Namespaces alone may not provide enough context, especially for 1-3 word strings that can have multiple translations depending on the context. Another problem is that lines contained within Blueprints could be bundled together with no namespace provided.
Why context is critical:

  1. Developers may have little time to answer.
    Localization is usually done in the last 3-4 months of the game development process, when developers are hard-pressed to finalize the features and polish the game.
  2. It takes time for the question to reach the developer and the answer to get back.
    For most games, localization into languages other than the game’s original language is handled by the publisher. They deal either with localization agencies or with the local publisher, who works with local localization studios. This means that there are at least two managers between the developer and the translator, possibly located in different time zones, and it takes time for a question to go through that chain and back again.
  3. Lockits are translated blindly.
    In 95-98% of cases the translator has no access to the build and may not know enough about the game, even if it’s a sequel to an existing game.

2. Same translation for the same source within namespace
This is a potential source of problems for lines that have the same source text but require different translations because of the target language's grammar. For example, if the game used dropdown menus for detail settings (Highest/High/Medium), these lines would be the same for all options in English. In Russian, however, they might differ depending on the option name, since the adjective has to agree with the noun it describes.
This is even more critical because, from what I have seen so far, strings from Blueprints don’t have a namespace set.
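
To make the collision concrete, here is a minimal Python sketch. The dictionary keyed by (namespace, source) is only my model of the behaviour described above, not the engine's actual data structure, and the function name is made up for illustration.

```python
# Hypothetical model: one translation slot per (namespace, source text) pair.
# This is NOT the engine's real storage, only an illustration of the problem.

def add_translation(archive, namespace, source, translation):
    """Store a translation; an entry with the same key overwrites the earlier one."""
    archive[(namespace, source)] = translation

archive = {}

# "High" for texture quality: "качество" is a neuter noun in Russian.
add_translation(archive, "", "High", "Высокое")
# "High" for shadow detail: "детализация" is a feminine noun.
add_translation(archive, "", "High", "Высокая")

print(len(archive))           # 1 (only one slot exists for both options)
print(archive[("", "High")])  # Высокая (the first translation was overwritten)
```

Both dropdown options end up showing the same word, which is grammatically wrong for one of them.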

3. JSON is not the best format for editing
While the JSON format is human-readable, it takes a lot of actions to navigate between the translatable bits of text, so the resulting file is not the best option for projects with lots of text. I know that developers or a third party can convert it to another format, but it would be much easier if the engine could produce a file that anyone can edit right away, e.g. a .csv file. This could also be a solution to duplicate source lines within a namespace, as such a format could also contain the key for each string.
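
As a rough sketch of the kind of conversion I mean, the Python snippet below flattens a JSON archive into CSV rows with a key column. The JSON layout and field names here are a simplified assumption made up for the example; the real archive has more fields and nesting.

```python
import csv
import io
import json

# Simplified, hypothetical archive layout (field names are assumptions).
SAMPLE_ARCHIVE = """
{
  "Namespace": "Menu",
  "Children": [
    {"Key": "DetailHigh", "Source": {"Text": "High"}, "Translation": {"Text": ""}},
    {"Key": "DetailMedium", "Source": {"Text": "Medium"}, "Translation": {"Text": ""}}
  ]
}
"""

def archive_to_csv(archive_json):
    """Flatten one namespace of the sample archive into CSV text."""
    data = json.loads(archive_json)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["namespace", "key", "source", "translation"])
    for entry in data["Children"]:
        writer.writerow([
            data["Namespace"],
            entry["Key"],
            entry["Source"]["Text"],
            entry["Translation"]["Text"],
        ])
    return out.getvalue()

csv_text = archive_to_csv(SAMPLE_ARCHIVE)
print(csv_text)
```

Each row can then be opened in any spreadsheet editor, and the key column keeps two identical source lines apart.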

Hi Lynx!

Thanks for your feedback and concern! I think you’ll be happy to know that our intentions actually align quite well. The misunderstanding here is that translators are not supposed to modify the archives/manifests directly. These are formats specifically designed for storage, not editing, and thus focus on things such as driving consistency and procedural maintainability. Though the format is obtuse for translators, it solves a lot of synchronization issues that come with keeping a large amount of duplicate data in sync.

1. Developers should be exporting their translations so translators can use a tool properly designed to enable their workflows. Things such as translation memory, spell checking, commenting/QA and glossary support are essential translator features that we don’t have the means to duplicate inside the Unreal ecosystem. Here at Epic we currently export our translations to “.po” and import them into a free translation editing service called OneSky.
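
For readers unfamiliar with the format: a gettext .po entry can carry a context line (msgctxt) and a comment for the translator next to the source and translation. The Python sketch below only illustrates that entry layout. It is not the engine's exporter, and the function names and sample values are hypothetical.

```python
def po_escape(text):
    """Escape backslashes, double quotes and newlines for a PO string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def po_entry(context, source, translation="", comment=""):
    """Build one PO entry; '#.' marks a comment addressed to the translator."""
    lines = []
    if comment:
        lines.append("#. " + comment)
    lines.append('msgctxt "%s"' % po_escape(context))
    lines.append('msgid "%s"' % po_escape(source))
    lines.append('msgstr "%s"' % po_escape(translation))
    return "\n".join(lines)

# Hypothetical sample values, for illustration only.
print(po_entry("Menu.TextureQuality", "High",
               comment="Dropdown option for texture quality"))
```

Translation tools that read .po files can show the context and comment beside the string being translated.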

Right now it’s a bit obtuse managing your localization data, as you have to run lots of command line scripts, but we currently have a “localization dashboard” in progress (it can be enabled under the experimental section in your editor settings) which dramatically simplifies the gather, import and export process. We are also working on a localization plugin API so that different developers can write Unreal plugins to have the engine interface with their systems. We are hoping to get a OneSky plugin out soon.

2. Homographs can be resolved on a case-by-case basis by adjusting the namespace for the required text. Namespaces help to reduce data bloat and translation costs, as well as in-game performance and memory overhead.

3. As stated in point one, users should not be editing the archive files by hand, for all the reasons you’ve specified. They weren’t designed to be used this way. Users should export their game’s translations and work on them in a proper translation editing tool.

I hope this clears up some of the concerns you have regarding our intentions. I look forward to your feedback as some of our new features, such as the localization dashboard and the localization plugins, come online.

Hi Sarge,

I went on to look into the dashboard; it looks really great, and I hope the plugin support will resolve the context issue.

As for homographs - I’m afraid this might be a real issue. I understand why it was done, but reused strings are one of the greatest localization headaches, because developers always think from the standpoint of their native language and rarely think of other languages. To apply the mentioned workaround, the developer has to know about a possible problem with the target language and alter the namespaces before generating the lockit. There’s a high chance that developers will never think about it, as they are not familiar with the target language and usually have a lot of other problems on their minds. Translators, on the other hand, will never know that there was an option to have multiple translations for these lines, or that a line is used in several places, since the generated lockit will have just a single line. And from my experience as a localization manager, it’s really hard to get something done specifically for one language unless the option is there right from the start (or its absence causes really big problems). So yes, while this approach does reduce translation costs, memory usage, etc., it also negatively affects localization quality.
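
To show what translators are missing, here is a small Python sketch that reports source strings used in more than one place. The list of occurrences is a made-up stand-in for whatever the gather step collects; I'm assuming that usage locations are available at that point, which may not match the actual tooling.

```python
from collections import defaultdict

# Hypothetical gather output: (namespace, source text, place where it is used).
occurrences = [
    ("Menu", "High", "Options/TextureQuality"),
    ("Menu", "High", "Options/ShadowDetail"),
    ("Menu", "Back", "Options/BackButton"),
]

def find_reused(occurrences):
    """Return {(namespace, source): [locations]} for strings used more than once."""
    usage = defaultdict(list)
    for namespace, source, location in occurrences:
        usage[(namespace, source)].append(location)
    return {key: locs for key, locs in usage.items() if len(locs) > 1}

reused = find_reused(occurrences)
for (namespace, source), locations in reused.items():
    print(f"{namespace}/{source}: used in {len(locations)} places: "
          f"{', '.join(locations)}")
```

If a report like this accompanied the lockit, the translator would at least see that one line feeds several places and could ask for it to be split.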