Slicer translation workflow using crowdin

lassoan · January 24, 2022, 7:30pm

During the project week we explored Crowdin and various options for implementing a translation workflow. I read the documentation of several tools, looked at how other projects handle translation, tested a couple of approaches, and came up with the following.

Requirements

We should minimize the burden of translation on developers. Developers should only need to mark strings in their software as translatable: adding tr() in C++ and _() in Python should be enough. Developers are not the same people as translators: even if a developer can translate to 1-2 languages, most of the work relies on community effort.
Updated translations must be provided for released Slicer Stable versions, as users should not have to wait for the next stable release (ideally available in 3 months, but may be up to 6-12 months).
Translators should be able to immediately test their translations, preferably with an offline option, but definitely without committing updates into version controlled repositories - to reduce noise in repositories (too many notifications, hard-to-read change log, etc.).
Translators should be able to start translating new/modified translatable strings soon after developers introduce them.
Crowdin should only add convenience but must remain replaceable, so that private custom applications and extensions can be translated without Crowdin (which is free only for open projects).
Extension developers should be able to test translatable string extraction quickly. It should not require building Slicer or the modules, because extension developers who only have scripted modules do not have a build. It also makes it easier to set up quick automatic translation workflows in continuous integration if building is not needed. Preferably only Python should be required; installing Slicer and some extensions should be OK (that is already needed for development and testing). Installing CMake and Qt may be tolerable, but preferably should be avoided.

Proposed implementation:

Create a separate repository: SlicerLanguagePacks
- Stores .ts files (synchronized with Crowdin’s GitHub integration)
- Contains an extension: LanguagePacks
  - Offers a module that helps translators test everything offline, without waiting for nightly builds: update ts template files, download updated ts files, generate qm files, install qm files immediately.
- Extension package contains compiled qm files of all extensions. Users can install latest translations via the extensions manager the same way as any other extensions. Later extension auto-update can ensure latest translations are installed.
During nightly builds on factory:
- Extract translatable strings into source .ts files from Slicer core and all extensions
- Commit source .ts files into SlicerLanguagePacks repository
- Trigger crowdin synchronization (may just wait for the automatic synchronization): this uploads the updated source .ts files from SlicerLanguagePacks to Crowdin and creates a git pull request to update localized .ts files in SlicerLanguagePacks
- Merge Crowdin’s SlicerLanguagePacks pull request
- Compile localized .ts files into .qm files
- Include Slicer core qm files into Slicer installer package. The qm files included in the Slicer core install package will be overridden by translations downloaded via SlicerLanguagePacks extension.
- Include latest Slicer core qm files and extensions qm files into SlicerLanguagePacks extension.
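
The compile step in the list above could be sketched roughly as follows. This is only an illustration, not the actual factory script: it assumes Qt’s lrelease tool is available on the PATH, and the folder layout and the file name Slicer_fr-FR.ts used in the example are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def lrelease_command(ts_file, output_dir, lrelease="lrelease"):
    """Build the lrelease command line that compiles one .ts file into a .qm file."""
    ts_file = Path(ts_file)
    qm_file = Path(output_dir) / (ts_file.stem + ".qm")
    return [lrelease, str(ts_file), "-qm", str(qm_file)]


def compile_all(ts_dir, output_dir, lrelease="lrelease"):
    """Compile every .ts file found in ts_dir into output_dir (assumed flat layout)."""
    if shutil.which(lrelease) is None:
        raise RuntimeError("lrelease executable not found: " + lrelease)
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for ts_file in sorted(Path(ts_dir).glob("*.ts")):
        # check=True stops the run if lrelease reports an error for a file
        subprocess.run(lrelease_command(ts_file, output_dir, lrelease), check=True)


# Hypothetical file name, only to show the command that would be run
print(lrelease_command("translations/Slicer_fr-FR.ts", "qm"))
```

The same function could be reused by the translator-facing module, since the only external dependency is the lrelease executable.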

Rationale

Extension name: LanguagePacks (instead of Languages, Translations, Internationalization, Localization), because “language pack” strongly suggests a software add-on that provides additional spoken languages. Languages could refer to programming languages; Translations could refer to various transformations (spatial, file formats, etc.). It is also easier to search for information (in Google, Discourse, etc.) with a more specific term such as “language pack”. Internationalization and Localization are technical terms whose meaning in this context most users would not know.
The Slicer module replaces the need for continuous integration (e.g., immediately compiling ts to qm files on the server). Continuous integration could continuously look for changes in all extensions, extract translatable strings, and upload them to Crowdin; but it might not be easy to watch hundreds of repositories, it could generate lots of commits, and it would still not be real-time enough.
Extensions and Slicer core are updated nightly, so updating qm files more frequently than once a day has limited use (it may be useful for Python scripted modules only). Translators are not immediately available for every language; they will probably need at least a couple of days. So, if translation can only start the next day (not on the day the new strings are created), that is not a significant delay.
Testing translation updates in real time requires downloading from Crowdin, compiling to qm, and updating the qm files in the install tree.
Qm files are bundled in the install package to make the application self-contained. If we want to allow language selection in the installer or in the application (e.g., popup when the application is first started) then we would already need localized text. Also, users may not find it easy to download language packs separately from the application and then install it in the same location as the application.
There is no need to store source .ts files in Slicer core’s repository, as these files can be easily generated anytime. Also, we would not be able to store these .ts files for extensions in each extension anyway (that would require assistance from the extension developer), so it is easier to store all source .ts files in the SlicerLanguagePacks repository.

Questions:

How to translate Python scripted module’s .py files and CLI module descriptor XML files?
- Option A. Pre-process the .py and .xml files to create intermediate files that Qt’s lupdate tool can parse
- Option B: Generate .ts files directly. For example, there are tools that can extract text from _() function calls in Python files. We could probably create an XSL transform that would create a .ts file from the module descriptor .xml file directly (very similarly to how .md files are generated from these files).
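
To give an idea of what Option B might look like for Python files, here is a minimal sketch that uses only the Python standard library. It finds _("...") calls with the ast module and writes them into a Qt .ts document. Using the module name as the translation context, and the names extract_strings and build_ts, are assumptions made for this illustration, not an agreed design.

```python
import ast
import xml.etree.ElementTree as ET


def extract_strings(source, func_name="_"):
    """Return sorted (line, text) pairs for each func_name("literal") call."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == func_name
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append((node.lineno, node.args[0].value))
    return sorted(found)


def build_ts(context, filename, strings):
    """Create the text of a Qt .ts file with one context and untranslated messages."""
    ts = ET.Element("TS", version="2.1")
    ctx = ET.SubElement(ts, "context")
    # Assumption: the module name is used as the Qt translation context
    ET.SubElement(ctx, "name").text = context
    for line, text in strings:
        msg = ET.SubElement(ctx, "message")
        ET.SubElement(msg, "location", filename=filename, line=str(line))
        ET.SubElement(msg, "source").text = text
        ET.SubElement(msg, "translation", type="unfinished")
    body = ET.tostring(ts, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE TS>\n' + body


# Hypothetical scripted module content
example = 'label = _("Load volume")\nprint(_("Apply"))\n'
ts_text = build_ts("MyModule", "MyModule.py", extract_strings(example))
print(ts_text)
```

This only handles plain string literals passed as the first argument; concatenated strings, f-strings, and calls through an attribute (such as obj._()) are ignored in this sketch.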
How to organize the .ts file and folder structure (what to put in a .ts file, how to organize .ts files into folders)?
- Option A: Create a separate .ts for each module. There are 150 modules in Slicer core and there are about 200 extensions, each containing typically a couple of modules. A new folder can be added for each extension. Challenges: There would be many files and the file list would change as modules are added/removed/renamed. It is not always clear which module provides text for a specific message on the GUI. It is slightly more complicated to generate per-module .ts files without building the application (creating .ts files without requiring building the application is useful, as it makes continuous/on-demand translation updates much simpler).
- Option B: Create a single .ts file for Slicer core, and a separate .ts file for each extension. The Slicer core translation file is about 700KB, which is not too big. Challenges: Slicer core uses about 2600 terms, so it cannot be browsed, only filtered/searched. It is not possible to see percentage completion per module or meaningfully use Crowdin’s priority feature (you can specify low/normal/high priority for each file).
- Option C: Have a mix between options A and B - modules could be organized into groups (for example, by module category) to have a few dozen files for Slicer core instead of 150. It would probably still make sense to have a single .ts file per extension.
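
As a rough illustration of Option C, the grouping could be as simple as mapping each module to a .ts file named after its top-level category. The Slicer_<Category>.ts naming scheme, the use of “.” as the category hierarchy separator, and the sample module list are all assumptions for this sketch.

```python
from collections import defaultdict


def ts_file_for(category):
    """Map a module category such as "Filtering.Denoising" to a .ts file name."""
    top = category.split(".")[0].strip() or "Uncategorized"
    # Hypothetical naming scheme: one file per top-level category
    return "Slicer_" + top.replace(" ", "") + ".ts"


def group_modules(module_categories):
    """Group module names by the .ts file that would hold their strings."""
    groups = defaultdict(list)
    for module, category in sorted(module_categories.items()):
        groups[ts_file_for(category)].append(module)
    return dict(groups)


# Hypothetical sample; real categories would come from the modules themselves
sample = {
    "Volumes": "",
    "GaussianBlurImageFilter": "Filtering.Denoising",
    "AddScalarVolumes": "Filtering.Arithmetic",
}
print(group_modules(sample))
```

Components that do not belong to any module (main window, extension manager, etc.) would still need an explicit rule, which this sketch does not cover.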

Any comments and suggestions are welcome. I also hope we can discuss this tomorrow at the Slicer Weekly meeting.

Fernando · January 24, 2022, 11:27pm

Nice! I just “suggested 44 translations into Spanish”.

cpinter · January 25, 2022, 9:15am

Amazing summary @lassoan! I like the plan.

Questions:

Option B seems like a good choice for XMLs because as you said we already have a proven way of processing them. The question is how easy it is for Python. It’s not hard to write a script but maybe there is a simpler option. By the way above you write _() and here _t() for marking translatable strings. I did a quick search but didn’t find it. Which one is correct? Where does it come from?
Again I prefer Option B, for the simple fact that in that case there would be no duplicate terms, thus the same term is guaranteed to be translated the same way in each module/panel/etc. Also there are parts of Slicer that are not related to modules (main window, extension manager, python window) which could be hard to organize (they could be called Base but probably there are such components outside the Base folder). I don’t consider the lack of browsability an issue - people are used to searching I guess.

lassoan · January 25, 2022, 3:29pm

Thanks for the feedback.

That was just a typo, I’ve fixed it now.

lassoan · March 9, 2022, 8:43pm

Just a quick update: we are switching over from Crowdin to Weblate. See more information about what motivated the move here. The architecture remains the same as in the image above.
