Split `es` locale into several variations and migrate existing user translations to `es-ES` #22982

diox · 2025-01-10T14:07:23Z

Context

As part of streamlining our locales we're splitting es into several variations. The Pontoon team have asked us to copy over the existing locale to the new ones and they'll sort them (either as actual translations or suggestions) in Pontoon. That makes the diff for this PR rather large and scary since I've copied the existing .po file into several new ones. You can diff them with the es one to make sure I haven't messed it up and then ignore them.

Testing

Locally, you need to either bypass addons-frontend in nginx.conf (by replacing try_files $uri @frontendamo with try_files $uri @olympia) or build a custom addons-frontend image with mozilla/addons-frontend#13401. Otherwise it won't let you use the new locales, because in local development we proxy requests through addons-frontend instead of fully defining routes in nginx.

The es locale should no longer exist, and should be replaced by the new variants.

Migrations

This change contains 3 migrations that will affect hundreds of thousands of rows in dev/stage/prod. I've tested them locally with 700k+ rows and the biggest one took less than 10 seconds.
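For reviewers who want a feel for what this kind of data migration does to the rows, here is a minimal sketch using Python's stdlib sqlite3 rather than a real Django migration. The table name `addons` and the helper `migrate_locale` are hypothetical and only for illustration; the actual migrations in this PR are not reproduced here.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addons (id INTEGER PRIMARY KEY, default_locale TEXT)")
conn.executemany(
    "INSERT INTO addons (default_locale) VALUES (?)",
    [("es",), ("en-US",), ("es",), ("fr",)],
)


def migrate_locale(connection, old="es", new="es-ES"):
    """Rewrite every row that uses the old locale code; return the row count."""
    cursor = connection.execute(
        "UPDATE addons SET default_locale = ? WHERE default_locale = ?",
        (new, old),
    )
    connection.commit()
    return cursor.rowcount


updated = migrate_locale(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM addons WHERE default_locale = 'es'"
).fetchone()[0]
```

A single set-based UPDATE like this, rather than a per-row loop, is the sort of approach that would keep a migration over several hundred thousand rows short.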

src/olympia/addons/migrations/0054_update_default_locale_es_to_es-es.py

src/olympia/addons/tests/test_models.py

src/olympia/amo/tests/test_amo_utils.py

KevinMind · 2025-01-22T16:46:08Z

src/olympia/amo/tests/test_utils.py

@@ -161,7 +161,7 @@ def test_multiple_objects_with_multiple_translations(self):
        assert set(addon2.translations[addon2.name_id]) == (
            {
                ('en-us', 'English 2 Name'),


Oh god how I wish we had enums for locales...

KevinMind · 2025-01-22T16:47:04Z

src/olympia/api/fields.py

@@ -279,6 +279,7 @@ def fetch_single_translation(self, obj, field, requested_language):
        translations = self.fetch_all_translations(obj, field) or {}
        locale = None
        value = None
+        requested_language = to_language(requested_language)


It's not clear why this change is here.. could you add some context?

This was a bug in the way we fetch translations coming from ES (Elasticsearch): it did not convert the requested language like we do when the translations come from the database.

It was revealed by the tests when I changed them from es to es-ES - it would only occur when we're fetching a language with a "regional" variant (xx-XX) that is not the default locale for the add-on.
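To illustrate the edge case, here is a minimal sketch. It assumes the translations dict is keyed by codes such as es-ES and that the requested language can arrive in a different casing. `normalize_language` and `fetch_single` are hypothetical stand-ins, not the real to_language() or fetch_single_translation(), and the normalization is deliberately simplified.

```python
def normalize_language(code):
    """Hypothetical stand-in for a to_language()-style helper.

    Turns 'es_es' or 'es-es' into 'es-ES' and leaves 'fr' as 'fr'.
    """
    code = code.replace("_", "-")
    if "-" not in code:
        return code.lower()
    lang, _, region = code.partition("-")
    return f"{lang.lower()}-{region.upper()}"


def fetch_single(translations, requested_language, default_locale):
    """Return (locale, value) for the requested language, else the default."""
    # Without this normalization, 'es-es' would miss the 'es-ES' key.
    requested_language = normalize_language(requested_language)
    if requested_language in translations:
        return requested_language, translations[requested_language]
    return default_locale, translations.get(default_locale)


translations = {"en-US": "English name", "es-ES": "Nombre en español"}
found = fetch_single(translations, "es-es", "en-US")
```

In this sketch, skipping the normalization would silently fall back to the default locale. That is why the problem would only show up for a regional variant that is not the add-on's default locale.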

Should we have an explicit test for this edge case then, 1) to be damn sure it works and 2) to document the edge case itself?

I dug a bit more into what was happening here:

The broken test was TestESTranslationSerializerField.test_attach_translations_target_name(), which is meant to verify how we attach the translation dict of data coming from Elasticsearch to an object using ESTranslationSerializerField.attach_translations(). The test attaches the translations then verifies that the dummy Addon object has both a correct <field>_translations and a <field> containing the translation in the current locale.

The <field>_translations dict was correct, only that second part was broken. The reason we never noticed is that it's not actively used in our ESAddonSerializer right now: when we perform a search and call the serializer, we call attach_translations() but only really use the first part of what it does, setting <field>_translations. Even when a lang parameter is passed and we return a single translation, we still only use that <field>_translations dict.

Why does that second part exist then? The way our ESAddonSerializer works, it creates fake Addon (and Version, Preview, License, etc.) instances but then shares most of its code with the AddonSerializer that deals with database instances. Setting the translated fields correctly ensures we don't accidentally trigger a database query or return bad data if some piece of code somewhere ends up directly using the field.
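As a rough illustration of those two parts, here is a minimal sketch. The function below is a hypothetical simplification, not the real ESTranslationSerializerField.attach_translations(), and it assumes the Elasticsearch data already exposes a `<field>_translations` dict keyed by locale.

```python
from types import SimpleNamespace


def attach_translations(obj, data, field, current_language, default_locale):
    """Hypothetical sketch: set both <field>_translations and <field> on obj."""
    translations = data.get(f"{field}_translations", {})
    # First part: the full dict, which is what the serializer output relies on.
    setattr(obj, f"{field}_translations", translations)
    # Second part: a plain value in the current language (falling back to the
    # default locale), so code reading obj.<field> does not hit the database.
    value = translations.get(current_language, translations.get(default_locale))
    setattr(obj, field, value)
    return obj


data = {"name_translations": {"en-US": "Name", "es-ES": "Nombre"}}
addon = attach_translations(SimpleNamespace(), data, "name", "es-ES", "en-US")
```

If the dict keys do not match the format of the current language, only the second part goes wrong, which matches why the issue went unnoticed while only `<field>_translations` was in use.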

Ultimately the test was previously incorrect as it was setting up bad data - the translations dict keys were incorrect. I'll add some comments and make the test more explicit.

49ce11d adds even more tests/docs.

src/olympia/translations/management/commands/process_translations.py

KevinMind · 2025-01-22T16:51:40Z

locale/es_AR/LC_MESSAGES/django.po

+#
+msgid ""
+msgstr ""
+"Project-Id-Version: messages\n"


Is the rationale that it is easier for translators to adapt the original Spanish translation into variants than to start from scratch?

Also.. it's weird.. did we support the different Spanish variants already, or is this introducing them? On the one hand, you're adding the locale files, so they weren't there. But on the other, there was already a reference to those locales. Can you explain that?

There were no references to the variants before, I'm adding them. Translator teams have expressed different preferences regarding what to do with the files, but the conclusion we reached when talking to @mathjazz (Pontoon lead) was to "seed" the new locales by copying the old file over, and they'd sort them out in Pontoon later.

Got it. Probably worth dumping that in the "context" on the PR for posterity.

FYI there was a reference to the language, but I guess not enough to trigger creating a translation file for that locale.

KevinMind · 2025-01-24T09:17:01Z

@diox how did you create 700k+ rows? Even better, you can zip your database and send it to me by running

make data_dump ARGS="--name pr-22982"

Then zip the directory and send it to me. I can then load that directly on my instance and boom!

diox · 2025-01-24T11:48:11Z

I'll generate the dump, but for the record, I essentially did:

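from olympia.translations.models import Translation

# Build 700k throwaway 'es' translations in memory, then insert them in batches.
translations = []
for _ in range(700_000):
    t = Translation.new('Lorem ipsum dolor sit amet, erat graece accusata eum te', 'es')
    translations.append(t)
Translation.objects.bulk_create(translations, batch_size=1000)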

(Translation.new() is a custom hack in addons-server, used behind the scenes when creating translations "the normal way", that maintains the separate id sequence table. I've tried to reproduce that behavior here while still being efficient with a bulk_create() afterwards, but I haven't checked that it works as expected in every aspect; I was only interested in creating garbage data.)

diox · 2025-01-24T16:00:54Z

Updated, and on top of providing the snippet I used to generate extra data locally I've shared my database dump privately on slack.

diox force-pushed the create-es-locales-folders branch from b0db161 to 60ef7a2 January 10, 2025 14:13

diox marked this pull request as draft January 10, 2025 16:50

diox mentioned this pull request Jan 20, 2025

Split es locale into several variations mozilla/addons-frontend#13401

Open

diox changed the title ~~Add new locales folders for the various es variants~~ Split es locale into several variations and migrate existing user translations to es-ES Jan 20, 2025

diox marked this pull request as ready for review January 21, 2025 12:13

diox requested review from a team and KevinMind and removed request for a team January 21, 2025 12:41

diox added 9 commits January 21, 2025 13:42

Add new locales folders for the various es variants

9a22c64

Update

fd1cdc3

Directly enable the new variants

d9c238a

Test updates / cleanups

6ec3adc

More test updates

681b7e4

More test updates

6a6fd58

Reformat

27d23a1

Better fix

f5ab217

Migrations

cf0de0a

diox force-pushed the create-es-locales-folders branch from 35a6260 to cf0de0a January 21, 2025 12:42