-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split es
locale into several variations and migrate existing user translations to es-ES
#22982
base: master
Are you sure you want to change the base?
Conversation
b0db161
to
60ef7a2
Compare
es
locale into several variations and migrate existing user translations to es-ES
35a6260
to
cf0de0a
Compare
@@ -161,7 +161,7 @@ def test_multiple_objects_with_multiple_translations(self): | |||
assert set(addon2.translations[addon2.name_id]) == ( | |||
{ | |||
('en-us', 'English 2 Name'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh god how I wish we had enums for locales...
@@ -279,6 +279,7 @@ def fetch_single_translation(self, obj, field, requested_language): | |||
translations = self.fetch_all_translations(obj, field) or {} | |||
locale = None | |||
value = None | |||
requested_language = to_language(requested_language) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear why this change is here.. could you add some context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a bug in the way we fetch translations coming from ES, it did not convert the requested language like we do when the translations are coming from the database.
It was revealed by the tests when I changed them from es
to es-ES
- it would only occur when we're fetching a language with a "regional" variant (xx-XX) that is not the default locale for the add-on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have an explicit test for this edge case then 1) to be damn sure it works and 2) to document the edge case itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dug a bit more into what was happening here:
The broken test was TestESTranslationSerializerField.test_attach_translations_target_name()
, which is meant to verify how we attach the translation dict of data coming from Elasticsearch to an object using ESTranslationSerializerField.attach_translations()
. The test attaches the translations then verifies that the dummy Addon object has both a correct <field>_translations
and a <field>
containing the
translation in the current locale.
The <field>_translations
dict was correct, only that second part was broken. The reason we never noticed is that it's not actively used in our ESAddonSerializer
right now: when we perform a search and call the serializer, we call attach_translations()
but only really use the first part of what it does, setting <field>_translations
. Even when a lang
parameter is passed and we return a single translation, we still only use that <field>_translations
dict.
Why does that second part exist then ? The way our ESAddonSerializer
works, it creates fake Addon
(and Version
, Preview
, License
etc instances) but then shares most of its code with the AddonSerializer
that deals with database instances. Setting the translated fields correctly ensures we don't accidentally trigger a database query or return bad data if some piece of code somewhere ends up directly using the field.
Ultimately the test was previously incorrect as it was setting up bad data - the translations dict keys were incorrect. I'll add some comments and make the test more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
49ce11d adds even more tests/docs.
src/olympia/translations/management/commands/process_translations.py
Outdated
Show resolved
Hide resolved
# | ||
msgid "" | ||
msgstr "" | ||
"Project-Id-Version: messages\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the rationale that it is easier for translators to modify the original spanish translation to variants than to start from scratch?
Also.. it's weird.. it's like did we support the different spanish variants already or is this introducing them. On the one hand, you're adding the locale files ... so they weren't there.. but onb the other, there was already reference to those locales ... can you explain that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were no references to the variants before, I'm adding them. Translator teams have expressed different preferences regarding what to do with the files, but the conclusion we reached when talking to @mathjazz (Pontoon lead) was to "seed" the new locales by copying the old file over and they'd sort them out in pontoon later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Probably worth dumping that in the "context" on the PR for posterity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Probably worth dumping that in the "context" on the PR for posterity.
FYI there was a reference to the language but I guess not sufficient enough to trigger creating a translation file for that locale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@diox how did you create 700k+ rows? Even better, you can zip your database and send it to me by running make data_dump ARGS="--name pr-22982" Then zip the directory and send it to me. I can then load that directly on my instance and boom! |
I'll generate the dump, but for the record, I essentially did: translations = []
for x in range(0, 700000):
t = Translation.new('Lorem ipsum dolor sit amet, erat graece accusata eum te', 'es')
translations.append(t)
Translation.objects.bulk_create(translations, batch_size=1000) ( |
Updated, and on top of providing the snippet I used to generate extra data locally I've shared my database dump privately on slack. |
See mozilla/addons#15232
Context
As part of streamlining our locales we're splitting
es
into several variations. The Pontoon team have asked us to copy over the existing locale to the new ones and they'll sort them (either as actual translations or suggestions) in Pontoon. That makes the diff for this PR rather large and scary since I've copied the existing.po
file into several new ones. You can diff them with thees
one to make sure I haven't messed it up and then ignore them.Testing
Locally, you need to either bypass addons-frontend in
nginx.conf
(by replacing thetry_files $uri @frontendamo
bytry_files $uri @olympia
) or build a custom addons-frontend image with mozilla/addons-frontend#13401 otherwise it won't let you use the new locales, as in local development we're proxying requests through addons-frontend instead of fully defining routes in nginx.The
es
locale should no exist anymore, and be replaced by the new variants.Migrations
This change contains 3 migrations that will affect hundreds of thousands of rows in dev/stage/prod. I've tested them locally with 700k+ rows and the biggest one took less than 10 seconds.