You run python manage.py makemessages -l es -l de, open locale/de/LC_MESSAGES/django.po, and get the usual wall of this:
msgid "Save"
msgstr ""
msgid "Welcome back, %(name)s"
msgstr ""
msgid "<strong>Upgrade</strong> your plan"
msgstr ""
That's the point where many teams fall into one of three bad workflows. Copy and paste strings into a web translator. Hand the file to a freelancer and wait. Or export the whole thing into a TMS, click through another UI, and hope the .po file that comes back still compiles.
For a Django developer, the question "what is machine translation?" isn't an abstract AI topic. The answer matters because of a very specific problem: you need valid msgstr entries, you need them fast, and you need placeholders, HTML, and plural forms to survive the trip.
The Problem You Already Know
The pain isn't “localization is hard” in the abstract. It's that your release is ready except for strings.
A feature branch adds a few dozen labels, validation errors, and email lines. makemessages updates your catalogs. Now you have empty translations scattered across:
locale/es/LC_MESSAGES/django.po
locale/de/LC_MESSAGES/django.po
locale/fr/LC_MESSAGES/django.po
You can fill them by hand, but that's where breakage starts. A translator drops %(name)s. HTML tags get reordered. A short UI label like Save gets translated with the wrong sense because there's no context. If you've ever had to fix invalid .po output right before deploy, you already know that translation quality isn't only about wording. It's also about file integrity.
If you want a broad, non-Django primer first, this machine translation explained article covers the general concept well. For software teams, the harder part is accuracy under real product constraints, and that's where this review of the accuracy of Google Translate is worth reading with a more skeptical eye.
Practical rule: In app localization, a translation that reads well but breaks placeholders is worse than a stiff translation that compiles.
A Brief History of Translation Machines
Machine translation didn't start with today's polished API responses. It started with a public demo in January 1954, when the Georgetown-IBM experiment translated 60 Russian sentences into English using 6 grammar rules and a 250-word vocabulary, a milestone widely treated as the symbolic start of modern MT (history overview).
That early optimism didn't last. The 1966 ALPAC report concluded that machine translation was slower, less accurate, and twice as expensive as human translation, which cut U.S. funding for years. Later, statistical systems rose in the 1980s and 1990s, and by 2016 Google had adopted neural machine translation, marking the current era of larger data and neural networks (historical sequence).
Rules first, then probability, then neural models
The core transition is clear. Rules-based MT encoded grammar and lexicons by hand. Statistical MT learned phrase correspondences from parallel text. Neural MT uses deep networks to model sequencing and context, which generally leads to more fluent output and better handling of idioms and long-range dependencies (RWS overview).
For a developer, each stage maps to a familiar failure mode:
- Rules-based systems followed instructions well, but sounded rigid.
- Statistical systems improved phrase choice, but often lost sentence-level coherence.
- Neural systems read better, but they can still hallucinate nuance, mishandle terminology, or drift when context is weak.
Machine translation approaches compared
| Approach | Core Idea | Strength | Weakness |
|---|---|---|---|
| Rules-based MT | Hand-written grammar rules and dictionaries | Predictable structure | Brittle, expensive to maintain, poor fluency |
| Statistical MT | Learns likely phrase mappings from bilingual text | Better than pure word substitution | Weak long-range context, awkward phrasing |
| Neural MT | Deep networks model context and token sequences | More fluent output, better idioms and context | Still depends heavily on training data and terminology control |
The practical takeaway is boring but true. Newer systems are better, not magical. They got better because the method changed, the training data got larger, and compute got cheaper.
Fluency improved faster than trustworthiness. That's why review still matters.
How Modern Translation APIs Actually Work
From your side, it looks like a POST request with a source string and a target language. Under the hood, modern systems behave more like probability engines than dictionaries.

IBM's framing is the one that matters in production: machine translation is a stochastic NLP mapping problem. The model learns a probabilistic function from source sentences to target sentences, not a deterministic word swap. That's why the same source can produce different outputs depending on context and sampling, and why high-stakes content still needs post-editing (IBM on machine translation).
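To make the "probabilistic, not deterministic" point concrete, here is a toy sketch. The candidate words and their probabilities are invented for illustration and do not come from any real engine. It only shows why picking the most likely option is repeatable while sampling is not.

```python
import random

# Invented distribution over German renderings of a bare "Save".
# Real engines score whole token sequences; this is only a toy.
CANDIDATES = {"Speichern": 0.6, "Sichern": 0.25, "Retten": 0.15}


def greedy(dist):
    """Always return the most probable candidate (deterministic)."""
    return max(dist, key=dist.get)


def sample(dist, rng):
    """Draw one candidate in proportion to its probability."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]


rng = random.Random()
draws = {sample(CANDIDATES, rng) for _ in range(200)}

print(greedy(CANDIDATES))  # the same answer on every run
print(sorted(draws))       # usually more than one distinct rendering
```

If a provider exposes a temperature or sampling setting, keeping it low for UI strings is the closest equivalent to the greedy path here.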
What happens after you send the string
A typical translation API pipeline looks like this:
- Your app sends text plus source and target language hints.
- The provider tokenizes the input into smaller units.
- The model builds an internal representation from those tokens.
- The decoder predicts the target sequence token by token.
- Post-processing reassembles formatting, spacing, and output text.
- You get back a string that looks final, even when it isn't.
That model behavior explains a lot of day-to-day bugs. Send Save changes to your billing profile, and the engine has enough context to choose a decent translation. Send only Save, and you've handed it an ambiguity problem.
Dedicated translation engines versus general LLMs
For Django workflows, you usually end up choosing between two categories.
| Option | Best at | Trade-off |
|---|---|---|
| Dedicated translation API | High-volume translation, stable latency, language-pair tuning | Less flexible prompt control |
| General LLM | Custom instructions, glossary-like prompting, mixed translation plus rewriting | More variable output, more care needed around formatting |
If you work on product copy outside app strings, the same prompt-control trade-offs show up in adjacent tasks. This guide to AI copywriting for retail is useful because it highlights where generation and strict consistency pull in opposite directions. For app localization, consistency wins more often than creativity.
A lot of teams also blur “LLM translation” and “neural machine translation” into one bucket. That hides real workflow differences, so it helps to separate them. This neural machine translation overview is a good reference if you want the boundary between those terms explained more clearly.
Engineering heuristic: The shorter the source string, the more metadata you need around it.
Where Machine Translation Fails in a Django App
A 200 OK from the provider tells you the network path worked. It tells you nothing about whether the translation is usable.
Current industry guidance is pretty consistent on one point. Neural MT is the standard now, but performance still varies sharply by training data and language pair, and low-resource or specialized domains remain harder. AWS also notes that gains are uneven, especially when content is culturally nuanced or terminology-heavy, which maps directly to software strings with placeholders, HTML, and glossary terms (AWS on machine translation).
Placeholder damage
This is the first thing I check in a review diff. If placeholders move or disappear, your app breaks or renders junk.
Bad output often looks like this:
msgid "Welcome back, %(name)s"
msgstr "Willkommen zurück"
msgid "You have %s unread messages"
msgstr "Sie haben ungelesene Nachrichten"
msgid "File {0} was uploaded"
msgstr "Datei wurde hochgeladen"
Safer output preserves the formatting tokens exactly:
msgid "Welcome back, %(name)s"
msgstr "Willkommen zurück, %(name)s"
msgid "You have %s unread messages"
msgstr "Sie haben %s ungelesene Nachrichten"
msgid "File {0} was uploaded"
msgstr "Datei {0} wurde hochgeladen"
Django won't forgive sloppy format handling. If the source uses named interpolation, the translation needs the same named interpolation.
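The comparison between the bad and safe output above can be automated. This is a minimal sketch using only the standard library. The regex covers the three placeholder styles shown in this article and is an assumption about what your catalogs contain; it is not a full implementation of Python's format mini-languages. The function names are hypothetical.

```python
import re
from collections import Counter

PLACEHOLDER = re.compile(
    r"%%"                                                  # literal percent, skipped below
    r"|%\([^)]+\)[-#0+]*\d*(?:\.\d+)?[diouxXeEfFgGcrsa]"   # %(name)s
    r"|%[-#0+]*\d*(?:\.\d+)?[diouxXeEfFgGcrsa]"            # %s, %d, %05.2f
    r"|\{[^{}]*\}"                                         # {0}, {name}
)


def placeholders(text):
    """Count each placeholder token found in text, ignoring literal %%."""
    return Counter(
        m.group(0) for m in PLACEHOLDER.finditer(text) if m.group(0) != "%%"
    )


def placeholder_errors(msgid, msgstr):
    """Return human-readable problems; an empty list means the tokens match."""
    if not msgstr:
        return []  # untranslated entries are not a placeholder problem
    src, dst = placeholders(msgid), placeholders(msgstr)
    missing = sorted((src - dst).elements())
    extra = sorted((dst - src).elements())
    return [f"missing {t}" for t in missing] + [f"unexpected {t}" for t in extra]


print(placeholder_errors("Welcome back, %(name)s", "Willkommen zurück"))
print(placeholder_errors("Welcome back, %(name)s", "Willkommen zurück, %(name)s"))
```

The check compares token counts rather than positions, because a correct translation often moves a placeholder to a different place in the sentence.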
HTML and inline markup breakage
The next failure class is markup corruption.
msgid "<strong>Upgrade</strong> your plan"
msgstr "<strong>Aktualisieren Sie</strong> Ihren Plan"
That one is valid. This one is not:
msgid "<strong>Upgrade</strong> your plan"
msgstr "<b>Aktualisieren Sie Ihren Plan</strong>"
One changed tag and one mismatched close tag is enough to turn a harmless copy update into a rendering bug.
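A similar check works for markup. The sketch below uses Python's built-in html.parser to collect tags from the source and the translation, then flags translations whose tags are unbalanced or differ from the source. The helper names and the short list of void elements are assumptions made for this example.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial list, for the sketch).
VOID = {"br", "hr", "img", "input", "wbr"}


class TagCollector(HTMLParser):
    """Record every tag in document order as (kind, name) pairs."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("void" if tag in VOID else "open", tag))

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.tags.append(("close", tag))


def tag_sequence(text):
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


def is_balanced(tags):
    """True when every closing tag matches the most recent open tag."""
    stack = []
    for kind, tag in tags:
        if kind == "open":
            stack.append(tag)
        elif kind == "close" and (not stack or stack.pop() != tag):
            return False
    return not stack


def markup_errors(msgid, msgstr):
    src, dst = tag_sequence(msgid), tag_sequence(msgstr)
    errors = []
    if not is_balanced(dst):
        errors.append("unbalanced tags in translation")
    if sorted(src) != sorted(dst):
        errors.append("tag set differs from source")
    return errors


source = "<strong>Upgrade</strong> your plan"
print(markup_errors(source, "<strong>Aktualisieren Sie</strong> Ihren Plan"))
print(markup_errors(source, "<b>Aktualisieren Sie Ihren Plan</strong>"))
```

The tag sets are compared after sorting, so a translation that legitimately moves a tag pair within the sentence still passes. Attributes are ignored here; a stricter version would compare those too.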
Context-free UI strings
Short strings are where confidence gets dangerous.
Consider this catalog:
msgctxt "button label"
msgid "Save"
msgstr ""
msgctxt "rescue action"
msgid "Save"
msgstr ""
If you don't use context with pgettext, both entries collapse into the same English source and the model has to guess. The same problem shows up with Open, Close, Apply, Archive, Charge, and Post.
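If you send strings to a general LLM, the msgctxt is worth passing along instead of discarding. The following is a sketch of one way to do that. The SourceString type, the build_prompt function, and the prompt wording are all hypothetical; they are not part of Django or of any provider's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceString:
    msgid: str
    msgctxt: Optional[str] = None    # from pgettext / the msgctxt line
    reference: Optional[str] = None  # from the "#:" source comment


def build_prompt(item: SourceString, target_lang: str) -> str:
    """Assemble a translation instruction that carries the available metadata."""
    lines = [f"Translate this UI string into {target_lang}."]
    if item.msgctxt:
        lines.append(f"Context: {item.msgctxt}")
    if item.reference:
        lines.append(f"Used in: {item.reference}")
    lines.append("Keep placeholders and HTML tags unchanged.")
    lines.append(f"Text: {item.msgid}")
    return "\n".join(lines)


button = SourceString("Save", msgctxt="button label")
rescue = SourceString("Save", msgctxt="rescue action")

print(build_prompt(button, "German"))
print()
print(build_prompt(rescue, "German"))
```

The two prompts differ only in the context line, which is exactly the information the model lacks when it receives a bare Save.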
Some other common traps:
- Plural rules: English source structure doesn't map cleanly to every target language.
- Gender agreement: Romance languages often need surrounding context the source string doesn't provide.
- Terminology drift: The model translates your product term one way in settings, another way in emails.
- Mixed content: Strings with variables, links, and punctuation invite reordering errors.
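The plural point is easier to see in code. The sketch below writes out two gettext plural selectors by hand: the common two-form English rule and the classic three-form rule for Polish. These are illustrative; the Plural-Forms header in your own catalog is the authority, and it may differ from what is shown here.

```python
def plural_index_en(n: int) -> int:
    """English: nplurals=2; plural=(n != 1)."""
    return int(n != 1)


def plural_index_pl(n: int) -> int:
    """Polish, classic three-form rule:
    n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
    """
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


def plural_forms_complete(nplurals: int, msgstr_forms: list) -> bool:
    """True when the entry has exactly nplurals non-empty msgstr[i] forms."""
    return len(msgstr_forms) == nplurals and all(msgstr_forms)


for n in (1, 2, 5, 12, 22):
    print(n, plural_index_en(n), plural_index_pl(n))
```

A translation pass that fills only msgstr[0] and msgstr[1] because the English source has two forms leaves a three-form language incomplete, which plural_forms_complete would catch.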
Don't judge MT quality on long paragraphs alone. Judge it on ugly little UI strings with no context, because that's where apps fail.
A Practical Workflow for Translating Your .po Files
You don't need a novel process here. You need a repeatable one that stays inside the repo.
Start with the Django tools you already use
Keep extraction boring:
python manage.py makemessages -l es -l de
That updates the catalog files under the usual layout:
locale/es/LC_MESSAGES/django.po
locale/de/LC_MESSAGES/django.po
Use gettext_lazy in code, use pgettext when a short string is ambiguous, and keep source strings stable unless you want to invalidate existing translations. Renaming English copy casually creates work for every locale.
Translate only what needs translation
The useful pattern is not “retranslate the whole app every time.” It's:
- find untranslated entries
- optionally revisit fuzzy entries
- preserve placeholders and tags
- write back into the same .po files
- review the diff in Git
That keeps translation under version control and makes bad output visible.
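One common way to approach the "preserve placeholders and tags" step is to mask those tokens before the string leaves your machine and restore them afterwards. The sketch below assumes a sentinel format of `__0__`, `__1__`, and so on, which is an arbitrary choice for this example. Whether a given engine leaves such sentinels untouched is not guaranteed, which is why the restore step validates them.

```python
import re

# Placeholder styles used in this article, plus simple HTML tags.
TOKEN = re.compile(
    r"%\([^)]+\)[a-zA-Z]"     # %(name)s
    r"|%[sdif]"               # %s, %d
    r"|\{[^{}]*\}"            # {0}
    r"|</?[a-zA-Z][^>]*>"     # <strong>, </strong>
)
SENTINEL = re.compile(r"__(\d+)__")


def mask(text):
    """Replace each token with a numbered sentinel; return (masked, tokens)."""
    tokens = []

    def stash(match):
        tokens.append(match.group(0))
        return f"__{len(tokens) - 1}__"

    return TOKEN.sub(stash, text), tokens


def unmask(text, tokens):
    """Put the original tokens back, refusing output that lost or invented one."""
    found = {int(i) for i in SENTINEL.findall(text)}
    if found != set(range(len(tokens))):
        raise ValueError("translation lost or invented a masked token")
    return SENTINEL.sub(lambda m: tokens[int(m.group(1))], text)


masked, tokens = mask("Welcome back, %(name)s")
print(masked)  # this is what would be sent for translation
print(unmask("Willkommen zurück, __0__", tokens))
```

The trade-off is that masking hides what the token means, so the engine cannot use it for agreement or word order. Treat it as a safety net, not a substitute for reviewing the diff.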
A realistic .po fragment before translation might look like this:
#: billing/templates/billing/portal.html:18
msgid "Manage your subscription"
msgstr ""
#: accounts/forms.py:42
#, python-format
msgid "Welcome back, %(name)s"
msgstr ""
#: templates/dashboard/empty.html:9
msgid "<strong>No projects</strong> yet"
msgstr ""
After a first pass, you want valid output in place, not a detached export from another system.
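To show what "find untranslated entries" means for a fragment like the one above, here is a deliberately small parser built on the standard library. It assumes entries are separated by blank lines, as makemessages writes them, and it skips plural entries entirely. For real projects a maintained .po library is the safer choice; the names below are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set

FIELD = re.compile(r'^(msgctxt|msgid|msgstr)\s+"(.*)"$')
CONTINUATION = re.compile(r'^"(.*)"$')


@dataclass
class Entry:
    msgid: str = ""
    msgstr: str = ""
    msgctxt: Optional[str] = None
    flags: Set[str] = field(default_factory=set)


def parse_po(text: str) -> List[Entry]:
    """Parse singular entries from .po text separated by blank lines."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        if re.search(r"^msgid_plural\s", block, re.MULTILINE):
            continue  # plural entries are out of scope for this sketch
        entry, last = Entry(), None
        for raw in block.splitlines():
            line = raw.strip()
            if line.startswith("#,"):
                entry.flags.update(flag.strip() for flag in line[2:].split(","))
            elif line.startswith("#"):
                continue  # references, translator comments, obsolete entries
            elif match := FIELD.match(line):
                last = match.group(1)
                setattr(entry, last, match.group(2))
            elif (match := CONTINUATION.match(line)) and last:
                # Multi-line strings continue the most recent field.
                setattr(entry, last, getattr(entry, last) + match.group(1))
        if last == "msgstr":
            entries.append(entry)
    return entries


def needs_translation(entries, include_fuzzy=False):
    """Entries with an empty msgstr, plus fuzzy ones when requested."""
    return [
        e for e in entries
        if e.msgid and (not e.msgstr or (include_fuzzy and "fuzzy" in e.flags))
    ]


SAMPLE = '''
#: billing/templates/billing/portal.html:18
msgid "Manage your subscription"
msgstr ""

#: accounts/forms.py:42
#, python-format
msgid "Welcome back, %(name)s"
msgstr ""

#: templates/dashboard/empty.html:9
msgid "<strong>No projects</strong> yet"
msgstr ""
'''

for entry in needs_translation(parse_po(SAMPLE)):
    print(entry.msgid, sorted(entry.flags))
```

The header entry, which has an empty msgid, is filtered out by the `e.msgid` test, so it is never sent for translation.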
Review like code, not like content
A translation diff deserves the same discipline as any generated change set.
- Check placeholders first: %(name)s, %s, and {0} must survive unchanged.
- Scan contexts: if you use msgctxt, make sure the output matches the intended meaning.
- Watch fuzzy entries: they're useful hints, not approvals.
- Compile before merge: catch file-level issues before they hit production.
Then finish the normal Django cycle:
python manage.py compilemessages
If your team wants a command-line-first flow for this, that's where TranslateBot fits. It's built for Django projects that want to translate .po files without leaving the repo, while preserving placeholders and HTML and keeping review in Git.
Review rule: Never approve translation output from the rendered language alone. Also inspect the raw .po diff.
Automating Translation in Your CI Pipeline
If you only run translation manually, it drifts. New strings pile up, release pressure rises, and localization turns into a batch job nobody wants to own.
The better pattern is to make first-pass translation part of the same pipeline that builds your app.

What to automate
A useful CI workflow usually does four things:
- run makemessages
- translate only new or changed entries
- run checks, including compilemessages
- commit or attach the updated locale diff for review
You don't need full autonomy on day one. A pull request that contains generated translations is already a major improvement over ad hoc copy-paste.
Here's the shape of a GitHub Actions workflow:
name: Translate locale files
on:
  workflow_dispatch:
  pull_request:
    branches: [main]
jobs:
  translate:
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install gettext
        run: sudo apt-get update && sudo apt-get install -y gettext
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Extract messages
        run: python manage.py makemessages -l de -l es
      - name: Translate messages
        run: python manage.py translate --target-lang de
      - name: Compile messages
        run: python manage.py compilemessages
Privacy and consistency
Automation changes the risk profile, so be explicit about what you send to third-party providers.
- Source sensitivity: UI strings may contain internal product names or unreleased features.
- Provider variance: different engines won't produce identical terminology.
- Glossary control: without shared instructions, CI can bake inconsistency into every pull request.
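Glossary control can start as a plain check in CI. The sketch below flags translations where a source term appears in the msgid but the agreed target term is absent from the msgstr. The glossary contents and function name are invented for illustration, and the substring match is naive: it tolerates suffixes but not inflections that change the stem.

```python
import re

# Hypothetical glossary: English product term -> required German rendering.
GLOSSARY_DE = {"workspace": "Arbeitsbereich", "plan": "Tarif"}


def glossary_violations(msgid, msgstr, glossary):
    """Return (source_term, expected_target) pairs the translation misses."""
    if not msgstr:
        return []  # nothing to check on untranslated entries
    violations = []
    for source_term, target_term in glossary.items():
        in_source = re.search(
            rf"\b{re.escape(source_term)}\b", msgid, re.IGNORECASE
        )
        if in_source and target_term.lower() not in msgstr.lower():
            violations.append((source_term, target_term))
    return violations


print(glossary_violations(
    "<strong>Upgrade</strong> your plan",
    "<strong>Aktualisieren Sie</strong> Ihren Plan",
    GLOSSARY_DE,
))
```

Run against every changed entry in a pull request, a check like this turns terminology drift into a visible failure instead of something a reviewer has to spot by eye.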
If you want a concrete implementation path, the CI usage docs for TranslateBot show how to wire the translation step into an automated pipeline cleanly.
The point isn't to remove review. It's to move review to the right place. Engineers review diffs. Product or localization reviewers spot semantic issues. The pipeline handles the repetitive part every single time.
If you want a Django-native way to translate .po files from the command line, preserve placeholders and HTML, and keep everything reviewable in Git, TranslateBot is worth a look. It fits the makemessages to compilemessages workflow you already have, instead of pushing your team into another portal.