Friday release. compilemessages is green, the French catalog is loaded, and the first production request blows up on %(count)d. A translator changed the token, the code still expects it, and now only one locale crashes. Another team ships a missing </strong> in a marketing string and spends the afternoon tracing a layout bug that never appeared in English.
That failure mode is common because gettext validation is narrow. It checks whether catalogs compile. It does not prove your translated app still renders, formats variables correctly, preserves markup, or covers the strings your code calls. Translation tests close that gap by treating locale files as executable inputs to the app, not as static content.
Human review still matters for tone, intent, and terminology. CI should catch the breakage that machines are good at catching. Placeholder mismatches, dropped HTML, bad plural branches, and untranslated strings belong in the same pipeline that runs your unit and integration tests. If you already care about localization in your testing workflow, the practical step is to make these checks copy-pasteable and wire them into one CI gate instead of scattering them across ad hoc scripts.
The examples below do that. Each check is small enough to adopt on its own, but the main benefit comes from running them together, on every change, before a .po file reaches production.
1. Unit test for placeholder integrity
Broken placeholders are the fastest way to ship a working .po file that still crashes at runtime.
Django projects usually mix formatting styles over time. You’ll see old %s, named mappings like %(name)s, and sometimes brace-style placeholders in app strings or model content. Your test should treat them as syntax, not text.

What to assert
Read every .po file, detect placeholder tokens in msgid and msgstr, and fail if they don’t match. I prefer token matching over executing every format string, because execution gets messy across mixed styles.
from pathlib import Path
import re

import polib
from django.test import SimpleTestCase

PERCENT_NAMED_RE = re.compile(r"%\(([^)]+)\)[#0 +\-]?\d*(?:\.\d+)?[diouxXeEfFgGcrs]")
PERCENT_POSITIONAL_RE = re.compile(r"(?<!%)%(?:[#0 +\-]?\d*(?:\.\d+)?)?[diouxXeEfFgGcrs]")
BRACE_RE = re.compile(r"\{([a-zA-Z0-9_]+)\}")


def extract_tokens(text: str) -> dict:
    text = text or ""
    return {
        # Named and brace tokens may repeat freely, so compare them as sets.
        "percent_named": set(PERCENT_NAMED_RE.findall(text)),
        # Positional tokens are counted: "%s %s" translated as "%s" must still fail.
        "percent_positional": sorted(PERCENT_POSITIONAL_RE.findall(text)),
        "brace": set(BRACE_RE.findall(text)),
    }


class TranslationPlaceholderTests(SimpleTestCase):
    def test_placeholders_match_msgid(self):
        for po_path in Path("locale").glob("*/LC_MESSAGES/django.po"):
            po = polib.pofile(po_path)
            for entry in po:
                # Skip obsolete and untranslated entries.
                if entry.obsolete or not entry.msgstr:
                    continue
                src = extract_tokens(entry.msgid)
                dst = extract_tokens(entry.msgstr)
                self.assertEqual(
                    src, dst,
                    msg=f"{po_path}: placeholder mismatch for msgid {entry.msgid!r}",
                )
That catches the common production break: translator drops %(name)s or rewrites %s as plain text.
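Brace-style strings can carry more than bare names, for example {user.name} or {count:,}, and the simple BRACE_RE pattern skips those. A minimal sketch, assuming your brace strings are rendered with str.format: the standard library's string.Formatter().parse yields the field names directly. The helper name brace_fields is hypothetical.

```python
from string import Formatter


def brace_fields(text: str) -> list[str]:
    """Return the sorted str.format field names found in text."""
    fields = []
    # parse() yields (literal_text, field_name, format_spec, conversion) tuples;
    # field_name is None for plain trailing text and "" for a bare "{}".
    for _literal, field_name, _spec, _conversion in Formatter().parse(text or ""):
        if field_name is not None:
            fields.append(field_name)
    return sorted(fields)


source = "Hello {user.name}, you have {count:,} new messages"
translated = "Bonjour {user.name}, vous avez {total:,} nouveaux messages"

# The translator renamed {count} to {total}, so the field lists differ.
print(brace_fields(source) == brace_fields(translated))
```

Formatter().parse raises ValueError on unbalanced braces, which is itself a useful failure signal for a translated string.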
Practical rule: if placeholders differ, fail the build. Don’t try to auto-fix them in CI.
If you want a deeper walkthrough on app-level localization checks, TranslateBot’s post on localization in testing is aligned with the same idea: test the things that break code, not just wording.
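One gap worth knowing about: the test above skips entries whose msgstr is empty, and plural entries keep their translations in msgstr_plural, so plural strings are never compared. A minimal sketch of a companion check, assuming named %(name)s placeholders; the helper unknown_names is hypothetical. It flags names in a plural form that neither source string defines, since those are the ones that raise KeyError at render time.

```python
import re

NAMED_RE = re.compile(r"%\(([^)]+)\)")


def unknown_names(msgid: str, msgid_plural: str, forms: dict[int, str]) -> dict[int, set[str]]:
    """Map plural index -> placeholder names that the source strings never define."""
    allowed = set(NAMED_RE.findall(msgid)) | set(NAMED_RE.findall(msgid_plural))
    problems = {}
    for index, text in sorted(forms.items()):
        extra = set(NAMED_RE.findall(text)) - allowed
        if extra:
            problems[index] = extra
    return problems


forms = {
    0: "%(count)s fichier supprimé",
    1: "%(nombre)s fichiers supprimés",  # translator renamed the token
}
print(unknown_names("%(count)s file deleted", "%(count)s files deleted", forms))
```

With polib, entry.msgstr_plural can be passed as the forms argument. A form that omits a name is deliberately not flagged here, because some languages legitimately drop the number in one of their plural forms.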
2. Check for untranslated strings
Partial localization is worse than teams admit. One missed string in a billing flow makes the app feel unfinished fast.
There are cases where empty msgstr is fine during development. In a release branch, it usually isn’t. If your policy is “production locales must be complete,” encode that policy as a test.
A release gate that’s actually useful
from pathlib import Path

import polib
from django.test import SimpleTestCase

REQUIRED_LOCALES = {"fr", "de", "es"}


class UntranslatedStringTests(SimpleTestCase):
    def test_required_locales_have_no_empty_msgstr(self):
        for locale in REQUIRED_LOCALES:
            po_path = Path("locale") / locale / "LC_MESSAGES" / "django.po"
            po = polib.pofile(po_path)
            missing = []
            for entry in po:
                if entry.obsolete:
                    continue
                if entry.msgid_plural:
                    forms = [s for _, s in sorted(entry.msgstr_plural.items())]
                    if any(not s.strip() for s in forms):
                        missing.append(entry.msgid)
                elif not entry.msgstr.strip():
                    missing.append(entry.msgid)
            self.assertFalse(
                missing,
                msg=f"{po_path} has untranslated entries: {missing[:10]!r}"
            )
This doesn’t judge translation quality. It enforces completeness.
That distinction matters. Existing translation test literature leans hard toward human evaluation and gives very little guidance for automated CI quality gates around locale-file integrity, placeholder safety, and regression checks in developer workflows, as discussed in Altalang’s overview of translation testing gaps.
Use that gap to your advantage. Define a policy your team can live with.
- Production locales only: Fail for fr and de, ignore experimental locales.
- Release branches only: Allow partial translations on feature branches.
- Review fuzzy entries manually: Django fuzzy flags need a team rule, not guesswork.
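The fuzzy rule deserves a concrete check, because msgfmt skips entries flagged fuzzy unless told otherwise. A fuzzy entry therefore falls back to the source string at runtime even though its msgstr is filled in, and the empty-msgstr test never sees it. A minimal stdlib-only sketch that lists fuzzy msgids from raw .po text; the function name fuzzy_msgids is hypothetical, and escape sequences inside the msgid are left as written.

```python
def fuzzy_msgids(po_text: str) -> list[str]:
    """Return msgids of entries flagged fuzzy, ignoring the header entry."""
    lines = [line.strip() for line in po_text.splitlines()]
    flagged = []
    fuzzy = False
    for index, line in enumerate(lines):
        if not line:
            fuzzy = False  # a blank line ends the current entry
        elif line.startswith("#,"):
            flags = [flag.strip() for flag in line[2:].split(",")]
            fuzzy = fuzzy or "fuzzy" in flags
        elif line.startswith("msgid ") and fuzzy:
            # Collect the msgid plus any quoted continuation lines.
            parts = [line[len("msgid "):].strip()]
            for continuation in lines[index + 1:]:
                if not continuation.startswith('"'):
                    break
                parts.append(continuation)
            msgid = "".join(part[1:-1] for part in parts)
            if msgid:  # the header entry has an empty msgid
                flagged.append(msgid)
    return flagged


PO_SAMPLE = '''\
#, fuzzy
msgid ""
msgstr ""
"Language: fr\\n"

#: billing/views.py:10
#, fuzzy, python-format
msgid "Invoice %(number)s"
msgstr "Facture %(number)s"

msgid "Cancel"
msgstr "Annuler"
'''
print(fuzzy_msgids(PO_SAMPLE))
```

Whether a non-empty result fails the build or only prints a warning is the team rule the bullet above asks for.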
3. Run a pseudo-localization build
Pseudo-localization finds UI breakage that real translation often hides until late.
You don’t need French or Japanese to catch clipped buttons, hard-coded English, or layouts that explode when strings get longer. You need a fake locale that exaggerates expansion and makes untranslated text obvious.
A tiny pseudo-localizer
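import re

ACCENT_MAP = str.maketrans({
    "a": "à", "e": "ë", "i": "ï", "o": "ô", "u": "ü",
    "A": "À", "E": "Ë", "I": "Ï", "O": "Ô", "U": "Ü",
})
# Placeholders and HTML tags pass through untouched; otherwise %(name)s becomes %(nàmë)s.
PROTECTED_RE = re.compile(r"(%(?:\([^)]+\))?[#0 +\-]?\d*(?:\.\d+)?[a-zA-Z]|\{[^{}]*\}|<[^<>]+>)")

def pseudolocalize(text: str) -> str:
    # re.split with one capturing group leaves the protected tokens at odd indexes.
    parts = PROTECTED_RE.split(text)
    translated = "".join(p if i % 2 else p.translate(ACCENT_MAP) for i, p in enumerate(parts))
    return f"[{translated} ~~~~]"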
You can wire that into a script that rewrites msgstr values for a dedicated test locale, then run browser screenshots or Playwright checks against it.
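from pathlib import Path
import polib

for po_path in Path("locale").glob("qps-ploc/LC_MESSAGES/django.po"):
    po = polib.pofile(po_path)
    for entry in po:
        if not entry.msgid or entry.obsolete:
            continue
        if entry.msgid_plural:
            # Plural entries keep their translations in msgstr_plural, not msgstr.
            for index in entry.msgstr_plural:
                source = entry.msgid if index == 0 else entry.msgid_plural
                entry.msgstr_plural[index] = pseudolocalize(source)
        else:
            entry.msgstr = pseudolocalize(entry.msgid)
    po.save()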
Then boot your app with that locale enabled and click through the pages that usually break:
- Nav bars: labels get truncated first.
- Forms: submit buttons and validation messages overflow.
- Tables: fixed-width columns collapse fast.
- Emails: inline styles and spacing often assume English length.
The trade-off is obvious. Pseudo-localization won’t catch grammar, tone, or locale-specific agreement. It catches rendering failures and missing i18n coverage. That’s still worth a lot.
Pseudo-localization is a UI test, not a translation quality test. Treat it that way.
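Clicking through pages finds layout breakage, but the "hard-coded English" part can be partly automated. A minimal sketch, assuming pages are rendered with the pseudo-locale and that every translated string carries the [... ~~~~] wrapper produced by pseudolocalize: any visible text outside a wrapper is a candidate for a missing translation call. The names TextCollector and unlocalized_text are hypothetical.

```python
import re
from html.parser import HTMLParser

# One pseudo-localized string: "[", anything without brackets, "~~~~]".
PSEUDO_SPAN_RE = re.compile(r"\[[^\[\]]*~~~~\]")
SEP = "\x00"  # separator that should not occur in page text


class TextCollector(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def unlocalized_text(html: str) -> list[str]:
    """Return text fragments that sit outside any [... ~~~~] wrapper."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    # Join text nodes so a wrapper split across inline tags still matches.
    joined = SEP.join(parser.chunks)
    leftover = PSEUDO_SPAN_RE.sub(SEP, joined)
    return [
        piece.strip()
        for piece in leftover.split(SEP)
        if any(ch.isalpha() for ch in piece)
    ]


page = (
    '<nav><a href="/">[Hômë ~~~~]</a><a href="/pricing">Pricing</a></nav>'
    "<p>[Clïck <strong>hërë</strong> tô côntïnüë ~~~~]</p>"
    "<script>var label = 'ignored';</script>"
)
print(unlocalized_text(page))
```

Expect some noise: user-generated content, brand names, and text containing literal square brackets will show up too, so treat the result as a review list before turning it into a hard failure.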
4. Validate HTML tag preservation
Strings with markup are where “looks fine in the .po diff” turns into broken DOM.
AI systems and human translators both get into trouble here. Professional translation testing frameworks explicitly include curveballs like broken XML or HTML tags, formatting-only segments, and mixed-language content because these are real failure modes in production pipelines, as described in Loc’d and Loaded’s translation testing examples.

Compare tag structure, not full text
You don’t need a browser in CI for this. Parse fragments and compare tag sequences. The test here checks tag names and their order; attributes you care about, such as href, can be added to the same comparison.
from html.parser import HTMLParser
from pathlib import Path

import polib
from django.test import SimpleTestCase


class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))


def extract_tags(text: str):
    parser = TagCollector()
    parser.feed(text or "")
    return parser.tags


class HtmlPreservationTests(SimpleTestCase):
    def test_html_tag_sequence_matches(self):
        for po_path in Path("locale").glob("*/LC_MESSAGES/django.po"):
            po = polib.pofile(po_path)
            for entry in po:
                if entry.obsolete or not entry.msgstr:
                    continue
                if "<" not in entry.msgid and "<" not in entry.msgstr:
                    continue
                self.assertEqual(
                    extract_tags(entry.msgid),
                    extract_tags(entry.msgstr),
                    msg=f"{po_path}: HTML tag mismatch for {entry.msgid!r}",
                )
That catches missing closing tags, moved tags, and wrappers that weren’t in the source string.
What doesn’t work is relying on compilemessages alone. It won’t flag an entry like this one, where the closing tag is gone:
msgid "Click <strong>here</strong> to continue"
msgstr "Cliquez <strong>ici pour continuer"
That file can still compile. Your page can still break.
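If link targets or image sources matter as much as tag order, the collector can record selected attributes as well. A minimal sketch, assuming href and src are the attributes worth guarding; CHECKED_ATTRS, TagAttrCollector, and tag_signature are hypothetical names, and the broken French entry above serves as sample input.

```python
from html.parser import HTMLParser

CHECKED_ATTRS = {"href", "src"}  # assumption: adjust to the attributes you guard


class TagAttrCollector(HTMLParser):
    """Record start/end tags together with the guarded attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep only the guarded ones.
        kept = tuple(sorted((k, v) for k, v in attrs if k in CHECKED_ATTRS))
        self.tags.append(("start", tag, kept))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag, ()))


def tag_signature(text: str) -> list[tuple]:
    parser = TagAttrCollector()
    parser.feed(text or "")
    parser.close()
    return parser.tags


msgid = "Click <strong>here</strong> to continue"
msgstr = "Cliquez <strong>ici pour continuer"
print(tag_signature(msgid) == tag_signature(msgstr))  # missing </strong>

link_src = 'Read the <a href="/terms">terms</a>'
link_dst = 'Lisez les <a href="/conditions">conditions</a>'
print(tag_signature(link_src) == tag_signature(link_dst))  # href was changed
```

Attributes outside CHECKED_ATTRS, such as class, are ignored on purpose so that harmless markup differences do not fail the build.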
5. Test against a glossary with TRANSLATING.md
Terminology drift makes apps feel inconsistent even when nothing crashes.
One screen says “Workspace.” Another says “Project.” A billing page switches between “plan” and “subscription.” Human reviewers notice it late, and AI providers may vary by language pair and domain context. In The ARF’s advertising localization case study, model behavior diverged by language, and Hindi output showed meaningful differences in terminology handling and detail retention across systems in their translation comparison.
Make the glossary executable
If your repo already has TRANSLATING.md, stop treating it like documentation only. Parse it and enforce it.
Example TRANSLATING.md snippet:
## Glossary
- Company => Société
- Workspace => Espace de travail
- Billing => Facturation
Test code:
from pathlib import Path

import polib
from django.test import SimpleTestCase

# Mirrors the Glossary section of TRANSLATING.md.
GLOSSARY = {
    "Company": "Société",
    "Workspace": "Espace de travail",
    "Billing": "Facturation",
}


class GlossaryTests(SimpleTestCase):
    def test_glossary_terms_are_respected_in_french(self):
        po_path = Path("locale/fr/LC_MESSAGES/django.po")
        po = polib.pofile(po_path)
        violations = []
        for entry in po:
            if entry.obsolete or not entry.msgstr:
                continue
            for source_term, target_term in GLOSSARY.items():
                # Case-sensitive substring match: blunt on purpose.
                if source_term in entry.msgid and target_term not in entry.msgstr:
                    violations.append((entry.msgid, entry.msgstr, target_term))
        self.assertFalse(violations, msg=f"Glossary violations: {violations[:10]!r}")
This is blunt, and that’s fine. CI checks should be blunt.
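The test hard-codes the glossary, which means it can drift from TRANSLATING.md. A minimal sketch of the parsing step, assuming the file keeps the "- Source => Target" bullet format shown in the snippet under a heading named Glossary; parse_glossary is a hypothetical helper.

```python
import re

# One glossary bullet: "- Source => Target"
ENTRY_RE = re.compile(r"^-\s*(.+?)\s*=>\s*(.+?)\s*$")


def parse_glossary(markdown: str) -> dict[str, str]:
    """Extract source => target pairs from the Glossary section."""
    glossary = {}
    in_section = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Every heading starts a new section; only "Glossary" is read.
            in_section = stripped.lstrip("#").strip().lower() == "glossary"
            continue
        if in_section:
            match = ENTRY_RE.match(stripped)
            if match:
                glossary[match.group(1)] = match.group(2)
    return glossary


SAMPLE = """\
# Translating

## Glossary
- Company => Société
- Workspace => Espace de travail
- Billing => Facturation

## Tone
- Formal => vous
"""
print(parse_glossary(SAMPLE))
```

In the test, GLOSSARY could then be built with parse_glossary(Path("TRANSLATING.md").read_text(encoding="utf-8")), so the document stays the single source of truth.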
For branded language and awkward business terms, a versioned glossary beats trying to “prompt better” every release. TranslateBot’s write-up on business jargon translation hits the same operational point: terminology belongs in source control, not in somebody’s memory.
“Consistent wording” sounds editorial until your support docs, UI, and invoices all use different nouns.
6. Assert correct pluralization forms
Plural bugs don’t always throw exceptions. They just produce nonsense.
Django will happily work with plural-aware entries, but your locale file still has to match the language header. If the header says one thing and entries provide the wrong count of msgstr[n] forms, you’ll get bad output at runtime and waste time debugging the wrong layer.
Trust the header, then verify every plural entry
from pathlib import Path
import re

import polib
from django.test import SimpleTestCase

NPLURALS_RE = re.compile(r"nplurals\s*=\s*(\d+)")


def get_nplurals(po):
    metadata = po.metadata.get("Plural-Forms", "")
    match = NPLURALS_RE.search(metadata)
    if not match:
        raise AssertionError("Missing or invalid Plural-Forms header")
    return int(match.group(1))


class PluralFormTests(SimpleTestCase):
    def test_plural_entries_match_declared_count(self):
        for po_path in Path("locale").glob("*/LC_MESSAGES/django.po"):
            po = polib.pofile(po_path)
            expected = get_nplurals(po)
            for entry in po:
                if entry.obsolete or not entry.msgid_plural:
                    continue
                actual = len(entry.msgstr_plural)
                self.assertEqual(
                    expected,
                    actual,
                    msg=f"{po_path}: {entry.msgid!r} expected {expected} plural forms, found {actual}",
                )
A real example of the failure:
msgid "%(count)s file deleted"
msgid_plural "%(count)s files deleted"
msgstr[0] "%(count)s fichier supprimé"
If that locale expects more plural forms than you provided, the file is incomplete even if your reviewer approves the wording.
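To see why a missing form turns into bad output, it helps to know which msgstr[n] each count selects. A minimal sketch using the standard library: gettext.c2py is an undocumented helper in Python's gettext module that compiles the C-style plural expression from the header into a Python function. The function name plural_indexes is hypothetical, and the header below is the commonly used three-form rule for Polish.

```python
import gettext
import re

PLURAL_EXPR_RE = re.compile(r"\bplural\s*=\s*([^;]+)")


def plural_indexes(plural_forms: str, counts: list[int]) -> dict[int, int]:
    """Map each count to the msgstr[n] index chosen by a Plural-Forms header."""
    match = PLURAL_EXPR_RE.search(plural_forms)
    if not match:
        raise ValueError("no plural= expression in Plural-Forms header")
    # c2py is undocumented; it turns the C expression into a callable of n.
    select = gettext.c2py(match.group(1).strip())
    return {count: int(select(count)) for count in counts}


POLISH = (
    "nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 "
    "&& (n%100<10 || n%100>=20) ? 1 : 2);"
)
print(plural_indexes(POLISH, [1, 2, 5, 12, 22]))
```

The same helper is a handy way to pick sample counts for rendering tests, so that every declared form is exercised at least once.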
The broader point is that quality measurement needs rigor. Raw metric differences can mislead if they aren’t statistically validated. The Lingvanex summary of significance testing notes that BLEU improvements need to clear a meaningful threshold before they count as statistically significant, and COMET differences can still fall within noise even when the raw gap looks large in their discussion of translation evaluation significance. In app localization, plural-form correctness is a better release gate than vanity metrics.
7. Integrate manage.py translate with a dry run
If you’re auto-translating in CI, dry runs stop bad surprises before files change.
That includes bad prompts, bad diffs, and bad bills. I don’t like translation jobs that mutate locale files as the first step. Preview first, then write.
Put dry run before the real translation step
python manage.py makemessages -l fr
python manage.py translate --target-lang fr --dry-run
python manage.py compilemessages
For TranslateBot specifically, keep the dry run in CI and fail fast if the preview shows work you didn’t expect. The point isn’t to trust the provider. It’s to catch accidental extraction churn before merge. Their CI docs for running translation commands in automation fit that workflow.
You can wrap it in a smoke test:
import subprocess
import sys

from django.test import SimpleTestCase


class TranslateCommandTests(SimpleTestCase):
    def test_translate_dry_run_exits_cleanly(self):
        result = subprocess.run(
            # sys.executable keeps the subprocess on the same interpreter and virtualenv.
            [sys.executable, "manage.py", "translate", "--target-lang", "fr", "--dry-run"],
            capture_output=True,
            text=True,
        )
        self.assertEqual(result.returncode, 0, msg=result.stderr or result.stdout)
That won’t validate semantic quality. It validates pipeline behavior, credentials, command wiring, and basic command health.
Use dry run when:
- New strings land late: you want preview before mutating tracked files.
- Provider settings changed: model swaps can alter output patterns.
- You suspect duplicate work: changed extraction can trigger unnecessary translations.
What doesn’t work is running auto-translation as a black-box post-merge job and hoping review catches everything.
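A dry run is only a preview if it leaves tracked files alone. One way to assert that, sketched under the assumption that the dry run is not supposed to touch .po files: capture `git status --porcelain` before and after the command and compare the locale files listed. The helper changed_po_files is hypothetical and only parses the porcelain text.

```python
def changed_po_files(porcelain: str) -> list[str]:
    """Return the .po paths listed in `git status --porcelain` output."""
    changed = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        # Porcelain v1 lines read "XY <path>"; renames read "XY <old> -> <new>".
        path = line[3:].split(" -> ")[-1].strip().strip('"')
        if path.endswith(".po"):
            changed.append(path)
    return changed


SAMPLE_STATUS = (
    " M locale/fr/LC_MESSAGES/django.po\n"
    "?? notes.txt\n"
    "R  locale/de/old.po -> locale/de/LC_MESSAGES/django.po\n"
)
print(changed_po_files(SAMPLE_STATUS))
```

In CI, the porcelain text would come from subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout, taken once before and once after the dry run; any new entry in the second list means the preview wrote to disk.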
7-Point Translation Test Comparison
| Item | Implementation Complexity 🔄 | Resource / Speed ⚡ | Expected Outcomes 📊 | Ideal Use Cases 💡 | Key Advantages ⭐ |
|---|---|---|---|---|---|
| 1. Unit Test for Placeholder Integrity | Low, simple test loop comparing placeholder tokens in .po files | Minimal, runs in milliseconds within test suite | Catches missing/corrupted placeholders before runtime | Any project using formatted translations in CI | Prevents runtime KeyError/TypeError/ValueError and build-breakers |
| 2. Check for Untranslated Strings | Low, grep or small parser over .po files | Minimal, fast text scan before compilemessages | Detects empty msgstr entries to avoid fallback to source | Projects enforcing no-partial-translation releases | Ensures complete locale coverage and professional UI |
| 3. Run a Pseudo-Localization Build | Medium, script to generate/compile qps-ploc and run app | Moderate, requires running app to visually inspect layouts | Reveals UI overflow, hard-coded strings, and layout breaks | UI-heavy apps and early design validation before real TMs | Finds internationalization UI bugs cheaply before translation |
| 4. Validate HTML Tag Preservation | Medium, parse and compare tag sequences (stdlib html.parser) | Moderate, parsing overhead per string set | Prevents broken markup and potential XSS vectors | Any strings that include HTML or markup | Protects visual integrity and reduces security risk |
| 5. Test Against a Glossary with TRANSLATING.md | Medium, scan .po files and match against glossary rules | Minimal, text comparisons in CI | Enforces consistent terminology and brand voice | Projects with strict terminology/brand requirements | Keeps translations consistent and reviewable in PRs |
| 6. Assert Correct Pluralization Forms | Low, map nplurals per locale and validate headers | Minimal, header checks and count assertions | Avoids grammatical errors and pluralization runtime issues | Languages with multiple plural forms (e.g., Polish) | Prevents incorrect grammar and edge-case failures |
| 7. Integrate manage.py translate with Dry Run | Medium, CI step invoking TranslateBot dry-run and parsing output | Variable, contacts AI provider but does not write files | Provides cost estimates and a preview of changes | Teams using paid automated translation services | Controls translation cost and prevents unexpected bills |
A complete CI gate for your translations
A bad locale file should fail the build before it reaches review. The practical setup is one CI gate that starts with message extraction, runs the cheap structural checks first, and saves provider calls and app-level validation for the end.
Run makemessages at the start so CI evaluates the exact strings introduced in the branch, not whatever happened to be committed last week. Then execute the fast checks in one test target: placeholder integrity, untranslated strings, HTML tag preservation, glossary rules from TRANSLATING.md, and plural-form validation. Those tests are local, deterministic, and cheap. They should be the first thing that turns the pipeline red.
After that, switch to workflow checks that reflect how translations enter the repo. If you use manage.py translate, run the dry run first and fail on suspicious output before any file changes land in the workspace. Then run the actual translation step, compile with compilemessages, and treat compilation errors as merge blockers. For UI-heavy apps, build and test against the pseudo-localized variant after compilation. That is the stage that catches clipped buttons, hard-coded English, and layout regressions that unit tests will never see.
A GitHub Actions job for this usually looks like:
- run makemessages
- run the Python translation test suite
- run manage.py translate --dry-run
- run the actual translation command if the dry run passes
- run compilemessages
- run app or UI tests against the localized or pseudo-localized build
That order is what keeps the pipeline usable. Placeholder mismatches and broken tags should fail in seconds. Network-bound translation steps should run only after the local checks pass. Otherwise every typo in a .po file burns CI time and, depending on the provider, money.
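The same ordering can be expressed as a small runner that stops at the first failing step, which is handy for reproducing the gate locally. This is a sketch under assumptions: the test module path tests.test_translations is hypothetical, the non-dry-run translate invocation is inferred by dropping --dry-run from the command shown earlier, and the UI test step is left out.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Steps follow the order described above. The test module path is hypothetical,
# and the translate command without --dry-run is an assumption.
GATE_STEPS: list[list[str]] = [
    [sys.executable, "manage.py", "makemessages", "-l", "fr"],
    [sys.executable, "manage.py", "test", "tests.test_translations"],
    [sys.executable, "manage.py", "translate", "--target-lang", "fr", "--dry-run"],
    [sys.executable, "manage.py", "translate", "--target-lang", "fr"],
    [sys.executable, "manage.py", "compilemessages"],
]


def run_command(command: Sequence[str]) -> int:
    """Run one step and return its exit code."""
    return subprocess.run(list(command)).returncode


def run_gate(
    steps: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], int] = run_command,
) -> int:
    """Run steps in order; return the index of the first failure, or -1."""
    for index, command in enumerate(steps):
        if runner(command) != 0:
            return index  # later, costlier steps never start
    return -1
```

Calling run_gate(GATE_STEPS) from a script and exiting non-zero when the result is not -1 gives the fail-fast behavior described above. The runner parameter exists so the ordering logic can be tested without invoking Django.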
Human review still matters. As noted earlier, professional translation depends on context, audience, and intent. CI handles the mechanical failures: broken interpolation tokens, missing plural entries, malformed markup, glossary drift, and compile errors. Review handles tone, ambiguity, and whether the string is right for the feature.
If locale files already live in Git, the rest is straightforward. Treat .po files like source code, keep the tests next to the app, and make the pipeline enforce the full path from extraction to compilation to app validation. That turns the seven examples in this article into one implementable gate instead of seven disconnected ideas.
If you want to automate the translation step itself, TranslateBot fits the Django flow used here: makemessages, translate .po files in place, review the diff, then compilemessages. It keeps the translation step inside CI instead of pushing developers into a separate portal.