
Master Django Translation Test Examples for CI Success

2026-05-06 12 min read

Friday release. compilemessages is green, the French catalog is loaded, and the first production request blows up on %(count)d. A translator changed the token, the code still expects it, and now only one locale crashes. Another team ships a missing </strong> in a marketing string and spends the afternoon tracing a layout bug that never appeared in English.

That failure mode is common because gettext validation is narrow. It checks whether catalogs compile. It does not prove your translated app still renders, formats variables correctly, preserves markup, or covers the strings your code calls. Translation tests close that gap by treating locale files as executable inputs to the app, not as static content.

Human review still matters for tone, intent, and terminology. CI should catch the breakage that machines are good at catching. Placeholder mismatches, dropped HTML, bad plural branches, and untranslated strings belong in the same pipeline that runs your unit and integration tests. If you already care about localization in your testing workflow, the practical step is to make these checks copy-pasteable and wire them into one CI gate instead of scattering them across ad hoc scripts.

The examples below do that. Each check is small enough to adopt on its own, but the main benefit comes from running them together, on every change, before a .po file reaches production.

1. Unit test for placeholder integrity

Broken placeholders are the fastest way to ship a working .po file that still crashes at runtime.

Django projects usually mix formatting styles over time. You’ll see old %s, named mappings like %(name)s, and sometimes brace-style placeholders in app strings or model content. Your test should treat them as syntax, not text.


What to assert

Read every .po file, detect placeholder tokens in msgid and msgstr, and fail if they don’t match. I prefer token matching over executing every format string, because execution gets messy across mixed styles.

from pathlib import Path
import polib
import re

from django.test import SimpleTestCase

PERCENT_NAMED_RE = re.compile(r"%\(([^)]+)\)[#0 +\-]?\d*(?:\.\d+)?[diouxXeEfFgGcrs]")
PERCENT_POSITIONAL_RE = re.compile(r"(?<!%)%(?:[#0 +\-]?\d*(?:\.\d+)?)?[diouxXeEfFgGcrs]")
BRACE_RE = re.compile(r"\{([a-zA-Z0-9_]+)\}")

def extract_tokens(text: str) -> dict[str, set[str]]:
    return {
        "percent_named": set(PERCENT_NAMED_RE.findall(text or "")),
        "percent_positional": set(PERCENT_POSITIONAL_RE.findall(text or "")),
        "brace": set(BRACE_RE.findall(text or "")),
    }

class TranslationPlaceholderTests(SimpleTestCase):
    def test_placeholders_match_msgid(self):
        for po_path in Path("locale").glob("*/LC_MESSAGES/django.po"):
            po = polib.pofile(po_path)
            for entry in po:
                if entry.obsolete:
                    continue
                if not entry.msgstr:
                    continue

                src = extract_tokens(entry.msgid)
                dst = extract_tokens(entry.msgstr)

                self.assertEqual(
                    src, dst,
                    msg=f"{po_path}: placeholder mismatch for msgid {entry.msgid!r}"
                )

That catches the common production break: translator drops %(name)s or rewrites %s as plain text.
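To make the failure concrete, here is the named-placeholder regex from the test applied to a msgid/msgstr pair where the translator dropped the token (the string values are illustrative):

```python
import re

# Same named-placeholder pattern used in the test above.
PERCENT_NAMED_RE = re.compile(r"%\(([^)]+)\)[#0 +\-]?\d*(?:\.\d+)?[diouxXeEfFgGcrs]")

msgid = "%(count)d files deleted"
msgstr = "count fichiers supprimés"  # translator dropped the token

src = set(PERCENT_NAMED_RE.findall(msgid))
dst = set(PERCENT_NAMED_RE.findall(msgstr))
assert src == {"count"}
assert dst == set()  # sets differ, so the test fails the build
```

The comparison never executes the format string, so mixed old-style and brace-style catalogs can share one test.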

Practical rule: if placeholders differ, fail the build. Don’t try to auto-fix them in CI.

If you want a deeper walkthrough on app-level localization checks, TranslateBot’s post on localization in testing is aligned with the same idea: test the things that break code, not just wording.

2. Check for untranslated strings

Partial localization is worse than teams admit. One missed string in a billing flow makes the app feel unfinished fast.

There are cases where empty msgstr is fine during development. In a release branch, it usually isn’t. If your policy is “production locales must be complete,” encode that policy as a test.

A release gate that’s actually useful

from pathlib import Path
import polib

from django.test import SimpleTestCase

REQUIRED_LOCALES = {"fr", "de", "es"}

class UntranslatedStringTests(SimpleTestCase):
    def test_required_locales_have_no_empty_msgstr(self):
        for locale in REQUIRED_LOCALES:
            po_path = Path("locale") / locale / "LC_MESSAGES" / "django.po"
            po = polib.pofile(po_path)

            missing = []
            for entry in po:
                if entry.obsolete:
                    continue
                if entry.msgid_plural:
                    forms = [s for _, s in sorted(entry.msgstr_plural.items())]
                    if any(not s.strip() for s in forms):
                        missing.append(entry.msgid)
                elif not entry.msgstr.strip():
                    missing.append(entry.msgid)

            self.assertFalse(
                missing,
                msg=f"{po_path} has untranslated entries: {missing[:10]!r}"
            )

This doesn’t judge translation quality. It enforces completeness.
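The completeness rule itself is tiny: empty or whitespace-only msgstr means untranslated. A standalone sketch of that rule on hypothetical entries, shaped like the (msgid, msgstr) pairs polib exposes:

```python
# Hypothetical (msgid, msgstr) pairs, as polib would expose them.
entries = [
    ("Save", "Enregistrer"),
    ("Cancel", ""),        # untranslated
    ("Delete", "   "),     # whitespace-only counts as untranslated too
]

missing = [msgid for msgid, msgstr in entries if not msgstr.strip()]
assert missing == ["Cancel", "Delete"]
```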

That distinction matters. Existing translation test literature leans hard toward human evaluation and gives very little guidance for automated CI quality gates around locale-file integrity, placeholder safety, and regression checks in developer workflows, as discussed in Altalang’s overview of translation testing gaps.

Use that gap to your advantage. Define a policy your team can live with.

3. Run a pseudo-localization build

Pseudo-localization finds UI breakage that real translation often hides until late.

You don’t need French or Japanese to catch clipped buttons, hard-coded English, or layouts that explode when strings get longer. You need a fake locale that exaggerates expansion and makes untranslated text obvious.

A tiny pseudo-localizer

ACCENT_MAP = str.maketrans({
    "a": "à", "e": "ë", "i": "ï", "o": "ô", "u": "ü",
    "A": "À", "E": "Ë", "I": "Ï", "O": "Ô", "U": "Ü",
})

def pseudolocalize(text: str) -> str:
    translated = text.translate(ACCENT_MAP)
    return f"[{translated} ~~~~]"
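Fixed-width padding under-stresses long strings. A variant that scales the padding with string length is a small change; the roughly 40% expansion ratio here is an assumption, not a standard:

```python
ACCENT_MAP = str.maketrans({
    "a": "à", "e": "ë", "i": "ï", "o": "ô", "u": "ü",
    "A": "À", "E": "Ë", "I": "Ï", "O": "Ô", "U": "Ü",
})

def pseudolocalize_scaled(text: str) -> str:
    # Pad proportionally to length, with a floor so even short labels
    # get visible padding; short labels expand the most in translation.
    padding = "~" * max(2, len(text) * 2 // 5)
    return f"[{text.translate(ACCENT_MAP)} {padding}]"
```

For example, `pseudolocalize_scaled("Save changes")` yields `[Sàvë chàngës ~~~~]`, while a two-character label like `"OK"` still gets the minimum padding.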

You can wire that into a script that rewrites msgstr values for a dedicated test locale, then run browser screenshots or Playwright checks against it.

from pathlib import Path
import polib

for po_path in Path("locale").glob("qps-ploc/LC_MESSAGES/django.po"):
    po = polib.pofile(po_path)
    for entry in po:
        if entry.msgid and not entry.obsolete:
            entry.msgstr = pseudolocalize(entry.msgid)
    po.save()

Then boot your app with that locale enabled and click through the pages that usually break.
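Django only serves locales it knows about, so the fake locale has to be registered in settings first. A minimal sketch, assuming the `qps-ploc` naming used above; the label "Pseudo" is arbitrary:

```python
# settings.py fragment (sketch): register the pseudo-locale so Django
# can activate it. "qps-ploc" and "Pseudo" are assumed names.
USE_I18N = True

LANGUAGES = [
    ("en", "English"),
    ("fr", "French"),
    ("qps-ploc", "Pseudo"),  # test-only locale, never shipped to users
]
```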

The trade-off is obvious. Pseudo-localization won’t catch grammar, tone, or locale-specific agreement. It catches rendering failures and missing i18n coverage. That’s still worth a lot.

Pseudo-localization is a UI test, not a translation quality test. Treat it that way.

4. Validate HTML tag preservation

Strings with markup are where “looks fine in the .po diff” turns into broken DOM.

AI systems and human translators both get into trouble here. Professional translation testing frameworks explicitly include curveballs like broken XML or HTML tags, formatting-only segments, and mixed-language content because these are real failure modes in production pipelines, as described in Loc’d and Loaded’s translation testing examples.


Compare tag structure, not full text

You don’t need a browser in CI for this. Parse fragments and compare tag sequences plus attributes you care about.

from html.parser import HTMLParser
from pathlib import Path
import polib

from django.test import SimpleTestCase

class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))

def extract_tags(text: str):
    parser = TagCollector()
    parser.feed(text or "")
    return parser.tags

class HtmlPreservationTests(SimpleTestCase):
    def test_html_tag_sequence_matches(self):
        for po_path in Path("locale").glob("*/LC_MESSAGES/django.po"):
            po = polib.pofile(po_path)
            for entry in po:
                if entry.obsolete or not entry.msgstr:
                    continue
                if "<" not in entry.msgid and "<" not in entry.msgstr:
                    continue

                self.assertEqual(
                    extract_tags(entry.msgid),
                    extract_tags(entry.msgstr),
                    msg=f"{po_path}: HTML tag mismatch for {entry.msgid!r}",
                )

That catches missing closing tags, moved tags, and wrappers that weren’t in the source string.

What doesn’t work is relying on compilemessages alone. It won’t tell you that:

msgid "Click <strong>here</strong> to continue"
msgstr "Cliquez <strong>ici pour continuer"

That file can still compile. Your page can still break.
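Feeding that exact pair through the tag extractor shows why the sequence comparison fails where compilemessages stays silent (the collector is re-declared so the snippet runs standalone):

```python
from html.parser import HTMLParser

# Same collector as in the test above.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))

def extract_tags(text):
    parser = TagCollector()
    parser.feed(text or "")
    return parser.tags

src = extract_tags("Click <strong>here</strong> to continue")
dst = extract_tags("Cliquez <strong>ici pour continuer")
assert src == [("start", "strong"), ("end", "strong")]
assert dst == [("start", "strong")]  # missing close: sequences differ
```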

5. Test against a glossary with TRANSLATING.md

Terminology drift makes apps feel inconsistent even when nothing crashes.

One screen says “Workspace.” Another says “Project.” A billing page switches between “plan” and “subscription.” Human reviewers notice the drift late, and AI output quality varies by language pair and domain. In The ARF’s advertising localization case study, model behavior diverged by language, and Hindi output showed meaningful differences in terminology handling and detail retention across systems in their translation comparison.

Make the glossary executable

If your repo already has TRANSLATING.md, stop treating it like documentation only. Parse it and enforce it.

Example TRANSLATING.md snippet:

## Glossary

- Company => Société
- Workspace => Espace de travail
- Billing => Facturation

Test code:

from pathlib import Path
import polib
import re

from django.test import SimpleTestCase

# Matches glossary lines of the form "- Company => Société".
GLOSSARY_LINE_RE = re.compile(r"^-\s*(.+?)\s*=>\s*(.+?)\s*$")

def load_glossary(path: str = "TRANSLATING.md") -> dict[str, str]:
    glossary = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        match = GLOSSARY_LINE_RE.match(line.strip())
        if match:
            glossary[match.group(1)] = match.group(2)
    return glossary

class GlossaryTests(SimpleTestCase):
    def test_glossary_terms_are_respected_in_french(self):
        po_path = Path("locale/fr/LC_MESSAGES/django.po")
        po = polib.pofile(po_path)
        glossary = load_glossary()

        violations = []
        for entry in po:
            if entry.obsolete or not entry.msgstr:
                continue

            for source_term, target_term in glossary.items():
                if source_term in entry.msgid and target_term not in entry.msgstr:
                    violations.append((entry.msgid, entry.msgstr, target_term))

        self.assertFalse(violations, msg=f"Glossary violations: {violations[:10]!r}")

This is blunt, and that’s fine. CI checks should be blunt.

For branded language and awkward business terms, a versioned glossary beats trying to “prompt better” every release. TranslateBot’s write-up on business jargon translation hits the same operational point: terminology belongs in source control, not in somebody’s memory.

“Consistent wording” sounds editorial until your support docs, UI, and invoices all use different nouns.

6. Assert correct pluralization forms

Plural bugs don’t always throw exceptions. They just produce nonsense.

Django will happily load plural-aware entries, but your locale file still has to agree with its own Plural-Forms header. If the header declares one count and entries provide a different number of msgstr[n] forms, you’ll get bad output at runtime and waste time debugging the wrong layer.

Trust the header, then verify every plural entry

from pathlib import Path
import polib
import re

from django.test import SimpleTestCase

NPLURALS_RE = re.compile(r"nplurals\s*=\s*(\d+)")

def get_nplurals(po):
    metadata = po.metadata.get("Plural-Forms", "")
    match = NPLURALS_RE.search(metadata)
    if not match:
        raise AssertionError("Missing or invalid Plural-Forms header")
    return int(match.group(1))

class PluralFormTests(SimpleTestCase):
    def test_plural_entries_match_declared_count(self):
        for po_path in Path("locale").glob("*/LC_MESSAGES/django.po"):
            po = polib.pofile(po_path)
            expected = get_nplurals(po)

            for entry in po:
                if entry.obsolete or not entry.msgid_plural:
                    continue

                actual = len(entry.msgstr_plural)
                self.assertEqual(
                    expected,
                    actual,
                    msg=f"{po_path}: {entry.msgid!r} expected {expected} plural forms, found {actual}",
                )

A real example of the failure:

msgid "%(count)s file deleted"
msgid_plural "%(count)s files deleted"
msgstr[0] "%(count)s fichier supprimé"

If that locale expects more plural forms than you provided, the file is incomplete even if your reviewer approves the wording.
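To see what the header check parses, here are two standard gettext Plural-Forms headers run through the same regex. French declares two forms, Polish three, so the entry above is complete for fr but would fail for pl:

```python
import re

NPLURALS_RE = re.compile(r"nplurals\s*=\s*(\d+)")

# Standard gettext Plural-Forms headers for French and Polish.
fr_header = "nplurals=2; plural=(n > 1);"
pl_header = (
    "nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 "
    "&& (n%100<10 || n%100>=20) ? 1 : 2);"
)

assert int(NPLURALS_RE.search(fr_header).group(1)) == 2
assert int(NPLURALS_RE.search(pl_header).group(1)) == 3
```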

The broader point is that quality measurement needs rigor. Raw metric differences can mislead if they aren’t statistically validated. The Lingvanex summary of significance testing notes that BLEU improvements need to clear a meaningful threshold before they count as statistically significant, and COMET differences can still fall within noise even when the raw gap looks large, in their discussion of translation evaluation significance. In app localization, plural-form correctness is a better release gate than vanity metrics.

7. Integrate manage.py translate with a dry run

If you’re auto-translating in CI, dry runs stop bad surprises before files change.

That includes bad prompts, bad diffs, and bad bills. I don’t like translation jobs that mutate locale files as the first step. Preview first, then write.

Put dry run before the real translation step

python manage.py makemessages -l fr
python manage.py translate --target-lang fr --dry-run
python manage.py compilemessages

For TranslateBot specifically, keep the dry run in CI and fail fast if the preview shows work you didn’t expect. The point isn’t to trust the provider. It’s to catch accidental extraction churn before merge. Their CI docs for running translation commands in automation fit that workflow.

You can wrap it in a smoke test:

import subprocess

from django.test import SimpleTestCase

class TranslateCommandTests(SimpleTestCase):
    def test_translate_dry_run_exits_cleanly(self):
        result = subprocess.run(
            ["python", "manage.py", "translate", "--target-lang", "fr", "--dry-run"],
            capture_output=True,
            text=True,
        )
        self.assertEqual(result.returncode, 0, msg=result.stderr or result.stdout)

That won’t validate semantic quality. It validates pipeline health: credentials, command wiring, and a clean exit.

Use the dry run before any step that writes locale files or bills the provider. That is where the bad prompts, bad diffs, and bad bills get caught.

What doesn’t work is running auto-translation as a black-box post-merge job and hoping review catches everything.

7-Point Translation Test Comparison

| Item | Implementation Complexity 🔄 | Resource / Speed ⚡ | Expected Outcomes 📊 | Ideal Use Cases 💡 | Key Advantages ⭐ |
|---|---|---|---|---|---|
| 1. Unit test for placeholder integrity | Low; simple test loop validating .po formatting | Minimal; runs in milliseconds within the test suite | Catches missing or corrupted placeholders before runtime | Any project using formatted translations in CI | Prevents runtime KeyError/ValueError and build breakers |
| 2. Check for untranslated strings | Low; grep or small parser over .po files | Minimal; fast text scan before compilemessages | Detects empty msgstr entries to avoid fallback to source | Projects enforcing no-partial-translation releases | Ensures complete locale coverage and a professional UI |
| 3. Run a pseudo-localization build | Medium; script to generate/compile qps-ploc and run the app | Moderate; requires a running app to inspect layouts | Reveals UI overflow, hard-coded strings, and layout breaks | UI-heavy apps and early design validation before real translation | Finds i18n UI bugs cheaply before translation |
| 4. Validate HTML tag preservation | Medium; parse and compare tag sequences (html.parser) | Moderate; parsing overhead per string set | Prevents broken markup and potential XSS vectors | Any strings that include HTML or markup | Protects visual integrity and reduces security risk |
| 5. Test against a glossary with TRANSLATING.md | Medium; scan .po files against glossary rules | Minimal; text comparisons in CI | Enforces consistent terminology and brand voice | Projects with strict terminology/brand requirements | Keeps translations consistent and reviewable in PRs |
| 6. Assert correct pluralization forms | Low; validate nplurals headers and form counts per locale | Minimal; header checks and count assertions | Avoids grammatical errors and pluralization runtime issues | Languages with multiple plural forms (e.g., Polish) | Prevents incorrect grammar and edge-case failures |
| 7. Integrate manage.py translate with dry run | Medium; CI step invoking the dry run and parsing output | Variable; contacts the AI provider but writes no files | Provides cost estimates and a preview of changes | Teams using paid automated translation services | Controls cost and prevents unexpected bills |

A complete CI gate for your translations

A bad locale file should fail the build before it reaches review. The practical setup is one CI gate that starts with message extraction, runs the cheap structural checks first, and saves provider calls and app-level validation for the end.

Run makemessages at the start so CI evaluates the exact strings introduced in the branch, not whatever happened to be committed last week. Then execute the fast checks in one test target: placeholder integrity, untranslated strings, HTML tag preservation, glossary rules from TRANSLATING.md, and plural-form validation. Those tests are local, deterministic, and cheap. They should be the first thing that turns the pipeline red.

After that, switch to workflow checks that reflect how translations enter the repo. If you use manage.py translate, run the dry run first and fail on suspicious output before any file changes land in the workspace. Then run the actual translation step, compile with compilemessages, and treat compilation errors as merge blockers. For UI-heavy apps, build and test against the pseudo-localized variant after compilation. That is the stage that catches clipped buttons, hard-coded English, and layout regressions that unit tests will never see.

A GitHub Actions job for this usually mirrors that order.
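A sketch of such a job, under the assumptions above. The job name, the `tests.translations` module path, and the `TRANSLATEBOT_API_KEY` secret name are illustrative, not prescribed:

```yaml
name: translations
on: [pull_request]

jobs:
  i18n-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      # makemessages/compilemessages need the gettext tools.
      - run: sudo apt-get update && sudo apt-get install -y gettext
      - run: pip install -r requirements.txt
      # Extract exactly the strings this branch introduces.
      - run: python manage.py makemessages --all
      # Cheap, deterministic structural checks fail first.
      - run: python manage.py test tests.translations
      # Network-bound provider steps run only after local checks pass.
      - run: python manage.py translate --target-lang fr --dry-run
        env:
          TRANSLATEBOT_API_KEY: ${{ secrets.TRANSLATEBOT_API_KEY }}
      - run: python manage.py compilemessages
```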

That order is what keeps the pipeline usable. Placeholder mismatches and broken tags should fail in seconds. Network-bound translation steps should run only after the local checks pass. Otherwise every typo in a .po file burns CI time and, depending on the provider, money.

Human review still matters. As noted earlier, professional translation depends on context, audience, and intent. CI handles the mechanical failures: broken interpolation tokens, missing plural entries, malformed markup, glossary drift, and compile errors. Review handles tone, ambiguity, and whether the string is right for the feature.

If locale files already live in Git, the rest is straightforward. Treat .po files like source code, keep the tests next to the app, and make the pipeline enforce the full path from extraction to compilation to app validation. That turns the seven examples in this article into one implementable gate instead of seven disconnected ideas.

If you want to automate the translation step itself, TranslateBot fits the Django flow used here: makemessages, translate .po files in place, review the diff, then compilemessages. It keeps the translation step inside CI instead of pushing developers into a separate portal.

Stop editing .po files manually

TranslateBot automates Django translations with AI. One command, all your languages, pennies per translation.