A Developer's Guide to Every Major Subtitle File Format

You've embedded a video in your app. Now you need to add subtitles, and you've landed in the confusing world of subtitle file formats. You'll run into a mess of acronyms, from the dead-simple SRT to the complex TTML.

Picking the right format has real consequences for your development workflow, platform compatibility, and user experience. For basic captions, a simple SRT file is often enough. But for most modern web projects, you'll want the styling and advanced features that come with WebVTT.

Choosing The Right Subtitle File Format

If you're building a multilingual app, you know internationalization is more than just translating strings in .po files. When video content enters the picture, providing accurate, well-formatted subtitles is essential. But with formats like SRT, VTT, ASS, and TTML all competing, it's hard to know where to start.

This guide is a practical reference manual. We’ll break down the structure, capabilities, and common use cases for each major format to help you make an informed decision for your project.

We can group most formats into a few distinct categories:

Plain-text formats: Simple, universal, and easy to edit by hand. Think of these as the .txt files of the subtitle world.
Advanced styling formats: For when you need precise control over color, position, and font, common in anime fansubbing and creative projects.
XML-based formats: The heavy-duty standard for professional broadcasting, OTT streaming, and complex delivery workflows.
Image-based formats: A totally different approach common on physical media like DVDs and Blu-rays, where subtitles are rendered as images.

This decision tree helps visualize the most common paths. For the vast majority of web projects, your choice will boil down to WebVTT for its rich features or SRT for its simplicity.

A decision tree flowchart illustrating subtitle format choices, guiding users to VTT for web projects or SRT for basic captions.

The context is everything. The needs of a simple marketing video on a landing page are worlds apart from a full-featured media player app.

Quick Reference Subtitle Format Cheat Sheet

To make a quick decision, this table breaks down the most common formats, their primary use cases, and their capabilities at a glance.

Format	Primary Use Case	Styling Support	Complexity
SRT	Universal simple captions, offline media	None	Very Low
WebVTT	Modern web video (HTML5 `<track>`)	Good (CSS-based)	Low
SSA/ASS	Advanced creative styling, fansubbing	Excellent	Medium
TTML/DFXP	Professional broadcasting, streaming	Excellent (XML)	High

This should give you a starting point. For web developers, the choice is almost always between SRT and WebVTT. For everyone else, the decision gets more complicated.

Key Considerations For Your Project

Choosing a format isn't just about features; it’s about your entire workflow. Think about the tools you use, the platforms you're targeting, and the technical skills of the people who will be creating or editing these files.

One of the most common pitfalls is character encoding. A subtitle file that looks perfect on your machine can show up as a screen full of gibberish (’ÂÊù) on a user's device if the encoding is wrong. This happens all the time.

The fix is simple: always save your files with UTF-8 encoding. This single step prevents a huge class of problems, especially when you're dealing with languages that use non-Latin characters. If you want to understand why this is so critical, check out our post on why you should use UTF-8 everywhere. By understanding the trade-offs upfront, you can avoid these common headaches and build a better video experience.
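
If you inherit subtitle files in unknown encodings, a small normalization step can help. The sketch below is one way to do it, assuming that files which are not valid UTF-8 are most likely Windows-1252 or Latin-1; the function names and the fallback list are illustrative choices, so adjust them for the languages you actually ship.

```python
import codecs


def decode_subtitle_bytes(raw: bytes, fallbacks=("cp1252", "latin-1")) -> str:
    """Decode subtitle bytes, preferring UTF-8 and falling back to legacy encodings."""
    # A UTF-8 byte order mark is legal but confuses some parsers, so drop it.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Assumption: anything that is not valid UTF-8 uses one of the fallbacks.
    for encoding in fallbacks:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode subtitle data with the given encodings")


def normalize_to_utf8(raw: bytes) -> bytes:
    """Return the same text re-encoded as UTF-8 without a byte order mark."""
    return decode_subtitle_bytes(raw).encode("utf-8")
```

Guessing an encoding is never fully reliable, so treat the fallback path as a rescue step and keep the rule above: save everything as UTF-8 in the first place.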

Plain Text Formats: SRT and WebVTT

When you need to get subtitles onto a video, you're almost certainly going to start with a plain-text format. These files are simple, you can write them by hand in any text editor, and they work almost everywhere. For any web developer, there are really only two you need to know: SubRip (.srt) and WebVTT (.vtt).

SubRip (.srt) is the undisputed king of compatibility for a reason. Its greatest strength is its simplicity. An SRT file is just a text file with a rigid structure that nearly every video player on the planet, from VLC to YouTube, understands immediately. This universal support makes it a bulletproof choice for offline media or any time you can't control the playback environment.

The Anatomy Of An SRT File

The structure of an SRT file is completely straightforward. It’s a sequence of numbered blocks, and each block has three parts:

A sequential number: A counter for each subtitle cue, starting from 1.
The timecode: The start and end time for the text to appear, formatted as hours:minutes:seconds,milliseconds. The two times are separated by -->.
The subtitle text: The actual caption to be displayed, which can span one or two lines.

After the text, a single blank line signals the end of that cue and the start of the next. It’s that simple.

Here’s what a minimal SRT file looks like. You can create and edit this with any plain text editor.

1
00:00:03,450 --> 00:00:05,250
This is the first subtitle.

2
00:00:06,000 --> 00:00:08,750
It appears for a few seconds.
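
Because the structure is so rigid, reading an SRT file takes only a few lines of code. Here is a minimal sketch of a parser for the three-part blocks described above. The Cue class and function names are made up for this example, and it assumes well-formed input: real-world files often have missing numbers or stray whitespace that a production parser would need to tolerate.

```python
import re
from dataclasses import dataclass

TIMECODE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def to_ms(timecode: str) -> int:
    """Convert hours:minutes:seconds,milliseconds to integer milliseconds."""
    match = TIMECODE.fullmatch(timecode.strip())
    if match is None:
        raise ValueError(f"bad SRT timecode: {timecode!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(content: str) -> list[Cue]:
    cues = []
    # Normalize line endings, then split the file on blank lines.
    blocks = re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        if len(lines) < 3:
            continue  # skip incomplete blocks instead of failing
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0]), to_ms(start), to_ms(end), "\n".join(lines[2:])))
    return cues
```

Storing times as integer milliseconds keeps later arithmetic, such as shifting or overlap checks, free of floating-point surprises.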

The big limitation with SRT is styling. While some players might recognize basic HTML tags like <i> or <b>, support is wildly inconsistent. You can't rely on it for a consistent look, and you have virtually no control over positioning or color.
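
If you can't trust the player to handle inline tags, one defensive option is to strip them before you ship the file. This sketch assumes the tags in question are the informal <i>, <b>, <u>, and <font> tags that some SRT files carry; the function name is illustrative.

```python
import re

# Matches opening and closing forms of the informal SRT styling tags.
TAG = re.compile(r"</?(?:i|b|u|font)\b[^>]*>", re.IGNORECASE)


def strip_srt_tags(text: str) -> str:
    """Remove inline styling tags so players never show them as raw text."""
    return TAG.sub("", text)
```

The pattern only targets those four tag names, so ordinary text containing a less-than sign is left alone.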

WebVTT: The Modern Successor

This is where WebVTT (.vtt) comes in. Short for Web Video Text Tracks, it was designed from the ground up for HTML5 video. It’s the official standard for the <track> element in all modern browsers, making it the default choice for any web-based project.

At first glance, a VTT file looks a lot like an SRT. But it brings some key improvements. It uses a period (.) for milliseconds in timestamps instead of a comma and makes the sequential numbers optional.

Every VTT file must start with a WEBVTT header on the first line. It also officially supports comments (prefixed with NOTE), which players ignore but are useful for leaving instructions for translators or notes for other developers.

WEBVTT

NOTE This is a comment and will not be displayed.

00:00:03.450 --> 00:00:05.250
This is a VTT subtitle.

00:00:06.000 --> 00:00:08.750
It supports more advanced features.
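
The differences described above are small enough that a basic SRT-to-VTT conversion fits in a few lines. This is a rough sketch rather than a full converter: it assumes well-formed SRT input, adds the required header, and swaps the comma for a period in timing lines. The function name is illustrative.

```python
import re

# Matches a full SRT timing line so commas in caption text are left alone.
SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT text."""
    body = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    # Rewrite only the millisecond separator inside timing patterns.
    body = SRT_TIMING.sub(r"\1.\2 --> \3.\4", body)
    # The SRT sequence numbers stay in place as optional cue identifiers.
    return "WEBVTT\n\n" + body + "\n"
```

The sequence numbers are kept because WebVTT treats a line before the timing as an optional cue identifier, so there is no need to remove them.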

The real power of WebVTT is in its styling and positioning. It lets you use a CSS-like syntax right inside the file to control exactly how and where subtitles appear. You can define styles for position, alignment, color, and much more.

For example, you can precisely align text, change its vertical position, and even apply specific styles to parts of a sentence.

WEBVTT

00:00:10.500 --> 00:00:13.000 line:10% position:50% align:center
This text is centered horizontally near the top of the video.

00:00:15.000 --> 00:00:18.000
You can also apply styles <c.highlight>like this</c.highlight>.
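
Cue settings are just space-separated name:value pairs appended to the timing line, so they are easy to generate from code. The helper below is a hypothetical sketch that formats one cue from millisecond timings. It uses align:center, the spelling in the current WebVTT specification (older drafts used middle), and it does not validate setting names or values.

```python
def format_timestamp(ms: int) -> str:
    """Render integer milliseconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def format_cue(start_ms: int, end_ms: int, text: str, **settings: str) -> str:
    """Build one cue; keyword arguments become name:value cue settings."""
    timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
    extras = " ".join(f"{name}:{value}" for name, value in settings.items())
    header = f"{timing} {extras}".rstrip()
    return f"{header}\n{text}\n"


cue = format_cue(10_500, 13_000, "Near the top, centered.",
                 line="10%", position="50%", align="center")
```

Keep in mind that a class such as highlight in a <c.highlight> span only has a visible effect if a matching ::cue(.highlight) rule exists, either in the page's CSS or in a STYLE block inside the VTT file.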

For any new web project, you should default to WebVTT. Its native integration with HTML5, reliable styling capabilities, and metadata support make it a far more powerful and flexible choice than SRT for building modern, accessible video experiences in your apps. Only fall back to SRT when you absolutely need maximum compatibility with older, non-web players.

Advanced Styling Formats: SSA and ASS

When SRT or even WebVTT's styling options aren't enough, you'll find yourself in the world of SubStation Alpha (.ssa) and its more capable successor, Advanced SubStation Alpha (.ass). There's a reason this format is the undisputed king in the anime fansubbing community: it offers precise, per-line control over nearly every visual aspect of the text.

If your project demands custom fonts, sizes, colors, outlines, shadows, or the ability to place text anywhere on screen with pixel-perfect accuracy, ASS is the tool for the job. While it's less common for corporate video or standard web content, it's indispensable for creative work where subtitles are an integral part of the artistic expression.

A side-by-side comparison of SRT and VTT subtitle file formats, showing their different structures.

The Structure Of An ASS File

At its core, an ASS file is plain text, but it's far more structured than an SRT file. It's organized into distinct sections, each marked with a header in brackets. The three you'll work with most are [Script Info], [V4+ Styles], and [Events].

[Script Info]: Think of this as the metadata header for the entire subtitle script. It holds details like the script's author, the video resolution it was timed against (PlayResX and PlayResY), and rules for handling style conflicts.
[V4+ Styles]: This is where the real magic happens. Here, you define named styles that can be applied to any subtitle line. Each Style: line is a long list of comma-separated values controlling everything from font name and size to primary/secondary colors, borders, shadows, and alignment.
[Events]: This section contains the actual subtitle cues. Each Dialogue: line specifies a layer, start time, end time, the style to use from the [V4+ Styles] section, and the text content itself.

Here’s a simplified look at how these pieces fit together in an actual .ass file.

[Script Info]
Title: My Awesome Video
ScriptType: v4.00+
PlayResX: 1920
PlayResY: 1080

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,60,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,1,2,10,10,10,1
Style: TopBanner,Impact,72,&H0000FFFF,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100,100,0,0,1,3,2,8,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:05.25,0:00:08.10,Default,,0,0,0,,This is a standard subtitle.
Dialogue: 0,0:00:10.50,0:00:15.00,TopBanner,,0,0,0,,{\an8}Big yellow text at the top!
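
Two details of this layout trip people up when they process ASS files in code: the Text field can contain commas, and colours are written as &HAABBGGRR rather than the RRGGBB order used on the web. The sketch below shows one way to handle both. The function names are illustrative, and it assumes the Format line precedes the data lines as in the example above.

```python
def parse_ass_fields(format_line: str, data_line: str) -> dict[str, str]:
    """Pair the names on a Format: line with the values on a Style: or Dialogue: line."""
    names = [name.strip() for name in format_line.split(":", 1)[1].split(",")]
    # The last field (Text, for events) may contain commas, so cap the split.
    values = data_line.split(":", 1)[1].strip().split(",", len(names) - 1)
    return dict(zip(names, values))


def ass_colour_to_rgba(colour: str) -> tuple[int, int, int, int]:
    """Convert &HAABBGGRR to (red, green, blue, alpha); in ASS, alpha 00 is opaque."""
    value = int(colour.strip("&H"), 16)
    alpha = (value >> 24) & 0xFF
    blue = (value >> 16) & 0xFF
    green = (value >> 8) & 0xFF
    red = value & 0xFF
    return red, green, blue, alpha


event_format = "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text"
dialogue = r"Dialogue: 0,0:00:10.50,0:00:15.00,TopBanner,,0,0,0,,{\an8}Big yellow text at the top!"
fields = parse_ass_fields(event_format, dialogue)
```

Decoding the TopBanner colour &H0000FFFF this way gives full red and green with no blue, which is why that style renders as yellow.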

The Trade-Offs Of Using ASS

All that power comes with a significant trade-off: poor web player support. Most standard HTML5 video players and major streaming platforms like YouTube don't know how to render ASS styling. They'll either ignore it or fail to display the subtitles entirely.

Playback is largely confined to desktop media players like VLC and MPC-HC, or specialized web players that bundle a dedicated ASS rendering engine.

This makes the ASS format an excellent choice for content you distribute as a final video file meant for offline viewing, but a difficult one for most web-based applications. If you're building a video feature in a web app, it's almost always better to stick with WebVTT for styling unless you have the ability to control the playback environment completely.

Broadcast and Streaming Formats: TTML and EBU-STL

Once you graduate from simple web videos and start dealing with professional broadcasting or major streaming platforms, you’ll run into a different class of subtitle formats. These were built for complex workflows, strict broadcast standards, and total control over the final output. The two big players in this space are TTML and the legacy EBU-STL.

For streaming giants like Netflix or broadcast networks, a simple text file like SRT just doesn't cut it. They need a single format that can manage precise styling, on-screen positioning, multiple languages, and rich metadata all in one package. That's why TTML (Timed Text Markup Language) was created. It's an XML-based format, which makes it powerful but also verbose and a nightmare to write by hand.

A sketch illustrating a subtitle editor interface with sections for script info, V4+ styles, and events.

Understanding TTML And IMSC

The full TTML specification is massive. In the real world, nobody uses the whole thing. Instead, platforms adopt a specific profile, or subset, of TTML to make sure everything works consistently. The most widely used profile today is IMSC (Internet Media Subtitles and Captions), which has become the de facto standard for modern streaming.

A TTML file is just an XML document. Its structure is far more descriptive than an SRT or VTT file, defining regions for text placement, styling rules, and timing within a series of nested tags. This approach gives you granular control over how subtitles appear on screen.

Here’s a minimal example just to give you a feel for the structure.

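<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:03.450" end="00:00:05.250">This is a TTML subtitle.</p>
      <p begin="00:00:06.000" end="00:00:08.750">It's structured with XML.</p>
    </div>
  </body>
</tt>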

The verbosity of XML is both a strength and a weakness. It provides a clear, machine-readable structure that’s perfect for automated broadcast systems, but it makes manual editing a huge pain. As a developer, you'll almost never write TTML by hand; you'll use specialized tools to generate it from a simpler format.
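
To give a feel for what generating TTML from a simpler format involves, here is a minimal sketch using Python's standard library. It assumes cues arrive as (begin, end, text) tuples with clock-time strings and emits only the bare tt/body/div/p skeleton. It does not add the styling, regions, or profile declarations that a real IMSC delivery would need, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ttml(cues, lang: str = "en") -> str:
    """Build a bare-bones TTML document from (begin, end, text) tuples."""
    # Serialize TTML elements in the default namespace (no prefix).
    ET.register_namespace("", TTML_NS)
    root = ET.Element(f"{{{TTML_NS}}}tt")
    root.set(f"{{{XML_NS}}}lang", lang)
    body = ET.SubElement(root, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for begin, end, text in cues:
        paragraph = ET.SubElement(div, f"{{{TTML_NS}}}p", begin=begin, end=end)
        paragraph.text = text
    return ET.tostring(root, encoding="unicode")


document = build_ttml([
    ("00:00:03.450", "00:00:05.250", "This is a TTML subtitle."),
    ("00:00:06.000", "00:00:08.750", "It's structured with XML."),
])
```

Using an XML library instead of string concatenation means characters like & and < in the caption text are escaped for you.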

The Legacy EBU-STL Format

Long before digital streaming took over, European broadcasting ran on EBU-STL (European Broadcasting Union - Subtitling data exchange format). The format’s roots go back to the early 1980s, when broadcasters like the BBC and the US National Captioning Institute collaborated to standardize how they exchanged captions. This effort led to the EBU format, which by 2000 was used in over 50 countries and reportedly cut subtitle preparation time by up to 40%. You can read more about how subtitle technology evolved over on RedBeeMedia.com.

Unlike modern text-based formats, EBU-STL is a binary format with a rigid, fixed-width structure. It was designed for a totally different era, originally for transfer on 1.44 MB floppy disks. The file contains a General Subtitle Information (GSI) block with metadata, followed by a series of Text and Timing Information (TTI) blocks for each individual subtitle.
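
The fixed-width layout makes a quick sanity check possible without special tooling. The sketch below is based on my reading of the EBU Tech 3264 layout, so treat the sizes and offsets as assumptions to verify against the specification: a 1024-byte GSI block, 128-byte TTI blocks, the code page number and disk format code at the start of the GSI, and the in and out time codes at bytes 5 to 12 of each TTI block. The function names are illustrative.

```python
import struct

GSI_SIZE = 1024  # General Subtitle Information block (assumed size)
TTI_SIZE = 128   # each Text and Timing Information block (assumed size)


def read_stl_summary(data: bytes) -> dict:
    """Return a few header fields and the number of TTI blocks."""
    if len(data) < GSI_SIZE or (len(data) - GSI_SIZE) % TTI_SIZE:
        raise ValueError("unexpected size for an EBU-STL file")
    gsi = data[:GSI_SIZE]
    return {
        "code_page": gsi[0:3].decode("ascii"),
        "disk_format": gsi[3:11].decode("ascii"),  # e.g. "STL25.01"
        "tti_blocks": (len(data) - GSI_SIZE) // TTI_SIZE,
    }


def read_tti_timing(block: bytes):
    """Return (time code in, time code out) as (hours, minutes, seconds, frames)."""
    time_in = struct.unpack_from("4B", block, 5)
    time_out = struct.unpack_from("4B", block, 9)
    return time_in, time_out
```

Note that the time codes count frames rather than milliseconds, so converting them to another format requires the frame rate from the disk format code.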

Because it’s a binary format, you can’t just open it in a text editor. You need software that understands the format, such as Subtitle Edit or professional subtitling tools, to read or convert it. While TTML has largely replaced it for modern delivery, you might still find EBU-STL files when working with archival broadcast content. Knowing its context is key if you ever have to process these legacy media assets.

When Subtitles Aren't Text: DVD and Blu-ray Formats

Not all subtitle formats are simple text files. When you start working with physical media like DVDs and Blu-rays, you run into a different beast: image-based subtitles. These files don't contain any text at all. Instead, they’re a sequence of pictures.

The two main formats you'll hit are VobSub (.idx/.sub) for DVDs and Presentation Graphic Stream (PGS) (.sup) for Blu-rays. The easiest way to think of them is as a series of transparent PNGs. Each "PNG" is a rendered image of the subtitle text, paired with a timestamp telling the player exactly when to show it.

How VobSub and PGS Work

The VobSub format is actually a pair of files that work together:

.idx (Index File): This is a small text file that acts as a map. It contains all the timestamps and byte offsets, telling the player when to show each subtitle and where to find its image inside the .sub file.
.sub (Subtitle File): This is a much larger binary file containing the raw bitmap images for every single subtitle line in the movie.
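
Since the .idx file is plain text, you can inspect the timing map with a few lines of code. This sketch assumes the common entry layout of "timestamp: hh:mm:ss:mmm, filepos: <hex offset>" and ignores every other line; the function name is illustrative, and it does not touch the images in the .sub file.

```python
import re

IDX_ENTRY = re.compile(
    r"timestamp:\s*(\d+):(\d{2}):(\d{2}):(\d{3}),\s*filepos:\s*([0-9a-fA-F]+)"
)


def parse_idx(text: str) -> list[tuple[int, int]]:
    """Return (start time in ms, byte offset into the .sub file) per subtitle."""
    entries = []
    for line in text.splitlines():
        match = IDX_ENTRY.match(line.strip())
        if match is None:
            continue  # comments, palette, language lines, and so on
        hours, minutes, seconds, millis = (int(g) for g in match.groups()[:4])
        start_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
        # Assumption: filepos is written in hexadecimal.
        entries.append((start_ms, int(match.group(5), 16)))
    return entries
```

Counting the entries is a cheap way to confirm that an OCR pass produced the same number of cues as the source track.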

PGS functions in a similar way but bundles everything into a single .sup file. It's a more modern container designed for high-definition Blu-ray content, but the core idea is the same: timed images. The big advantage of this approach is perfect visual fidelity. Because the subtitles are pre-rendered, they look exactly the same on every player, preserving the original font, styling, and positioning without any compatibility drama.

But for developers, the downsides are huge.

Massive File Sizes: A sequence of images takes up way more space than a text file. An SRT for a movie might be 100 KB, while its PGS counterpart could easily be 50 MB or more.
Not Searchable or Editable: The text is literally burned into a picture. You can't just open the file in a text editor to fix a typo or search for a phrase.
A Localization Nightmare: You can't just translate the text. Localizing an image-based subtitle requires a whole different, much more painful workflow.

The OCR Challenge for Localization

Sooner or later, you'll rip a movie from a DVD or Blu-ray and end up with these image-based subtitle files. To make them useful for anything on the web, or even just for translation, you have to extract the text from the images. This process is called Optical Character Recognition (OCR).

Tools like Subtitle Edit have built-in OCR engines that can try to convert VobSub or PGS files into a text format like SRT. The tool analyzes each subtitle image, recognizes the characters, and spits out the text with its original timestamp. FFmpeg can pull the image-based subtitle stream out of a container, but it can't turn bitmap subtitles into text on its own.

Be warned, though: OCR is far from perfect. Accuracy depends entirely on the font, color, and quality of the original subtitle images. It almost always requires a round of manual review to fix misrecognized letters, weird symbols, and other glitches. It’s a time-consuming but necessary evil if you need to pull usable text from physical media.
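
You can narrow down the manual review by flagging cues that look suspicious. The patterns below are illustrative heuristics I picked for this sketch, not an established rule set. They will produce false positives (for example "5pm" or "iPhone"), so use them to prioritize review rather than to auto-correct.

```python
import re

SUSPICIOUS = [
    re.compile(r"[|~^_]"),                 # glyphs that rarely appear in dialogue
    re.compile(r"[A-Za-z]\d|\d[A-Za-z]"),  # digit glued to a letter, e.g. "he11o"
    re.compile(r"\b[a-z]+[A-Z][a-z]*\b"),  # stray capital mid-word, e.g. "wiIl"
]


def needs_review(text: str) -> bool:
    """Return True if the OCR output matches any suspicious pattern."""
    return any(pattern.search(text) for pattern in SUSPICIOUS)
```

Running this over every cue and listing the flagged ones gives a reviewer a short worklist instead of the whole file.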

Tools and Workflows for Managing Subtitle Files

Diagram illustrating the conversion process from a subtitle image to .idx/.sub and then to .sup file formats.

Managing subtitles with a simple text editor just doesn't scale. As a developer, you need a reliable toolkit for creating, converting, and quality-checking subtitle files, ideally from your terminal and integrated into your existing CI/CD pipelines. A few key open-source tools are indispensable here.
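
As an example of the kind of quality check that is easy to script, here is a small sketch that looks for two common timing mistakes: cues that end before they start, and cues that overlap the previous one. It assumes cues have already been parsed into (start_ms, end_ms) tuples in file order, and the function name is illustrative.

```python
def find_timing_problems(cues):
    """cues: list of (start_ms, end_ms) tuples in file order."""
    problems = []
    previous_end = 0
    for number, (start, end) in enumerate(cues, start=1):
        if end <= start:
            problems.append(f"cue {number}: end is not after start")
        if start < previous_end:
            problems.append(f"cue {number}: overlaps the previous cue")
        previous_end = max(previous_end, end)
    return problems
```

In a CI job, you could print the returned messages and exit with a non-zero status when the list is not empty.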

Essential Command-Line Tools

For anything involving video, audio, or subtitles on the command line, FFmpeg is the first tool everyone reaches for. It’s the Swiss Army knife for media manipulation and can handle almost any format conversion you throw at it.

A common task is converting a standard SRT file to the web-friendly WebVTT format. FFmpeg makes this a one-liner, perfect for a build script or CI job that prepares assets for a web app.

# Convert a .srt file to a .vtt file
ffmpeg -i input.srt output.vtt

That’s it. This command reads input.srt, adjusts the syntax for things like timestamp separators, and spits out a valid output.vtt file. It's fast, scriptable, and saves you from dealing with clunky online converters or manual edits. FFmpeg is also your best bet for ripping subtitle tracks directly out of video containers like MP4 or MKV.
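
If you have a whole folder of files, you can wrap the same command in a short script. This sketch assumes ffmpeg is on your PATH and that every .srt file should get a .vtt file next to it; the function names are illustrative.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(source: Path) -> list[str]:
    """Build the ffmpeg call that converts one .srt file to .vtt."""
    # -y overwrites an existing output file without prompting.
    return ["ffmpeg", "-y", "-i", str(source), str(source.with_suffix(".vtt"))]


def convert_all(folder: Path) -> None:
    """Convert every .srt file in the folder, stopping on the first failure."""
    for source in sorted(folder.glob("*.srt")):
        subprocess.run(ffmpeg_command(source), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, and check=True makes a failed conversion fail the build.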

Tools for Manual Editing and Synchronization

While the command line is great for automation, sometimes you need a GUI to create subtitles from scratch or fix tricky timing issues. For that, Subtitle Edit is a fantastic free and open-source tool for Windows. It gives you a visual timeline, an audio waveform display, and powerful features for nudging timings until they’re perfect.

Specialized software has a long history in this space. Tools like SubtitleNEXT have been around since the early 1990s, driving the industry’s shift from tape-based to modern, file-based workflows. It began as a DOS tool back in 1991, evolved for Windows by 1992, and introduced hybrid pipelines that drastically cut down turnaround times. Relaunched in 2016, it now supports modern formats like IMSC and WebVTT on all major operating systems. You can read about the history of SubtitleNEXT to get a sense of how these workflows have evolved.

As developers, our goal is to automate everything we can. A good subtitle workflow should feel a lot like your i18n process for .po files: a set of repeatable, scriptable commands that fit right into your existing development and deployment pipeline. For more on that, you can check out our guide on managing .po files effectively.

Common Questions About Subtitle Formats

Even after you get a feel for the different subtitle formats, a few practical questions always pop up when it's time to actually implement them. Here are some quick answers to the ones developers hit most often.

What Is The Best Subtitle File Format For A Website?

For any web project, WebVTT (.vtt) is the best choice. It's the modern standard built for HTML5 video and is directly supported by the <track> element in every major browser.

While SRT works almost everywhere, VTT was specifically designed for the web. It gives you far more control, with features like CSS styling, text positioning, and metadata support right out of the box. If you're building for the web, go with VTT.

How Do I Convert Between Subtitle Formats?

The most common and powerful tool for converting between pretty much any subtitle format is FFmpeg. It’s a free, open-source command-line utility that handles nearly any media conversion task you can throw at it.

For instance, converting an SRT file to a WebVTT file is a one-liner:

ffmpeg -i input.srt output.vtt

This is perfect for scripting and automating asset preparation in a CI/CD pipeline. If you need a graphical interface, free applications like Subtitle Edit (for Windows) or Aegisub (cross-platform) offer robust conversion features alongside powerful editing and timing tools.

Can I Put HTML Tags In An SRT File?

Technically, yes. You can put basic HTML tags like <i> for italics, <b> for bold, and <u> for underline in an .srt file. The problem is that support for these tags is wildly inconsistent across video players.

Some players will render the HTML correctly, but many others will just display the raw tags as plain text. It's completely unreliable. If you need any styling, you should use WebVTT. Its CSS-based approach is standardized and far more predictable. The translation process itself is also a critical factor. For a deeper look at how modern translation engines work, you can check out our guide on the benefits of neural machine translation. This knowledge helps in understanding how text-based formats are processed.
