posted on Thu, Dec 07 '23 under tags: languages, foss, ai

Is it the best time to be alive as a subtitle-r?

This year I’ve been dealing with subtitles in many ways. Yes, I mean that text which appears under videos, transcribing what is spoken.

It helps in many ways.

Automatic subtitles

At SOCHARA’s School of Public Health Equity and Action, there is a very big challenge in creating learning materials for learners in all languages. We have been hoping to get the new generation AI tools to help us in some ways. And it’s slowly becoming a reality.

First we thought we would generate proper English subtitles (our videos are often created primarily in English) by manually editing potentially machine-generated ones. Then, using these, we could summarize the topic (with LLM tools like ChatGPT) and translate those summaries into various languages (with tools like Google Translate).

Towards this, Hrishikesh Barman, who’s part of the GLAMWW workgroup of SOCHARA, built a tool called wscribe to generate subtitles using faster-whisper, along with a frontend to edit them manually.

We also figured out that we can run this completely free online using Google Colab, with a notebook like this. This was a huge boon for me because my LC230 takes hours to run anything related to AI.

Interestingly, that notebook can also be used to generate English subtitles for Indian language videos. This was useful for me in another project that I was consulting for.

Manual subtitling

But for the best quality, I’m still stuck with manual subtitling. I use Aegisub for this. The workflow looks roughly like this:

By default Aegisub uses ass format, but you can export to other formats.
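If you need a different format after the fact, ffmpeg can convert the exported file too. Here is a small helper function I might use for that (the filenames and the function name are hypothetical; ffmpeg infers both formats from the extensions):

```shell
# ass_to_srt: convert an Aegisub .ass subtitle file to .srt with ffmpeg.
ass_to_srt() {
    local in=$1
    local out="${in%.ass}.srt"   # derive the output name by swapping the extension
    ffmpeg -i "$in" "$out"
}
```

For example, `ass_to_srt lecture.ass` would produce `lecture.srt` alongside the original.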

Etching subtitles into videos

Here are some links that help in this:

There are mainly two ways of adding subtitles:

The former passes the subtitle as a video filter, thereby hard-etching it: you can’t turn the subtitles off, because they become pixels of the video.

The latter adds it as metadata and makes it part of the video (so you don’t need a separate file), but in a way that it can be turned off and styled differently.
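The two approaches can be sketched as ffmpeg invocations like the following. I’ve wrapped them in functions so they can be reused per video; the function names are my own, and `input.mp4`/`subs.srt`/output names are hypothetical:

```shell
# 1. Hard etching: the subtitles video filter burns the text into the
#    pixels, so it can never be turned off.
hard_etch() {
    ffmpeg -i "$1" -vf "subtitles=$2" "$3"
}

# 2. Soft subtitles: copy the audio/video streams untouched and add the
#    subtitle as a separate, toggleable mov_text track inside the mp4.
soft_subs() {
    ffmpeg -i "$1" -i "$2" -c copy -c:s mov_text "$3"
}
```

Usage would look like `hard_etch input.mp4 subs.srt hard.mp4` or `soft_subs input.mp4 subs.srt soft.mp4`.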

I had 6 videos to do this for, so I wrote a bash script. I kept the videos named video1.mp4, video2.mp4, etc. in the folder inputs, and put the corresponding names of the videos in the array names.

I also had a small flag that would create scaled down previews for initial drafts to be shared.

set -x

names=(
    "Name of video 1"
    "Name of video 2"
    "Name of video 3"
    "Name of video 4"
    "Name of video 5"
    "Name of video 6"
)

convert() {
    local index=$1
    local small=$2
    local video="video$((index + 1))"
    local name="${names[index]}"
    local vf_option
    if [ "$small" = true ]; then
        # scaled-down preview for initial drafts
        vf_option="scale=320:-1, ass=inputs/${video}.ass"
    else
        vf_option="ass=inputs/${video}.ass"
    fi
    # "yes n" answers ffmpeg's overwrite prompts so existing outputs are kept
    yes n | ffmpeg -i "inputs/${video}.mp4" -vf "$vf_option" "outputs/${name}.mp4"
    cp "inputs/${video}.ass" "outputs/${name}.ass"
}

v=0
while [ "$v" -lt 6 ]; do
    convert "$v" false
    v=$((v + 1))
done

As you can see, I’ve done the hard etching here.

YouTube’s auto-translation of auto-generated subtitles

I’ve recently discovered that YouTube can automatically translate subtitles, and since it can also automatically generate them, this means it can effectively solve our original problem at SOPHEA.

It is also conveniently possible to download these subtitles using yt-dlp like this:

yt-dlp --write-auto-sub --skip-download --sub-lang kn ""

This downloads the Kannada (kn) subtitle for a video, without downloading the actual video.
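A couple of related yt-dlp invocations I find handy (`VIDEO_URL` is a placeholder here; substitute an actual video link):

```shell
# List every subtitle track (manual and auto-generated) for a video:
yt-dlp --list-subs "VIDEO_URL"

# Download the Kannada auto-subs, converting them from the default vtt to srt:
yt-dlp --write-auto-sub --skip-download --sub-lang kn --convert-subs srt "VIDEO_URL"
```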

Like what you are reading? Subscribe (by RSS, email, mastodon, or telegram)!