mkv2sub - Export all MKV subtitles (Blu-Ray SUP)



The following script recursively scans a folder for MKV files and exports their Blu-ray subtitles as SUP files. Each execution of the script processes one MKV file. The original MKV file is left untouched. All SUP files smaller than 3 MB are treated as forced subtitles and named accordingly. Example:

image.png.3562067137d795f992f5789b426f8912.png
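
The size rule can be tried out separately from the script. The following is only a minimal sketch: the function name is_forced_by_size is made up for illustration and is not part of mkv2sub, and it takes the limit in megabytes (1 MB = 1,000,000 bytes, as in the script's settings).

```bash
#!/bin/bash
# Hypothetical helper (not part of mkv2sub): succeeds if the file is
# no larger than the given limit, i.e. it would count as "forced".
# $1 = path to a SUP file
# $2 = limit in megabytes (1 MB = 1,000,000 bytes)
is_forced_by_size() {
    local file=$1 max_mb=$2
    local max_bytes filesize
    # convert megabytes to bytes, awk also handles values like 2.5
    max_bytes=$(awk -v mb="$max_mb" 'BEGIN { printf "%d", mb * 1000000 }')
    # wc -c is used instead of "stat -c %s" to stay portable
    filesize=$(wc -c < "$file") || return 2
    [[ $filesize -le $max_bytes ]]
}

# Usage (the path is an example only):
# if is_forced_by_size "/mnt/user/TV/Movie/Movie.track3.ger.und.sup" 3; then
#     echo "small enough to be a forced subtitle"
# fi
```

A small file is only a hint and not a proof, which is why the script first looks at the track name and falls back to the size only for unnamed tracks.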

 

To do:

- support for DVD subtitles

 

Donate? 🤗

 

 

#!/bin/bash
# #####################################
# mkv2sub v0.6
# 
# Notes:
# mkv2sub automatically exports all subtitles of all MKV files in a specific folder.
# After that it determines the forced subtitles. MKVs without (compatible) subtitles will be skipped.
# 
# Changelog:
# 0.6
# - scan files for mkv files recursively
# - renamed from mkv2sup to mkv2sub
# 0.5
# - "#!/bin/bash" added to the head of the script to force the usage of the correct interpreter
# 0.4
# - check the modification time of the mkv file to ensure it is not currently being written by another app
# 0.3
# - Docker is now optional
# - No exported SUP will be deleted if the new option <preserve_sup_files> is enabled
# - Check if mkvtoolnix docker container is in use by other bash script before killing it
# 0.2
# - Bug fix: Changed some exit status codes
# - Bug fix: Some file names containing dots were cut
# - Bug fix: Now all <sub_langs> of forced subtitles are renamed and not only those with <default_lang>
# - Bug fix: Named SUP files that were only skipped, will now be deleted, too
# 0.1
# - first release
# 
# Todo:
# - update subtitle track names in mkv file
# - set forced track as default in mkv file
# - how to handle duplicates (two forced subtitle tracks in the same language): at the moment the script tries to rename both, but this fails as the target file already exists
# - add support for DVD subtitles (S_VOBSUB)
# - determine all SRT subtitles (Regular, SDH, etc.) by using word/char matching
# - while writing SUP files check if one contains the word "Forced" and is <default_lang>
# #####################################
# 
# ######### Settings ##################
movies_path="/mnt/user/TV"
docker_config_path="/mnt/appdata/mkvtoolnix"
sub_langs="ger,eng,tur" # Use "all" to preserve all subtitle languages. Note: The first language is set as default.
sub_forced_max_size="3MB"
preserve_sup_files=false
# #####################################
# 
# ######### Script ####################
# make script race condition safe
if [[ -d "/tmp/${0///}" ]] || ! mkdir "/tmp/${0///}"; then exit 1; fi; trap 'rmdir "/tmp/${0///}"' EXIT;
# check user settings
movies_path=$([[ "${movies_path: -1}" == "/" ]] && echo "${movies_path%?}" || echo "$movies_path")
docker_config_path=$([[ "${docker_config_path: -1}" != "/" ]] && echo "${docker_config_path}/" || echo "$docker_config_path")
default_lang="${sub_langs:0:3}" # first language is used as default language
sub_forced_max_size="${sub_forced_max_size//[!0-9.]/}" # float filtering (https://stackoverflow.com/a/19724571/318765)
sub_forced_max_size=$(awk "BEGIN { print $sub_forced_max_size*1000000}") # convert MB to Bytes
mkv_path=""
function exitus() {
    exit_status=$1
    # check if container exists
    if [[ -x "$(command -v docker)" ]] && [[ "$(docker ps -q -f name=mkvtoolnix_mkv2sub)" ]]; then
        # stop container only if it is not in use (by another shell script)
        mkvtoolnix_cpu_usage="$(docker stats mkvtoolnix_mkv2sub --no-stream --format "{{.CPUPerc}}")"
        # if [[ ${mkvtoolnix_cpu_usage%.*} -lt 1 ]]; then
            # we do not stop the container as our script is not race-condition safe!
            # echo "Stop mkvtoolnix container"
            # docker stop mkvtoolnix_mkv2sub
            # docker rm mkvtoolnix_mkv2sub
        # fi
    fi
    exit $exit_status
}
function mkv_next() {
    path=$1
    echo "Parsing $path ..."
    for file in "$path"/*; do
        # mkv has been already found
        if [[ -n $mkv_path ]]; then
            return
        # regular file
        elif [ -f "$file" ]; then
            # file extension must be .mkv
            mkv_basename=$(basename -- "$file") # https://stackoverflow.com/a/965072/318765
            file_extension="${mkv_basename##*.}"
            mkv_filename=${mkv_basename%.*}
            #echo "mkv_filename is $mkv_filename"
            if [[ $file_extension != "mkv" ]]; then
                continue
            fi
            # skip this mkv file if there is already a sup subtitle file with the same name
            file_dirname="$(dirname "$file")"
            #echo "file_dirname is $file_dirname"
            for sup_path in "$file_dirname"/*.sup; do
                sup_basename=$(basename "$sup_path")
                if [[ $sup_basename == *"$mkv_filename."* ]]; then
                    # skip this movie dir because sup subtitle has been found
                    continue 2
                fi
            done
            # skip this mkv file if there is already a srt subtitle file with the same name
            for srt_path in "$file_dirname"/*.srt; do
                srt_basename=$(basename "$srt_path")
                if [[ $srt_basename == *"$mkv_filename."* ]]; then
                    # skip this movie dir because srt subtitle has been found
                    continue 2
                fi
            done
            # skip mkv file if it is not older than 2 minutes
            file_time=$(stat -c %Y "$file") # file modification time
            file_time=$((file_time + 120)) # the last modification must be at least 2 minutes ago
            current_time=$(date +%s) # actual timestamp
            if [[ $file_time -gt $current_time ]]; then
                continue
            fi
            mkv_path=$file
            mkv_dirname=$(dirname "$file")
            mkv_dirname="${mkv_dirname#"$movies_path"}" # remove <movies_path> from the beginning of the path
            mkv_dirname="${mkv_dirname#/}" # remove leading slash
            docker_mkv_path="/storage/${mkv_dirname}/${mkv_basename}"
            echo "Found $mkv_path"
            break
        # dir
        else
            mkv_next "$file"
        fi
    done
}
function mkv_getinfo() {
    # check if mkvtoolnix exists
    if [[ -x "$(command -v mkvmerge)" ]]; then
        echo "mkvtoolnix will be used to fetch tracks information"
        mkv_info="$(mkvmerge -J "$mkv_path")"
        return # the result is passed through the global variable $mkv_info ("return" only accepts a numeric exit status)
    # check if docker exists
    elif [[ -x "$(command -v docker)" ]]; then
        echo "Docker will be used to fetch tracks information"
        # check if mkvtoolnix container exists
        if [[ ! "$(docker ps -q -f name=mkvtoolnix_mkv2sub)" ]]; then # https://stackoverflow.com/a/38576401/318765
            # check for blocking container
            if [[ "$(docker ps -aq -f status=exited -f name=mkvtoolnix_mkv2sub)" ]]; then
                docker rm mkvtoolnix_mkv2sub
            fi
            echo "mkvtoolnix container needs to be started"
            # start mkvtoolnix container
            docker_options=(
                run -d
                --name=mkvtoolnix_mkv2sub
                -e TZ=Europe/Berlin
                -v "${docker_config_path}mkvtoolnix_mkv2sub:/config:rw"
                -v "${movies_path}:/storage:rw"
                jlesage/mkvtoolnix
            )
            echo "docker ${docker_options[@]}"
            docker "${docker_options[@]}"
        fi
        mkv_info="$(docker exec mkvtoolnix_mkv2sub /usr/bin/mkvmerge -J "$docker_mkv_path")"
        return
    fi
    echo "mkvtoolnix and docker do not exist!"
    exitus 1
}
function mkv_extract() {
    # check if mkvtoolnix exists
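    if [[ -x "$(command -v mkvextract)" ]]; then
        echo "mkvtoolnix will be used to extract tracks"
        # the target paths in <mkvextract_options> start with the container path "/storage/", so map them to <movies_path>
        mkvextract "$mkv_path" "${mkvextract_options[@]/:\/storage\//:${movies_path}/}"
        return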
    # check if docker exists
    elif [[ -x "$(command -v docker)" ]]; then
        echo "mkvtoolnix@docker will be used to extract tracks"
        # check if mkvtoolnix container exists
        if [[ ! "$(docker ps -q -f name=mkvtoolnix_mkv2sub)" ]]; then # https://stackoverflow.com/a/38576401/318765
            # check for blocking container
            if [[ "$(docker ps -aq -f status=exited -f name=mkvtoolnix_mkv2sub)" ]]; then
                docker rm mkvtoolnix_mkv2sub
            fi
            echo "mkvtoolnix container needs to be started"
            # start mkvtoolnix container
            docker_options=(
                run -d
                --name=mkvtoolnix_mkv2sub
                -e TZ=Europe/Berlin
                -v "${docker_config_path}mkvtoolnix_mkv2sub:/config:rw"
                -v "${movies_path}:/storage:rw"
                jlesage/mkvtoolnix
            )
            echo "docker ${docker_options[@]}"
            docker "${docker_options[@]}"
        fi
        echo "docker exec mkvtoolnix_mkv2sub /usr/bin/mkvextract \"$docker_mkv_path\" ${mkvextract_options[@]}"
        docker exec mkvtoolnix_mkv2sub /usr/bin/mkvextract "$docker_mkv_path" "${mkvextract_options[@]}"
        return
    fi
    echo "mkvtoolnix and docker do not exist!"
    exitus 1
}
# get next mkv file
shopt -s nullglob # avoid empty directory errors (https://unix.stackexchange.com/questions/56051/avoiding-errors-due-to-unexpanded-asterisk)
mkv_next "$movies_path" # fills $mkv_path and $docker_mkv_path
shopt -u nullglob # it is important to reset this setting (https://unix.stackexchange.com/questions/534858/why-does-shopt-s-nullglob-remove-a-string-with-question-mark-in-an-array-elemen)
# no mkv file found
if [[ -z $docker_mkv_path ]]; then
    echo "No mkv files found or all subtitles have been exported!"
    exitus 0
fi
mkv_getinfo # uses $mkv_path, fills $mkv_info
if [[ -z $mkv_info ]]; then
    echo "Error while fetching tracks information with mkvmerge"
    exitus 1
fi
echo "Information on all tracks has been obtained."
# parse info
sub_track_ids=(); track_langs=(); track_names=(); track_codec_ids=();
while read -r line ; do
    echo "$line"
    # Note: we did not use "jq -r" to parse JSON as it needs installation
    # "$line" is quoted to prevent word splitting and filename expansion of the JSON content
    track_codec_name=$(echo "$line" | grep -oP '^.*?(?=\")')
    track_id=$(echo "$line" | grep -oP '(?<="id": )[0-9]+')
    track_bits=$(echo "$line" | grep -oP '(?<="audio_bits_per_sample": )[0-9]+')
    track_channels=$(echo "$line" | grep -oP '(?<="audio_channels": )[0-9]+')
    track_codec_id=$(echo "$line" | grep -oP '(?<="codec_id": ").*?[^\\](?=\",)')
    track_lang=$(echo "$line" | grep -oP '(?<="language": ")[a-z]+')
    track_name=$(echo "$line" | grep -oP '(?<="track_name": ").*?[^\\](?=\",)') # most flexible way of getting a JSON value (https://stackoverflow.com/a/6852427/318765)
    track_default=$(echo "$line" | grep -oP '(?<="default_track": )(true|false)')
    track_forced=$(echo "$line" | grep -oP '(?<="forced_track": )(true|false)')
    track_type=$(echo "$line" | grep -oP '(?<=")[a-z]+$')
    # collect track langs
    if [[ -n $track_lang ]]; then
        track_langs[$track_id]=$track_lang
    else
        track_langs[$track_id]='und' # und = undetermined
    fi
    # collect track names
    if [[ -n $track_name ]]; then
        track_names[$track_id]=$track_name
    else
        track_names[$track_id]='und'
    fi
    # collect codec ids
    if [[ -n $track_codec_id ]]; then
        track_codec_ids[$track_id]=$track_codec_id
    else
        track_codec_ids[$track_id]='und'
    fi
    # collect subtitles in preferred languages
    if [[ $track_type == "subtitles" ]] && [[ $track_codec_id == "S_HDMV/PGS" ]]; then
        if [[ $sub_langs == "all" ]] || [[ $sub_langs == *"$track_lang"* ]]; then
            sub_track_ids+=("$track_id")
        fi
    fi
done < <(echo "$mkv_info" | 
        tr -d '\n' | # we need to remove line breaks with "tr" to force grep to return one-liners
        grep -oP '(?<=codec": ").*?"type": "[a-z]+') # Regex is faster than looping through all lines
# create empty SRT file if mkv file does not contain any (compatible) subtitles (by that it will be skipped in next turn)
if [[ ${#sub_track_ids[@]} -eq 0 ]]; then
    empty_srt_filename="${mkv_path%.*}.nosubs.srt"
    #empty_srt_filename="${movies_path}/${mkv_dirname}/${mkv_filename}.nosubs.srt"
    echo "The empty SRT file '${empty_srt_filename}' will be created to skip MKV file '${mkv_path}' in the next turn as it does not contain any (compatible) subtitles."
    touch "$empty_srt_filename"
    exitus 0
fi
# build mkvextract export parameter
mkvextract_options=(tracks)
for track_id in "${sub_track_ids[@]}"; do
    # file naming scheme "Movie_Name.[Language_Code].forced.ext" adopted from Plex (https://support.plex.tv/articles/200471133-adding-local-subtitles-to-your-media/#toc-3)
    mkvextract_options+=("${track_id}:/storage/${mkv_dirname}/${mkv_filename}.track${track_id}.${track_langs[$track_id]}.${track_names[$track_id]}.sup")
done
# export all subtitles
mkv_extract # uses mkv_path, docker_mkv_path, mkvextract_options, movies_path
echo "Successfully extracted all subtitles"
# determine forced subtitle
shopt -s nullglob
shopt -s nocasematch # insensitive string comparison (https://stackoverflow.com/a/14138301/318765)
forced_found=false
for sup_path in "${movies_path}/${mkv_dirname}/${mkv_filename}"*.sup; do
    # get path parts
    sup_dirname=$(dirname "$sup_path")
    sup_basename=$(basename "$sup_path")
    sup_filename=${sup_basename%.*.*.*.*} # (filename).track[0-9].<lang>.<name>.sup
    sup_extension=${sup_basename/#"$sup_filename"./} # filename.(track[0-9].<lang>.<name>.sup)
    # fetch track data through filename
    IFS='.' # set internal field separator to dot (default is whitespace)
    read -ra track_data <<< "$sup_extension" # explode to array (https://stackoverflow.com/a/918931/318765)
    unset IFS; # unset internal field separator
    track_id=${track_data[0]}
    track_id=${track_id/track/} # remove the word "track"
    track_lang=${track_data[1]}
    track_name=${track_data[2]}
    # skip SUP files with wrong naming scheme
    if [[ -n ${track_id//[0-9]/} ]]; then
        echo "'$track_id' is not a track id"
        continue
    fi
    if [[ ${#track_lang} -lt 2 ]] || [[ ${#track_lang} -gt 3 ]] || [[ -n "${track_lang//[a-zA-Z]/}" ]]; then
        echo "'$track_lang' is not a track lang"
        continue
    fi
    if [[ -n ${track_name//[a-zA-Z \']/} ]]; then
        echo "'$track_name' is not a track name"
        continue
    fi
    # set Plex compatible filename (https://support.plex.tv/articles/200471133-adding-local-subtitles-to-your-media/#toc-3)
    sup_filename_new="${sup_dirname}/${sup_filename}.${track_lang}.forced.sup"
    # determine by track name
    # the language check is grouped, otherwise sub_langs="all" would rename every track as forced
    if [[ $track_name == "forced" ]] && { [[ $sub_langs == "all" ]] || [[ $sub_langs == *"$track_lang"* ]]; }; then
        forced_found=true
        mv "$sup_path" "$sup_filename_new"
        echo "'$sup_path' has been renamed to '$sup_filename_new'"
        continue
    fi
    # skip subtitle tracks that already have names (like "Regular", "SDH", etc.)
    if [[ $track_name != "und" ]]; then
        if [[ $preserve_sup_files == "false" ]]; then
            rm -rf "$sup_path"
            echo "'$sup_path' has been deleted"
        fi
        continue
    fi
    # determine by filesize
    filesize=$(stat -c%s "$sup_path")
    if [ $sub_forced_max_size -ge $filesize ]; then
        forced_found=true
        echo "'$sup_path' is small enough to be a forced subtitle"
        mv "$sup_path" "$sup_filename_new"
        # cp --backup "$sup_path" "$sup_filename_new"
        echo "'$sup_path' has been renamed to '$sup_filename_new'"
        continue
    fi
    # delete all other exported subtitles
    if [[ $preserve_sup_files == "false" ]]; then
        rm -rf "$sup_path"
        echo "'$sup_path' has been deleted"
    fi
done
shopt -u nocasematch
shopt -u nullglob
# create empty SRT file if mkv does not contain at least one forced subtitle (by that it will be skipped in next turn)
if [[ $preserve_sup_files != "true" ]] && [[ $forced_found != "true" ]]; then
    empty_sr_filename="${mkv_path%.*}.noforced.srt"
    #empty_sr_filename="${movies_path}/${mkv_dirname}/${mkv_filename}.noforced.srt"
    echo "The empty SRT file '${empty_sr_filename}' will be created to skip MKV file '${mkv_path}' in the next turn as it does not contain forced subtitles."
    touch "$empty_sr_filename"
    exitus 0
fi
exitus 0
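
The naming scheme of the exported tracks, "<name>.track<ID>.<lang>.<trackname>.sup", can also be checked with a single regular expression instead of the suffix stripping and IFS splitting used in the script. This is only a sketch under my own assumptions: the function name parse_sup_name and the "|" separated output are made up for illustration and are not part of mkv2sub.

```bash
#!/bin/bash
# Hypothetical helper (not part of mkv2sub): split a file name of the form
# "<name>.track<ID>.<lang>.<trackname>.sup" into its parts.
# Prints "<name>|<ID>|<lang>|<trackname>" or returns 1 if the scheme does not match.
parse_sup_name() {
    local base=${1##*/} # strip the directory part
    # <name> may contain dots, <lang> has 2 or 3 letters, <trackname> has no dots
    local re='^(.+)\.track([0-9]+)\.([A-Za-z]{2,3})\.([^.]+)\.sup$'
    [[ $base =~ $re ]] || return 1
    printf '%s|%s|%s|%s\n' \
        "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
        "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
}

# Usage (the path is an example only):
# IFS='|' read -r name id lang track_name < <(parse_sup_name "/mnt/user/TV/My.Movie (2020).track3.ger.und.sup")
# echo "${name}.${lang}.forced.sup" # Plex compatible name as built by the script
```

Files that were already renamed to "<name>.<lang>.forced.sup" do not match the pattern and are rejected, which corresponds to the "wrong naming scheme" checks in the loop above.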

 

 

 


After exporting the subtitles as SUP files, I convert them to SRT as follows:

- open Subtitle Edit -> Tools -> Batch Convert

- search through Windows Explorer for "*.ger.forced.sup"

- copy & paste all files into the Batch Convert window

- set "Save in source file folder"

- set the format to "SubRip"

- press "Convert"
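
If the files are easier to reach from a shell than through Windows Explorer, the search step above can be sketched with find. The function name list_forced_sups is made up for illustration, and the language code is assumed to appear in the file name exactly as the script writes it (for example "ger").

```bash
#!/bin/bash
# Hypothetical helper: list all forced SUP files of one language below a folder.
# $1 = folder to scan
# $2 = language code as used in the file names (e.g. "ger")
list_forced_sups() {
    local root=$1 lang=$2
    find "$root" -type f -name "*.${lang}.forced.sup" | sort
}

# Usage (the path is an example only):
# list_forced_sups "/mnt/user/TV" ger
```

The printed paths can then be added to the Batch Convert window in the same way as the files found through the Explorer search.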

 

502679362_2021-06-0413_25_34.thumb.png.ee5927b9a029f3723cd5b1a2a7ccebec.png

 

Important: Subtitle Edit automatically uses the language that was last selected in its regular OCR tool. If you want to switch between languages, you first need to open one SUP file in Subtitle Edit and change the language there before using the batch tool:

image.png.3218d2fc66f8f166b3bb796d55455650.png

 

I have already opened an issue, as I would like to be able to select the language in the Batch Convert tool itself. Maybe you would like to post a comment there 😉

 

After the conversion is done, the folder looks like this (it contains both the SUP and the SRT file):

image.png.0b325f0a55f1fd5ae64ab8f9be4dcd4b.png

 

Now I search for all *.sup files and delete them.
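
The cleanup can also be sketched as a small shell function. Note that this is a more cautious variant than deleting every *.sup file: it only removes a SUP file if an SRT file with the same base name exists next to it, which assumes that Subtitle Edit keeps the file name and only changes the extension. The function name delete_converted_sups and the "--dry-run" switch are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: delete every *.sup below a folder that already has
# a converted *.srt with the same base name next to it.
# $1 = folder to scan
# $2 = optional "--dry-run" to only print what would be deleted
delete_converted_sups() {
    local root=$1 mode=${2:-}
    local sup
    # -print0 / read -d '' keeps file names with spaces or line breaks intact
    while IFS= read -r -d '' sup; do
        # keep the SUP if the matching SRT does not exist (yet)
        [[ -f "${sup%.sup}.srt" ]] || continue
        if [[ $mode == "--dry-run" ]]; then
            echo "'$sup' would be deleted"
        else
            rm -f -- "$sup" && echo "'$sup' has been deleted"
        fi
    done < <(find "$root" -type f -name '*.sup' -print0)
}

# Usage (the path is an example only):
# delete_converted_sups "/mnt/user/TV" --dry-run
# delete_converted_sups "/mnt/user/TV"
```

Running it with "--dry-run" first shows which files are affected before anything is removed.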

 

Plex automatically selects external SRT files if they use the correct filename scheme:

2001519786_2021-06-0413_02_04.png.e915ef4f4bd98de325dbeba21519565b.png

 

 

Note: OCR is never perfect. If you want perfect subtitles, you need to open each SUP file manually, check the detected words and correct them. I have only described the fastest method.
