mgutt Posted June 4, 2021

The following script recursively scans a folder for MKV files and exports all subtitles as SUP files. Each execution of the script processes one MKV file. The original MKV files are left untouched. All SUP files smaller than 3 MB are treated as forced subtitles and named accordingly.

To do:
- support for DVD subtitles

#!/bin/bash
# #####################################
# mkv2sub v0.6
#
# Notes:
# mkv2sub automatically exports all subtitles of all MKV files in a specific folder.
# After that it determines the forced subtitles. MKVs without (compatible) subtitles will be skipped.
#
# Changelog:
# 0.6
# - scan files for mkv files recursively
# - renamed from mkv2sup to mkv2sub
# 0.5
# - "bin/bash" added to the head of the script to force the usage of the correct interpreter
# 0.4
# - check mkv modification filetime to ensure it is not currently being written by another app
# 0.3
# - Docker is now optional
# - No exported SUP will be deleted if the new option <preserve_sup_files> is enabled
# - Check if mkvtoolnix docker container is in use by other bash script before killing it
# 0.2
# - Bug fix: Changed some exit status codes
# - Bug fix: Some file names containing dots were cut
# - Bug fix: Now all <sub_langs> of forced subtitles are renamed and not only those with <default_lang>
# - Bug fix: Named SUP files that were only skipped, will now be deleted, too
# 0.1
# - first release
#
# Todo:
# - update subtitle track names in mkv file
# - set forced track as default in mkv file
# - how to solve doubles (two forced subtitle tracks in the same language; at the moment the script tries to rename both, which fails because the target name already exists)
# - add support for DVDs S_VOBSUB subtitles
# - determine all SRT subtitles (Regular, SDH, etc.) by using word/char matching
# - while writing SUP files check if one contains the word "Forced" and is <default_lang>
# #####################################
#
# ######### Settings ##################
movies_path="/mnt/user/TV"
docker_config_path="/mnt/appdata/mkvtoolnix"
sub_langs="ger,eng,tur" # Use "all" to preserve all subtitle languages. Note: The first language is set as default.
sub_forced_max_size="3MB"
preserve_sup_files=false
# #####################################
#
# ######### Script ####################

# make script race condition safe
if [[ -d "/tmp/${0///}" ]] || ! mkdir "/tmp/${0///}"; then exit 1; fi; trap 'rmdir "/tmp/${0///}"' EXIT;

# check user settings
movies_path=$([[ "${movies_path: -1}" == "/" ]] && echo "${movies_path%?}" || echo "$movies_path")
docker_config_path=$([[ "${docker_config_path: -1}" != "/" ]] && echo "${docker_config_path}/" || echo "$docker_config_path")
default_lang="${sub_langs:0:3}" # first language is used as default language
sub_forced_max_size="${sub_forced_max_size//[!0-9.]/}" # float filtering (https://stackoverflow.com/a/19724571/318765)
sub_forced_max_size=$(awk "BEGIN { print $sub_forced_max_size*1000000}") # convert MB to Bytes
mkv_path=""

function exitus() {
    exit_status=$1
    # check if container exists
    if [[ -x "$(command -v docker)" ]] && [[ "$(docker ps -q -f name=mkvtoolnix_mkv2sub)" ]]; then
        # stop container only if it is not in use (by other shell script)
        mkvtoolnix_cpu_usage="$(docker stats mkvtoolnix_mkv2sub --no-stream --format "{{.CPUPerc}}")"
        # if [[ ${mkvtoolnix_cpu_usage%.*} -lt 1 ]]; then
        #     we do not stop the container as our script is not race-condition safe!
        #     echo "Stop mkvtoolnix container"
        #     docker stop mkvtoolnix_mkv2sub
        #     docker rm mkvtoolnix_mkv2sub
        # fi
    fi
    exit $exit_status
}

function mkv_next() {
    path=$1
    echo "Parsing $path ..."
    for file in "$path"/*; do
        # mkv has been already found
        if [[ -n $mkv_path ]]; then
            return
        # regular file
        elif [ -f "$file" ]; then
            # file extension must be .mkv
            mkv_basename=$(basename -- "$file") # https://stackoverflow.com/a/965072/318765
            file_extension="${mkv_basename##*.}"
            mkv_filename=${mkv_basename%.*}
            #echo "mkv_filename is $mkv_filename"
            if [[ $file_extension != "mkv" ]]; then
                continue
            fi
            # skip this mkv file if there is already a sup subtitle file with the same name
            file_dirname="$(dirname "$file")"
            #echo "file_dirname is $file_dirname"
            for sup_path in "$file_dirname"/*.sup; do
                sup_basename=$(basename "$sup_path")
                if [[ $sup_basename == *"$mkv_filename."* ]]; then
                    # skip this movie dir because sup subtitle has been found
                    continue 2
                fi
            done
            # skip this mkv file if there is already a srt subtitle file with the same name
            for srt_path in "$file_dirname"/*.srt; do
                srt_basename=$(basename "$srt_path")
                if [[ $srt_basename == *"$mkv_filename."* ]]; then
                    # skip this movie dir because srt subtitle has been found
                    continue 2
                fi
            done
            # skip mkv file if it is not older than 2 minutes
            file_time=$(stat -c %Y "$file") # file modification time
            file_time=$(($file_time+120)) # the last modification should be at least 2 minutes ago
            current_time=$(date +%s) # actual timestamp
            if [[ $file_time -gt $current_time ]]; then
                continue
            fi
            mkv_path=$file
            mkv_dirname=$(dirname "$file")
            mkv_dirname="${mkv_dirname#"$movies_path"}" # remove <movies_path> from path
            mkv_dirname="${mkv_dirname#/}" # remove first slash
            docker_mkv_path="/storage/${mkv_dirname}/${mkv_basename}"
            echo "Found $mkv_path"
            break
        # dir
        else
            mkv_next "$file"
        fi
    done
}

function mkv_getinfo() {
    # check if mkvtoolnix exists
    if [[ -x "$(command -v mkvmerge)" ]]; then
        echo "mkvtoolnix will be used to fetch tracks information"
        mkv_info="$(mkvmerge -J "$mkv_path")"
        return # "return" only accepts a numeric status, the result is passed through $mkv_info
    # check if docker exists
    elif [[ -x "$(command -v docker)" ]]; then
        echo "Docker will be used to fetch tracks information"
        # check if mkvtoolnix container exists
        if [[ ! "$(docker ps -q -f name=mkvtoolnix_mkv2sub)" ]]; then # https://stackoverflow.com/a/38576401/318765
            # check for blocking container
            if [[ "$(docker ps -aq -f status=exited -f name=mkvtoolnix_mkv2sub)" ]]; then
                docker rm mkvtoolnix_mkv2sub
            fi
            echo "mkvtoolnix container needs to be started"
            # start mkvtoolnix container
            docker_options=(
                run -d
                --name=mkvtoolnix_mkv2sub
                -e TZ=Europe/Berlin
                -v "${docker_config_path}mkvtoolnix_mkv2sub:/config:rw"
                -v "${movies_path}:/storage:rw"
                jlesage/mkvtoolnix
            )
            echo "docker ${docker_options[@]}"
            docker "${docker_options[@]}"
        fi
        mkv_info="$(docker exec mkvtoolnix_mkv2sub /usr/bin/mkvmerge -J "$docker_mkv_path")"
        return
    fi
    echo "mkvtoolnix and docker do not exist!"
    exitus 1
}

function mkv_extract() {
    # check if mkvtoolnix exists
    if [[ -x "$(command -v mkvmerge)" ]]; then
        echo "mkvtoolnix will be used to extract tracks"
        mkv_info="$(mkvextract "$mkv_path" "${mkvextract_options[@]}")"
        return
    # check if docker exists
    elif [[ -x "$(command -v docker)" ]]; then
        echo "mkvtoolnix@docker will be used to extract tracks"
        # check if mkvtoolnix container exists
        if [[ ! "$(docker ps -q -f name=mkvtoolnix_mkv2sub)" ]]; then # https://stackoverflow.com/a/38576401/318765
            # check for blocking container (remove it only if an exited one exists, same as in mkv_getinfo)
            if [[ "$(docker ps -aq -f status=exited -f name=mkvtoolnix_mkv2sub)" ]]; then
                docker rm mkvtoolnix_mkv2sub
            fi
            echo "mkvtoolnix container needs to be started"
            # start mkvtoolnix container
            docker_options=(
                run -d
                --name=mkvtoolnix_mkv2sub
                -e TZ=Europe/Berlin
                -v "${docker_config_path}mkvtoolnix_mkv2sub:/config:rw"
                -v "${movies_path}:/storage:rw"
                jlesage/mkvtoolnix
            )
            echo "docker ${docker_options[@]}"
            docker "${docker_options[@]}"
        fi
        echo "docker exec mkvtoolnix_mkv2sub /usr/bin/mkvextract \"$docker_mkv_path\" ${mkvextract_options[@]}"
        docker exec mkvtoolnix_mkv2sub /usr/bin/mkvextract "$docker_mkv_path" "${mkvextract_options[@]}"
        return
    fi
    echo "mkvtoolnix and docker do not exist!"
    exitus 1
}

# get next mkv file
shopt -s nullglob # avoid empty directory errors (https://unix.stackexchange.com/questions/56051/avoiding-errors-due-to-unexpanded-asterisk)
mkv_next "$movies_path" # fills $mkv_path and $docker_mkv_path
shopt -u nullglob # it is important to reset this setting (https://unix.stackexchange.com/questions/534858/why-does-shopt-s-nullglob-remove-a-string-with-question-mark-in-an-array-elemen)

# no mkv file found
if [[ -z $docker_mkv_path ]]; then
    echo "No mkv files found or all subtitles have been exported!"
    exitus 0
fi

mkv_getinfo # uses $mkv_path, fills $mkv_info
if [[ -z $mkv_info ]]; then
    echo "Error while fetching tracks information with mkvmerge"
    exitus 1
fi
echo "Information about all tracks has been obtained."

# parse info
sub_track_ids=();
track_langs=();
track_names=();
track_codec_ids=();
while read -r line ; do
    echo "$line"
    # Note: we did not use "jq -r" to parse JSON as it needs installation
    track_codec_name=$(echo "$line" | grep -oP '^.*?(?=\")')
    track_id=$(echo "$line" | grep -oP '(?<="id": )[0-9]+')
    track_bits=$(echo "$line" | grep -oP '(?<="audio_bits_per_sample": )[0-9]+')
    track_channels=$(echo "$line" | grep -oP '(?<="audio_channels": )[0-9]+')
    track_codec_id=$(echo "$line" | grep -oP '(?<="codec_id": ").*?[^\\](?=\",)')
    track_lang=$(echo "$line" | grep -oP '(?<="language": ")[a-z]+')
    track_name=$(echo "$line" | grep -oP '(?<="track_name": ").*?[^\\](?=\",)') # most flexible way of getting a JSON value (https://stackoverflow.com/a/6852427/318765)
    track_default=$(echo "$line" | grep -oP '(?<="default_track": )(true|false)')
    track_forced=$(echo "$line" | grep -oP '(?<="forced_track": )(true|false)')
    track_type=$(echo "$line" | grep -oP '(?<=")[a-z]+$')
    # collect track langs
    if [[ -n $track_lang ]]; then
        track_langs[$track_id]=$track_lang
    else
        track_langs[$track_id]='und' # und = undetermined
    fi
    # collect track names
    if [[ -n $track_name ]]; then
        track_names[$track_id]=$track_name
    else
        track_names[$track_id]='und'
    fi
    # collect codec ids
    if [[ -n $track_codec_id ]]; then
        track_codec_ids[$track_id]=$track_codec_id
    else
        track_codec_ids[$track_id]='und'
    fi
    # collect subtitles in preferred languages
    if [[ $track_type == "subtitles" ]] && [[ $track_codec_id == "S_HDMV/PGS" ]]; then
        if [[ $sub_langs == "all" ]] || [[ $sub_langs == *"$track_lang"* ]]; then
            sub_track_ids+=("$track_id")
        fi
    fi
done < <(echo "$mkv_info" |
    tr -d '\n' | # we need to remove line breaks with "tr" to force grep to return one-liners
    grep -oP '(?<=codec": ").*?"type": "[a-z]+') # Regex is faster than looping through all lines

# create empty srt file if mkv file does not contain any subtitles (by that it will be skipped in next turn)
if [[ ${#sub_track_ids[@]} -eq 0 ]]; then
    empty_srt_filename="${mkv_path%.*}.nosubs.srt"
    #empty_srt_filename="${movies_path}/${mkv_dirname}/${mkv_filename}.nosubs.srt"
    echo "The empty SRT file '${empty_srt_filename}' will be created to skip MKV file '${mkv_path}' in the next turn as it does not contain any (compatible) subtitles."
    touch "$empty_srt_filename"
    exitus 0
fi

# build mkvextract export parameter
# a native mkvextract writes to the host path, the container only knows the /storage mount
if [[ -x "$(command -v mkvmerge)" ]]; then
    sup_target_dir="${movies_path}/${mkv_dirname}"
else
    sup_target_dir="/storage/${mkv_dirname}"
fi
mkvextract_options=(tracks)
for track_id in "${sub_track_ids[@]}"; do
    # file naming scheme "Movie_Name.[Language_Code].forced.ext" adopted from Plex (https://support.plex.tv/articles/200471133-adding-local-subtitles-to-your-media/#toc-3)
    mkvextract_options+=("${track_id}:${sup_target_dir}/${mkv_filename}.track${track_id}.${track_langs[$track_id]}.${track_names[$track_id]}.sup")
done

# export all subtitles
mkv_extract # uses mkv_path, docker_mkv_path, mkvextract_options, movies_path
echo "Successfully extracted all subtitles"

# determine forced subtitle
shopt -s nullglob
shopt -s nocasematch # insensitive string comparison (https://stackoverflow.com/a/14138301/318765)
forced_found=false
for sup_path in "${movies_path}/${mkv_dirname}/${mkv_filename}"*.sup; do
    # get path parts
    sup_dirname=$(dirname "$sup_path")
    sup_basename=$(basename "$sup_path")
    sup_filename=${sup_basename%.*.*.*.*} # (filename).track[0-9].<lang>.<name>.sup
    sup_extension=${sup_basename/#"$sup_filename"./} # filename.(track[0-9].<lang>.<name>.sup)
    # fetch track data through filename
    IFS='.' # set internal field separator to dot (default is whitespace)
    read -ra track_data <<< "$sup_extension" # explode to array (https://stackoverflow.com/a/918931/318765)
    unset IFS; # unset internal field separator
    track_id=${track_data[0]}
    track_id=${track_id/track/} # remove the word "track"
    track_lang=${track_data[1]}
    track_name=${track_data[2]}
    # skip SUP files with wrong naming scheme
    if [[ -n ${track_id//[0-9]/} ]]; then
        echo "'$track_id' is not a track id"
        continue
    fi
    if [[ ${#track_lang} -lt 2 ]] || [[ ${#track_lang} -gt 3 ]] || [[ -n "${track_lang//[a-zA-Z]/}" ]]; then
        echo "'$track_lang' is not a track lang"
        continue
    fi
    if [[ -n ${track_name//[a-zA-Z \']/} ]]; then
        echo "'$track_name' is not a track name"
        continue
    fi
    # set Plex compatible filename (https://support.plex.tv/articles/200471133-adding-local-subtitles-to-your-media/#toc-3)
    sup_filename_new="${sup_dirname}/${sup_filename}.${track_lang}.forced.sup"
    # determine by track name
    # (the language test is grouped, otherwise <sub_langs>="all" would mark every track as forced)
    if [[ $track_name == "forced" ]] && { [[ $sub_langs == *"$track_lang"* ]] || [[ $sub_langs == "all" ]]; }; then
        forced_found=true
        mv "$sup_path" "$sup_filename_new"
        echo "'$sup_path' has been renamed to '$sup_filename_new'"
        continue
    fi
    # skip subtitle tracks that already have names (like "Regular", "SDH", etc.)
    if [[ $track_name != "und" ]]; then
        if [[ $preserve_sup_files == "false" ]]; then
            rm -rf "$sup_path"
            echo "'$sup_path' has been deleted"
        fi
        continue
    fi
    # determine by filesize
    filesize=$(stat -c%s "$sup_path")
    if [ $sub_forced_max_size -ge $filesize ]; then
        forced_found=true
        echo "'$sup_path' is small enough to be a forced subtitle"
        mv "$sup_path" "$sup_filename_new"
        # cp --backup "$sup_path" "$sup_filename_new"
        echo "'$sup_path' has been renamed to '$sup_filename_new'"
        continue
    fi
    # delete all other exported subtitles
    if [[ $preserve_sup_files == "false" ]]; then
        rm -rf "$sup_path"
        echo "'$sup_path' has been deleted"
    fi
done
shopt -u nocasematch
shopt -u nullglob

# create empty srt file if mkv does not contain at least one forced subtitle (by that it will be skipped in next turn)
if [[ $preserve_sup_files != "true" ]] && [[ $forced_found != "true" ]]; then
    empty_sr_filename="${mkv_path%.*}.noforced.srt"
    #empty_sr_filename="${movies_path}/${mkv_dirname}/${mkv_filename}.noforced.srt"
    echo "The empty SRT file '${empty_sr_filename}' will be created to skip MKV file '${mkv_path}' in the next turn as it does not contain forced subtitles."
    touch "$empty_sr_filename"
    exitus 0
fi

exitus 0
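Because every execution handles only one MKV file, a small wrapper can call the script repeatedly until it reports that nothing is left. The following is only a sketch under assumptions: the function name `run_until_done`, the run limit, and the script path in the usage comment are hypothetical, and the stop condition relies on the "No mkv files found ..." message that the script prints.

```shell
#!/bin/bash
# Hypothetical helper (not part of mkv2sub): re-run a command until it prints
# the "nothing left" message or fails.
# $1 = maximum number of runs (assumed safety limit), remaining args = command
run_until_done() {
    local max_runs done_message output run
    max_runs=$1
    shift
    # message copied from the script's own output
    done_message="No mkv files found or all subtitles have been exported!"
    run=1
    while [ "$run" -le "$max_runs" ]; do
        # stop immediately if the command exits with a non-zero status
        if ! output=$("$@" 2>&1); then
            printf '%s\n' "$output"
            return 1
        fi
        printf '%s\n' "$output"
        case $output in
            *"$done_message"*) return 0 ;;
        esac
        run=$((run + 1))
    done
    echo "Stopped after $max_runs runs without reaching the end" >&2
    return 2
}

# Usage (the path is an assumption, adjust it to where the script is stored):
# run_until_done 500 /path/to/mkv2sub.sh
```

The run limit only guards against an endless loop, for example when a file can never be processed. Scheduling the script itself at a short interval achieves the same result without a wrapper.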
mgutt (Author) Posted June 4, 2021

After exporting the subtitles as SUP files, I convert them to SRT as follows:
- open Subtitle Edit -> Tools -> Batch Convert
- search through Windows Explorer for "*.ger.forced.sup"
- copy & paste all files into the Batch Convert window
- set "Save in source file folder"
- set "SubRip" as the output format
- press "Convert"

Important: Subtitle Edit automatically uses the language that was last used in its regular OCR tool. If you want to switch between languages, you first need to open one SUP file in Subtitle Edit and change the language before using the batch tool. I already opened an issue, as I would like to be able to select the language in the Batch Convert tool itself. Maybe you would like to post a comment there 😉

After the conversion is done, the folder contains both the SUP and the SRT file. Now I search for all *.sup files and delete them. Plex automatically selects external SRT files if they use the correct filename scheme.

Note: OCR is never perfect. If you want perfect subtitles, you need to open each SUP file manually, check the detected words and correct them. I described only the fastest method.
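For those who run the cleanup on the server instead of in Windows Explorer, the deletion step could be sketched as below. It assumes that Subtitle Edit writes the SRT next to the SUP with the same base name (for example "Movie.ger.forced.srt" beside "Movie.ger.forced.sup"); the function name `delete_converted_sups` is hypothetical. A SUP file is removed only if a non-empty SRT exists, so files that were not converted yet are kept.

```shell
#!/bin/bash
# Hypothetical helper: delete "*.forced.sup" files below a folder, but only
# when a non-empty SRT file with the same base name exists beside them.
# $1 = folder to scan recursively
delete_converted_sups() {
    find "$1" -type f -name '*.forced.sup' -exec sh -c '
        for sup do
            srt="${sup%.sup}.srt"   # assumed name of the converted file
            if [ -s "$srt" ]; then
                rm -f -- "$sup" && echo "Deleted $sup"
            else
                echo "Kept $sup (no converted SRT found)"
            fi
        done' sh {} +
}

# Usage (the path matches <movies_path> from the script settings):
# delete_converted_sups "/mnt/user/TV"
```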
tkato Posted November 21, 2021

I'm confused. When I run the script, it processes one file in the path and stops. I have to run it again and again for it to move on to the other files.
i-B4se Posted March 26, 2023

Hey @mgutt, I have the same problem as tkato. I still have some older Blu-rays without subtitles and I would like to add them, but I have to run the script again after each folder. Is there any solution for this?