Automated Transcription – Overview

Transana offers several choices for automated transcription. As of Transana 5.23, these options include Faster Whisper and Speechmatics. There are several factors to consider in choosing which option is best for you.

Regardless of which option(s) you choose, you should check the accuracy of all automated transcripts and translations before using them for analysis. Automated transcripts can be inaccurate, and an error can sometimes suggest the opposite of what was actually said. Automated transcription can save time over fully manual transcription, but it does not remove the need for careful review.

Data Security

The first issue to consider is that of data security and confidentiality.

Transana supports two types of Automated Transcription tools, Embedded and Server-based.

Embedded tools run entirely on your computer; your data remains on your computer. You can use automated transcription without any risk of that process making your data available to others. Most server-based tools take data security seriously, but Institutional Review Boards (IRBs) tend to be extremely cautious about data custody and about the security risks introduced when data is transmitted to and stored on a server-based system during and after automated transcription. Embedded systems avoid those issues entirely. The more sensitive your human-subjects data is, the stronger the argument for using an embedded system.

Server-based tools require that the audio portion of your data be sent to a remote server for processing. Automated transcription companies take great pains to protect customer data, but researchers need to read about the data security steps taken and receive IRB or other ethical committee approval before submitting sensitive research data for server-based automated transcription. As a general rule, server-based systems are able to devote more powerful computer processing resources to the transcription process and tend to produce automated transcripts more quickly than embedded systems can.

When working with video data, Transana separates the audio track and only submits audio to server-based automated transcription services. While this eliminates the possibility of visual identification of participants, auditory identification would still be possible if the data were compromised.

Within Transana, Faster Whisper is an embedded tool, while Speechmatics is a server-based tool.

Note that server-based systems generally require payment for their services. Faster Whisper is provided as part of Transana and requires no additional payment for use.

Transcription Speed and Accuracy

Automated transcription tools offer multiple options. These options usually represent different balances between speed and accuracy. This is a complex topic that will be addressed only briefly here. For more detail on the issue of accuracy, please see this Transana Blog post. It is important to note that accuracy is highly dependent on the characteristics of the data being transcribed.

Speechmatics performed solidly, with transcription typically taking less than half the length of the media file in my tests. The Standard model was ranked 5th of 12 for accuracy, and the Enhanced model was the most accurate of all the models I tested.

Faster Whisper presents a more complicated picture. Faster Whisper offers seven models to choose from, called Tiny, Base, Small, Medium, Large, Large-v1, and Large-v2. The three Large models came in 3rd, 4th, and 2nd in the quality tests, all close behind Speechmatics’ Enhanced model. In my preliminary tests, transcription speed varied considerably depending on both the model selected and the capacity of the computer running the software.

  • The Tiny and Base models were very fast, taking between 7% and 20% of the length of the file being processed.
  • The Small model took between 30% and 60% of the file’s length for processing.
  • The Medium model took between 75% and 90% of the media’s time.
  • The three Large models varied wildly in how long they took to create a transcript, from “merely” 150% of the media’s length to a whopping 15 times the media’s length. Processing times using these models were inconsistent even when processing the same media file with the same model on the same computer.
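
To turn the ranges above into a rough planning figure, multiply the length of your media file by the low and high ends of the range for the model you plan to use. The following Python sketch does that arithmetic. It is illustrative only and is not part of Transana: the names TIME_RATIOS and estimate_minutes are made up for this example, the single "large" entry stands in for all three Large models, and the ratios are simply the preliminary test results listed above, so your own hardware may fall outside them.

```python
# Processing-time ranges from the list above, expressed as a fraction of the
# media file's length: (low, high). These come from preliminary tests and are
# assumptions for planning, not guarantees.
TIME_RATIOS = {
    "tiny": (0.07, 0.20),
    "base": (0.07, 0.20),
    "small": (0.30, 0.60),
    "medium": (0.75, 0.90),
    "large": (1.50, 15.0),  # stands in for Large, Large-v1, and Large-v2
}


def estimate_minutes(media_minutes, model):
    """Return (low, high) estimated processing time in minutes."""
    low, high = TIME_RATIOS[model.lower()]
    return (media_minutes * low, media_minutes * high)


if __name__ == "__main__":
    # Print the estimated range for a 60-minute recording with each model.
    for name in TIME_RATIOS:
        low, high = estimate_minutes(60, name)
        print(f"{name:<6} {low:6.1f} to {high:6.1f} minutes")
```

For a 60-minute recording, this works out to roughly 18 to 36 minutes for the Small model and anywhere from 90 minutes to 15 hours for the Large models.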

Because Faster Whisper runs locally on your computer, it runs much more slowly than the server-based options and is greatly influenced by the hardware you have. I tested using two older Windows computers (4 to 6 years old) and two newer Macs (a 3-year-old M1 and a 1-year-old M2). The older Windows computers consistently performed better than the newer Apple hardware, sometimes a lot better. Only one time score on the Macs beat any of the Windows scores. The M2 Mac performed notably better than the M1 Mac.

Supported Languages

The two tools provided by Transana offer very different language options. In a few instances, they offer regional variations within a language.

Language        Faster Whisper    Speechmatics
Afrikaans       Yes               No
Albanian        Yes               No
Amharic         Yes               No
Arabic          Yes               Yes
Armenian        Yes               No
Assamese        Yes               No
Azerbaijani     Yes               No
Bashkir         Yes               Yes
Basque          Yes               Yes
Belarusian      Yes               Yes
Bengali         Yes               No
Bosnian         Yes               No
Breton          Yes               No
Bulgarian       Yes               Yes
Burmese         Yes               No
Catalan         Yes               Yes
Chinese         Yes               Cantonese, Simplified Mandarin, Traditional Mandarin
Croatian        Yes               Yes
Czech           Yes               Yes
Danish          Yes               Yes
Dutch           Yes               Yes
English         Yes               Australian, British, US
Esperanto       No                Yes
Estonian        Yes               Yes
Faroese         Yes               No
Finnish         Yes               Yes
Flemish         No                No
French          Yes               Yes
Galician        Yes               Yes
Georgian        Yes               No
German          Yes               Yes
Greek           Yes               Yes
Gujarati        Yes               No
Haitian         Yes               No
Hausa           Yes               No
Hawaiian        Yes               No
Hebrew          Yes               No
Hindi           Yes               Yes
Hungarian       Yes               Yes
Icelandic       Yes               No
Indonesian      Yes               Yes
Interlingua     No                Yes
Italian         Yes               Yes
Japanese        Yes               Yes
Javanese        Yes               No
Kannada         Yes               No
Kazakh          Yes               No
Central Khmer   Yes               No
Korean          Yes               Yes
Latin           Yes               No
Latvian         Yes               Yes
Lao             Yes               No
Lingala         Yes               No
Lithuanian      Yes               Yes
Luxembourgish   Yes               No
Macedonian      Yes               No
Malagasy        Yes               No
Malay           Yes               Yes
Malayalam       Yes               No
Maltese         Yes               No
Maori           Yes               No
Marathi         Yes               Yes
Mongolian       Yes               Yes
Nepali          Yes               No
Norwegian       Bokmål, Nynorsk   Bokmål
Occitan         Yes               No
Pashto          Yes               No
Persian         Yes               No
Polish          Yes               Yes
Portuguese      Yes               Yes
Punjabi         Yes               No
Romanian        Yes               Yes
Russian         Yes               Yes
Sanskrit        Yes               No
Serbian         Yes               No
Shona           Yes               No
Sindhi          Yes               No
Sinhala         Yes               No
Slovak          Yes               Yes
Slovenian       Yes               Yes
Somali          Yes               No
Spanish         Yes               Yes
Sundanese       Yes               No
Swahili         Yes               No
Swedish         Yes               Yes
Tagalog         Yes               No
Tajik           Yes               No
Tamil           Yes               Yes
Tatar           Yes               No
Telugu          Yes               No
Thai            Yes               Yes
Tibetan         Yes               No
Turkish         Yes               Yes
Turkmen         Yes               No
Ukrainian       Yes               Yes
Urdu            Yes               No
Uyghur          No                Yes
Uzbek           Yes               No
Vietnamese      Yes               Yes
Welsh           Yes               Yes
Yiddish         Yes               No
Yoruba          Yes               No

This list is subject to change, but is believed to be accurate as of Transana 5.23.
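
If you work with several languages, it can help to keep the relevant rows of the table in a small lookup so you can see at a glance which tool covers each recording. The Python sketch below is one way to do that. It is not part of Transana: SUPPORT, TOOLS, and tools_for are hypothetical names, only a handful of rows are copied in as an example, and regional variants (such as the English and Chinese options listed for Speechmatics) are reduced to a simple yes/no.

```python
# A few rows copied from the table above:
# language -> (Faster Whisper, Speechmatics), where True means "Yes".
# This is only an excerpt; add the rows you need from the full table.
SUPPORT = {
    "Arabic": (True, True),
    "Esperanto": (False, True),
    "Flemish": (False, False),
    "Hebrew": (True, False),
    "Welsh": (True, True),
}
TOOLS = ("Faster Whisper", "Speechmatics")


def tools_for(language):
    """Return the tools the excerpt lists as supporting a language."""
    flags = SUPPORT.get(language.strip().title())
    if flags is None:
        raise KeyError(f"{language!r} is not in this excerpt of the table")
    return [tool for tool, supported in zip(TOOLS, flags) if supported]


if __name__ == "__main__":
    for name in SUPPORT:
        print(name, "->", ", ".join(tools_for(name)) or "neither tool")
```

Because the list is subject to change, treat any such copy as a snapshot and re-check it against the current table before relying on it.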

Translation Capacity

At this time (Transana 5.10), Faster Whisper can translate into English from any of the non-English languages it supports. Please check the accuracy of all automated translations before using them in analysis.