Automated Transcription – Overview
Transana offers several choices for automated transcription. As of Transana 5.23, these options include Faster Whisper and Speechmatics. There are several factors to consider in choosing which option is best for you.
Regardless of which option(s) you choose, you should check the accuracy of all automated transcripts and translations before using them for analysis. Automated transcripts can be inaccurate and sometimes suggest the opposite of what was actually said. Automated transcription can save time over fully manual transcription, but it does get things wrong.
Data Security
The first issue to consider is that of data security and confidentiality.
Transana supports two types of Automated Transcription tools, Embedded and Server-based.
Embedded tools run entirely on your computer; your data remains on your computer. You can use automated transcription without any risk of that process making your data available to others. Most server-based tools take data security seriously, but Institutional Review Boards (IRBs) tend to be extremely cautious about data custody and about the security risks introduced when data is transmitted and stored during and after automated transcription processing on a server-based system. Embedded systems avoid those issues entirely. The more sensitive your human-subjects data is, the stronger the argument for using an embedded system.
Server-based tools require that the audio portion of your data be sent to a remote server for processing. Automated transcription companies take great pains to protect customer data, but researchers need to read about the data security steps taken and receive IRB or other ethical committee approval before submitting sensitive research data for server-based automated transcription. As a general rule, server-based systems are able to devote more powerful computer processing resources to the transcription process and tend to produce automated transcripts more quickly than embedded systems can.
When working with video data, Transana separates the audio track and only submits audio to server-based automated transcription services. While this eliminates the possibility of visual identification of participants, auditory identification would still be possible if the data were compromised.
Within Transana, Faster Whisper is an embedded tool, while Speechmatics is a server-based tool.
Note that server-based systems generally require payment for their services. Faster Whisper is provided as part of Transana and requires no additional payment for use.
Transcription Speed and Accuracy
Automated transcription tools offer multiple options. These options usually represent different balances between speed and accuracy. This is a complex topic that will be addressed only briefly here. For more detail on the issue of accuracy, please see this Transana Blog post. It is important to note that accuracy is highly dependent on the characteristics of the data being transcribed.
Speechmatics performed solidly, with transcription typically taking less than half the length of the media file in my tests. The Standard model was ranked 5th of 12 for accuracy, and the Enhanced model was the most accurate of all the models I tested.
Faster Whisper presents a more complicated picture. Faster Whisper offers seven models to choose from, called Tiny, Base, Small, Medium, Large, Large-v1, and Large-v2. The three Large models came in 3rd, 4th, and 2nd in the quality tests, all close behind Speechmatics’ Enhanced model. In my preliminary tests, transcription speed varied a lot based on both the model selected and the capacity of the computer running the software.
- The Tiny and Base models were very fast, taking between 7% and 20% of the length of the file being processed.
- The Small model took between 30% and 60% of the file’s length for processing.
- The Medium model took between 75% and 90% of the media’s time.
- The three Large models varied wildly in how long they took to create a transcript, from “merely” 150% of the media’s length to a whopping 15 times the media’s length. Processing times using these models were inconsistent even when processing the same media file with the same model on the same computer.
Because Faster Whisper runs locally on your computer, it runs much more slowly than the server-based options and is greatly influenced by the hardware you have. I tested using two older Windows computers (4 to 6 years old) and two newer Macs (a 3-year-old M1 and a 1-year-old M2). The older Windows computers consistently performed better than the newer Apple hardware, sometimes a lot better. Only one time score on the Macs beat any of the Windows scores. The M2 Mac performed notably better than the M1 Mac.
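To get a rough feel for what these percentages mean for a particular media file, the arithmetic can be sketched in a few lines of Python. This is only an illustration and is not part of Transana: the names FASTER_WHISPER_RATIOS and estimate_minutes are made up for this example, and the ratios are simply the preliminary test figures quoted above, so results on other hardware or other data may differ considerably.

```python
# Processing time as a fraction of media length, copied from the
# preliminary test figures quoted in the list above (assumed ranges,
# not guarantees). The three Large models share one very wide range.
FASTER_WHISPER_RATIOS = {
    "Tiny": (0.07, 0.20),
    "Base": (0.07, 0.20),
    "Small": (0.30, 0.60),
    "Medium": (0.75, 0.90),
    "Large": (1.50, 15.0),
    "Large-v1": (1.50, 15.0),
    "Large-v2": (1.50, 15.0),
}


def estimate_minutes(model, media_minutes):
    """Return the (shortest, longest) expected processing time in minutes."""
    low, high = FASTER_WHISPER_RATIOS[model]
    return (media_minutes * low, media_minutes * high)


if __name__ == "__main__":
    # Print the estimated range for a one-hour media file.
    for name in FASTER_WHISPER_RATIOS:
        shortest, longest = estimate_minutes(name, 60)
        print(f"{name:9s} {shortest:6.1f} to {longest:6.1f} minutes")
```

For a one-hour file, the Small model's range works out to roughly 18 to 36 minutes, while the Large models could take anywhere from 90 minutes to 15 hours.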
Supported Languages
The two tools provided by Transana offer very different language options. In a few instances, they offer regional variations within a language.
Language | Faster Whisper | Speechmatics |
---|---|---|
Afrikaans | Yes | No |
Albanian | Yes | No |
Amharic | Yes | No |
Arabic | Yes | Yes |
Armenian | Yes | No |
Assamese | Yes | No |
Azerbaijani | Yes | No |
Bashkir | Yes | Yes |
Basque | Yes | Yes |
Belarusian | Yes | Yes |
Bengali | Yes | No |
Bosnian | Yes | No |
Breton | Yes | No |
Bulgarian | Yes | Yes |
Burmese | Yes | No |
Catalan | Yes | Yes |
Chinese | Yes | Cantonese, Simplified Mandarin, Traditional Mandarin |
Croatian | Yes | Yes |
Czech | Yes | Yes |
Danish | Yes | Yes |
Dutch | Yes | Yes |
English | Yes | Australian, British, US |
Esperanto | No | Yes |
Estonian | Yes | Yes |
Faroese | Yes | No |
Finnish | Yes | Yes |
Flemish | No | No |
French | Yes | Yes |
Galician | Yes | Yes |
Georgian | Yes | No |
German | Yes | Yes |
Greek | Yes | Yes |
Gujarati | Yes | No |
Haitian | Yes | No |
Hausa | Yes | No |
Hawaiian | Yes | No |
Hebrew | Yes | No |
Hindi | Yes | Yes |
Hungarian | Yes | Yes |
Icelandic | Yes | No |
Indonesian | Yes | Yes |
Interlingua | No | Yes |
Italian | Yes | Yes |
Japanese | Yes | Yes |
Javanese | Yes | No |
Kannada | Yes | No |
Kazakh | Yes | No |
Central Khmer | Yes | No |
Korean | Yes | Yes |
Latin | Yes | No |
Latvian | Yes | Yes |
Lao | Yes | No |
Lingala | Yes | No |
Lithuanian | Yes | Yes |
Luxembourgish | Yes | No |
Macedonian | Yes | No |
Malagasy | Yes | No |
Malay | Yes | Yes |
Malayalam | Yes | No |
Maltese | Yes | No |
Maori | Yes | No |
Marathi | Yes | Yes |
Mongolian | Yes | Yes |
Nepali | Yes | No |
Norwegian | Bokmal, Nynorsk | Bokmal |
Occitan | Yes | No |
Pashto | Yes | No |
Persian | Yes | No |
Polish | Yes | Yes |
Portuguese | Yes | Yes |
Punjabi | Yes | No |
Romanian | Yes | Yes |
Russian | Yes | Yes |
Sanskrit | Yes | No |
Serbian | Yes | No |
Shona | Yes | No |
Sindhi | Yes | No |
Sinhala | Yes | No |
Slovak | Yes | Yes |
Slovenian | Yes | Yes |
Somali | Yes | No |
Spanish | Yes | Yes |
Sundanese | Yes | No |
Swahili | Yes | No |
Swedish | Yes | Yes |
Tagalog | Yes | No |
Tajik | Yes | No |
Tamil | Yes | Yes |
Tatar | Yes | No |
Telugu | Yes | No |
Thai | Yes | Yes |
Tibetan | Yes | No |
Turkish | Yes | Yes |
Turkmen | Yes | No |
Ukrainian | Yes | Yes |
Urdu | Yes | No |
Uyghur | No | Yes |
Uzbek | Yes | No |
Vietnamese | Yes | Yes |
Welsh | Yes | Yes |
Yiddish | Yes | No |
Yoruba | Yes | No |
This list is subject to change, but is theoretically accurate as of Transana 5.23.
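If you work with several languages, it may help to keep the relevant rows of the table in a small lookup. The sketch below is illustrative only and is not part of Transana: SUPPORT and tools_for are hypothetical names, and the dictionary holds just a handful of rows copied from the table above, with "Yes" stored as True and "No" as False.

```python
# A few rows copied from the table above (an illustrative subset,
# not the full list). True means the table says "Yes".
SUPPORT = {
    "Arabic": {"Faster Whisper": True, "Speechmatics": True},
    "Esperanto": {"Faster Whisper": False, "Speechmatics": True},
    "Flemish": {"Faster Whisper": False, "Speechmatics": False},
    "Hebrew": {"Faster Whisper": True, "Speechmatics": False},
    "Uyghur": {"Faster Whisper": False, "Speechmatics": True},
    "Welsh": {"Faster Whisper": True, "Speechmatics": True},
}


def tools_for(language):
    """Return the names of the tools listed as supporting a language.

    Raises KeyError if the language is not in this subset.
    """
    row = SUPPORT[language]
    return [tool for tool, supported in row.items() if supported]


if __name__ == "__main__":
    for language in SUPPORT:
        tools = tools_for(language)
        print(language, "->", ", ".join(tools) if tools else "no tool listed")
```

A language supported only by Speechmatics, such as Esperanto in this subset, can only be transcribed by sending audio to a server, so the data security considerations described earlier apply.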
Translation Capacity
At this time (Transana 5.10), Faster Whisper offers the capacity to provide a translation to English from any of the non-English languages it supports. Please note that you will want to check the accuracy of all automated translations before using them in analysis.