Automated Transcription – Overview

Transana offers several choices for automated transcription. As of Transana 5.10, these options include Deepgram, Faster Whisper, and Speechmatics. There are several factors to consider in choosing which option is best for you.

Regardless of which option(s) you choose, you should check the accuracy of all automated transcripts and translations before using them for analysis. Automated transcription can save time over fully manual transcription, but automated transcripts can be inaccurate and sometimes suggest the opposite of what was actually said.

Data Security

The first issue to consider is that of data security and confidentiality.

Transana supports two types of Automated Transcription tools, Embedded and Server-based.

Embedded tools run entirely on your computer, so your data remains on your computer. You can use automated transcription without any risk of that process making your data available to others. Most server-based tools take data protection seriously, but Institutional Review Boards (IRBs) tend to be extremely cautious about data custody and about the security risks introduced when data is transmitted to and stored on a server-based system during and after automated transcription. Embedded systems avoid those issues entirely. The more sensitive your human-subjects data is, the stronger the argument for using an embedded system.

Server-based tools require that the audio portion of your data be sent to a remote server for processing. Automated transcription companies take great pains to protect customer data, but researchers need to read about the data security steps taken and receive IRB or other ethical committee approval before submitting sensitive research data for server-based automated transcription. As a general rule, server-based systems are able to devote more powerful computer processing resources to the transcription process and tend to produce automated transcripts more quickly than embedded systems can.

When working with video data, Transana separates the audio track and only submits audio to server-based automated transcription services. While this eliminates the possibility of visual identification of participants, auditory identification would still be possible if the data were compromised.

Within Transana, Faster Whisper is an embedded tool, while Deepgram and Speechmatics are server-based tools.

Both of the server-based systems Transana supports require payment for their services. Faster Whisper is provided as part of Transana and requires no additional payment for use.
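
The distinctions in this section can be summarized as a small lookup. The following Python sketch is purely illustrative and is not part of Transana. The names TOOLS and allowed_tools are hypothetical, and the values simply restate what this page says about each tool.

```python
# Illustrative only: restates this page's description of each tool.
# "embedded" means the tool runs entirely on your own computer.
TOOLS = {
    "Faster Whisper": {"embedded": True, "extra_payment": False},
    "Deepgram": {"embedded": False, "extra_payment": True},
    "Speechmatics": {"embedded": False, "extra_payment": True},
}


def allowed_tools(server_upload_approved, can_pay):
    """Return the names of tools compatible with the given constraints."""
    names = []
    for name, info in TOOLS.items():
        if not info["embedded"] and not server_upload_approved:
            continue  # audio would be sent to a remote server without approval
        if info["extra_payment"] and not can_pay:
            continue  # tool requires payment that is not available
        names.append(name)
    return sorted(names)


# Example: no IRB approval for server-based processing.
print(allowed_tools(server_upload_approved=False, can_pay=True))
```

With no approval for server-based processing, only the embedded tool remains, which matches the guidance above for sensitive human-subjects data.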

Transcription Speed and Accuracy

Automated transcription tools offer multiple options. These options usually represent different balances between speed and accuracy. This is a complex topic that will be addressed only briefly here. For more detail on the issue of accuracy, please see this Transana Blog post. It is important to note that accuracy is highly dependent on the characteristics of the data being transcribed.

Deepgram is by far the fastest option offered by Transana. Deepgram was able to process a 1.5-hour-long movie file in less than half a minute. However, this speed comes at the cost of accuracy: Deepgram’s three quality options (called Base, Enhanced, and Nova) ranked 12th, 11th, and 8th respectively among the 12 models I compared for the blog post.

Speechmatics performed solidly, with transcription typically taking less than half the length of the media file in my tests. The Standard model was ranked 5th of 12 for accuracy, and the Enhanced model was the most accurate of all the models I tested.

Faster Whisper presents a more complicated picture. Faster Whisper offers seven models to choose from, called Tiny, Base, Small, Medium, Large, Large-v1, and Large-v2. The three Large models came in 3rd, 4th, and 2nd respectively in the quality tests, all close behind Speechmatics’ Enhanced model. In my preliminary tests, transcription speed varied a lot depending on both the model selected and the capacity of the computer running the software.

  • The Tiny and Base models were very fast, taking between 7% and 20% of the length of the file being processed.
  • The Small model took between 30% and 60% of the file’s length for processing.
  • The Medium model took between 75% and 90% of the media’s time.
  • The three Large models varied wildly in how long they took to create a transcript, ranging from “merely” 150% of the media’s length to a whopping 15 times the media’s length. Processing times using these models were inconsistent even when processing the same media file with the same model on the same computer.
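
To make these percentages concrete, the following Python sketch converts them into rough processing-time ranges for a media file of a given length. It is an illustration only, not part of Transana. The names SPEED_FACTORS and estimate_minutes are hypothetical, and the factors are the preliminary figures listed above, which depend heavily on hardware.

```python
# (low, high) processing time as a multiple of the media length,
# taken from the preliminary test figures listed above.
SPEED_FACTORS = {
    "Tiny": (0.07, 0.20),
    "Base": (0.07, 0.20),
    "Small": (0.30, 0.60),
    "Medium": (0.75, 0.90),
    "Large": (1.5, 15.0),
    "Large-v1": (1.5, 15.0),
    "Large-v2": (1.5, 15.0),
}


def estimate_minutes(model, media_minutes):
    """Return a rough (low, high) processing time in minutes for one file."""
    low, high = SPEED_FACTORS[model]
    return (round(low * media_minutes, 1), round(high * media_minutes, 1))


# Rough estimates for a 60-minute recording.
for model in ("Tiny", "Small", "Medium", "Large-v2"):
    low, high = estimate_minutes(model, 60)
    print(f"{model}: between {low} and {high} minutes")
```

For a one-hour recording, the Small model would take roughly 18 to 36 minutes by these figures, while a Large model could take anywhere from 90 minutes to 15 hours.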

Because Faster Whisper runs locally on your computer, it runs much more slowly than the server-based options and is greatly influenced by the hardware you have. I tested using two older Windows computers (4 to 6 years old) and two newer Macs (a 3-year-old M1 and a 1-year-old M2). The older Windows computers consistently performed better than the newer Apple hardware, sometimes a lot better. Only one time score on the Macs beat any of the Windows scores. The M2 Mac performed notably better than the M1 Mac.

Supported Languages

The three tools provided by Transana offer very different language options. In a few instances, they offer regional variations within a language.

Language | Faster Whisper | Speechmatics | Deepgram
Afrikaans | Yes | No | No
Albanian | Yes | No | No
Amharic | Yes | No | No
Arabic | Yes | Yes | No
Armenian | Yes | No | No
Assamese | Yes | No | No
Azerbaijani | Yes | No | No
Bashkir | Yes | Yes | No
Basque | Yes | Yes | No
Belarusian | Yes | Yes | No
Bengali | Yes | No | No
Bosnian | Yes | No | No
Breton | Yes | No | No
Bulgarian | Yes | Yes | No
Burmese | Yes | No | No
Catalan | Yes | Yes | No
Chinese | Yes | Cantonese, Simplified Mandarin, Traditional Mandarin | Base: China (zh-CN), Taiwan (zh-TW), China (zh)
Croatian | Yes | Yes | No
Czech | Yes | Yes | No
Danish | Yes | Yes | Base, Enhanced
Dutch | Yes | Yes | Base, Enhanced
English | Yes | Australian, British, US | Base: Australia, India, New Zealand, UK, US; Enhanced: English, US; Nova: Australia, India, New Zealand, UK, US
Esperanto | No | Yes | No
Estonian | Yes | Yes | No
Faroese | Yes | No | No
Finnish | Yes | Yes | No
Flemish | No | No | Base, Enhanced
French | Yes | Yes | Base: French, Canadian; Enhanced
Galician | Yes | Yes | No
Georgian | Yes | No | No
German | Yes | Yes | Base, Enhanced
Greek | Yes | Yes | No
Gujarati | Yes | No | No
Haitian | Yes | No | No
Hausa | Yes | No | No
Hawaiian | Yes | No | No
Hebrew | Yes | No | No
Hindi | Yes | Yes | Base: Hindi, Roman Script; Enhanced
Hungarian | Yes | Yes | No
Icelandic | Yes | No | No
Indonesian | Yes | Yes | Base
Interlingua | No | Yes | No
Italian | Yes | Yes | Base, Enhanced
Japanese | Yes | Yes | Base, Enhanced
Javanese | Yes | No | No
Kannada | Yes | No | No
Kazakh | Yes | No | No
Central Khmer | Yes | No | No
Korean | Yes | Yes | Base, Enhanced
Latin | Yes | No | No
Latvian | Yes | Yes | No
Lao | Yes | No | No
Lingala | Yes | No | No
Lithuanian | Yes | Yes | No
Luxembourgish | Yes | No | No
Macedonian | Yes | No | No
Malagasy | Yes | No | No
Malay | Yes | Yes | No
Malayalam | Yes | No | No
Maltese | Yes | No | No
Maori | Yes | No | No
Marathi | Yes | Yes | No
Mongolian | Yes | Yes | No
Nepali | Yes | No | No
Norwegian | Bokmal, Nynorsk | Bokmal | Base, Enhanced
Occitan | Yes | No | No
Pashto | Yes | No | No
Persian | Yes | No | No
Polish | Yes | Yes | Base, Enhanced
Portuguese | Yes | Yes | Base: Brazil, Portugal; Enhanced: Brazil, Portugal
Punjabi | Yes | No | No
Romanian | Yes | Yes | No
Russian | Yes | Yes | Yes
Sanskrit | Yes | No | No
Serbian | Yes | No | No
Shona | Yes | No | No
Sindhi | Yes | No | No
Sinhala | Yes | No | No
Slovak | Yes | Yes | No
Slovenian | Yes | Yes | No
Somali | Yes | No | No
Spanish | Yes | Yes | Base: Spanish, Latin America; Enhanced: Spanish, Latin America; Nova: Spanish, Latin America
Sundanese | Yes | No | No
Swahili | Yes | No | No
Swedish | Yes | Yes | Base, Enhanced
Tagalog | Yes | No | No
Tajik | Yes | No | No
Tamil | Yes | Yes | Enhanced
Tatar | Yes | No | No
Telugu | Yes | No | No
Thai | Yes | Yes | No
Tibetan | Yes | No | No
Turkish | Yes | Yes | Base
Turkmen | Yes | No | No
Ukrainian | Yes | Yes | Base
Urdu | Yes | No | No
Uyghur | No | Yes | No
Uzbek | Yes | No | No
Vietnamese | Yes | Yes | No
Welsh | Yes | Yes | No
Yiddish | Yes | No | No
Yoruba | Yes | No | No

This list is subject to change, but is theoretically accurate as of Transana 5.10.
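
The table above can be read as a lookup from language to the tools that support it. The Python sketch below shows the idea using only a handful of rows copied from the table. It is illustrative and not part of Transana; the names SUPPORT and tools_for are hypothetical, and any cell other than "No" is treated as "supported".

```python
# A few rows copied from the table above; the full table is not reproduced.
SUPPORT = {
    "Arabic": {"Faster Whisper": "Yes", "Speechmatics": "Yes", "Deepgram": "No"},
    "Esperanto": {"Faster Whisper": "No", "Speechmatics": "Yes", "Deepgram": "No"},
    "Flemish": {"Faster Whisper": "No", "Speechmatics": "No", "Deepgram": "Base, Enhanced"},
    "Tamil": {"Faster Whisper": "Yes", "Speechmatics": "Yes", "Deepgram": "Enhanced"},
    "Yiddish": {"Faster Whisper": "Yes", "Speechmatics": "No", "Deepgram": "No"},
}


def tools_for(language):
    """Return the tools whose table cell for this language is not 'No'."""
    row = SUPPORT.get(language, {})
    return [tool for tool, value in row.items() if value != "No"]


for language in ("Arabic", "Flemish", "Yiddish"):
    print(language, "->", tools_for(language))
```

A language missing from the excerpt returns an empty list, which here means "not listed in this sketch" rather than "unsupported".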

Translation Capacity

At this time (Transana 5.10), Faster Whisper offers the capacity to provide a translation to English from any of the non-English languages it supports. Please note that you will want to check the accuracy of all automated translations before using them in analysis.