Transana 5.0, which includes the integrated automated transcription feature described below, was released on May 22, 2023. The following post was written about 2 months prior to this release.
Automated Transcription is coming soon. It’s exciting. It’s progress. It’s also less than perfect. In this post, I will talk about a couple of relevant issues and lay out my development roadmap for automated transcription.
Automated transcription has been the dream, well, one of them, ever since I started working on Transana over 20 years ago. Some of the most common support questions I get are about automated transcription. “How do I get Transana to produce the transcript for me?” “Does Transana work with my language?”
The tide has finally turned. Transana 5.0 will include automated transcription. For the first time, I’ve found a system for automated transcription using artificial intelligence where it takes less time to correct an automated transcript than it does to produce one from scratch.
The first issue that needs to be addressed is automation, and its first cousin, ease of use. I want a system I can integrate into Transana. Otter.ai, for example, has a very nice voice recognition tool. Unfortunately, they do not currently offer a way for other programs to integrate their tools. Speechmatics.com, on the other hand, offers tools that have allowed me to integrate automated transcription directly into Transana. With Transana 5.0, when you create a new transcript, you will be able to request automated transcription of your media data using Speechmatics. It’s as simple as can be.
The second issue that needs to be addressed is confidentiality and security. There are two approaches to implementing automated transcription, and they have different implications for confidentiality.
One approach is the server-based approach. In this approach, Transana sends the audio from your media file to a server controlled by the company providing the automated transcription service for processing and receives the text transcription to display in Transana’s document window. This approach is favored by companies producing such technology because it allows them to set up web portals for end users, to apply a lot of computing power, and to continuously update their product as they improve their technology. The security issue is that this approach requires submitting your (probably) confidential data to an external server. Security and data handling policies vary from provider to provider. Transana 5.0 will offer a server-based approach to automated transcription. Users will likely need to consult their IRBs for approval to use this technology.
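To make the server-based exchange concrete, here is a minimal sketch of the two ends of that flow: building the job configuration the client submits alongside the audio, and parsing the per-word results the server sends back. This is not Transana’s actual code; the field names follow Speechmatics’ public batch transcription format, but treat the exact shapes here as assumptions for illustration.

```python
# Hypothetical sketch of the server-based flow: the client uploads audio plus
# a JSON config to the provider's batch endpoint, polls until the job is done,
# then fetches the transcript. The HTTP steps are omitted; shown here are the
# two pure-data pieces: the submitted config and the returned transcript.

def build_job_config(language_code: str) -> dict:
    """Build the JSON config submitted alongside the audio file (assumed shape)."""
    return {
        "type": "transcription",
        "transcription_config": {"language": language_code},
    }

def extract_words(transcript_json: dict) -> list:
    """Pull (word, confidence, start_time) triples out of a JSON transcript.

    The shape mirrors Speechmatics' JSON transcript format, where each result
    carries 'alternatives' with 'content' and 'confidence', plus timings in
    seconds -- which is what enables per-word color coding and time codes.
    """
    words = []
    for result in transcript_json.get("results", []):
        best = result["alternatives"][0]
        words.append((best["content"], best["confidence"], result["start_time"]))
    return words

# A tiny mocked server response, standing in for a real transcription job.
mock_response = {
    "results": [
        {"start_time": 0.12, "end_time": 0.48,
         "alternatives": [{"content": "Hello", "confidence": 0.97}]},
        {"start_time": 0.52, "end_time": 0.90,
         "alternatives": [{"content": "world", "confidence": 0.61}]},
    ]
}

config = build_job_config("en")
words = extract_words(mock_response)
```

The key point for confidentiality is visible in the sketch: the audio itself has to leave your machine for the server to do its work, which is exactly why the embedded approach discussed next matters.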
The second approach is to embed the automated transcription technology into programs like Transana so that automated transcription is processed locally on the end-user’s computer. From a security standpoint, this is ideal, as your data never leaves your control. From a technical standpoint, this is challenging. I am currently working with Speechmatics as they develop code that will support this approach, and plan to include the resulting embedded automated transcription technology with Transana as soon as practical.
The third issue to address is accuracy. There are two parts to think about. First is the accuracy of the automated transcription itself. Accuracy of voice recognition tools has increased tremendously in the past year or so, and continues to improve by leaps and bounds. However, we are still a long way from being able to request automated transcription and begin analysis immediately. Even with great quality audio recorded under studio conditions, accuracy varies. Speechmatics provides “confidence” values with each word in its automated transcription, and Transana 5.0 reflects those confidence values using color in the resulting transcript. (Please note, however, that there are places where automated transcripts with high confidence are just plain wrong, so the color coding is not a reasonable substitute for proof-reading.) Second is the accuracy of time code information. Many of the systems I’ve looked at in the past 6 months (Otter.ai, YouTube, etc.) have provided time accuracy to the nearest second, which is okay but not really good enough for Transana. I’d have to spend a lot of time re-doing time codes. Speechmatics provides much greater accuracy, allowing Transana to automatically embed time codes with near frame accuracy.
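The two accuracy ideas above can be sketched in a few lines: mapping a word’s confidence value to a display color, and snapping a sub-second timestamp to the nearest video frame. The thresholds and colors here are illustrative assumptions, not Transana’s actual settings.

```python
# Illustrative sketch only: the confidence cut-offs and colors are assumptions,
# not Transana's documented behavior.

def confidence_color(confidence: float) -> str:
    """Map a per-word confidence value (0.0-1.0) to a display color."""
    if confidence >= 0.9:
        return "black"   # high confidence: render normally
    elif confidence >= 0.6:
        return "orange"  # medium confidence: flag for proof-reading
    return "red"         # low confidence: almost certainly needs review

def frame_accurate_time(seconds: float, fps: float = 30.0) -> float:
    """Snap a timestamp in seconds to the nearest frame boundary.

    Second-level accuracy (Otter.ai, YouTube) cannot do this; it requires
    the sub-second timings that Speechmatics supplies per word.
    """
    return round(seconds * fps) / fps
```

Note that even a word colored “black” in this scheme can still be wrong, which is why, as stated above, color coding is an aid to proof-reading rather than a substitute for it.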
Finally, there is the issue of support for multiple languages. Thanks to the power of the Speechmatics engine, Transana 5.0 will offer support for automated transcription in an amazing 48 languages.
The Transana development plan is to release Transana 5.0, which supports automated transcription using Speechmatics’ server-based technology, as soon as possible. This release is currently undergoing testing, and will be published when testing is done and the subsequent translation process is complete. (Please don’t ask me when, as I don’t know yet. I’ll announce it on the mailing list when it is available.) I plan to work with Speechmatics to develop a version of Transana that supports the embedded version of their technology as soon as that becomes available, hopefully as Transana 5.1, thereby resolving the data confidentiality issue. [NOTE on May 30, 2023: I just heard from my contact at Speechmatics, who tells me that it will be at least a year before they will be able to create the embedded technology we discussed.]
Automated transcription technology has finally reached the point where it is worth using. You will still need to spend some time going through automated transcripts to correct them and format them to fit your analytic needs, but we have finally reached the tipping point, at least in many cases, where it will take less time to create and correct an automated transcript than to transcribe manually from scratch. Sure, there is still data that will require a lot of attention and manual work, particularly when there is over-talk or poor audio quality. But Transana 5.0 will finally, finally provide a good answer to all of those support questions I’ve gotten over the past 20 years.