Speech recognition software is becoming commonplace. Siri, Google Now, Cortana, and Alexa can field questions on our smartphones, computers, and home “virtual assistant” devices. These tools do a reasonably good job of recognizing and interpreting human speech in a growing number of common languages, without much setup or training.
Speech recognition technology has not yet crossed the threshold into the world of automated transcription. Automated transcription is considerably more complex than answering single-sentence questions. The speech involved is rarely clearly enunciated or cleanly recorded, it often involves multiple speakers, and it adds the complications of complex sentence structure, pauses, and punctuation. Ninety-five percent accuracy is reasonably good over a six-to-ten-word phrase that we can easily repeat if misunderstood, but it is not yet attainable over a lengthy interview or group conversation containing thousands or tens of thousands of words.
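To put that accuracy figure in perspective, here is a rough back-of-the-envelope calculation. The numbers are illustrative, and it simplifies by treating word errors as independent, but it shows why an error rate that feels fine for a short question becomes a serious editing burden over a long interview:

```python
# Hypothetical figures: how does a 95% per-word accuracy rate
# scale from a short question to a long interview transcript?

word_accuracy = 0.95

# Chance that a short 8-word question comes out with no errors at all
# (assuming, for simplicity, that word errors are independent).
clean_phrase = word_accuracy ** 8          # roughly 2 chances in 3

# Expected number of wrong words in a 10,000-word interview transcript.
interview_words = 10_000
expected_errors = interview_words * (1 - word_accuracy)   # about 500 words

print(f"Error-free 8-word phrase: {clean_phrase:.0%}")
print(f"Expected errors in {interview_words:,} words: {expected_errors:.0f}")
```

In other words, the same per-word accuracy that lets a virtual assistant answer most short questions cleanly still leaves hundreds of mis-transcribed words scattered through an hour-long interview.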
I received the following support question this week:
I am thinking about asynchronous communication between the users of Transana MU; the chat window of Transana seems to be only for synchronous communication.
How did you deal with this problem in your past practices?
Here’s my response, which of course can be generalized to other qualitative software that can be used collaboratively. Some of the ideas even apply to individual researchers not working collaboratively.
Transana offers two overlapping systems for indicating analytic meaning in text and media data, Categorization and Coding. In Transana, you code by creating a system of Keywords and applying these keywords to selections of text called Quotes or segments of media data called Clips. You categorize data selections by creating Quotes and Clips in containers called Collections.
Both categorization and coding serve the same purpose: to link analytic meaning to segments of data. But they involve somewhat different metaphors and procedures within the software, and there are several reasons behind the evolution of these two systems for attaching analytic meaning to segments of data in Transana.
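As a very rough illustration of how the two systems overlap, here is a sketch of the idea in Python. The names and structure are my own invention for illustration, not Transana's actual internals: coding attaches Keywords directly to a Clip (or Quote), while categorization places that same Clip (or Quote) inside a Collection.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only; illustrative names, not Transana's data model.

@dataclass
class Clip:                  # a segment of media data (a Quote would be the
    name: str                # analogous selection of text)
    keywords: set = field(default_factory=set)   # coding

@dataclass
class Collection:            # categorization: a container of data selections
    name: str
    items: list = field(default_factory=list)

# Coding: apply Keywords directly to the data segment.
clip = Clip("teacher-question-03")
clip.keywords.update({"open-ended question", "wait time"})

# Categorization: place the same segment in a Collection.
questions = Collection("Teacher Questioning")
questions.items.append(clip)

# Both link analytic meaning to the same segment of data,
# just through different structures.
print(clip.keywords)
print([c.name for c in questions.items])
```

The point of the sketch is that the same segment can carry meaning two ways at once: through the keywords applied to it and through the collection that contains it.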
I have wine
my name is Gonzales
and Mrs. Chandler sorry because everyone Safari
today my guests hara-hara-hara
I am you say how about you
And goofy and carefully deduce yourself
Hi I’m gonna terrorize some facilities
and with the famous Fiona solid water
so here are she here must I learn
drops a lot of clothes and still flows
We get a fair number of requests to incorporate automated transcripts as a feature of Transana. We understand that transcription can be a time-consuming and potentially expensive part of doing qualitative research with audio or video data. Many people want to skip what seems like a tedious typing task and jump right to coding and analysis!
When we collect data, there is often more going on at one time than we can possibly observe. A classroom can have one or more teachers providing a variety of learning activities simultaneously to a large number of students. An expert can perform complex tasks with incredible subtlety and finesse. A “simple” interaction between two people may occur on multiple levels, verbal and non-verbal, which can be challenging to track and note all at once. A researcher may want to see both the facial expression of the speaker in a focus group and the visceral reactions of all the other members of the group at the same time.
When we know there will be more going on than we can observe and preserve at once, one solution is to plan to collect video as a representation of the reality we will observe. While video certainly has limitations (where you point the camera matters a lot, for example), it also has the advantage of permanence, allowing repeated viewing and easy sharing with colleagues for consultation. There are times when recording with more than one camera at a time can provide considerably more information that we can explore and interpret afterwards.