Word Frequencies and Word Clouds


For many qualitative researchers, looking at the specific words your research participants choose is an important first step in exploring your data. You listen to or read your data over and over, immersing yourself in it and trying to identify themes and connections. But it can also be very useful simply to list the words your participants use, count how often particular terms appear, and examine the contexts in which those words occur. This seemingly simple exercise can help you notice themes worth further exploration by letting the language of your data guide your interpretation.

However, sorting and counting individual words by hand, or with a spreadsheet or word-processing file, can take hours of tedious work. In Transana, the Word Frequency Report lets you take text or transcribed data and quickly generate a list of the words used in your Documents, Transcripts, Quotes, or Clips, along with the number of times each word occurs.
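Transana produces this list through its own interface. To show the basic idea behind a word frequency count, here is a minimal Python sketch, not Transana's implementation. It assumes a simplified definition of a "word" (a run of ASCII letters, optionally joined by apostrophes) and uses a made-up sample sentence; the function name `word_frequencies` is hypothetical.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count how often each word appears in text, ignoring case."""
    # Simplifying assumption: a "word" is a run of ASCII letters, optionally
    # joined by apostrophes (so "don't" stays one word).
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


# A made-up sample sentence, not actual debate data.
sample = "The economy is the issue. Jobs, jobs, jobs: the economy matters."
for word, count in word_frequencies(sample).most_common(3):
    print(word, count)
```

Lowercasing first means "The" and "the" are counted together, which is usually what you want in a frequency list.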

The following examples represent data from questions asked in the U.S. presidential debates in 2008, 2012, and 2016.

In addition to viewing your Word Frequency Report data as a list, you can generate a Word Cloud that presents the same data graphically, with each word's size reflecting how often it is used.
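Transana draws the Word Cloud for you, and the exact scaling it uses is not described here. As a rough illustration of how "frequency represented by relative size" can work, the minimal Python sketch below assumes a simple linear mapping from counts to font sizes. The function name `font_sizes`, the 10 to 48 point range, and the word counts are all hypothetical.

```python
def font_sizes(counts: dict[str, int], min_size: float = 10.0,
               max_size: float = 48.0) -> dict[str, float]:
    """Map each word's count to a font size, scaled linearly between min_size and max_size."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for word, count in counts.items():
        # If every word has the same count, avoid dividing by zero
        # and give all words the midpoint size.
        fraction = (count - low) / span if span else 0.5
        sizes[word] = min_size + fraction * (max_size - min_size)
    return sizes


# Made-up counts for illustration only.
example = {"economy": 12, "jobs": 9, "health": 3}
for word, size in font_sizes(example).items():
    print(f"{word}: {size:.1f}pt")
```

Linear scaling is the simplest choice; word cloud tools sometimes use logarithmic or square-root scaling instead so that a few very common words do not dwarf everything else.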

To make your word count list more manageable, you will probably want to remove words such as articles, conjunctions, prepositions, and pronouns if they are not significant to your research. Transana’s Word Frequency Report lets you exclude individual words of your choice, as well as words of a particular length (one- or two-letter words, for instance) or words that do not meet other criteria you choose. The examples above were modified by removing short words, titles, and words that were not analytically relevant, such as prepositions, certain proper names, and “this” and “that.”
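In Transana these exclusions are made in the report itself. The idea of combining an exclusion list with a minimum word length can be sketched in a few lines of Python. This is an illustration only: the stop list, the three-letter minimum, the function name `filter_counts`, and the counts are hypothetical, and a real study would tailor the exclusions to its own research questions.

```python
from collections import Counter

# A small, hypothetical exclusion list for illustration.
STOP_WORDS = {"the", "and", "but", "with", "this", "that", "for", "from"}


def filter_counts(counts: Counter, stop_words: set[str], min_length: int = 3) -> Counter:
    """Drop excluded words and any word shorter than min_length characters."""
    return Counter({
        word: n for word, n in counts.items()
        if word not in stop_words and len(word) >= min_length
    })


# Made-up counts for illustration only.
raw = Counter({"the": 40, "of": 31, "economy": 12, "this": 8, "jobs": 9, "we": 7})
print(filter_counts(raw, STOP_WORDS).most_common())
```

Filtering the counts, rather than the source text, leaves the original data untouched, so you can change the exclusion criteria and regenerate the list at any time.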

Transana implements “word groups,” which let the researcher group different word forms or synonyms together without editing the source data. Inconsistent vocabulary, word usage, and even spelling during transcription have historically made it harder to identify important concepts through word frequencies (Dempster et al., 2013). For example, Dempster et al. grouped “doctor,” “physician,” “surgeon,” “general practitioner,” and several other synonyms under the term “doctor” in their word frequency counts.
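The principle behind a word group, pooling the counts of several variant terms under one label without changing the underlying text, can be sketched in Python as follows. This is not how Transana stores or applies word groups. The function name `count_word_groups` is hypothetical, the group membership is adapted loosely from the “doctor” example above (with plural forms added as an assumption), and the sample sentence is invented.

```python
import re


def count_word_groups(text: str, groups: dict[str, list[str]]) -> dict[str, int]:
    """Count occurrences for each word group, pooling all of its variant terms."""
    lowered = text.lower()
    totals = {}
    for label, variants in groups.items():
        # Try longer variants first so a multi-word term such as
        # "general practitioner" is preferred over any shorter variant
        # that starts the same way.
        alternatives = sorted((re.escape(v.lower()) for v in variants),
                              key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(alternatives) + r")\b"
        totals[label] = len(re.findall(pattern, lowered))
    return totals


# Hypothetical group for illustration.
groups = {
    "doctor": ["doctor", "doctors", "physician", "physicians",
               "surgeon", "surgeons", "general practitioner", "general practitioners"],
}
sample = "My doctor referred me to a surgeon. Two physicians and a general practitioner agreed."
print(count_word_groups(sample, groups))
```

Because the grouping lives in a separate mapping, the transcript keeps its original wording, and the same text can be regrouped differently for another research question.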

Transana also supports text search directly from within the Word Frequency tool, including searches for multiple terms at once to accommodate word groups. This makes it easier to move from an abstract understanding of the data, based only on individual word usage, to exploring how those words are actually used in context.
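Moving from a count to the words in context is often done with a keyword-in-context (KWIC) display. The minimal Python sketch below illustrates that general technique for several terms at once; it is not Transana's search feature. The function name `keyword_in_context`, the fixed character window, and the sample text are hypothetical.

```python
import re


def keyword_in_context(text: str, terms: list[str], width: int = 30) -> list[str]:
    """Return each match of any term, bracketed, with up to `width` characters of context on either side."""
    # One case-insensitive pattern that matches any of the terms as whole words.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lines = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        lines.append(f"...{text[start:match.start()]}[{match.group(0)}]"
                     f"{text[match.end():end]}...")
    return lines


# Invented sample text for illustration.
text = "The physician explained the risks. Later, a surgeon reviewed the case."
for line in keyword_in_context(text, ["physician", "surgeon"], width=10):
    print(line)
```

Passing all the members of a word group as `terms` lets one search show every variant in context together.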


Dempster, P., Woods, D., and Wright, J. (2013). Using CAQDAS in the Analysis of Foundation Trust Hospitals in the National Health Service: Mustard Seed Searches as an Aid to Analytic Efficiency. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 14(2).