Using AI in Transana
tl;dr
This page starts with a comparison of External vs. Embedded AI options. It describes essential information about embedded models.
Next, the page lays out how I evaluated the numerous AI models available in Transana. I go into a lot of detail in this section so you can understand exactly how I reached my conclusions. Lessons Learned describes those conclusions.
The section called Having AI Evaluate and Rank AI provides rankings of the AI models available in Transana for exploring Text and for exploring Images. If you are here looking for guidance about what AI models to use, read this section.
But if you read nothing else, please read and re-read the last section, called The Last Word. Understanding that section is vital to all researchers who want to use AI as part of qualitative data analysis.
Introduction
To use AI well in Transana, researchers need to make several important decisions. This article highlights two major choices.
The choice between using external AI tools and embedded (internal) AI tools is fairly straightforward, and this article lays out the advantages and disadvantages of each option.
The choice of which AI model to use when exploring research data is more challenging. While external AI tools generally offer limited choices and limited flexibility, embedded AI offers an overwhelming set of choices. The bulk of this article is dedicated to laying out these choices, describing the process I used to narrow down the options, and sharing the results of my extensive testing of AI models.
External AI and Embedded AI
The first choice a researcher must make when using AI in Transana is between external AI and embedded (or internal) AI.
External AI uses an outside service such as ChatGPT, Claude, Gemini, or Copilot for AI queries. Currently, Transana supports ChatGPT and Claude for external exploration of data.
Embedded AI uses an AI service on a computer controlled by the researcher. Currently, Transana supports the Ollama system, which offers hundreds of individual AI models to choose from.
Each of these systems has advantages and disadvantages:
| External AI | Embedded AI |
|---|---|
| **Advantages:** Quality. Claude and ChatGPT 5 often offered excellent responses during my AI prompt tests. (See below.) They were less prone to making false statements than some of the other models I tested. | **Advantages:** Private, Secure. When properly configured, data is securely processed on a computer controlled by the researcher. No data is sent to external servers. |
| **Disadvantages:** Requires an account. External AI providers require user accounts. They want a credit card, and they track usage. | **Disadvantages:** Requires setup. The researcher must set up their own AI server. This is fairly easy, but requires installing an additional program. (See the Ollama Setup Instructions.) Researchers must also download the model(s) they want to use. |
| **Implementation:** Transana supports OpenAI’s ChatGPT and Anthropic’s Claude AI tools to provide external AI exploration. | **Implementation:** Transana uses the Ollama system to provide embedded AI exploration. |
Privacy and Confidentiality
Data privacy, security, and confidentiality are central issues for most research projects. When a researcher uses an external AI tool, they send their data to a server that is not under their control. A wide variety of company policies and legal issues influence what happens to the data once it is received by the external computer. It is imperative that researchers understand the privacy and confidentiality policies of the companies they work with for external AI exploration of their data. Researchers should never submit data for external AI exploration without the explicit approval of their Institutional Review Board or other ethics board overseeing their research.
With embedded AI processing in Transana, all AI processing is handled by the Ollama server selected by the researcher. This Ollama server may reside on their own computer, or they may configure Transana to use an Ollama server on a different computer under their control. When properly configured, the Ollama server does not share or retain any data. Researchers should only connect to an Ollama server on their own computer or one controlled by someone they trust, such as their department, university, or organization IT department. Choosing a server they control is how researchers ensure the privacy and confidentiality of their data during the AI exploration phase of their analysis.
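For example, one quick way to confirm which Ollama server you are talking to is to query its version endpoint. This is a minimal Python sketch, assuming a default Ollama installation listening on its standard port (11434); the host address is a placeholder you would replace with your own server's.

```python
import requests

# Replace with the address of an Ollama server you control. "localhost"
# keeps all processing on your own machine; Ollama listens on port 11434
# by default.
OLLAMA_HOST = "http://localhost:11434"

# A simple reachability check: /api/version answers if the server is up.
resp = requests.get(f"{OLLAMA_HOST}/api/version", timeout=5)
resp.raise_for_status()
print("Connected to Ollama", resp.json().get("version", "(unknown)"))
```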
Selecting Models
For external AI, Transana supports ChatGPT and Claude AI options. These tools recommend a small number of their newest models. If other external options, such as Copilot or Gemini, were available, evaluating and selecting an external model would take a bit more work. As it stands, external model selection is relatively simple compared to embedded model selection.
For embedded AI, there are many models to choose from, models that are generally not so well known, making this choice less obvious. The remainder of this article is devoted to the topic of selecting AI models for best results.
Embedded AI Models and Model Parameters
AI Models are algorithms that determine how an AI works with data. They are trained on (typically very large) data sets and are designed to handle certain types of tasks and achieve certain types of goals. Transana supports the external ChatGPT AI service from OpenAI and the external Claude AI service from Anthropic. It also supports an embedded AI tool called Ollama to manage the download, selection, and use of pre-defined AI models to allow the embedded exploration of qualitative data.
ChatGPT and Claude offer limited options for users, effectively hiding a lot of complexity from end users. Ollama presents a more complex AI landscape, requiring more background knowledge of the researcher.
Models
ChatGPT offers a small handful of models, mostly different versions of the same set of algorithms. Recent options (as of this writing) include gpt-5, gpt-5.1, gpt-5.2, and gpt-5.4.
Claude offers Claude-Haiku, Claude-Opus, and Claude-Sonnet models with differing levels of functionality and sophistication at different price points.
Ollama offers a large catalog of models. See the Models page on Ollama’s web site for more information.
Parameters
Ollama AI models are built with internal variables called parameters, which the model uses to map input data to outputs, influencing its ability to see patterns in data. To over-simplify: the more parameters a model is built with, the more sophisticated a response it should be able to generate. However, the number of parameters also affects AI processing factors such as memory requirements and processing speed.
Some Ollama models support several different parameter options, and we have determined that the parameters value is important in how models work within Transana. Models are always presented as “(model name):(parameters)” pairs, for example, “gemma3:12b” for the gemma3 model with the 12 billion parameter setting. We strongly recommend that Transana users avoid the use of “cloud” and “turbo” parameters, as these require external processing within Ollama and may compromise data confidentiality.
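As an illustration of the "(model name):(parameters)" naming, here is a minimal Python sketch that lists the models installed on a local Ollama server via its /api/tags endpoint and flags any "cloud" or "turbo" tags. This is a sketch under the assumption of a default local installation, not anything Transana itself does.

```python
import requests

OLLAMA_HOST = "http://localhost:11434"  # a local server, per the privacy advice above

# /api/tags lists the models installed on the server as "name:parameters" pairs.
resp = requests.get(f"{OLLAMA_HOST}/api/tags", timeout=10)
resp.raise_for_status()

for model in resp.json().get("models", []):
    name = model["name"]                       # e.g. "gemma3:12b"
    base, _, tag = name.partition(":")         # split into (model name, parameters)
    size = model.get("details", {}).get("parameter_size", "?")
    # Flag tags that would route processing off the local machine.
    warning = "  <-- avoid: external processing" if tag in ("cloud", "turbo") else ""
    print(f"{base:<24}{tag:<10}{size}{warning}")
```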
You can explore the full list of models available with Ollama, along with their parameter options, through the “Models” section of the Ollama website.
Testing AI Models
(You can skip the detailed discussion of testing supported AI models by clicking here.)
Ollama offers a huge number of models. I tested as many of these models as I could in an effort to determine which models did a good job at exploring qualitative data within Transana. The next several sections of this article describe what I found. Please note that this is an ongoing process, and your results may vary.
AI Exploration of Text
For testing the analysis of text (including transcripts), I used a transcript of the movie “12 Angry Men” as my initial testing data because it was long enough (90 minutes) and complex enough (12 major speakers) to represent a potential challenge to AI, and because it is non-confidential data that others can obtain if they want to explore, replicate, or extend my test results.
While this does not represent typical qualitative research data, the narrative of the movie provides information that can be analyzed qualitatively and leads to clear conclusions, making between-model comparisons of AI results easier than if actual qualitative data had been used.
I ran the following prompt using each of about 145 Ollama models, 3 Claude models, and 4 ChatGPT models.
This is a transcript of a jury deliberation. Describe each juror in a separate paragraph, including juror number, name if known, occupation, and personality.
After a little experimentation, I settled on a context size of 32K, large enough to hold the 20K+ tokens of data I was exploring and a reasonable-sized response. I initially used a Temperature setting of 0.8, but have switched to a setting of 0.3 recently in an effort to reduce randomness and increase AI response consistency. I ran each test on multiple computers. Individual test queries could take anywhere from a few seconds to over 24 hours; any test taking longer than 24 hours was deemed a failure. I made the decision to skip some tests on slower computers that seemed unlikely to finish within this time frame for the sake of efficiency and my sanity.
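For readers who want to replicate this test protocol outside of Transana, the sketch below shows roughly what such a query looks like against Ollama's HTTP API, using the 32K context (num_ctx) and 0.3 temperature described above. This is a minimal illustration, not Transana's actual implementation; the model name and transcript file path are placeholders.

```python
import requests

OLLAMA_HOST = "http://localhost:11434"

# Placeholder path: the transcript being explored.
with open("12_angry_men_transcript.txt", encoding="utf-8") as f:
    transcript = f.read()

prompt = (
    "This is a transcript of a jury deliberation. Describe each juror in a "
    "separate paragraph, including juror number, name if known, occupation, "
    "and personality.\n\n" + transcript
)

payload = {
    "model": "gemma3:12b",        # placeholder; substitute the model under test
    "prompt": prompt,
    "stream": False,
    "options": {
        "num_ctx": 32768,         # 32K context: room for the 20K+ tokens of data
        "temperature": 0.3,       # low temperature to reduce run-to-run randomness
    },
}

# Slow models can take hours; tests over 24 hours were deemed failures.
resp = requests.post(f"{OLLAMA_HOST}/api/generate", json=payload, timeout=24 * 3600)
resp.raise_for_status()
print(resp.json()["response"])
```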
A summary of the results of this testing is presented in expandable tables below. The first table presents the model and parameter combinations that produced “good” results on at least one computer. The second table is a more detailed description of the computers used for testing.
Models for Exploring Text (Click to expand)
| Windows Models (1) | macOS Models (2) |
|---|---|

(1) The model ranked 16 does not work well on Windows.
(2) Models ranked 4, 6, 8, 11, 12, 13, 15, and 16 do not work well on my 8 GB macOS computers.
Additional Details About Testing Computers (Click to expand)
My testing has been conducted on four computers.
macOS testing was done on two computers:
- An M2-based Mac Mini with 8 GB of RAM. All Ollama models were stored on an external hard drive.
- An M1-based MacBook Pro with 8 GB of RAM.
Windows testing was conducted using two computers:
- An older desktop with 64 GB of RAM and a 6 GB Nvidia graphics card
- A newer laptop with 32 GB of RAM and an 8 GB Nvidia graphics card
Both computers are running Windows 11.
(If you are interested in helping with additional testing, please let me know through the Contact Form.)
There are several important points to keep in mind when reviewing these results.
- New models are coming out all the time. This is a snapshot of a moving target.
- I used a very broad prompt. In actual analysis, revising the prompt is an important step in AI exploration, and it is likely that changing the prompt will affect the AI output in both expected and unexpected ways.
- This is one example of a prompt and data. The data is from a movie, so it may differ from real-world research data in important ways.
- Both macOS computers used for testing had 8 GB of RAM. It’s possible that more RAM or newer processors would allow more models to run successfully.
AI Exploration of Images
For testing still image analysis by AI, I used a photograph I took a few years ago while traveling. The image included several distinct elements that could be included in the analysis. I submitted the following simple prompt:
Describe the following image:
Simply prompting for a description of the image revealed something very interesting. Some models responded to this prompt with a description, sometimes quite detailed, of an image that was clearly not the image I submitted. Thus, for images, this description prompt ends up revealing models that cannot process images the way Transana submits them, but that do not inform the researcher of this failure.
I settled on a context size of 48K for the image. For this test, I set the Temperature to 0.3, as I wanted to get more consistent results. I ran each test on multiple computers. As with the text tests, I stopped tests that ran over 24 hours, considering them a failure, and I made the decision to skip some tests that seemed unlikely to succeed within the 24 hour time frame for the sake of efficiency and my sanity.
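As with the text sketch above, here is roughly what such an image query looks like against Ollama's HTTP API, which accepts still images as base64-encoded strings. This is illustrative only, not how Transana itself submits images; the model name and image path are placeholders.

```python
import base64
import requests

OLLAMA_HOST = "http://localhost:11434"

# Placeholder path: Ollama's generate endpoint takes images as base64 strings.
with open("travel_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "model": "qwen3-vl:8b",       # placeholder; must be a vision-capable model
    "prompt": "Describe the following image:",
    "images": [image_b64],
    "stream": False,
    "options": {
        "num_ctx": 49152,         # the 48K context used in these tests
        "temperature": 0.3,       # lower temperature for more consistent results
    },
}

resp = requests.post(f"{OLLAMA_HOST}/api/generate", json=payload, timeout=24 * 3600)
resp.raise_for_status()
print(resp.json()["response"])
```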
The results of this testing are presented in an expandable table below. This table presents the model and parameter combinations that produced “good” results on at least one computer.
Models for Exploring Images (Click to expand)
| Windows Models | macOS Models |
|---|---|
Models that failed
The models listed below failed to produce adequate results for either text or image exploration.
Models That Failed AI Exploration (Click to expand)
| all-minilm:33m aya-expanse:8b aya:35b aya:8b bespoke-minicheck:7b bge-large:335m bge-m3:567m cogito:14b cogito:3b cogito:8b command-a:111b command-r7b-arabic:7b command-r7b:7b command-r:35b deepscalar-r:1.5b deepseek-llm:7b deepseek-v3:671b deepseek-r1:7b deepseek-v2.5:236b deepseek-v2:16b deepseek-v2:236b dolphin-llama3:8b dolphin-mistral:7b dolphin3:8b embeddinggemma:300m everythinglm:13b |
exaone-deep:7.8b exaone3.5:7.8b falcon3:3b firefunction-v2:70b gemma2:9b gemma3:1b gemma3n:e2b gemma3n:e4b gemma:7b glm4:9b granite-embedding:278m granite3-guardian:8b granite3.1-dense:2b granite3.1-moe:3b granite3.2:2b granite3.3:2b granite4:1b granite4:3b hermes3:3b hermes3:8b internlm2:7b lfm2.5-thinking:1.2b lfm2:24b llama2-uncensored:7b llama2:13b llama3.1:8b |
llama3.2:3b llama3.3:70b llama3:8b llama3:8b-instruct-q2_K llama4:16x17b mistral-large:123b mistral-nemo:12b mistral-small:22b mistral:7b mistrallite:7b nemotron-3-nano:30b nemotron-3-nano:4b nemotron-mini:4b nemotron:70b nuextract:3.8b olmo-3:7b olmo2:13b olmo2:7b openthinker:7b orca2:13b orca2:7b phi3.5:3.8b phi3:14b phi3:3.8b phi4-mini-reasoning:3.8b phi4-mini:3.8b |
phi4:14b qwen2.5:14b qwen2.5:3b qwen2:0.5b qwen2:7b qwen3-embedding:8b qwen:14b r1-1776:70b reflection:70b rnj-1:8b sailor2:8b smallthinker:3b smollm2:1.7b snowflake-arctic-embed:335m snowflake-arctic-embed2:568m solar-pro:22b stable-beluga:13b stablelm2:1.6b stablelm2:12b starling-lm:7b tinyllama:1b tulu3:70b tulu3:8b wizardlm2:7b yarn-llama2:13b zephyr:7b |
Summary and Lessons Learned
- I tested 150 combinations of models and model parameters for Ollama, Claude, and ChatGPT.
- Using 2 Windows computers, I tested all text analysis and image analysis for all 150 models for a total of 600 tests. (2 computers x 2 data types x 150 models = 600 tests.)
- Tests on macOS took a LOT longer than tests on Windows.
- Using 2 macOS computers, I ran 149 tests for text and 143 tests for images, for a total of 292 tests.
- Both Windows computers engaged GPUs on Nvidia graphics cards, and both macOS computers utilized the GPUs in their Apple processors for AI processing.
- (Total test run time was over 179 hours for Windows for 600 tests and over 632 hours on macOS for fewer than 300 tests.)
- AI Model matters.
- A significant percentage of Ollama models failed to produce reasonable results to these test queries. Of the 150 Ollama models tested, 41 produced “good” results on at least one test computer in our text tests and 36 produced at least one “good” result in our image tests.
- External models fared better in this respect. All Claude and ChatGPT models tested produced good results for both text and image data.
- When using the same prompt on the same data with different models, AI results can differ significantly. This is not surprising. However, with some prompts, even the same model will produce different results when run repeatedly.
- This suggests a challenging environment for qualitative researchers with confidential data. The task of picking a model or set of models for embedded AI data exploration can be a bit complicated.
- Hardware matters.
- Because I had only a few computers to test with, I can’t sort out all the factors. Your computer will likely differ from mine, so your results will differ from mine. The following is speculation.
- As a generalization, the more memory (RAM) a computer had, the more models ran successfully, and the better the quality of those responses.
- I still haven’t figured out why some tests failed sporadically, especially on the 32 GB Windows computer.
- My Windows computers ran tests a lot faster than my macOS computers.
- Newer Apple processors (M3, M4, and M5) might perform better. I don’t have a way to test this at this time.
- The Windows computers also had more RAM than the Macs, which is likely a confounding variable here. Both Macs had only 8 GB of RAM, and, due to the infinite wisdom of Apple, neither can be upgraded.
This is, of course, all part of a rapidly changing landscape. Different models have different designs and capabilities. New Ollama models come out frequently. New chips are announced regularly. I can speed up AI processing on my slowest Mac by linking it to the Ollama server on my fastest Windows computer. This page only scratches the surface.
Having AI Evaluate and Rank AI
The task of evaluating and ranking the AI results produced by all of these tests, as described above, proved quite difficult and time-consuming. It is a task that I have not had adequate time to complete as of the time of this writing. Then it occurred to me that I could ask AI to handle this task.
Text
I created Quotes of the juror descriptions from the “Who Are the Jurors?” AI Summaries of transcripts described above. I then explored the resulting Collection using the following query:
I started with the model ministral-3:8b, which has historically produced good results for me, and recorded the results from that model. I repeated the query with the most highly rated models in each of the summaries produced. I continued this process until I found a consensus of the “top” ranked models. These rankings are available in the following expandable table.
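The exact scheme behind the “Mentions” and “Points” columns in the table below is not spelled out here, but as an illustration, a consensus tally of this general shape could be computed with a simple Borda-style count, where each evaluator's top-ranked pick earns the most points. The evaluator names and rankings in this Python sketch are entirely hypothetical.

```python
from collections import defaultdict

# Hypothetical example: each evaluating model returns an ordered list of picks.
rankings = {
    "evaluator-a": ["claude-opus-4-6", "ministral-3:8b", "gemma4:31b"],
    "evaluator-b": ["claude-opus-4-6", "claude-sonnet-4-6", "ministral-3:8b"],
    "evaluator-c": ["claude-sonnet-4-6", "claude-opus-4-6", "qwq:32b"],
}

mentions = defaultdict(int)  # how many evaluators named the model at all
points = defaultdict(int)    # Borda-style score across evaluators

for ranked_list in rankings.values():
    for position, model in enumerate(ranked_list):
        mentions[model] += 1
        points[model] += len(ranked_list) - position  # 1st place scores highest

for model in sorted(points, key=points.get, reverse=True):
    print(f"{model:<20} mentions={mentions[model]} points={points[model]}")
```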
Models Ranked (by AI) for Description of Text (Click to expand)
| Rank | Model | Mentions | Points |
|---|---|---|---|
| 1 | claude-opus-4-6 | 5 | 198 |
| 2 | claude-sonnet-4-6 | 5 | 192 |
| 3 | ministral-3:8b | 5 | 169 |
| 4 | gpt-5.4 | 4 | 153 |
| 5 | gemma4:31b | 4 | 138 |
| 6 | claude-haiku-4-5 | 4 | 135 |
| 7 | devstral-small-2:24b | 4 | 128 |
| 8 | gemma4:26b | 4 | 126 |
| 9 | qwq:32b | 4 | 122 |
| 10 | gpt-5.1 | 3 | 120 |
| 11 | mistral-small3.2:24b | 4 | 119 |
| 12 | gpt-5.2 | 3 | 115 |
| 13 | qwen3-vl:30b | 4 | 114 |
| 14 | gpt-5 | 3 | 113 |
| 15 | magistral:24b | 4 | 112 |
| 16 | granite3.3:8b | 5 | 111 |
| 17 | olmo-3:32b | 4 | 107 |
| 18 | qwen3:14b | 4 | 103 |
| 19 | gemma4:e4b | 4 | 101 |
| 20 | mixtral:8x7b | 4 | 83 |
For non-confidential data, claude-opus is ranked best, with claude-sonnet coming in a close second. gpt-5.4 came in 4th. However, these models are likely not suitable for use with confidential or sensitive data.
For confidential data, Ollama’s ministral-3:8b model was ranked at number 3, followed by gemma4:31b (#5), devstral-small-2:24b (#7), gemma4:26b (#8), and qwq:32b (#9). Researchers whose computers have only 8 GB of RAM may struggle or find large models slow, and might look at ministral-3:8b (#3), granite3.3:8b (#16, Mac only), gemma4:e4b (#19), and ministral-3:3b (#21), which require less overall memory.
Images
I also explored the still image descriptions, using the top text models listed above. (That is, I used the best image models to generate text descriptions of the images, which were then evaluated using the best text models.) I used the following prompt:
The following are descriptions of a photo of Taormina, Italy created by different AI models. Which 5 models do the best job? Please justify your response with quotes from the different descriptions.
The results are available in the expandable table below:
Models Ranked (by AI) for Description of Still Image (Click to expand)
| Rank | Model | Mentions | Points |
|---|---|---|---|
| 1 | claude-opus-4-6 | 5 | 179 |
| 2 | claude-sonnet-4-6 | 5 | 176 |
| 3 | claude-haiku-4-5 | 5 | 170 |
| 4 | qwen3-vl:30b | 5 | 157 |
| 5 | gemma3:12b | 5 | 153 |
| 6 | gemma4:31b | 5 | 150 |
| 7 | qwen3-vl:8b | 5 | 138 |
| 8 | gemma4:26b | 5 | 128 |
| 9 | gemma4:e4b | 5 | 123 |
| 10 | gpt-5 | 5 | 118 |
| 11 | qwen3.5:9b | 5 | 114 |
| 12 | qwen3.5:27b | 5 | 110 |
| 13 | qwen3.5:4b | 5 | 106 |
| 14 | gemma4:e2b | 5 | 105 |
| 15 | ministral-3:14b | 5 | 105 |
| 16 | qwen3-vl:4b | 4 | 101 |
| 17 | gpt-5.1 | 5 | 99 |
| 18 | mistral-small3.2:24b | 4 | 93 |
| 19 | gpt-5.4 | 4 | 87 |
| 20 | gpt-5.2 | 4 | 80 |
For non-confidential images, the Claude models (opus, sonnet, and haiku) fared quite well, placed as the top three models by all AI models that assessed image summaries. gpt-5 was the best model from OpenAI, coming in at number 10, significantly higher than several newer gpt models.
For confidential images, qwen3-vl:30b (#4), gemma3:12b (#5), gemma4:31b (#6), and qwen3-vl:8b (#7) were highly ranked. For computers with 8 GB of RAM, gemma3:12b (#5), qwen3-vl:8b (#7), and gemma4:e4b (#9) are worth consideration, as they have lower memory requirements.
The Last Word
I want to emphasize one last point as part of this discussion. For both text and images, I asked several models to rank the AI results I had generated. Each model did so, assertively and confidently. And while a rough consensus emerged across the models, they disagreed far more than they agreed. No single model agreed fully with this consensus, and no two models agreed with each other.
At one point, I accidentally ran one of the image tests twice. The selections and rankings of the two identical test runs were quite different, even though the Temperature setting used (0.3) should have limited the amount of randomness coming out of the AI model. So the AI models don’t even agree with themselves!
This reinforces in my mind that all AI results must be reviewed and checked against the data by a researcher before being reported as a research finding, and that all use of AI in qualitative analysis must be carefully described in qualitative write-ups and presentations. AI comments on qualitative data may be interesting, and they may sometimes suggest useful ideas. It’s vital to recognize that there is no actual, real “intelligence” behind AI, even when the clever application of high-level mathematics makes it appear so.
Large language models do not, cannot, and will not “understand” anything at all. They are not emotionally intelligent or smart in any meaningful or recognizably human sense of the word. LLMs are impressive probability gadgets that have been fed nearly the entire internet, and produce writing not by thinking but by making statistically informed guesses about which lexical item is likely to follow another.
What Happens When People Don’t Understand How AI Works by Tyler Austin Harper, The Atlantic, June, 2025.