Cognitive Load: The Missing Measure in Evaluating AI Tools

Satyam Ghodasara, MD

May 20, 2026

While I was reading images on call last Tuesday, the large vessel occlusion (LVO) detection algorithm flagged a study on the worklist. I spent an extra five minutes looking for what it saw. I never found it. I moved on with a nagging uncertainty, wondering whether I had missed an occlusion or whether the algorithm had overcalled one. But this is not a story about a bad tool. This is a story about why the study disappeared from my worklist but not from my mind.

What I felt at the workstation has a name: cognitive load. Cognitive load determines whether a tool improves or worsens the workday and is not always captured by more quantifiable metrics like accuracy and speed. It is not directly observable, and it depends on the case mix, the radiologist’s experience, the workstation, the time of day, and the rest of the worklist.

Cognitive Load in Clinical Practice

The LVO tool is a useful place to start. At my institution, current regulatory constraints mean the algorithm flags a study without showing me what it saw: no slice number, no vessel territory, and no overlay, even though the vendor is able to localize the LVO. The algorithm labels the study as “abnormal,” but it’s my job to figure out why. When I can’t, I am left unsatisfied and drained because there are only two possibilities: either I missed something I should’ve caught or the algorithm wasted my time. In the former case, I worry I’ve hurt a patient and doubt my own skills. In the latter, the algorithm is assigning me extra work during an already busy shift. I can’t convince myself of either possibility, so I feel the weight of both.

The inverse case, a tool that lessens cognitive load, is just as revealing. According to the existing literature, a generative tool that drafts the impression section of reports may or may not make a radiologist meaningfully faster. However, summarizing a study, anticipating the referrer’s questions, and precisely framing uncertainty into a helpful impression are the tasks that, in my experience, carry the most cognitive load. Impression generation tools absorb some of that load. They help ensure I haven’t forgotten to mention a critical finding or inadvertently included a dictation error that changes what I mean. These tools may not make me faster, but I know I feel less depleted at five o’clock.

The tools that have become comfortably integrated into my workflow are the ones that subtract cognitive load, sometimes without improving efficiency. The tools that I find myself resisting are the ones that add cognitive load, even if they move raw efficiency metrics in the right direction. Cognitive load is a tax, and many modern radiology AI tools inadvertently levy it because our procurement frameworks were not designed to measure it.

Measuring the Cognitive Tax

None of this means accuracy metrics are dispensable, or that subjective experience should replace diagnostic performance or efficiency. A tool that is wrong but has an excellent user experience is still a bad tool. Rather, measuring efficiency without also measuring human factors fails to capture the actual value of a tool that depends on a human using it.

There is encouraging movement here. Multi-society guidance already recognizes that AI evaluation should extend beyond clinical accuracy and efficiency, and frameworks such as DECIDE-AI ask us to report human factors, learning curves, and errors in use. Adjacent fields are further along; for example, ambient clinical scribes are showing that objective time savings may be modest while the burden of documentation experienced by clinicians changes dramatically. Imperfect but validated instruments exist, including NASA-TLX, eye-tracking, and longitudinal use data, but they require deliberate study design and a willingness to treat human factors as a primary endpoint rather than an afterthought. It’s easy to count seconds saved per study, so we count seconds.
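As a rough illustration of how lightweight one of these instruments can be, here is a minimal sketch of scoring the unweighted ("Raw TLX") variant of NASA-TLX: six subscales, each rated from 0 to 100, averaged into one workload score. The function name, dictionary keys, and the ratings in the usage example are my own placeholders for illustration, not data from any study, and the full instrument also has a weighted variant based on pairwise comparisons that is not shown here.

```python
# Minimal sketch of Raw TLX scoring (the unweighted NASA-TLX variant).
# Names and example ratings are hypothetical, chosen only for illustration.

SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",   # anchored from "perfect" (low) to "failure" (high)
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Return the unweighted mean of the six subscale ratings (0-100 each)."""
    missing = [name for name in SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    for name in SUBSCALES:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(ratings[name] for name in SUBSCALES) / len(SUBSCALES)


if __name__ == "__main__":
    # Made-up ratings from one reader after one shift, for illustration only.
    after_shift = {
        "mental_demand": 70,
        "physical_demand": 10,
        "temporal_demand": 60,
        "performance": 40,
        "effort": 65,
        "frustration": 55,
    }
    print(raw_tlx(after_shift))
```

Collected from the same readers before and after a tool goes live, a score like this would give a procurement review one number to set beside seconds saved per study.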

Cognitive load is hard to measure, but it is not unmeasurable. To assess new AI tools, we can start with a simple question: six months after deployment, do you still want to use it?

Satyam Ghodasara, MD (@_Satyam_) is associate staff of neuroradiology in the Diagnostics Institute at the Cleveland Clinic Foundation and serves on the Radiology: Artificial Intelligence Trainee Editorial Board. His research applies informatics and machine learning to optimize clinical workflows while supporting safe, practical deployment of AI in routine care and operational innovations that enhance patient-centered radiology services.
