The New Normal
by Charles E. Kahn, Jr, MD, MS, Editor, Radiology: Artificial Intelligence
Back when I was a medical student, I was puzzled by the idea of “normal values.” Why was the threshold for total bilirubin set at 1.2 mg/dL? Why should someone with a T-Bili of 1.2 be called normal, while someone with 1.3 is abnormal?
I dutifully learned about sensitivity and specificity, but it wasn’t until radiology residency that receiver operating characteristic (ROC) analysis made these concepts clear.
ROC Analysis
To understand ROC analysis, imagine that you’re in a radar tower along the coast of England during World War II. You’re using a new-fangled technology that sends out radio waves and uses their echoes to tell if there’s an airplane flying across the English Channel and how far away it is. You carefully watch the glowing phosphorescent screen for blips that might signal an attack of enemy bombers so you can scramble the RAF to intercept them. The problem, of course, is that a flock of birds might cause echoes, too.
As you turn up the gain on your radar scope, lots and lots of small blips appear on the screen. Yes, you’ll spot the bombers, but now you’re picking up every pelican. Turn down the gain, and you reduce the clutter, but you run the risk of missing an enemy aircraft. Those gain settings represent the “operating points” that we plot as an ROC curve of true-positive fraction (sensitivity) against false-positive fraction (1 − specificity). Thus, setting the “normal value” of a test is like tuning the gain: it’s a trade-off between sensitivity and specificity.
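The gain-tuning analogy can be made concrete with a short sketch. The function below sweeps a detection threshold over a set of scores and reports the resulting operating points; the scores, labels, and thresholds are made-up illustrative data, not from any study.

```python
# Illustrative sketch: sweeping a detection threshold (the radar "gain")
# traces out the operating points of an ROC curve.

def roc_points(scores, labels, thresholds):
    """Return (FPF, TPF) operating points, one per threshold.

    scores: detection scores (higher = stronger echo)
    labels: 1 = true target (bomber), 0 = noise (birds)
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        # FPF = 1 - specificity, TPF = sensitivity
        points.append((fp / negatives, tp / positives))
    return points

# Toy data: five true targets, five flocks of birds
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   1,    0,   0,   1,   0,   0]

for fpf, tpf in roc_points(scores, labels, [0.25, 0.45, 0.65, 0.85]):
    print(f"FPF={fpf:.2f}  TPF={tpf:.2f}")
```

Lowering the threshold (raising the gain) moves the operating point up and to the right: more bombers caught, but more pelicans, too.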
Opportunities of Modern AI
One of the as-yet-unsung opportunities of modern AI is to help us establish a new idea of “normal.”
In their recent Radiology: Artificial Intelligence article, Gaonkar and colleagues provide a superb example: they use a machine-learning approach to measure the cross-sectional area of the neural foramina of the lumbar spine in asymptomatic subjects (1). From their data, they were able to construct nomograms to show the distributions of foraminal areas by patient age, sex, and height.
So why is that important? First, no one would measure these values manually on even one patient, let alone on 1156 patients! But AI tools automate the process. Now, we have more than simple threshold values: we can state the percentile for a given cross-sectional area, tailored to the specifics of the patient’s age, sex, and height.
For example, although a radiologist might note that a 55-year-old woman’s left L4-L5 neural foramen appears to be narrowed, the data might indicate that its area is at the 45th percentile for her age and height. That is slightly less than average, but well within the 1 standard deviation (“1 sigma”) range we would consider normal. The same narrowing in a 30-year-old man might be at the 5th percentile for his age and height, which would indicate a clear abnormality. Imagine if we could provide such information to help physicians and patients understand better just how “abnormal” their findings are!
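The percentile calculation above is straightforward once a reference distribution is in hand. A minimal sketch, assuming the foraminal areas for a given age, sex, and height follow a normal distribution; the mean and standard deviation below are hypothetical placeholders, not values from Gaonkar and colleagues:

```python
from math import erf, sqrt

def percentile(value, mean, sd):
    """Percentile of `value` under a Normal(mean, sd) reference distribution,
    via the normal cumulative distribution function."""
    z = (value - mean) / sd
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical reference for one demographic: mean 70 mm^2, SD 15 mm^2
area = 68.0  # measured cross-sectional area, mm^2
print(f"{percentile(area, 70.0, 15.0):.0f}th percentile")
```

A measurement just below the reference mean lands near the middle of the distribution, while the same absolute value under a different demographic’s reference curve could fall in the abnormal tail.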
McCullough and his colleagues showed how the inclusion of epidemiologic information into radiology reports can affect physicians’ thinking about the abnormalities reported on imaging studies (2). Primary care providers who received information about the frequency of lumbar spine degenerative changes in asymptomatic patients, for example, were less likely to prescribe narcotics on the basis of the imaging findings alone.
Conclusion
Our understanding of “normal” impacts the way we tailor our treatments.
AI tools can make all sorts of measurements that we never would obtain manually: brain volume, subcutaneous and intra-abdominal fat volumes, truncal muscle mass, airway diameters, spleen volume, and more. Some of these measures have already been linked to known diseases and genetic risk factors.
The field of radiomics seeks to quantify various features from radiologic studies to help us paint a picture of our “imaging phenotype” (3). We may never include all of these measurements in our radiology reports, but by storing the values as part of the examination, we gather data that will help us better understand what’s normal and what’s not.
References
Gaonkar B, Beckett J, Villaroman D, Ahn C, Edwards M, Moran S, et al. Quantitative analysis of neural foramina in the lumbar spine: an imaging informatics and machine learning study. Radiol Artif Intell 2019; 1(2):e180037. Published 6 March 2019. https://pubs.rsna.org/doi/10.1148/ryai.2019180037
McCullough BJ, Johnson GR, Martin BI, Jarvik JG. Lumbar MR imaging and reporting epidemiology: do epidemiologic data in reports affect clinical management? Radiology 2012; 262(3):941-946. https://pubs.rsna.org/doi/10.1148/radiol.11110618
Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology 2016; 278(2):563-577. https://pubs.rsna.org/doi/10.1148/radiol.2015151169
Charles E. Kahn, Jr, MD, MS is professor and vice chair of radiology at the University of Pennsylvania, and editor of Radiology: Artificial Intelligence.
Follow him on Twitter: @cekahn, @Radiology_AI