Artificial intelligence (AI) for science and medicine makes headlines. We see stories about AI that spits out science misinformation and an imagined AI-driven utopia with few to no medical errors. AI looks like the stuff of both dreams and nightmares. How do we make sense of it all?
One recent attempt to grapple with these possibilities is S. Scott Graham’s The Doctor and the Algorithm: Promise, Peril, and the Future of Health AI. Graham brings together scholarship in critical algorithm studies, bioethics research on AI, and rhetoric. The book covers much of the same conceptual ground as Eric Topol, Ruha Benjamin, and others but focuses on the “hype” surrounding health AI. Graham’s goal is to show how hype shapes everything from scientific publications to public-facing discourse. He works to separate the wheat of actual promise from the chaff of excessive exaggeration while warning against the ways health AI might exacerbate racism and other inequalities.
The Black Box Problem
The first chapter introduces readers to the construction of AI. Graham offers the specific example of an AI that diagnoses cancer to explain the data curation, feature management and optimization, modeling, and benchmarking that produce any AI system. He uses examples of health AI throughout the book to illustrate the technical and conceptual issues under examination. The chapter also introduces Graham’s take on Latour’s black box: creating a black box, which makes a technology easily used and portable, is necessary, but peering inside the black box is equally necessary to avoid the ways technologies reinforce social injustices.
Medical Futurism
Next, Graham turns to medical futurism, “the overly expansive promises about AI’s future and what AI can do for your future” (p. 42). The chapter addresses prognosis, AI systems that try to predict hospital transfers to intensive care units, and the AI-mediated creation of personalized predictive health advice. The chapter is organized around the conceit of the Oracle of Delphi and her three maxims: “Know Thyself,” “Nothing to Excess,” and “Surety Brings Ruin.” Medical AI, according to Graham, poses as a Delphic oracle, but the “unflinching belief in the tech fix has and will continue to lead to all manner of harms from health inequity to possibly even patient death” (p. 61). Yet Graham does not reject prognostic AI completely. Rather, he challenges developers to avoid “the worst excesses of misplaced surety” (p. 61).
Integrated Marketing Communication
The need to avoid misplaced confidence requires grappling with how AI is discussed and promoted, the topic of the next chapter. The goal of any technology is widespread adoption, but that goal can lead to excessive or misleading hype. Graham examines health AI’s integrated marketing and communication strategies and how “medical science, education, and marketing are often deeply interpenetrated” (p. 72). He makes the implications of “integrated marcomm” clear using examples of business-driven innovation and open science innovation. In business-driven innovation, “the scientific reporting and medical education are timed perfectly to correspond to regulatory approval, and the end result is that innovation dissemination and adoption are essentially contemporaneous” (p. 78). Open science innovation provides “an interesting counterexample” that avoids the problems Graham identifies in the business-as-usual approach (p. 79).
Groundtruthing
Given the integration of hype and promotion into so much of health AI development, chapter 4 returns the reader to the processes of AI development described in the first chapter, taking a closer look at benchmarking, or “groundtruthing.” Groundtruthing is the “term of art for the research practices involved in cultivating a data set against which a prototype AI will be measured” (p. 88). The chapter describes the different ways groundtruth can be created and statistically assessed, as well as the ways it can fail to support claims for health AI’s accuracy.
Here, Graham’s book makes its most innovative move: he produces AI tools to assess the discourse and practices around health AI itself. He first creates an AI simulation of the relationship between groundtruth and diagnosis. The simulation indicates “that many AI development teams are significantly overstating the accuracy of the systems they design” (p. 107). Graham concludes with a call to replace existing “gold standards” for development with a “platinum standard” that uses multiple data sets to improve benchmarking and health AI’s accuracy.
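The general logic behind this critique can be sketched in a few lines of code. The simulation below is my own illustration, not Graham’s actual model: it treats a “gold standard” as a single expert labeling that disagrees with the true disease status some fraction of the time, and shows that a system tuned to reproduce that one labeling looks perfect against it while scoring noticeably lower against the truth or against independent labelings (the multiple-data-set comparison is the intuition behind the platinum standard).

```python
import random

random.seed(0)

N = 1000
# Simulated true disease status for N cases (30% prevalence, chosen arbitrarily).
true_labels = [random.random() < 0.3 for _ in range(N)]

def annotate(labels, error=0.1):
    # A simulated expert labeling: flips each true label with 10% probability,
    # standing in for annotator disagreement and diagnostic error.
    return [lab if random.random() > error else not lab for lab in labels]

gold = annotate(true_labels)                        # one "gold standard" set
others = [annotate(true_labels) for _ in range(3)]  # independent labelings

# A model that perfectly reproduces its own gold-standard labels.
predictions = list(gold)

def accuracy(pred, ref):
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)

print("vs. its own gold standard:", accuracy(predictions, gold))  # exactly 1.0
print("vs. true status:", accuracy(predictions, true_labels))
for i, o in enumerate(others):
    print(f"vs. independent labeling {i}:", accuracy(predictions, o))
```

Against its own benchmark the system scores 100%; against the truth or any independently labeled set it scores substantially lower, which is the overstatement Graham’s simulation targets.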
Hype and Hedging
In chapter 5, Graham returns to the combination of scientific publications, regulatory documents, and marketing materials that make up health AI’s “integrated marcomm.” Given the shortcomings of gold standard health AI, Graham evaluates claims about AI performance and accuracy. He begins by noting that hype exists across all types of health AI discourse, not just marketing-focused materials. The first part of the chapter illustrates the existence of hype in science, regulation, and marketing. The chapter’s second part introduces Graham’s own AI creations, developed using his platinum standard, that assess hype in scientific abstracts: HypeDX examines claims about speed, accuracy, and value, and HedgeDX looks at the qualifiers on those claims. The tools find that hype appears less often than we might fear, but that meta-analyses of AI “may be more likely to overstate the benefits” (p. 134).
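To make the hype/hedge distinction concrete, here is a deliberately simple, lexicon-based sketch of the kind of analysis involved. It is not HypeDX or HedgeDX (Graham’s tools are trained classifiers, and the word lists below are invented for illustration); it just counts promotional terms versus qualifying terms in an abstract.

```python
import re

# Invented illustrative lexicons -- not Graham's actual feature sets.
HYPE_TERMS = {"breakthrough", "unprecedented", "revolutionary",
              "outperforms", "superior", "transformative"}
HEDGE_TERMS = {"may", "might", "suggests", "appears", "potentially",
               "preliminary", "limited"}

def score(text):
    # Tokenize to lowercase words and count hits in each lexicon.
    words = re.findall(r"[a-z]+", text.lower())
    hype = sum(w in HYPE_TERMS for w in words)
    hedge = sum(w in HEDGE_TERMS for w in words)
    return {"hype": hype, "hedge": hedge}

abstract = ("This revolutionary model outperforms clinicians, "
            "although results are preliminary and may not generalize.")
print(score(abstract))  # {'hype': 2, 'hedge': 2}
```

Even this toy version shows why both counts matter: an abstract can be heavily hyped and heavily hedged at once, which is exactly the tension Graham probes when he pairs HypeDX with HedgeDX.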
Ethics and Regulation
The last two chapters address AI ethics and regulation and how they apply to health AI. Graham describes two approaches: “ethical AI,” which emphasizes transparency, and “just AI,” which emphasizes the precautionary principle and anti-racist values. The remainder of the ethics chapter highlights the challenges of creating ethical and just AI in healthcare, using examples of ableism and racism in medical care, and closes with recommendations for health AI drawn from ethical AI, just AI, and clinical bioethics. The regulatory chapter turns to various European and American proposals for AI regulation to identify common recommendations. It then brings those recommendations into conversation with the concerns of ethical AI and just AI to consider how regulatory proposals might be improved.
Conclusion
The Doctor and the Algorithm is ambitious in scope. It successfully engages a broad range of issues, presenting them so that non-experts can follow the argument while providing sufficient nuance and citation for experts in each area. Readers see the social and technical systems that make the black box of an “AI” possible, as well as the many problems and possibilities that the black box hides. The most innovative element of the book is the use of AI to facilitate the analysis in the fourth and fifth chapters. Because that analysis employs the platinum standard Graham proposes (details appear in the chapters and a set of helpful appendices), readers could recreate these tools to assess Graham’s research themselves. (For example, I think Graham’s take on hedges and qualifiers might be too cynical, but I can take the tools he developed and tinker with them to perform my own analysis.) Readers in bioethics will find the last two chapters the most familiar ground, but they will appreciate how the entire book addresses the ethics of health AI and the communication strategies used to promote it.
John Lynch, PhD is a professor of communication at the University of Cincinnati.