The New England Journal of Medicine is charging ahead with its coverage of the impact of artificial intelligence (AI) in medicine, publishing a special report, a review, and an editorial in this week’s issue, and announcing the creation of a journal dedicated to AI.
The special report, written by researchers from Microsoft, outlined potential medical uses of a generative AI chatbot powered by GPT-4 (generative pretrained transformer 4), including physician note taking, medical education, and participation in “curbside consults.”
GPT-4 achieved an accuracy of about 90% when given a battery of test questions from the U.S. Medical Licensing Exam (USMLE), even though it was only trained on openly available information from the internet and never received specific medical training, Peter Lee, PhD, of Microsoft Research in Redmond, Washington, and co-authors reported.
When tasked with writing a medical note based solely on a provider-patient interaction, GPT-4 performed well, but it misstated the patient’s body-mass index and provided no information about how the BMI was calculated, “another example of a hallucination,” the researchers wrote.
They also provided an example of how a physician might interact with GPT-4 on a medical consult, in which it “generally provides useful responses that may help the health professional who made the query address the problem of concern.”
While GPT-4 could be a powerful tool in both clinical settings and medical research, the researchers said, they emphasized the need for careful consideration for any potential uses of GPT-4 in medicine, adding that “GPT-4 is an intelligent system that, similar to human reason, is fallible.”
“We believe that the question regarding what is considered to be acceptable performance of general AI remains to be answered,” they wrote. “Our hope is to contribute to what we believe will be an important public discussion about the role of this new type of AI, as well as to understand how our approach to health care and medicine can best evolve alongside its rapid evolution.”
New AI Journal
In an accompanying editorial, Andrew Beam, PhD, of the Harvard T.H. Chan School of Public Health in Boston, and colleagues explained that NEJM decided to publish the series about AI in medicine because of the “enormous interest” and “a rapidly increasing number of manuscript submissions” related to its uses.
The editorialists also announced that NEJM plans to launch a new journal dedicated to AI in medicine, NEJM AI, in 2024. They added that the special report series and the new journal will focus on “the reasonably established and the growing possible roles of AI and machine learning technologies in all aspects of health and health care.”
NEJM AI will address “the breadth of potential AI applications,” Beam and co-authors said, adding that there is “virtually no area in medicine and care delivery that is not already being touched by AI.”
However, the editorialists said, the new focus on AI is not solely to promote the potential of AI in medicine. “[N]ew AI methods are not necessarily a panacea; they can be brittle, they may work only in a narrow domain, and they can have built-in biases that disproportionally affect marginalized groups,” Beam and co-authors wrote.
The goal of NEJM is to provide a platform for a diverse group of authors, editors, and reviewers to discuss the potential uses and harms of AI in medicine, according to the editorial.
Trusting the AI Process
Also as part of the series, a review article by NEJM international correspondent Charlotte Haug, MD, PhD, and NEJM Group Editor Jeffrey Drazen, MD, highlighted the speed of innovation in this new technology, citing Moore’s law, the observation that the number of transistors in an integrated circuit doubles about every 2 years.
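The doubling behavior Haug and Drazen cite can be sketched numerically. This is a purely illustrative calculation, not from the article; the starting count and time spans are hypothetical.

```python
# Illustrative sketch of Moore's law: transistor count doubles
# roughly every 2 years, i.e. exponential growth in time.

def transistors(initial: float, years: float, doubling_period: float = 2.0) -> float:
    """Project a transistor count after `years`, doubling every `doubling_period` years."""
    return initial * 2 ** (years / doubling_period)

# Over 20 years, a count grows by a factor of 2**10 = 1024.
print(transistors(1, 20))  # 1024.0
```

At this rate, two decades of progress multiplies circuit density by roughly a thousandfold, which is the scale of change the review invokes when comparing 1990s-era hardware with today's.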
Haug and Drazen wrote that the implementation of similar technological innovation has been occurring since the 1990s and early 2000s, when “the problem of having machines successfully perform certain medical tasks that were repetitive, and therefore prone to human error, was being solved,” even with slow computers and limited memory.
“We firmly believe that the introduction of AI and machine learning in medicine has helped health professionals improve the quality of care that they can deliver and has the promise to improve it even more in the near future and beyond,” Haug and Drazen wrote. “Just as computer acquisition of radiographic images did away with the x-ray file room and lost images, AI and machine learning can transform medicine.”
Similarly, Beam and colleagues predicted that healthcare professionals and patients will use AI chatbots “with increasing frequency.” They cautioned, however, that “GPT-4 is not an end in and of itself,” but rather an “opening of a door to new possibilities as well as new risks.”
Lee and one co-author are employed by Microsoft; the other co-author is employed by Nuance Communications, a wholly owned subsidiary of Microsoft.
Beam reported no conflicts of interest; a co-author reported a financial relationship with Inovalon.
New England Journal of Medicine
Source Reference: Lee P, et al “Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine” N Engl J Med 2023; DOI: 10.1056/NEJMsr2214184.