ChatGPT outperformed doctors again? Pitfalls of reporting AI progress

Thursday, November 28, 2024
AI
News

Media headlines suggest that artificial intelligence diagnoses patients more accurately than doctors, is more empathetic than humans, has a better bedside manner, and even outperforms medical students on exams. But how much of this is true?

AI can be an assistant if you know how to prompt

A recent study by UVA Health’s Andrew S. Parsons, MD, MPH, and colleagues examined AI’s diagnostic capabilities. Fifty doctors specializing in family, internal, and emergency medicine were divided into two groups. One group had access to the premium version of ChatGPT (ChatGPT Plus), while the other relied solely on their medical expertise and “classical” tools such as Google or clinical decision support systems. Both groups were asked to diagnose complex clinical cases. Additionally, ChatGPT was asked to diagnose the same cases on its own.

The results showed that the group without ChatGPT achieved 74% diagnostic accuracy, while the group using ChatGPT scored slightly higher at 76%. Remarkably, ChatGPT working alone achieved 90% accuracy.

These findings puzzled the researchers. Why did the doctors’ performance barely improve with ChatGPT? Speculation quickly followed: Is a bias toward one’s own expertise over outside input to blame? Are doctors unwilling to use AI? Do they distrust it? The study’s lead author proposed another explanation: many doctors simply did not know how to use ChatGPT effectively, particularly how to craft precise prompts.

Dr. Parsons noted that many doctors working with ChatGPT did not realize they could paste a patient’s symptoms and medical record into the chat and ask for a comprehensive analysis. Such overlooked capabilities suggest a need for better training on integrating AI into clinical workflows.
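To make that concrete, here is a minimal sketch of the same idea using the OpenAI Python API. It is only an illustration, not the study’s protocol: the model name, the prompt wording, and the case summary are assumptions added for demonstration.

```python
# Minimal sketch: pasting a full case summary into a single prompt.
# Model name, prompt wording, and case text are illustrative assumptions,
# not the protocol used in the UVA Health study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

case_summary = """
62-year-old male, 3 days of fever and productive cough,
history of COPD and type 2 diabetes, on metformin.
Exam: crackles over the right lower lung field, SpO2 91%.
Labs: WBC 14.2, CRP 110 mg/L. Chest X-ray: right lower lobe infiltrate.
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are assisting a physician. Provide a ranked "
                    "differential diagnosis with brief reasoning. This is "
                    "decision support, not a final diagnosis."},
        {"role": "user",
         "content": "Analyze this case and suggest the most likely "
                    "diagnoses and next diagnostic steps:\n" + case_summary},
    ],
)

print(response.choices[0].message.content)
```

The point is simply that the whole case can go in at once; the clinician then reviews the suggestions rather than re-typing fragments of the record question by question.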

Despite this, the study highlights AI's potential. ChatGPT’s high accuracy in evaluating clinical cases on its own underscores its value, particularly for rare or complex conditions. Drawing on expertise and experience, doctors are generally good at identifying the diseases they encounter frequently, but they can struggle to recognize the thousands of rare conditions they may seldom see in clinical practice. AI could bridge this gap, provided healthcare professionals are trained to use these tools safely and effectively.

Clinical work is much more than analyzing data

While AI may provide empathetic, comprehensive responses in controlled studies, it faces challenges in real-world clinical settings. Doctors juggle numerous tasks, including interpreting data from medical records, assessing a patient’s social and financial circumstances, predicting treatment outcomes, managing risks, and addressing the emotional aspects of care. These human interactions—motivating, supporting, and instilling hope—remain beyond AI's capabilities.

Even studies showing AI’s superior performance on medical exams, such as the United States Medical Licensing Examination (USMLE), must be interpreted cautiously. In another study, GPT-4 scored an impressive 95.54% on USMLE tests administered between 2021 and 2023, compared with an average student score of 72.15%. However, a physician’s job is not to pass exams; their role is far more complex, blending science with human connection.

Media are responsible for inspiring trust in AI

While AI holds immense promise, it remains a developing technology that inspires both excitement and fear. Unfortunately, emotional reactions frequently overshadow objective discussions about its role in healthcare.

Sensational headlines about AI having better bedside manners than doctors may grab attention, but they divert focus from meaningful debates about its potential. Such narratives can fuel unnecessary anxiety and skepticism within the medical community. This is already happening. For example, the Standing Committee of European Doctors (CPME) has released a policy, “Deployment of artificial intelligence in healthcare,” that advocates for stricter controls before the technology is fully embraced. The document repeats old fears such as “AI replacing doctors” or “doctors being forced to use AI and follow its recommendations.”

Experts agree that AI can support doctors but cannot replace them. Generative AI, like ChatGPT, is a powerful statistical tool, yet lacks the human ability to understand, observe, feel, and think creatively. Instead of fostering misleading comparisons, we should emphasize AI's role as a collaborative tool designed to augment—not compete with—medical professionals.

Is AI beneficial for medicine? We don’t know until we try

Healthcare must begin implementing AI with a clear delineation between tasks suited to humans and those better handled by machines. Other industries, such as media, provide valuable inspiration for this approach.

AI is particularly efficient in analyzing electronic medical records, prioritizing information, and tailoring communication to patient preferences—enhancing the effectiveness of prevention programs. It can also assist in diagnosing complex clinical cases by referencing current studies and medical guidelines.

Beyond diagnosis, AI can analyze patient-doctor interactions, helping create accurate electronic medical records while also prompting doctors with relevant follow-up questions, such as those concerning specific medications or diagnoses.
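As a rough illustration of how such a documentation assistant might be wired up, the sketch below feeds a short visit transcript to a language model and asks for a draft note plus suggested follow-up questions. The transcript, prompt, and model name are assumptions for demonstration only; any draft would still require the treating physician’s review and sign-off.

```python
# Illustrative sketch: turning a visit transcript into a draft note plus
# follow-up question suggestions. Transcript, prompt, and model name are
# assumptions, not a reference to any specific product or study.
import json
from openai import OpenAI

client = OpenAI()

transcript = """
Doctor: What brings you in today?
Patient: I've had chest tightness when climbing stairs for about two weeks.
Doctor: Any medications?
Patient: Just something for blood pressure, I don't remember the name.
"""

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # ask for structured output
    messages=[
        {"role": "system",
         "content": "Summarize the visit transcript as JSON with keys "
                    "'subjective', 'assessment', 'plan', and "
                    "'follow_up_questions' (questions the doctor did not ask "
                    "but probably should, e.g. the exact antihypertensive)."},
        {"role": "user", "content": transcript},
    ],
)

note = json.loads(response.choices[0].message.content)
print(json.dumps(note, indent=2))  # draft for the physician to edit and approve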

These capabilities do not replace the expertise of doctors; instead, they extend and enhance it, bringing the theory of personalized care into practice. The sooner the healthcare sector starts experimenting with AI, the fewer unjustified prejudices will take hold and the greater the potential benefits for both providers and patients.