Detailed study compares AI vs. doctors, and it's closer than doctors may like

A recent study took a close look at how generative artificial intelligence (AI) performs in diagnosing medical conditions compared to physicians. It was conducted by a research group led by Dr. Hirotaka Takita and Associate Professor Daiju Ueda at Osaka Metropolitan University’s Graduate School of Medicine.

This systematic review and meta-analysis screened 18,371 studies and narrowed them down to 83 for detailed analysis. The findings shed light on both the strengths and weaknesses of AI in healthcare.

The research covered different generative AI models, like GPT-4, Llama3 70B, Gemini 1.5 Pro, and Claude 3 Sonnet, across various medical fields. GPT-4 was the most studied. Overall, the diagnostic accuracy of these AI models averaged 52.1% (95% CI: 47.0–57.1%). Some models were about as accurate as non-expert physicians, with no statistically significant difference (accuracy difference: 0.6% [95% CI: −14.5% to 15.7%], p=0.93). However, expert physicians still outperformed AI, with a significant accuracy gap of 15.8% (95% CI: 4.4%–27.1%, p=0.007), although given the pace of advancement, that may only be a matter of time.
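For readers wondering how the intervals and p-values above fit together, here is a rough sketch in Python. It is not the study's method: it assumes each accuracy difference is approximately normally distributed, so the standard error can be backed out of the width of the 95% CI. The function name `p_from_ci` is made up for this illustration.

```python
from math import erf, sqrt

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959964


def p_from_ci(estimate, lower, upper):
    """Approximate two-sided p-value from a point estimate and its 95% CI.

    Assumption (ours, not the study's): the estimate is roughly normal,
    so the CI half-width equals Z_95 times the standard error.
    """
    se = (upper - lower) / (2 * Z_95)
    z = abs(estimate) / se
    # Standard normal CDF evaluated at z, written with the error function.
    phi = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - phi)


# Figures quoted in the article, in percentage points.
p_non_expert = p_from_ci(0.6, -14.5, 15.7)  # AI vs. non-expert physicians
p_expert = p_from_ci(15.8, 4.4, 27.1)       # expert physicians vs. AI

print(f"non-expert comparison: p ~ {p_non_expert:.3f}")
print(f"expert comparison:     p ~ {p_expert:.4f}")
```

Under that assumption, both results land close to the reported p-values of 0.93 and 0.007. An interval that straddles zero goes with a large p-value, and one that sits entirely above zero goes with a small one.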

The study also found that AI performed similarly across most medical specialties, with a couple of exceptions: dermatology and urology. AI showed stronger results in dermatology, likely because the field involves recognizing patterns, something AI is particularly good at. But since dermatology also requires complex reasoning and patient-specific decision-making, the results don’t tell the whole story. For urology, the findings were based on a single large study, which makes it harder to apply the results more broadly.

“This research shows that generative AI’s diagnostic capabilities are comparable to non-specialist doctors. It could be used in medical education to support non-specialist doctors and assist in diagnostics in areas with limited medical resources,” Dr. Takita added. “Further research, such as evaluations in more complex clinical scenarios, performance evaluations using actual medical records, improving the transparency of AI decision-making, and verification in diverse patient groups, is needed to verify AI’s capabilities.”

Beyond diagnosis, the study highlighted the potential for using AI in medical education. According to the researchers, "the comparable performance of current generative AI models to physicians in non-expert settings reveals an opportunity for integrating AI into medical training." AI could be used to simulate real-life cases, helping medical students and trainees learn and assess their skills.

However, there are concerns about transparency and bias in these models. Many AI systems don’t share details about their training data, which raises questions about whether their results can be applied to all populations. Researchers pointed out that "transparency ensures an understanding of the model’s knowledge, context, and limitations" and stressed the need for clear, ethical, and thoroughly validated AI applications.

For now, generative AI, while promising, tends to struggle with complex cases where detailed patient information is involved. Should doctors begin to worry about losing their jobs? It is difficult to say at this point, but as far as diagnostics go, that's certainly a possibility.

Source: Osaka Metropolitan University, Nature | Image via Depositphotos

This article was generated with some help from AI and reviewed by an editor.
