If you are looking for information about a medical condition, googling or asking an AI chatbot is often faster than contacting a doctor these days. However, we have to consider how reliable the answers you get from Google or ChatGPT are. As far as AI chatbots are concerned, researchers at UT Southwestern Medical Center examined the reliability and completeness of the information three AI chatbots give about endometriosis, a painful gynaecological condition that affects 1 in 10 women.
To determine how well popular chatbots answer questions about endometriosis, the researchers collected answers from ChatGPT-4, Claude and Gemini. To do so, they asked the chatbots 10 questions that patients often ask about this disease. ‘We did this study because we wanted to know what patients learn from these chatbots. Is it accurate? Is it reliable? Does it match updated clinical recommendations and what we know from current research?’ said research leader Kimberly Kho, M.D., professor of obstetrics and gynaecology at UT Southwestern.
Chatbots answered questions about endometriosis
The chatbots provided answers to questions such as ‘What is endometriosis?’, ‘How common is endometriosis?’ and ‘How is endometriosis treated?’ Nine qualified gynaecologists were then asked to assess the accuracy and completeness of the answers based on current guidelines and their expertise.
The medical experts concluded that the three chatbots almost always gave the right answers to questions such as what endometriosis is, how common it is and what the symptoms are. However, answers to questions about treatment or the risk of recurrence were often incomplete and therefore sometimes incorrect. The study was published this month on ScienceDirect.
This incompleteness could be due to several factors, Dr Kho told us, including a lack of patient-specific context in the questions, chatbot training data that does not reflect the latest advances in clinical practice, and a lack of consensus among experts in the field. Of the three chatbots studied, ChatGPT provided the most comprehensive and correct answers.
The researchers therefore concluded that while chatbots can serve as a useful starting point for medical information, patients should still consult their doctor for questions and concerns. Medical experts should be consulted and involved in the quality control process for healthcare-specific chatbots currently under development.
Forewarned is forearmed…
With the advent of the internet and search engines, it has become increasingly easy to search for information and answers to (medical) questions on your own. Since the emergence of generative AI chatbots such as ChatGPT and Gemini, that search seems to be getting even easier. There are several examples showing that generative AI tools like ChatGPT can add value to the medical world, from diagnostics to automatically generating (conversation) reports and improving patient information.
However, what has been true for the internet and Google for years (beware of incorrect or incomplete information) certainly applies to AI chatbots as well. In the medical world, the danger of erroneous or incomplete information is even greater.
Yet the advance of generative AI shows no sign of slowing. Studies such as the one conducted by researchers at UT Southwestern Medical Center should therefore be seen as a warning to anyone turning to generative AI for medical information.