AI doctor and human doctors show 96% agreement in medical examinations

On June 30, an evaluation was conducted at Huaxi Hospital of Sichuan University to see whether “AI doctors” can match human doctors. On the day of the evaluation, many patients lined up at the hospital entrance to apply for a free consultation with an AI doctor (MedGPT), then waited in line at the consultation room. The interview room covered seven departments, including cardiology, urology, and orthopedics, and had eight interview tables. Each table was staffed by a physician assistant, who was in charge of relaying the patient’s symptoms to MedGPT, the “AI doctor,” via text. At the same time, a human physician obtained the patient’s information through the interview system. To ensure the reliability of MedGPT’s evaluation results, the human doctor did not conduct a face-to-face interview with the patient.

96% agreement between AI and human medical examination results

On the same day, seven experts and professors from Peking University People’s Hospital, China-Japan Friendship Hospital, Friendship Hospital, and Fuwai Hospital reviewed and checked the medical records of 91 patients. They then scored MedGPT on seven evaluation items: medical interview, diagnosis, treatment suggestions, proposed auxiliary tests, accuracy of data analysis, provision of explainable information, and natural language interview and interaction.

According to the final evaluation results, the overall score of the human physicians was 7.5, while MedGPT scored 7.2. MedGPT’s results showed 96% agreement with those of the attending physicians.

Dr. Xue Feng, Chief of Orthopedic Surgery at Peking University People’s Hospital, said in an online live-streamed commentary the same evening: “The results of the AI doctor’s interview are generally favorable; the AI doctor’s verbal content is detailed and accurate. Human doctors have less interaction with patients during the interview and provide less information to patients. For example, AI doctors asked female patients about their menstrual periods and gestational age, whereas human plastic surgeons did not ask so much.” Dr. Xue also pointed out that “the most important part of the orthopedic surgeon’s interview is the examination of the patient’s bone problems, but the AI physician MedGPT cannot do that, and a human is needed.”

Dr. Liu Guoliang, chief of respiratory medicine at China-Japan Friendship Hospital, said, “AI physicians can consider all possible causes and triggers of a disease, and can take each aspect into account, such as drug allergies. This helps supplement and extend the knowledge structure of human physicians. However, AI physicians can easily duplicate test recommendations, and some test items may not be necessary.”

An official from the organizer’s side told the press: “MedGPT is currently able to perform medical examinations for more than 3,000 common diseases. Once the first phase of testing is completed by the end of the year, the number of disease types it can cover will increase significantly.”
