The conversational AI bot ChatGPT is having a moment and promises to revolutionise how we create written material, conduct web searches, and obtain knowledge.
The most recent ChatGPT success? It nearly passed the US Medical Licensing Exam (USMLE), a notoriously challenging exam that typically takes 300 to 400 hours to prepare for and covers everything from fundamental scientific concepts to bioethics.
The USMLE is essentially three tests in one, and ChatGPT’s proficiency in answering its questions suggests that these AI bots may someday prove helpful in medical education and even in making specific diagnoses.
“ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement,” the researchers wrote in their paper published in PLOS Digital Health.
“Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations.”
ChatGPT is a type of artificial intelligence known as a large language model, or LLM. Like a big brother of your phone’s predictive text feature, these LLMs are designed specifically to produce written responses. Using a tonne of sample text and some sophisticated algorithms, they predict which words are likely to follow one another in a sentence.
That’s a bit of an oversimplification, but the point is this: ChatGPT doesn’t truly “know” anything, but having been trained on a tonne of web content, it can produce plausible-sounding sentences on just about any subject.
The key phrase here is “plausible-sounding”, though. Depending on which wording the model judges most likely, the AI may appear eerily intelligent or draw the silliest conclusions.
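To make the predictive-text comparison concrete, here is a minimal sketch in Python of the simplest possible word predictor: it counts which word most often follows another in a scrap of sample text. This is only an illustration. The sample sentence and the function names `train_bigrams` and `predict_next` are made up for this example, and ChatGPT itself relies on a far larger neural network rather than simple word counts.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the sample text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical sample text, invented purely for illustration.
sample = "the patient has a fever . the patient has a cough . the doctor has a plan ."
model = train_bigrams(sample)

print(predict_next(model, "patient"))  # has
print(predict_next(model, "the"))      # patient
```

The predictor has no idea what a patient or a fever is. It simply repeats the most common pattern in its sample text, which is why its output can sound plausible without being grounded in any understanding.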
Researchers at the startup Ansible Health put ChatGPT to the test using USMLE sample questions, after first checking that the answers weren’t available on Google. This way, they could be sure that ChatGPT was coming up with fresh answers based on the data it had been trained on.
When put to the test, ChatGPT scored between 52.4% and 75% across the three exams (the pass mark is usually around 60%). In 88.9% of its responses, it also produced at least one significant insight, which the researchers defined as something that was “novel, non-obvious, and clinically valid”.
“Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation,” the study authors said in a press statement.
Additionally, ChatGPT proved remarkably consistent in its responses and was even able to justify each one. It also outperformed PubMedGPT, a bot trained exclusively on medical literature, which had an accuracy rate of just 50.3%.