An AI chatbot can pass the US test for a medical licence

In the past, I have been mildly critical of what OpenAI’s ChatGPT can actually do. As I have said in this column before, a scientist friend sent a sample in which he asked ChatGPT to create a short explanation of itself. He was rewarded with what read like a well-written essay by a sixth-grade student. It was also an advertorial for the product. This second part is what is problematic about generative artificial intelligence (AI) models. They regurgitate either the filth or the hyperbole that has been fed to them.

In contrast, the more-focused cognitive models have smaller data sets that are used to train AI programs on specific use cases. For instance, a medico-radiological system would limit itself to magnetic resonance imaging (MRI) scans, X-ray snaps and other such medical images, and would likely not be training itself on poetry or music or anything with no relevance to the task at hand.

Generative models have become popular as they have blown past traditional methods of training AI programs with small datasets. Their most salient feature is that they scour almost every shred of information available on the web, a data store that is doubling in size every two years, and then use what they find to train AI programs to generate output.

OpenAI, heavily backed by Microsoft, has GPT-3, which works mainly with text. GPT-3 analysed thousands of digital books and nearly a trillion words posted to blogs, social media and the rest of the internet. Most industry watchers expected generative AI to move on to newer models, such as a potential GPT-4 in 2023. However, it seems as if OpenAI isn’t done tinkering with its old models yet. Early in December, the San Francisco-based company released a demo of a model called ChatGPT, a spin-off of GPT-3 that is geared toward answering questions via back-and-forth dialogue. As expected, this can power industry applications such as the chatbots widely used in customer service, and could serve as a different type of engine for contextual searches that deliver more than traditional web searches on Google, DuckDuckGo or Microsoft’s own Bing can.

Now, I may stand corrected in my view of generative models. There is news that ChatGPT has the ability to pass the US medical licensing examination. A US-based friend sent over a news clipping from MedPage Today (bit.ly/3XsrkbG) that seems to indicate that ChatGPT can pass the exam that will allow it to become a doctor.

Quoting directly from the article: “The first paper, published on medRxiv (bit.ly/3Xuk27p) in December 2022, investigated ChatGPT’s performance on the USMLE without any special training or reinforcement prior to the exams. According to Victor Tseng, MD, of Ansible Health in Mountain View, California, and colleagues, the results showed ‘new and surprising evidence’ that this AI tool was up to the challenge. Tseng and team noted that ChatGPT was able to perform at >50% accuracy across all of the exams, and even achieved 60% in most of their analyses. While the USMLE passing threshold does vary between years, the authors said that passing is approximately 60% most years. ‘ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement,’ they wrote, noting that the tool was able to demonstrate ‘a high level of concordance and insight in its explanations.’ ‘These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making,’ they concluded.”

Knowing that scientific honesty is mostly a thing of the past, and that many journals are happy to publish papers without sufficient peer review, I was leery. I clicked through to medRxiv.org and found, to my surprise, that the site made it clear that the authors’ paper had not yet been peer reviewed and was a ‘pre-print’.

Evidently, arXiv (pronounced ‘archive’; the X is the Greek letter ‘chi’) and medRxiv are legitimate pre-print sites. They are forums where authors post pre-prints to get feedback, and they form a ‘soft peer review’ process before authors submit to a journal with formal peer review. In a formal peer review, the journal editor forwards the paper to other noted experts for comments before it is published. Posting a pre-print on arXiv also helps authors put a stake in the ground on research and say they were first to announce something. It also gets the author useful feedback that helps fix any issues before the more tedious and time-consuming peer review process. Also, it appears that medRxiv is administered by Cold Spring Harbor Laboratory, BMJ and Yale, and is not a trivial forum.

Fortunately, medical exams are proctored and include oral exams. So the fact that a chatbot trained on the source data could interpret questions well enough to pass is commendable, but not surprising. Importantly, that does not in any way translate to a human being able to pass the certification process using a chatbot.

What might happen, it appears to me, is that generative models will get so good at absorbing medical information that they will iteratively ask the right questions and come up with a diagnosis, given historical information and the sum total of past medical knowledge. This can provide inputs for doctors to form their own opinions. They could, for example, use bots as sounding boards to see if they concur.

Siddharth Pai is co-founder of Siana Capital, a venture fund manager.
