MUCH OF THE art of medicine involves working out, through detailed questioning and physical examination, which disease a given patient has contracted. Far harder, but no less desirable, would be identifying which diseases a patient might develop in the future. This is what the team behind a new artificial-intelligence (AI) model, details of which were published in Nature on September 17th, claims to do.

Though the model, named Delphi-2M, is not yet ready for deployment in hospitals, its creators hope it could one day allow doctors to predict if their patients are likely to get one of more than 1,000 different conditions, including Alzheimer’s disease, cancer and heart attacks, which all affect many millions every year. In addition to helping flag patients who are at high risk, it might also help health authorities allocate budgets for disease areas that may need extra funds in the future.

The model was developed by teams at the European Molecular Biology Laboratory (EMBL) in Cambridge and the German Cancer Research Centre in Heidelberg. It takes inspiration from large language models (LLMs)—such as GPT-5, which powers ChatGPT—that are capable of producing fluent prose. LLMs are trained to spot patterns in enormous amounts of text scraped from the internet, which allows them to select the word most likely to come next in any given sentence. Delphi-2M’s creators reasoned that an AI model fed on large amounts of human-health data could have similar predictive power.
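
The idea of picking the likeliest next word can be shown with a toy example. The sketch below is a deliberately crude stand-in, not how an LLM or Delphi-2M works: it simply counts which word follows which in a few made-up sentences. The function names and the sentences are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(sequences):
    """Count, for every token, how often each other token directly follows it."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts


def most_likely_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text (an assumption for illustration only).
sentences = [
    ["the", "patient", "has", "a", "fever"],
    ["the", "patient", "has", "a", "cough"],
    ["the", "doctor", "has", "a", "plan"],
]
model = train_bigrams(sentences)
print(most_likely_next(model, "the"))  # patient
```

Swap the words for diagnoses and the same counting trick yields a rough guess at which condition tends to come next, which is the intuition the researchers built on with far more capable machinery.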

In many respects, the design of established LLMs was well-suited to the task. One major tweak that was needed, however, was to teach such a model to account for the time that had passed between events in a patient’s life. In written text, consecutive words immediately follow one another; the same is not true for diagnoses in a patient’s history. High blood pressure following a positive pregnancy test, for example, requires different interpretations depending on whether the two are separated by weeks (in which case the pregnancy can be affected) or years.

This adjustment was performed by swapping out the part of an LLM that encodes a word’s position for one encoding a person’s age. (It wasn’t without mishaps: in an early version of the model new diagnoses were sometimes predicted after a person had died.)
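
What such a swap might look like can be sketched in a few lines. The example below assumes a sinusoidal encoding of the kind used in early transformer models; the published details of Delphi-2M's encoding are not given here, so the function, its parameters and the patient history are all hypothetical.

```python
import math


def sinusoidal_encoding(value, dim=8, base=10000.0):
    """Encode a number as sine/cosine pairs at geometrically spaced frequencies."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    features = []
    for i in range(dim // 2):
        freq = 1.0 / (base ** (2 * i / dim))
        features.append(math.sin(value * freq))
        features.append(math.cos(value * freq))
    return features


# Hypothetical history: (age in years at the event, illustrative ICD-10 code).
history = [(29.0, "Z32.1"), (29.2, "O13"), (41.5, "I10")]

# Language-model style: each event is encoded by its place in the sequence.
by_position = [sinusoidal_encoding(float(idx)) for idx, _ in enumerate(history)]

# Age-aware style: each event is encoded by the age at which it happened.
by_age = [sinusoidal_encoding(age) for age, _ in history]
```

Encoded by position, the three events look evenly spaced. Encoded by age, the first two sit a couple of months apart and the third more than a decade later, which is the distinction the pregnancy example turns on.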

Delphi-2M was then trained on data from 400,000 people from UK Biobank, a database that contains arguably the world’s most complete human biological data set. The model was given the timing and sequence of ICD-10 codes, the international medical shorthand doctors use to register officially recognised diagnoses, representing the 1,256 different diseases that appeared in the Biobank data set. The model was subsequently validated on data from the remaining 100,000 people in the Biobank before being tested further on Danish health records, which are famously long-running and thorough. In this case, the team used data from 1.9m Danes going back to 1978, ensuring a much more diverse and representative sample than the UK Biobank could provide.

To judge the model’s performance, researchers measured its AUC (short for “area under the curve”, referencing a region in a probability chart), in which a value of 1 would mean perfect predictions and 0.5 would be no better than random. For predictions of diagnoses within five years of a previous one, on average Delphi-2M performed at a value of 0.76 on British data, with a small drop to 0.67 for the Danish data. Events that would often follow a specific previous one (death following sepsis, say) were correctly predicted more often, whereas those caused by more random, external factors, such as picking up a virus, were harder to predict. Unsurprisingly, the model’s accuracy also dropped a little over time: when forecasting ten years into the future, it scored 0.7 on average.
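
One way to read an AUC is as the chance that the model gives a higher risk score to a randomly chosen person who went on to get a disease than to a randomly chosen person who did not. The sketch below computes it that way on made-up scores; it illustrates the metric only and makes no claim about the researchers' actual evaluation procedure.

```python
def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented risk scores; label 1 means the person later received the diagnosis.
labels = [1, 1, 0, 0]
print(auc([0.9, 0.8, 0.2, 0.1], labels))  # 1.0, every positive outranks every negative
print(auc([0.5, 0.5, 0.5, 0.5], labels))  # 0.5, the scores carry no information
print(auc([0.9, 0.4, 0.6, 0.1], labels))  # 0.75, three of the four pairs are in order
```

On this reading, a score of 0.76 means the model ranks such a pair correctly about three times in four.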

Real-world applications remain far off for now. Delphi-2M will first need to go through a much more rigorous trial period giving clinicians the opportunity to explore if it leads to better outcomes for their patients. That process could take many years. The Delphi-2M team is also working on updating the model to enable it to take in more sophisticated data than chronological lists of diagnoses. As the UK Biobank also contains medical images and genome sequences, adding this data to the model might further improve its accuracy.

As impressive as Delphi-2M appears, it is not the only artificial health forecaster in town. For instance, an AI model called Foresight, originally developed at King’s College London in 2024, also uses patients’ medical histories to predict future health events. (A larger version of the project was paused in June following concerns that NHS England had not sought the proper approvals when it gave the Foresight team access to the data.) The ETHOS model being developed at Harvard University also has similar aims.

Although patients will have to wait to feel the direct benefits of Delphi-2M, even the preliminary version of the model already offers a potential treasure trove for biologists. Its style of prediction reveals which conditions cluster together, which may in turn suggest previously unexplored relationships between diseases. Future, beefier AI models could take that work even further. The possibilities are exciting, says Ewan Birney, a geneticist at EMBL. “I’m like a kid in a candy shop.”

