Supercomputer enables AI model to ‘speak’ better Protein
27 Oct 2025
A computer system used to study the vast expanse of the universe has been employed to develop a machine learning model to help translate the ‘language’ of proteins.
Teams based at the University of Glasgow developed a protein language model (PLM), a specific type of large language model (LLM) that approaches protein sequences in a similar way to sentences.
Their customised PLM, known as PLM-interact, can help predict protein properties, function, and structure and outperforms rival models, claim the researchers.
They say it can predict protein interactions with between 16% and 28% more accuracy than other AI protein models.
In addition, they argue, PLM-interact can predict five key protein interactions that govern essential biological functions, whereas competitors including Google DeepMind-powered AlphaFold3 can predict just one.
Furthermore, the technology could reveal fresh insights into how viruses interact with host species, which could help predict their pandemic potential and highlight likely drug targets.
The hope is that the study, published in Nature Communications, can spur improvements in the understanding of diseases including cancer and viral infection.
Leading the work is Dr Ke Yuan from the University’s School of Cancer Sciences and the Cancer Research UK Scotland Institute, together with professors Craig Macdonald from the School of Computing Science and David L Robertson from the MRC-University of Glasgow Centre for Virus Research.
Robertson said: “Being able to quickly and accurately gain insight into how viruses interact with our proteins could help us better understand virus emergence and disease risks, which in turn can help speed up the development of new treatments and therapies.
“Our results are a very promising contribution to developing a system capable of predicting protein interactions at an unprecedented scale and level of accuracy.”
A crucial contribution was made by the UK’s DiRAC High Performance Super Computer facility, in particular its Tursa GPU-based system for particle physics calculations. This was originally developed to help theoretical physicists simulate aspects of the workings of the universe.
Located at EPCC in Edinburgh and managed by the DiRAC facility, Tursa enabled the Glasgow teams to build the model, which has more than 650 million individual parameters, more quickly and effectively. It also helped to train PLM-interact on more than 421,000 human protein pairs.
Yuan, one of the paper’s corresponding authors, commented: “It’s great to think that DiRAC, which was developed to help scientists understand the laws of nature from the smallest subatomic particles to the largest scales in the Universe, has helped us build this new model to explore the inner space of protein interactions instead.
“Colleagues from our School of Computing Science provided support with the language modelling aspects of creating PLM-interact, but in order to train the model itself, we needed access to vast amounts of computing power.”
The scientists added that their PLM could accurately identify the impact of mutations on protein interactions, including those causing negative consequences (such as genetic diseases) and those inhibiting essential protein-protein interactions (such as cancers).
PLM-interact was also trained on nearly 22,400 protein-protein interactions, drawn from 5,882 human and 996 virus proteins, and again outperformed other protein models in predicting how human and virus proteins interact, they claimed.
Funders included the European Union Horizon 2020 research and innovation programme, the Medical Research Council with support from Cancer Research UK, Prostate Cancer UK and the Biotechnology and Biological Sciences Research Council.
Pic: Brano