MIT researchers have trained an artificial intelligence model that quickly predicts how two proteins will attach

Antibodies, small proteins produced by the immune system, can attach to specific parts of a virus to neutralize it. In the fight against Covid-19, laboratories have therefore developed not only vaccines but also synthetic antibodies which, by binding to the spike proteins of the virus, can prevent it from entering a human cell. MIT researchers have created Equidock, a machine learning model that can directly predict the complex that will form when two proteins bind. The research will be presented at the International Conference on Learning Representations (ICLR).

To develop a successful synthetic antibody, researchers must understand exactly how it will attach to its target protein. Proteins have lumpy 3D structures with many folds and can stick together in millions of combinations, so finding the right protein complex among almost countless candidates is extremely time-consuming.

Octavian-Eugen Ganea, a postdoctoral fellow at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and Xinyuan Huang, a graduate student at ETH Zurich, are the co-lead authors of the study. Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health at CSAIL, and Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering at CSAIL and a member of the Institute for Data, Systems, and Society, also collaborated.

Equidock, a deep learning model

To streamline the process, the MIT researchers created a machine learning model that can directly predict the complex that will form when two proteins bind. Their technique is between 80 and 500 times faster than state-of-the-art software methods and often predicts protein structures closer to the actual structures observed experimentally.

Octavian-Eugen Ganea said:

“Deep learning is very good at capturing interactions between different proteins that are otherwise difficult for chemists or biologists to write experimentally. Some of these interactions are very complicated and people haven’t found good ways to express them. This deep learning model can learn these types of interactions from the data.”

Protein attachment

Equidock focuses on rigid body docking, in which two proteins attach by rotating or moving in 3D space without their shapes compressing or bending.

The model takes the 3D structures of two proteins and converts them into 3D graphs that can be processed by the neural network. Proteins are formed from chains of amino acids, and each of these amino acids is represented by a node in the graph.
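To make the graph representation concrete, here is a minimal sketch in plain Python. It assumes each amino acid is reduced to a single 3D point and that each node is linked to its nearest neighbours; the article does not say how Equidock chooses its edges, so the `knn_graph` helper, the value of `k` and the coordinates are purely illustrative.

```python
import math

def knn_graph(coords, k=2):
    """Return directed edges (i, j) where j is one of the k nodes nearest to i."""
    edges = []
    for i, ci in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(ci, coords[j]),
        )
        edges.extend((i, j) for j in others[:k])
    return edges

# Hypothetical positions (in angstroms) for a five-residue chain: one node per amino acid.
residues = ["MET", "LYS", "THR", "ALA", "TYR"]
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (7.6, 7.6, 0.0),
]

edges = knn_graph(coords, k=2)
for i, j in edges:
    print(f"{residues[i]} -> {residues[j]}")
```

Each residue ends up with two outgoing edges, and residues that sit far apart in space, such as the first and the last, are not connected directly.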

The researchers built geometric knowledge into the model, so it understands how objects change when they are rotated or moved in 3D space. The model also incorporates mathematical knowledge that ensures the proteins always attach in the same way regardless of where they are located in 3D space, which is how proteins dock in the human body.
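One simple way to see what this kind of geometric knowledge means: some quantities, such as the distances between points, do not change when an object is rotated or moved. The sketch below checks that numerically. It illustrates the general idea only and is not taken from Equidock; the helper names and coordinates are invented for the example.

```python
import math

def rigid_move(points, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]

def pairwise_distances(points):
    """Matrix of Euclidean distances between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]

points = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.0, 1.0)]
moved = rigid_move(points, angle=1.2, shift=(10.0, -4.0, 2.5))

before = pairwise_distances(points)
after = pairwise_distances(moved)
max_change = max(
    abs(a - b) for row_a, row_b in zip(before, after) for a, b in zip(row_a, row_b)
)
# The coordinates differ, but the distances agree up to rounding error.
print(max_change < 1e-9)
```

A model that relies only on quantities like these cannot be confused by where a protein happens to sit in space or how it is oriented.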

Using this information, Equidock identifies the atoms of the two proteins that are most likely to interact and form chemical reactions, known as binding pocket points. It then uses these points to place the two proteins together in a complex.

Octavian-Eugen Ganea explains:

“If we can understand from the proteins which individual parts are likely to be these binding pocket points, then that will capture all the information we need to place the two proteins together. Assuming we can find these two sets of points, we can simply figure out how to rotate and translate the proteins so that one set matches the other set.”
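Matching one set of points to another with a rotation and a translation is a classic least-squares alignment problem. The sketch below solves it in plain Python using Horn's closed-form quaternion formulation, with power iteration to find the required eigenvector. The article does not say which solver Equidock uses, so this is an illustration rather than the team's implementation. The function names and toy coordinates are invented, and the points are assumed to be already paired up.

```python
import math

def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return [sum(p[a] for p in points) / n for a in range(3)]

def fit_rigid_transform(source, target, iterations=500):
    """Return (R, t) minimising sum |R @ source[i] + t - target[i]|^2."""
    cs, ct = centroid(source), centroid(target)
    # Cross-covariance of the centred sets: S[a][b] = sum_i source_a * target_b.
    S = [[0.0] * 3 for _ in range(3)]
    for p, q in zip(source, target):
        for a in range(3):
            for b in range(3):
                S[a][b] += (p[a] - cs[a]) * (q[b] - ct[b])
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Horn's symmetric 4x4 matrix: its top eigenvector is the unit quaternion
    # (w, x, y, z) of the best rotation.
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by a bound on the eigenvalue magnitudes makes the wanted
    # eigenvalue dominant, so plain power iteration converges to it.
    shift = max(sum(abs(v) for v in row) for row in N)
    # Start from "no rotation". Assumption: the true rotation is not an exact
    # half-turn, for which this start vector has no overlap with the answer.
    quat = [1.0, 0.0, 0.0, 0.0]
    for _ in range(iterations):
        nxt = [shift * quat[r] + sum(N[r][c] * quat[c] for c in range(4)) for r in range(4)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:  # degenerate input, e.g. all points coincide
            break
        quat = [v / norm for v in nxt]
    w, x, y, z = quat
    R = [
        [w * w + x * x - y * y - z * z, 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), w * w - x * x + y * y - z * z, 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), w * w - x * x - y * y + z * z],
    ]
    # The translation moves the rotated source centroid onto the target centroid.
    t = [ct[r] - sum(R[r][c] * cs[c] for c in range(3)) for r in range(3)]
    return R, t

def apply_transform(R, t, points):
    """Apply p -> R @ p + t to every point."""
    return [[sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)] for p in points]

# Toy "pocket points": the target is the source rotated 90 degrees about z,
# then shifted by (1, 2, 3).
source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
target = [(1.0, 2.0, 3.0), (1.0, 3.0, 3.0), (-1.0, 2.0, 3.0), (1.0, 2.0, 6.0)]

R, t = fit_rigid_transform(source, target)
aligned = apply_transform(R, t, source)
worst = max(math.dist(a, b) for a, b in zip(aligned, target))
print(worst < 1e-6)
```

With numerical libraries available, the same problem is usually solved with a singular value decomposition (the Kabsch algorithm); the pure-Python route is used here only to keep the example free of dependencies.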

One of the biggest difficulties in building this model was the lack of training data.

Because there is so little experimental 3D data for proteins, it was particularly important to integrate geometric knowledge into Equidock, Ganea explains. Without these geometric constraints, the model could pick up false correlations in the data set.

An almost immediate prediction

Once the model was trained, the researchers compared it to four software methods. Equidock is able to predict the final protein complex after only one to five seconds. All baselines took significantly longer, between 10 minutes and an hour or more.

In quality measures, which calculate how closely the predicted protein complex matches the actual protein complex, Equidock was often comparable to baselines, but it sometimes underperformed them.
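The article does not name the quality measures used. A common way to score how closely a predicted structure matches a reference is the root-mean-square deviation (RMSD) between corresponding atoms, which is what the sketch below assumes. The `rmsd` function and the coordinates are illustrative, and both structures are assumed to list the same atoms in the same order and in the same coordinate frame.

```python
import math

def rmsd(predicted, reference):
    """Root-mean-square deviation between two equally sized lists of 3D points."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("structures must be non-empty and the same size")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(predicted, reference))
    return math.sqrt(total / len(predicted))

# Hypothetical reference complex and a prediction in which every atom
# is displaced by 5 angstroms (3 along x and 4 along y).
reference = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 0.0, 1.0)]
predicted = [(x + 3.0, y + 4.0, z) for x, y, z in reference]

print(rmsd(reference, reference))
print(rmsd(predicted, reference))
```

A lower value means the prediction sits closer to the experimentally observed complex; a perfect prediction scores zero.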

Octavian-Eugen Ganea clarifies:

“We are still behind one of the baselines. Our method can still be improved, and it can still be useful. It could be used in a very large virtual screen where we want to understand how thousands of proteins can interact and form complexes. Our method could be used to generate an initial set of candidates very quickly, and then these could be refined with some of the more accurate, but slower, traditional methods.”

In addition to using this method with traditional models, the team wants to incorporate specific atomic interactions into Equidock so it can make more accurate predictions. For example, sometimes the atoms of proteins attach through hydrophobic interactions, which involve water molecules.

This technique could help scientists better understand certain biological processes that involve protein interactions, such as DNA replication and repair. It could also speed up the process of developing new drugs.

Octavian-Eugen Ganea adds:

“Our technique could also be applied to the development of small drug-like molecules. These molecules bind to protein surfaces in specific ways, so quickly determining how this binding occurs could shorten the drug development timeline.”

In the future, the researchers plan to improve Equidock so that it can make predictions for flexible protein docking. The biggest obstacle is the lack of training data, so the team aims to generate synthetic data to improve the model.
