How AI is impacting protein structure research

The human body contains more than 20,000 different proteins. Examples include hemoglobin, which carries oxygen from the lungs to cells throughout the body, and insulin, which signals to the body the presence of sugar in the blood.

Each protein is made up of a chain of amino acids whose sequence determines its folding and its spatial structure, a bit as if a word were folded up in space according to the sequence of letters it contains. The sequence and the folding (or structure) of a protein determine its biological function; studying them is the field of “structural biology”. This field relies on several complementary experimental methods, which have enabled considerable progress in our understanding of the living world in recent decades and, in particular, make it possible to design new drugs.

Since the 1970s, researchers have tried to determine the structures of proteins from the amino acid sequence alone (an approach known as “ab initio”). It was only very recently, in 2020, that this became possible almost systematically, with the rise of artificial intelligence and in particular of AlphaFold, an AI system developed by a company owned by Google.

Faced with these advances in artificial intelligence, what is the role of structural biology researchers now?

To understand this, it helps to know that one of the challenges of tomorrow’s biology is “integrative biology”, which aims to understand biological processes at the molecular level in their cellular-scale context. Given the complexity of biological processes, a multidisciplinary approach is essential. It relies on experimental techniques, which remain indispensable for studying the structure of proteins, their dynamics and their interactions. Moreover, each experimental technique can benefit in its own way from the theoretical predictions of AlphaFold.

The structures of three proteins of the bacterium Escherichia coli, determined by the three experimental methods explained in the article, at the Institut de Biologie Structurale de Grenoble.
Beate Bersch, IBS, from an illustration by David Goodsell, Provided by the author

X-ray crystallography

Crystallography is, to date, the most widely used technique in structural biology. It accounts for more than 170,000 protein structures in the “Protein Data Bank”, with more than 10,000 different folds.

To use X-ray crystallography, the proteins must first be crystallized. It is often said that this technique is limited by the quality of protein crystals, which is lower for large proteins. But this notion does not always correspond to reality: for example, the structure of the ribosome, the huge molecular machine that assembles proteins, has been solved at a resolution of 2.8 angstroms. Venkatraman Ramakrishnan, Thomas Steitz and Ada Yonath received the Nobel Prize in Chemistry in 2009 for this work.

With the recent development of the X-ray free electron laser (XFEL), it has become possible to study thousands of protein microcrystals simultaneously, at room temperature and on the scale of the femtosecond (10^-15 seconds, or one millionth of a billionth of a second, the time scale at which chemical reactions and protein folding take place). This technique makes it possible to image proteins before they are destroyed. It is currently revolutionizing “kinetic crystallography”, which shows proteins “in action”, as well as the search for drugs.

So far, AlphaFold’s contribution to the study of protein structure by crystallography has focused on generating protein models accurate enough to apply the so-called “molecular replacement” technique for solving structures.

Nuclear magnetic resonance spectroscopy

Another experimental method for studying protein structure is “nuclear magnetic resonance spectroscopy”. Its medical imaging counterpart, MRI, looks at the spatial distribution of a single signal, characteristic of chemical elements in the biological tissues observed. Nuclear magnetic resonance spectroscopy instead records a set of signals from the atoms that make up the protein (this is called the “spectrum”).

Structure determination by magnetic resonance is generally limited to proteins of modest size. Models of the molecules are calculated from structural parameters (such as interatomic distances) derived from the analysis of experimental spectra. This can be pictured like the early days of cartography, when distances between reference points made it possible to draw 2D maps. To help interpret spectra that contain a great deal of information, models obtained by prediction rather than by experiment can be used, such as those produced by AlphaFold.
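The cartography analogy can be made concrete with a toy example. The sketch below places three reference points on a 2D map using only their pairwise distances; the function names `place_points` and `distance` are hypothetical and chosen for illustration. It is not how NMR software computes structures, which works in 3D with many more restraints and experimental uncertainty, but it shows the basic idea that distances alone constrain a shape.

```python
import math


def place_points(d12, d13, d23):
    """Place three reference points in a plane from their pairwise distances.

    The map is only defined up to rotation and reflection, so we fix
    point 1 at the origin and point 2 on the positive x-axis.
    """
    p1 = (0.0, 0.0)
    p2 = (d12, 0.0)
    # The law of cosines gives the x coordinate of point 3.
    x3 = (d12 ** 2 + d13 ** 2 - d23 ** 2) / (2 * d12)
    y_squared = d13 ** 2 - x3 ** 2
    if y_squared < -1e-9:
        raise ValueError("distances violate the triangle inequality")
    # Clamp tiny negative values caused by rounding before the square root.
    y3 = math.sqrt(max(y_squared, 0.0))
    return [p1, p2, (x3, y3)]


def distance(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Illustrative distances only (a 3-4-5 right triangle).
points = place_points(3.0, 4.0, 5.0)
for name, point in zip(("P1", "P2", "P3"), points):
    print(name, point)
```

Checking `distance(points[1], points[2])` against the input distance is a simple way to confirm that the reconstructed map is consistent with the measurements.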

In addition to structure determination, nuclear magnetic resonance spectroscopy offers two major advantages. On the one hand, the study is generally carried out on a sample in aqueous solution, which makes it possible to observe the particularly flexible parts of proteins, often invisible with other techniques. Their movements can even be quantified in terms of amplitude and frequency, which is extremely useful because the internal dynamics of proteins are as crucial to their function as their structure.

On the other hand, nuclear magnetic resonance spectroscopy makes it easy to detect the interactions of proteins with small molecules (ligands, inhibitors) or with other proteins. This makes it possible to identify the interaction sites, which is essential information for, among other things, the rational design of active molecules such as drugs.

These properties make nuclear magnetic resonance spectroscopy an extraordinary tool for the functional characterization of proteins, complementing other experimental techniques and AI.

Electron cryomicroscopy

Electron cryomicroscopy consists of freezing a hydrated sample ultra-rapidly (at approximately -180°C) in a thin layer of ice, through which electrons are then passed. The transmitted electrons generate an image of the sample which, after analysis, gives access to structures that can reach atomic resolution. By comparison, an optical microscope has a resolving power of only a few hundred nanometers, which corresponds to the wavelength of the light used; only a microscope using a source with a sufficiently short wavelength (such as electrons in electron microscopy) has a theoretical resolving power of the order of one angstrom. The 2017 Nobel Prize in Chemistry was awarded to Jacques Dubochet, Richard Henderson and Joachim Frank for their contributions to the development of electron cryomicroscopy.
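To give a sense of the wavelengths involved, the sketch below computes the relativistic de Broglie wavelength of an electron and compares it with visible light. The 300 kV accelerating voltage and the 500 nm light wavelength are assumptions chosen for illustration (neither figure comes from the article), and the function name `electron_wavelength_m` is hypothetical. In practice the resolution achieved depends on much more than wavelength, such as lens aberrations and radiation damage to the sample.

```python
import math

# Physical constants in SI units (CODATA values).
PLANCK = 6.62607015e-34             # J s
ELECTRON_MASS = 9.1093837015e-31    # kg
ELEMENTARY_CHARGE = 1.602176634e-19 # C
LIGHT_SPEED = 299792458.0           # m/s


def electron_wavelength_m(voltage_volts):
    """Relativistic de Broglie wavelength (in meters) of an electron
    accelerated from rest through the given voltage."""
    kinetic_energy = ELEMENTARY_CHARGE * voltage_volts
    rest_energy = ELECTRON_MASS * LIGHT_SPEED ** 2
    # p^2 = 2 m K (1 + K / (2 m c^2)) for kinetic energy K.
    momentum = math.sqrt(
        2 * ELECTRON_MASS * kinetic_energy
        * (1 + kinetic_energy / (2 * rest_energy))
    )
    return PLANCK / momentum


# Assumed values, for illustration only.
ASSUMED_VOLTAGE = 300e3          # volts
ASSUMED_VISIBLE_LIGHT = 500e-9   # meters (green light)

wavelength = electron_wavelength_m(ASSUMED_VOLTAGE)
print(f"electron wavelength: {wavelength * 1e10:.4f} angstrom")
print(f"visible light is about {ASSUMED_VISIBLE_LIGHT / wavelength:.0f} times longer")
```

Under these assumptions the electron wavelength comes out at roughly 0.02 angstrom, far below the size of an atom, whereas visible light is hundreds of thousands of times longer.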

Thanks to numerous technological developments, including that of direct electron detectors, since the mid-2010s this technique has become essential in structural biology by initiating a “resolution revolution”. Indeed, electron cryomicroscopy now makes it possible to obtain structures with atomic resolution, as in the case of apoferritin, a protein in the small intestine that helps iron absorption, at 1.25 angstrom resolution.

Its main asset is that it can determine the structure of medium-sized objects, above 50,000 Daltons (one Dalton corresponds approximately to the mass of a hydrogen atom), such as hemoglobin at 64,000 Daltons, but also of objects of a few billion Daltons (such as the mimivirus, a giant virus about 0.5 micrometers across).
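For readers unfamiliar with the Dalton, the short sketch below converts the masses quoted above into kilograms. The conversion factor is the standard atomic mass constant; the helper name `daltons_to_kg` is hypothetical.

```python
# One Dalton (atomic mass constant) in kilograms, CODATA value.
DALTON_IN_KG = 1.66053906660e-27


def daltons_to_kg(mass_daltons):
    """Convert a mass in Daltons to kilograms."""
    return mass_daltons * DALTON_IN_KG


size_threshold_kg = daltons_to_kg(50_000)  # lower size limit quoted in the text
hemoglobin_kg = daltons_to_kg(64_000)      # hemoglobin mass quoted in the text

print(f"50,000 Da threshold: {size_threshold_kg:.3e} kg")
print(f"hemoglobin (64,000 Da): {hemoglobin_kg:.3e} kg")
```

A single hemoglobin molecule therefore weighs on the order of 10^-22 kilograms.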

Despite all the technological advances mentioned above, cryomicroscopy cannot always resolve at sufficiently high resolution the structure of “complexes”, which are made up of several proteins. This is where AlphaFold can help: used alongside cryomicroscopy, it makes it possible to describe the interactions at the atomic level between the different constituents of a complex. This complementarity strengthens the future role of electron cryomicroscopy in structural biology.

AlphaFold’s contributions

AlphaFold makes it possible to predict the structure of proteins solely from their sequence, drawing on the knowledge accumulated by experimental structural biology. This approach is revolutionary because the sequences of many proteins are known through genome sequencing efforts, but determining their structures experimentally would require colossal human and technical resources.

At present, this type of program is therefore an additional, complementary tool, but it does not replace experimental techniques. As we have seen, these techniques provide further information (dynamics, interfaces) at different scales (from metal sites to multiprotein complexes), and that information is more reliable because it is experimentally verified. Beyond the structure determination of an isolated protein, the complexity of biological systems often requires a multidisciplinary approach to elucidate the mechanisms and functions of these fascinating biomolecules.
