Erasmus Mundus Joint Master - ChEMoinformatics+ : AlphaFold2 : A brilliant Project

by: Sophiya DYMURA Track « In Silico Drug Design », Strasbourg-Milan-Paris, 2022

Protein structure determination is an important step at the beginning of the long drug discovery pathway. Although the Protein Data Bank (PDB) contains a large number of experimentally determined 3D structures, a huge number of potential target molecules still have no resolved structure. Despite major achievements in this field, the demand for fast and high-quality protein models has not yet been satisfied.

During the previous year of my studies at the university, I discovered a brilliant project in the field of structural bioinformatics named AlphaFold2. It is the result of extensive work by several dozen people from DeepMind together with scientists from the European Bioinformatics Institute. Two nearly simultaneous publications presented it to the scientific community: the first discusses the architecture, efficiency and accuracy of the protein structure prediction tool[1], and the second describes in detail its application to the human proteome[2].

AlphaFold2 can perform the prediction with or without templates. A template is a protein of known structure whose sequence shares high identity with the protein of unknown structure. The input data are gathered from open sources: a multiple sequence alignment (MSA) of related sequences and, when templates are used, alignments of the query to template structures from the PDB. These inputs then pass through a deep neural network built around attention. In the first stage, the so-called Evoformer, the information about the relationships between amino acids is processed by attention layers, treating the residues and their pairwise relations much like a graph. The second stage, the structure module, turns the resulting representations into actual 3D coordinates of the protein[1].

Workflow of the AlphaFold2 model taken from ref. [1]
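To make the notion of a template more concrete, here is a minimal sketch that computes the percent sequence identity between a query and candidate template sequences and keeps candidates above a cutoff. It assumes the sequences are already aligned to the query (gaps written as "-"). The function names, the example sequences and the 30% cutoff are hypothetical illustrations; AlphaFold2's own template search uses dedicated homology-search tools and is not reproduced here.

```python
from typing import Dict, List, Tuple


def percent_identity(query_aln: str, template_aln: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    # Only columns where both sequences contain a residue are compared
    pairs = [(q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


def select_templates(query_aln: str, candidates: Dict[str, str],
                     min_identity: float = 30.0) -> List[Tuple[str, float]]:
    """Return (name, identity) pairs above the hypothetical cutoff, best first."""
    scored = [(name, percent_identity(query_aln, aln)) for name, aln in candidates.items()]
    kept = [(name, pid) for name, pid in scored if pid >= min_identity]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical, pre-aligned sequences used only for illustration
    query = "MKTAYIAKQR-QISFVKSHFSRQ"
    templates = {
        "template_A": "MKTAYIAKQRLQISFVKSHFSRQ",
        "template_B": "MRTGYLAEQK-QVSFIKAHFARE",
        "template_C": "AGHEWLPDNSTVCMYRELIGDWP",
    }
    for name, pid in select_templates(query, templates):
        print(f"{name}: {pid:.1f}% identity")
```

With these toy sequences, template_A (100.0%) and template_B (about 54.5%) pass the cutoff, while template_C shares no identical residues with the query and is discarded.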

As a result, we obtain structures that can fill gaps in the available structural information. Such data are crucial for drug development, and this project is a big step towards solving the so-called protein folding problem. The authors predicted the structures of 98.5% of the human proteome, whereas at the time of publication only about one third of the sequences had an experimentally determined structure. Using these experimentally solved proteins, the authors demonstrated the impressive results of their work[2].

AlphaFold2 has been widely discussed by scientists. Like every model, it has limitations, which the authors also mention. For example, the structures of proteins that take part in larger supramolecular assemblies cannot be predicted by this tool[2]. Likewise, when a protein's function relies on conformational changes, so that both its active and inactive structures must be determined, AlphaFold2 is of little help. Moreover, predicted structures should be used with care, because binding sites are modelled less accurately than the rest of the protein; in 10% of cases they might not be predicted at all. The last drawback is the complexity of use: the training data for the model occupy 1 terabyte of storage.

Overall, AlphaFold2 combined with an experimental approach such as cryo-electron microscopy (CryoEM) is an extremely powerful tool[3]. For example, it can help reveal errors in models built from experimental data. However, any in silico method still requires experimental validation.
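One common way to quantify how well a predicted model agrees with an experimental structure is the root-mean-square deviation (RMSD) of matching atoms, for example the Cα atoms, after optimal rigid superposition. The sketch below implements the Kabsch algorithm with NumPy. It assumes the two coordinate arrays are already paired residue by residue, and the coordinates are made up for illustration; it is a generic comparison technique, not a procedure taken from refs. [1-3].

```python
import numpy as np


def kabsch_rmsd(predicted: np.ndarray, reference: np.ndarray) -> float:
    """RMSD between two paired (N, 3) coordinate sets after optimal superposition."""
    if predicted.shape != reference.shape or predicted.ndim != 2 or predicted.shape[1] != 3:
        raise ValueError("expected two coordinate arrays of identical shape (N, 3)")
    # Remove translation by centring both sets on their centroids
    p = predicted - predicted.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # Covariance matrix and its singular value decomposition
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the predicted coordinates onto the reference and measure the deviation
    p_aligned = p @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_aligned - q) ** 2, axis=1))))


if __name__ == "__main__":
    # Hypothetical C-alpha coordinates (in angstroms) of a short fragment
    reference = np.array([
        [0.0, 0.0, 0.0],
        [3.8, 0.0, 0.0],
        [5.0, 3.6, 0.0],
        [4.0, 5.0, 3.2],
        [1.2, 6.1, 4.0],
    ])
    # The "prediction" is the same fragment rotated 90 degrees about z and shifted
    rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    predicted = reference @ rot_z.T + np.array([10.0, -2.0, 1.5])
    print(f"RMSD after superposition: {kabsch_rmsd(predicted, reference):.3f} A")
```

Because the toy "prediction" differs from the reference only by a rotation and a translation, the superposition removes both and the reported RMSD is essentially zero; a real predicted model would give a positive value reflecting genuine structural differences.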

References:
1. Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2
2. Tunyasuvunakool, K., Adler, J., Wu, Z. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021). https://doi.org/10.1038/s41586-021-03828-1
3. Hryc, C. F., Baker, M. L. AlphaFold2 and CryoEM: Revisiting CryoEM modeling in near-atomic resolution density maps. iScience 25(7), 104496 (2022). https://doi.org/10.1016/j.isci.2022.104496