In the previous article we discussed why AI is well suited to the protein folding problem, and explained the coevolutionary-data-based method that AlphaFold builds on.
In this article, we will present in more detail how AlphaFold works, and discuss the improvements made in the second version, AlphaFold2, which has had such an impact on the scientific community.
Although we originally wanted to explain AlphaFold2 directly, it is impossible to give a detailed presentation of its methods with the limited amount of information DeepMind has revealed about the system on their blog.
So while we wait for the corresponding paper, the best we can do is review the Nature article they published on AlphaFold in early 2020 and speculate a little about the changes in the new version. If you are familiar with coding, DeepMind also released the code for AlphaFold, which you can find here.
The two algorithms seem to be similar, or at least depend on similar input data, so we will first explain how the original AlphaFold predicts protein structures.
What AlphaFold does, put very simply, is predict physical properties of a protein from its 1D amino-acid sequence and then use those predictions to generate a 3D structure of the protein. The entire system can be seen as a two-step process that we will explain in the following paragraphs (see Figure 1 below).
AlphaFold Step 1: Distogram Prediction with a Convolutional Neural Network
The central component of AlphaFold is a convolutional neural network (CNN) trained on Protein Data Bank (PDB) structures and on features derived from a Multiple Sequence Alignment (MSA) of the target sequence. This component predicts two things: (a) a distance distribution matrix (distogram), which captures the distances between pairs of amino-acid residues, and (b) a set of torsion angle distributions, which describe the angles of the chemical bonds that connect those residues.
Figure 1: AlphaFold overall process. Photo credit: Nature. Source: https://www.nature.com/articles/s41586-019-1923-7
As we have discussed previously, MSA-derived features contain information on the correlations between the amino-acid residues in the protein. Such information enables us to infer which residues are in contact and to predict a discrete probability distribution over distances for every pair of amino acids in a protein.
So far, this does not differ much from previous deep learning approaches to the protein folding problem. However, the convolutional neural network used in AlphaFold not only predicts where contacts occur, but also produces full probability distributions over the distances between pairs of amino acids in the protein structure.
In comparison, other deep learning-based approaches to the protein folding problem only predict whether pairs of amino-acid residues are in contact (see Figure 2 below).
Figure 2: Example of statistical analysis of contact pattern prediction by DESTINI. Image credit: Nature. Source: https://www.nature.com/articles/s41598-019-40314-1
DeepMind, on the other hand, claimed that the distogram used in AlphaFold provides much richer information: not just a predicted distance between the amino acids, but a full probability distribution for each predicted distance.
As you can see in Figure 3, the distogram provides relatively precise predictions of the distances between amino acids that are likely in contact (predicted distance under 8 Å, painted in green) compared with the actual distances (the red line) in the protein structure. Thus, the distogram also works well as the contact map used in previous approaches.
Note that two residues are typically defined to be in contact when their β-carbon atoms are within 8 Å of one another. In the figure below, the 8 Å cutoff is drawn as a black line.
However, the distogram also contains quality information about predicted distances over 8 Å, so it provides more information for the second step (especially for computing the energy potential of the predicted structure) than deep learning methods that use only a contact map.
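To make the distogram-versus-contact-map distinction concrete, here is a minimal NumPy sketch (not DeepMind's code; the bin widths, the random "predictions", and the 0.5 threshold are illustrative assumptions) that collapses a distogram into a contact map by summing the probability mass in the bins below 8 Å:

```python
import numpy as np

rng = np.random.default_rng(0)

L = 4                                       # residues in a toy protein
n_bins = 10                                 # distance bins of width 2 Å: [2,4), [4,6), ..., [20,22)
bin_lower = 2.0 + 2.0 * np.arange(n_bins)   # lower edge of each bin, in Å

# Hypothetical predicted distogram: random values normalized so that
# each residue pair (i, j) carries a probability distribution over bins.
distogram = rng.random((L, L, n_bins))
distogram /= distogram.sum(axis=-1, keepdims=True)

# A pair is "in contact" when its β-carbon distance is under 8 Å, so the
# contact probability is the mass in the bins lying entirely below 8 Å.
contact_bins = bin_lower + 2.0 <= 8.0       # bins [2,4), [4,6), [6,8)
contact_prob = distogram[..., contact_bins].sum(axis=-1)

contact_map = contact_prob > 0.5            # threshold into a binary contact map
print(contact_prob.shape)                   # (4, 4)
```

The point of the exercise: going from distogram to contact map throws away the per-bin probabilities, which is exactly the extra information AlphaFold keeps for step 2.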
Figure 3: Example of the predicted probability distributions for distances of one residue (residue 29) to all other residues. Photo credit: Nature. Source: https://www.nature.com/articles/s41586-019-1923-7
In step 1, the neural network also predicts probability distributions over the torsion angles of the amino acids. With these predictions from the convolutional neural network trained on the PDB dataset, AlphaFold takes the next step: applying gradient descent to a structure proposed on the basis of the step 1 predictions.
AlphaFold Step 2: Repeated Gradient Descent on a Protein-specific Potential
We mentioned above that the neural network trained in step 1 also predicts the torsion angles of the amino-acid residues. So what is a torsion angle?
A torsion angle is a dihedral angle along the backbone of the amino-acid chain, and it is essential for solving the protein folding problem. Each residue contains three backbone atoms: one nitrogen atom and two carbon atoms. Since the distances between the backbone atoms and the angles formed by three consecutive backbone atoms do not change, the shape of a chain of connected residues depends solely on the torsion angles at each residue.
Therefore, the entire structure of a protein can be defined by the set of these torsion angles, which means that solving the protein folding problem comes down to figuring out the torsion angles of the residues in the protein.
Figure 4: Protein torsion angles. Credit: Proteopedia.org. Source: https://proteopedia.org/wiki/index.php/Dihedral/Index
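For the curious, a dihedral angle like those in Figure 4 can be computed from four atom positions. The sketch below uses made-up coordinates rather than real backbone atoms; in an actual protein, the backbone torsion angle φ is the dihedral of the atoms C(i−1)–N(i)–Cα(i)–C(i) and ψ that of N(i)–Cα(i)–C(i)–N(i+1):

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Dihedral (torsion) angle, in degrees, defined by four points."""
    b1 = p1 - p0
    b2 = p2 - p1
    b3 = p3 - p2
    n1 = np.cross(b1, b2)                    # normal of the plane (p0, p1, p2)
    n2 = np.cross(b2, b3)                    # normal of the plane (p1, p2, p3)
    m1 = np.cross(n1, b2 / np.linalg.norm(b2))
    x = np.dot(n1, n2)
    y = np.dot(m1, n2)
    return np.degrees(np.arctan2(y, x))

# Four hypothetical atom positions forming a 90-degree torsion:
p0 = np.array([1.0, 0.0, 0.0])
p1 = np.array([0.0, 0.0, 0.0])
p2 = np.array([0.0, 1.0, 0.0])
p3 = np.array([0.0, 1.0, 1.0])
print(dihedral(p0, p1, p2, p3))              # 90.0
```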
In short, during the second step AlphaFold builds a protein-specific potential that measures how good a given folding configuration is for that sequence, and generates structures that conform to the distance predictions.
Since the torsion angles determine the protein structure, AlphaFold can build the protein-specific potential as a function of 2L torsion angles, where L is the number of residues in the protein.
In the second step, an initial protein structure is proposed. For this proposed structure, AlphaFold calculates three types of energy potential.
The first is the distance potential, calculated by comparing the distances between residues in the proposed structure with the distogram predicted in step 1. The bigger the difference between the distances predicted in the distogram and those in the proposed structure, the bigger the potential energy of that structure.
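One common way to turn a distogram into such a potential is to score each pairwise distance by its negative log-probability under the predicted distribution. The sketch below is a rough stand-in for AlphaFold's actual spline-fitted distance potential, with random numbers playing the role of both the predictions and the proposed coordinates:

```python
import numpy as np

rng = np.random.default_rng(1)

L = 4
n_bins = 10                                  # bins [2,4), [4,6), ..., [20,22) in Å

# Hypothetical distogram from step 1 (normalized per residue pair).
distogram = rng.random((L, L, n_bins))
distogram /= distogram.sum(axis=-1, keepdims=True)

def distance_potential(coords, distogram):
    """Sum of negative log-probabilities of the current pairwise
    distances under the predicted distogram: the less likely the
    distances are under the prediction, the higher the potential."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)               # (L, L) distance matrix
    bins = np.clip(((dists - 2.0) // 2.0).astype(int), 0, n_bins - 1)
    i, j = np.triu_indices(len(coords), k=1)             # count each pair once
    probs = distogram[i, j, bins[i, j]]
    return -np.log(probs + 1e-9).sum()

coords = rng.random((L, 3)) * 10.0           # hypothetical 3D coordinates, in Å
pot = distance_potential(coords, distogram)
print(pot)
```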
The second is the geometric potential, calculated from the difference between the torsion angles of the proposed structure and the torsion angle distributions predicted by the neural network in step 1. Again, the bigger the difference between the two, the bigger the potential energy.
The third is the smooth potential. Since step 1 only predicts the structure of the backbone atoms, it does not account for the side chains. Here, Rosetta is used to add side chains to the predicted structure, which allows AlphaFold to incorporate a van der Waals term that prevents steric clashes.
Since the predictions can all be expressed as functions of the 2L torsion angles, the three types of energy potential can also be combined into a single potential energy function of those 2L torsion angles.
This combined potential, called the protein-specific potential, is therefore differentiable with respect to the 2L torsion angles. It is then minimized by gradient descent to generate well-packed protein folds: the system directly optimizes the combined potential with respect to the structure's torsion angles.
Figure 5: Extract from Nature article. Photo credit: Nature. Source: https://www.nature.com/articles/s41598-019-55047-4.
In essence, the predictions tell the system how far apart each pair of amino-acid residues should be, and the system runs gradient descent on the torsion angles to lower the potential, bringing the distances in the model (a differentiable geometric model) closer to the predicted distances. The process is also explained in DeepMind’s Nature article, where they say, “after multiple restarts of the gradient descent process, the optimization converges and the lowest potential structure is chosen as the best candidate structure”.
Furthermore, the network also provides the variances of its distance predictions, which indicate the level of confidence that should be placed in each prediction.
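The optimize-and-restart loop described in the quote can be sketched in a few lines. The potential below is a made-up differentiable function of 2L torsion angles (minimized when every angle matches a "predicted" target), standing in for AlphaFold's learned protein-specific potential; the restart-and-keep-the-lowest logic is the part taken from the article:

```python
import numpy as np

rng = np.random.default_rng(2)

L = 5                                             # residues in a toy protein
target = rng.uniform(-np.pi, np.pi, size=2 * L)   # stand-in "predicted" torsion angles

def potential(theta):
    # Toy stand-in for the protein-specific potential: zero when
    # every torsion angle matches its predicted value.
    return np.sum(1.0 - np.cos(theta - target))

def grad(theta):
    return np.sin(theta - target)                 # analytic gradient of the toy potential

best_theta, best_pot = None, np.inf
for restart in range(5):                          # multiple restarts from random angles
    theta = rng.uniform(-np.pi, np.pi, size=2 * L)
    for _ in range(500):                          # plain gradient descent
        theta -= 0.1 * grad(theta)
    pot = potential(theta)
    if pot < best_pot:                            # keep the lowest-potential structure
        best_theta, best_pot = theta, pot

print(round(best_pot, 6))                         # 0.0
```

In AlphaFold the potential is far more complex and the gradients flow through the geometry of the chain, but the outer loop — random initialization, descend, keep the best — is the same idea.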