AlphaFold 2 and Related Patents: When AI Challenges the Protein Folding Problem [2]

Summary

This article discusses the fundamental changes in AlphaFold 2 compared with its previous version for protein structure prediction, as well as the patents owned by Google that cover these technologies.

In our previous article, we discussed why deep learning is suitable for tackling one of the biggest mysteries of life, the protein folding problem, and how the original version of AlphaFold works. In this article, I will discuss what has been upgraded in AlphaFold 2.

AlphaFold 2, the AI system from Google's DeepMind, gained attention when it took part in a competition on protein structure prediction. The task in the competition is to work out the shape into which a protein folds, given only its amino acid sequence.

For more details, please check our previous article.

In December 2020, AlphaFold 2 practically solved the problem, producing protein structure predictions with an accuracy on par with experimental methods. AlphaFold 2 outperformed not only the other teams but also its own previous version.

What’s New in AlphaFold 2?

Not much is known about AlphaFold 2 yet. The DeepMind team published two articles about AlphaFold 1, but there are no publications about the methods of the new version yet. All that has been revealed so far is a post on DeepMind’s blog and the presentation from the CASP14 competition.

But here are the things we can assume so far:

The embedding is built from a Multiple Sequence Alignment (MSA).
It uses a transformer. In other words, it relies on an attention algorithm.
It predicts the protein structure in an end-to-end manner.
A Graph Neural Network outputs the position vector and direction vector of each amino acid.

Embedding

As you may already be aware, embedding is the process of representing a particular object as a vector. These vectors can be understood by computers.

Word2vec is one example. It is a method that vectorizes words based on their positions in sentences, so that closely related words end up with similar vectors.
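To make the idea of an embedding concrete, here is a minimal sketch in Python. It does not train Word2vec; the three-dimensional vectors are invented by hand purely for illustration, and the names EMBEDDINGS and cosine_similarity are hypothetical. It only shows that, once objects are vectors, "closeness" becomes something a computer can calculate.

```python
import math

# Toy 3-dimensional embeddings. The numbers are invented for illustration;
# a real Word2vec model would learn them from a large text corpus.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_royal = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
sim_fruit = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["apple"])
print(f"king vs queen: {sim_royal:.3f}")
print(f"king vs apple: {sim_fruit:.3f}")
```

With these made-up vectors, "king" comes out closer to "queen" than to "apple", which is the kind of relationship a learned embedding is meant to capture.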

Attention Algorithm and Transformer

In the context of neural networks, attention algorithms started in natural language processing, but they have recently become widely used in other fields, including vision. The effect of attention is to emphasize the critical parts of the input data.

Consider searching for a video on YouTube. Your search (the query) is matched against information such as the title, description and tags (the keys) associated with candidate videos in the database. The search engine then presents you with the best-matched videos (the values).

An attention algorithm calculates the similarity between the query (Q) and key (K) vectors to retrieve the corresponding value (V).

Of course, for this purpose, the query, key and value must all be embedded as vectors before any calculation.

In self-attention, the query, key and value all come from the same input (Q = K = V). If the input is a sentence, for example, each word in the sentence attends to every word in that sentence. Performing self-attention on appropriately embedded words lets the model learn the dependencies between the words and use them to decide which words to look at when interpreting a given word in the sentence.

Transformers have a structure that encodes and decodes using attention. In the sentence example, the transformer uses attention to “understand” the relevant context of a particular word (even from distant parts of the sentence) and then encodes that context in the vector that represents the word, which helps capture the word and its role in the sentence.
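The query, key and value lookup can be sketched in a few lines of Python. This is a minimal illustration of scaled dot-product attention for a single query, assuming the common transformer convention of dividing the scores by the square root of the vector dimension; the toy vectors and the function names are hypothetical and say nothing about how AlphaFold 2 itself is implemented.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns the attention weights and the weighted sum of the values.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    out_dim = len(values[0])
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(out_dim)
    ]
    return weights, output


# Toy data: three "videos", each with a key vector and a value vector.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [4.0, 0.0]  # most similar to the first key

weights, output = attention(query, keys, values)
print("weights:", [round(w, 3) for w in weights])
print("output:", [round(x, 3) for x in output])
```

The key that best matches the query receives the largest weight, so the output is pulled toward that key's value. Self-attention is the special case in which the queries, keys and values are all derived from the same input sequence.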

Graph Neural Network

A graph consists of nodes and edges, and information can be attached to the nodes and to the edges as features. Graph Neural Networks (GNNs) are models that learn from such graphs by capturing the dependencies within them via message passing between the nodes.

Now that we have reviewed the background, let us look at how AlphaFold 2 works.
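Message passing can be illustrated with a very small Python sketch. It assumes the simplest possible update rule, in which each node averages its neighbours' feature vectors and mixes the result equally with its own; real GNNs use learned weight matrices and nonlinearities instead. The function name and the toy graph are hypothetical.

```python
def message_passing_step(features, edges):
    """Run one round of message passing on an undirected graph.

    features: dict mapping node name -> feature vector (list of floats)
    edges: list of (node_a, node_b) pairs
    """
    neighbours = {node: [] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated = {}
    for node, own in features.items():
        if not neighbours[node]:
            updated[node] = list(own)  # isolated node: nothing to receive
            continue
        count = len(neighbours[node])
        # Aggregate: mean of the neighbours' feature vectors.
        mean_message = [
            sum(features[n][i] for n in neighbours[node]) / count
            for i in range(len(own))
        ]
        # Update: mix the node's own features with the aggregated message.
        updated[node] = [0.5 * o + 0.5 * m for o, m in zip(own, mean_message)]
    return updated


# A chain A - B - C. Only node A starts with a signal in the first feature.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.0, 0.0]}
edges = [("A", "B"), ("B", "C")]

after_one = message_passing_step(features, edges)
after_two = message_passing_step(after_one, edges)
print(after_one)
print(after_two)
```

After one round, node C has only heard from B. After two rounds, some of A's signal has reached C through B, which is how repeated message passing lets a node capture dependencies on nodes it is not directly connected to.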

AlphaFold 2: Embedding, Trunk and Head Stages

Overall, AlphaFold 2 can be divided into three stages: embedding, trunk and heads.

The previous article on AlphaFold 1 explained that the system uses MSA data to predict contact points in the protein structure. AlphaFold 2 uses MSA data in the embedding step.

The trunk stage has two tracks: one updates the sequence-residue edges and the other updates the residue-residue edges. Their outputs probably serve as the initial state of the graph network in the last part, the head stage.

The sequence-residue edge uses attention between several different sequences and residues. The residue-residue edge is the equivalent of the distance map discussed for AlphaFold 1.

As mentioned above, self-attention can find the relationships between the words in a sentence. In the context of protein structure, the goal is to find the relationships between each amino acid in a protein and the other amino acids. Hence, this could be a substitute for the distance map from AlphaFold 1. The overall structure diagram (the previous figure) also shows that the residue-residue edge is connected to the distance map.

In AlphaFold 1, the final folded structure of a protein is expressed as pairs of torsion angles of the residues. In AlphaFold 2, it seems to be expressed as the position vector of the central carbon atom (α-carbon) of each amino acid and the direction vector in which the amino acid is facing.
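For readers who have not seen a distance map, the following Python sketch shows what one is: a square table whose entry (i, j) is the distance between residue i and residue j. The coordinates are made up, and the 3.8 spacing is only a rough stand-in for the typical distance between neighbouring α-carbons; this illustrates the data structure, not AlphaFold's prediction method.

```python
import math


def distance_map(coords):
    """Return the matrix of pairwise Euclidean distances between points."""
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical alpha-carbon coordinates for a four-residue toy "protein".
ca_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]

dmap = distance_map(ca_coords)
for row in dmap:
    print([round(d, 2) for d in row])
```

The map is symmetric with zeros on the diagonal. A small entry far from the diagonal means that two residues which are distant in the sequence sit close together in the folded structure.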

In the final module, it is assumed that the system applies a transformer that updates the graph neural network with attention.

A researcher named Fabian Fuchs argued in a blog post that his thesis has a similar structure to this. Those who are curious about the exact structure can refer to his paper.

Patents Related to AlphaFold

Like the papers, the patents associated with AlphaFold 2 have not yet been disclosed. Therefore, this last section discusses the three patents associated with AlphaFold 1.

The first patent is on the probability distribution of distances between amino acids.

The second patent is about finding a stable structure by calculating the protein potential, corresponding to stage 2 of AlphaFold 1.

The last patent is about the method of updating the predicted structure through iteration.

A question may arise for people who are not familiar with patents and intellectual property: why are there three patents for one technology?

This is because AlphaFold 1 may be a single technology, but it contains several technological components. To cover all aspects of AlphaFold 1 with a single patent, all stages would have to be combined together. However, as I will explain in detail later, the scope of a patent's rights becomes narrower the more content the patent has to cover.

For this reason, if a technology contains multiple patentable aspects, it is advantageous to apply for individual patents, each focusing on a single aspect, instead of combining everything in one patent.

WO 2020/058176, Machine Learning for Determining Protein Structures.

The first patent, publication number WO 2020/058176, relates to the method of making a distance map.

Although the first article did not go into the details of this patent, DeepMind disclosed that, in order to create a distance histogram with a CNN, they cropped the training data. This way, a single image can be divided into multiple images, which augments the training images.

Claim 1 of the patent presents it as a critical point that, for training, the distance map is cropped and the plurality of distance map crops is used to generate a distance map.

Figure 6 in the patent contains the details of how to perform the cropping.
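The general idea of cropping can be sketched in Python as follows. This is only an illustration of splitting one square map into several overlapping square crops; the crop size, the stride and the function name are hypothetical, and the procedure actually described in Figure 6 of the patent is not reproduced here.

```python
def crop_distance_map(dmap, crop_size, stride):
    """Split a square map into square crops taken at regular offsets.

    Returns a list of ((top, left), crop) pairs, where crop is a
    crop_size x crop_size list of lists.
    """
    n = len(dmap)
    crops = []
    for top in range(0, n - crop_size + 1, stride):
        for left in range(0, n - crop_size + 1, stride):
            crop = [
                row[left:left + crop_size]
                for row in dmap[top:top + crop_size]
            ]
            crops.append(((top, left), crop))
    return crops


# Toy 6 x 6 "distance map" whose entry (i, j) is simply |i - j|.
n = 6
toy_map = [[float(abs(i - j)) for j in range(n)] for i in range(n)]

crops = crop_distance_map(toy_map, crop_size=4, stride=2)
for (top, left), crop in crops:
    print(f"crop at row {top}, column {left}: {len(crop)} x {len(crop[0])}")
```

A single map yields several training examples, which is the augmentation effect described above.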

A claim is the part of a patent that declares the scope of rights the applicant wants to secure. Multiple claims can exist in a single patent.

The part discussed above was the first claim; next comes claim 16.

Claim 16 is similar to claim 1 but much briefer. It takes the amino acid sequence data of a given protein and generates a distance map. Then, using that distance map, it calculates a quality score for the predicted protein structure.

How does this differ from what AlphaFold 1 actually does? AlphaFold 1 evaluates the predicted structure using not only a potential function based on a distance map, but also a potential that takes into account torsion angles and steric clashes. However, the claim of the patent only covers content related to the distance map.

Even so, when this patent is interpreted, its scope of rights covers not only a model that uses the distance map alone, but also models that consider other quality criteria in addition to the distance map, just as AlphaFold does.

WO 2020/058174, Machine Learning for Determining Protein Structures.

The second patent, publication number WO 2020/058174, relates to the second step of AlphaFold 1.

This patent refers to the steps of generating several predicted structures of the target protein and selecting a final structure from the plurality of predicted structures.

In this process, the initial values of the protein structure parameters are determined and then repeatedly updated. In effect, this updating refers to applying gradient descent.

The point to pay attention to here is how the patent expresses the potential function that evaluates the predicted structure so that gradient descent can be applied.
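The loop of "start from initial parameters, then update them repeatedly" can be sketched in Python. This is a toy under stated assumptions: the structure parameters are just three positions on a line, the potential is a made-up sum of squared differences between current and target pairwise distances, and the gradient is estimated numerically. None of this is the potential used by AlphaFold 1 or the wording of the patent.

```python
def potential(params, target_distances):
    """Toy potential: lower means the structure fits the targets better."""
    total = 0.0
    for (i, j), target in target_distances.items():
        total += (abs(params[i] - params[j]) - target) ** 2
    return total


def numerical_gradient(func, params, eps=1e-6):
    """Estimate the gradient of func at params by central differences."""
    grad = []
    for k in range(len(params)):
        up = list(params)
        down = list(params)
        up[k] += eps
        down[k] -= eps
        grad.append((func(up) - func(down)) / (2 * eps))
    return grad


def gradient_descent(initial, target_distances, learning_rate=0.05, steps=500):
    """Repeatedly move the parameters a small step against the gradient."""
    def objective(p):
        return potential(p, target_distances)

    params = list(initial)
    for _ in range(steps):
        grad = numerical_gradient(objective, params)
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    return params


# Hypothetical target distances between three residues placed on a line.
targets = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 2.0}
initial = [0.0, 0.3, 0.5]

final = gradient_descent(initial, targets)
print("initial potential:", round(potential(initial, targets), 4))
print("final potential:", round(potential(final, targets), 8))
print("final spacing:", round(final[1] - final[0], 4), round(final[2] - final[1], 4))
```

Each step lowers the potential a little, so the parameters drift from a poor initial guess toward a structure that is consistent with the target distances.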

It is this function below:

I think this patent pays a lot of attention to describing this part. As you can see, it says that a quality score is determined, and that the quality score is calculated based on the current values of the structural parameters and a representation of the amino acid sequence of the protein.

The function is described as generally as possible.

If you described the potential function in AlphaFold 1 exactly as it is, you would list its sub-functions one by one. Instead, the patent uses a more general expression that describes the common inputs and the output. This kind of approach maximizes the scope of rights of the patent.

WO 2020/058177, Machine Learning for Determining Protein Structures.

The last patent is publication number WO 2020/058177. It sets out the conditions for replacing the current predicted structure (the current solution) with an alternative structure, using a quality score characterizing the quality of the current predicted structure and a quality score characterizing the quality of the alternative predicted structure.

The two articles on AlphaFold have covered a wide range of content, including patents, biology and deep learning, so some parts may have been difficult to follow in depth. The details of the patent claims in particular may have been tricky, since they are a specialized topic.
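A score-based replacement loop of this general kind can be sketched in Python. The acceptance rule here (replace only when the alternative scores strictly higher), the toy quality score and the random proposal step are all assumptions made for illustration; the actual conditions are those set out in the patent's claims and are not reproduced here.

```python
import random


def quality_score(structure):
    """Hypothetical score: higher is better, 0 when every value equals 1.0."""
    return -sum((x - 1.0) ** 2 for x in structure)


def propose_alternative(structure, rng, scale=0.2):
    """Hypothetical proposal: perturb every parameter by a small random amount."""
    return [x + rng.uniform(-scale, scale) for x in structure]


def iterative_update(structure, steps, seed=0):
    """Keep a current structure and replace it whenever an alternative scores higher."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    current = list(structure)
    current_score = quality_score(current)
    for _ in range(steps):
        alternative = propose_alternative(current, rng)
        alternative_score = quality_score(alternative)
        # Assumed replacement condition: the alternative must be strictly better.
        if alternative_score > current_score:
            current, current_score = alternative, alternative_score
    return current, current_score


start = [0.0, 2.0, -1.0]
best, best_score = iterative_update(start, steps=200)
print("start score:", round(quality_score(start), 4))
print("final score:", round(best_score, 4))
```

Because a replacement only happens when the score improves, the quality of the current structure never gets worse from one iteration to the next.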

However, we will get back to this aspect with a step-by-step explanation in the future, so please look forward to our upcoming articles.