The information on patents included in this article was obtained from Keywert, a patent analysis database based on artificial intelligence.
Among the patents mentioned in the first article of this series, we will now take a closer look at the following registered patent:
- System and method for addressing overfitting in a neural network
This patent relating to dropout was filed on 30 August 2013 and registered on 2 August 2016.
This patent describes the basic function of dropout to disable nodes (feature detectors) in the training process to prevent overfitting. During the examination of the patent application, the USPTO (United States Patent and Trademark Office) raised one rejection reason, and the patent was eventually registered following some slight modifications by Google.
Generally, in order to obtain a patent, you have to draft a patent specification (a document describing your invention in detail) and file an application with the patent office of the country in which you wish to obtain a patent. The patent office then examines whether a patent can be granted in accordance with the relevant patent laws. For a typical patent application, the patent office will raise at least one rejection reason. In Korea, for example, statistics show that at least one rejection reason is raised against 90% of all patent applications. Usually, the patent office will issue a notification informing the applicant of one or multiple rejection reasons. Such a notification is referred to as an office action (OA).
An OA is not a final decision to reject a patent application, and the applicant is given an opportunity to argue against the rejection reasons raised in the OA. Alternatively, the applicant can amend the patent specification, in order to overcome the rejection reasons. The result of an amendment is usually a narrowing (i.e. a weakening) of the scope of the patent right.
The importance of these procedures will be explained in detail in future articles. We will also discuss strategies that will help you to maximize the strength of your patents.
The actual scope of a patent right is defined in the so-called patent claims of the patent specification, in particular in claim 1. Claim 1 of the patent specification that was finally registered by Google is provided below:
“1. A computer-implemented method comprising:
obtaining a plurality of training cases; and
training a neural network having a plurality of layers on the plurality of training cases, each of the layers including one or more feature detectors, each of the feature detectors having a corresponding set of weights, and a subset of the feature detectors being associated with respective probabilities of being disabled during processing of each of the training cases,
wherein training the neural network on the plurality of training cases comprises, for each of the training cases respectively:
determining one or more feature detectors to disable during processing of the training case, comprising determining whether to disable each of the feature detectors in the subset based on the respective probability associated with the feature detector,
disabling the one or more feature detectors in accordance with the determining, and
processing the training case using the neural network with the one or more feature detectors disabled to generate a predicted output for the training case.”
In the second article of this series, we explained that Tensorflow’s Apache 2.0 License grants users a license to any patents that are inevitably infringed by using the open source software.
As mentioned above, the patent claims define the scope of a patent right, and therefore the patent claims need to be examined when trying to determine whether an allegedly infringing product actually infringes a patent. The general rule is that if all of the elements of a patent claim exist in the allegedly infringing product, then the product infringes the patent.
Let's look at the elements of claim 1 quoted above.
It is natural to obtain labeled data during the training of a neural network (obtaining a plurality of training cases). Moreover, a neural network, especially a deep neural network, has a plurality of layers (training a neural network having a plurality of layers on the plurality of training cases) and each layer has a number of nodes with weight information (each of the layers including one or more feature detectors, each of the feature detectors having a corresponding set of weights).
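The layered structure described in the claim can be sketched in a few lines of Python. This is a hypothetical illustration using NumPy (not TensorFlow code): each row of a weight matrix plays the role of one feature detector with its corresponding set of weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# One "training case" of the claim: a single input vector.
x = rng.random(4)

# Two layers; each row of a weight matrix is the weight set of one
# feature detector (node) in that layer.
W1 = rng.random((5, 4))   # layer 1: 5 feature detectors, 4 weights each
W2 = rng.random((3, 5))   # layer 2: 3 feature detectors, 5 weights each

h = np.maximum(0, W1 @ x)   # ReLU activations of layer 1
y = np.maximum(0, W2 @ h)   # predicted output of the network
```

Training would additionally adjust the weights against labeled outputs; the sketch only shows the "plurality of layers, each with feature detectors having weights" of the claim.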
Therefore, the portions marked in red in the claims can be regarded as portions that must inevitably be used when using Tensorflow.
Next, let's look at the following link, which describes the dropout library in Tensorflow:
The documentation states that “dropout sets a random percentage of input units to zero at each update during training.” Accordingly, TensorFlow determines for which nodes to set the input to 0 (determining one or more feature detectors to disable during processing of the training case), disables those nodes (disabling the one or more feature detectors in accordance with the determining), and performs the training in that state (processing the training case using the neural network with the one or more feature detectors disabled to generate a predicted output for the training case).
Now the key is to determine whether Tensorflow sets a probability that each node will be disabled (a subset of the feature detectors being associated with respective probabilities of being disabled during processing of each of the training cases), and determines the percentage of nodes to be disabled (make the input 0) based on this probability (determining whether to disable each of the feature detectors in the subset based on the respective probability associated with the feature detector) when performing dropout.
Let's look at the source code for dropout.
[Source: Hwe-hee Chung of Sualab Co., Ltd.]
This code determines how many of the nodes are dropped out of the entirety of nodes. That is, it implements dropout using the keep_prob variable (a value between 0 and 1), which specifies the fraction of nodes to be kept, so that a fraction 1 - keep_prob of the nodes is disabled.
In lines 2102-2103, the source code draws random values from a uniform distribution over [0, 1) and adds the keep_prob variable to them.
As a result, a fraction keep_prob of the resulting values is greater than or equal to 1, and the remaining values are less than 1.
Lines 2106-2108 of the source code show that each random value of at least 1 is mapped to 1, and each value below 1 is mapped to 0 (by taking the floor), and the resulting binary value is multiplied with the corresponding node value.
This ensures that, on average, a fraction keep_prob of the node inputs retain their values.
In this way, a fraction keep_prob of the nodes retain their input values, while the remaining nodes are dropped out.
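The mask-generation steps described above can be sketched as follows. This is a simplified NumPy reimplementation of the described logic, not the actual TensorFlow source; note that TensorFlow additionally rescales the kept values by 1/keep_prob (so-called inverted dropout), which is included here for completeness.

```python
import numpy as np

def dropout(x, keep_prob, rng=None):
    """Simplified sketch of the dropout logic described above.

    Each element of x is kept with probability keep_prob and zeroed
    otherwise; kept values are scaled by 1/keep_prob so the expected
    sum of the activations is unchanged (inverted dropout).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Uniform values in [0, 1) shifted by keep_prob: a fraction
    # keep_prob of them now lies in [1, 1 + keep_prob).
    random_tensor = keep_prob + rng.random(x.shape)
    # floor() maps values >= 1 to 1 (keep) and values < 1 to 0 (drop).
    binary_mask = np.floor(random_tensor)
    return x / keep_prob * binary_mask
```

Applied to a vector of ones with keep_prob = 0.8, roughly 80% of the outputs are 1/0.8 = 1.25 and the rest are 0, which is exactly the behavior the claim recites: each node is disabled based on its associated probability.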
That is, it becomes clear that the source code of Tensorflow sets the probability that each node will be disabled (a subset of the feature detectors being associated with respective probabilities of being disabled during processing of each of the training cases), and determines the percentage of nodes to disable (make the input 0) based on this probability (determining whether to disable each of the feature detectors in the subset based on the respective probability associated with the feature detector).
As a result, all of the components of the claim are found to be essential elements of Tensorflow's dropout functionality. Therefore, Tensorflow users do not need to worry about this claim in accordance with the Apache 2.0 license of Tensorflow.
Judging the actual scope of a patent right can be more complex than this, because a patent usually has multiple claims, all of which need to be considered individually. It is therefore very time consuming and costly to evaluate the scope of the Apache 2.0 license for all claims of a patent.
What are Google's intentions? We can speculate that the core patents which constitute Tensorflow's core functionality are not aimed at Tensorflow users. But Google has not yet declared the availability of its deep learning patents via the OPN (Open Patent Non-Assertion Pledge). This may be a business decision, or Google may be planning to update the list in the near future. Perhaps this article may even prompt Google to declare some deep learning patents via the OPN.
We hope that if one thing has become clear in this series of articles, it is that you should definitely be aware of and concerned about Google’s deep learning patents, regardless of whether Google offers open source licenses for its software. Google's patent portfolio covers more than just the basic functions provided by Tensorflow. Google has many patents that can be applied in real product implementation. Even if applications that have been researched and developed using Tensorflow do not infringe any of Google's core deep learning patents, there is a chance that they could be infringing specific detailed service patents owned by Google. And even if you believe in Google’s good intentions, other patent owners may not be similarly inclined.