Of our articles on deep learning, one of the topics that has attracted the most attention is Google's patent on batch normalization layers. The United States Patent and Trademark Office (USPTO) has now finally allowed this patent.
In the end, Google managed to achieve the best possible outcome under the circumstances.
Given the initial rejection in October last year and the USPTO's position as conveyed in the examiner's reports, Google inevitably had to narrow the scope of its patent claims in order to get the patent registered.
On April 1, Google amended the claims in a way that appeared to give up a significant part of its rights. Nevertheless, it successfully secured the claim covering the critical technique of performing batch normalization on a convolutional layer.
| Initial Claim | Allowed Claim |
| --- | --- |
| 1. A neural network system implemented by one or more computers, the neural network system comprising: a batch normalization layer between a first neural network layer and a second neural network layer, wherein the first neural network layer generates first layer outputs having a plurality of components, and wherein the batch normalization layer is configured to, during training of the neural network system on a batch of training examples: receive a respective first layer output for each training example in the batch; compute a plurality of normalization statistics for the batch from the first layer outputs; normalize each component of each first layer output using the normalization statistics to generate a respective normalized layer output for each training example in the batch; generate a respective batch normalization layer output for each of the training examples from the normalized layer outputs; and provide the batch normalization layer output as an input to the second neural network layer. | 1. A neural network system implemented by one or more computers, the neural network system comprising: instructions for implementing a batch normalization layer between a first neural network layer and a second neural network layer in a neural network, wherein the first neural network layer generates first layer outputs having a plurality of components, and wherein the instructions cause the one or more computers to perform operations comprising: during training of the neural network on a plurality of batches of training data, each batch comprising a respective plurality of training examples and for each of the batches: receiving a respective first layer output for each of the plurality of training examples in the batch; computing a plurality of normalization statistics for the batch from the first layer outputs, comprising: determining, for each of a plurality of subsets of the plurality of the components of the first layer outputs, a mean of the components of the first layer outputs for each of the plurality of training examples in batch that are in the respective subset, and determining, for each of a plurality of subsets of the plurality of the components of the first layer outputs, a standard deviation of the components of the first layer outputs for each of the plurality of training examples in the batch that are in the respective subset; normalizing each of the plurality of the components of each first layer output using the normalization statistics to generate a respective normalized layer output for each training example in the batch, comprising: for each first layer output and for each of the plurality of subsets, normalizing the components of the first layer output that are in the respective subset using the mean for the respective subset and standard deviation for the respective subset; generating a respective batch normalization output for each of the training examples from the normalized layer outputs; and providing the batch normalization layer output as an input to the second neural network layer. |
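To make the language of the allowed Claim 1 more concrete, here is a minimal Python sketch of normalizing "subsets of components" with a per-subset mean and standard deviation computed over the whole batch. It is only an illustration under stated assumptions, not Google's implementation and not a legal reading of the claim: the function name `subset_batch_normalize`, the way subsets are passed as lists of indices, the use of the population variance, and the small constant `eps` are all hypothetical choices made for this example.

```python
import math


def subset_batch_normalize(batch, subsets, eps=1e-5):
    """Normalize each component with the mean/std of its subset over the batch.

    batch   : list of first layer outputs, each a list of floats (components)
    subsets : list of lists of component indices (hypothetical grouping)
    eps     : stability constant (an assumption; the claim does not mention it)
    """
    normalized = [list(example) for example in batch]
    for subset in subsets:
        # Collect every component in this subset across all training examples.
        values = [example[i] for example in batch for i in subset]
        mean = sum(values) / len(values)
        # Population variance (divide by N) is assumed here.
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(var + eps)
        # Normalize the subset's components in each first layer output.
        for n, example in enumerate(batch):
            for i in subset:
                normalized[n][i] = (example[i] - mean) / std
    return normalized


# Two training examples, four components each.
batch = [[1.0, 2.0, 10.0, 20.0],
         [3.0, 4.0, 30.0, 40.0]]

# One subset per component: statistics are taken per component.
per_component = subset_batch_normalize(batch, [[0], [1], [2], [3]])

# Two subsets of two components: statistics are shared within each group.
per_group = subset_batch_normalize(batch, [[0, 1], [2, 3]])
```

In this sketch the only thing that changes between the two calls is how the components are grouped, which is the part of the claim language that leaves room for both per-component and per-channel normalization.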
Previously, in the column on the examiner's report for the batch normalization patent, I noted that the examiner had hinted that claim 9 might involve an inventive step.
Claim 9:
"The neural network system of Claim 1, wherein the first neural network layer is a convolutional layer, wherein the plurality of components of the first layer output are indexed by feature index and spatial location index, and wherein computing a plurality of normalization statistics for the first layer outputs comprises:
computing, for each combination of feature index and spatial location index, a mean of the components of the first layer outputs having the feature index and spatial location index;
computing, for each feature index, an average of the means for combinations that include the feature index;
computing, for each combination of feature index and spatial location index, a variance of the components of the first layer outputs having the feature index and spatial location index; and
computing, for each feature index, an average of the variances for combinations that include the feature index."
Claim 9 narrows the patent to CNNs. In addition, it brings a normalization scheme within the scope of the patent only when the unit of normalization is built from values that share the same feature index and spatial location index.
In other words, when a CNN performs batch normalization in a convolutional layer, the normalization statistics (mean and variance) are first computed for each combination of channel and spatial location and then averaged per channel, so that each channel ends up with its own mean and variance.
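The two-stage computation recited in Claim 9 can be sketched in a few lines of Python. This is an illustrative reading only, assuming a hypothetical layout `outputs[n][f][s]` (training example `n`, feature index `f`, spatial location index `s`) and the population variance; the function names are invented for this example. For comparison, the sketch also computes a pooled per-feature mean and variance over all examples and locations at once, which gives the same mean but, in general, a different variance.

```python
def claim9_statistics(outputs):
    """Per-feature statistics following the steps recited in Claim 9.

    outputs[n][f][s]: example n, feature index f, spatial location index s
    (this layout is an assumption made for illustration).
    """
    n_examples = len(outputs)
    n_features = len(outputs[0])
    n_locations = len(outputs[0][0])
    feature_means, feature_vars = [], []
    for f in range(n_features):
        loc_means, loc_vars = [], []
        for s in range(n_locations):
            # Mean and variance over the batch for one (feature, location) pair.
            values = [outputs[n][f][s] for n in range(n_examples)]
            m = sum(values) / n_examples
            v = sum((x - m) ** 2 for x in values) / n_examples
            loc_means.append(m)
            loc_vars.append(v)
        # Average the per-location means and variances for this feature index.
        feature_means.append(sum(loc_means) / n_locations)
        feature_vars.append(sum(loc_vars) / n_locations)
    return feature_means, feature_vars


def pooled_statistics(outputs):
    """Per-feature mean/variance over all examples and locations jointly."""
    n_features = len(outputs[0])
    means, variances = [], []
    for f in range(n_features):
        values = [x for example in outputs for x in example[f]]
        m = sum(values) / len(values)
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in values) / len(values))
    return means, variances


# Two examples, one feature (channel), two spatial locations.
outputs = [[[0.0, 10.0]],
           [[2.0, 12.0]]]

claim9_means, claim9_vars = claim9_statistics(outputs)
pooled_means, pooled_vars = pooled_statistics(outputs)
print(claim9_means, claim9_vars)  # [6.0] [1.0]
print(pooled_means, pooled_vars)  # [6.0] [26.0]
```

The gap between the two variances comes from the spread of the per-location means, which the average of per-location variances does not include. Whether that difference matters for the scope of the claim is a legal question that this sketch does not try to answer.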
In the previous column, I said that it would be difficult for Google to settle for the examiner's proposal and secure only the scope of rights of Claim 9.
However, Google built on that content and, by amending Claim 1, obtained a registration decision with the broadest scope of rights available in the situation. Google secured its patent without any limitation to CNNs and without any restriction on how mini-batches are constructed.
Now we need to accept Google's current registration decision and watch what Google decides to do to broaden its scope of rights further. Although the claims on using batch normalization (BN) in a convolutional layer have been preserved, Google may still seek to secure rights to BN as performed in more general neural networks.
In the next article, I will discuss the scope of rights that Google has gained and the options that remain open to Google.
Thank you.