Google's Successful BN Patent Grant: The Story Behind It

Summary

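Google's BN patent application received a first rejection, yet it ultimately received a notice of allowance. Patent attorney Lee Dae-ho (이대호) analyzes the path to the grant and points out what made it succeed.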

Hello, this is patent attorney Lee Dae-ho.

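Google's patent application on Batch Normalization Layers (BN), one of the topics that has drawn the most interest since we began writing columns on deep learning, has finally been allowed by the US Patent and Trademark Office.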

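To state the conclusion first: in my view, Google extracted the best possible outcome under the circumstances. Given the first rejection last October and the USPTO's stance as reflected in the subsequent examiner's report, it looked as though even Google would have to give up a substantial part of the claim scope in this case to obtain a grant.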

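However, on April 1 Google filed a claim amendment that appears to concede a great deal of scope, while in fact holding on to a scope that keeps intact the technical core of performing Batch Normalization on a Convolutional Layer. Below, I compare the claim that was finally allowed with the claim as originally filed.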

Original claim

1. A neural network system implemented by one or more computers, the neural network system comprising:
a batch normalization layer between a first neural network layer and a second neural network layer, wherein the first neural network layer generates first layer outputs having a plurality of components, and wherein the batch normalization layer is configured to, during training of the neural network system on a batch of training examples:
receive a respective first layer output for each training example in the batch;
compute a plurality of normalization statistics for the batch from the first layer outputs;
normalize each component of each first layer output using the normalization statistics to generate a respective normalized layer output for each training example in the batch;
generate a respective batch normalization layer output for each of the training examples from the normalized layer outputs; and
provide the batch normalization layer output as an input to the second neural network layer.

Claim as finally allowed

1. A neural network system implemented by one or more computers, the neural network system comprising:
instructions for implementing a batch normalization layer between a first neural network layer and a second neural network layer in a neural network, wherein the first neural network layer generates first layer outputs having a plurality of components, and wherein the instructions cause the one or more computers to perform operations comprising:
during training of the neural network on a plurality of batches of training data, each batch comprising a respective plurality of training examples and for each of the batches
receiving a respective first layer output for each of the plurality of training examples in the batch;
computing a plurality of normalization statistics for the batch from the first layer outputs, comprising:
determining, for each of a plurality of subsets of the plurality of the components of the first layer outputs, a mean of the components of the first layer outputs for each of the plurality of training examples in batch that are in the respective subset, and
determining, for each of a plurality of subsets of the plurality of the components of the first layer outputs, a standard deviation of the components of the first layer outputs for each of the plurality of training examples in the batch that are in the respective subset;
normalizing each of the plurality of the components of each first layer output using the normalization statistics to generate a respective normalized layer output for each training example in the batch, comprising:
for each first layer output and for each of the plurality of subsets, normalizing the components of the first layer output that are in the respective subset using the mean for the respective subset and standard deviation for the respective subset;
generating a respective batch normalization output for each of the training examples from the normalized layer outputs; and
providing the batch normalization layer output as an input to the second neural network layer.
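
To make the computing and normalizing steps of the allowed claim 1 more concrete, here is a minimal sketch in plain Python. It is only my illustration, not Google's implementation. The function name `batch_norm_by_subset`, the way `subsets` is passed in as lists of component indices, the use of the population variance, and the small `eps` term added for numerical stability are all assumptions that do not appear in the claim. The later step of generating the batch normalization output from the normalized outputs is left out.

```python
import math


def batch_norm_by_subset(batch, subsets, eps=1e-5):
    """Normalize first-layer outputs subset by subset (illustrative sketch).

    batch   : list of first-layer outputs, one list of floats per training example
    subsets : list of lists of component indices (hypothetical grouping)
    eps     : assumed stability term, not recited in the claim
    """
    # Copy so the caller's data is left untouched.
    normalized = [list(example) for example in batch]
    for subset in subsets:
        # Pool the components in this subset over every example in the batch.
        values = [example[i] for example in batch for i in subset]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(variance + eps)
        # Normalize each example's components in the subset with the shared statistics.
        for row in normalized:
            for i in subset:
                row[i] = (row[i] - mean) / std
    return normalized


if __name__ == "__main__":
    batch = [[1.0, 3.0, 10.0, 20.0], [5.0, 7.0, 30.0, 40.0]]
    print(batch_norm_by_subset(batch, subsets=[[0, 1], [2, 3]]))
```

If each subset holds a single component, this reduces to normalizing every component on its own over the batch. If each subset holds all spatial positions of one feature map, it corresponds to the per-channel case discussed for convolutional layers.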

In my earlier column on the examiner's report for the Batch Normalization patent, I mentioned that the examiner had hinted that claim 9 of the application involves an inventive step.

Claim 9

9. The neural network system of Claim 1, wherein the first neural network layer is a convolutional layer, wherein the plurality of components of the first layer output are indexed by feature index and spatial location index, and wherein computing a plurality of normalization statistics for the first layer outputs comprises:
computing, for each combination of feature index and spatial location index, a mean of the components of the first layer outputs having the feature index and spatial location index;
computing, for each feature index, an average of the means for combinations that include the feature index;
computing, for each combination of feature index and spatial location index, a variance of the components of the first layer outputs having the feature index and spatial location index; and
computing, for each feature index, an average of the variances for combinations that include the feature index.

Claim 9 narrows the patent so that it applies only to CNNs. It further limits the scope to cases where the unit of data used for normalization is built from values that share the same feature index and spatial location index.

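In other words, it describes Batch Normalization on a convolutional layer of a CNN in which the normalization statistics (mean and variance) are computed separately for each channel's output, and each channel is normalized with its own statistics.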
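
The statistics recited in claim 9 can be sketched as follows. This is only a rough illustration that follows the wording of the claim literally: a mean and a variance are computed over the batch for each combination of feature index and spatial location index, and those values are then averaged per feature index. The function name `conv_bn_statistics`, the nested-list layout `outputs[n][f][s]` with a flattened spatial index, and the use of the population variance are my own assumptions.

```python
def conv_bn_statistics(outputs):
    """Per-feature statistics following the wording of claim 9 (illustrative sketch).

    outputs[n][f][s] : value for training example n, feature index f,
                       flattened spatial location index s (assumed layout)
    Returns (feature_means, feature_variances), one entry per feature index.
    """
    n_examples = len(outputs)
    n_features = len(outputs[0])
    n_locations = len(outputs[0][0])

    feature_means, feature_variances = [], []
    for f in range(n_features):
        location_means, location_variances = [], []
        for s in range(n_locations):
            # Statistics over the batch for one (feature, location) combination.
            values = [outputs[n][f][s] for n in range(n_examples)]
            mean = sum(values) / n_examples
            location_means.append(mean)
            location_variances.append(
                sum((v - mean) ** 2 for v in values) / n_examples
            )
        # Average the per-location statistics for this feature index.
        feature_means.append(sum(location_means) / n_locations)
        feature_variances.append(sum(location_variances) / n_locations)
    return feature_means, feature_variances


if __name__ == "__main__":
    outputs = [
        [[1.0, 2.0], [10.0, 20.0]],   # example 0: two features, two locations
        [[3.0, 6.0], [30.0, 40.0]],   # example 1
    ]
    print(conv_bn_statistics(outputs))
```

Note that averaging the per-location variances is not necessarily the same as taking one variance over all batch and location values of a channel, so implementations that pool everything per channel may produce a different variance from this literal reading.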

In that earlier column I also said that Google was unlikely to readily accept the examiner's suggestion and settle for securing only the scope of claim 9.

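However, Google used that point as leverage. It amended claim 1 while leaving out the limitation to CNNs and the restrictive conditions on how the mini-batch is formed, and thereby succeeded in obtaining allowance with the broadest scope available under the circumstances.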

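It now remains to be seen whether Google will accept the allowance it has secured, and what it will decide to do to obtain broader scope. It has defended the claims for using BN on convolutional layers, but it may still pursue scope covering BN in more general neural networks.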

In a follow-up column, I will give a detailed explanation of the scope Google has secured this time and discuss the options that remain open to Google.

Thank you.