WO2025110327A1

WO2025110327A1 - Calibration method and apparatus using input/output distribution of adjacent layer

Info

Publication number: WO2025110327A1
Application number: PCT/KR2023/020678
Authority: WO
Inventors: 손진우; 임지은; 홍덕기; 이원재
Original assignee: Sapeon Korea Inc
Current assignee: Sapeon Korea Inc
Priority date: 2023-11-21
Filing date: 2023-12-14
Publication date: 2025-05-30
Anticipated expiration: 2026-05-21

Abstract

The present invention relates to a calibration method and apparatus using an input/output distribution of an adjacent layer. According to one aspect of the present invention, provided is a calibration method comprising: an operation process of generating a first operation result in an operation layer of an artificial neural network; a calibration process of acquiring a first effective input range by using information about another layer adjacent to the operation layer; a first quantization process of generating a first quantization result by quantizing the first operation result on the basis of the first effective input range; and an activation process of generating a first activation output by inputting the first quantization result to a first activation function.

Description

Calibration method and device utilizing input/output distribution of adjacent layers

The present disclosure relates to a calibration method and device utilizing the input/output distribution of adjacent layers.

The content described below merely provides background information related to the present embodiment and does not constitute prior art.

A deep learning model comprises a combination of blocks, each consisting of a layer that performs linear operations with learnable weights (a convolution layer or a linear layer) and an activation function. A model may combine dozens to hundreds of such blocks.

Because the parameters and operations of a deep learning model must be represented precisely, floating-point data types such as FP32 and FP16 are generally used.

FP32 occupies 32 bits and FP16 occupies 16 bits, so operations using these data types require a large amount of memory.

To accelerate computation and reduce memory usage, deep learning models typically quantize operation results and represent them in data types with fewer bits (e.g., the 8-bit INT8 and FP8 data types).

Quantization is mainly applied to weights, operation results, and activations. Quantizing an operation result, and quantizing an activation (the output of an activation function), each require a threshold in order to approximate the actual input distribution of the quantization. The process of determining this threshold is called calibration.

Calibration analyzes the output distribution of a specific layer to determine an appropriate threshold. A threshold is calculated and used separately when quantizing the operation result and when quantizing the activation.

Conventional calibration uses whichever of the representative calibration methods, namely max, percentile, and entropy calibration, is the most efficient.

FIG. 1a is a diagram illustrating the absolute-value frequency distribution of the operation output values of a specific layer and the percentile position that serves as a calibration reference point, and FIG. 1b is a diagram illustrating the quantization target region and the quantization exclusion region resulting from the calibration.

As illustrated in FIG. 1a, suppose that, for symmetric quantization of the operation output, the value at the 99.99th percentile of the absolute-value frequency distribution of the operation output is taken as the calibration reference point. Then, as illustrated in FIG. 1b, symmetric quantization is performed on the operation outputs lying between the positive calibration reference point and the negative calibration reference point to generate a first quantized output. Values whose absolute value exceeds the calibration reference point (i.e., the regions marked Clip in FIG. 1b) are excluded from quantization, so that a quantization code with a fixed number of bits is used relatively efficiently.
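The conventional percentile calibration described above can be sketched as follows. This is a minimal illustration, not the method of any particular framework: the function names `percentile_threshold` and `symmetric_quantize` are hypothetical, the percentile uses a simple nearest-rank rule, and values beyond the reference point are assumed to saturate at the largest code (one common way of handling the Clip region).

```python
import math

def percentile_threshold(values, pct=99.99):
    """Nearest-rank percentile of |values|, used as the calibration reference point."""
    mags = sorted(abs(v) for v in values)
    rank = max(1, math.ceil(len(mags) * pct / 100.0))
    return mags[min(rank, len(mags)) - 1]

def symmetric_quantize(value, threshold, bits=8):
    """Map value to a signed code in [-qmax, qmax]; assumes threshold > 0.

    Values whose magnitude exceeds the threshold are clipped (assumption).
    """
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits
    scale = threshold / qmax            # width of one quantization step
    clipped = max(-threshold, min(threshold, value))
    return round(clipped / scale)
```

With an 8-bit code, every value between the negative and positive reference points maps to one of 255 codes, regardless of whether the following activation function can use it.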

FIG. 2 is a diagram showing the input region of the activation function together with the frequency distribution of all operation output values.

As illustrated in FIG. 2, when the valid input region of the activation function is defined as 0≤x≤6, any first quantized output lying outside that region is an unnecessary input when the first quantized output is fed to the activation function. Quantization codes are nevertheless assigned to these unnecessary first quantized outputs, which is inefficient.
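The inefficiency can be made concrete by counting how many codes of a symmetric quantizer actually fall inside the valid input region. The sketch below is illustrative only: `codes_inside` is a hypothetical helper, and the calibration threshold of 12.0 used in the note is an assumed value, not one taken from the disclosure.

```python
def codes_inside(valid_lo, valid_hi, threshold, bits=8):
    """Count symmetric quantization codes whose value lies in [valid_lo, valid_hi]."""
    qmax = 2 ** (bits - 1) - 1
    scale = threshold / qmax
    return sum(1 for q in range(-qmax, qmax + 1)
               if valid_lo <= q * scale <= valid_hi)
```

Under the assumed threshold of 12.0, `codes_inside(0.0, 6.0, 12.0)` counts 64 of the 255 available 8-bit codes, so roughly three quarters of the codes represent values that an activation function with valid input region 0≤x≤6 would treat identically.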

Therefore, when calibration is performed by a conventional method and the subsequent quantization and activation processes are taken into account, information is lost because quantization codes are allocated unnecessarily when quantization is performed according to the calibration result. Calibration therefore needs to be made more precise in order to minimize this loss.

The main purpose of the present disclosure is to provide a calibration method and device utilizing the input/output distribution of adjacent layers.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

According to one aspect of the present disclosure, a calibration method is provided, including: an operation process of generating a first operation result in an operation layer of an artificial neural network; a calibration process of obtaining a first valid input range by using information of another layer adjacent to the operation layer; a first quantization process of quantizing the first operation result based on the first valid input range to generate a first quantization result; and an activation process of inputting the first quantization result to a first activation function to generate a first activation output.

A computer program stored on a computer-readable recording medium is also provided to execute each process included in the above calibration method.

According to another aspect of the present disclosure, a calibration device is provided, including: a calculating unit that generates a first operation result in an operation layer of an artificial neural network; a calibration unit that obtains a first valid input range by using information of another layer adjacent to the operation layer; a first quantization unit that quantizes the first operation result based on the first valid input range to generate a first quantization result; and an activation unit that inputs the first quantization result to a first activation function to generate a first activation output.

According to an embodiment of the present disclosure, the quantization process that precedes activation in an artificial neural network can be performed more precisely.

In addition, when an activation function other than the pre-stored types of activation function is used, its valid input range can still be found effectively.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1a is a diagram illustrating the absolute-value frequency distribution of the operation output values of a specific layer and the percentile position that serves as a calibration reference point, and FIG. 1b is a diagram illustrating the quantization target region and the quantization exclusion region resulting from the calibration.

FIG. 3 is a block diagram illustrating the configuration of a calibration device (300) according to one embodiment of the present disclosure.

FIG. 4a is a diagram illustrating the input and output of the activation function ReLU6(), FIG. 4b is a diagram illustrating the input and output of the activation function sigmoid(), and FIG. 4c is a diagram illustrating the input and output of the activation function Swish().

FIG. 5 is a diagram illustrating graphs of the activation function Swish() and of its derivative Swish'().

FIG. 6 is a flowchart illustrating a calibration method according to one embodiment of the present disclosure.

Hereinafter, some embodiments of the present disclosure will be described in detail with reference to exemplary drawings. In assigning reference numerals to the components of each drawing, identical components are given the same numerals wherever possible, even when they appear in different drawings. In addition, in describing the present disclosure, a detailed description of a related known configuration or function is omitted where it could obscure the gist of the present disclosure.

In describing components of embodiments according to the present disclosure, symbols such as first, second, i), ii), a), and b) may be used. These symbols serve only to distinguish one component from another, and do not limit the nature, sequence, or order of the components. When a part of the specification is said to "include" or "comprise" a component, this means that other components may be further included rather than excluded, unless explicitly stated otherwise.

The detailed description set forth below, together with the accompanying drawings, is intended to describe exemplary embodiments of the present disclosure and is not intended to represent the only embodiments in which the present disclosure may be practiced.

As illustrated in FIG. 3, a calibration device (300) according to the present embodiment may be implemented to include a calculating unit (310), a calibration unit (320), a first quantization unit (330), an activation unit (340), a second quantization unit (350), and a table storage unit (360). The calibration device (300) according to the present embodiment may be implemented with some of the components of FIG. 3 omitted, or with other components not illustrated in FIG. 3 added.

The calculating unit (310) generates a first operation result in an operation layer during deep learning. For example, the calculating unit (310) performs the operation of one layer in an artificial neural network used for deep learning, such as a convolutional neural network (CNN) or a recurrent neural network (RNN).

For example, when a convolutional neural network performs a matrix multiplication between a kernel and a partial data set of an input image, as illustrated in FIG. 3, the calculating unit (310) performs the matrix multiplication ∑wᵢxᵢ on the four data sets (x₁, x₂, x₃, x₄) of the input image using the corresponding kernel weights (w₁, w₂, w₃, w₄), and outputs the first operation result.

The calculating unit (310) applies the kernel weights to each of a plurality of different data sets, performs the matrix multiplication for each, and outputs a first operation result for each.
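The operation of the calculating unit (310) amounts to a weighted sum per data set. A minimal sketch follows; the function names `weighted_sum` and `first_operation_results` are hypothetical and the inputs are plain Python lists rather than image tensors.

```python
def weighted_sum(weights, inputs):
    """Compute sum_i w_i * x_i for one data set."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))

def first_operation_results(kernel, data_sets):
    """Apply the same kernel weights to each data set, one result per set."""
    return [weighted_sum(kernel, data_set) for data_set in data_sets]
```

For instance, `first_operation_results([1, 0, 0, 1], [[1, 2, 3, 4], [5, 6, 7, 8]])` yields one first operation result per data set.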

Each time the calculating unit (310) performs one matrix multiplication and outputs a first operation result, the calibration unit (320) operates.

In a block consisting of a layer that performs linear operations using weights (a convolution layer or a linear layer) and an activation function, if the output distribution of the activation function can be predicted, that distribution can be used in the calibration stage of the adjacent preceding layer.

If the preceding layer uses not only the output distribution of the activation function for actual inputs but also the characteristic information of the activation function, the quantization process of the preceding layer can be performed more precisely.

The calibration unit (320) obtains a first valid input range based on the relationship between the input values and the output values of the first activation function related to the first operation result of the calculating unit (310).

It is assumed that the calibration unit (320) knows which function the first activation function related to the first operation result is.

As shown in FIG. 4a, when the first activation function related to the first operation result is ReLU6(x), y=ReLU6(x) is defined by the following formula:

If x<0, then y=0

If 0≤x≤6, then y=x

If x>6, then y=6

Therefore, the calibration unit (320) obtains the valid input range [0:6] of ReLU6(x), where [0:6] denotes values greater than or equal to 0 and less than or equal to 6.

In addition, as illustrated in FIG. 4b, when the first activation function related to the first operation result is sigmoid(z), y=sigmoid(z)=1/(1+e^(-z)) gives y≒0 for z<-6 and y≒1 for z>6, so the output is nearly constant outside this interval. Accordingly, the calibration unit (320) obtains the valid input range [-6:6] of the first activation function sigmoid(z), where [-6:6] denotes values greater than or equal to -6 and less than or equal to 6.

In addition, as illustrated in FIG. 4c, when the first activation function related to the first operation result is Swish(x), y=Swish(x)=x*sigmoid(x) gives y≒0 for x<-6. Accordingly, the calibration unit (320) obtains the valid input range [-6:INF] of the first activation function Swish(x), where [-6:INF] denotes values greater than or equal to -6.
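The three activation functions and their behavior at the edges of the stated valid input ranges can be checked with a short sketch. The function names are illustrative, and `sigmoid` is written in a numerically stable form to avoid overflow for large negative inputs.

```python
import math

def relu6(x):
    """ReLU6: 0 below 0, identity on [0, 6], 6 above 6."""
    return min(max(x, 0.0), 6.0)

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def swish(x):
    """Swish(x) = x * sigmoid(x)."""
    return x * sigmoid(x)
```

Evaluating `sigmoid(-6.0)` and `sigmoid(6.0)` gives values within about 0.003 of 0 and 1 respectively, and `swish(-6.0)` is within about 0.015 of 0, which is consistent with treating inputs beyond these points as carrying little additional information.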

The table storage unit (360) stores, in a valid range table (370), the valid input range of each of at least one activation function.

In the present embodiment, when one of ReLU6(), sigmoid(), and Swish() is used as the activation function, the table storage unit (360) stores the valid input range of each of the candidate activation functions ReLU6(), sigmoid(), and Swish() in the valid range table (370).

Since the calibration unit (320) knows which function the first activation function related to the first operation result is, it searches the valid range table (370) for the first activation function among the candidate activation functions.

If the first activation function related to the first operation result exists in the valid range table (370), the calibration unit (320) obtains the valid input range associated with the found first activation function from the valid range table (370) and sets it as the first valid input range.
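The table lookup can be sketched as a dictionary keyed by activation function name. The key strings, the name `VALID_RANGE_TABLE`, and the use of `None` to signal a missing entry are assumptions made for illustration; the ranges themselves are those stated above.

```python
import math

# Valid input ranges of the candidate activation functions (from the text above).
VALID_RANGE_TABLE = {
    "relu6": (0.0, 6.0),
    "sigmoid": (-6.0, 6.0),
    "swish": (-6.0, math.inf),
}

def lookup_valid_range(activation_name, table=VALID_RANGE_TABLE):
    """Return (low, high) for a known activation function, or None if absent."""
    return table.get(activation_name.lower())
```

A `None` result corresponds to the case in which the first activation function is not present in the table and the range has to be derived another way.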

The first quantization unit (330) quantizes the first operation result based on the first valid input range to generate a first quantization result. The first quantization unit (330) does not quantize first operation results that lie outside the first valid input range.

Because the first operation result is quantized only within the first valid input range, a finer quantization is possible than when calibrating and quantizing with methods such as max, percentile, and entropy, so it is considerably more likely that the data loss caused by quantizing the first operation result will be minimized.
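The gain in resolution can be illustrated by quantizing over a finite valid input range and comparing step sizes. This is a sketch under stated assumptions: `step_size` and `quantize_in_range` are hypothetical names, unsigned codes are spread uniformly over the range, out-of-range values are saturated to the nearest bound, and an unbounded range such as [-6:INF] would first need a finite upper bound chosen by some other means.

```python
import math

def step_size(lo, hi, bits=8):
    """Width of one quantization step when 2**bits codes cover [lo, hi]."""
    if not (math.isfinite(lo) and math.isfinite(hi)) or hi <= lo:
        raise ValueError("range must be finite and non-empty")
    return (hi - lo) / (2 ** bits - 1)

def quantize_in_range(value, lo, hi, bits=8):
    """Return (code, dequantized value) for value quantized within [lo, hi]."""
    scale = step_size(lo, hi, bits)
    clipped = min(max(value, lo), hi)   # saturate values outside the valid range
    code = round((clipped - lo) / scale)
    return code, lo + code * scale
```

With 8 bits, `step_size(0.0, 6.0)` is a quarter of `step_size(-12.0, 12.0)`, so the same number of codes resolves the valid input range of ReLU6 four times more finely than a symmetric range with an assumed threshold of 12.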

The first activation function related to the first operation result may also be absent from the valid range table (370).

When a user supplies a first activation function that does not exist in the valid range table (370), the calibration unit (320) obtains the first derivative of the output value with respect to the input value of the given first activation function, and obtains the first valid input range from that first derivative.

That is, the first valid input range of the first activation function can be determined using the first derivative of the first activation function.

When the first derivative of the first activation function is continuously 0 over a certain interval of input values, or its difference from 0 remains at or below a preset magnitude continuously over a certain interval of input values, the first valid input range of the first activation function is obtained based on that interval.

If information on the valid input range of the activation function Swish() does not exist in the valid range table (370) and Swish() is given as the first activation function, the calibration unit (320) obtains the first valid input range of the first activation function based on the first derivative of Swish().

As illustrated in FIG. 5, for inputs of -6 or less, the difference between the first derivative of Swish() and 0 remains at or below the preset magnitude continuously over an interval, whereas for inputs of -6 or greater no such continuous interval appears.

Accordingly, in the case of FIG. 5, the calibration unit (320) excludes the interval over which the difference between the first derivative of Swish() and 0 remains at or below the preset magnitude, and sets the remaining interval [-6:INF] as the first valid input range of the first activation function Swish().
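The derivative-based search can be sketched numerically. Everything below is an assumption made for illustration rather than a value from the disclosure: the name `valid_range_from_derivative`, the central-difference derivative, the grid step, the preset magnitude `eps=0.0125`, and the minimum flat length `min_len=1.0` (used so that the isolated zero of Swish'() near x≈-1.28 is not mistaken for a saturated region). The search interval [-20, 20] in the usage note is likewise assumed.

```python
import math

def swish(x):
    """Swish(x) = x * sigmoid(x), with a numerically stable sigmoid."""
    s = 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))
    return x * s

def relu6(x):
    return min(max(x, 0.0), 6.0)

def valid_range_from_derivative(f, lo, hi, eps=0.0125, step=0.01,
                                min_len=1.0, h=1e-4):
    """Trim flat tails (|f'(x)| <= eps for at least min_len) from [lo, hi]."""
    n = int(round((hi - lo) / step))
    xs = [lo + i * step for i in range(n + 1)]
    # Central-difference estimate of the first derivative at each grid point.
    flat = [abs((f(x + h) - f(x - h)) / (2 * h)) <= eps for x in xs]

    new_lo, new_hi = lo, hi
    i = 0
    while i <= n and flat[i]:           # flat run touching the lower end
        i += 1
    if i > 0 and xs[i - 1] - lo >= min_len:
        new_lo = xs[i - 1]
    j = n
    while j >= 0 and flat[j]:           # flat run touching the upper end
        j -= 1
    if j < n and hi - xs[j + 1] >= min_len:
        new_hi = xs[j + 1]
    if new_lo >= new_hi:
        raise ValueError("function is flat over the whole search interval")
    return new_lo, new_hi
```

Under these assumed settings, `valid_range_from_derivative(swish, -20.0, 20.0)` trims the lower tail to a point close to -6 and leaves the upper bound untouched, and the same call with `relu6` returns a range close to [0:6]. The exact boundary depends on the chosen `eps` and `step`.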

If information on the valid input range of the activation function ReLU6() does not exist in the valid range table (370) and ReLU6() is given as the first activation function, the calibration unit (320) obtains the first valid input range of the first activation function based on the first derivative of ReLU6().

In the case of ReLU6(x), the output value is constant when the input value x is 0 or less and when the input value x is 6 or more, so the first derivative of ReLU6(x) is continuously 0 in those regions.

Accordingly, the calibration unit (320) excludes the intervals x<0 and x>6, in which the first derivative of ReLU6() is continuously 0, and sets the remaining interval [0:6] as the first valid input range of the first activation function ReLU6().

When a first activation function that does not exist in the valid range table (370) is used, the calibration unit (320) obtains an expected lower limit value and an expected upper limit value for the first operation result.

For input values between the expected lower limit value and the expected upper limit value, the calibration unit (320) obtains a certain interval in which the first derivative of the first activation function is continuously 0 or its difference from 0 remains continuously at or below a preset magnitude. It then determines the remainder of the interval between the expected lower limit value and the expected upper limit value, excluding that certain interval, as the first valid input range of the first activation function.

Alternatively, the calibration unit (320) examines the input values of the first activation function from 0 toward the expected lower limit value and obtains a first predetermined interval in which the first derivative is continuously 0 or its difference from 0 remains continuously at or below a preset magnitude. It likewise examines the input values from 0 toward the expected upper limit value and obtains a second predetermined interval satisfying the same condition. It then determines the range extending from the first predetermined interval to the second predetermined interval as the first valid input range.

When the first and second predetermined intervals are obtained in this way, the first derivative need not be checked for input values smaller than the first predetermined interval or for input values larger than the second predetermined interval, so the first valid input range can be determined efficiently.
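The outward search from 0 with early termination can be sketched as follows. The name `first_flat_start`, the numerical derivative, and the parameters `eps`, `step`, `min_len`, and `h` are illustrative assumptions, as are the expected limits of -20 and 20 used in the note. The function also returns how many grid points it evaluated, to make the saving visible.

```python
import math

def swish(x):
    """Swish(x) = x * sigmoid(x), with a numerically stable sigmoid."""
    s = 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))
    return x * s

def relu6(x):
    return min(max(x, 0.0), 6.0)

def first_flat_start(f, limit, eps=0.0125, step=0.01, min_len=1.0, h=1e-4):
    """Walk from 0 toward `limit`; return (boundary, points evaluated).

    The boundary is where the first flat run of length >= min_len begins,
    or `limit` itself if no such run is found.
    """
    direction = 1.0 if limit >= 0 else -1.0
    run_start = None
    k = 0
    while k * step <= abs(limit):
        x = direction * k * step
        slope = (f(x + h) - f(x - h)) / (2 * h)
        if abs(slope) <= eps:
            if run_start is None:
                run_start = x
            if abs(x - run_start) >= min_len:
                return run_start, k + 1     # stop early, skip the rest
        else:
            run_start = None                # run too short, keep walking
        k += 1
    return limit, k
```

For Swish() with an assumed expected lower limit of -20, the walk stops shortly after passing -6 rather than visiting every grid point down to -20, while the upward walk finds no flat interval and returns the expected upper limit unchanged.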

As described above, the first quantization unit (330) quantizes the first operation result based on the first valid input range to generate the first quantization result. First operation results outside the first valid input range are excluded from quantization by the first quantization unit (330), which allows the quantization result for the first operation result to be expressed more accurately.

The activation unit (340) inputs the first quantization result to the first activation function to generate a first activation output.

Because the valid input range of the first activation function used by the activation unit (340) is identical to the output range of the first quantization result, the quantization loss of the first quantization result is minimized.

The second quantization unit (350) quantizes the first activation output to generate a second quantization result. For the second quantization, calibration may be performed on the first activation output to obtain a valid quantization input range for the first activation output.

When calibrating the first activation output, the most efficient of the max, percentile, and entropy methods may be selected.

A calibration method according to one embodiment of the present disclosure is performed by the calibration device (300).

The calculating unit (310) performs an operation process of generating a first operation result in an operation layer of an artificial neural network (S610).

The calibration unit (320) performs a calibration process of obtaining a first valid input range by using information of another layer adjacent to the operation layer (S620).

The first quantization unit (330) performs a first quantization process of quantizing the first operation result based on the first valid input range to generate a first quantization result (S630).

The activation unit (340) performs an activation process of inputting the first quantization result to the first activation function to generate a first activation output (S640).
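Steps S610 to S640 can be strung together in a few lines. The sketch below assumes ReLU6() as the first activation function with its valid input range [0:6], uniform unsigned 8-bit codes, and saturation of out-of-range results; `run_block` is a hypothetical name, and the code operates on the dequantized value for readability rather than on integer codes as hardware would.

```python
def run_block(weights, inputs, valid_range=(0.0, 6.0), bits=8):
    """One operation layer followed by ReLU6, quantized within the valid range."""
    # S610: operation process (weighted sum of the inputs).
    result = sum(w * x for w, x in zip(weights, inputs))

    # S620: calibration process (range taken from the adjacent activation).
    lo, hi = valid_range

    # S630: first quantization process within [lo, hi].
    scale = (hi - lo) / (2 ** bits - 1)
    code = round((min(max(result, lo), hi) - lo) / scale)
    dequantized = lo + code * scale

    # S640: activation process (ReLU6 applied to the quantized result).
    return min(max(dequantized, 0.0), 6.0)
```

For an operation result of 3.0 the block returns a value within one quantization step of 3.0, while results of 30.0 and -3.0 saturate to 6.0 and 0.0, the same outputs ReLU6() would produce without quantization.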

본 발명에 따른 장치 또는 방법의 각 구성요소는 하드웨어 또는 소프트웨어로 구현되거나, 하드웨어 및 소프트웨어의 결합으로 구현될 수 있다. 또한, 각 구성요소의 기능이 소프트웨어로 구현되고 마이크로프로세서가 각 구성요소에 대응하는 소프트웨어의 기능을 실행하도록 구현될 수도 있다.Each component of the device or method according to the present invention may be implemented as hardware or software, or as a combination of hardware and software. In addition, the function of each component may be implemented as software, and a microprocessor may be implemented to execute the function of the software corresponding to each component.

Various implementations of the systems and techniques described herein may be realized as digital electronic circuits, integrated circuits, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include one or more computer programs executable on a programmable system. The programmable system includes at least one programmable processor (which may be a special-purpose or general-purpose processor) coupled to receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device. Computer programs (also known as programs, software, software applications, or code) contain instructions for the programmable processor and are stored on a "computer-readable recording medium."

The computer-readable recording medium includes any type of recording device that stores data readable by a computer system. Such a medium may be a non-volatile or non-transitory medium, such as a ROM, a CD-ROM, a magnetic tape, a floppy disk, a memory card, a hard disk, a magneto-optical disk, or a storage device, and may further include a transitory medium such as a data transmission medium. In addition, the computer-readable recording medium may be distributed over network-connected computer systems, so that computer-readable code is stored and executed in a distributed manner.

Although the flowcharts and timing diagrams in this specification describe the processes as being executed sequentially, this merely illustrates the technical idea of an embodiment of the present disclosure. In other words, a person of ordinary skill in the art to which an embodiment of the present disclosure belongs could, without departing from the essential characteristics of that embodiment, change the order described in a flowchart or timing diagram or execute one or more of the processes in parallel. The flowcharts and timing diagrams are therefore not limited to a chronological order.

The above description merely illustrates the technical idea of the present embodiment, and a person of ordinary skill in the art to which the present embodiment belongs could make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present embodiment.

[Explanation of symbols]

300: Calibration device    310: Operation unit

320: Calibration unit    330: First quantization unit

340: Activation unit    350: Second quantization unit

360: Table storage unit    370: Valid range table

Statement regarding sponsored research or development

The present invention is the result of a research project (Project Unique Number: 1711117060; Subproject Number: 2020-0-01305-001; Ministry: Ministry of Science and ICT; Project Management (Specialized) Agency: Institute of Information & Communications Technology Planning & Evaluation (IITP); Research Program Name: Next-Generation Intelligent Semiconductor Technology Development (Design) R&D; Research Project Title: Development of 2,000 TFLOPS-Class Server Artificial Intelligence Deep Learning Processor and Module; Contribution Rate: 1/1; Project Performing Organizations: SK Telecom Co., Ltd. and Sapeon Korea Inc.; Research Period: 2020.04.01 to 2027.12.31).

CROSS-REFERENCE TO RELATED APPLICATION

This patent application claims priority to Korean Patent Application No. 10-2023-0162768, filed in Korea on November 21, 2023, and Korean Patent Application No. 10-2023-0181018, filed in Korea on December 13, 2023, both of which are incorporated herein by reference in their entirety.

Claims

1. A calibration method comprising:

an operation process of generating a first operation result in an operation layer of an artificial neural network;

a calibration process of obtaining a first valid input range by using information about another layer adjacent to the operation layer;

a first quantization process of generating a first quantization result by quantizing the first operation result based on the first valid input range; and

an activation process of generating a first activation output by inputting the first quantization result to a first activation function.

2. The calibration method of claim 1,

wherein the calibration process obtains, for the first activation function of an activation layer associated with the first operation result, the first valid input range related to the relationship between an input value of the first activation function and an output value of the first activation function.

3. The calibration method of claim 2,

wherein the activation layer is the activation layer that appears immediately after the quantization of the first operation result.

4. The calibration method of claim 2,

further comprising a table storage process of storing, in a valid range table, a valid input range of each of at least one activation function,

wherein the calibration process looks up the first activation function among the at least one activation function in the valid range table, and obtains the valid input range associated with the first activation function as the first valid input range.

5. The calibration method of claim 2,

wherein the calibration process obtains a first derivative value of the output value with respect to the input value, and obtains the first valid input range from the first derivative value.

6. The calibration method of claim 5,

wherein, when the first derivative value is continuously 0 over a predetermined interval of the input value, or the difference between the first derivative value and 0 is continuously less than or equal to a preset magnitude over the predetermined interval, the calibration process obtains the first valid input range based on the predetermined interval.

7. The calibration method of claim 6,

wherein the calibration process obtains an expected lower limit value and an expected upper limit value associated with the first operation result,

obtains the predetermined interval in which, for input values between the expected lower limit value and the expected upper limit value, the first derivative value is continuously 0 or the difference between the first derivative value and 0 is continuously less than or equal to the preset magnitude, and determines the interval remaining between the expected lower limit value and the expected upper limit value after excluding the predetermined interval as the first valid input range.

8. The calibration method of claim 6,

wherein the calibration process obtains a first predetermined interval in which, for input values in the interval from 0 to an expected lower limit value, the first derivative value is continuously 0 or the difference between the first derivative value and 0 is less than or equal to the preset magnitude, obtains a second predetermined interval in which, for input values in the interval from 0 to an expected upper limit value, the first derivative value is continuously 0 or the difference between the first derivative value and 0 is less than or equal to the preset magnitude, and determines a range of values greater than or equal to the first predetermined interval and less than the second predetermined interval as the first valid input range.

9. A computer program stored on a computer-readable recording medium to execute each process included in the calibration method according to any one of claims 1 to 8.

10. A calibration device comprising:

an operation unit that generates a first operation result in an operation layer of an artificial neural network;

a calibration unit that obtains a first valid input range by using information about another layer adjacent to the operation layer;

a first quantization unit that generates a first quantization result by quantizing the first operation result based on the first valid input range; and

an activation unit that generates a first activation output by inputting the first quantization result to a first activation function.

11. The calibration device of claim 10,

wherein the calibration unit obtains, for the first activation function of an activation layer associated with the first operation result, the first valid input range related to the relationship between an input value of the first activation function and an output value of the first activation function.

12. The calibration device of claim 11,

further comprising a table storage unit that stores, in a valid range table, a valid input range of each of at least one activation function,

wherein the calibration unit looks up the first activation function among the at least one activation function in the valid range table, and obtains the valid input range associated with the first activation function as the first valid input range.

13. The calibration device of claim 11,

wherein the calibration unit obtains a first derivative value of the output value with respect to the input value, and obtains the first valid input range from the first derivative value.