KR20230121401A

KR20230121401A - Precision-scalable computing-in-memory for quantized neural networks

Info

Publication number: KR20230121401A
Application number: KR1020220018230A
Authority: KR
Inventors: 정성우; 이영서; 공영호
Original assignee: 고려대학교 산학협력단
Priority date: 2022-02-11
Filing date: 2022-02-11
Publication date: 2023-08-18
Anticipated expiration: 2042-02-11
Also published as: KR102737193B1

Abstract

The present invention discloses a precision-scalable in-memory computing method for a quantized neural network and a device therefor. According to the present invention, provided is an in-memory computing device comprising: an input buffer which stores an n-bit input vector and sequentially outputs n first binary vectors obtained at individual bit positions from the most significant bit to the least significant bit; a memory array including a plurality of sub-arrays storing a weight matrix, wherein each of the plurality of sub-arrays includes a plurality of memory cells, at least some of the plurality of sub-arrays store n second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of the weight matrix, and a binary matrix-vector multiplication of one of the n first binary vectors and the n second binary vectors is sequentially performed inside the plurality of memory cells of each of the plurality of sub-arrays; an accumulator which receives the result of the binary matrix-vector multiplication from the memory array and accumulates it; an output buffer which collects, from the accumulator, and outputs the results of the binary matrix-vector multiplications performed n times, once for each of the n first binary vectors; and a control unit which controls the input buffer, the memory array, the accumulator, and the output buffer. According to the present invention, since operations are performed inside the memory, performance and energy efficiency can be significantly increased compared to existing accelerator structures.

Description

Precision-scalable computing-in-memory for quantized neural networks

The present invention relates to a precision-scalable in-memory computing method and apparatus for a quantized neural network.

Quantized neural networks (QNNs) have emerged to improve the performance and reduce the energy consumption of matrix-vector multiplication (MVM) operations, which account for most of the computation in conventional convolutional neural networks (CNNs).

A quantized neural network converts conventional floating-point data into integer data, which allows MVM operations to be performed quickly, improving performance and saving energy. The drawback is that the accuracy of the network may decrease.

As the bit precision of the integer data becomes smaller, the unit of operation becomes smaller, so performance can be improved and energy can be saved. However, because the values held by the data become less precise, the accuracy of the network decreases further.

Therefore, by considering bit precision and network accuracy together, performance and energy can be optimized while maintaining the accuracy required by a specific network.

To this end, unlike conventional accelerators that provide only a single bit precision (for example, the neural processing units in mobile APs), accelerators that provide multiple bit precisions at the same time have appeared. However, because accelerators that provide multiple bit precisions require relatively more complex arithmetic units than accelerators that provide a single bit precision, they incur additional overhead in terms of performance, energy, and area.

KR Registered Patent No. 10-2233174

To solve the problems of the prior art described above, the present invention proposes a precision-scalable in-memory computing method and apparatus for a quantized neural network that can provide multiple bit precisions while reducing the additional overhead in terms of performance, energy, and area.

To achieve the above object, according to an embodiment of the present invention, there is provided a precision-scalable in-memory computing device for a quantized neural network, comprising: an input buffer that stores an n-bit input vector and sequentially outputs n first binary vectors obtained at individual bit positions from the most significant bit to the least significant bit; a memory array including a plurality of sub-arrays storing a weight matrix, wherein each of the plurality of sub-arrays includes a plurality of memory cells, at least some of the plurality of sub-arrays store n second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of the weight matrix, and a binary matrix-vector multiplication of one of the n first binary vectors and the n second binary vectors is sequentially performed inside the plurality of memory cells of each of the plurality of sub-arrays; an accumulator that receives the result of the binary matrix-vector multiplication from the memory array and accumulates it; an output buffer that collects, from the accumulator, and outputs the results of the binary matrix-vector multiplications performed n times, once for each of the n first binary vectors; and a control unit that controls the input buffer, the memory array, the accumulator, and the output buffer.

Each of the memory cells may include a first transistor and a second transistor, each connected to one side of a respective one of two transistors included in an SRAM or MRAM cell, and a third transistor connected to one side of the first and second transistors.

Each of the plurality of sub-arrays may include a popcount unit connected to a preset number of memory cells, and the binary matrix-vector multiplication may be performed through an AND operation inside the plurality of memory cells and a popcount operation of the popcount unit.

When the number of sub-arrays is greater than n, a plurality of second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of a plurality of weight matrices may be stored in the respective sub-arrays.

When the number of sub-arrays is n and the bit precision is set to n/k, a plurality of second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of k weight matrices may be stored in the respective sub-arrays.

Between the memory array and the accumulator, there may be disposed a register that temporarily stores the result of the binary matrix-vector multiplication and a shifter that shifts the result of the binary matrix-vector multiplication by an amount corresponding to the bit precision.

The output buffer may include a global accumulator, and the global accumulator may collect the results of the binary matrix-vector multiplications performed n times.

According to another aspect of the present invention, there is provided a precision-scalable in-memory computing method for a quantized neural network, comprising: storing, by an input buffer, an n-bit input vector and sequentially outputting n first binary vectors obtained at individual bit positions from the most significant bit to the least significant bit; sequentially performing a binary matrix-vector multiplication by a memory array including a plurality of sub-arrays storing a weight matrix, wherein each of the plurality of sub-arrays includes a plurality of memory cells, at least some of the plurality of sub-arrays store n second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of the weight matrix, and the binary matrix-vector multiplication of one of the n first binary vectors and the n second binary vectors is sequentially performed inside the plurality of memory cells of each of the plurality of sub-arrays; receiving, by an accumulator, the result of the binary matrix-vector multiplication from the memory array and accumulating it; and collecting, by an output buffer, from the accumulator the results of the binary matrix-vector multiplications performed n times, once for each of the n first binary vectors, and outputting them.

According to yet another aspect of the present invention, there is provided a precision-scalable in-memory computing method performed by a control unit for a quantized neural network, comprising: controlling an input buffer to store an n-bit input vector and sequentially output n first binary vectors obtained at individual bit positions from the most significant bit to the least significant bit; controlling a memory array including a plurality of sub-arrays storing a weight matrix to sequentially perform a binary matrix-vector multiplication, wherein each of the plurality of sub-arrays includes a plurality of memory cells, at least some of the plurality of sub-arrays store n second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of the weight matrix, and the binary matrix-vector multiplication of one of the n first binary vectors and the n second binary vectors is sequentially performed inside the plurality of memory cells of each of the plurality of sub-arrays; controlling an accumulator to receive the result of the binary matrix-vector multiplication from the memory array and accumulate it; and controlling an output buffer to collect from the accumulator the results of the binary matrix-vector multiplications performed n times, once for each of the n first binary vectors, and output them.

According to the present invention, because multiple bit precisions are provided, performance and energy can be optimized while a specific accuracy requirement is maintained. In addition, because operations are performed inside the memory, performance and energy efficiency increase significantly compared to existing accelerator structures.

Furthermore, because the invention uses operations inside the memory cells rather than computing through multipliers, it also has a large advantage in area over existing accelerator structures.

FIG. 1 is a diagram showing the configuration of a precision-scalable in-memory computing device for a quantized neural network according to a preferred embodiment of the present invention.
FIG. 2 is a diagram showing the detailed configuration of an individual sub-array according to this embodiment.
FIG. 3 is a diagram showing an example of a memory cell structure according to this embodiment.
FIG. 4 is a diagram showing the AND operation and XNOR operation tables.
FIG. 5 is a diagram illustrating a binary MVM operation that uses the in-cell operation of the memory cells according to this embodiment.
FIG. 6 is a diagram illustrating an MVM operation with 8-bit precision.
FIG. 7 is a diagram illustrating an MVM operation with 2-bit precision.

Since the present invention can be variously modified and can have several embodiments, specific embodiments are illustrated in the drawings and described in detail.

However, this is not intended to limit the present invention to specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

The precision-scalable in-memory computing method for a quantized neural network according to this embodiment converts a multi-bit-precision MVM operation into binary MVM operations, and performs each binary MVM operation using a bitwise AND operation that relies on three transistors additionally placed inside each memory cell, together with a shifter and an accumulator additionally placed in the peripheral circuit.

FIG. 1 is a diagram showing the configuration of a precision-scalable in-memory computing device for a quantized neural network according to a preferred embodiment of the present invention.

As shown in FIG. 1, the device according to this embodiment may include an input buffer 100, a memory array 102, a register 104, a shifter 106, an accumulator 108, an output buffer 110, and a control unit 112.

The input buffer 100 stores an n-bit input vector and sequentially outputs n first binary vectors obtained at individual bit positions from the most significant bit to the least significant bit.

For example, when 8-bit precision is required, the input buffer 100 stores an 8-bit input vector and, under the control of the control unit 112, sequentially outputs to the memory array 102 the first binary vectors obtained at individual bit positions from the most significant bit to the least significant bit.

For example, at 8-bit precision, the first binary vector obtained from the most significant bit may be denoted A[7], and the first binary vector obtained from the least significant bit may be denoted A[0].
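
The bit-position decomposition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say whether the data are signed or unsigned, so the sketch assumes unsigned integers, and the names `bit_planes` and `msb_first` are hypothetical helpers, not part of the disclosure.

```python
def bit_planes(vector, n_bits):
    """Split a vector of unsigned n-bit integers into n binary vectors.

    The returned list is indexed by bit position, so planes[n_bits - 1]
    holds the most significant bits (A[7] for 8-bit data) and planes[0]
    holds the least significant bits (A[0]).
    Assumption: values are unsigned; the source does not specify this.
    """
    for value in vector:
        if not 0 <= value < (1 << n_bits):
            raise ValueError(f"{value} does not fit in {n_bits} unsigned bits")
    return [[(value >> bit) & 1 for value in vector] for bit in range(n_bits)]


def msb_first(planes):
    """Yield (bit position, plane) in the order the input buffer emits them."""
    for bit in range(len(planes) - 1, -1, -1):
        yield bit, planes[bit]


A = [200, 3, 129, 0]          # example 8-bit input vector
planes = bit_planes(A, 8)     # planes[7] is A[7], planes[0] is A[0]
```

Summing `planes[b][i] << b` over all bit positions `b` recovers `A[i]`, which is the property the rest of the scheme relies on.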

The memory array 102 includes a plurality of sub-arrays that store a weight matrix.

FIG. 1 shows, by way of example, a case in which eight sub-arrays (Subarray #7 to Subarray #0) are included, but the number of sub-arrays is not necessarily limited to this.

FIG. 2 is a diagram showing the detailed configuration of an individual sub-array according to this embodiment, and FIG. 3 is a diagram showing an example of a memory cell structure according to this embodiment.

As shown in FIG. 2, each of the plurality of sub-arrays includes a plurality of memory cells 200. A popcount unit 202 is connected to a preset number of memory cells, and the results of the AND operations performed in the memory cells are collected at a first level by the popcount unit 202.

At least some of the plurality of sub-arrays store n second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of the weight matrix.

For example, when eight sub-arrays are provided as in FIG. 1 and 8-bit precision is required, the second binary vector obtained from the most significant bit may be denoted W[7], and the second binary vector obtained from the least significant bit may be denoted W[0].

Inside the plurality of memory cells according to this embodiment, a binary MVM operation of one of the n first binary vectors received from the input buffer 100 and the n second binary vectors is sequentially performed.

As shown in FIG. 3, each of the memory cells 200 may additionally include a first transistor (L) and a second transistor (R), each connected to one side of a respective one of the two transistors (WL) included in an SRAM cell, and a third transistor (B) connected to one side of the first and second transistors.

Alternatively, each of the memory cells 200 may additionally include a first transistor (L) and a second transistor (R), each connected to one side of a respective one of the two transistors (WLL/WLR) included in an MRAM cell, and a third transistor (B) connected to one side of the first and second transistors.

FIG. 4 is a diagram showing the AND operation and XNOR operation tables.

Three transistors are added to the conventional SRAM cell structure, which consists of six transistors, or to the MRAM cell structure, which consists of two transistors and two magnetic tunnel junctions (MTJs). When appropriate control signals are applied to the three transistors as in the table of FIG. 4, the result of a single-bit operation between the first binary vector and the second binary vector stored in the memory cell can be obtained.

For example, if the value Q stored in the memory cell is 1 and the input value is delivered through the L signal, a value of 1 can be derived by the three additional transistors.

Because a single-bit operation can be performed inside the memory cell in this way, it can be used to perform a binary MVM operation.

The results collected at the first level by the popcount unit 202 are passed to the shifter 106 through the register 104.

The register 104 stores intermediate results so that in-memory operations can proceed quickly, much as in a pipelined architecture.

The shifter 106 shifts its input by the appropriate number of positions (number of shift), which is supplied based on the bit precision information.

The accumulator 108 then performs a second-level collection of the values produced by the several shifters 106, according to the bit precision. Because the final result is not available at this point and results must be collected over several iterations, a partial sum is passed to the output buffer 110.
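
The shifter and accumulator stage can be sketched as follows. The sketch assumes that the sub-array holding weight bit position i has its popcount result shifted left by i before accumulation; the source only says that the shift amount is derived from the bit precision information, so this mapping and the function name `shift_and_accumulate` are illustrative assumptions.

```python
def shift_and_accumulate(subarray_results):
    """Combine per-sub-array popcount vectors into one partial sum.

    subarray_results[i] is the binary MVM result (one count per output
    row) from the sub-array holding weight bit position i. Each count is
    weighted by 2**i with a left shift and then accumulated.
    Assumption: shift amount equals the weight bit position.
    """
    rows = len(subarray_results[0])
    partial = [0] * rows
    for bit, counts in enumerate(subarray_results):
        for r in range(rows):
            partial[r] += counts[r] << bit   # shifter, then accumulator
    return partial


# Three sub-arrays (weight bits 0, 1, 2), two output rows each.
partial_sum = shift_and_accumulate([[1, 0], [2, 1], [0, 3]])
```

The value returned here corresponds to the partial sum for a single input bit position; combining the partial sums across input bit positions is left to the output buffer stage.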

The output buffer 110 receives values from the accumulator 108 based on the bit precision information received from the control unit 112, and accumulates them through a global accumulator.

When it determines, based on the bit precision information, that all operations are complete, it outputs the result.

More specifically, the output buffer 110 according to this embodiment collects, from the accumulator 108, and outputs the results of the binary MVM operations performed n times, once for each of the n first binary vectors.

Because the device according to this embodiment performs binary MVM operations inside the memory using the AND operation inside the memory cells and a series of peripheral circuits, it can significantly improve performance and save energy compared to existing accelerator structures.

In addition, because the control unit 112 adjusts the bit precision so that multiple bit precisions are supported, if the accuracy required by a specific network is known in advance, performance and energy can be optimized within the range that satisfies it.

FIG. 5 is a diagram illustrating a binary MVM operation that uses the in-cell operation of the memory cells according to this embodiment.

Referring to FIG. 5, the multiplications of the MVM operation are performed by the AND operation inside the memory cells, and the results are split according to the unit of the MVM operation and collected through the popcount operation.

The popcount operation counts the number of 1s, which is the same as adding up the single-bit operation results.

As a result, a binary MVM operation can be performed with a single access to a sub-array whose memory cells have the three additional transistors.
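
As a software analogy of the AND-plus-popcount scheme just described, the following minimal Python sketch computes a binary matrix-vector product. It models only the arithmetic, not the circuit; `binary_mvm` is a hypothetical name chosen for illustration.

```python
def binary_mvm(weight_bits, input_bits):
    """Binary matrix-vector product: per-row AND followed by popcount.

    weight_bits is a list of rows of 0/1 values (one stored bit per
    memory cell) and input_bits is a list of 0/1 values.
    """
    result = []
    for row in weight_bits:
        and_bits = [w & a for w, a in zip(row, input_bits)]  # in-cell AND
        result.append(sum(and_bits))                          # popcount
    return result


W_bits = [[1, 0, 1],
          [1, 1, 1],
          [0, 0, 0]]
a_bits = [1, 1, 0]
counts = binary_mvm(W_bits, a_bits)
```

For 0/1 operands the AND of two bits equals their product, which is why counting the 1s in each row gives the same value as an ordinary dot product.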

A multi-bit-precision weight matrix can be expressed as a sum of binary matrices, and likewise a multi-bit-precision input vector can be expressed as a sum of binary vectors.

As a result, an MVM operation that requires multi-bit precision can be converted, through a simple formula, into a sum of binary MVM operations.
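
The source does not write the formula out. For unsigned n-bit operands (an assumption, since the number format is not stated), one standard way to express the conversion, using the W[i] and A[j] notation of this description, is:

```latex
W = \sum_{i=0}^{n-1} 2^{i}\, W[i], \qquad
A = \sum_{j=0}^{n-1} 2^{j}\, A[j]
\quad\Longrightarrow\quad
W A = \sum_{j=0}^{n-1} \sum_{i=0}^{n-1} 2^{\,i+j} \bigl( W[i]\, A[j] \bigr)
```

Each term W[i] A[j] is a binary MVM operation, and the factor 2^{i+j} corresponds to a shift applied before accumulation.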

As an example, the process of performing an MVM operation with 8-bit precision in the device according to this embodiment is described below.

FIG. 6 is a diagram illustrating an MVM operation with 8-bit precision.

Referring to FIG. 6, the values of the weight matrix, denoted W, are assumed to be stored in the sub-arrays.

As described above, at least some of the plurality of sub-arrays according to this embodiment store n second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of the weight matrix.

Here, when n-bit precision is required and there are n sub-arrays, the n second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of a single weight matrix are stored in the respective sub-arrays.

On the other hand, when n-bit precision is required and the number of sub-arrays is greater than n, n second binary vectors obtained from a plurality of weight matrices (for example, W and W') may be stored in the sub-arrays.

More specifically, when the number of sub-arrays is n and the bit precision is set to n/k, a plurality of second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of k weight matrices are stored in the respective sub-arrays.
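
One possible assignment of weight bit positions to sub-arrays is sketched below. The source does not specify which sub-array holds which matrix or bit, so the contiguous layout used here (sub-array s holds bit `s % precision` of matrix `s // precision`) and the name `subarray_layout` are purely illustrative assumptions.

```python
def subarray_layout(num_subarrays, precision):
    """Return a list of (matrix index, weight bit position) per sub-array.

    With num_subarrays = n and precision = n / k, the layout covers k
    weight matrices, each contributing `precision` bit positions.
    Assumption: matrices occupy contiguous groups of sub-arrays.
    """
    if precision <= 0 or num_subarrays % precision != 0:
        raise ValueError("precision must evenly divide the sub-array count")
    return [(s // precision, s % precision) for s in range(num_subarrays)]


layout_8bit = subarray_layout(8, 8)   # one matrix, bits 0..7
layout_2bit = subarray_layout(8, 2)   # four matrices, bits 0..1 each
```

With eight sub-arrays, 8-bit precision leaves room for one weight matrix and 2-bit precision leaves room for four.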

Referring again to FIG. 6, in the first step, the input buffer 100 passes the vector A[7] (the first binary vector obtained from the most significant bit (MSB) of each component) to the memory array 102 as the input signal, and each sub-array performs a binary MVM operation with the n second binary vectors (W[7] to W[0]) through the AND operation inside the memory cells and the popcount operation.

The binary MVM results computed in each of the eight sub-arrays are shifted by an amount that matches the bit precision and are collected by the accumulator 108, which yields the MVM result of the binary input vector A[7] and the 8-bit-precision weight matrix W.

In FIG. 6, this MVM result is shown in yellow.

In the second step, the values of the weight matrix stored in the sub-arrays (the n second binary vectors) do not change. When the vector A[6] is passed as the input signal, the same process as in the first step yields the MVM result of the binary vector A[6] and the weight matrix (the gray part).

The results are collected by the global accumulator located in the output buffer 110.

This process is repeated while the input signal is changed, and in the last step, when the A[0] signal is input, the MVM result of the weight matrix W and the input vector A is finally obtained.

As a result, a (32×32)×(32×1) MVM operation with 8-bit precision can be performed with eight memory accesses.
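
The eight-step procedure can be checked with a small software model. The sketch below assumes unsigned 8-bit operands and assumes that the weight of the input bit position (2^j) is applied when the partial sum is added to the global accumulator; the source does not state where that shift happens. All function names are hypothetical.

```python
import random


def bit_plane(values, bit):
    """Extract one bit position from each value as a 0/1 list."""
    return [(v >> bit) & 1 for v in values]


def binary_mvm(weight_plane, input_plane):
    """Per-row AND followed by popcount."""
    return [sum(w & a for w, a in zip(row, input_plane)) for row in weight_plane]


def in_memory_mvm(W, A, n_bits):
    """Model the step-by-step flow: one array access per input bit position."""
    # Sub-array i holds weight bit position i of every matrix element.
    subarrays = [[bit_plane(row, i) for row in W] for i in range(n_bits)]
    global_acc = [0] * len(W)
    accesses = 0
    for j in range(n_bits - 1, -1, -1):          # A[7] first, A[0] last
        a_plane = bit_plane(A, j)
        accesses += 1                            # all sub-arrays work in parallel
        partial = [0] * len(W)
        for i, plane in enumerate(subarrays):
            for r, count in enumerate(binary_mvm(plane, a_plane)):
                partial[r] += count << i         # shifter + accumulator
        for r, p in enumerate(partial):
            global_acc[r] += p << j              # assumed global-accumulator shift
    return global_acc, accesses


random.seed(0)
W = [[random.randrange(256) for _ in range(32)] for _ in range(32)]
A = [random.randrange(256) for _ in range(32)]
result, accesses = in_memory_mvm(W, A, 8)
reference = [sum(w * a for w, a in zip(row, A)) for row in W]
```

In this model `result` matches the ordinary integer product `reference`, and `accesses` counts one array access per input bit position.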

The same approach applies to other precisions.

Because the device according to this embodiment has eight sub-arrays, some sub-arrays remain available when 2-bit precision is used.

As shown in FIG. 7, the remaining sub-arrays can be used to perform MVM operations with other weight matrices.

The MVM operation between the weight matrix W and the input vector A and the MVM operation between the matrix W' and the input vector A can be performed in parallel. Likewise, with 2-bit precision the operation can be performed four times faster.
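
A small model of the 2-bit case is sketched below. It assumes unsigned 2-bit operands, eight sub-arrays, and four weight matrices sharing the same input vector, with each matrix occupying two sub-arrays; the source shows W and W' in FIG. 7 but does not spell out this layout, and `parallel_mvm` is a hypothetical name.

```python
def bit_plane(values, bit):
    """Extract one bit position from each value as a 0/1 list."""
    return [(v >> bit) & 1 for v in values]


def parallel_mvm(matrices, A, precision, num_subarrays=8):
    """Run several low-precision MVMs side by side on shared sub-arrays."""
    if precision * len(matrices) > num_subarrays:
        raise ValueError("not enough sub-arrays for this many matrices")
    # Each entry models one sub-array: (matrix index, weight bit, stored bits).
    subarrays = [
        (m, i, [bit_plane(row, i) for row in W])
        for m, W in enumerate(matrices)
        for i in range(precision)
    ]
    outputs = [[0] * len(W) for W in matrices]
    accesses = 0
    for j in range(precision - 1, -1, -1):       # one access per input bit
        a_plane = bit_plane(A, j)
        accesses += 1
        for m, i, plane in subarrays:
            for r, row in enumerate(plane):
                count = sum(w & a for w, a in zip(row, a_plane))  # AND + popcount
                outputs[m][r] += count << (i + j)                 # shift, accumulate
    return outputs, accesses


W0 = [[3, 0], [1, 2]]
W1 = [[2, 2], [0, 3]]
W2 = [[1, 1], [3, 3]]
W3 = [[0, 1], [2, 0]]
A = [3, 2]
outputs, accesses = parallel_mvm([W0, W1, W2, W3], A, precision=2)
```

In this model, the 2-bit case needs two array accesses where the 8-bit case needs eight, which is one way to read the four-fold speedup stated above.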

The embodiments of the present invention have been disclosed for illustrative purposes. Those skilled in the art with ordinary knowledge of the present invention will be able to make various modifications, changes, and additions within the spirit and scope of the present invention, and such modifications, changes, and additions should be regarded as falling within the scope of the following claims.

Claims

A precision-scalable in-memory computing device for a quantized neural network, comprising:
an input buffer that stores n-bit input vectors and sequentially outputs n first binary vectors obtained at individual bit positions from the most significant bit to the least significant bit;
a memory array including a plurality of sub-arrays storing a weight matrix, wherein each of the plurality of sub-arrays includes a plurality of memory cells, at least some of the plurality of sub-arrays store n second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of the weight matrix, and a binary matrix-vector multiplication operation of one of the n first binary vectors and the n second binary vectors is sequentially performed inside the plurality of memory cells of each of the plurality of sub-arrays;
an accumulator that receives and accumulates the result of the binary matrix-vector multiplication operation from the memory array;
an output buffer that collects, from the accumulator, and outputs the results of the binary matrix-vector multiplication operation repeatedly performed n times, once for each of the n first binary vectors; and
a control unit that controls the input buffer, the memory array, the accumulator, and the output buffer.

The device of claim 1,
wherein each of the memory cells includes a first transistor and a second transistor connected to one side of two transistors included in an SRAM or MRAM cell, and a third transistor connected to one side of the first and second transistors.

The device of claim 2,
wherein each of the plurality of sub-arrays includes a pop count calculator connected to a preset number of memory cells, and
the binary matrix-vector multiplication operation is performed through an AND operation inside the plurality of memory cells and a pop count operation of the pop count calculator.

The device of claim 1,
wherein, when the number of the plurality of sub-arrays is greater than n, a plurality of second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of a plurality of weight matrices are stored in each of the plurality of sub-arrays.

The device of claim 4,
wherein, when the number of the plurality of sub-arrays is n and the bit precision is set to n/k, each of the plurality of sub-arrays stores a plurality of second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of k weight matrices.

The device of claim 1,
wherein a register that temporarily stores the result of the binary matrix-vector multiplication operation and a shifter that shifts the result of the binary matrix-vector multiplication operation according to the bit precision are disposed between the memory array and the accumulator.

The device of claim 1,
wherein the output buffer includes a global accumulator, and
the global accumulator collects the results of the binary matrix-vector multiplication operation repeatedly performed n times.

A precision-scalable in-memory computing method for a quantized neural network, comprising:
storing, by an input buffer, n-bit input vectors and sequentially outputting n first binary vectors obtained at individual bit positions from the most significant bit to the least significant bit;
sequentially performing a binary matrix-vector multiplication operation by a memory array including a plurality of sub-arrays storing a weight matrix, wherein each of the plurality of sub-arrays includes a plurality of memory cells, at least some of the plurality of sub-arrays store n second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of the weight matrix, and the binary matrix-vector multiplication operation of one of the n first binary vectors and the n second binary vectors is sequentially performed inside the plurality of memory cells of each of the plurality of sub-arrays;
receiving, by an accumulator, the result of the binary matrix-vector multiplication operation from the memory array and accumulating the result; and
collecting, from the accumulator, and outputting, by an output buffer, the results of the binary matrix-vector multiplication operation repeatedly performed n times, once for each of the n first binary vectors.

A precision-scalable in-memory computing method for a quantized neural network, performed by a control unit, the method comprising:
controlling an input buffer to store n-bit input vectors and to sequentially output n first binary vectors obtained at individual bit positions from the most significant bit to the least significant bit;
controlling a memory array including a plurality of sub-arrays storing a weight matrix to sequentially perform a binary matrix-vector multiplication operation, wherein each of the plurality of sub-arrays includes a plurality of memory cells, at least some of the plurality of sub-arrays store n second binary vectors obtained at individual bit positions from the most significant bit to the least significant bit of the weight matrix, and the binary matrix-vector multiplication operation of one of the n first binary vectors and the n second binary vectors is sequentially performed inside the plurality of memory cells of each of the plurality of sub-arrays;
controlling an accumulator to receive and accumulate the result of the binary matrix-vector multiplication operation from the memory array; and
controlling an output buffer to collect, from the accumulator, and output the results of the binary matrix-vector multiplication operation repeatedly performed n times, once for each of the n first binary vectors.

The method of claim 9,
wherein each of the memory cells includes a first transistor and a second transistor connected to one side of two transistors included in an SRAM or MRAM cell, and a third transistor connected to one side of the first and second transistors.