KR102665969B1

KR102665969B1 - In-memory computing (IMC) circuit, neural network device including IMC circuit, and operating method of IMC circuit

Info

Publication number: KR102665969B1
Application number: KR1020230093290A
Authority: KR
Inventors: 윤석주; 이재혁; 정승철; 권순완; 명성민; 윤대건; 창동진
Original assignee: 삼성전자주식회사
Priority date: 2022-08-30
Filing date: 2023-07-18
Publication date: 2024-05-17
Anticipated expiration: 2043-07-18
Also published as: KR20240031021A

Abstract

Apparatus and methods involving in-memory computing (IMC) are provided. An IMC circuit includes a static random access memory (SRAM) bit cell circuit including memory banks that each contain respective bit cells, where, for each memory bank, the bit cells are grouped on the same word line of the SRAM. The IMC circuit further includes operators configured to output signals corresponding to the operation results for the respective bit cells, and a gate logic circuit configured to transfer to an adder, for a multiply-accumulate (MAC) operation, the operation results corresponding to the bit cells that belong to a target memory bank among the plurality of memory banks.

Description

In-memory computing (IMC) circuit, neural network device including IMC circuit, and operating method of IMC circuit {IN MEMORY COMPUTING (IMC) CIRCUIT, NEURAL NETWORK DEVICE INCLUDING IMC CIRCUIT AND OPERATING METHOD OF IMC CIRCUIT}

The following embodiments relate to an in-memory computing (IMC) circuit, a neural network device including an IMC circuit, and a method of operating an IMC circuit.

In many applications, various types of neural networks (NNs) trained by machine learning and/or deep learning may be used to provide high performance, for example in accuracy, speed, and/or energy efficiency. Algorithms that enable machine learning of neural networks are computationally very heavy, but they can be carried out by processing uncomplicated operations such as the multiplication and accumulation (MAC) operation, which takes the dot product of two vectors and accumulates the resulting values. Uncomplicated operations such as the MAC operation can be implemented through in-memory computing.

According to one embodiment, an in-memory computing circuit includes a plurality of memory banks and a logic gate that receives the logic operation result of each of the memory banks. Each of the memory banks includes a bit cell that stores a weight and an operator that receives an input value. The operator is connected to the bit cell and, upon receiving the input value, outputs the result of a logic operation between the input value and the weight.

The logic operation result of each of the memory banks may be the NAND of the input value and the weight.

The logic gate may be a NAND gate.

The logic gate may output the result of multiplying the input value of a selected one of the memory banks by the weight of that memory bank.

Each of the unselected memory banks may receive an input value of 0.
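The interplay of the per-bank NAND operators, the NAND logic gate, and the zero inputs on unselected banks can be checked with a small truth-table sketch. This is only a behavioral model under stated assumptions: four banks, 1-bit inputs and weights, and the hypothetical helper names `nand`, `bit_cell_circuit`, and `select_bank`, none of which come from the patent text.

```python
from itertools import product


def nand(*bits):
    """NAND over any number of 0/1 inputs."""
    return 0 if all(bits) else 1


def bit_cell_circuit(inputs, weights):
    """Behavioral model: one NAND operator per bank, all feeding one NAND gate."""
    bank_outputs = [nand(x, w) for x, w in zip(inputs, weights)]
    return nand(*bank_outputs)


def select_bank(x, target, n_banks):
    """Drive input x onto the target bank and 0 onto every other bank."""
    return [x if b == target else 0 for b in range(n_banks)]


N_BANKS = 4  # assumption: four banks, as in the four-bank example of FIG. 3
mismatches = 0
for weights in product((0, 1), repeat=N_BANKS):
    for target in range(N_BANKS):
        for x in (0, 1):
            out = bit_cell_circuit(select_bank(x, target, N_BANKS), weights)
            # The gate output should equal the 1-bit product x * w of the target bank.
            if out != (x & weights[target]):
                mismatches += 1

print("mismatches:", mismatches)
```

An unselected bank sees an input of 0, so its NAND operator outputs 1 regardless of its stored weight. The NAND gate therefore simply inverts the selected bank's NAND result, which yields the 1-bit product of the selected input and weight.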

The in-memory computing circuit may further include an adder connected to the logic gate.

The operator may include a plurality of transistors that output a signal corresponding to the result of a bit-wise multiplication.

The operator may include a two-transistor (2T) circuit including a first transistor and a second transistor. The input value is applied to a first gate terminal of the first transistor and to a second gate terminal of the second transistor, and the output of the first transistor, controlled through the first gate terminal, is connected to the output of the second transistor, controlled through the second gate terminal, so that the logic operation result is output.

A value based on the weight stored in the bit cell may be applied to the drain terminal of the first transistor, and the source terminal of the first transistor may be connected, via the drain terminal of the second transistor, to an input terminal of the logic gate.

The first transistor may include an NMOS transistor, and the second transistor may include a PMOS transistor.

The operator may include a three-transistor (3T) circuit including a transmission gate and a third transistor. The input value is applied to an enable terminal of the transmission gate and to a third gate terminal of the third transistor, and the output of the transmission gate and the output of the third transistor, controlled through the third gate terminal, are each connected to an input of the logic gate so that the logic operation result is output.

Depending on whether the input value is applied to the operator, the logic gate may transfer the logic operation result corresponding to the bit cell to the adder.

The in-memory computing circuit may be integrated into at least one device selected from the group consisting of a mobile device, a mobile computing device, a mobile phone, a smartphone, a personal digital assistant, a fixed-location terminal, a tablet computer, a computer, a wearable device, a laptop computer, a server, a music player, a video player, an entertainment unit, a navigation device, a communication device, a GPS device, a television, a tuner, an automobile, a vehicle part, an avionics system, a drone, a multicopter, and a medical device.

According to one embodiment, a neural network device including in-memory computing circuits includes: an array circuit including the in-memory computing circuits; and a controller that, according to a clock signal, inputs second values corresponding to an input signal of the neural network device to each of the in-memory computing circuits and controls the in-memory computing circuits. Each of the in-memory computing circuits includes a plurality of memory banks and a logic gate that receives the logic operation result of each of the memory banks. Each of the memory banks includes a bit cell that stores a weight and an operator that receives an input value. The operator is connected to the bit cell and, upon receiving the input value, outputs the result of a logic operation between the input value and the weight.

The controller may include at least one of: an input feature map (IFM) buffer that stores an input feature map including the input value; a control circuit that controls whether the input value is applied to the plurality of IMC circuits; and a read/write (RW) circuit that reads or writes the weight.

According to one embodiment, an in-memory computing device includes: memory banks each including a respective bit cell unit; a logic gate that receives the outputs of the operators of the respective bit cell units; and an adder that receives the output of the logic gate to perform at least part of a MAC operation. Each bit cell unit includes a bit cell and an operator, and no two bit cells share the same operator.

The output of each bit cell unit is connected to the logic gate, each of the bit cells stores a respective stored value, and the bit cell units are connected to respective input lines that provide respective input values to the bit cell units. The in-memory computing device may be configured so that the input values provided to the bit cell units select which one of the bit cell units is the target of the operation that its operator performs on its stored value.

The stored values of the bit cell units that are not the target of the operation may have no effect on the output of the logic gate.

FIG. 1 is a diagram illustrating an example of a neural network whose operations can be performed in an in-memory computing (IMC) circuit according to one embodiment.
FIGS. 2A to 2D are diagrams illustrating the structure of an in-memory computing (IMC) circuit according to one embodiment.
FIG. 3 is a diagram for explaining the operation of an in-memory computing (IMC) circuit including four memory banks according to one embodiment.
FIG. 4 is a block diagram of an in-memory computing (IMC) circuit according to one embodiment.
FIGS. 5A and 5B are diagrams for explaining the operation of an in-memory computing (IMC) circuit whose operator is composed of two transistors according to one embodiment.
FIGS. 6A and 6B are diagrams for explaining how an in-memory computing (IMC) circuit selects a memory bank according to one embodiment.
FIG. 7 is a diagram for explaining the operation of an in-memory computing (IMC) circuit whose operator is composed of three transistors according to one embodiment.
FIG. 8 is a diagram for explaining the operation of an in-memory computing (IMC) circuit whose operator is composed of three transistors according to another embodiment.
FIG. 9 is a block diagram of a neural network device including an in-memory computing (IMC) circuit according to one embodiment.
FIG. 10 is a block diagram of an electronic system including a neural network device according to one embodiment.
FIG. 11 is a flowchart illustrating a method of operating an in-memory computing (IMC) circuit according to one embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, the actual implementation is not limited to the specific embodiments disclosed, and the scope of this specification includes the changes, equivalents, and substitutes that fall within the technical idea described through the embodiments.

Terms such as "first" or "second" may be used to describe various components, but these terms should be interpreted only as distinguishing one component from another. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is said to be "connected" to another component, it should be understood that it may be directly connected or coupled to that other component, but that other components may also exist in between.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to indicate that the described features, numbers, steps, operations, components, parts, or combinations thereof exist, and should be understood as not excluding in advance the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the relevant art. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art and, unless expressly defined in this specification, are not to be interpreted in an idealized or overly formal sense.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical components are given the same reference numerals regardless of the drawing in which they appear, and duplicate descriptions of them are omitted.

FIG. 1 is a diagram illustrating an example of a neural network whose operations can be performed in an in-memory computing (IMC) circuit according to one embodiment. Referring to FIG. 1, a neural network 110 is shown whose operations can be performed by a corresponding in-memory computing circuit.

In-memory computing (IMC) may refer to a computing architecture that performs operations directly inside the memory where data is stored, in order to overcome the performance and power limits caused by the frequent data movement between a compute unit (e.g., a processor) and memory in the von Neumann architecture. IMC circuits can be divided into analog IMC circuits and digital IMC circuits according to the domain in which the operation is performed. An analog IMC circuit may perform operations in an analog domain such as current, charge, or time. A digital IMC circuit may perform operations in the digital domain using logic circuits. The embodiments below describe a digital in-memory computing circuit.

An IMC circuit can accelerate matrix operations, which perform the additions of many multiplications at once, and/or multiplication and accumulation (MAC) operations, both of which are very common in the training and inference of artificial intelligence (AI). The MAC operations for training or inference of the neural network 110 may be performed through a memory array, which includes the bit cells of the memory elements in the IMC circuit. Hereinafter, for convenience of explanation, the case where the neural network 110 is composed of fully connected layers is described as an example, but the embodiments are not necessarily limited thereto. The neural network 110 may be a convolutional neural network composed of convolutional layers. The IMC circuit may enable machine learning and inference of the neural network 110 by performing the corresponding MAC operations through the computation function of the memory array including the bit cells.

The neural network 110 may be, for example, a deep neural network (DNN) or an n-layer neural network including two or more hidden layers. The neural network 110 may be, for example, a DNN including an input layer (Layer 1), two hidden layers (Layer 2 and Layer 3), and an output layer (Layer 4), but is not necessarily limited thereto. When implemented with a DNN architecture, the neural network 110 includes more layers capable of processing valid information, so it can process more complex data sets than a neural network with a single layer. Although the neural network 110 is shown as including four layers, this is only an example, and the neural network 110 may include fewer or more layers, or fewer or more channels. The neural network 110 may include layers of various structures different from those shown in FIG. 1.

Each of the layers included in the neural network 110 may include a plurality of nodes 115. A node may correspond to one of a plurality of artificial nodes, also known as a 'neuron', 'processing element (PE)', 'unit', 'channel', or similar terms. In the neural network 110, for example, the input layer may include three nodes, each of the hidden layers may include five nodes, and the output layer may include three output nodes, but the configuration is not necessarily limited thereto. The example of FIG. 1 corresponds to one embodiment, and each of the layers included in the neural network 110 may include various numbers of nodes. The nodes 115 included in the layers of the neural network 110 may be connected to each other to process data. For example, one node may receive data from other node(s), perform a computation on it, and output the result to further nodes.

The nodes 115 of one layer are connected to the nodes of another layer through connection lines, and a weight (w) may be set on each connection line. For example, the result of one node's operation (o₁) may be determined based on the input data propagated from the nodes of the previous layer connected to that node (e.g., i₁, i₂, i₃, i₄, i₅) and the weights of that node's connection lines (w₁₁, w₂₁, w₃₁, w₄₁, w₅₁).

For example, the l-th output o_l among L output values can be expressed as Equation 1 below. Here, L may be an integer greater than or equal to 1, and l may be an integer from 1 to L.

In Equation 1, i_k denotes the k-th input among P inputs, and w_kl denotes the weight set between the k-th input and the l-th output. Here, P is an integer greater than or equal to 1, and k is an integer from 1 to P.
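The formula itself does not appear in this text. A form consistent with the symbol definitions above and with the weighted-sum description that follows, offered as a reconstruction from those definitions rather than a quotation of the published equation, would be:

$$
o_l = \sum_{k=1}^{P} i_k \, w_{kl}, \qquad l = 1, \dots, L
$$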

In other words, the inputs and outputs between the nodes 115 of the neural network 110 can be expressed as a weighted sum of inputs (i) and weights (w). The weighted sum consists of multiplications between a plurality of inputs and a plurality of weights followed by repeated additions, and may also be called a 'multiplication and accumulation (MAC) operation'. Because the MAC operation is performed using a memory to which a computation function has been added, a circuit in which the MAC operation is performed may be referred to as an 'in-memory computing (IMC) circuit'.
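The weighted sum described above can be written out as a plain software loop, which may help make the multiply-then-accumulate structure concrete. This is a minimal sketch in ordinary Python, not a model of the circuit; the function names `mac` and `layer_outputs` and the example values are illustrative assumptions, and the indexing `weight_matrix[k][l]` is chosen to mirror the w_kl notation.

```python
def mac(inputs, weights):
    """Multiply each input by its weight and accumulate the products."""
    acc = 0
    for i_k, w_kl in zip(inputs, weights):
        acc += i_k * w_kl
    return acc


def layer_outputs(inputs, weight_matrix):
    """Compute every output o_l of a fully connected layer.

    weight_matrix[k][l] is the weight between the k-th input and the l-th output.
    """
    n_outputs = len(weight_matrix[0])
    return [
        mac(inputs, [row[l] for row in weight_matrix])
        for l in range(n_outputs)
    ]


# Illustrative values only: five binary inputs and two outputs.
inputs = [1, 0, 1, 1, 0]
weight_matrix = [
    [1, 0],
    [1, 1],
    [0, 1],
    [1, 0],
    [0, 1],
]
outputs = layer_outputs(inputs, weight_matrix)
print(outputs)  # [2, 1]
```

With binary inputs and weights, as in this example, each product is a single-bit AND and the accumulation is a count of ones, which is the case a digital IMC circuit with 1-bit cells handles directly.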

The neural network 110 may, for example, perform weighted-sum operations in its layers based on input data (e.g., i₁, i₂, i₃, i₄, i₅) and generate output data (e.g., u₁, u₂, u₃) based on the results of those operations (e.g., o₁, o₂, o₃, o₄, o₅).

FIGS. 2A, 2B, 2C, and 2D are diagrams illustrating an example structure of an IMC macro including an in-memory computing (IMC) circuit according to one embodiment. Referring to FIG. 2A, an IMC macro 200 according to one embodiment may include a write word line (WWL) driver 210, an IMC circuit 220, an adder 230, an accumulator 240, an input driver (or read word line (RWL) driver) 250, a memory controller (control unit) 260, and a write bit line (WBL) driver 270. The IMC macro 200 may be, for example, a 64 kb SRAM IMC macro as shown in FIG. 2A, but is not necessarily limited thereto.

As described below, an IMC circuit (e.g., IMC circuit 220) may include bit cell circuits (e.g., SRAM bit cell circuit 225), and each bit cell circuit may have bit cell units (e.g., bit cell units 223a-223d). The bit cell units of each bit cell circuit may be included in respective memory banks of the IMC circuit (e.g., bit cell units 223a-223d may be included in Banks 0-3, respectively). Each bit cell unit may include a bit cell and an operator (e.g., bit cell unit 223a may include a bit cell 221 and an operator 222). Each bit cell circuit of the IMC circuit may also have its own gate logic circuit (e.g., the SRAM bit cell circuit 225 may have a corresponding gate logic circuit 227). The bit cell units of a bit cell circuit may each be connected to the gate logic circuit corresponding to that bit cell circuit (e.g., bit cell units 223a-223d may be connected to the gate logic circuit 227).

As described above, the IMC circuit 220 may include bit cells (e.g., bit cell 221) arranged in respective memory banks, each with its own operator (e.g., operators 222), and a gate logic circuit 227. The operators output signals corresponding to the results of the operations on the respective bit cells. For example, FIG. 2C shows four SRAM bit cell circuits, each having a respective bit cell (221-0 through 221-3) in Bank 0. That is, a bit cell circuit may be included in each of four memory banks: Bank 0, Bank 1, Bank 2, and Bank 3. Bit cell units corresponding to the same memory bank (e.g., Bank 0) may receive the same input value.

As described above, in the SRAM bit cell circuit 225, for example, one bit cell 221 corresponding to one memory bank and one operator 222 (which outputs the operation result corresponding to that bit cell 221) may together be referred to as a 'bit cell unit' 223, in that they form the basic operation unit of a bit cell. One bit cell 221 may have, for example, an 8-transistor (8T) SRAM cell structure for storing a bit value. One operator 222 may include, for example, a two-transistor (2T) circuit for performing an operation. The bit cell unit 223 may have, for example, a 10-transistor (10T) SRAM cell structure in which a bit cell 221 with an 8T SRAM cell structure is combined with an operator 222 formed of a 2T circuit. The operator(s) 222 may be, for example, a general logic multiplier or pass transistor logic. The gate logic circuit 227 transfers to the adder 230 the operation results corresponding to those bit cells, among the bit cells 221 corresponding to the plurality of memory banks, that belong to the target memory bank for the MAC operation.

이하, 설명의 편의를 위하여, '읽기 워드 라인(RWL) 및 쓰기 워드 라인(WWL)'을 '워드 라인(WL)'으로 간략화하여 표현하고, '쓰기 워드 라인 드라이버(WWL driver) 및 읽기 워드 라인 드라이버(RWL driver)'를 '워드 라인 드라이버(WL driver)'로 간략화하여 표현할 수 있다. '쓰기 비트 라인(Write Bit Line; WBL)' 또한 '비트 라인(BL)'으로 간략화하여 표현할 수 있다. Hereinafter, for convenience of explanation, 'read word line (RWL) and write word line (WWL)' are simplified and expressed as 'word line (WL)', and 'write word line driver (WWL driver) and read word line'. ‘Driver (RWL driver)’ can be simplified and expressed as ‘word line driver (WL driver)’. ‘Write Bit Line (WBL)’ can also be simplified and expressed as ‘bit line (BL)’.

IMC 매크로(200)는 모든 데이터를 '0' 및/또는 '1'과 같은 디지털 논리 값으로 표현하는 디지털 연산을 수행할 수 있으며, 입력 데이터(201), 가중치(203), 및 출력 데이터(205)는 바이너리 포맷(binary format)을 가질 수 있다. 예를 들어, 입력 데이터(201)와 가중치(203)는 활성화 함수(f_act)dp 의해 출력 데이터(205)로 변환될 수 있다. 도 2a 내지 도 2d는 디지털 논리 회로로 구현될 수 있다.The IMC macro 200 can perform digital operations that express all data as digital logic values such as '0' and/or '1', including input data 201, weight 203, and output data 205. ) may have a binary format. For example, input data 201 and weight 203 can be converted into output data 205 by an activation function (f _act )dp. 2A to 2D may be implemented as a digital logic circuit.

The read word line (RWL) is the same as the path through which the input data 201 is applied, so the input driver 250 may correspond to a read word line driver (RWL driver). The input driver 250 may transfer the input data 201, on which an operation (e.g., a multiplication operation or a convolution operation) of the in-memory computing (IMC) circuit 220 is to be performed, to, for example, an external operator. The read word line (RWL) signal may be determined based on the input value of the input data 201. The input data 201 may be multi-bit or single-bit digital data.

The input data 201 read through the input driver 250 may be converted into an input signal of the IMC circuit 220 by an encoding (ENC) block 255. Together with the converted input signal, the encoding block 255 may provide the IMC circuit 220 with a signal that selects, from among the plurality of memory banks, the target memory bank for the MAC operation. The operation of the input driver 250 is described in detail with reference to FIG. 2B. The process by which operations are performed in the memory banks is described with reference to FIG. 2C, and the process by which the write bit line (WBL) driver 270 writes the data it has read (e.g., weight values or input values) to the memory banks (bit cells) is described in more detail with reference to FIG. 2D.

FIG. 2B shows an example of a process in which input data read by the input driver 250 according to an embodiment is input to the IMC circuit 220 through the encoding block 255. For example, when the IMC macro 200 is a 64 kb SRAM IMC macro as shown in FIG. 2A, the input driver 250 may read 64 items of input data, such as IN[63:0]. Each of the 64 items of input data may consist of 4 bits. The input driver 250 may input the 4-bit input data 201 (e.g., "0011 0100 1010") to the encoding block 255 sequentially, one bit at a time. The encoding block 255 may deliver the input data 201 (e.g., "0011 0100 1010") to one of the four memory banks according to a 2-bit control signal (e.g., "00" or "10"). Each of the four memory banks may correspond to its respective bit cells.

For example, when the first memory bank (bank 0) is used for the operation, the IMC macro 200 may apply a 2-bit control signal ("00") to the encoding block 255. As the 2-bit control signal ("00") is applied, the encoding block 255 may sequentially provide the input data (e.g., "0011 0100 1010") to the bit cell units through a first output (e.g., O0) connected to the first memory bank (bank 0).

When the second memory bank (bank 1) is used for the operation, the IMC macro 200 may apply a 2-bit control signal ("01") to the encoding block 255, and the encoding block 255 may provide the same input data to the bit cell units of the second memory bank (Bank1) through a second output (O1) connected to the second memory bank (Bank1).

When the third memory bank (bank 2) is used for the operation, the IMC macro 200 may apply a control signal ("10") to the encoding block 255, and the encoding block 255 may provide the same input data to the bit cell units of the third memory bank (Bank2) through a third output (O2) connected to the third memory bank (Bank2).

When the fourth memory bank (bank 3) is used for the operation, the IMC macro 200 may apply a control signal ("11") to the encoding block 255, and the encoding block 255 may output the input data to the fourth memory bank (Bank3) through a fourth output (O3) connected to the fourth memory bank (Bank3). In each case, whichever output (e.g., O0) of the encoding block 255 is activated by the control signal to provide the input data to the corresponding target (selected) memory bank (e.g., Bank0), the encoding block 255 may cause the other outputs (e.g., O1, O2, and O3) to output "0" to the other (unselected, non-target) memory banks. In this way, the output of the gate logic circuit of the IMC macro may depend only on the respective operation outputs of the operators of the selected memory bank, each of which is computed from an input bit and the bit held in a bit cell of the selected memory bank.
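The routing performed by the encoding block 255 can be pictured with a small behavioral sketch. This is only an illustration of the described behavior, not the circuit itself; the function name `encode` and the string representation of the bit stream are assumptions made for the example.

```python
def encode(input_bits: str, control: str, num_banks: int = 4) -> list[str]:
    """Behavioral model (hypothetical) of the encoding block.

    Routes a serial bit string to the bank chosen by the 2-bit control
    code and drives every unselected output with '0'.
    Returns one bit string per output O0..O3.
    """
    bits = input_bits.replace(" ", "")
    selected = int(control, 2)  # "00" -> bank 0, ..., "11" -> bank 3
    if not 0 <= selected < num_banks:
        raise ValueError("control code out of range")
    zeros = "0" * len(bits)
    return [bits if bank == selected else zeros for bank in range(num_banks)]


# Control "10" selects the third memory bank (bank 2).
outputs = encode("0011 0100 1010", "10")
```

With the control code "10", only the third output carries the input stream, and the remaining three outputs are held at zero.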

FIG. 2C illustrates a process in which, according to an embodiment, the input data 201 read by the input driver 250 is delivered to the memory banks of the SRAM bit cell circuit 225 and an operation is performed in each memory bank. For example, as described with reference to FIG. 2B, when the control signal ("00") is applied to the encoding block 255, the encoding block 255 may sequentially provide the input data (e.g., "0011 0100 1011"), bit by bit, to the bit cell units (221-0, 221-1, 221-2, 221-3) corresponding to the first memory bank (bank 0) of the IMC circuit 220. At this time, the encoding block 255 may provide '0' to the remaining memory banks (e.g., bank 1, bank 2, bank 3). Each of the bit cell units of the first memory bank (bank 0) may output the result of an operation (e.g., a multiplication operation) between the input data values sequentially provided from the encoding block 255 and the weight values stored in the respective bit cells (221-0, 221-1, 221-2, 221-3) (e.g., weights w₀, w₁, w₂, w₃, each holding an arbitrary "0" or "1").

For example, when the weight value w₀ stored in the bit cell is "0", the gate logic circuit 227 connected to the bit cell unit may output "0000 0000 0000", the result of multiplying the input data (e.g., "0011 0100 1010") by w₀ ("0") bit by bit. The weight contents of the memory banks other than Bank0 do not affect the output of the gate logic circuit 227 for this operation, because those memory banks all receive "0" from the encoding block 255 during each multiplication operation. When the weight value w₀ is "1", the gate logic circuit 227 connected to the bit cell unit 221-0 may output "0011 0100 1011", the result of multiplying the input data (e.g., "0011 0100 1011") by "1". Again, the weight values of the memory banks other than Bank0 do not affect the output of the gate logic circuit 227 for these operations, because those memory banks all receive "0" from the encoding block 255 during each multiplication operation.
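The bank-gating behavior described above can be checked with a short functional model. This is a sketch under stated assumptions: each bit cell unit is modeled as a 1-bit AND of its input and its stored weight, and the gate logic is modeled as passing a "1" whenever any bank produces a "1" (an OR of the per-bank products). The text does not fix the gate type to this, and the names `bit_cell_unit` and `gated_products` are hypothetical.

```python
def bit_cell_unit(input_bit: int, weight_bit: int) -> int:
    """1-bit multiplication of an input bit and a stored weight bit."""
    return input_bit & weight_bit


def gated_products(input_bits: list[int], selected_bank: int,
                   bank_weights: list[int]) -> list[int]:
    """Serial products for one cell position shared by several banks.

    bank_weights[b] is the weight bit stored in bank b. Unselected banks
    receive 0 as their input, so their stored weights cannot affect the
    result. Combining banks with OR is an assumption of this sketch.
    """
    result = []
    for bit in input_bits:
        per_bank = [
            bit_cell_unit(bit if bank == selected_bank else 0, weight)
            for bank, weight in enumerate(bank_weights)
        ]
        result.append(int(any(per_bank)))
    return result


stream = [int(c) for c in "001101001011"]
# Bank 0 holds w0 = 1; the other banks hold arbitrary weights.
passed = gated_products(stream, 0, [1, 0, 1, 1])
# Bank 0 holds w0 = 0; the other banks all hold 1 but stay masked.
blocked = gated_products(stream, 0, [0, 1, 1, 1])
```

In this model the output reproduces the input stream when w₀ is 1 and is all zeros when w₀ is 0, regardless of what the unselected banks store.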

FIG. 2D illustrates a process in which the write bit line (WBL) driver 270 according to an embodiment writes the data it has read (e.g., weight values or input values) to the memory banks (bit cells) (e.g., the first memory bank (bank 0)). The write word line (WWL) driver 210 may select the memory banks (and thus the bit cells) to which data is to be written in the in-memory computing (IMC) circuit 220. For example, to write data to the first memory bank (bank 0), the write word line (WWL) driver 210 may apply "1000" to the write word line WWL[3:0] to select the first memory bank (bank 0). To write data to the fourth memory bank (bank 3), the write word line (WWL) driver 210 may apply "0001" to the write word line WWL[3:0] to select the fourth memory bank (bank 3). In addition, the write bit line (WBL) driver 270 may provide the data (e.g., weight values) to be stored in the bit cells selected by the write word line (WWL) driver 210. WBL[255:0] shown in FIG. 2A may correspond to the path through which data is written to the bit cells. In a 64 x 64 operator such as that shown in FIG. 2A, 256 bits of data may be written simultaneously to 64 rows, each holding a 4-bit weight (e.g., w₀, w₁, w₂, w₃). Depending on the structure, 64 bits of data may instead be written simultaneously in the column direction. When the 256-bit input (for storage) delivers data to the columns consecutively, one column per cycle, the data of the entire operator may be written over a total of 64 cycles. When the write bit line (WBL) driver 270 performs a write operation, '0' may be input to all of the read word lines (RWL).
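The bank selection and the cycle count in the write process can be restated as a small sketch. The helper name `selected_bank` is hypothetical, and the bit ordering (leftmost character of the WWL[3:0] pattern maps to bank 0) is inferred only from the two examples given, "1000" for bank 0 and "0001" for bank 3.

```python
def selected_bank(wwl: str) -> int:
    """Decode a one-hot WWL[3:0] pattern into a bank index.

    Assumes the leftmost character selects bank 0, which matches the
    examples "1000" -> bank 0 and "0001" -> bank 3.
    """
    if len(wwl) != 4 or set(wwl) - {"0", "1"} or wwl.count("1") != 1:
        raise ValueError("expected a 4-bit one-hot pattern")
    return wwl.index("1")


# Cycle count for filling one bank of the 64 x 64 operator:
ROWS, COLUMNS, WEIGHT_BITS = 64, 64, 4
bits_per_cycle = ROWS * WEIGHT_BITS           # 256 bits on WBL[255:0]
bits_per_bank = ROWS * COLUMNS * WEIGHT_BITS  # all 4-bit weights of one bank
write_cycles = bits_per_bank // bits_per_cycle
```

With 256 bits written per cycle, one column at a time, the 64 columns take 64 cycles, which agrees with the figure stated in the text.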

The input driver 250 may receive the input data 201 from an external module such as a processor (e.g., see the processor 1010 of FIG. 10), or may read the input data 201 from an input feature map stored in an IFM (input feature map) buffer (e.g., see the IFM buffer 931 of FIG. 9). The source of the input data is not critical; any source may be used.

For example, when the input value of the input data 201 shown in FIG. 2A is multi-bit, the input driver 250 may sequentially deliver the multi-bit value to the in-memory computing (IMC) circuit 220, one bit position at a time. When the IMC macro 200 operates for a neural network operation, for example, the input driver 250 may operate as a read word line driver (RWL driver). Hereinafter, the input driver 250 and the read word line driver (RWL driver) may be understood to have the same meaning.

In this case, the input driver 250 may apply, to the read word lines (e.g., RWL₀, RWL₁ to RWL_(M-1)), the input values received from the M nodes of each layer of the neural network. Here, RWLm and IN[m] may correspond to the same node.

For example, the input value at the m-th node may be applied to RWLm, and the input value applied to RWLm may be multi-bit or single-bit. Here, m is an integer from 0 to M-1, and M may be an integer of 1 or more. When the input value applied to RWLm is multi-bit, the bit value of each bit position may be delivered sequentially to the in-memory computing (IMC) circuit 220, as described above. The input driver 250 may deliver the M input values received from the above-described nodes individually to M bit cells. As described later, each of the M bit cells performs its multiplication in parallel with the other bit cells, so M multiplication operations may be performed in parallel for each bit line.

Alternatively, when the weight 203 is multi-bit, as many output lines as the number of bits used to express the weight 203 may be grouped. The grouped output lines may be called an 'output line group'. For example, when the weight 203 is X bits, X output lines may be grouped, and the IMC macro 200 may output, through the grouped X output lines, the result of summing the products between the input values of the input data 201 and the X-bit weight 203. Here, X may be an integer of 2 or more.

The SRAM bit cell circuit 225 may be composed of several bit cells in order to express a multi-bit weight. In this case, the input may be applied to each of those bit cells simultaneously for multiplication with the multi-bit weight. For example, among the X output lines grouped together, the first output line may output the result of multiplying the input bit value by the weight bit value corresponding to the LSB (least significant bit) of the weight. Similarly, the x-th output line may output the result of multiplying the input bit value by the weight bit value at the (x-1)-th bit position from the LSB. Here, x may be an integer from 2 to X. In this case, the accumulation operator 240 may shift the summation result output from each output line of the same output line group by the bit position corresponding to that output line (e.g., one bit per position), and may output the final MAC operation result by accumulating the shifted values. The accumulation operator 240 may be implemented with, for example, a shifter and an adder, or with a separate accumulator, but is not necessarily limited thereto.
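The shift-and-accumulate step over an output line group can be sketched as follows. This is a minimal illustration assuming unsigned weights and single-bit inputs; the function names and the example values are hypothetical.

```python
def line_sums(input_bits: list[int], weights: list[int], weight_width: int) -> list[int]:
    """Per-output-line sums for one output line group.

    Entry k is the sum, over all inputs, of input_bit * (bit k of the
    weight). Entry 0 corresponds to the LSB output line.
    """
    return [
        sum(bit * ((w >> k) & 1) for bit, w in zip(input_bits, weights))
        for k in range(weight_width)
    ]


def combine_output_line_group(sums: list[int]) -> int:
    """Shift each line's sum by its bit position and accumulate."""
    total = 0
    for position, line_sum in enumerate(sums):
        total += line_sum << position
    return total


inputs = [1, 0, 1]    # single-bit inputs (assumed example)
weights = [5, 3, 6]   # 3-bit unsigned weights (assumed example)
sums = line_sums(inputs, weights, weight_width=3)
mac = combine_output_line_group(sums)
```

Here the per-line sums are 1, 1, and 2 for bit positions 0, 1, and 2, and shifting and accumulating them gives 11, which equals 1·5 + 0·3 + 1·6.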

비트 셀 들 각각은 예를 들어, 가중치(203) 값(예: 제1 값)을 저장할 수 있다. 복수의 비트 셀들을 포함하는 인-메모리 컴퓨팅(IMC) 회로(220)의 구조 및 동작은 아래의 도 3 내지 도 4를 통해 보다 구체적으로 설명한다. Each of the bit cells may store, for example, a weight 203 value (eg, a first value). The structure and operation of the in-memory computing (IMC) circuit 220 including a plurality of bit cells will be described in more detail with reference to FIGS. 3 and 4 below.

The in-memory computing (IMC) circuit 220 may perform a multiplication operation between the value of the input data 201 received through the input driver 250 and the weight 203 stored in the bit cell. The in-memory computing (IMC) circuit 220 according to an embodiment may output a signal corresponding to the operation result (e.g., a bit-wise multiplication result) for each of the bit cells, through a structure in which the bit cell(s), the operator(s) 222, and the gate logic circuit are connected. For example, as described with reference to FIG. 3 below, as an overall computational effect, the in-memory computing (IMC) circuit 220 may transmit to the adder 230 the result of an AND logic operation on the signals that correspond to the products between the weight 203 value stored in each bit cell (e.g., a first value) and the input value applied through the word line as the input signal of the bit cells corresponding to the memory banks (e.g., a second value).

연산기(222)는 예를 들어, 트랜지스터 개수를 최소화한 패스 트랜지스터 로직의 형태일 수 있다.For example, the operator 222 may be in the form of pass transistor logic that minimizes the number of transistors.

가산기(230)는 하나 이상의 인-메모리 컴퓨팅(IMC) 회로(220)의 출력 단에 연결될 수 있다. 인-메모리 컴퓨팅(IMC) 회로(220)의 출력 단은 출력 라인에 대응할 수 있다. 하나의 출력 라인에 하나 이상의 인-메모리 컴퓨팅(IMC) 회로(220)의 출력 단이 연결될 수 있다. 가산기(230)는 하나 이상의 인-메모리 컴퓨팅(IMC) 회로(220)에서 출력된 신호를 가산할 수 있다. 가산기(230)는 같은 출력 라인에 연결된 복수의 인-메모리 컴퓨팅(IMC) 회로(220)들의 곱셈 결과를 합산할 수 있다. 가산기(230)는 예를 들어, 전가산기(full adder), 반가산기(half adder), 및/또는 플립-플롭(flip-flop)으로 구현될 수 있다. 가산기(230)는 예를 들어, 가산기 트리 회로(adder tree circuit)와 같은 디지털 가산기에 해당할 수 있으며, 반드시 이에 한정되지는 않는다. Adder 230 may be connected to the output of one or more in-memory computing (IMC) circuits 220. The output stage of the in-memory computing (IMC) circuit 220 may correspond to an output line. The output terminal of one or more in-memory computing (IMC) circuits 220 may be connected to one output line. Adder 230 may add signals output from one or more in-memory computing (IMC) circuits 220. The adder 230 may add the multiplication results of a plurality of in-memory computing (IMC) circuits 220 connected to the same output line. Adder 230 may be implemented as, for example, a full adder, a half adder, and/or a flip-flop. For example, the adder 230 may correspond to a digital adder such as an adder tree circuit, but is not necessarily limited thereto.
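An adder tree of the kind mentioned above can be illustrated with a pairwise reduction. This is only a behavioral sketch; the function name `adder_tree` and the zero-padding of odd-sized levels are assumptions of the example, not details taken from the text.

```python
def adder_tree(values: list[int]) -> int:
    """Sum values by repeatedly adding neighboring pairs, level by level."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd-sized levels with a zero operand
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


# 1-bit products from the bit cells connected to one output line.
products = [1, 0, 1, 1, 0, 1, 0, 0]
line_total = adder_tree(products)
```

For 64 one-bit products on an output line, such a tree has six levels and produces a value between 0 and 64.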

또한, 전술한 바와 같이, 인-메모리 컴퓨팅(IMC) 회로(220)의 출력 결과가 전체적으로, AND 논리 연산의 결과 값이므로, 가산기(230)는 각 인-메모리 컴퓨팅(IMC) 회로(220)의 출력 결과를 반전시키는 인버터(inverter)를 포함하여 구현될 수도 있다. 이 경우, 가산기(230)는 인-메모리 컴퓨팅(IMC) 회로(220)의 출력 결과를 반전시킨 값을 합산할 수 있다. 가산기(230)는 비트 셀들 각각에 대응하는 곱셈 결과를 합산한 결과를 누적 연산기(240)에 전달할 수 있다. 가산기(230)는 인-메모리 컴퓨팅(IMC) 회로(220)의 각 출력 라인마다 배치될 수 있다. In addition, as described above, since the output result of the in-memory computing (IMC) circuit 220 is overall the result of the AND logic operation, the adder 230 is the result of each in-memory computing (IMC) circuit 220. It may also be implemented including an inverter that inverts the output result. In this case, the adder 230 may add a value obtained by inverting the output result of the in-memory computing (IMC) circuit 220. The adder 230 may transfer the sum of the multiplication results corresponding to each bit cell to the accumulation operator 240. Adder 230 may be disposed on each output line of the in-memory computing (IMC) circuit 220.

누적 연산기(240)는 하나 이상의 인-메모리 컴퓨팅(IMC) 회로(220)의 곱 연산 결과를 합산하는 가산기(230)의 출력을 저장하고, 합산 결과를 누적할 수 있다. 누적 연산기(240)는 가산기(230)에서 비트 셀들 각각에 대응하는 곱셈 결과를 합산하고, 합산한 결과를 최종적으로 결합하여 MAC 연산 결과(예를 들어, Q₀[13:0] ~ Q₆₃[13:0])로써 출력할 수 있다. The accumulation operator 240 may store the output of the adder 230, which sums the product operation results of one or more in-memory computing (IMC) circuits 220, and accumulates the sum results. The accumulation operator 240 sums the multiplication results corresponding to each bit cell in the adder 230, and finally combines the summed results to produce a MAC operation result (e.g., Q ₀ [13:0] to Q ₆₃ [ 13:0]).
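The 14-bit width of the example outputs Q[13:0] is consistent with the sizes used earlier in the description, as the following check suggests. It assumes 64 inputs per output, unsigned 4-bit inputs, and unsigned 4-bit weights; the unsigned interpretation is an assumption of this sketch.

```python
NUM_INPUTS = 64   # IN[63:0]
INPUT_BITS = 4    # each input is 4 bits (assumed unsigned)
WEIGHT_BITS = 4   # each weight is 4 bits (assumed unsigned)

max_input = (1 << INPUT_BITS) - 1     # 15
max_weight = (1 << WEIGHT_BITS) - 1   # 15
max_mac = NUM_INPUTS * max_input * max_weight
bits_needed = max_mac.bit_length()
```

The largest possible MAC value under these assumptions is 64 x 15 x 15 = 14400, which needs 14 bits and therefore fits in Q[13:0].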

예를 들어, 입력 드라이버(250)가 멀티 비트로 된 입력 데이터(201)를 수신한 경우, 워드 라인 드라이버(210)는 쓰기 워드 라인들(write word lines)(예를 들어, WWL₀[3:0] ~ WWL₆₃[3:0])을 통해 입력 데이터(201)의 비트 자리 별 비트 값을 인-메모리 컴퓨팅(IMC) 회로(220)에게 순차적으로 전달할 수 있다. 이에 따라, 인-메모리 컴퓨팅(IMC) 회로(220) 또한 해당하는 비트 자리의 곱 연산 결과를 출력할 수 있다. 가산기(230)는 해당하는 비트 자리의 곱 연산 결과 값들을 합산한 결과를 누적 연산기(240)에 전달할 수 있다. For example, when the input driver 250 receives multi-bit input data 201, the word line driver 210 generates write word lines (for example, WWL ₀ [3:0 ] ~ WWL ₆₃ [3:0]), the bit value for each bit position of the input data 201 can be sequentially transmitted to the in-memory computing (IMC) circuit 220. Accordingly, the in-memory computing (IMC) circuit 220 may also output the result of the multiplication operation of the corresponding bit digit. The adder 230 may transfer the result of adding up the product operation result values of the corresponding bit positions to the accumulation operator 240.

The accumulation operator 240 may bit-shift the summation result of the corresponding bit position and add it. The accumulation operator 240 may accumulate the multiplication results according to bit position by combining the summation result of the next bit position with the bit-shifted summation result. As described later, when the input driver 250 receives single-bit input data, no bit shifting is needed, so the accumulation operator 240 may output the summation result of the adder 230 directly or store it in an output register (not shown).
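The bit-serial accumulation over input bit positions can be sketched as below. The MSB-first ordering and the name `bit_serial_mac` are assumptions of this example; the text states only that bit positions are delivered sequentially and that the sums are shifted and accumulated.

```python
def bit_serial_mac(inputs: list[int], weights: list[int], input_bits: int = 4) -> int:
    """Accumulate per-bit-position sums with a shift between positions.

    Processes the input bits MSB first (an assumption of this sketch):
    the accumulator is shifted left by one bit, then the sum for the
    current bit position is added.
    """
    acc = 0
    for position in range(input_bits - 1, -1, -1):
        # Sum produced by the adder for this bit position of every input.
        partial = sum(((x >> position) & 1) * w for x, w in zip(inputs, weights))
        acc = (acc << 1) + partial
    return acc


inputs = [0b0011, 0b0100, 0b1010]   # 3, 4, 10, as in "0011 0100 1010"
weights = [2, 7, 5]                 # hypothetical unsigned weights
result = bit_serial_mac(inputs, weights)
```

The partial sums for bit positions 3 down to 0 are 5, 7, 7, and 2, and the shifted accumulation yields 84, the same as 3·2 + 4·7 + 10·5.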

The output register may store the final multiplication result (e.g., the multiply-accumulate result) output from the accumulation operator 240. Because the accumulation operator 240 performs an accumulation operation in addition to shift and sum operations, it may also be called a "Shift & adder + accumulator" (240). The final multiply-accumulate result (e.g., the MAC operation result) stored in the output register may be read by, for example, a processor of an electronic system (e.g., see the processor 1010 of FIG. 10) and used for other operations. For example, when the IMC macro 200 performs the MAC operation corresponding to some layers of the neural network at one time, the MAC operation result stored in the output register may be delivered to the word line driver 210 for the operation performed in the next layer. The word line driver 210 of the IMC macro 200 may perform the multiplication operation by selecting the bit cell(s) in which the weight set corresponding to the next layer is set.

쓰기 비트 라인 드라이버(WBL driver)(270)는 인-메모리 컴퓨팅(IMC) 회로(220)에 포함된 하나 이상의 비트 셀의 데이터를 쓸 수 있다. 쓰기 비트라인 드라이버(WBL driver)(270)는 '쓰기 회로'로 간략화하여 표현할 수 있다. 이하, '쓰기 비트라인 드라이버'와 '쓰기 회로'는 서로 혼용될 수 있다. The write bit line driver (WBL driver) 270 can write data of one or more bit cells included in the in-memory computing (IMC) circuit 220. The write bit line driver (WBL driver) 270 can be simplified and expressed as a ‘write circuit’. Hereinafter, 'write bitline driver' and 'write circuit' may be used interchangeably.

The data of the one or more bit cells may include, for example, the weight 203 value to be multiplied by the input value in the MAC operation. The write bit line driver 270 may access a bit cell of the in-memory computing (IMC) circuit 220 through the bit lines (e.g., WBL, WBLB). When the in-memory computing (IMC) circuit 220 includes a plurality of bit cells, the write bit line driver 270 may access the bit cell connected to the activated word line among the plurality of word lines (RWL). The write bit line driver 270 may set (write) a weight in the accessed bit cell or read the weight set in the bit cell.

메모리 컨트롤러(260)는 워드 라인 드라이버(210), 하나 이상의 인-메모리 컴퓨팅(IMC) 회로(220), 누적 연산기(240)(예를 들어, 누적 연산기<0> ~ 누적 연산기<63>), 가산기(230), 입력 드라이버(250) 및/또는 출력 레지스터를 제어할 수 있다.The memory controller 260 includes a word line driver 210, one or more in-memory computing (IMC) circuits 220, and an accumulation operator 240 (e.g., accumulation operator <0> to accumulation operator <63>). ), the adder 230, the input driver 250, and/or the output register can be controlled.

IMC 매크로(200)는 예를 들어, 뉴럴 네트워크 장치, 인 메모리 컴퓨팅 회로, MAC 연산 회로 및/또는 장치로 구현될 수 있으며, 반드시 이에 한정되지는 않는다. IMC 매크로(200)는 워드 라인을 통해 입력 값을 수신하고, 10T SRAM 비트 셀에 저장된 가중치와 입력 값 간의 곱셈 결과에 대응하는 신호를 비트 라인을 통해 출력할 수 있다. The IMC macro 200 may be implemented as, for example, a neural network device, an in-memory computing circuit, a MAC operation circuit, and/or a device, but is not necessarily limited thereto. The IMC macro 200 may receive an input value through a word line and output a signal corresponding to the result of multiplication between the input value and the weight stored in the 10T SRAM bit cell through the bit line.

도 3은 일 실시예에 따른 인 메모리 컴퓨팅(IMC) 회로의 구조를 설명하기 위한 도면이다. 도 3을 참조하면, 일 실시예에 따른 인 메모리 컴퓨팅(IMC) 회로(220)는 SRAM 비트셀 회로(225) 및 게이트 로직 회로(340)를 포함할 수 있다. FIG. 3 is a diagram for explaining the structure of an in-memory computing (IMC) circuit according to an embodiment. Referring to FIG. 3, an in-memory computing (IMC) circuit 220 according to an embodiment may include an SRAM bitcell circuit 225 and a gate logic circuit 340.

The SRAM bit cell circuit 225 may include a plurality of bit cell units 223 corresponding to each of a plurality of memory banks. A bit cell unit 223 may include one bit cell 310 and an operator 320 that outputs a signal corresponding to the result of an operation between an input bit and the value stored in that bit cell 310. The operator 320 may correspond to the operator 222 described above with reference to FIG. 2. The bit cell 310 may include two inverters 311 and 313 and word line transistors formed of two transmission gates 315 and 317. Here, a 'transmission gate' is a bidirectional switch in which an NMOS transistor and a PMOS transistor are connected in parallel, and it may be controlled by an externally applied logic level. For example, when '1' is applied to the enable (E) terminal of the transmission gates 315 and 317, the transmission gates 315 and 317 may act as a 'closed' switch. In contrast, when '0' is applied to the enable terminal of the transmission gates 315 and 317, the transmission gates 315 and 317 may act as an 'open' switch. Each of the inverters 311 and 313 and each of the transmission gates 315 and 317 may be composed of two transistors.

연산기(320)는 복수의 트랜지스터들(예: 제1 트랜지스터(321), 제2 트랜지스터(323))을 포함할 수 있다. 복수의 트랜지스터들(예: 제1 트랜지스터(321), 제2 트랜지스터(323))은 비트 셀(310)에 저장된 제1 값과 입력 드라이버(250)를 통해 비트 셀(310)에 입력 신호로 인가되는 제2 값 간의 비트 와이즈(bit-wise) 곱 연산 결과에 해당하는 신호를 출력할 수 있다. The operator 320 may include a plurality of transistors (eg, a first transistor 321 and a second transistor 323). A plurality of transistors (e.g., the first transistor 321 and the second transistor 323) are applied as an input signal to the bit cell 310 through the first value stored in the bit cell 310 and the input driver 250. A signal corresponding to the result of a bit-wise product operation between the second values may be output.

연산기(320)는 도 3, 도 5 및/또는 도 6에 도시된 것과 같이 2개의 트랜지스터들(2T)로 구성될 수도 있고, 또는 도 7 및/또는 도 8에 도시된 것과 같이 3개의 트랜지스터들(3T)로 구성될 수도 있다. Operator 320 may be composed of two transistors 2T as shown in FIGS. 3, 5, and/or 6, or three transistors as shown in FIGS. 7 and/or 8. It may also be composed of (3T).

예를 들어, 도 3에 도시된 것과 같이, 연산기(320)가 2개의 트랜지스터들로 구성된 경우, 비트 셀 유닛(223)이 10개의 트랜지스터들(2 x 2 + 2 x 2 + 2 = 10)로 구성된다는 점에서 SRAM 비트셀 회로(225)는 '10T SRAM 셀' 구조 또는 '10T' 구조라고 부를 수 있다. For example, as shown in FIG. 3, when the operator 320 is composed of two transistors, the bit cell unit 223 is composed of 10 transistors (2 x 2 + 2 x 2 + 2 = 10). In that it is configured, the SRAM bit cell circuit 225 can be called a '10T SRAM cell' structure or a '10T' structure.

Among the bit cell units 310 of the SRAM bit cell circuit 225, the bit cell units corresponding to the same memory bank may receive the same input value. A 'memory bank' may correspond to one block when the entire memory area is divided into a plurality of blocks. Where several sets of identical addresses designating memory areas exist and input/output occurs in 64-bit units, a memory bank may correspond to a logical bundle of one or more memories within a channel, a channel being a bundle that shares one data path. Memory banks may be used in multiple pairs or sets. A memory bank may correspond, for example, to a memory group that shares an adder 230 such as an adder tree. The bit cells 310 may correspond, for example, to four memory banks.

Each of the operators 320 may include a plurality of transistors (e.g., 321 and 323) that output a signal corresponding to the result of a bit-wise multiplication between a first value stored in each of the bit cells corresponding to a given memory bank among the plurality of memory banks and a second value applied through a word line as the input signal of that memory bank. Each of the operators 320 may correspond to a respective one of the bit cells 310.

The gate logic circuit 340 may transfer, to the adder 230, the operation results corresponding to each of the bit cells belonging to a target memory bank for multiplication and accumulation (MAC) among the plurality of memory banks. The gate logic circuit 340 may transfer the operation results corresponding to each of the bit cells belonging to a memory bank to the adder 230 depending on whether the second value is applied to each of the operators 320. The gate logic circuit 340 may include, for example, any one of a NAND gate, a NOR gate, an XOR gate, an XNOR gate, an AND gate, and an OR gate, but is not necessarily limited thereto. For example, when the gate logic circuit 340 includes a NOR gate, an XOR gate, an XNOR gate, an AND gate, or an OR gate instead of a NAND gate, the structure of the gate logic circuit 340 may be changed to match the logic operation of that gate.

In the in-memory computing (IMC) circuit 220, the layout size and routing complexity of the SRAM bit cell circuit 225 may strongly affect the power efficiency and/or area efficiency of the SRAM IMC circuit.

In addition, the area efficiency (D_M) of the memory may be obtained as in Equation 2 below.

Here, WE may correspond to the memory capacity needed for a multi-bit value. For example, WE may be 8 to represent 8 bits, and WE may be 4 to represent 4 bits.

According to Equation 3, the area density may be improved by reducing the memory area (Area) or by increasing the number of memory banks (Bank). The memory area may correspond, for example, to the area occupied by the bit cells, the adder 230, and/or the peripheral control lines.

By the same principle, the area of the in-memory computing (IMC) circuit 220 may be reduced by reducing the number of transistors included in the IMC circuit 220, by reducing the number of transistors constituting a memory cell, or by increasing the number of memory banks.

In one embodiment, the bit cells of the SRAM are arranged to correspond to a plurality of memory banks, and an operator 320 composed of a small number of transistors (e.g., two or three) together with the gate logic circuit 340 causes the operation result corresponding to a target memory bank among the plurality of memory banks to be transferred to the adder 230. This reduces the number of control lines of the in-memory computing (IMC) circuit, which enables a low-voltage write operation, and may also improve the area efficiency of the IMC circuit. Here, 'target memory bank' may refer to a memory bank, among the plurality of memory banks, whose cells' respective operation results are used in the MAC operation.

A method of arranging the bit cells of the in-memory computing (IMC) circuit 220 into a plurality of memory banks is described in more detail below with reference to FIG. 4.

FIG. 4 is a block diagram of an in-memory computing (IMC) circuit according to one embodiment. Referring to FIG. 4, a diagram 400 shows the structure of an IMC circuit that includes an adder 440 (e.g., an adder tree) and an SRAM bit cell circuit 225, the SRAM bit cell circuit 225 including SRAM bit cells 415 (e.g., the bit cells 221 of FIG. 2) corresponding to a plurality of memory banks (e.g., Bank₀ to Bankₙ) and operators 420 (e.g., the operators 222 of FIG. 2) corresponding to the SRAM bit cells 415.

For each of the four memory banks, the bit cells corresponding to the same memory bank among the bit cells 415 of the SRAM bit cell circuit 410 may receive the value of the same SRAM word line (e.g., IN<0: n-1> <0>, .. , IN<0: n-1> <63>) read by the input driver. Here, n may be, for example, 64, but is not necessarily limited thereto.

The SRAM bit cell circuit 225 of the IMC circuit 220 may include operators 420 that output a signal corresponding to the result of an operation between the value of an external input signal supplied to the operators 420 and the value stored in the SRAM bit cells 415. The IMC circuit 220 may control the inputs of the operators 420 so that the operation results corresponding to the SRAM bit cells 415 belonging to the target memory bank for the MAC operation are transferred to the adder 440 (that is, operation results of non-target memory banks do not contribute to the final result).

The output of each of the operators 420 may pass through a logic operation (e.g., a NAND logic operation) of the gate logic circuit 430 and then be transferred to the adder 440, such as an adder tree.

The in-memory computing (IMC) circuit 220 may make the operation result corresponding to the bit cells of the target memory bank for the MAC operation become '1' or '0' according to the bit cell value of the target memory bank and the input bit value, and may make the operation results corresponding to the bit cells of the remaining memory banks become '0'. In this way, the operation result of the target memory bank can be used in the MAC operation, while memory banks other than the target memory bank do not affect the result.

In one embodiment, arranging the SRAM bit cells 415 into a plurality of memory banks reduces the number of control lines that control the operator(s) 420, which reduces the implementation area of the in-memory computing (IMC) circuit 220 and thereby may improve its area efficiency.

In addition, in one embodiment, the operator(s) 420 reduce the number of transistors needed for the multiplication, which may reduce the total number of transistors constituting the in-memory computing (IMC) circuit 220.

The in-memory computing (IMC) circuit 220 may partially separate the power supplied to the SRAM bit cell circuit 225, the gate logic circuit 430, and the adder 440, so that different voltages may be applied to each of the SRAM bit cell circuit 225, the gate logic circuit 430, and/or the adder 440.

FIGS. 5A and 5B are diagrams for explaining the operation of an in-memory computing (IMC) circuit whose operator is composed of two transistors, according to one embodiment. Referring to FIG. 5A, a diagram 500 shows the structure of an IMC circuit including the SRAM bit cell circuit 225 and the gate logic circuit 430 (e.g., a NAND gate) according to one embodiment.

The SRAM bit cell circuit 225 may be a multiplication cell implemented by combining, for each bit cell unit, a bit cell 310 composed of eight transistors (8T) and an operator 320 composed of two transistors (2T) with the gate logic circuit 430. The SRAM bit cell circuit 225 may include, for example, bit cells 310 corresponding to each of four memory banks (Bank0, Bank1, Bank2, Bank3) connected respectively to 4-bit input signals (IN₀, IN₁, IN₂, IN₃), and operators 320 corresponding to each of the bit cells 310.

The operator 320 may be a two-transistor (2T) circuit including a first transistor (N₁) 321 and a second transistor (P₁) 323. The first transistor 321 may be, for example, an NMOS transistor, but is not necessarily limited thereto. The second transistor 323 may be a PMOS transistor, but is not necessarily limited thereto.

For example, the second value (e.g., the input signal IN₀) corresponding to the input signal of the target memory bank (e.g., memory bank 0 (Bank₀)) may be applied, through the read word line (RWL) of the bit cell 310 corresponding to memory bank 0, to the first gate terminal of the first transistor 321 and the second gate terminal of the second transistor 323. The inverted weight, that is, the inverse of the weight (W) stored in the bit cell 310 belonging to memory bank 0, the target memory bank, may be applied to the drain terminal of the first transistor 321. The source terminal of the first transistor 321 may be connected to the input terminal of the gate logic circuit 430 via the drain terminal of the second transistor 323.

The output of the first transistor 321, controlled through its first gate terminal, is tied to the output of the second transistor 323, controlled through its second gate terminal, and the combined node may be output as a signal (e.g., O₁) corresponding to the result of the bit-wise multiplication.
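The behavior of the 2T operator described above can be sketched as a small logic-level model. This is only an illustrative sketch under stated assumptions: the function name `operator_2t` and the constant `VDD` are hypothetical, voltages are idealized to the bits 0 and 1, and no analog effects (threshold drop, timing) are modeled.

```python
VDD = 1  # idealized logic '1' (supply rail); hypothetical constant for this sketch


def operator_2t(in_bit: int, weight: int) -> int:
    """Logic-level model of the operator output O_i for one bit cell.

    in_bit -- value on the read word line (gates of both transistors)
    weight -- weight W stored in the bit cell
    """
    inverted_weight = 1 - weight
    if in_bit == 1:
        # The NMOS (first transistor) conducts and passes the inverted weight.
        return inverted_weight
    # The PMOS (second transistor) conducts and pulls the output up to Vdd.
    return VDD


# Enumerate all four input combinations.
for in_bit in (0, 1):
    for weight in (0, 1):
        print(f"IN={in_bit} W={weight} -> O_i={operator_2t(in_bit, weight)}")
```

In this model the operator output equals NOT(IN AND W), so the bit-wise product itself appears only after the inversion performed by the downstream NAND gate.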

Referring to FIG. 5B, a truth table 530 shows the operation of the SRAM bit cell circuit 225 when memory bank 0 (Bank₀) is the target memory bank in the in-memory computing (IMC) circuit shown in FIG. 5A.

The column headings of FIG. 5B may correspond to the same points/lines in the circuit of FIG. 5A.

As one example, the input signal IN₀ corresponding to memory bank 0 (Bank₀) may be '1', and the input signals IN₁, IN₂, and IN₃ corresponding respectively to memory bank 1 (Bank₁), memory bank 2 (Bank₂), and memory bank 3 (Bank₃) may be '0'. In addition, if the weight (W) stored in the bit cell 310 of memory bank 0 (Bank₀) is '1', the inverted weight may be '0'.

In this case, when the input signal IN₀ of '1' is applied to the gate terminal of the first transistor 321 (the NMOS transistor of memory bank 0 (Bank₀)), a potential difference arises between the gate terminal and the source terminal of the first transistor 321, so a channel forms and the first transistor 321 may turn 'ON'. When the first transistor 321 is 'ON', the inverted weight of '0' connected to the drain terminal of the first transistor 321 may be output as the output value (O₀) of the bit cell 310 corresponding to memory bank 0 (Bank₀). In addition, when the input signal IN₀ of '1' is applied to the gate terminal of the second transistor 323, the PMOS transistor of memory bank 0 (Bank₀), no potential difference arises between the second gate terminal and the source terminal of the second transistor 323, so no channel forms and the second transistor 323 may turn 'OFF'.

In this case, if the input signals IN₁, IN₂, and IN₃ corresponding respectively to the non-target memory bank 1, memory bank 2, and memory bank 3 are '0', then, in the same manner as described above, the output values (O₁, etc.) of the operators 320 of the bit cells corresponding to memory bank 1, memory bank 2, and memory bank 3 may be '1'. The output of the NAND gate 430 therefore depends on the output O₀. Among the output values of the bit cells corresponding to the memory banks, the output value (O₀) of the operator 320 of the bit cell 310 corresponding to memory bank 0 is '0', so the output value (O) of the NAND gate 430 may be '1'.

Alternatively, the input signal IN₀ corresponding to memory bank 0 may be '0', and the input signals IN₁, IN₂, and IN₃ corresponding respectively to memory bank 1, memory bank 2, and memory bank 3 may be '0'. In addition, if the weight (W) stored in the bit cell 310 of memory bank 0 is '0', the inverted weight may be '1'.

When the input signal IN₀ of '0' is applied to the gate terminal of the first transistor 321 (the NMOS transistor of memory bank 0), no potential difference arises between the gate terminal and the source terminal of the first transistor 321, so no channel forms and the first transistor 321 may turn 'OFF'. In addition, when the input signal IN₀ of '0' is applied to the gate terminal of the second transistor 323 (the PMOS transistor of memory bank 0), a channel forms because of the potential difference between the second gate terminal and the source terminal of the second transistor 323, so the second transistor 323 may turn 'ON'. When the second transistor 323 is 'ON', '1', corresponding to the Vdd voltage applied to the source terminal of the second transistor 323, may be output as the output value (O₀) of the operator 320 of the bit cell 310 corresponding to memory bank 0.

When the input signals IN₁, IN₂, and IN₃ corresponding respectively to memory bank 1, memory bank 2, and memory bank 3 are '0', then, in the same manner as described above, the output values (O₁, etc.) of the bit cells corresponding to memory bank 1, memory bank 2, and memory bank 3 may be '1'. When the output values of the bit cells corresponding to all memory banks are '1', the output value (O) of the NAND gate 430 may be '0', which yields the same result as performing an AND logic operation.

In FIG. 5A, the multiplication between the input signal IN₀, applied to the operator 320 corresponding to memory bank 0, and the weight (W) stored in the bit cell 310 may be performed by a pass transistor logic structure whose inputs are the input signal IN₀ and the inverted weight, that is, the inverse of the weight (W) stored in the bit cell 310. Here, 'pass transistor logic' may be used to reduce the number of transistors needed to implement logic, by letting primary inputs drive gate terminals, source terminals, and drain terminals. In complementary CMOS logic, primary inputs may drive the gate terminals. Here, the primary inputs may correspond, for example, to an input, an inverted input, VDD, and GND.

As described above, FIG. 5A shows an example in which an AND function is implemented by an in-memory computing (IMC) circuit using an NMOS pass transistor. When the gate input of the NMOS pass transistor is high, the left NMOS transistor, that is, the first transistor 321, turns on, and the source input may be copied to the output. In contrast, when the gate input of the NMOS pass transistor is low, the right NMOS pass transistor, that is, the second transistor 323, turns on and may pass '0' to the output.

The truth table 530 shown in FIG. 5B may correspond to the truth table of an AND gate, used to verify the operation described above.
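The AND behavior described for the case where memory bank 0 is the target can be checked with a short enumeration. This is a minimal sketch rather than a reproduction of FIG. 5B: the helper names (`operator_out`, `nand`, `gate_output`, `truth_table_bank0`) are hypothetical, the operator is modeled at logic level as NOT(IN AND W), and the non-target weights are set arbitrarily to 1.

```python
from itertools import product


def operator_out(in_bit: int, weight: int) -> int:
    # Logic-level model of one 2T operator: passes the inverted weight when
    # IN is 1, and outputs 1 (Vdd) when IN is 0.
    return 1 - (in_bit & weight)


def nand(bits) -> int:
    # Multi-input NAND: 0 only when every input is 1.
    return 0 if all(bits) else 1


def gate_output(inputs, weights) -> int:
    # NAND over the operator outputs of all banks sharing one gate.
    return nand([operator_out(i, w) for i, w in zip(inputs, weights)])


def truth_table_bank0():
    """Rows of (IN0, W0, O0, O) with banks 1 to 3 held at input 0."""
    rows = []
    for in0, w0 in product((0, 1), repeat=2):
        inputs = [in0, 0, 0, 0]      # only bank 0 may receive a '1'
        weights = [w0, 1, 1, 1]      # non-target weights chosen arbitrarily
        rows.append((in0, w0, operator_out(in0, w0), gate_output(inputs, weights)))
    return rows


for row in truth_table_bank0():
    print(row)
```

Under these assumptions the last column equals IN0 AND W0 in every row, which matches the AND-gate behavior the description attributes to the circuit.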

In this case, the input signal '1' is applied through the read word line (RWL) of the bit cells corresponding to the memory bank used for the MAC operation, and the operation results of the bit cells belonging to that memory bank are transferred to the adder (e.g., the adder 440 of FIG. 4), so the memory bank may be treated as if it had been selected. In contrast, the input signal '0' is applied to the read word line (RWL) of the bit cells corresponding to a memory bank not used for the MAC operation, and the operation results of the bit cells belonging to that memory bank are not transferred, so the memory bank may be treated as if it had not been selected.

In one embodiment, the bit-wise multiplication is performed directly using the gate logic circuit 430 (e.g., a NAND gate) composed of two transistors, without a separate read word line (RWL) control signal for reading the input signal IN₀, so the number of control lines of the interface may be reduced to four per bit cell 310 (e.g., a write bit line (WBL), a write word line (WWL), an inverted write bit line (WWBL), and a read word line (RWL)).

Accordingly, the total number of transistors constituting the in-memory computing (IMC) circuit is 4 banks x (SRAM bit cell (8T) + operator (2T)) + gate logic circuit (8T) = 4 x 10T + 8T = 48T, and the total number of control lines may be 4 banks x 4 = 16. In FIG. 5, the output value (O₀) of the multiplication of the bit cell 310 corresponding to memory bank 0, which is connected to the input signal IN₀, may be transferred to the NAND gate 430 together with the output values (O₁) of the multiplications of the other bit cells. As in the truth table 530, the NAND gate 430 may perform a NAND logic operation on the output values (O₀ and O₁) of the four bit cells and pass the result (O) to the input of the adder 230, whereby the MAC operation may be performed.
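The transistor and control-line arithmetic above can be reproduced with a few lines of code. This is only a bookkeeping sketch; the function name `imc_group_budget` and its parameter names are hypothetical, and the default values are the figures stated in the description (four banks, an 8T bit cell, a 2T operator, an 8T gate logic circuit, and four control lines per bit cell).

```python
def imc_group_budget(banks: int = 4,
                     bit_cell_t: int = 8,
                     operator_t: int = 2,
                     gate_logic_t: int = 8,
                     lines_per_cell: int = 4):
    """Return (transistors per bit cell unit, total transistors, control lines)
    for one group of bit cell units sharing a single gate logic circuit."""
    unit_t = bit_cell_t + operator_t            # 8T + 2T = 10T
    total_t = banks * unit_t + gate_logic_t     # 4 x 10T + 8T
    total_lines = banks * lines_per_cell        # 4 x 4
    return unit_t, total_t, total_lines


unit_t, total_t, total_lines = imc_group_budget()
print(f"{unit_t}T per unit, {total_t}T in total, {total_lines} control lines")
```

Changing `operator_t` or `banks` shows how the per-group cost scales, although only the default configuration is taken from the text.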

FIGS. 6A and 6B are diagrams for explaining how an in-memory computing (IMC) circuit selects a memory bank according to one embodiment. FIG. 6A shows a case in which, according to one embodiment, memory bank 0 (Bank₀) 610 of the IMC circuit is selected as the target memory bank and memory bank 1 (Bank₁) 630 of the IMC circuit is not selected as the target memory bank. FIG. 6B shows a truth table 650 with the input values and output values of the IMC circuit of FIG. 6A. In FIG. 6A, V_DD may denote the power supply voltage.

As shown in FIG. 6A, when the weight (W) stored in the bit cell of memory bank 0 (610), selected as the target memory bank, is '0' and the value applied through the word line as the input signal (IN₀) of memory bank 0 is '1', the output (O₀) corresponding to memory bank 0 may be '1'. When the output (O₀) of the bit cell corresponding to memory bank 0 (which is one of the bit cell outputs (O₀, O₁) supplied to the NAND gate) is '1', the output value (O) of the NAND gate becomes '0' and therefore does not affect the MAC operation in the adder 230.

That is, when the corresponding bit cell units of the other memory banks (e.g., memory bank 1, memory bank 2, and memory bank 3) have an input of "0", each of their operators outputs "1" to the NAND gate. The output of the NAND gate may therefore be determined by the output of the bit cell unit of memory bank 0 (610). The weights of the other memory banks (e.g., memory bank 1, memory bank 2, and memory bank 3) cannot affect the output of the NAND gate. In other words, because only the bit cell unit of memory bank 0 has an input of "1", it may be the only one of the four bit cell units whose weight W can affect the output of the NAND gate.

When the weight (W) stored in the bit cell corresponding to memory bank 0 (610) is '1' and the value applied through the word line as the input signal (IN₀) of memory bank 0 (610) is '1', the output (O₀) corresponding to memory bank 0 may be '0'. When the output (O₀) of the bit cell corresponding to memory bank 0, which is one of the output values (O₀, O₁) of the bit cells corresponding to the memory banks, is '0', the output value (O) of the NAND gate becomes '1' and may therefore affect the MAC operation in the adder 230. Because the output corresponding to the memory bank that receives the input signal '1' affects the MAC operation in the adder 230, in one embodiment the target memory bank (e.g., memory bank 0) can be made to act as if it had been selected for the MAC operation simply by applying the input signal through the read word line (RWL), without a separate control signal.

Alternatively, for example, as in the diagram 600, when the weight (W) stored in the bit cell corresponding to memory bank 1 is '1' and the value applied through the word line as the input signal (IN₁) of memory bank 1 is '0', the output (O₁) corresponding to memory bank 1 becomes '1' (e.g., a high level), so the MAC operation in the adder 230 may be unaffected by memory bank 1 through the output of the NAND gate.

In summary, within the group of bit cells of each memory bank, each bit cell has its own operator (e.g., a bit multiplier). A "deactivation" or "control" input signal ("0") may be supplied to the bit cell units of a memory bank that is not the target of the operation. This signal may be provided by the targeting/selection circuitry of the memory banks rather than being an actual input data signal. The actual input data signal may be supplied to the bit cell units of the memory bank that is currently the target of the operation. If the data input is "0", the operation result/output is "0", whereas if the data input is "1", the operation result may depend on the value (e.g., a weight bit) stored in the target bit cell. If the weight bit stored in the target bit cell is "1", the operation result may be "1", and if the weight bit stored in the target bit cell is "0", the operation result may be "0".
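The summary above can be illustrated with a logic-level sketch of bank selection followed by accumulation. Several things here are assumptions rather than details from the description: the adder is modeled as a plain sum of the NAND gate outputs, each list index stands for one group of four bit cell units sharing a NAND gate, and the names `operator_out`, `nand`, and `mac_for_target` as well as the sample data are hypothetical.

```python
def operator_out(in_bit: int, weight: int) -> int:
    # Logic-level model of one operator: NOT(IN AND W).
    return 1 - (in_bit & weight)


def nand(bits) -> int:
    # Multi-input NAND: 0 only when every input is 1.
    return 0 if all(bits) else 1


def mac_for_target(weights, data_bits, target_bank: int) -> int:
    """Sum the NAND gate outputs over all groups.

    weights[bank][pos] -- weight bit stored for `bank` in group `pos`
    data_bits[pos]     -- data bit driven onto the target bank in group `pos`
    Non-target banks always receive the deactivating input 0.
    """
    total = 0
    for pos, data_bit in enumerate(data_bits):
        outs = []
        for bank, bank_weights in enumerate(weights):
            in_bit = data_bit if bank == target_bank else 0
            outs.append(operator_out(in_bit, bank_weights[pos]))
        total += nand(outs)  # gate output is 1 only when data AND weight is 1
    return total


# Hypothetical sample data: 4 banks, 4 groups.
weights = [
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
data_bits = [1, 1, 0, 1]
print(mac_for_target(weights, data_bits, target_bank=0))
```

In this model the result depends only on the weights of the selected bank, which is the property the description relies on to select a bank without a dedicated control signal.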

FIG. 7 is a diagram illustrating the operation of an in-memory computing (IMC) circuit, according to an embodiment, in which the operator is composed of three transistors. Referring to FIG. 7, a diagram 700 shows the structure of an IMC circuit that includes an SRAM bit cell circuit and a gate logic circuit 430 (e.g., a NAND gate). The SRAM bit cell circuit includes a bit cell 310 corresponding to each of four memory banks, which are respectively connected to input signals (I₀, I₁, I₂, I₃), and an operator 710 corresponding to each of the bit cells.

The operator 710 may be configured as a three-transistor (3T) circuit including a transmission gate 711 and a third transistor 713. The third transistor 713 may be, for example, a PMOS transistor, but is not necessarily limited thereto.

A second value (e.g., input signal I₀) corresponding to the input signal of the target memory bank (e.g., memory bank 0) may be applied, through the read word line (RWL) of the bit cell corresponding to memory bank 0, to the enable (E) terminal of the transmission gate 711 and to the gate terminal (the 'third gate terminal') of the third transistor 713.

In addition, the inverted weight (W̄) of the weight (W) stored in the bit cell 310 belonging to memory bank 0, the target memory bank, may be applied to the input (In) terminal of the transmission gate 711. The inverted input (Ī) of the bit cell 310 may be connected to the enable-bar (Ē) terminal of the transmission gate 711 and to the source terminal of the third transistor 713.

The output of the transmission gate 711 and the output of the third transistor 713, which is controlled through the third gate terminal, may each be connected to an input of the NAND gate 430 and output as a signal corresponding to the result of the bit-wise product operation.

For example, as shown in table 730, the weight (W) stored in the bit cell 310 of memory bank 0 may be '1', the input signal I₀ corresponding to memory bank 0 may be '1', and the input signals I₁, I₂, and I₃ corresponding to memory bank 1, memory bank 2, and memory bank 3, respectively, may be '0'.

In this case, when the input signal I₀ of '1' is applied to the enable terminal of the transmission gate 711, the transmission gate 711 acts as a closed switch, so the inverted weight (W̄) value '0' connected to the input terminal of the transmission gate 711 may be output at the output (Out) terminal of the transmission gate 711. In addition, as the input signal I₀ = '1' is applied to the gate terminal of the third transistor 713, the inverted input (Ī) = '0' connected to the source terminal of the third transistor 713 may be output as the output value of the third transistor 713.

Because the output value of the transmission gate 711 and the output value of the third transistor 713 are both '0', '0' may be output as the output value (O₀) of the bit cell 310 corresponding to memory bank 0. Because the output value (O₀) of the operator 710 corresponding to memory bank 0, among the outputs of the bit cells corresponding to the respective memory banks, is '0', the output value (O) of the NAND gate 430 may be '1'.
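The worked example for table 730 can be checked with a small switch-level sketch. It is an illustration under assumptions, not the patented circuit: the helper names are hypothetical, a non-conducting device is modelled as `None` (high impedance), and the PMOS third transistor is assumed to conduct only when its gate is '0'. The text describes the third transistor's output as '0' when I₀ = '1'; the sketch instead treats it as non-conducting in that case, which yields the same O₀.

```python
from typing import Optional


def transmission_gate(data_in: int, enable: int) -> Optional[int]:
    """Pass data_in when enabled; otherwise high impedance (None)."""
    return data_in if enable == 1 else None


def pmos_pass(source: int, gate: int) -> Optional[int]:
    """Assumed PMOS behaviour: conducts the source value only when gate is '0'."""
    return source if gate == 0 else None


def operator_3t(weight: int, inp: int) -> int:
    """Switch-level sketch of the 3T operator of FIG. 7 (assumed wiring)."""
    w_bar = 1 - weight                     # inverted weight on the gate input
    i_bar = 1 - inp                        # inverted input on the PMOS source
    driven = [v for v in (transmission_gate(w_bar, inp), pmos_pass(i_bar, inp))
              if v is not None]
    assert len(driven) == 1                # exactly one device drives the node
    return driven[0]


def nand4(outputs) -> int:
    """NAND gate 430 over the four bank outputs."""
    return 0 if all(o == 1 for o in outputs) else 1


# Table 730 case: W = 1 in bank 0, inputs (I0, I1, I2, I3) = (1, 0, 0, 0).
bank_outputs = [operator_3t(1, i) for i in (1, 0, 0, 0)]
gate_output = nand4(bank_outputs)
```

Under these assumptions the operator output equals the complement of the bit product for every weight and input combination, so only the bank receiving input '1' can pull the NAND output to '1'.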

In the IMC circuit structure shown in FIG. 7, when the input signal is '1', the data value stored in the bit cell is passed through the transmission gate 711 operating as a switch, so the circuit may be able to operate at a lower voltage than the IMC circuit structure shown in FIG. 5.

In addition, the total number of transistors constituting a unit bit cell of the IMC circuit shown in FIG. 7 may be 4 banks × (SRAM bit cell (8T) + operator (3T)) + gate logic circuit 430 (8T NAND gate) = 4 × 11T + 8T = 52T, and the total number of control lines may be 4 banks × 5 (e.g., WBL, WWL, WWLB, RWL, and RWLB (read word line inverted)) = 20.
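The count above is simple arithmetic and can be reproduced as follows. The function name and parameter names are hypothetical; the default values are the figures stated in the text.

```python
def unit_cell_budget(banks: int = 4, sram_t: int = 8, operator_t: int = 3,
                     gate_t: int = 8, lines_per_bank: int = 5):
    """Return (transistor count, control-line count) for one unit bit cell."""
    transistors = banks * (sram_t + operator_t) + gate_t
    control_lines = banks * lines_per_bank
    return transistors, control_lines


transistors, control_lines = unit_cell_budget()
```

Changing `operator_t` or `banks` shows how the per-cell cost scales for other operator variants or bank counts, which the text does not enumerate.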

FIG. 8 is a diagram illustrating the operation of an in-memory computing (IMC) circuit, according to another embodiment, in which the operator is composed of three transistors. Referring to FIG. 8, a diagram 800 shows the structure of an IMC circuit that includes an SRAM bit cell circuit and a gate logic circuit 430. The SRAM bit cell circuit includes a bit cell 310 corresponding to each of four memory banks, which are respectively connected to input signals (I₀, I₁, I₂, I₃), and an operator 810 corresponding to each of the bit cells 310.

The operator 810 may be configured as a three-transistor (3T) circuit including a fourth transistor 813 and a transmission gate 811 in which an NMOS transistor and a PMOS transistor are connected in parallel. The transmission gate 811 may be switched on or off by the input I applied to the gate of each of its transistors. The fourth transistor 813 may be, for example, a PMOS transistor, but is not necessarily limited thereto.

A second value (e.g., input signal I₀) corresponding to the input signal of the target memory bank (e.g., memory bank 0) may be applied, through the read word line (RWL) of the bit cell corresponding to memory bank 0, to the enable (E) terminal of the transmission gate 811 and to the gate terminal (the 'fourth gate terminal') of the fourth transistor 813.

In addition, the inverted weight (W̄) of the weight (W) stored in the bit cell 310 belonging to memory bank 0, the target memory bank, may be applied to the input (In) terminal of the transmission gate 811. The inverted input (Ī) of the bit cell 310 may be connected to the enable-bar (Ē) terminal of the transmission gate 811.

The source terminal of the fourth transistor 813 may be connected to Vdd, and the drain terminal of the fourth transistor 813 may be connected to the inverted weight (W̄) of the weight (W) stored in the bit cell 310.

The output of the transmission gate 811 and the output of the fourth transistor 813, which is controlled through the fourth gate terminal, may each be connected to an input of the NAND gate 430 and output as a signal corresponding to the result of the bit-wise product operation.

For example, as shown in table 830, the weight (W) stored in the bit cell 310 of memory bank 0 may be '1', the input signal I₀ corresponding to memory bank 0 may be '1', and the input signals I₁, I₂, and I₃ corresponding to memory bank 1, memory bank 2, and memory bank 3, respectively, may be '0'.

In this case, when the input signal I₀ of '1' is applied to the enable (E) terminal of the transmission gate 811, the transmission gate 811 acts as a closed switch, so the inverted weight (W̄) value '0' connected to the input terminal of the transmission gate 811 may be output at the output (Out) terminal of the transmission gate 811. In addition, when the input signal I₀ = '1' is applied to the gate terminal (the 'fourth gate terminal') of the fourth transistor 813, no potential difference arises between the gate terminal and the source terminal of the fourth transistor 813, so no channel is formed and the fourth transistor 813 may be turned off. Accordingly, the output value of the fourth transistor 813 may be '0'.

Because the output value of the transmission gate 811 and the output value of the fourth transistor 813 are both '0', '0' may be output as the output value (O₀) of the operator 810 of the bit cell 310 corresponding to memory bank 0. Because the output value (O₀) of the operator 810 of the bit cell 310 corresponding to memory bank 0, among the outputs of the bit cells corresponding to the respective memory banks, is '0', the output value (O) of the NAND gate 430 may be '1'.

FIG. 9 is a block diagram of a neural network device including an in-memory computing (IMC) circuit according to an embodiment. Referring to FIG. 9, a neural network device 900 according to an embodiment includes an array circuit 910 and a controller 930.

The array circuit 910 includes in-memory computing (IMC) circuits 915. In each of the IMC circuits 915, operators are provided in correspondence with the respective bit cells, and each operator outputs a signal corresponding to the result of an operation between a second value and a first value stored in the corresponding bit cell of a given memory bank among a plurality of memory banks. Each of the IMC circuits 915 may include an SRAM bit cell circuit including the operators, and a gate logic circuit. Each of the IMC circuits 915 may correspond to the IMC circuit described above with reference to FIGS. 2 to 8.

The SRAM bit cell circuit includes bit cells corresponding to a plurality of memory banks, and the bit cells may be connected to a word line of the SRAM for each memory bank.

The operators may output signals corresponding to the operation results for the respective bit cells. The operators may include a plurality of transistors that output a signal corresponding to the result of a bit-wise product operation between the first value stored in each bit cell of a given memory bank among the plurality of memory banks and the second value applied through the word line as the input signal of that memory bank. Each of the operators may be configured as a two-transistor (2T) circuit or as a three-transistor (3T) circuit.

For example, each of the operators may be configured as a two-transistor (2T) circuit including a first transistor and a second transistor. In this case, the second value corresponding to the input signal of the memory bank may be applied to a first gate terminal of the first transistor and a second gate terminal of the second transistor. In addition, the output of the first transistor, controlled through the first gate terminal, may be connected to the output of the second transistor, controlled through the second gate terminal, and thereby output as a signal corresponding to the result of the bit-wise product operation.

Alternatively, each of the operators may be configured as a three-transistor (3T) circuit including a transmission gate and a third transistor. In this case, the second value corresponding to the input signal of the memory bank may be applied to the enable terminal of the transmission gate and a third gate terminal of the third transistor. The output of the transmission gate may be connected to the output of the third transistor, controlled through the third gate terminal, and output as a signal corresponding to the result of the bit-wise product operation.

The gate logic circuit (or logic gates) may transfer the operation results corresponding to the respective bit cells belonging to the target memory bank to an adder for the MAC operation. Each of the IMC circuits 915 may correspond to the IMC circuit described above with reference to FIGS. 3 to 8.

The controller 930 may input, according to a clock signal, second values corresponding to the input signal of the neural network device 900 to each of the IMC circuits 915, and may control each of the IMC circuits 915.

The controller 930 may include, for example, at least one of an input feature map (IFM) buffer 931 that stores an input feature map including the second values, a control circuit 933 that controls whether the second values are applied to each of the IMC circuits 915, and a read/write (RW) circuit 935 that reads or writes the first values.

By controlling whether the second value is applied to the plurality of transistors included in the operators, the control circuit 933 may cause the gate logic circuit to transfer the operation results corresponding to the respective bit cells belonging to the given memory bank to the adder.
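One way to picture this gating is as a generator that reads second values from the IFM buffer, one per clock, and applies each only to the target bank. This is a minimal sketch under assumptions: the function name, the list-based buffer, and the choice of holding non-target banks at '0' are illustrative and not taken from the patent figures.

```python
from typing import Iterator, List


def control_circuit(ifm_buffer: List[int], target_bank: int,
                    num_banks: int) -> Iterator[List[int]]:
    """Yield, per clock, the input bit seen by each bank's read word line."""
    for second_value in ifm_buffer:
        # Only the target bank receives the real input; the others are held at 0.
        yield [second_value if bank == target_bank else 0
               for bank in range(num_banks)]


# Three clock cycles of input bits, routed to bank 2 of 4.
applied = list(control_circuit([1, 0, 1], target_bank=2, num_banks=4))
```

Because a bank held at '0' always produces an operator output of '1', this input gating alone determines which bank's product reaches the adder.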

Although the IMC device has been described above with reference to neural network data such as weights and input data/maps, the IMC device is not limited to any particular type of data. In other words, the circuits and devices may be novel and beneficial regardless of the type of data they are used to process. Neural network data processing is only one of many potential applications.

FIG. 10 is a block diagram of an electronic system including a neural network device according to an embodiment. Referring to FIG. 10, an electronic system 1000 according to an embodiment may analyze input data in real time based on a neural network (e.g., the neural network 110 of FIG. 1) to extract valid information, and may make situational judgments or control components of the electronic device on which the electronic system 1000 is mounted based on the extracted information. For example, the electronic system 1000 may be applied to robotic devices such as drones and advanced driver assistance systems (ADAS), smart TVs, smartphones, medical devices, mobile devices, image display devices, measurement devices, IoT devices, and the like, and may also be mounted on at least one of various other types of electronic devices.

The electronic system 1000 may include a processor 1010, a random access memory (RAM) 1020, a neural network device 1030, a memory 1040, a sensor module 1050, and a transmission/reception module 1060. The electronic system 1000 may further include an input/output module, a security module, a power control device, and the like. Some of the hardware components of the electronic system 1000 may be mounted on at least one semiconductor chip.

The processor 1010 controls the overall operation of the electronic system 1000. The processor 1010 may include a single processor core or a plurality of processor cores (multi-core). The processor 1010 may process or execute programs and/or data stored in the memory 1040. In some embodiments, the processor 1010 may control the functions of the neural network device 1030 by executing programs stored in the memory 1040. The processor 1010 may be implemented as a central processing unit (CPU), a graphics processing unit (GPU), an application processor (AP), or the like.

The RAM 1020 may temporarily store programs, data, or instructions. For example, programs and/or data stored in the memory 1040 may be temporarily stored in the RAM 1020 under the control of the processor 1010 or according to boot code. The RAM 1020 may be implemented as a memory such as a dynamic RAM (DRAM) or a static RAM (SRAM).

The neural network device 1030 may perform neural network operations based on received input data and may generate various information signals based on the results. The neural network may include, for example, a convolutional neural network (CNN), a recurrent neural network (RNN), fuzzy neural networks (FNN), deep belief networks, restricted Boltzmann machines, and the like, but is not necessarily limited thereto. The neural network device 1030 may be, for example, a hardware accelerator dedicated to neural networks and/or a device including such an accelerator, or may correspond to the neural network device 900 described above with reference to FIG. 9.

The neural network device 1030 may control the SRAM bit cell circuits of the IMC circuit to share and/or process the same input data, and may select at least some of the operation results output from the SRAM bit cell circuits.

Here, the 'information signal' may include one of various types of recognition signals, such as a voice recognition signal, an object recognition signal, an image recognition signal, and a biometric information recognition signal. For example, the neural network device 1030 may receive frame data included in a video stream as input data and may generate, from the frame data, a recognition signal for an object included in the image represented by the frame data. The neural network device 1030 may receive various types of input data depending on the type or function of the electronic device on which the electronic system 1000 is mounted, and may generate a recognition signal according to the input data.

The memory 1040 is a storage location for data and may store an operating system (OS), various programs, and various data. In an embodiment, the memory 1040 may store intermediate results generated while the neural network device 1030 performs operations.

The memory 1040 may include at least one of a volatile memory and a non-volatile memory. The non-volatile memory may include, for example, a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, and the like, but is not necessarily limited thereto. The volatile memory may include, for example, a dynamic RAM (DRAM), a static RAM (SRAM), an SDRAM, a phase-change RAM (PRAM), a magnetoresistive RAM (MRAM), a resistive RAM (RRAM), and/or a ferroelectric RAM (FRAM), but is not necessarily limited thereto. Depending on the embodiment, the memory 1040 may include at least one of a hard disk drive (HDD), a solid-state drive (SSD), a CompactFlash (CF) card, a Secure Digital (SD) card, a Micro-SD card, a Mini-SD card, an xD (extreme Digital) picture card, and a Memory Stick.

The sensor module 1050 may collect information about the surroundings of the electronic device on which the electronic system 1000 is mounted. The sensor module 1050 may sense or receive signals (e.g., image signals, audio signals, magnetic signals, biometric signals, touch signals, and the like) from outside the electronic system 1000 and may convert the sensed or received signals into data. The sensor module 1050 may include at least one of various types of sensing devices, such as a microphone, an imaging device, an image sensor, a LIDAR (light detection and ranging) sensor, an ultrasonic sensor, an infrared sensor, a biosensor, and a touch sensor.

The sensor module 1050 may provide the converted data to the neural network device 1030 as input data. For example, the sensor module 1050 may include an image sensor, may generate a video stream by capturing the external environment of the electronic system 1000, and may provide successive data frames of the video stream to the neural network device 1030 in order as input data. However, the sensor module 1050 is not limited thereto and may provide various types of data to the neural network device 1030.

The transmission/reception module 1060 may include various wired or wireless interfaces capable of communicating with external devices. For example, the transmission/reception module 1060 may include a communication interface capable of connecting to a wired local area network (LAN), a wireless local area network (WLAN) such as Wi-Fi (Wireless Fidelity), a wireless personal area network (WPAN) such as Bluetooth, wireless USB (Wireless Universal Serial Bus), Zigbee, NFC (Near Field Communication), RFID (Radio-Frequency Identification), PLC (Power Line Communication), or a mobile cellular network such as 3G (3rd Generation), 4G (4th Generation), or LTE (Long Term Evolution).

FIG. 11 is a flowchart illustrating a method of operating an in-memory computing (IMC) circuit according to an embodiment. In the following embodiment, the operations may be performed sequentially, but are not necessarily performed sequentially. For example, the order of the operations may be changed, and at least two operations may be performed in parallel.

Referring to FIG. 11, an IMC circuit according to an embodiment may perform a MAC operation by transferring the operation results corresponding to the respective bit cells to an adder through steps 1110 to 1140. The IMC circuit may include an SRAM bit cell circuit and a gate logic circuit. The SRAM bit cell circuit may include, for example, bit cells corresponding to a plurality of memory banks and operators that output signals corresponding to the operation results for the respective bit cells. Here, the bit cells may be connected to a word line of the SRAM for each memory bank. The IMC circuit may correspond, for example, to the IMC circuit described above with reference to FIGS. 2 to 9, but is not necessarily limited thereto.

In step 1110, the IMC circuit stores a first value in each of the bit cells corresponding to the plurality of memory banks of the SRAM bit cell circuit. The IMC circuit may store the first value in each of the bit cells using a read/write (RW) circuit.

In step 1120, the IMC circuit applies a second value, through a word line of the static random access memory (SRAM), as the input signal of the target memory bank for the MAC operation among the plurality of memory banks. Here, the second value may be one that the IMC circuit has read, for example through an input driver, from an input feature map stored in an input feature map (IFM) buffer, but is not necessarily limited thereto.

In step 1130, the IMC circuit outputs, by means of the operators, a signal that corresponds to each of the bit cells and represents the result of a multiplication between the first value and the second value. The operators may include a plurality of transistors that output the signal corresponding to the multiplication result. Through the operators, the IMC circuit may output a signal corresponding to the result of a bit-wise multiplication between the first value stored in each of the bit cells of a given memory bank among the plurality of memory banks and the second value applied through the word line as the input signal of that memory bank.

In step 1140, the IMC circuit transfers the operation result corresponding to each of the bit cells belonging to the target memory bank to the adder through the gate logic circuit, so that the adder performs a sum operation on the operation results. The adder may correspond, for example, to the adder 230 of FIG. 2, the adder 230 of FIG. 3, and/or the adder 230 of FIG. 4. The adder may then perform the sum operation on the operation results received in step 1140 and store the result of the sum operation in an accumulation operator. The accumulation operator may correspond, for example, to the accumulation operator 240 of FIGS. 2A to 2D or FIG. 3.
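
The data flow of steps 1110 to 1140 can be illustrated with a small behavioral sketch in Python. It is only a value-level model under stated assumptions, not the circuit itself: the function name mac_pass, the example data, the use of 1-bit values, and the choice of one input bit per bit cell of the target bank are all hypothetical and are not taken from the embodiments.

```python
from typing import List


def mac_pass(banks: List[List[int]], target: int, inputs: List[int], acc: int = 0) -> int:
    """Model one pass through steps 1110-1140 for 1-bit values (assumed model).

    banks  : first values (weights) already stored in the bit cells (step 1110)
    target : index of the memory bank selected for the MAC operation
    inputs : second values applied to the target bank (step 1120)
    acc    : running value held by the accumulation operator
    """
    weights = banks[target]
    if len(inputs) != len(weights):
        raise ValueError("one input bit is expected per bit cell of the target bank")
    # Step 1130: each operator outputs the bit-wise product of weight and input.
    products = [w & x for w, x in zip(weights, inputs)]
    # Step 1140: the adder sums the products; the sum is then accumulated.
    return acc + sum(products)


# Hypothetical example data: two banks of four 1-bit weights each.
banks = [[1, 0, 1, 1], [0, 1, 1, 0]]
acc = mac_pass(banks, target=0, inputs=[1, 1, 0, 1])           # products 1, 0, 0, 1
acc = mac_pass(banks, target=1, inputs=[1, 1, 1, 1], acc=acc)  # products 0, 1, 1, 0
print(acc)  # prints 4
```

If a bank is driven by a single shared input bit, the same sketch applies with that bit repeated for every position in inputs.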

The neural network, neural network device, electronic system, IMC macro, IMC circuit, IMC device, memory, storage device, and components described with reference to FIGS. 1 to 11 may be implemented by hardware components or may represent hardware components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described as being used, but those of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on a computer-readable recording medium.

The method according to the embodiments may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination, and the program instructions recorded on the medium may be specially designed and constructed for the embodiments or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine language code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, those of ordinary skill in the art may apply various technical modifications and variations based on them. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents. Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

200: IMC macro
210: write word line (WWL) driver
220: in-memory computing (IMC) circuit
230: adder
240: accumulation operator
250: input driver (or read word line (RWL) driver)
260: memory controller
270: write bit line (WBL) driver

Claims

An in-memory computing circuit comprising:
a plurality of memory banks; and
a logic gate that receives a logical operation result of each of the memory banks,
wherein each of the memory banks comprises:
a bit cell that stores a weight; and
an operator that receives an input value,
wherein the operator comprises a two-transistor (2T) circuit including a first transistor and a second transistor,
wherein the input value is applied to a first gate terminal of the first transistor and a second gate terminal of the second transistor, and
wherein an output of the first transistor, controlled through the first gate terminal, is connected to an output of the second transistor, controlled through the second gate terminal, so that the operator, which is connected to the bit cell and receives the input value, outputs a result of a logical operation between the input value and the weight.

The in-memory computing circuit of claim 1,
wherein the logical operation result of each of the memory banks is a NAND operation value of the input value and the weight.

The in-memory computing circuit of claim 1,
wherein the logic gate is a NAND gate.

The in-memory computing circuit of claim 1,
wherein the logic gate outputs a multiplication result between the input value and the weight of one memory bank selected from among the memory banks.

The in-memory computing circuit of claim 4,
wherein each of the unselected memory banks receives an input value of 0.

The in-memory computing circuit of claim 1,
further comprising an adder connected to the logic gate.

The in-memory computing circuit of claim 1,
wherein the operator comprises a plurality of transistors that output a signal corresponding to a result of a bit-wise multiplication.

(Deleted)

The in-memory computing circuit of claim 1,
wherein a value based on the weight stored in the bit cell is applied to a drain terminal of the first transistor, and
wherein a source terminal of the first transistor is connected to an input terminal of the logic gate through a drain terminal of the second transistor.

The in-memory computing circuit of claim 1,
wherein the first transistor includes an NMOS transistor, and
wherein the second transistor includes a PMOS transistor.

An in-memory computing circuit comprising:
a plurality of memory banks; and
a logic gate that receives a logical operation result of each of the memory banks,
wherein each of the memory banks comprises:
a bit cell that stores a weight; and
an operator that receives an input value,
wherein the operator comprises a three-transistor (3T) circuit including a transmission gate and a third transistor,
wherein the input value is applied to an enable terminal of the transmission gate and a third gate terminal of the third transistor, and
wherein an output of the transmission gate and an output of the third transistor, controlled through the third gate terminal, are each connected to an input of the logic gate, so that the operator, which is connected to the bit cell and receives the input value, outputs a result of a logical operation between the input value and the weight.

The in-memory computing circuit of claim 6,
wherein the logic gate transfers the logical operation result corresponding to the bit cell to the adder depending on whether the input value is applied to the operator.

The in-memory computing circuit of claim 1,
wherein the in-memory computing circuit is integrated into at least one device selected from the group consisting of a mobile device, a mobile computing device, a mobile phone, a smartphone, a personal digital assistant, a fixed location terminal, a tablet computer, a computer, a wearable device, a laptop computer, a server, a music player, a video player, an entertainment unit, a navigation device, a communication device, a GPS device, a television, a tuner, an automobile, a vehicle component, an avionics system, a drone, a multicopter, and a medical device.

A neural network device including in-memory computing circuits, the neural network device comprising:
an array circuit including the in-memory computing circuits; and
a controller that inputs second values corresponding to an input signal of the neural network device to each of the in-memory computing circuits according to a clock signal and controls the in-memory computing circuits,
wherein each of the in-memory computing circuits comprises:
a plurality of memory banks, each of the memory banks including a bit cell that stores a weight and an operator that receives an input value; and
a logic gate that receives a logical operation result of each of the memory banks,
wherein the operator comprises a two-transistor (2T) circuit including a first transistor and a second transistor,
wherein the input value received by the operator is applied to a first gate terminal of the first transistor and a second gate terminal of the second transistor, and
wherein an output of the first transistor, controlled through the first gate terminal, is connected to an output of the second transistor, controlled through the second gate terminal, so that the operator, which is connected to the bit cell and receives the input value, outputs a result of a logical operation between the input value and the weight.

The neural network device of claim 14,
wherein the logical operation result of each of the memory banks is a NAND operation value of the input value and the weight.

The neural network device of claim 14,
wherein the logic gate is a NAND gate.

The neural network device of claim 14,
wherein the controller comprises at least one of:
an input feature map (IFM) buffer that stores an input feature map including the input values;
a control circuit that controls whether the input value is applied to the plurality of in-memory computing circuits; and
a read/write (RW) circuit that reads or writes the weights.

An in-memory computing device comprising:
memory banks each including a respective bit cell unit;
a logic gate that receives outputs of the operators of the bit cell units; and
an adder that receives an output of the logic gate to perform at least a portion of a MAC operation,
wherein each bit cell unit comprises a bit cell and an operator, no two of the bit cells sharing the same operator,
wherein the operator comprises a two-transistor (2T) circuit including a first transistor and a second transistor,
wherein the input value received by the operator is applied to a first gate terminal of the first transistor and a second gate terminal of the second transistor, and
wherein an output of the first transistor, controlled through the first gate terminal, is connected to an output of the second transistor, controlled through the second gate terminal, so that the operator, which is connected to the bit cell and receives the input value, outputs a result of a logical operation between the input value and the weight.

The in-memory computing device of claim 18,
wherein an output of each bit cell unit is connected to the logic gate,
wherein each of the bit cells stores a respective stored value,
wherein the bit cell units are connected to respective input lines that provide respective input values to the bit cell units, and
wherein the input values provided to the bit cell units select one of the bit cell units as the target of an operation to be performed on its stored value by the corresponding operator.

The in-memory computing device of claim 19,
wherein the stored values of the bit cell units that are not the target of the operation do not affect the output of the logic gate.
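
As an illustration of how the dependent claims on NAND operation fit together, the following Python sketch models the bank selection at the logic level. It assumes 1-bit inputs and weights, a per-bank operator that outputs NAND(input, weight), and a NAND logic gate over all bank outputs; the function names (nand, bank_operator, logic_gate_output, select_bank) are hypothetical. Under these assumptions an unselected bank, which receives an input of 0, always outputs 1, so the NAND gate outputs the AND (1-bit product) of the selected bank's input and weight regardless of the other stored weights.

```python
from itertools import product
from typing import List


def nand(*bits: int) -> int:
    """N-input NAND on 0/1 values."""
    return 0 if all(bits) else 1


def bank_operator(input_bit: int, weight_bit: int) -> int:
    """Assumed per-bank operator: NAND of the input value and the stored weight."""
    return nand(input_bit, weight_bit)


def select_bank(num_banks: int, target: int, input_bit: int) -> List[int]:
    """Drive the target bank with the input bit and every other bank with 0."""
    return [input_bit if b == target else 0 for b in range(num_banks)]


def logic_gate_output(inputs: List[int], weights: List[int]) -> int:
    """NAND gate over the operator outputs of all banks."""
    return nand(*(bank_operator(x, w) for x, w in zip(inputs, weights)))


# Exhaustive check for three banks: the gate output always equals
# input AND weight of the selected bank.
for weights in product([0, 1], repeat=3):
    for target in range(3):
        for x in (0, 1):
            out = logic_gate_output(select_bank(3, target, x), list(weights))
            assert out == (x & weights[target])
```

The sketch deliberately stays at the Boolean level and does not model the transistor-level 2T or 3T operator circuits.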