KR20240011596A

KR20240011596A - Memory device for in-memory computing and method thereof

Info

Publication number: KR20240011596A
Application number: KR1020220143480A
Authority: KR
Inventors: 이재혁; 윤석주; 창동진; 명성민; 윤대건
Original assignee: 삼성전자주식회사
Priority date: 2022-07-19
Filing date: 2022-11-01
Publication date: 2024-01-26

Abstract

A memory device according to one embodiment can perform a multiplication operation using a multiplier cell having a memory cell and a switching element. The memory cell has a pair of inverters connected in opposite directions, a first transistor connected to one end of the pair of inverters, and a second transistor connected to the other end of the pair of inverters, wherein a weight can be set. The switching element is connected to an output end of the memory cell and can output a signal corresponding to a multiplication result between an input value and the weight by performing switching in response to the input value.

Description

Memory device for in-memory computing and method of operating the same {MEMORY DEVICE FOR IN MEMORY COMPUTIN AND METHOD THEREOF}

The disclosure below relates to a memory device for in-memory computing.

Vector-matrix multiplication, also known as the multiply-and-accumulate (MAC) operation, determines application performance in a variety of fields. For example, MAC operations may be performed in machine learning and authentication operations of a neural network that includes multiple layers. An input signal can be regarded as forming an input vector and may be data for an image, a byte stream, or another data set. The input signal is multiplied by weights, an output vector is obtained from the accumulated results of the MAC operation, and this output vector can be provided as the input vector for the next layer. Because this MAC operation is repeated for multiple layers, neural network processing performance is mainly determined by the performance of the MAC operation. MAC operations can be implemented through in-memory computing.

A memory device according to an embodiment may include a multiplier cell having a memory cell and a switching element. The memory cell, in which a weight is set, has a pair of inverters connected in opposite directions, a first transistor connected to one end of the pair of inverters, and a second transistor connected to the other end of the pair of inverters. The switching element is connected to an output terminal of the memory cell and, by switching in response to an input value, outputs a signal corresponding to the result of multiplying the input value by the weight.

The switching element may be connected between a supply voltage and the output terminal of the memory cell, may be turned off when it receives a logic value of 1 as the input value, and may be turned on when it receives a logic value of 0 as the input value.

The switching element may be configured as a pull-up transistor capable of receiving the input value at its gate terminal.

The first transistor and the second transistor may each be an NMOS transistor, and the pull-up transistor may be a PMOS transistor.

The memory device may select and perform one of a first operation and a second operation. The first operation outputs a multiplication result each time according to the supplied input, and includes driving the voltage at the output terminal of the pull-up transistor to the supply voltage in response to a voltage lower than the supply voltage being applied through the word line in some of a series of multiplication operations. The second operation, for every multiplication operation, drives the voltage at the output terminal of the pull-up transistor to the supply voltage in a pre-charge phase and performs the multiplication in an evaluation phase.

The memory device may select one of the first operation and the second operation based on at least one of an operating frequency or leakage of the memory device.

The memory device may further include an adder that is connected to an output terminal of the one or more multiplier cells and adds the inverted values of the signals output from the one or more multiplier cells.

The memory device may further include a global bit line and a switch for accessing a memory cell of the one or more multiplier cells to perform at least one of a read operation or a write operation on the weight of the memory cell.

At least one of the one or more multiplier cells may include a plurality of memory cells connected to the same pull-up transistor.

The memory device may further include an input-word line driver that selects, from among the plurality of memory cells, the memory cell to be used for a target multiplication operation.

The input-word line driver may include a decoding circuit that decodes, from an input signal, an input value to be provided to the one or more multiplier cells and a signal designating which of the plurality of memory cells included in a multiplier cell is to be used for a target multiplication operation.

The memory device may activate the word line connected to the memory cell that holds the weight corresponding to a target operation, among the plurality of memory cells included in one multiplier cell, and deactivate the word lines connected to the remaining memory cells.

For a first operation among a plurality of operations, the memory device may select a first memory cell among the plurality of memory cells and output a signal corresponding to the multiplication result through the same pull-up transistor. For a second operation among the plurality of operations, the memory device may select a second memory cell among the plurality of memory cells and output a signal corresponding to the multiplication result through the same pull-up transistor.

The memory device may include a plurality of multiplier cells including the one or more multiplier cells, may perform a multiplication operation in each of the plurality of multiplier cells in parallel with the other multiplier cells, and may sum, in the same adder, the outputs of the multiplier cells connected to the same column line.

The one or more multiplier cells may be connected to a pair of local bit lines, a first memory cell among the plurality of memory cells included in the one or more multiplier cells may be connected to a first local bit line, and a second memory cell among the plurality of memory cells may be connected to a second local bit line.

The first memory cell connected to the first local bit line may hold a value corresponding to the weight, and the second memory cell connected to the second local bit line may hold the inverted value of the weight.

The memory device may further include an accumulator that stores the output of an adder, which sums the multiplication results of the one or more multiplier cells, and accumulates the summed results.

The memory device may further include an output register that stores the final multiplication result output from the accumulator.

When the memory device receives an input signal corresponding to a single bit, or to the last bit of a multi-bit input, it may store the accumulator result for that input signal in the output register.

The memory device may further include a memory controller that controls the one or more multiplier cells, the input-word line driver, the read-write circuit, the adder, the accumulator, and the output register.

The memory device may perform a pre-charge operation on the output terminal of the pull-up transistor in response to at least one of a predetermined period having elapsed or a multiplication operation being performed using a different memory cell within a multiplier cell.

A method of operating a memory device according to an embodiment may include: receiving, by a memory cell having two inverters connected in opposite directions and two transistors connected to both ends of the two inverters, an input value through a word line; receiving, by a pull-up transistor connected to an output terminal of the memory cell, the input value at its gate terminal; and outputting, at an output terminal of the pull-up transistor, a signal corresponding to the result of multiplying the input value by the weight stored in the memory cell.

A memory device according to an embodiment may include a pull-up transistor having a gate and connected to an output line, and a memory cell including a pair of inverters connected in opposite directions and a cell transistor having a gate and connected to one end of the pair of inverters and to the output line. An input having the same logic value is applied to the gate of the pull-up transistor and to the gate of the cell transistor, so that a logic value corresponding to the result of a binary multiplication between the input and a binary weight set in the memory cell may be output to the output line.

The logic value corresponding to the binary multiplication result may be a NAND value.

The pull-up transistor may be a PMOS transistor, and the cell transistor may be an NMOS transistor.

The multiplication result may be output every clock cycle.

The multiplication result may be output every two clock cycles.

The cell transistor may be a first cell transistor, the memory cell may further include a second cell transistor having a gate and connected to the other end of the pair of inverters, and the input having the same logic value may be applied to the gate of the second cell transistor.

The output line may be a first output line, and the memory device may further include a second output line.

The cell transistor may be a first cell transistor, and the memory cell may further include a second cell transistor having a gate and connected to the other end of the pair of inverters and to the second output line.

The pull-up transistor may be a first pull-up transistor, and the memory device may further include a second pull-up transistor connected to the second output line.

The memory device may include a plurality of the memory cells connected to the first output line and the second output line.

The memory device may include a plurality of the memory cells connected to the output line.

FIG. 1 shows an example implementation of an in-memory computing system for a multiply-and-accumulate (MAC) operation of a neural network, according to an embodiment.
FIG. 2 shows an example structure of a memory device in an in-memory computing system, according to an embodiment.
FIGS. 3A to 3F show example structures of a multiplier cell in a memory device, according to an embodiment.
FIG. 4 shows examples of the operation of a multiplier cell, according to an embodiment.
FIG. 5 shows a memory device in which multiplier cells are arranged in an array structure, according to an embodiment.
FIGS. 6A and 6B show an example structure in which a plurality of memory cells share a pull-up transistor within a multiplier cell, according to an embodiment.
FIG. 7 shows a memory device in which the multiplier cells shown in FIG. 6A are arranged in an array structure.
FIG. 8 shows an example in which a multiplier cell outputs a multiplication result through a pair of local bit lines, according to an embodiment.
FIG. 9 shows a memory device in which the multiplier cells shown in FIG. 8 are arranged in an array structure.
FIG. 10 is a flowchart showing a method of operating a multiplier cell, according to an embodiment.
FIG. 11 is a flowchart showing a method of operating a memory device, according to an embodiment.
FIG. 12 shows an example implementation of a multiplier cell, according to an embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be changed and implemented in various forms. Accordingly, the actual implementation is not limited to the specific embodiments disclosed, and the scope of this specification includes changes, equivalents, or substitutes included in the technical idea described in the embodiments.

Terms such as "first" or "second" may be used to describe various components, but these terms should be interpreted only for the purpose of distinguishing one component from another. For example, a first component may be named a second component, and similarly, the second component may be named a first component.

When a component is referred to as being "connected" to another component, it should be understood that it may be directly connected or coupled to the other component, but that other components may also exist in between.

Singular expressions include plural expressions unless the context clearly dictates otherwise. In this specification, terms such as "comprise" or "have" are intended to designate the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood as not excluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related technology and, unless clearly defined in this specification, should not be interpreted in an idealized or overly formal sense.

As used herein, each of the phrases "A or B", "at least one of A and B", "at least one of A or B", "A, B or C", "at least one of A, B and C", and "at least one of A, B, or C" may include any one of the items listed together in the corresponding phrase, or any possible combination thereof.

Hereinafter, embodiments will be described in detail with reference to the attached drawings. In the description with reference to the accompanying drawings, identical components are assigned the same reference numerals regardless of the drawing in which they appear, and overlapping descriptions thereof are omitted.

FIG. 1 shows an example implementation of an in-memory computing system for a multiply-and-accumulate (MAC) operation of a neural network, according to an embodiment.

In the von Neumann architecture, frequent data movement between the operation unit and the memory unit limits performance and power efficiency. In-memory computing (IMC) is a computer architecture that performs operations directly inside the memory where the data is stored, so data movement between the processor 120 and the memory device 110 can be reduced and power efficiency can be increased. The processor 120 of the in-memory computing system 100 according to an embodiment may input the data to be operated on into the memory device 110, and the memory device 110 may perform the operation on its own. The processor 120 may then read the result of the operation from the memory device 110. Data transfer during the operation can therefore be minimized.

For example, the in-memory computing system 100 may perform multiply-and-accumulate (MAC) operations, which are among the operations most frequently used in artificial intelligence (AI) algorithms. As shown in FIG. 1, a layer operation 190 in a neural network may include a MAC operation that sums the results of multiplying each of the input values of the input nodes by a weight. The MAC operation can be expressed, by way of example, as Equation 1 below.
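Using the symbols associated with Equation 1 (output O_t of the t-th node, m-th input I_m, and weight W_t,m, with m running from 0 to M-1 and t from 0 to T-1), the weighted sum can be written as follows. This form is reconstructed from those symbol definitions and is offered as a sketch of the intended expression:

```latex
O_t = \sum_{m=0}^{M-1} I_m \cdot W_{t,m}, \qquad t = 0, 1, \dots, T-1
```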

In Equation 1 above, O_t denotes the output to the t-th node, I_m denotes the m-th input, and W_t,m denotes the weight applied to the m-th input fed to the t-th node. O_t is the output, or node value, of a node and can be calculated as the weighted sum of the inputs I_m and the weights W_t,m. Here, m is an integer from 0 to M-1, t is an integer from 0 to T-1, and M and T are integers. M may be the number of nodes of the previous layer connected to one node of the current layer that is the target of the operation, and T may be the number of nodes of the current layer. The memory device 110 of the in-memory computing system 100 according to an embodiment may perform the MAC operation described above. The memory device 110 may also be referred to as a resistive memory device, a memory array, or an in-memory computing device.

IMC can be divided into analog IMC and digital IMC. Analog IMC performs MAC operations in an analog domain such as the current, charge, or time domain. Digital IMC, by contrast, may perform MAC operations using logic circuits. Digital IMC can be readily implemented in advanced process nodes and can exhibit excellent performance. The memory device 110 according to an embodiment may have a static random access memory (SRAM) including a plurality of transistors (e.g., six transistors). An SRAM consisting of six transistors may also be referred to as a 6T SRAM. Because SRAM stores data as logic values of 0 or 1, no domain conversion is required. By way of example, the memory device 110 may include a multiplier cell in which a pull-up transistor and a memory cell (e.g., an SRAM cell) are combined. Because the multiplier cell includes a plurality of memory cells connected to one pull-up transistor, the memory array of the memory device 110 can be implemented with fewer transistors. Accordingly, the memory device 110 may have hardware with improved area efficiency and power efficiency through the multiplier cell. However, the memory device 110 is not limited to MAC operations; it may also be used to run algorithms that involve memory storage and multiplication operations. A computing structure in which the memory device 110 according to an embodiment performs operations directly within the memory, without moving data, is described below.

FIG. 2 shows an example structure of a memory device in an in-memory computing system, according to an embodiment.

The memory device 200 according to an embodiment (e.g., the memory device 110 of FIG. 1) may include a multiplier cell 210, an input-word line driver 220, an adder 230, an output unit 240, a read-write circuit 280, and a memory controller 290. In a digital in-memory computing system and/or circuit, all data is expressed as logic values when operations are performed, so input values, weights, and output values may all have a binary format. The components described with reference to FIG. 2 may be implemented as digital logic circuits.

The input-word line driver 220 may deliver the input data to be operated on to the multiplier cell 210. The input-word line driver 220 may generate a pull-up signal and a word line signal that are applied to the pull-up transistor and the memory cell of each multiplier cell 210. The pull-up signal and the word line signal are determined based on the input value of the input data and are described below with reference to FIG. 6A. The input data may be digital data having a multi-bit or single-bit input value. The input-word line driver 220 may receive the input data from an external module (e.g., the processor 120 of FIG. 1). For example, when the input value is multi-bit, the input-word line driver 220 may deliver the bit values to the multiplier cell 210 sequentially, one bit position at a time. In the example shown in FIG. 2, the input-word line driver 220 may sequentially receive 4-bit input values from the least significant bit (LSB) to the most significant bit (MSB). When the memory device 200 operates for a neural network operation, the input-word line driver 220 may apply the input values received from the M nodes of a layer to the word lines WL_0, WL_1, ..., WL_M-1. For example, the input value from the m-th node may be applied to WL_m, and the input value applied to WL_m may be multi-bit or single-bit. Here, m may be an integer from 0 to M-1, and M may be an integer of 1 or more. When the input value applied to WL_m is multi-bit, the bit value of each bit position may be delivered to the multiplier cell 210 sequentially, as described above. The input-word line driver 220 may deliver the M input values received from these nodes individually to M multiplier cells. As described later, each of the M multiplier cells performs its multiplication in parallel with the other multiplier cells, so M multiplications can be performed in parallel for each output line (e.g., column line).
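The bit-serial input scheme described above can be sketched in Python as follows. This is a minimal behavioral model, not the circuit itself: the function name `bit_serial_mac`, the single-bit weights, and the weighting of each partial sum by `1 << bit` are illustrative assumptions about how the per-bit results would be combined.

```python
def bit_serial_mac(inputs, weights, n_bits=4):
    """Behavioral sketch of a bit-serial MAC on one output line.

    inputs  : unsigned integers, one per word line, each fitting in n_bits
    weights : single-bit weights (0 or 1), one per word line
    Returns sum(inputs[m] * weights[m]).
    """
    total = 0
    for bit in range(n_bits):  # LSB first, as in the 4-bit example
        # In this cycle each word line carries one bit of its input value.
        bits = [(x >> bit) & 1 for x in inputs]
        # The M one-bit products are formed in parallel and summed per column.
        column_sum = sum(b & w for b, w in zip(bits, weights))
        # Assumption: the partial sum is weighted by the input bit position.
        total += column_sum << bit
    return total
```

For instance, with inputs `[5, 3, 15, 0]` and weights `[1, 0, 1, 1]`, the four cycles together yield the same value as the direct weighted sum 5 + 15 + 0.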

When the weight is multi-bit, as many output lines as there are bits in the weight may be grouped. The grouped output lines may be referred to as an output line group. For example, when the weight is X bits, X output lines may be grouped, and the X grouped output lines may output the multiply-and-sum results between the input values and the X-bit weights. Here, X may be an integer of 2 or more. For example, among the X output lines in one group, the first output line may output the result of multiplying the input bit value by the weight bit value corresponding to the least significant bit (LSB) of the weight. Similarly, the x-th output line may output the result of multiplying the input bit value by the weight bit value at the (x-1)-th bit position from the LSB. Here, x may be an integer from 2 to X. In this case, the accumulator circuit 241 applies, to the sum output from each output line, a bit shift corresponding to that output line's bit position within the output line group, and accumulates the shifted values to output the final MAC result.
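The shift-and-accumulate step over an output line group can be sketched as follows. This is a hedged behavioral model assuming single-bit inputs and unsigned weights; the function name `grouped_column_mac` and its parameters are hypothetical, and signed-weight handling is not covered.

```python
def grouped_column_mac(input_bits, weights, weight_bits):
    """Behavioral sketch of an output line group for X-bit weights.

    input_bits  : one input bit (0 or 1) per word line
    weights     : unsigned integer weights, one per word line
    weight_bits : X, the number of output lines in the group
    """
    total = 0
    for x in range(weight_bits):  # x = 0 is the LSB output line
        # Output line x sees bit x of every weight in the column.
        column_sum = sum(i & ((w >> x) & 1) for i, w in zip(input_bits, weights))
        # The accumulator shifts each line's sum by its bit position.
        total += column_sum << x
    return total
```

With input bits `[1, 0, 1]` and 3-bit weights `[3, 2, 5]`, the three output lines contribute 2, 2, and 4 after shifting, which matches the direct sum 3 + 5.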

In addition, when one multiplier cell 210 includes a plurality of memory cells, the input-word line driver 220 may select the memory cell that holds the weight to be applied to the received input data. The input-word line driver 220 may use a decoding unit (e.g., a decoding circuit) to extract a value indicating the memory cell that holds the weight to be applied to the input data. The operation of a structure in which the multiplier cell 210 includes a plurality of memory cells is described below with reference to FIG. 6A.

The multiplier cell 210 may perform a multiplication operation between the received input value and the weight stored in the memory cell. The multiplier cell 210 according to one embodiment may output a signal corresponding to the multiplication result through a structure in which a memory cell, a pull-up transistor, a word line (WL), and a pull-up line (PU) are connected. For example, as described below with reference to FIGS. 3A to 3F, the multiplier cell 210 may output the result of a logical NAND operation between the input bit value and the weight value. Since the multiplication result is the result of a logical AND, it may correspond to the inverted value of the NAND result. As will be described later, the results output from the multiplier cells 210 may be inverted and then summed.

The adder 230 may be connected to the output terminal of the multiplier cell 210. The output terminal of the multiplier cell 210 may correspond to an output line. The output terminal of the multiplier cell 210 may be connected to one output line. The adder 230 may add values obtained by inverting the signals output from the multiplier cells 210. The adder 230 may sum the multiplication results of a plurality of multiplier cells 210 connected to the same output line. The adder 230 may be implemented with a full adder, a half adder, and/or a flip-flop, and may be implemented as an adder tree circuit. In addition, as described above, since the output of the multiplier cell 210 is a NAND result value, the adder 230 may be implemented to include an inverting function or an inverter that inverts the output of each multiplier cell 210. The adder 230 may sum the inverted values of the outputs of the multiplier cells 210. The adder 230 may transfer the result of summing the multiplication results to the accumulator circuit 241. An adder 230 may be arranged for each output line. When there are T output lines, T adders may be individually arranged. T summed multiplication result values from the T adders may be transferred to the accumulator circuit 241.
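As a non-limiting illustration, the inverting and summing performed on one output line may be modeled as below. The sketch assumes that each multiplier cell reports the NAND of its input bit and weight bit, as described above; the names `nand_cell` and `adder_sum` are hypothetical and do not appear in the embodiment.

```python
def nand_cell(input_bit, weight_bit):
    """Behavioral model of one multiplier cell output (NAND result)."""
    return 1 - (input_bit & weight_bit)


def adder_sum(nand_outputs):
    """Invert each NAND output to recover the AND (product), then sum."""
    return sum(1 - value for value in nand_outputs)


# Cells sharing one output line: inputs and weights are single bits.
inputs = [1, 1, 0, 1]
weights = [1, 0, 1, 1]
outputs = [nand_cell(x, w) for x, w in zip(inputs, weights)]
line_sum = adder_sum(outputs)
```

Here the products are 1, 0, 0, and 1, so `line_sum` evaluates to 2, the same value a direct sum of AND results would give.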

The output unit 240 may include an accumulator circuit 241 and an output register 242. The accumulator circuit 241 may combine the results and output the final MAC operation result.

The accumulator circuit 241 (e.g., an accumulator) may store the output of the adder that sums the multiplication results of the multiplier cells 210, and may accumulate the sum results. For example, when the input-word line driver 220 receives multi-bit input data, the input-word line driver 220 may sequentially transfer the bit value of each bit position to each multiplier cell 210. Accordingly, each multiplier cell 210 may also output the multiplication result value for the corresponding bit position. The adder 230 may transfer the result of summing the multiplication result values of the corresponding bit position to the accumulator circuit 241. The accumulator circuit 241 may bit-shift the sum result of the corresponding bit position. The accumulator circuit 241 may combine the bit-shifted sum result with the sum result of the next bit position, thereby obtaining a result in which the multiplication results are accumulated according to bit position. As will be described later, when the input-word line driver 220 receives single-bit input data, bit shifting is not required, so the accumulator circuit 241 may transfer the sum result of the adder 230 directly to the output register 242.
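As a non-limiting illustration, the bit-serial accumulation described above may be sketched as follows. The order in which bit positions are supplied is not specified in this passage; the sketch assumes LSB-first delivery and a shift equal to the bit position, and the function name `bit_serial_mac` is hypothetical.

```python
def bit_serial_mac(inputs, weight_bits, num_input_bits):
    """Accumulate per-bit-position sums of multi-bit inputs and 1-bit weights."""
    accumulated = 0
    for bit_position in range(num_input_bits):
        # Bits that the driver would place on the word lines in this cycle
        # (assumption: one bit position per cycle, LSB first).
        input_bits = [(x >> bit_position) & 1 for x in inputs]
        # Adder: sum of the per-cell products for this bit position.
        partial_sum = sum(b & w for b, w in zip(input_bits, weight_bits))
        # Accumulator: shift by the bit position, then accumulate.
        accumulated += partial_sum << bit_position
    return accumulated
```

For 2-bit inputs `[3, 2, 1]` and weight bits `[1, 0, 1]`, the partial sums are 2 (bit 0) and 1 (bit 1), giving 2 + (1 << 1) = 4, which matches 3·1 + 2·0 + 1·1.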

The output register 242 may store the final operation result (e.g., the multiply-accumulate result) output from the accumulator. For reference, the final multiply-accumulate result (e.g., the MAC result) stored in the output register 242 may be read by a processor and used for other operations. For example, when the memory device 200 can perform MAC operations corresponding to only some layers of a neural network at a time, the MAC result stored in the output register 242 may be transferred to the input-word line driver 220 for the operation of the next layer. The input-word line driver 220 of the memory device 200 may perform the multiplication operation by selecting the memory cell in which the weight set corresponding to the next layer is set.

A weight set may represent the set of weights that are multiplied by the input in one MAC operation. For example, a weight set may be the set of connection weights between nodes of one layer and nodes of another layer in a neural network. However, the weight set is not limited to connection weights between nodes of a neural network, and a different weight set may be used for each of various tasks. For example, when a first weight set is required in the MAC operation for a first task, the memory device 200 may select, from among the plurality of memory cells included in the multiplier cell 210, the memory cell in which a weight belonging to the first weight set is set. Similarly, when a second weight set is required in the MAC operation for a second task, the memory device 200 may select the memory cell in which a weight belonging to the second weight set is set.

The read-write circuit 280 may read and write the data of the memory cells included in the multiplier cell 210. The data of a memory cell may include, for example, a weight to be multiplied by an input value in a MAC operation. The read-write circuit 280 may access the memory cell of the multiplier cell 210 through global bit lines (GBL, GBLB). When the multiplier cell 210 includes a plurality of memory cells, the read-write circuit 280 may access the memory cell connected to the activated word line among the plurality of word lines. The read-write circuit 280 may set a weight in the accessed memory cell or read the weight set therein. Access through the global bit lines (GBL, GBLB) is described below with reference to FIG. 5.

The memory controller 290 may control the multiplier cell 210, the input-word line driver 220, the read-write circuit 280, the adder 230, the accumulator circuit 241, and the output register 242.

The memory device 200 may be implemented as a neural network device, an in-memory computing circuit, a multiplier and accumulator (MAC) circuit, and/or a device. The memory device 200 may include area-efficient SRAM multiplication cells for in-memory computing. The memory device 200 may receive an input value through a word line and output, through a bit line, a signal (e.g., a NAND result signal) corresponding to the result of multiplication between the input value and the weight stored in a 6T SRAM memory cell. The memory device 200 may perform the control and multiplier functions with a smaller number of transistors.

FIGS. 3A to 3F illustrate the structure of an example multiplier cell in a memory device according to one embodiment.

The multiplier cell 310 according to one embodiment may perform a multiplication operation between an input value and a weight set in a memory cell 311. Each multiplier cell 310 may include a memory cell 311 and a switching element 319 (e.g., a pull-up transistor). Each multiplier cell 310 may be connected to two local bit lines, and one switching element 319 may be disposed on at least one of the two local bit lines. For example, each multiplier cell 310 may include at most one switching element 319 on any one local bit line. In the examples shown in FIGS. 3A to 3E, a single switching element 319 may be disposed on the first local bit line (LBLB), and no switching element 319 may be disposed on the second local bit line (LBL). FIG. 8, described later, shows an example in which one switching element 319 is disposed on each of the two local bit lines (LBL, LBLB).

According to one embodiment, the memory cell 311 may have a set weight. The memory cell 311 may selectively provide a signal based on the weight to the output line in response to the input value. For example, when the memory cell 311 receives a first logic value (e.g., a logic value of 0 or L) through the word line, the memory cell 311 may be disconnected from the output line. When the memory cell 311 receives a second logic value (e.g., a logic value of 1 or H) through the word line, the memory cell 311 may provide a signal based on the weight (e.g., a signal representing the value QB obtained by inverting the logic value of the set weight) to the output line.

The memory cell 311 may include two inverters (INV1, INV2) and a cell transistor (e.g., a first transistor TR1). The cell transistor may have a gate and may be connected to one end of the pair of inverters (INV1, INV2) and to the output line. Two transistors (e.g., cell transistors) may be connected to the two ends of the two inverters (INV1, INV2), respectively. For example, the pair of inverters (INV1, INV2) may be connected in opposite directions to each other. The memory device may include a plurality of memory cells connected to the output line.

The first transistor TR1 (e.g., a first cell transistor) may be connected to one end of the pair of inverters (INV1, INV2). The second transistor TR2 (e.g., a second cell transistor) may be connected to the other end of the pair of inverters (INV1, INV2). The memory cell 311 described above may consist of six transistors in total, formed by the two inverters (INV1, INV2), the first transistor TR1, and the second transistor TR2. The memory cell 311 may be an SRAM implemented with six transistors. The value QB obtained by inverting the weight may be set at one end of the pair of inverters (INV1, INV2), and the weight may be set at the other end of the pair of inverters (INV1, INV2). The gate terminals of the first transistor TR1 and the second transistor TR2 may be connected to the word line (WL_m). One end of the first transistor TR1 may be connected to the first local bit line (LBLB), and the other end of the first transistor TR1 may be connected to the pair of inverters (INV1, INV2). One end of the second transistor TR2 may be connected to the second local bit line (LBL), and the other end of the second transistor TR2 may be connected to the pair of inverters (INV1, INV2). Each of the cell transistors (e.g., the first transistor TR1 and the second transistor TR2) may be an NMOS transistor. An input having the same logic value may be applied to the gate of the pull-up transistor, the gate of the first cell transistor, and the gate of the second cell transistor. The first cell transistor may be connected to a first output line (e.g., the first local bit line (LBLB)), and the second cell transistor may be connected to a second output line (e.g., the second local bit line (LBL)).

The switching element 319 may be connected to the output terminal (N_out) of the memory cell 311. The switching element 319 may output a signal corresponding to the result of multiplication between the input value and the weight by switching in response to the input value. The switching element 319 may be connected between the supply voltage (V_DD) and the output terminal (N_out) of the memory cell 311. The switching element 319 may be turned off when it receives a logic value of 1 as the input value, and may be turned on when it receives a logic value of 0 as the input value. For example, the switching element 319 may be configured as a pull-up transistor that receives the input value at its gate terminal. This specification mainly describes an example in which the switching element 319 is a pull-up transistor.

The pull-up transistor may have a gate and may be connected to the output line. In FIGS. 3A to 3E, the gate terminal of the pull-up transistor 319 is connected to the pull-up line, and the pull-up line may be connected to the word line (WL_m). However, the configuration is not limited thereto; as described later with reference to FIG. 6A, the pull-up line may be connected to the input-word line driver separately from the word line (WL_m), and the input-word line driver may apply the input value to the pull-up line. The pull-up transistor may output a signal corresponding to the result of multiplication between the input value and the weight. One end of the pull-up transistor may be connected to the supply voltage (V_DD), and the other end may be connected to the output terminal (N_out) of the memory cell 311. The output terminal (N_out) of the memory cell 311 may be connected to the local bit line bar, and in FIGS. 3A to 3E a signal corresponding to the multiplication result may be output on the first local bit line (LBLB). The pull-up transistor may be a PMOS transistor.

In a memory device (e.g., the multiplier cell 310) according to one embodiment, an input having the same logic value may be applied to the gate of the pull-up transistor and the gate of the cell transistor, so that a logic value corresponding to the binary multiplication result of the input and the binary weight set in the memory cell 311 is output to the output line. The logic value corresponding to the binary multiplication result may be the NAND value. For example, the multiplier cell 310 may operate according to the truth table shown in FIG. 3A. The pull-up line (PU) may receive the same signal (e.g., the input value) as the word line (WL_m). A signal corresponding to the weight may appear at the node Q inside the memory cell 311. The multiplier cell may receive the input value through the word line (WL_m) and output a result (e.g., the NAND result) corresponding to the multiplication between the input value and the weight stored at the node Q to the first local bit line (LBLB). LBL may denote the local bit line, and LBLB may denote the local bit line bar. As shown in the truth table, the operation of the multiplier cell 310 may be a NAND operation. FIGS. 3B to 3E, described below, each show the circuit state of the multiplier cell 310 for one case of the truth table.

FIGS. 3B and 3C show the multiplier cell 310 in the cases 390b and 390c in which the input value received through the pull-up line (PU) and the word line (WL_m) is 0. The pull-up transistor may provide the supply voltage (V_DD) to the first local bit line (LBLB). The supply voltage (V_DD) may represent a logic value of 1. The ground voltage (e.g., 0 V) may represent a logic value of 0. The input value of 0 received through the word line (WL_m) may open (i.e., turn off) the first transistor TR1. With the first transistor TR1 open, the node QB of the memory cell 311 may be disconnected from the first local bit line (LBLB). Accordingly, when the input value received through the pull-up line (PU) and the word line (WL_m) is 0, the output of the multiplier cell 310 may become independent of the weight set at the nodes (Q, QB). The multiplier cell 310 may output a logic value of 1 at the output terminal (N_out) regardless of whether the weight set at the node Q is 0 or 1.

FIGS. 3D and 3E show the multiplier cell 310 in the cases 390d and 390e in which the input value received through the pull-up line (PU) and the word line (WL_m) is 1. The input value of 1 received through the pull-up line (PU) may open (i.e., turn off) the pull-up transistor. With the pull-up transistor open, the supply voltage (V_DD) may be disconnected from the first local bit line (LBLB). Accordingly, when the input value received through the pull-up line (PU) is 1, the output of the multiplier cell 310 may become independent of the supply voltage (V_DD) and dependent on the weight set at the nodes (Q, QB). The multiplier cell 310 may output a value corresponding to the node QB on the first local bit line (LBLB). Referring to FIG. 3E, when the input value of the pull-up line (PU) and the word line (WL_m) is 1 and the weight at the node Q is 1, the ground voltage (e.g., 0 V) corresponding to the logic value 0 of the node QB may appear on the first local bit line (LBLB). Referring to FIG. 3D, when the input value of the pull-up line (PU) and the word line (WL_m) is 1 and the weight at the node Q is 0 (390), the multiplier cell 310 may drive the voltage of the first local bit line (LBLB) up to at most V_DD-V_TH. Although V_DD-V_TH does not correspond to a full logic value of 1, it may be treated as substantially a logic value of 1. For reference, in most operations the first local bit line (LBLB) may be pre-charged to the supply voltage (V_DD) in advance. Because the word line (WL_m) is turned on while the first local bit line (LBLB) is pre-charged to the supply voltage (V_DD), the first local bit line (LBLB) may be maintained at the supply voltage (V_DD) or at a voltage close to the supply voltage (V_DD). Accordingly, a digital logic circuit connected afterwards (e.g., an adder) may correctly recognize the logic value 1 and operate normally.
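As a non-limiting illustration, the four cases of FIGS. 3B to 3E may be summarized by the following logic-level model. The sketch ignores voltage levels (for instance, it treats V_DD-V_TH as logic 1, as the description does) and assumes that the pull-up line and the word line carry the same input value; the function name `multiplier_cell_output` is hypothetical.

```python
def multiplier_cell_output(input_bit, weight_q):
    """Logic value on LBLB for a given input bit and stored weight Q."""
    if input_bit == 0:
        # Pull-up transistor on, TR1 off: LBLB is driven to V_DD (logic 1),
        # independent of the stored weight.
        return 1
    # Pull-up transistor off, TR1 on: LBLB follows node QB = NOT Q.
    return 1 - weight_q


# Enumerate every (input, weight) pair, mirroring a 2-input truth table.
truth_table = {
    (x, q): multiplier_cell_output(x, q) for x in (0, 1) for q in (0, 1)
}
```

The resulting table matches a 2-input NAND: the output is 0 only when both the input and the weight are 1, so inverting it yields the binary product.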

However, in the multiplier cell 310 operating as in FIG. 3D, a kind of bootstrapping circuit may be formed by the parasitic capacitance 380f of the first transistor TR1 and the pull-up transistor, as shown in FIG. 3F. When the multiplier cell 310 repeats the operation of FIG. 3D, power leakage may occur. When power leakage occurs, the multiplier cell may operate as shown in FIG. 4 below.

For reference, signals that are inverses of each other may appear on the second local bit line (LBL) and the first local bit line (LBLB). This specification mainly describes an example in which the same logic value is applied to the pull-up line (PU) and the activated word line (WL_m).

FIG. 4 illustrates examples of the operation of a multiplier cell according to one embodiment.

A memory device according to one embodiment (e.g., the memory device 200 of FIG. 2) may select and perform one of a first operation 410 and a second operation 420. Depending on the operation, a multiplication result may be output every clock cycle or every two clock cycles. FIG. 4 shows a timing diagram for each operation in one multiplier cell of the memory device. The memory device may select and/or combine the first operation 410 and the second operation 420. However, the configuration is not limited thereto, and the memory device may be designed to perform only one of the first operation 410 and the second operation 420. In the first operation 410, a multiplication may be performed every clock cycle, and in the second operation 420, a multiplication may be performed every two clock cycles. In the timing diagram, M1 denotes the first multiplication operation, M2 denotes the second multiplication operation, and M3 to M8 likewise denote the third to eighth multiplication operations, respectively. For reference, in the example shown in FIG. 4, no input value is received through the word line (WL) in the initial state (init.), so the word line is always at 0 and the local bit line (LBLB) may be driven to the supply voltage (V_DD) by the pull-up transistor. The following operations may be performed from this state.

The first operation 410 may represent an operation of outputting the result of a multiplication operation each time according to the supplied input. The first operation 410 may include driving the voltage at the output terminal of the pull-up transistor to the supply voltage in response to a voltage sufficiently lower than the supply voltage (e.g., 0 V) being applied through the word line (WL) in some of a series of multiplication operations. In other words, the voltage at the output terminal may be initialized to the supply voltage. The multiplier cell of the memory device may receive, through the word line (WL), the input signal (e.g., the input value) to be operated on in every clock cycle. The multiplier cell may output the result of the multiplication operation between the input value and the weight stored at the node Q.

For example, when the input value received through the word line (WL) in the M1 state is 1 and the weight at the node Q is 0, the multiplier cell may maintain the supply voltage (V_DD) on the local bit line (LBLB). This is because, when there is no leakage current or the leakage current is at or below a threshold, the voltage of the local bit line (LBLB) does not drop and remains at the supply voltage (V_DD). In the M2 state, the input value of the word line (WL) is 0, so the local bit line (LBLB) may be driven toward the supply voltage (V_DD). Even if a small leakage current occurred in the M1 state, the voltage of the local bit line (LBLB) may be restored by the driving in the M2 state. When the input becomes 1 again in the M3 state, the multiplier cell may maintain the supply voltage (V_DD) on the local bit line (LBLB), similarly to the M1 state. Accordingly, unless the leakage is large, the multiplier cell may correctly output, through the output terminal to the local bit line (LBLB), the multiplication result for every combination of input bit value and weight bit value as a voltage corresponding to a logic value (e.g., 0 or V_DD).

For reference, the memory device may perform an operation for pre-charging the output terminal of the pull-up transistor in response to at least one of a predetermined period having elapsed or a multiplication operation being performed using a different memory cell within each multiplier cell. If, as the operating time elapses, no input value of 0 is received through the word line (WL) and the pull-up line, so that the local bit line (LBLB) is not driven to the supply voltage, the voltage of the local bit line (LBLB) may gradually decrease to V_DD-V_TH. The memory device may periodically perform an initialization operation (e.g., applying a voltage of 0 to the word line (WL)) so that the voltage at the output terminal of the multiplier is maintained at the supply voltage.

The second operation 420 may represent an operation in which, for every multiplication operation, the voltage at the output terminal of the pull-up transistor is driven to the supply voltage in a precharge phase (P) and the multiplication operation is performed in an evaluation phase (E). For example, in the second operation 420, two clock cycles may be divided into the precharge phase (P) and the evaluation phase (E). The operation in the evaluation phase (E) may be the same as the first operation. The memory device may always force the voltage of the word line (WL) to 0 in the clock cycle corresponding to the precharge phase. In other words, the memory device may drive the voltage of the local bit line (LBLB), to which the output terminal of the multiplier cell is connected, to the supply voltage (V_DD). The memory device may then perform the operation by passing the input value to the word line (WL) in the evaluation phase (E). The second operation 420 may be used in a circuit whose structure and layout cause a large leakage current, or in a circuit that uses a clock whose frequency is lower than a threshold.

The memory device may selectively choose whichever operation option is advantageous for the situation. For example, the memory device may select one of the first operation 410 or the second operation 420 based on at least one of the operating frequency or the leakage of the memory device. The memory device may perform the second operation 420 when the operating frequency is lower than a threshold frequency, and may perform the first operation 410 when the operating frequency is equal to or higher than the threshold frequency. The memory device may perform the second operation 420 when the leakage is greater than a threshold, and may perform the first operation 410 when the leakage is equal to or less than the threshold. The memory device may further include a circuit that monitors the operating frequency or the leakage current described above, and the operation mode may be determined by the memory controller of the memory device, the input-word line driver, or an external processor.
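
The selection rule described above can be sketched as a small function. This is only an illustrative model, not the circuit of the embodiment: the function name, the parameter names, and the choice to combine the two criteria with a logical OR are assumptions, since the description only states that the choice is based on at least one of the operating frequency or the leakage.

```python
FIRST_OPERATION = 410   # multiply every cycle, no dedicated precharge phase
SECOND_OPERATION = 420  # precharge phase (P) followed by evaluation phase (E)


def select_operation(freq_hz, leakage, freq_threshold_hz, leakage_threshold):
    """Return the reference numeral of the operation to use.

    Assumption: either criterion alone is enough to choose the second
    operation. All names and units here are hypothetical.
    """
    too_slow = freq_hz < freq_threshold_hz      # lower than the threshold frequency
    too_leaky = leakage > leakage_threshold     # greater than the leakage threshold
    if too_slow or too_leaky:
        return SECOND_OPERATION
    return FIRST_OPERATION
```

With these assumptions, a device running exactly at the threshold frequency with leakage exactly at the threshold would use the first operation 410, which matches the "equal to or higher" and "equal to or less" boundaries stated above.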

FIG. 5 illustrates a memory device in which multiplier cells are arranged in an array structure, according to one embodiment.

A memory device according to one embodiment (e.g., the memory device 200 of FIG. 2) may include a memory array in which a plurality of the multiplier cells 510 described above with reference to FIG. 3A are arranged. The plurality of multiplier cells 510 may be arranged along a plurality of word lines (WL₀ to WL_M-1) and output lines. The input-word line driver 520 may deliver input values to the plurality of word lines (WL₀ to WL_M-1). An adder 530 may be provided for each output line. A memory device having the memory array shown in FIG. 5 may also be referred to as an SRAM IMC macro circuit. As described above, an input value may be delivered to each multiplier cell 510 through the word lines (WL₀ to WL_M-1). The multiplier cell 510 may output, to the local bit line (LBLB), the result of multiplying the weight stored in the memory cell by the input value. Several local bit lines may be connected to the adder 530. The adder 530 may sum the multiplication results and pass the summed result to an accumulator. The accumulator of the output unit 540 may output the final MAC operation result by combining the results summed for each bit position based on bit shifting.
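
As a rough illustration of how the adder 530 and the shift-based accumulator could cooperate on one output line, the sketch below models a bit-serial MAC. The bit-serial input scheme, the single-bit weights, the unsigned inputs, and all function names are assumptions made for the example; the description only states that per-bit-position sums are combined by bit shifting.

```python
def nand(a, b):
    """NAND of two bits, the logic value a multiplier cell drives on LBLB."""
    return 1 - (a & b)


def column_mac(inputs, weights, input_bits):
    """Multiply-accumulate for one output line.

    inputs:     unsigned integers, one per word line (hypothetical encoding)
    weights:    1-bit weights, one per multiplier cell on this output line
    input_bits: number of input bit positions fed in one at a time
    """
    acc = 0
    for b in range(input_bits):
        # Each cell outputs NAND(IN, Q); the 1-bit product is its complement.
        products = [1 - nand((x >> b) & 1, q) for x, q in zip(inputs, weights)]
        partial = sum(products)   # role of the adder 530
        acc += partial << b       # role of the accumulator: shift, then combine
    return acc


# 3*1 + 5*0 + 2*1
print(column_mac([3, 5, 2], [1, 0, 1], input_bits=3))  # prints 5
```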

In addition, the memory device may further include global bit lines (GBL, GBLB) and switches (SW) for accessing the memory cell of the multiplier cell 510 to perform at least one of a read operation or a write operation on the weight of the memory cell. The global bit lines (GBL, GBLB) may be connected to the first transistor and the second transistor of the multiplier cell 510 via the switches (SW). GBLB denotes the global bit line bar. The global bit lines (GBL, GBLB) may be connected to the read-write circuit 580. For example, the memory device may turn on the switches (SW) located at both ends of the memory cell that is the target of the read or write operation. The memory device may access that memory cell by activating the word line connected to it. Through the read-write circuit 580, the memory device may read the weight value written in that memory cell, or change and/or set the weight value of that memory cell.

FIG. 6A, described below, shows a structure in which a plurality of memory cells within one multiplier cell 510 are connected so as to share one pull-up transistor, which improves area efficiency.

FIGS. 6A and 6B illustrate an example structure in which a plurality of memory cells share a pull-up transistor within a multiplier cell, according to one embodiment.

The multiplier cell 610 according to one embodiment may be implemented in a structure in which a plurality of memory cells 611 share the same multiplication circuit. For example, at least one of the multiplier cells 610 may include a plurality of memory cells 611 connected to the same pull-up transistor 619. The pull-up transistor 619 may be connected to the output terminals of the plurality of memory cells 611 at the same node and on the same local bit line. FIG. 6A shows an example in which the input-word line driver 620 applies the m-th input to the i-th memory cell 611 among the memory cells 611 inside the multiplier cell 610, where i may be an integer from 0 to N-1, inclusive.

The input-word line driver 620 may select, from among the plurality of memory cells 611, the memory cell 611 to be used for a target multiplication operation. The input-word line driver 620 may include a decoding circuit. The decoding circuit may decode, from an input signal, the input value to be provided to the multiplier cell 610 and a signal designating which of the plurality of memory cells 611 included in the multiplier cell 610 is to be used for the target multiplication operation. For example, in FIG. 6A, the signal designating the memory cell 611 to be used for the target multiplication operation may indicate the i-th memory cell 611. Among the plurality of memory cells 611 included in one multiplier cell 610, the memory device may activate the word line connected to the memory cell 611 that holds the weight corresponding to the target operation, and deactivate the word lines connected to the remaining memory cells 611. Inside the multiplier cell 610, only one memory cell 611 may be activated for one multiplication operation. The input signal is always applied to the pull-up line (PU_m), whereas among the word lines the input signal may be applied only to the activated word line. The input-word line driver 620 may apply the same logic value to the pull-up line (PU_m) and the activated word line (WL_m,i).

As an example, the input-word line driver 620 may apply the m-th input value (IN_m) to the m-th pull-up line (PU_m) and to the i-th word line (WL_m,i) within the multiplier cell 610 for the m-th input. The remaining word lines (WL_m,k) may be deactivated. As shown in the timing diagram, the m-th multiplier cell 610 may output, on the local bit line through the shared pull-up transistor 619, the product result (P_m,i) between the input value received through the i-th word line and the weight of the i-th memory cell 611. In other words, the multiplier cell 610 may output the product result (P_m,i) between the m-th input value (IN_m) and the i-th weight (Q_m,i).

FIG. 6A shows examples in which an input value is applied to the i-th word line (WL_m,i) and the m-th pull-up line (PU_m) and the i-th weight (Q_m,i) is 1 or 0. In the example where the i-th weight (Q_m,i) is 1, the product result (P_m,i) is 0 in the cycle 601 in which the input value (IN_m) is 1, as in FIG. 3E, and the product result (P_m,i) is 1 in the cycle 602 in which the input value (IN_m) is 0, as in FIG. 3C. In the example where the i-th weight (Q_m,i) is 0, the product result (P_m,i) may be 1 in all cycles 601 and 602, as in FIGS. 3B and 3D.
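
The four cases above amount to a NAND of the input bit and the weight bit. A minimal sketch that tabulates them follows; the function name is hypothetical, and the model captures only the logic values, not voltages or timing.

```python
def multiplier_cell_output(in_bit, weight_bit):
    """Logic value P_m,i on the local bit line: NAND(IN_m, Q_m,i)."""
    return 1 - (in_bit & weight_bit)


# Tabulate the cases in the order they are described: weight 1, then weight 0.
for q in (1, 0):
    for in_bit in (1, 0):
        p = multiplier_cell_output(in_bit, q)
        print(f"Q={q} IN={in_bit} -> P={p}")
```

The actual product IN_m × Q_m,i is the complement of P_m,i, so a downstream adder would need to account for the inversion.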

For reference, the truth table of FIG. 3A assumes that the same input value is applied to the memory cell connected to a word line and to the pull-up transistor. In FIG. 6A, the logic values of the signals applied to the deactivated remaining word lines (WL_m,k) and to the pull-up transistor are independent of each other and may differ, so the truth table of FIG. 3A may not apply to the memory cells connected to the remaining word lines (WL_m,k). For example, in a memory cell connected to a deactivated remaining word line (WL_m,k), the first transistor and the second transistor are turned off, so the node in which the weight of that memory cell is set may be isolated from the output terminal. The weights set in the memory cells connected to the deactivated remaining word lines (WL_m,k) therefore have no effect on the output terminal, and those memory cells may be excluded from forming the output. Accordingly, in the structure shown in FIG. 6A, the multiplier cell may output at its output terminal only the signal corresponding to the multiplication result produced by the pull-up transistor 619 and the memory cell connected to the activated i-th word line (WL_m,i). The area efficiency may improve as the number of memory cells 611 sharing the same pull-up transistor 619 within one multiplier cell 610 increases.

According to one embodiment, the memory device may selectively activate the memory cell corresponding to each operation while performing a plurality of operations sequentially. When M multiplier cells are arranged on one output line and each multiplier cell includes N memory cells, the total number of memory cells may be M×N. Because one memory cell is selected in each of the M multiplier cells for each operation, the memory device may select M memory cells out of the M×N memory cells. For a first operation among the plurality of operations, the memory device may select a first memory cell among the plurality of memory cells and output a signal corresponding to the product result through the same pull-up transistor 619. For a second operation among the plurality of operations, the memory device may select a second memory cell among the plurality of memory cells and output a signal corresponding to the product result through the same pull-up transistor 619.
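
The sequential use of different memory cells behind one shared pull-up transistor can be modeled roughly as follows. The class and method names are hypothetical, and the model assumes, consistently with the M2 state described earlier, that an input of 0 on the pull-up line drives the output to the supply voltage (logic 1).

```python
class SharedMultiplierCell:
    """Logic-level model of N memory cells sharing one pull-up transistor."""

    def __init__(self, weights):
        self.weights = list(weights)  # Q_m,0 ... Q_m,N-1, each 0 or 1

    def evaluate(self, in_bit, selected):
        # Only the selected word line carries the input; the others stay at 0.
        word_lines = [in_bit if k == selected else 0
                      for k in range(len(self.weights))]
        pull_up_line = in_bit  # PU_m always carries the input value

        if pull_up_line == 0:
            # Pull-up transistor drives the shared output node to V_DD.
            return 1
        # A memory cell whose word line is 0 is isolated from the output node,
        # so only the selected cell contributes.
        connected = [q for wl, q in zip(word_lines, self.weights) if wl == 1]
        return 1 - connected[0]


cell = SharedMultiplierCell([1, 0, 1, 1])
first = cell.evaluate(in_bit=1, selected=0)   # first operation uses cell 0
second = cell.evaluate(in_bit=1, selected=1)  # second operation uses cell 1
print(first, second)  # prints 0 1
```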

As an example, referring to FIG. 6B, the memory device may divide the computation of a large neural network 690b into a plurality of operations and execute them. The weights of the neural network corresponding to each operation may be distributed to and set in a plurality of memory cells. When performing a first operation of the neural network computation, the memory device may activate, in each multiplier cell 610, the first memory cell 611b in which a first weight for the first operation is set. The remaining memory cells in each multiplier cell 610 may be deactivated. When performing a second operation of the neural network computation after the first operation, the memory device may activate, in each multiplier cell 610, the second memory cell 612b in which a second weight for the second operation is set. The remaining memory cells, including the first memory cell 611b, may be deactivated. For simplicity, FIG. 6B describes an example in which a first operation corresponding to an arbitrary node of the neural network 690b and a second operation corresponding to a subsequent node connected to that node are performed using different memory cells 611b and 612b within the same multiplier cell 610. In this example, the first operation multiplies one input value (IN_m), among the plurality of input values (IN) propagated to that node, by the first weight (Q_m,i), and the second operation multiplies one input value (IN'_m), among the plurality of input values (IN') propagated to the subsequent node, by the second weight (Q_m,j). However, the embodiment is not limited to this; the memory cells within the same multiplier cell may hold weights for different parts of the same task (e.g., the same neural network computation), or weights for different tasks (e.g., face recognition and object recognition). FIG. 7 below describes an array structure that allows memory cells to be used selectively for each operation.

FIG. 7 illustrates a memory device in which the multiplier cells shown in FIG. 6A are arranged in an array structure.

A memory device according to one embodiment may include a plurality of multiplier cells, including a multiplier cell 710. For example, the multiplier cells may be arranged in an array structure. The plurality of multiplier cells may be arranged along a plurality of output lines and a plurality of word lines. As shown in FIG. 7, the input-word line driver 720 may select, from among the plurality of memory cells included in the multiplier cell 710, one memory cell (e.g., the memory cell corresponding to i) in which the weight (Q_m,i) corresponding to a target task is set. For each of the plurality of multiplier cells individually, the input-word line driver 720 may deliver the input value (IN_m) to the memory cell in which the weight (Q_m,i) corresponding to the target task is set. Accordingly, when the memory device performs various tasks over a plurality of cycles, it may set in advance, in the memory cells of each multiplier cell, the weights (Q_m,i) required in each cycle. When the target task changes, the memory device does not need to load the weights for the new task from outside; it may perform the multiplication operation by selecting, from among the preset memory cells, the memory cell in which the weight (Q_m,i) corresponding to the target task is set.

Multiplier cells connected to the same word line may receive the same input value (IN_m). Each of the plurality of multiplier cells may perform its multiplication operation in parallel with the other multiplier cells. The memory device may sum, in the same adder 730, the outputs of the multiplier cells that are connected to the same column line (e.g., the same output line). One multiplier cell and another multiplier cell may output their product results in parallel. Within one multiplier cell 710, a multiplication operation based on one memory cell may be performed. In other words, when each multiplier cell 710 includes N memory cells, the input-word line driver 720 may select one of the N memory cells in every cycle. In the M multiplier cells connected to an output line, M multiplication operations may be performed in parallel. When there are T output lines, M×T multiplication operations may be performed in parallel in the memory array of the memory device. Because the results of the M multiplication operations connected to the same output line are summed, the output unit 740 may generate T accumulated output values.
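
The counts in this paragraph (M×T products per cycle, T summed outputs) can be checked with a small logic-level sketch. The data layout `weights[m][t][n]`, the per-row selection list `selected[m]`, and the function name are assumptions for illustration; the sketch also works directly with the 1-bit product rather than the NAND value on the bit line.

```python
def array_cycle(inputs, weights, selected):
    """One cycle of an M-row, T-column array of multiplier cells.

    inputs[m]        : input bit IN_m on row m
    weights[m][t][n] : weight bit of memory cell n in the cell at row m, column t
    selected[m]      : index of the memory cell activated on row m
                       (assumed shared by all columns of that row)
    Returns a list of T column sums.
    """
    num_rows = len(inputs)
    num_cols = len(weights[0])
    sums = []
    for t in range(num_cols):
        total = 0
        for m in range(num_rows):
            q = weights[m][t][selected[m]]
            total += inputs[m] & q   # 1-bit product of IN_m and the chosen weight
        sums.append(total)           # role of the adder for output line t
    return sums


# M = 2 rows, T = 2 output lines, N = 2 memory cells per multiplier cell.
w = [[[1, 0], [0, 1]],
     [[1, 1], [0, 0]]]
print(array_cycle([1, 1], w, selected=[0, 0]))  # prints [2, 0]
print(array_cycle([1, 1], w, selected=[1, 1]))  # prints [1, 1]
```

Changing only `selected` switches the whole array to a different preset weight set without rewriting any memory cell.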

In the memory device shown in FIG. 7, the number of transistors required for one multiplication operation may decrease as the number of memory cells included in the multiplier cell 710 increases. For example, when the multiplier cell 710 includes 4 memory cells, one multiplication operation can be regarded as being implemented by 7.25 transistors. This is because each memory cell includes 6 transistors, the pull-up uses 1 transistor, and each of the 2 switches for the global bit lines includes 2 transistors, so that (6×4+5)=29 transistors are shared among 4 memory cells. When the multiplier cell 710 includes 8 memory cells, one multiplication operation can be regarded as being implemented by 6.625 transistors, because (6×8+5)=53 transistors are shared among 8 memory cells. When the multiplier cell 710 includes 16 memory cells, one multiplication operation can be regarded as being implemented by 6.3125 transistors, because (6×16+5)=101 transistors are shared among 16 memory cells. Because a plurality of multiplier cells arrayed along a word line can be driven by one input-word line driver 720, the area overhead may also be reduced. The memory device according to one embodiment may achieve a large area reduction compared with a comparative example.
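
The per-multiplication transistor counts above follow from (6N+5)/N, where N is the number of memory cells in a multiplier cell. A short sketch that reproduces the three figures (the function and parameter names are illustrative only):

```python
def transistors_per_multiplication(n_cells, per_cell=6, pull_up=1,
                                   switches=2, per_switch=2):
    """Average transistor count per memory cell in one multiplier cell.

    Shared overhead: 1 pull-up transistor plus 2 global-bit-line switches
    of 2 transistors each, i.e. 5 transistors spread over n_cells.
    """
    shared = pull_up + switches * per_switch
    return (per_cell * n_cells + shared) / n_cells


for n in (4, 8, 16):
    print(n, transistors_per_multiplication(n))
# prints:
# 4 7.25
# 8 6.625
# 16 6.3125
```

As N grows, the value approaches the 6 transistors of the memory cell itself, since the 5 shared transistors are amortized over more cells.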

As shown in FIG. 7, a pattern consisting of one pull-up line (PU) and a plurality of word lines (WL_0,0 to WL_0,N-1) may appear repeatedly in the layout 790.

FIG. 8 illustrates an example in which a multiplier cell 810 according to one embodiment outputs product results through a pair of local bit lines.

The multiplier cell 810 according to one embodiment may be connected to a pair of local bit lines. The multiplier cell 810 may output the multiplication result based on a first memory cell 811, among the plurality of memory cells, to the first local bit line 850R, and may output the multiplication result based on a second memory cell 812 to the second local bit line 850L. In FIG. 8, the first memory cell 811 is illustrated as the memory cell in which the weight (Q_m,i) is set, and the second memory cell 812 as the memory cell in which the weight (Q_m,j) is set. For reference, whereas in FIG. 3A the first local bit line 850R is shown as the output terminal, in FIG. 8 a multiplication result may be output on both the first local bit line 850R and the second local bit line 850L. Here, on the first local bit line 850R, the NAND of the input value (IN_m) and the weight (Q_m,i) may be output as the result corresponding to the multiplication operation, in the same way as described with reference to FIGS. 1 to 7. On the second local bit line 850L, by contrast, the NAND of the input value (IN_m) and the inverse of the weight (Q_m,j) may be output as the result corresponding to the multiplication operation. The memory device may set, in the first memory cell 811, a value corresponding to the weight to be computed, and may set, in the second memory cell 812, the inverse of the weight to be computed.

In addition to the first pull-up transistor 819-R for outputting the multiplication result to the first local bit line 850R (e.g., a first output line), the memory device may also include a second pull-up transistor 819-L for outputting the multiplication result to the second local bit line 850L (e.g., a second output line). Accordingly, the first memory cell 811 connected to the first local bit line 850R may hold a value corresponding to the weight, and the second memory cell 812 connected to the second local bit line 850L may hold the inverse of the weight. The memory device may include a plurality of memory cells connected to the first output line and the second output line.

The multiplication result output through the first local bit line 850R of the first memory cell 811 may be summed in the adder with the multiplication result output through the second local bit line 850L of the second memory cell 812. In other words, even within the same multiplier cell, the product results of memory cells connected to different local bit lines may be summed in the adder. The structure shown in FIG. 8 can be regarded as two column lines merged into one multiplier cell 810. The multiplier cell 810 shown in FIG. 8 may include N memory cells. A multiplication operation may be performed in each of the first memory cell 811 (e.g., the i-th memory cell), among the N/2 memory cells connected to the first local bit line 850R, and the second memory cell 812 (e.g., the j-th memory cell), among the N/2 memory cells connected to the second local bit line 850L. Here, N may be an integer that is a multiple of 2. One first word line (RWL) and one second word line (LWL) may be connected to each memory cell. The input-word line driver 820 may activate one word line (RWL_m,i) among the first word lines (RWL_m,0 to RWL_m,N-1) and deactivate the remaining word lines (RWL_m,k). The input-word line driver 820 may likewise activate one word line (LWL_m,j) among the second word lines (LWL_m,0 to LWL_m,N-1) and deactivate the remaining word lines (LWL_m,p). Here, i, j, k, and p are each integers equal to or greater than 0, i may differ from k, and p may differ from j.

For reference, within the multiplier cell 810, the memory device may output to the first local bit line 850R the multiplication based on a memory cell holding one of the even-numbered weights, and output to the second local bit line 850L the multiplication based on a memory cell holding one of the odd-numbered weights. However, the way the weights are set is not limited to the above. As an example, for the sake of a symmetric structure, this specification describes a case in which the number of memory cells connected to the first local bit line 850R and the number of memory cells connected to the second local bit line 850L within one multiplier cell 810 are the same, but the embodiment is not necessarily limited to this. Depending on the design, the number of memory cells connected to each local bit line may differ.

The memory device according to one embodiment may, within one multiplier cell 810, multiply the same input value (IN_m) by the first weight (Q_m,i) and by the second weight (Q_m,j) simultaneously. The input-word line driver 820 may apply the logic value of the input value (IN_m) to the pull-up line (PU_m), the second word line (LWL_m,j), and the first word line (RWL_m,i) at the same time. The input-word line driver 820 may apply a logic value of 0 to all of the remaining word lines. The first multiplication result (RP) and the second multiplication result (LP) may be output simultaneously on the first local bit line 850R and the second local bit line 850L, respectively. Because the structure shown in FIG. 8 is symmetric, it may be advantageous in terms of layout.
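
The simultaneous outputs on the two local bit lines can be sketched at the logic level. This is one reading of the description, with hypothetical function names: the first local bit line sees the stored value of the first memory cell, the second local bit line sees the complement of the stored value of the second memory cell, and the second cell is therefore programmed with the inverse of its intended weight.

```python
def nand(a, b):
    """NAND of two bits."""
    return 1 - (a & b)


def stored_value_for_second_cell(weight_bit):
    """Value to program into a cell read out on the second local bit line."""
    return 1 - weight_bit


def dual_output(in_bit, stored_first, stored_second):
    """Return (RP, LP) for one input bit applied to both selected cells.

    RP: NAND of the input and the first cell's stored value.
    LP: NAND of the input and the inverse of the second cell's stored value.
    """
    rp = nand(in_bit, stored_first)
    lp = nand(in_bit, 1 - stored_second)
    return rp, lp


# Intended weights: 1 for the first cell, 0 for the second cell.
stored_i = 1
stored_j = stored_value_for_second_cell(0)
print(dual_output(1, stored_i, stored_j))  # prints (0, 1)
```

Under this reading, both bit lines end up carrying the NAND of the input and the intended weight, so the same adder logic can treat RP and LP uniformly.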

FIG. 9 shows a memory device in which the multiplier cells shown in FIG. 8 are arranged in an array structure.

The multiplier cells 910 shown in FIG. 8 may be arranged in the array structure shown in FIG. 7. Two word lines (RWL, LWL) may be required for each multiplier cell 910. In addition, each multiplier cell 910 may output two multiplication results simultaneously through two local bit lines (LBL and LBLB). For example, the first local bit line 950R of FIG. 9 may correspond to LBLB, and the second local bit line 950L may correspond to LBL. For each multiplier cell 910, the input-word line driver 920 may select a first memory cell corresponding to the first local bit line 950R and a second memory cell corresponding to the second local bit line 950L, and cause the two multiplications to be performed separately and in parallel. For reference, even if the memory device outputs the multiplication result of a given memory cell to the first local bit line 950R in some cycle, the result of that memory cell is not fixed to the first local bit line 950R. In another cycle, the memory device may operate to output the multiplication result of that memory cell to the second local bit line 950L. In this case, the memory device may set an inverted weight in that memory cell.

The multiplication results of the local bit lines may be transferred individually to the adder 930. For example, in the example shown in FIG. 9, it may be assumed that the first memory cell and the second memory cell of the multiplier cell 910 are mapped to the first local bit line 950R, and the third memory cell and the fourth memory cell are mapped to the second local bit line 950L. The memory device may sum, in the adder 930, the multiplication result based on the first memory cell with the multiplication result based on the third memory cell or the fourth memory cell. Similarly, the memory device may sum, in the adder 930, the multiplication result based on the second memory cell with the multiplication result based on the third memory cell or the fourth memory cell. As described above with reference to FIG. 8, even for memory cells arranged within the same multiplier cell 910, multiplications based on memory cells corresponding to different local bit lines may be performed in parallel, and their results may further be summed in the adder 930. The output unit 940 may accumulate the outputs of the adder 930 connected to each output line and output the final multiplication result.

FIG. 10 is a flowchart illustrating a method of operating a multiplier cell according to an embodiment.

First, in step 1010, the memory device may transfer an input value to the multiplier cell. For example, the memory cell may receive the input value through a word line. As described above, the memory cell may have two inverters connected in opposite directions and two transistors connected to both ends of the two inverters. A pull-up transistor connected to the output terminal of the memory cell may receive the input value at its gate terminal.

Then, in step 1020, the multiplier cell of the memory device may output a signal corresponding to the multiplication result. For example, the memory device may output, at the output terminal of the pull-up transistor, a signal corresponding to the result of multiplying the input value by the weight stored in the memory cell. According to the truth table described above with reference to FIG. 3A, a signal corresponding to the multiplication result (e.g., a NAND result) may be output at the output terminal of the pull-up transistor and the memory cell.
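The behavior of steps 1010 and 1020 can be written as a short behavioral sketch. It models only logic levels, not the circuit. The function name is hypothetical, and the sketch assumes that when the input is 1 the output terminal follows the inverse of the stored weight, which is what yields the NAND result mentioned above.

```python
def multiplier_cell_output(in_bit, weight):
    """Logic level at the pull-up transistor's output terminal."""
    if in_bit == 0:
        # The pull-up turns on for a logic 0 input and drives the
        # output terminal to the supply voltage (logic 1).
        return 1
    # The pull-up is off and the word line is active. Assumption: the
    # output terminal then carries the inverse of the stored weight.
    return 1 - weight


# Print the truth table: input, weight, output level, recovered product.
for in_bit in (0, 1):
    for weight in (0, 1):
        out = multiplier_cell_output(in_bit, weight)
        print(in_bit, weight, out, 1 - out)
```

Inverting the output level recovers the one-bit product of the input and the weight.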

FIG. 11 is a flowchart illustrating a method of operating a memory device according to an embodiment.

First, in step 1101, the memory device may manage the data of the memory array. For example, the memory device may use a read-write circuit to set a weight in each memory cell of the memory array. An external processor may indicate to the memory device the data to be written and the address of the memory cell in which the weight is to be set.

Then, in step 1102, the memory device may determine whether to start a MAC operation. For example, the memory device may start the MAC operation when it receives an input value that is the target of the MAC operation.

Next, in step 1010, the memory device may transfer the input value to the multiplier cell. For example, in step 1111, the memory device may transfer an input signal and a weight set address to the input-word line driver. An external processor may transfer the input signal and the weight set address (e.g., a signal indicating the i-th memory cell among the memory cells included in a multiplier cell) to the memory device. In step 1112, the input-word line driver may generate control signals. For example, the input-word line driver may decode the input signal and the weight set address, and apply the same logic value as the input value to the pull-up line (PU_m) and the word line (WL_m,i). The input-word line driver may apply a logic value of 0 to the remaining word lines.
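The decoding in steps 1111 and 1112 can be sketched as follows. The function name, the list-based representation of the lines, and the argument names are hypothetical. The only behavior taken from the description is that PU_m and WL_m,i carry the input value while every other word line carries 0.

```python
def drive_lines(inputs, weight_index, cells_per_multiplier):
    """Decode input bits and a weight set address into line levels.

    inputs[m] is the input bit IN_m for row m. weight_index is the
    index i of the memory cell selected inside every multiplier cell.
    Returns (pull_up, word_lines) with word_lines[m][k] for WL_m,k.
    """
    if not 0 <= weight_index < cells_per_multiplier:
        raise ValueError("weight set address out of range")
    pull_up = list(inputs)  # PU_m carries IN_m
    word_lines = [
        [in_m if k == weight_index else 0 for k in range(cells_per_multiplier)]
        for in_m in inputs
    ]
    return pull_up, word_lines


pu, wl = drive_lines([1, 0, 1], weight_index=2, cells_per_multiplier=4)
print(pu)
print(wl)
```

A row whose input bit is 0 ends up with all of its word lines at 0, so no memory cell in that row is accessed during the cycle.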

Then, in step 1120, the memory device may output a signal corresponding to the multiplication result of the memory cell selected within the multiplier cell. For example, each multiplier cell may output, to the local bit line, a signal (e.g., a NAND result value) corresponding to the result of multiplying the input value (IN_m) by the weight (Q_m,i) of the selected memory cell. The outputs of a plurality of multiplier cells connected to the same output line may be transferred to the adder of that output line.

In step 1130, the adder may sum the multiplication results. As described above, because the adder receives NAND results, it may sum the inverted values of the NAND results. The adder may transfer the summed multiplication results to the accumulator.
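A minimal sketch of this adder step follows, assuming each multiplier cell on an output line delivers a single-bit NAND result. The function name is hypothetical.

```python
def column_sum(nand_outputs):
    """Invert each NAND result to recover the product, then add them."""
    return sum(1 - v for v in nand_outputs)


inputs = [1, 0, 1, 1]
weights = [1, 1, 0, 1]
# NAND results that the multiplier cells of one output line would produce.
nand_outputs = [1 - (x & w) for x, w in zip(inputs, weights)]
print(column_sum(nand_outputs))
```

For these values the sum equals the dot product of the input bits and the weights, since inverting a NAND of two bits gives their product.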

In step 1140, the accumulator may accumulate the results of the sum operation. As described below, when the input value is multi-bit, the accumulator may perform bit shifting according to the corresponding bit position and accumulate the multiplication result for the next bit position.

In step 1150, the memory device may determine whether the input bit on which the multiplication was performed is the last bit. For example, when the operation is performed on the last bit, the memory device may transfer the output of the accumulator to the output register. When the input value is a single bit, no accumulation is needed, so the accumulator may bypass the multiplication result to the output register. When the current input bit is not the last bit, the memory device may perform the same operation on the input bit of the next bit position. When the adder outputs a multiplication result, the memory device may accumulate it by bit-shifting the accumulation result previously stored in the accumulator, adding the current multiplication result, and storing the sum back in the accumulator.
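The shift-and-add loop of steps 1140 to 1160 can be sketched for multi-bit inputs and single-bit weights. The sketch assumes that input bits are supplied most significant bit first and that the shift is a left shift by one position. The description states only that the stored result is bit-shifted before the current result is added. The function name is hypothetical.

```python
def bit_serial_mac(inputs, weights, n_bits):
    """Accumulate one adder output per input bit position."""
    acc = 0  # accumulator, initialized before the MAC operation
    for pos in range(n_bits - 1, -1, -1):  # assumed MSB-first order
        bits = [(x >> pos) & 1 for x in inputs]
        # Adder output for this bit position: sum of one-bit products.
        partial = sum(b & w for b, w in zip(bits, weights))
        # Shift the stored result, add the current one, store it back.
        acc = (acc << 1) + partial
    return acc  # value handed to the output register after the last bit


print(bit_serial_mac([3, 1, 2], [1, 0, 1], n_bits=2))
```

With `n_bits=1` the loop runs once and the result is just the adder output, which matches the single-bit bypass case described above.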

In step 1160, the memory device may store the accumulated result in the output register. For example, when the memory device receives an input signal that is a single bit or that corresponds to the last bit of a multi-bit input, it may store the accumulator result for that input signal in the output register.

In step 1170, the memory device may initialize the accumulator and, when the MAC operation is complete, end the operation.

In the memory device according to one embodiment, compared with a comparative example that implements a 128 Kb crossbar array structure using 10 or 12 transistors, the number of transistors required to implement the multiplication function and the total number of transistors required may be improved and/or reduced by 30% or more.

FIG. 12 shows an implementation example of a multiplier cell according to an embodiment.

According to one embodiment, the electronic device 1200 may include a High Density (HD) In-Memory Computing (IMC) macro 1210, a CPU 1220, a RAM 1230, a logic block 1240, and a High Efficiency (HE) IMC macro 1250.

The HD IMC macro 1210 may represent a memory macro unit in which the multiplier cells described above with reference to FIGS. 1 to 11 are arranged. The HD IMC macro 1210 may have high memory density and high memory capacity. The HD IMC macro 1210 may have a structure in which the multiplier cells described above are arranged in a crossbar form. Because a plurality of memory cells are integrated into one multiplier cell, the number of transistors required to manufacture the memory macro unit may be reduced.

The CPU 1220 may include a High Speed (HS) IMC macro 1221. The HS IMC macro 1221 may have high throughput and operating speed, and may have a register-file-type cell structure.

The RAM 1230 may include memory for use as system memory.

The logic block 1240 may include logic circuits used for various logic operations.

The High Efficiency (HE) IMC macro 1250 may have high energy efficiency and may operate at a low supply voltage.

The electronic device 1200 according to one embodiment may be implemented, for example, as a dedicated hardware accelerator for an AI algorithm (e.g., face recognition).

The embodiments described above may be implemented with hardware components, software components, and/or a combination of hardware components and software components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and software applications that run on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used, but those of ordinary skill in the art will understand that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may command a processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on a computer-readable recording medium.

The method according to the embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may store program instructions, data files, data structures, and the like, alone or in combination, and the program instructions recorded on the medium may be specially designed and constructed for the embodiment or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, those of ordinary skill in the art may apply various technical modifications and variations based on them. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A memory device comprising:
a multiplier cell including:
a memory cell in which a weight is set, the memory cell having a pair of inverters connected in opposite directions, a first transistor connected to one end of the pair of inverters, and a second transistor connected to the other end of the pair of inverters; and
a switching element connected to an output terminal of the memory cell and configured to perform switching in response to an input value so as to output a signal corresponding to a multiplication result between the input value and the weight.

The memory device of claim 1,
wherein the switching element is connected between a supply voltage and the output terminal of the memory cell,
is turned off when receiving a logic value of 1 as the input value, and
is turned on when receiving a logic value of 0 as the input value.

The memory device of claim 1,
wherein the switching element comprises a pull-up transistor configured to receive the input value at a gate terminal.

The memory device of claim 3,
wherein each of the first transistor and the second transistor is an NMOS transistor, and
the pull-up transistor is a PMOS transistor.

The memory device of claim 3,
wherein the memory device selects and performs one of:
a first operation of outputting a multiplication result each time according to a supplied input, the first operation including, for some of a series of multiplication operations, driving the voltage at the output terminal of the pull-up transistor to the supply voltage in response to a voltage lower than the supply voltage being applied through the word line; and
a second operation of, for each multiplication operation, driving the voltage at the output terminal of the pull-up transistor to the supply voltage in a pre-charge phase and performing the multiplication operation in an evaluation phase.

The memory device of claim 5,
wherein the memory device selects one of the first operation or the second operation based on at least one of an operating frequency or leakage of the memory device.

The memory device of claim 1, further comprising:
an adder connected to an output terminal of the multiplier cell and configured to add a value obtained by inverting the signal output from the multiplier cell.

The memory device of claim 1, further comprising:
a global bit line and a switch for accessing the memory cell of the multiplier cell and performing at least one of a read operation or a write operation on the weight of the memory cell.

The memory device of claim 1,
wherein the multiplier cell includes a plurality of memory cells connected to the same pull-up transistor.

The memory device of claim 9, further comprising:
an input-word line driver configured to select, from among the plurality of memory cells, a memory cell to be used for a target multiplication operation.

The memory device of claim 10,
wherein the input-word line driver includes a decoding circuit configured to decode, from an input signal, an input value to be provided to the multiplier cell and a signal designating the memory cell to be used for the target multiplication operation among the plurality of memory cells included in the multiplier cell.

The memory device of claim 9,
wherein the memory device activates a word line connected to a memory cell having a weight corresponding to a target operation among the plurality of memory cells included in one multiplier cell, and deactivates word lines connected to the remaining memory cells.

The memory device of claim 9,
wherein, for a first operation among a plurality of operations, the memory device selects a first memory cell among the plurality of memory cells and outputs a signal corresponding to the multiplication result through the same pull-up transistor, and
for a second operation among the plurality of operations, the memory device selects a second memory cell among the plurality of memory cells and outputs a signal corresponding to the multiplication result through the same pull-up transistor.

The memory device of claim 1,
wherein the memory device comprises a plurality of multiplier cells including the multiplier cell,
performs a multiplication operation in each of the plurality of multiplier cells in parallel with the other multiplier cells, and
sums, in the same adder, the outputs of multiplier cells connected to the same column line among the plurality of multiplier cells.

The memory device of claim 1,
wherein the multiplier cell is connected to a pair of local bit lines,
a first memory cell among a plurality of memory cells included in the multiplier cell is connected to a first local bit line, and
a second memory cell among the plurality of memory cells is connected to a second local bit line.

The memory device of claim 15,
wherein the first memory cell connected to the first local bit line has a value corresponding to a weight, and
the second memory cell connected to the second local bit line has a value obtained by inverting a weight.

The memory device of claim 1, further comprising:
an accumulator configured to store the output of an adder that sums the multiplication results of multiplier cells, and to accumulate the sum results.

The memory device of claim 17, further comprising:
an output register configured to store the final multiplication result output from the accumulator.

The memory device of claim 14,
wherein, when receiving an input signal that is a single bit or that corresponds to the last bit of a multi-bit input, the memory device stores the result of an accumulator operation for the input signal in the output register.

The memory device of claim 1, further comprising:
a memory controller configured to control the multiplier cell, an input-word line driver, a read-write circuit, an adder, an accumulator, and an output register.

The memory device of claim 1,
wherein the memory device performs an operation of precharging the output terminal of the pull-up transistor in response to at least one of a predetermined period having elapsed or a multiplication operation using another memory cell being performed within each multiplier cell.

A method of operating a memory device, the method comprising:
receiving an input value through a word line at a memory cell having two inverters connected in opposite directions and two transistors connected to both ends of the two inverters;
receiving the input value at a gate terminal of a pull-up transistor connected to an output terminal of the memory cell; and
outputting, at an output terminal of the pull-up transistor, a signal corresponding to a multiplication result between the input value and a weight stored in the memory cell.

A memory device comprising:
a pull-up transistor having a gate and connected to an output line; and
a memory cell including a pair of inverters connected in opposite directions, and a cell transistor having a gate and connected to one end of the pair of inverters and to the output line,
wherein an input having the same logic value is applied to the gate of the pull-up transistor and the gate of the cell transistor, and a logic value corresponding to a result of binary multiplication of the input and a binary weight set in the memory cell is output to the output line.

The memory device of claim 23,
wherein the logic value corresponding to the binary multiplication result is a NAND value.

The memory device of claim 23,
wherein the pull-up transistor is a PMOS transistor, and
the cell transistor is an NMOS transistor.

The memory device of claim 23,
wherein the multiplication result is output every clock cycle.

The memory device of claim 23,
wherein the multiplication result is output every two clock cycles.

The memory device of claim 23,
wherein the cell transistor is a first cell transistor,
the memory cell further includes a second cell transistor having a gate and connected to the other end of the pair of inverters, and
an input having the same logic value is applied to the gate of the second cell transistor.

The memory device of claim 23,
wherein the output line is a first output line, and
the memory device further comprises a second output line.

The memory device of claim 29,
wherein the cell transistor is a first cell transistor, and
the memory cell further includes a second cell transistor having a gate and connected to the other end of the pair of inverters and to the second output line.

The memory device of claim 30,
wherein the pull-up transistor is a first pull-up transistor, and
the memory device further comprises a second pull-up transistor connected to the second output line.

The memory device of claim 31,
comprising a plurality of memory cells connected to the first output line and the second output line.

The memory device of claim 23,
comprising a plurality of the memory cells connected to the output line.