CN110597555A - Non-volatile in-memory computing chip and operation control method thereof

Info

Publication number: CN110597555A
Application number: CN201910713399.0A
Authority: CN
Inventors: 康旺; 张和; 潘彪; 赵巍胜
Original assignee: Beijing University of Aeronautics and Astronautics
Current assignee: Beihang University
Priority date: 2019-08-02
Filing date: 2019-08-02
Publication date: 2019-12-20
Anticipated expiration: 2039-08-02
Also published as: CN110597555B

Abstract

The present invention provides a non-volatile in-memory computing chip and an operation control method thereof. The non-volatile in-memory computing chip includes: a cache module for caching data; a non-volatile in-memory computing module, connected to the cache module, for performing operations on the data sent by the cache module; and a post-processing module, connected to the non-volatile in-memory computing module, for post-processing the operation results of the non-volatile in-memory computing module. The non-volatile in-memory computing module includes: a non-volatile memory cell array, a row/column decoder connected to the non-volatile memory cell array, and a read/write circuit connected to the non-volatile memory cell array. With this chip operated according to the operation control method, multiply-accumulate operations and binary neural network operations are realized based on compute-in-memory technology, without transferring data between a memory and a processor, which reduces power consumption and latency.

Description

Non-volatile in-memory computing chip and operation control method thereof

Technical Field

The present invention relates to the technical field of semiconductor integrated circuit applications, and in particular to a non-volatile in-memory computing chip and an operation control method thereof.

Background

With the introduction of deep learning theory and the improvement of numerical computing equipment, deep neural network technology has developed rapidly and is widely used in fields such as computer vision and natural language processing. Neural networks today generally use floating-point computation, which requires large storage space and long computation time.

A binary neural network (BNN) is a neural network obtained from a floating-point neural network by binarizing both the weight values in its weight matrices and its activation values (feature values), that is, by binarizing each weight value and activation value to 1 or -1. Binarization makes the model parameters occupy less storage space (memory consumption is theoretically reduced to 1/32 of the original, from 32-bit floating point to 1 bit), and bitwise operations replace the multiply-add operations in the network, which greatly reduces computation time and power consumption. A binary neural network can therefore address the problems that arise when current floating-point neural network models are applied in embedded or mobile scenarios (for example mobile phones, wearable devices, and self-driving cars), such as excessive model size and excessive computational density. It effectively reduces storage occupation and computation time, and with its potential advantages of a high model compression rate and fast computation, it has become a popular research direction in deep learning in recent years.
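The binarization and the 1/32 storage ratio described above can be illustrated with a minimal Python sketch. The function names `binarize` and `pack_bits`, the sample weights, and the choice to map 0 to +1 are assumptions made for illustration; the source does not specify them.

```python
# Illustrative sketch only: names and sample values are hypothetical.

def binarize(values):
    """Map each real value to +1 or -1 by its sign (0 is treated as +1 here)."""
    return [1 if v >= 0 else -1 for v in values]

def pack_bits(signs):
    """Encode +1 as bit 1 and -1 as bit 0, packing all bits into one integer."""
    word = 0
    for s in signs:
        word = (word << 1) | (1 if s == 1 else 0)
    return word

weights = [0.7, -1.2, 0.05, -0.3, 2.4, -0.9, 0.0, 1.1]
signs = binarize(weights)
packed = pack_bits(signs)

float_bits = 32 * len(weights)   # storage as 32-bit floating point
binary_bits = 1 * len(weights)   # storage after binarization, 1 bit per weight
ratio = float_bits // binary_bits

print(signs)        # binarized weights
print(bin(packed))  # the same weights as a packed bit string
print(ratio)        # theoretical storage reduction factor
```

Packing the signs into bits is what allows bitwise operations to stand in for multiplications later on.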

However, although a binary neural network occupies less storage space and requires less computation time than a floating-point neural network, it still needs to transfer data between the memory and the processor, and this frequent data movement still causes high power consumption and latency.

Summary of the Invention

In view of the problems in the prior art, the present invention provides a non-volatile in-memory computing chip and an operation control method thereof, which can at least partially solve the problems existing in the prior art.

To achieve the above object, the present invention adopts the following technical solutions:

In a first aspect, a non-volatile in-memory computing chip is provided, including:

a cache module for caching data;

a non-volatile in-memory computing module, connected to the cache module, for performing operations on data sent by the cache module;

a post-processing module, connected to the non-volatile in-memory computing module, for post-processing the operation results of the non-volatile in-memory computing module;

wherein the non-volatile in-memory computing module includes: a non-volatile memory cell array, a row/column decoder connected to the non-volatile memory cell array, and a read/write circuit connected to the non-volatile memory cell array.

Further, the cache module includes a first cache unit and a second cache unit.

The first cache unit is connected to the front end of the non-volatile in-memory computing module and is used to receive and cache input data and feature map data.

The second cache unit is connected to the non-volatile in-memory computing module and is used to cache weight data.

Further, the row/column decoder includes a row decoder and a column decoder, and the non-volatile memory cell array includes a plurality of non-volatile memory cells arranged in an array.

Each column of non-volatile memory cells is connected to the column decoder through a bit line, each row of non-volatile memory cells is connected to the row decoder through a word line, and the bit line and the source line of each row of non-volatile memory cells are both connected to the read/write circuit.

Further, the non-volatile memory cell includes a non-volatile memory device and a three-terminal switching element connected in series.

One end of the non-volatile memory device is connected to the bit line and the other end is connected to the first terminal of the three-terminal switching element; the second terminal of the three-terminal switching element is connected to the word line, and the third terminal of the three-terminal switching element is connected to the source line.

Alternatively, each column of non-volatile memory cells is connected to the column decoder through a bit line, each row of non-volatile memory cells is connected to the row decoder through a source line, and the bit line and the source line of each row of non-volatile memory cells are both connected to the read/write circuit.

Further, the non-volatile memory cell includes a non-volatile memory device and a two-terminal switching element connected in series.

One end of the series branch formed by the non-volatile memory device and the two-terminal switching element is connected to the bit line, and the other end is connected to the source line.

Further, the chip also includes an amplifier connected to each bit line, which compares the total analog current/voltage on each bit line with reference information and outputs the operation result of the non-volatile in-memory computing module.

Further, the chip also includes a counter connected to the read/write circuit; the output of the counter serves as the operation result of the non-volatile in-memory computing module.

Further, the non-volatile memory cell is a resistive memory cell, a phase-change memory cell, a ferroelectric memory cell, or a spin memory cell.

In a second aspect, a control method for implementing a multiply-accumulate operation based on non-volatile in-memory computing is provided, including:

storing a first binary operation signal in a row of non-volatile memory cells, each non-volatile memory cell storing one bit of the first binary operation signal;

loading a second binary operation signal onto the row of non-volatile memory cells, such that the bits of the first binary operation signal and the second binary operation signal that correspond to each other in the multiply-accumulate operation are applied to the same non-volatile memory cell;

loading an XNOR operation instruction onto the row of non-volatile memory cells, so that the row of non-volatile memory cells, in response to the XNOR operation instruction, performs an XNOR operation on the corresponding bits of the first binary operation signal and the second binary operation signal and stores the operation result in the corresponding non-volatile memory cell;

reading and accumulating the data in each non-volatile memory cell of the row to obtain the multiply-accumulate result of the bits of the first binary operation signal and the second binary operation signal.
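The four steps of this method can be mirrored in a small software model. The class `NvmRow` and its method names are hypothetical and only imitate the logical behavior of a row; they say nothing about the actual circuit, timing, or voltages.

```python
class NvmRow:
    """Toy logical model of one row of non-volatile cells, one bit per cell.

    Hypothetical illustration; not the patent's circuit.
    """

    def __init__(self, width):
        self.cells = [0] * width

    def store(self, bits):
        # Step 1: store the first binary operation signal, one bit per cell.
        if len(bits) != len(self.cells):
            raise ValueError("operand width must match row width")
        self.cells = list(bits)

    def apply_xnor(self, bits):
        # Steps 2 and 3: apply the second signal bit by bit; each cell
        # replaces its content with XNOR(stored bit, applied bit).
        if len(bits) != len(self.cells):
            raise ValueError("operand width must match row width")
        self.cells = [1 - (b ^ a) for b, a in zip(self.cells, bits)]

    def read_and_accumulate(self):
        # Step 4: read every cell and accumulate the stored results.
        return sum(self.cells)


row = NvmRow(4)
row.store([1, 0, 1, 1])        # first binary operation signal
row.apply_xnor([0, 1, 0, 1])   # second binary operation signal
result = row.read_and_accumulate()
print(row.cells, result)
```

Note that the XNOR result overwrites the stored operand, as the method states, so the first signal would need to be rewritten before the row is reused with it.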

In a third aspect, a control method for implementing binary neural network operations based on non-volatile in-memory computing is provided, including:

storing at least one binary weight signal in at least one row of non-volatile memory cells, each non-volatile memory cell storing one bit of the binary weight signal;

loading a feature signal onto the row of non-volatile memory cells, such that the bits of the binary weight signal and the feature signal that correspond to each other in the multiply-accumulate operation are applied to the same non-volatile memory cell;

loading an XNOR operation instruction onto the row of non-volatile memory cells, so that the row of non-volatile memory cells, in response to the XNOR operation instruction, performs an XNOR operation on the corresponding bits of the binary weight signal and the feature signal and stores the operation result in the corresponding non-volatile memory cell;

reading and accumulating the data in each non-volatile memory cell of the row to obtain the multiply-accumulate result of the bits of the binary weight signal and the feature signal.

Further, the method also includes:

caching the multiply-accumulate result as the feature signal of the next layer.

Further, the method also includes:

post-processing the multiply-accumulate result to obtain the binary neural network operation result.
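As a rough sketch of how these steps chain across layers, the following Python model computes one XNOR accumulation per weight row and feeds a post-processed result to the next layer. The post-processing used here (a threshold at half the row width, acting as a sign-like activation) is an assumption chosen for illustration; the source lists several possible post-processing operations without fixing one. All names and weight values are hypothetical.

```python
def xnor_popcount(weight_bits, feature_bits):
    """Accumulate the XNOR of corresponding weight and feature bits."""
    return sum(1 - (w ^ f) for w, f in zip(weight_bits, feature_bits))

def bnn_layer(weight_rows, feature_bits):
    """One layer: one multiply-accumulate result per stored weight row,
    followed by an assumed threshold post-processing step."""
    n = len(feature_bits)
    counts = [xnor_popcount(row, feature_bits) for row in weight_rows]
    # Assumed post-processing: output 1 when at least half of the bits agree.
    return [1 if 2 * c >= n else 0 for c in counts]

layer1 = [[1, 0, 1, 1], [0, 1, 0, 1], [1, 1, 0, 0]]
layer2 = [[1, 1, 0], [0, 1, 1]]

x = [0, 1, 0, 1]                 # input feature signal
hidden = bnn_layer(layer1, x)    # cached as the next layer's feature signal
output = bnn_layer(layer2, hidden)
print(hidden, output)
```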

The embodiments of the present invention provide a non-volatile in-memory computing chip and an operation control method thereof. The non-volatile in-memory computing chip includes: a cache module for caching data; a non-volatile in-memory computing module, connected to the cache module, for performing operations on the data sent by the cache module; and a post-processing module, connected to the non-volatile in-memory computing module, for post-processing the operation results of the non-volatile in-memory computing module. The non-volatile in-memory computing module includes: a non-volatile memory cell array, a row/column decoder connected to the non-volatile memory cell array, and a read/write circuit connected to the non-volatile memory cell array. With this chip operated according to the operation control method, multiply-accumulate operations and binary neural network operations are realized based on compute-in-memory technology, without transferring data between a memory and a processor, which reduces power consumption and latency.

To make the above and other objects, features, and advantages of the present invention more apparent and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:

Fig. 1 is a first structural block diagram of a non-volatile in-memory computing chip in an embodiment of the present invention;

Fig. 2 shows the structure of the non-volatile in-memory computing module 20 in Fig. 1;

Fig. 3 is a second structural block diagram of a non-volatile in-memory computing chip in an embodiment of the present invention;

Fig. 4 shows one structure of the non-volatile memory cell in Fig. 2;

Fig. 5 shows a structure of a non-volatile memory cell array based on the cell shown in Fig. 4;

Fig. 6a shows another structure of the non-volatile memory cell in Fig. 2;

Fig. 6b shows a third structure of the non-volatile memory cell in Fig. 2;

Fig. 7 shows another structure of a non-volatile memory cell array, based on the cell shown in Fig. 6b;

Figs. 8a to 8c show the operation logic of three kinds of non-volatile memory cells provided by the embodiments of the present invention;

Fig. 9 shows the truth table for implementing the XNOR operation or the XOR operation with the logic shown in Figs. 8a to 8c;

Fig. 10a shows a circuit structure that uses the non-volatile memory cell array shown in Fig. 5 to implement the XNOR or XOR logic operation;

Fig. 10b shows another circuit structure that uses the non-volatile memory cell array shown in Fig. 5 to implement the XNOR or XOR logic operation;

Fig. 11a shows a circuit structure that uses the non-volatile memory cell array shown in Fig. 7 to implement the XNOR or XOR logic operation;

Fig. 11b shows another circuit structure that uses the non-volatile memory cell array shown in Fig. 7 to implement the XNOR or XOR logic operation;

Fig. 12 shows the specific structure of the post-processing module 30 in Fig. 1;

Fig. 13 is a first flowchart of the control method for implementing multiply-accumulate operations based on non-volatile in-memory computing in an embodiment of the present invention;

Fig. 14 is a second flowchart of the control method for implementing multiply-accumulate operations based on non-volatile in-memory computing in an embodiment of the present invention;

Fig. 15 is a flowchart of the control method for implementing binary neural network operations based on non-volatile in-memory computing in an embodiment of the present invention;

Fig. 16 shows a neural network computing architecture.

Detailed Description

To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort shall fall within the scope of protection of the present application.

The detailed features and advantages of the present invention are described in the embodiments below in sufficient detail to enable any person skilled in the art to understand the technical content of the present invention and implement it accordingly. From the content disclosed in this specification, the claims, and the drawings, any person skilled in the art can readily understand the related objects and advantages of the present invention. The following embodiments further illustrate aspects of the present invention in detail, but do not limit its scope in any way.

It should be noted that, where no conflict arises, the embodiments in the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

At present, although a binary neural network occupies less storage space and requires less computation time than a floating-point neural network, it still needs to transfer data between the memory and the processor, and this frequent data movement still causes high power consumption and latency.

To at least partially solve the above technical problem in the prior art, an embodiment of the present invention provides a non-volatile in-memory computing chip that integrates storage and computation on the same chip and implements multiply-accumulate operations and binary neural network operations based on compute-in-memory technology, so that computation is performed directly in the memory, data transfer between the memory and the processor is reduced, and power consumption and latency are lowered.

Fig. 1 is a first structural block diagram of a non-volatile in-memory computing chip in an embodiment of the present invention. As shown in Fig. 1, the non-volatile in-memory computing chip includes: a cache module 10, a non-volatile in-memory computing module 20, and a post-processing module 30.

The cache module 10 is used to receive and cache input data, and may also be used to output data. The cached data may be input data, intermediate operation results of the non-volatile in-memory computing module 20, or calculation results output by the post-processing module, for example input data, weight data, and feature map data.

Specifically, the cache module 10 may be implemented with SRAM or MRAM.

The non-volatile in-memory computing module 20 is connected to the cache module and is used to perform operations on the data sent by the cache module.

The non-volatile in-memory computing module 20 can store data and can also, based on its non-volatile characteristics, implement AND, OR, XOR, and XNOR logic operations, multiply-accumulate (MAC) operations, and the like.

The post-processing module (Post-processing Engine) 30 is connected to the non-volatile in-memory computing module and is used to post-process the operation results of the non-volatile in-memory computing module.

Specifically, the post-processing may include operations such as pooling, batch normalization, shifting, biasing, averaging, taking maximum and minimum values, and activation functions.

The non-volatile in-memory computing module 20 includes: a non-volatile memory cell array 21, a row/column decoder 23 connected to the non-volatile memory cell array, and a read/write circuit 22 connected to the non-volatile memory cell array. It may also include a MAC peripheral circuit 24 (such as a counter or an amplifier) connected to the non-volatile memory cell array 21; see Fig. 2.

Specifically, the non-volatile memory cell array 21 may be RRAM, PCRAM, MRAM, or the like.

It is worth noting that, in the non-volatile in-memory computing chip provided by the embodiments of the present invention, the cache module receives or caches data, and the non-volatile in-memory computing module 20 is controlled to perform logic operations on the data to be processed. The post-processing module 30 processes the operation results and sends them to the cache module, either for output or to take part in the next round of operations. In this way, operations such as multiply-accumulate operations or binary neural network operations can be realized based on compute-in-memory technology, without transferring data between a memory and a processor, which reduces power consumption and latency.

In an optional embodiment, referring to Fig. 3, the cache module 10 may include a first cache unit 11 and a second cache unit 12.

The first cache unit 11 is connected to the front end of the non-volatile in-memory computing module 20 and is used to receive and cache input data and feature map data; the second cache unit 12 is connected to the non-volatile in-memory computing module and is used to cache weight data.

Providing two cache units that cache different data separately increases the speed of cache reads and improves the flexibility of the non-volatile in-memory computing chip.

In an optional embodiment, the cache module 10 may also be connected to a non-volatile off-chip memory (which may be conventional Flash or a hard disk, or an emerging non-volatile memory such as RRAM, MRAM, or PCRAM). This increases the capacity and access speed of off-chip storage and prevents severe overflow of cached data from affecting computation during large-scale operations.

In an optional embodiment, the row/column decoder includes a row decoder and a column decoder, and the non-volatile memory cell array includes a plurality of non-volatile memory cells arranged in an array. Each column of non-volatile memory cells is connected to the column decoder through a bit line BL, each row of non-volatile memory cells is connected to the row decoder through a word line WL, and the bit line BL and the source line SL of each row of non-volatile memory cells are both connected to the read/write circuit.

In a further embodiment, the non-volatile memory cell includes a non-volatile memory device R and a three-terminal switching element T1 connected in series (a 1T1R structure for short); see Fig. 4.

One end of the non-volatile memory device R is connected to the bit line BL and the other end is connected to the first terminal of the three-terminal switching element T1; the second terminal of T1 is connected to the word line WL, and the third terminal of T1 is connected to the source line SL. The structure of the non-volatile memory cell array formed by arranging such cells in an array is shown in Fig. 5.

The switching element may be implemented with a PMOS transistor or an NMOS transistor. The first terminal may be the drain of the MOS transistor, the second terminal may be the gate, and the third terminal may be the source.

Of course, the first terminal of the transistor provided in the embodiments of the present invention may instead be the source, in which case the third terminal is the drain. The present invention does not limit this, and the choice can be made according to the type of transistor.

In another optional embodiment, the row/column decoder includes a row decoder and a column decoder, and the non-volatile memory cell array includes a plurality of non-volatile memory cells arranged in an array.

Each column of non-volatile memory cells is connected to the column decoder through a bit line, each row of non-volatile memory cells is connected to the row decoder through a source line, and the bit line and the source line of each row of non-volatile memory cells are both connected to the read/write circuit.

In a further embodiment, the non-volatile memory cell includes a non-volatile memory device R and a two-terminal switching element T2 or T3 connected in series (also referred to as a 1T1R structure); see Fig. 6a and Fig. 6b.

One end of the series branch formed by the non-volatile memory device and the two-terminal switching element is connected to the bit line BL, and the other end is connected to the source line SL. The structure of the non-volatile memory cell array formed by arranging such cells in an array (a cross-point array for short) is shown in Fig. 7.

In the 1T1R cell structure, the state of the non-volatile memory device (usually one of two states: a low-resistance state representing logic 0 and a high-resistance state representing logic 1, or vice versa) depends on the voltage difference between the bit line BL and the source line SL (note: a high voltage represents 1 and a low voltage represents 0). When the voltage difference between BL and SL exceeds a certain threshold, the state of the non-volatile memory device flips, regardless of its current state; when the voltage difference does not exceed the threshold, the device keeps its initial state. Unidirectional RRAM devices and electric-field-controlled MRAM devices, for example, behave in this way. Based on this principle, the inventors found through extensive research and analysis that the 1T1R structure described above can be used to implement the XNOR operation. As shown in Figs. 8a, 8b, and 8c, let the voltage on BL be the input operand A, the voltage on SL be the input operand C, and the data currently stored in the non-volatile memory device be the operand Bi. The resulting truth table is shown in Fig. 9: when C = 0, Bi and A undergo an exclusive-OR (XOR) operation, and when C = 1, Bi and A undergo an exclusive-NOR (XNOR) operation.
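The flip rule described above can be checked with a short Python sketch that enumerates all input combinations. It assumes that the BL-SL voltage difference exceeds the threshold exactly when the logic levels A and C differ; the function name `cell_next_state` is hypothetical, and the table is derived from the text rather than copied from Fig. 9.

```python
def cell_next_state(b, a, c):
    """Next stored bit of a cell holding b, with BL at level a and SL at level c.

    Assumption: the cell flips only when BL and SL are at different levels.
    """
    return b ^ 1 if a != c else b

# Enumerate every (C, A, Bi) combination and the resulting stored bit.
truth_table = [
    (c, a, b, cell_next_state(b, a, c))
    for c in (0, 1)
    for a in (0, 1)
    for b in (0, 1)
]

for c, a, b, new_b in truth_table:
    # C = 0 should give XOR(A, Bi); C = 1 should give XNOR(A, Bi).
    expected = (a ^ b) if c == 0 else 1 - (a ^ b)
    assert new_b == expected
    print(f"C={c} A={a} Bi={b} -> Bi'={new_b}")
```

Under this assumption the source-line level C acts as a mode select between XOR and XNOR, which matches the behavior the text attributes to the truth table.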

Through extensive research, the inventors found that the MAC operation on two vectors is equivalent to accumulating the XNOR of the corresponding elements of the two vectors. For example, for the sequences A = [0101] and B = [1011], A×B = a1×b1 + a2×b2 + a3×b3 + a4×b4 = a1⊙b1 + a2⊙b2 + a3⊙b3 + a4⊙b4 = 1. Here ⊙ denotes XNOR, and XNOR is the negation of XOR (exclusive OR, ⊕).
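The worked example can be reproduced in a few lines of Python. For the 0/1 example in the text both sums evaluate to 1. The sketch also checks the relation that is commonly used for the ±1 encoding described in the background section: if bit 1 stands for +1 and bit 0 for -1, the signed dot product equals 2 × (XNOR count) − N. That mapping is standard BNN arithmetic and is added here as an assumption; the source text does not state it. All function names are hypothetical.

```python
from itertools import product

def xnor_accumulate(a_bits, b_bits):
    """Sum of XNOR over corresponding bits."""
    return sum(1 - (a ^ b) for a, b in zip(a_bits, b_bits))

def signed_dot(a_bits, b_bits):
    """Dot product after mapping bit 1 -> +1 and bit 0 -> -1 (assumed encoding)."""
    sign = lambda bit: 1 if bit else -1
    return sum(sign(a) * sign(b) for a, b in zip(a_bits, b_bits))

A = [0, 1, 0, 1]
B = [1, 0, 1, 1]

count = xnor_accumulate(A, B)                 # XNOR accumulation from the text
plain = sum(a * b for a, b in zip(A, B))      # 0/1 multiply-accumulate
dot = signed_dot(A, B)                        # +1/-1 multiply-accumulate
print(count, plain, dot)

# Check the +1/-1 relation for every pair of 4-bit vectors.
for a_bits in product((0, 1), repeat=4):
    for b_bits in product((0, 1), repeat=4):
        assert signed_dot(a_bits, b_bits) == 2 * xnor_accumulate(a_bits, b_bits) - 4
```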

Based on the above principle, for the non-volatile memory cell arrays shown in FIG. 5 and FIG. 7, the XNOR operation on two vectors can be implemented by adding peripheral circuits.

FIG. 10a shows a circuit structure that uses the non-volatile memory cell array of FIG. 5 to implement XNOR or XOR logic operations. As shown in FIG. 10a, each bit line is connected to an amplifier (corresponding to a read-write unit of the read-write circuit), and the output of each amplifier is connected to a counter. The counter counts the data representing 1 among the data read, and the count serves as the operation result of the non-volatile in-memory computing module.

The operation data B = {b₁, b₂, …, b_M} (for neural network operations, B corresponds to the weight data of a layer; for convolutional neural network operations, B corresponds to the convolution kernel data) can be stored in a row of non-volatile memory cells (by the read-write circuit working with the row-column decoder to control the voltage difference between the bit lines and the source lines). The signal representing A = {a₁, a₂, …, a_M} is loaded onto the bit lines of that row of non-volatile memory cells, and the XNOR instruction {1, 1, …, 1} is loaded onto the source lines of the non-volatile memory cells. When all bit lines are high, an XNOR operation is performed in each cell under the action of A, so that each row performs the XNOR of A and B. Finally, by gating the different word lines WL, the cells of each row that are in state 1 are read and accumulated by the counter, which implements the MAC operation.

Those skilled in the art will understand that the XOR instruction {0, 0, …, 0} can instead be loaded onto the source lines of the non-volatile memory cells. Under the action of A, an XOR operation is performed in each cell, so that each row performs the XOR of A and B. Finally, by gating the different word lines WL, the cells of each row that are in state 0 are read and accumulated by the counter, which likewise implements the MAC operation.
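The two counter-based variants just described (XNOR instruction with counting of 1s, XOR instruction with counting of 0s) can be compared in a short behavioural sketch. The function name `row_mac` is hypothetical, and the cell rule (the stored bit flips when the BL and SL levels differ) is an assumed simplification of the device behaviour.

```python
def row_mac(stored_b, input_a, instruction_bit):
    """Model one row: A on the bit lines, a uniform instruction on the source lines.

    instruction_bit = 1 models the XNOR instruction {1,...,1}: count cells in state 1.
    instruction_bit = 0 models the XOR instruction {0,...,0}: count cells in state 0.
    """
    assert len(stored_b) == len(input_a)
    # Assumed cell rule: the stored bit flips when BL and SL levels differ.
    new_states = [b ^ (a ^ instruction_bit) for b, a in zip(stored_b, input_a)]
    target = 1 if instruction_bit == 1 else 0
    # The counter of FIG. 10a is modelled as a simple count of the target state.
    return sum(1 for s in new_states if s == target)


B = [1, 0, 1, 1]   # stored operation data
A = [0, 1, 0, 1]   # data applied on the bit lines
print(row_mac(B, A, 1))  # XNOR instruction, count ones
print(row_mac(B, A, 0))  # XOR instruction, count zeros
```

In this model both calls return the same count, which is the reason either instruction can be used as long as the counted state is chosen to match.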

FIG. 10b shows another circuit structure that uses the non-volatile memory cell array of FIG. 5 to implement XNOR or XOR logic operations. As shown in FIG. 10b, an amplifier is connected to the bit lines and compares the total analog current/voltage on the bit lines with reference information to output the operation result of the non-volatile in-memory computing module.

The operation data B = {b₁, b₂, …, b_M} (for neural network operations, B corresponds to the weight data of a layer; for convolutional neural network operations, B corresponds to the convolution kernel data) can be stored in a row of non-volatile memory cells (by the read-write circuit working with the row-column decoder to control the voltage difference between the bit lines and the source lines). The signal representing A = {a₁, a₂, …, a_M} is loaded onto the bit lines of that row of non-volatile memory cells, and the XNOR instruction {1, 1, …, 1} is loaded onto the source lines of the non-volatile memory cells. When all bit lines are high, an XNOR operation is performed in each cell under the action of A, so that each row performs the XNOR of A and B. The total analog current/voltage of all cells is then read, the amplifier compares this total with a reference signal, and the comparison result is taken as the result of the MAC operation.
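The analog readout can be pictured with a toy current-summing model. Everything numeric here is an assumption made for illustration: the function name `analog_compare`, the per-cell read currents (10 µA for a cell in state 1, 1 µA for a cell in state 0), and the choice of reference are hypothetical, and real values depend on the device and the sensing circuit.

```python
def analog_compare(states, reference_ua, i_on_ua=10, i_off_ua=1):
    """Sum per-cell read currents (in microamps) and compare with a reference.

    Assumed, illustrative model: a cell in state 1 contributes i_on_ua and a
    cell in state 0 contributes i_off_ua. Returns the 1-bit comparator output.
    """
    total_ua = sum(i_on_ua if s == 1 else i_off_ua for s in states)
    return 1 if total_ua > reference_ua else 0


# Cell states after the in-cell XNOR of B = [1, 0, 1, 1] and A = [0, 1, 0, 1].
states = [0, 0, 0, 1]
# Hypothetical reference placed at the total current of a half-matching row.
reference = 2 * 10 + 2 * 1
print(analog_compare(states, reference))        # one match out of four
print(analog_compare([1, 1, 1, 0], reference))  # three matches out of four
```

Compared with the counter of FIG. 10a, this readout yields a single thresholded bit rather than an exact count.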

FIG. 11a shows a circuit structure that uses the non-volatile memory cell array of FIG. 7 to implement XNOR or XOR logic operations. FIG. 11b shows another circuit structure that uses the non-volatile memory cell array of FIG. 7 for the same purpose. For the operating principle and circuit description, refer to FIG. 10a and FIG. 10b; the details are not repeated here.

FIG. 12 shows the specific structure of the post-processing module 30 in FIG. 1. Referring to FIG. 12, the post-processing module 30 includes multiple PE channels, each implementing a post-processing function assembled from a different combination of operations. For example, PE channel 1 includes: nonlinear function + pooling + batch normalization + activation function. Different PE channels are implemented by combining different kinds of operations in the required order.

It is worth noting that the post-processing module is a common technique in the art and is not described further here.

It is worth noting that the non-volatile memory cells used in the embodiments of the present invention are preferably resistive memory cells, phase-change memory cells, ferroelectric memory cells, spin memory cells, or the like.

The non-volatile in-memory computing chip may further include a controller for controlling the state and timing of the entire chip.

FIG. 13 shows a first flowchart of the control method for implementing a multiply-accumulate operation based on non-volatile in-memory computing in an embodiment of the present invention. As shown in FIG. 13, this control method can be used to control the non-volatile in-memory computing chip described above to perform the multiply-accumulate operation.

The control method for implementing a multiply-accumulate operation based on non-volatile in-memory computing may include the following:

Step S100: Store the first binary operation signal in a row of non-volatile memory cells.

Here, each non-volatile memory cell stores one bit of the first binary operation signal.

Specifically, the row-column decoder and the read-write circuit cooperate to control the voltage difference between the bit line and the source line of each non-volatile memory cell, writing each binary bit into one non-volatile memory cell.

It is worth noting that the first binary operation signal represents the first binary operation data.

Step S200: Load the second binary operation signal onto the row of non-volatile memory cells.

Here, the bits of the first binary operation signal and the second binary operation signal that correspond to each other in the multiply-accumulate operation are applied to the same non-volatile memory cell.

Specifically, the row-column decoder and the read-write circuit cooperate to configure the bit lines of the corresponding memory cells in the row according to the second binary operation signal.

It is worth noting that the second binary operation signal represents the second binary operation data.

Step S300: Load the XNOR instruction onto the row of non-volatile memory cells, so that, in response to the XNOR instruction, the row performs the XNOR of the corresponding bits of the first binary operation signal and the second binary operation signal and stores the result in the corresponding non-volatile memory cell.

Here, the XNOR instruction can be set to all 1s or all 0s, configured according to the circuit; the XOR instruction is simply the opposite of the XNOR instruction.

Specifically, the row-column decoder and the read-write circuit cooperate to configure the row of non-volatile memory cells according to the XNOR instruction.

Step S400: Read and accumulate the data in each non-volatile memory cell of the row to obtain the multiply-accumulate result of the bits of the first binary operation signal and the second binary operation signal.

Specifically, the row-column decoder and the read-write circuit can cooperate to read the data in each non-volatile memory cell of the row, and a counter counts the cells read that are in a particular state, which implements the MAC operation. Referring to FIG. 10a and FIG. 11a: first, the data in each non-volatile memory cell of the row is read; then, the data representing 1 among the data read is counted; finally, the count is taken as the multiply-accumulate result of the bits of the first binary operation signal and the second binary operation signal.

Alternatively, the total analog current/voltage of all cells can be read and compared by an amplifier with a reference signal, with the comparison result taken as the result of the MAC operation. Referring to FIG. 10b and FIG. 11b: first, the total analog current/voltage of all non-volatile memory cells in the row is read; then, the total analog current/voltage is compared with a first reference signal; finally, the comparison result is taken as the multiply-accumulate result of the bits of the first binary operation signal and the second binary operation signal.
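Steps S100 to S400 can be walked through with a toy software model of one row. This is a sketch under stated assumptions, not the chip's actual control logic: the class `NvmRow` and its method names are hypothetical, the cell rule (flip when the BL and SL levels differ) is an assumed simplification, and only the counter-based readout is modelled.

```python
class NvmRow:
    """Toy model of one row of non-volatile memory cells (hypothetical names)."""

    def __init__(self, width):
        self.cells = [0] * width         # stored bits, one per cell
        self.bit_lines = [0] * width     # logic levels driven on BL
        self.source_lines = [0] * width  # logic levels driven on SL

    def store(self, bits):               # step S100
        assert len(bits) == len(self.cells)
        self.cells = list(bits)

    def load_input(self, bits):          # step S200
        assert len(bits) == len(self.cells)
        self.bit_lines = list(bits)

    def apply_instruction(self, bit):    # step S300
        self.source_lines = [bit] * len(self.cells)
        # Assumed cell rule: flip when BL and SL differ; the result stays in the cell.
        self.cells = [b ^ (a ^ c) for b, a, c in
                      zip(self.cells, self.bit_lines, self.source_lines)]

    def read_and_count(self, value=1):   # step S400, counter-based readout
        return sum(1 for b in self.cells if b == value)


row = NvmRow(4)
row.store([1, 0, 1, 1])        # first binary operation data
row.load_input([0, 1, 0, 1])   # second binary operation data
row.apply_instruction(1)       # all-ones XNOR instruction
result = row.read_and_count(1)
print(result)
```

Note that after `apply_instruction` the row holds the XNOR results rather than the original operand, which reflects the in-place nature of the operation.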

As can be seen from the above technical solution, the present invention can use the above control method to control the above non-volatile in-memory computing chip to perform the multiply-accumulate operation, without transferring data between the memory and the processor, which reduces power consumption and latency.

FIG. 14 shows a second flowchart of the control method for implementing a multiply-accumulate operation based on non-volatile in-memory computing in an embodiment of the present invention. As shown in FIG. 14, this control method can be used to control the non-volatile in-memory computing chip described above to perform the multiply-accumulate operation.

The control method shown in FIG. 14 works on the same principle as the control method shown in FIG. 13. The difference is that in step S300' an XOR instruction is loaded onto the row of non-volatile memory cells. Because XNOR is the negation of XOR, when the operation result is read, the data representing 0 among the data read is counted; finally, the count is taken as the multiply-accumulate result of the bits of the first binary operation signal and the second binary operation signal.

Alternatively, when the total analog current/voltage of all cells is read and compared by an amplifier with a reference signal and the comparison result is taken as the result of the MAC operation, the reference signal used differs from the one used for the XNOR operation. The other principles are the same as for the XNOR operation and are not repeated here.
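One way to see why the reference must change is to express the same decision in terms of the number of cells left in state 1. The sketch below is an illustration under an assumption, namely that the comparator decision depends only on how many cells are in state 1; the function names and the integer threshold are hypothetical. With M cells, an XNOR match count of at least T corresponds, under XOR, to at most M − T cells in state 1.

```python
def decision_xnor(stored_b, input_a, threshold):
    """1 if the number of cells in state 1 after XNOR reaches the threshold."""
    ones = sum(1 - (b ^ a) for b, a in zip(stored_b, input_a))
    return 1 if ones >= threshold else 0


def decision_xor(stored_b, input_a, threshold):
    """Same decision taken from XOR results.

    Matches are the cells left in state 0, so the number of cells in state 1
    must not exceed M - threshold: both the reference level and the comparison
    direction differ from the XNOR case.
    """
    m = len(stored_b)
    ones = sum(b ^ a for b, a in zip(stored_b, input_a))
    return 1 if ones <= m - threshold else 0


B = [1, 0, 1, 1]
A = [0, 1, 0, 1]
print(decision_xnor(B, A, 2), decision_xor(B, A, 2))
```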

FIG. 15 shows a flowchart of the control method for implementing binary neural network operations based on non-volatile in-memory computing in an embodiment of the present invention. As shown in FIG. 15, this control method may include the following:

Step S1000: Store at least one binary weight signal (corresponding to Bi) in at least one row of non-volatile memory cells.

It is worth noting that, referring to FIG. 16, a neural network operation comprises multiple layers; each layer performs a MAC operation on input data and weight data, and the result serves as the input of the next layer.

Here, each non-volatile memory cell stores one bit of the binary weight signal.

In addition, in a convolutional neural network one layer may correspond to multiple convolution kernels. In that case, the data of each convolution kernel is treated as one set of weight data, and the sets of weight data for the multiple convolution kernels are written into multiple rows of non-volatile memory cells, so that the operations for the multiple convolution kernels can be carried out simultaneously.

Step S2000: Load the feature signal (corresponding to A) onto the row of non-volatile memory cells.

The bits of the binary weight signal and the feature signal that correspond to each other in the multiply-accumulate operation are applied to the same non-volatile memory cell.

Specifically, the row-column decoder and the read-write circuit cooperate to configure the bit lines of the corresponding memory cells in the row according to the feature signal.

Step S3000: Load the XNOR instruction onto the row of non-volatile memory cells, so that, in response to the XNOR instruction, the row performs the XNOR of the corresponding bits of the binary weight signal and the feature signal and stores the result in the corresponding non-volatile memory cell.

Step S4000: Read and accumulate the data in each non-volatile memory cell of the row to obtain the multiply-accumulate result of the bits of the binary weight signal and the feature signal.

Specifically, the row-column decoder and the read-write circuit can cooperate to read the data in each non-volatile memory cell of the row, and a counter counts the cells of the row that are in a particular state, which implements the MAC operation. Referring to FIG. 10a and FIG. 11a: first, the data in each non-volatile memory cell of the row is read; then, the data representing 1 among the data read is counted; finally, the count is taken as the multiply-accumulate result of the bits of the weight signal and the feature signal.

Alternatively, the total analog current/voltage of all cells can be read and compared by an amplifier with a reference signal, with the comparison result taken as the result of the MAC operation. Referring to FIG. 10b and FIG. 11b: first, the total analog current/voltage of all non-volatile memory cells in the row is read; then, the total analog current/voltage is compared with a first reference signal; finally, the comparison result is taken as the multiply-accumulate result of the bits of the weight signal and the feature signal.
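Steps S1000 to S4000, including the multi-kernel case in which several sets of weight data occupy several rows, can be sketched as follows. The function name `bnn_layer_counts` and the sample data are hypothetical; the model simply applies the same feature bits to every stored row and counts XNOR matches per row, as the counter-based readout would.

```python
def bnn_layer_counts(weight_rows, feature):
    """For each stored weight row, return its XNOR match count with the feature bits."""
    counts = []
    for weights in weight_rows:
        assert len(weights) == len(feature)
        # In-cell XNOR followed by counting the cells in state 1.
        counts.append(sum(1 - (w ^ x) for w, x in zip(weights, feature)))
    return counts


kernels = [
    [1, 0, 1, 1],   # kernel 0, one bit per memory cell
    [0, 1, 0, 1],   # kernel 1
    [1, 1, 1, 1],   # kernel 2
]
feature = [0, 1, 0, 1]
print(bnn_layer_counts(kernels, feature))  # [1, 4, 2]
```

The loop here is sequential for clarity; in the described array the rows are intended to operate on the same input at the same time.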

It is worth noting that, if the weight data itself is stored in the non-volatile memory cell array, the weight data B_i must first be read out to the cache module. During the operation, the result B_{i+1} is stored in the current cell, which means the initial weight data B_i is destroyed; therefore, when performing the MAC operation, B_i must first be copied to the cache module and written back after the operation completes.

In addition, if the weight data is stored in off-chip memory, the weight data is first imported into the non-volatile memory cell array.
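The copy-out and write-back sequence for weights that live in the array can be sketched as below. The function name `mac_preserving_weights` and the use of a dictionary as a stand-in for the cache module are hypothetical choices made for illustration.

```python
def mac_preserving_weights(array, row_index, feature, cache):
    """Run one in-place XNOR MAC on a row, then restore the original weights.

    array: list of rows of stored bits (stands in for the memory cell array)
    cache: dict standing in for the cache module (hypothetical interface)
    """
    cache["weights"] = list(array[row_index])      # copy B_i out first
    array[row_index] = [1 - (w ^ x)                # result B_{i+1} overwrites B_i
                        for w, x in zip(array[row_index], feature)]
    result = sum(array[row_index])                 # counter-based readout
    array[row_index] = list(cache["weights"])      # write B_i back
    return result


memory = [[1, 0, 1, 1]]
cache_module = {}
mac = mac_preserving_weights(memory, 0, [0, 1, 0, 1], cache_module)
print(mac, memory[0])
```

After the call the row again holds the original weights, so the same weights can be reused for the next input.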

As can be seen from the above technical solution, the present invention can use the above control method to control the above non-volatile in-memory computing chip to perform neural network operations; the benefit is especially pronounced for convolutional neural network operations. No data needs to be transferred between the memory and the processor, which reduces power consumption and latency.

In an optional embodiment, the control method for implementing binary neural network operations based on non-volatile in-memory computing may further include:

Caching the multiply-accumulate result as the feature signal of the next layer.

Specifically, the multiply-accumulate result is cached in the cache module for subsequent use.

Post-processing the multiply-accumulate result to obtain the binary neural network operation result.
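How a cached and post-processed result can feed the next layer is sketched below. This is an assumption-laden illustration: the function names are hypothetical, and the post-processing is reduced to a single assumed binarization rule (output 1 when at least half of the bits match), whereas the post-processing module in the text may combine several operations.

```python
def xnor_count(weights, feature):
    """XNOR match count between one weight row and the feature bits."""
    return sum(1 - (w ^ x) for w, x in zip(weights, feature))


def binarize(counts, width):
    """Assumed post-processing: output 1 when at least half of the bits match."""
    return [1 if 2 * c >= width else 0 for c in counts]


def run_network(layers, feature):
    """layers: list of layers, each a list of weight rows; feature: input bits."""
    cached = list(feature)                       # stands in for the cache module
    for weight_rows in layers:
        counts = [xnor_count(w, cached) for w in weight_rows]
        cached = binarize(counts, len(cached))   # becomes the next layer's feature
    return cached


layers = [
    [[1, 0, 1, 1], [0, 1, 0, 1], [1, 1, 1, 1]],  # layer 1: three rows of width 4
    [[0, 1, 1], [1, 0, 0]],                      # layer 2: two rows of width 3
]
print(run_network(layers, [0, 1, 0, 1]))  # [1, 0]
```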

Those skilled in the art will understand that, in the above control method for implementing binary neural network operations based on non-volatile in-memory computing, an XOR instruction can be used in place of the XNOR instruction. Because the results of XOR and XNOR are complements of each other, if the number of 1s in the memory cells is counted when XNOR is used, it suffices to count the number of 0s in the memory cells when XOR is used. The principle is the same as in the method above and is not repeated here.

Specific embodiments have been used herein to explain the principles and implementations of the present invention. The description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

It should also be noted that the terms "comprise", "include" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.

The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant details, refer to the corresponding description of the method embodiment.

The above are merely preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the technical content disclosed above to make minor changes or modifications that result in equivalent embodiments. Any simple modification, equivalent change or refinement made to the above embodiments in accordance with the technical essence of the present invention, without departing from the content of its technical solution, still falls within the scope of the technical solution of the present invention.

Claims

1. A non-volatile in-memory computing chip, characterized by comprising:

A cache module for caching data;

A non-volatile in-memory computing module, connected to the cache module, for performing operations on the data sent by the cache module;

A post-processing module, connected to the non-volatile in-memory computing module, for post-processing the calculation results of the non-volatile in-memory computing module;

Wherein, the non-volatile in-memory computing module includes: a non-volatile memory cell array, a row-column decoder connected to the non-volatile memory cell array, and a read-write circuit connected to the non-volatile memory cell array.

2. The non-volatile in-memory computing chip according to claim 1, wherein the cache module comprises: a first cache unit and a second cache unit,

The first cache unit is connected to the front end of the non-volatile in-memory computing module, and is used to receive and cache input data and feature map data;

The second cache unit is connected to the non-volatile in-memory computing module for caching weight data.

3. The non-volatile in-memory computing chip according to claim 1, wherein the row-column decoder comprises: a row decoder and a column decoder, and the non-volatile memory cell array comprises: a plurality of non-volatile storage units arranged in an array;

Each row of nonvolatile memory cells is connected to the column decoder through a bit line, and each row of nonvolatile memory cells is connected to the row decoder through a word line. The bit line and the source line of each row of nonvolatile memory cells are both connected to the read-write circuit.

4. The non-volatile in-memory computing chip according to claim 3, wherein the non-volatile storage unit comprises: a non-volatile storage device and a three-terminal switch element connected in series;

One end of the non-volatile memory device is connected to the bit line, the other end is connected to the first end of the three-terminal switch element, the second end of the three-terminal switch element is connected to the word line, and the three-terminal switch The third end of the element is connected to the source line.

5. The non-volatile in-memory computing chip according to claim 1, wherein the row-column decoder comprises: a row decoder and a column decoder, and the non-volatile memory cell array comprises: a plurality of non-volatile storage units arranged in an array;

Each row of nonvolatile memory cells is connected to the column decoder through a bit line, and each row of nonvolatile memory cells is connected to the row decoder through a source line. The bit line and the source line of each row of nonvolatile memory cells are both connected to the read-write circuit.

6. The non-volatile in-memory computing chip according to claim 5, wherein the non-volatile storage unit comprises: a non-volatile storage device and a two-terminal switching element connected in series;

One end of the series branch formed by the nonvolatile memory device and the two-terminal switching element is connected to the bit line, and the other end is connected to the source line.

7. The non-volatile in-memory computing chip according to claim 3 or 5, further comprising: an amplifier connected to each bit line, for comparing the total analog current/voltage on the bit lines with reference information and outputting the operation result of the non-volatile in-memory computing module.

8. The non-volatile in-memory computing chip according to claim 3 or 5, further comprising: a counter, wherein the counter is connected to the read-write circuit, and the output of the counter serves as the operation result of the non-volatile in-memory computing module.

9. The non-volatile in-memory computing chip according to claim 4 or 6, wherein the non-volatile storage unit is a resistive storage unit, a phase-change storage unit, a ferroelectric storage unit, or a spin storage unit.

10. A control method based on non-volatile in-memory calculations to realize multiply-accumulate operations, characterized in that it comprises:

storing the first binary operation signal into a row of non-volatile storage units, and storing one bit of the first binary operation signal in each non-volatile storage unit;

loading the second binary operation signal onto the row of non-volatile storage units, wherein the bits of the first binary operation signal and the second binary operation signal that correspond to each other in the multiply-accumulate operation are applied to the same non-volatile storage unit;

loading an XNOR operation instruction onto the row of non-volatile storage units, so that the row of non-volatile storage units, in response to the XNOR operation instruction, performs an XNOR operation on the corresponding bits of the first binary operation signal and the second binary operation signal and stores the operation result in the corresponding non-volatile storage unit;

Reading and accumulating the data in each non-volatile storage unit in the row of non-volatile storage units to obtain a multiplication-accumulation operation result of each bit of the first binary operation signal and the second binary operation signal.

11. A control method based on non-volatile in-memory calculations to realize binary neural network operations, comprising:

storing at least one binary weight signal in at least one row of non-volatile storage units, and storing one bit of the binary weight signal in each non-volatile storage unit;

loading the feature signal onto the row of non-volatile storage units, wherein the bits of the binary weight signal and the feature signal that correspond to each other in the multiply-accumulate operation are applied to the same non-volatile storage unit;

loading an XNOR operation instruction onto the row of non-volatile storage units, so that the row of non-volatile storage units, in response to the XNOR operation instruction, performs an XNOR operation on the corresponding bits of the binary weight signal and the feature signal and stores the operation result in the corresponding non-volatile storage unit;

reading and accumulating the data in each non-volatile storage unit in the row of non-volatile storage units to obtain a multiply-accumulate operation result of the bits of the binary weight signal and the feature signal.

12. The control method for realizing binary neural network operations based on non-volatile in-memory calculations according to claim 11, further comprising:

The result of the multiply-accumulate operation is cached as a feature signal of the next layer.

13. The control method for realizing binary neural network operations based on non-volatile in-memory calculations according to claim 11, further comprising:

The result of the multiply-accumulate operation is post-processed to obtain the result of the binary neural network operation.