CN118536563A - Analog in-memory computing circuit, processing device and electronic device - Google Patents
- Publication number
- CN118536563A (application CN202410570880.XA)
- Authority
- CN
- China
- Prior art keywords
- analog
- calculation
- signal line
- computing
- signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/491—Computations with decimal numbers radix 12 or 20.
- G06F7/498—Computations with decimal numbers radix 12 or 20. using counter-type accumulators
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
Description
Technical Field
The present invention relates to the field of neural network technology, and in particular to an analog in-memory computing circuit, a processing device, and an electronic device.
Background Art
As Moore's Law approaches its limit, the "memory wall" and "power wall" problems of artificial intelligence (AI) chips based on the von Neumann architecture are becoming increasingly prominent, and the growth of chip computing power keeps slowing. Against this background, the integrated storage-and-computing chip architecture is emerging as a way to give next-generation AI chips greater computing power and better energy efficiency. An integrated storage-and-computing architecture embeds computing capability in the memory and performs two-dimensional and three-dimensional matrix multiplication and addition with a new computing architecture, rather than optimizing conventional logic units or process technology. This essentially eliminates the latency and power consumption of unnecessary data movement, can improve AI computing efficiency by a factor of hundreds or thousands, reduces cost, and breaks through the "memory wall" and the "power wall".
The Computing-in-Memory (CIM) architecture is a chip architecture that aims to overcome the "memory wall" by integrating storage and computing functions on the same chip. The basic idea is to move computation into the memory cells that hold the data, so that computation happens in situ and the bandwidth limits and costs of data movement are eliminated. The technique embeds computing capability in the memory and uses a new computing architecture to implement matrix-vector multiply-accumulate operations. Its advantages are that it reduces the extra energy consumed by moving data and improves the efficiency of parallel data processing. It is suitable not only for AI computing but also for integrated sensing-storage-computing chips and brain-inspired chips, and it represents the future mainstream architecture for big-data computing chips.
For the computing part of an integrated storage-and-computing chip, there are two main conventional architectures: digital and analog. In the digital architecture, the products of the data stored in the in-memory computing unit array and the input vector data are fed through a cascaded multi-level adder tree to obtain the accumulated sum of the matrix-vector product. In the analog architecture, those products are accumulated on capacitors, and the accumulated result is then converted into a digital signal by an analog-to-digital converter (ADC). The physical implementation of an analog in-memory computing circuit has a smaller chip area and lower power consumption than that of a digital one.
However, analog in-memory computing is easily affected by voltage fluctuations in its surroundings, which introduce errors into the charge accumulation and therefore into the result. In a typical computation, the in-memory computing unit array and the ADC around the capacitors are powered at a supply voltage VDD with a ground voltage VSS, where VSS is ideally 0. The voltage at the S terminal of the lower plate of each row's capacitor changes in steps of one LSB (Least Significant Bit), and the LSB equals VDD divided by the number of rows X, that is, LSB = VDD/X. The formula shows that the LSB voltage shrinks as X grows: with VDD = 800 mV, the LSB is 25 mV for 32 rows and only 12.5 mV for 64 rows. The smaller the LSB, the higher the resolution required of the ADC and the higher the probability of a conversion error. In particular, when the supply voltage fluctuates under external influence and the fluctuation ΔV exceeds the LSB voltage, the ADC converts incorrectly and the computed result is wrong. In existing in-memory computing schemes, therefore, the reliability of analog charge-domain computation falls as the number of computed rows increases, while reducing the number of rows lowers the computing throughput and hurts performance. An analog in-memory computing circuit with both high reliability and high throughput is urgently needed.
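The arithmetic in the paragraph above can be checked with a short sketch. The function names and the 15 mV ripple value are illustrative assumptions, not taken from the patent; only the relation LSB = VDD/X and the 800 mV, 32-row and 64-row figures come from the text.

```python
def lsb_voltage_mv(vdd_mv: float, rows: int) -> float:
    """LSB step on the summing line, using the text's relation LSB = VDD / X."""
    return vdd_mv / rows


def conversion_at_risk(vdd_mv: float, rows: int, ripple_mv: float) -> bool:
    """True when the supply fluctuation (delta V) exceeds one LSB."""
    return ripple_mv > lsb_voltage_mv(vdd_mv, rows)


# Hypothetical ripple of 15 mV on an 800 mV supply.
for rows in (32, 64):
    print(rows, lsb_voltage_mv(800.0, rows), conversion_at_risk(800.0, rows, 15.0))
```

Under these assumed numbers the same ripple is tolerable at 32 rows but exceeds one LSB at 64 rows, which is the trade-off the background describes.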
Summary of the Invention
The object of the present invention is to provide an analog in-memory computing circuit, a processing device, and an electronic device that at least solve the above technical problems. The technical effects that the preferred technical solutions of the present invention can produce are described in detail below.
To achieve the above object, the present invention provides the following technical solutions:
The present invention provides an analog in-memory computing circuit comprising M columns of in-memory computing modules. Each column comprises: N rows of in-memory computing submodules for computing on N input data; a selector; and an analog-to-digital converter for converting the analog signal output by the selector into a digital signal. Each row's in-memory computing submodule comprises: an in-memory computing unit for storing data and multiplying the stored data by the input data; and a summing capacitor for sensing the change in the product voltage output by the in-memory computing unit, one end of each row's summing capacitor being connected to the output of that row's in-memory computing unit. The N rows of in-memory computing submodules comprise at least a first group of computing submodules and a second group of computing submodules, each containing several rows of in-memory computing submodules. In the first group, the input of each row's in-memory computing unit receives the input data, and the other end of each row's summing capacitor is connected to a first signal line so that the signals on the summing capacitors are accumulated; the first signal line is connected to the analog-to-digital converter through the selector. In the second group, the input of each row's in-memory computing unit receives the input data through a delay unit, so that the input data reaches the in-memory computing units of the second group later than it reaches those of the first group; the other end of each row's summing capacitor in the second group is connected to a second signal line so that the signals on the summing capacitors are accumulated, and the second signal line is connected to the analog-to-digital converter through the selector. The selector connects the signal of one of the first signal line and the second signal line to the analog-to-digital converter, so that the computation result of that column is obtained from the digital signal output by the analog-to-digital converter.
Preferably, the circuit further comprises a delay digital power supply connected to the delay unit, the delay digital power supply being used to power the delay unit.
Preferably, the first group of computing submodules is formed by the in-memory computing submodules in the even-numbered rows, and the second group of computing submodules is formed by those in the odd-numbered rows.
Preferably, the in-memory computing process of the first group of computing submodules comprises a first multiplication and charge-sampling process and a first signal analog-to-digital conversion process; the in-memory computing process of the second group of computing submodules comprises a delay process, a second multiplication and charge-sampling process, and a second signal analog-to-digital conversion process. The first signal analog-to-digital conversion process of the first group runs in parallel with the delay process and the second multiplication and charge-sampling process of the second group, and the second signal analog-to-digital conversion process of the second group runs in parallel with the first multiplication and charge-sampling process of the first group.
Preferably, the delay time of the delay unit is set based on the time taken for the analog signal on the first signal line to enter the analog-to-digital converter and undergo analog-to-digital conversion.
Preferably, the delay time of the delay unit is equal to the time taken for the analog signal on the first signal line to enter the analog-to-digital converter and undergo analog-to-digital conversion.
Preferably, the accumulation of the signals on the summing capacitors is an accumulation of the charge on the summing capacitors or an accumulation of the current on the summing capacitors.
Preferably, after the first group of computing submodules performs multiplication and charge sampling, the resulting product charge is accumulated onto the first signal line, the selector conducts to the first signal line and is cut off from the second signal line, and the analog-to-digital converter begins converting the product charge accumulated on the first signal line into a first digital signal.
Preferably, while the product charge accumulated on the first signal line is being converted into the first digital signal, the delayed second group of computing submodules accumulates its computed product charge onto the second signal line.
Preferably, once the product charge accumulated on the first signal line has been converted into the first digital signal, the selector conducts to the second signal line and is cut off from the first signal line, the first signal line is charged to the VDD potential or reset to the VSS potential, and the analog-to-digital converter begins converting the product charge accumulated on the second signal line into a second digital signal.
Preferably, while the product charge accumulated on the second signal line is being converted into the second digital signal, the input data for the first group of computing submodules in the next computing cycle begins to enter the in-memory computing units, and multiplication and charge sampling are performed.
Preferably, after the analog-to-digital converter has converted the product charge accumulated on the second signal line into the second digital signal, the selector conducts to the first signal line and is cut off from the second signal line, the first signal line is charged to the VDD potential or reset to the VSS potential, and the analog-to-digital converter again converts the product charge accumulated on the first signal line into a first digital signal.
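The alternating sequence described in the preferred steps above can be sketched as a simple timeline. This is a minimal illustration under stated assumptions, not the patent's timing diagram: the names `pingpong_schedule`, `t_mac` (multiplication plus charge sampling time) and `t_adc` (conversion time) are hypothetical, the delay is taken equal to `t_adc` as in the preferred option, and the sketch assumes `t_mac <= t_adc` so that one shared converter is never left waiting.

```python
from typing import List, Tuple

Event = Tuple[int, str, float, float]  # (group, phase, start, end)


def pingpong_schedule(steps: int, t_mac: float, t_adc: float) -> List[Event]:
    """Timeline for two groups sharing one ADC, with group 2 delayed by t_adc."""
    if t_mac > t_adc:
        raise ValueError("this sketch assumes t_mac <= t_adc")
    events: List[Event] = []
    for k in range(steps):
        group = 1 if k % 2 == 0 else 2  # groups alternate step by step
        mac_start = k * t_adc           # each new input arrives one delay later
        adc_start = mac_start + t_mac   # conversion follows sampling directly
        events.append((group, "mac+sample", mac_start, adc_start))
        events.append((group, "adc", adc_start, adc_start + t_adc))
    return events


for event in pingpong_schedule(steps=4, t_mac=1.0, t_adc=2.0):
    print(event)
```

In this model each group's multiplication and sampling falls inside the other group's conversion window, and the conversions follow one another back to back, which matches the parallel operation the steps describe.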
A processing device comprising the analog in-memory computing circuit according to any one of the above.
An electronic device comprising the analog in-memory computing circuit according to any one of the above.
Implementing one of the above technical solutions of the present invention has the following advantages or beneficial effects:
The present invention divides the in-memory computing modules of one column into at least a first group of computing submodules and a second group of computing submodules, and uses a delay unit so that the two groups perform their in-memory computing operations in a time-shared manner. The voltage on each row's capacitor plate changes in steps of one least significant bit (LSB), and the LSB of the in-memory operation equals the signal-line voltage VDD divided by the number of rows X, that is, LSB = VDD/X. Once the modules of a column are grouped, the number of rows X taking part in each in-memory computation falls with the grouping, and the LSB rises accordingly. A larger LSB lowers the resolution required of the analog-to-digital converter and the probability of a conversion error, which improves the reliability of the analog-to-digital converter.
At the same time, the analog-to-digital converter converts the signals of the first group, the second group, and any further groups by time-shared selection. Because less data has to be converted at any one time, the analog-to-digital converter needs fewer bits, so the conversion cycle is shorter and the power consumption lower. Because less data is converted at any one time, the fluctuation ΔV of the analog power supply also falls, which further improves conversion reliability.
As for computing throughput, the first group, the second group, and any further groups can compute in parallel, and their separately computed results are then added. The amount of input processed in one computing cycle is therefore unaffected, and so is the computing throughput. In other words, the analog in-memory computing circuit provided by the present invention can handle large input volumes and thereby achieve high throughput.
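The effect of grouping on the LSB and on the converter width can be illustrated numerically. This is a sketch under an assumption the text does not state: each row contributes a 1-bit product, so a group of R rows yields a sum between 0 and R and needs enough output codes for R + 1 levels. The function name `per_group_figures` is hypothetical.

```python
from typing import Tuple


def per_group_figures(vdd_mv: float, rows: int, groups: int) -> Tuple[int, float, int]:
    """Rows per conversion, LSB in mV, and ADC bits for a column split into groups."""
    rows_per_group = rows // groups
    lsb_mv = vdd_mv / rows_per_group         # LSB = VDD / X with X = rows per group
    adc_bits = rows_per_group.bit_length()   # bits needed to encode sums 0..rows_per_group
    return rows_per_group, lsb_mv, adc_bits


print(per_group_figures(800.0, 64, groups=1))  # ungrouped column
print(per_group_figures(800.0, 64, groups=2))  # two groups
```

With the 800 mV and 64-row figures from the background section, splitting into two groups doubles the LSB and, under the 1-bit assumption, removes one bit from the converter.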
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a circuit schematic of the analog in-memory computing circuit of Embodiment 1 of the present invention;
FIG. 2 is a first schematic diagram of the analog in-memory computing circuit of Embodiment 1 of the present invention;
FIG. 3 is a second schematic diagram of the analog in-memory computing circuit of Embodiment 1 of the present invention;
FIG. 4 is a schematic diagram comparing the timing of computation and analog-to-digital conversion in the prior art and in the analog in-memory computing circuit of Embodiment 1 of the present invention;
FIG. 5 is a schematic diagram of the neural network accelerator of Embodiment 2 of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the exemplary embodiments described below refer to the accompanying drawings, which form part of the exemplary embodiments and show various embodiments by which the invention may be implemented. Unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. They are merely examples of processes, methods, and devices consistent with some aspects of the present disclosure as detailed in the appended claims; other embodiments may be used, and the embodiments listed here may be modified in structure and function, without departing from the scope and essence of the present invention.
In the description of the present invention, terms such as "center", "longitudinal", and "lateral" indicate orientations or positional relationships based on the drawings. They are used only to simplify the description and do not indicate or imply that the elements referred to must have a specific orientation or be constructed and operated in a specific orientation. The terms "first", "second", and so on are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or the number of the technical features indicated. "Multiple" means two or more. The terms "connected" and "connection" are to be understood broadly: a connection may be fixed, detachable, or integral; mechanical, electrical, or communicative; direct, or indirect through an intermediate medium; and it may be an internal link between two elements or an interaction between them. The term "and/or" includes any and all combinations of one or more of the associated listed items. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the circumstances.
The technical solutions of the present invention are explained below through specific embodiments, and only the parts relevant to those embodiments are shown.
实施例一:Embodiment 1:
如图1-图3所示,本发明提供了一种模拟存内计算电路,包括M列存算模块。每列存算模块包括:用于将N个输入数据进行计算的N行存算子模块、选择器(优选为二选一选择器,图1中的MUX)、以及用于将选择器输出的模拟信号转换为数字信号的模数转换器(图1中的ADC),N行存算子模块、模数转换器均连接模拟电源域AVDD和AVSS,以便于进行存内计算。每行的存算子模块包括:用于存储及将存储的数据与输入数据进行乘法运算的存算单元、及用于感应存算单元输出的乘积电压变化的求和电容,每行的求和电容的一端(图2中的M端)与该行的存算单元的输出端连接。N行存算子模块至少包括:含有若干行存算子模块的第一组计算子模块、及含有若干行存算子模块的第二组计算子模块;第一组计算子模块中每行的存算单元的输入端接入输入数据,每行的求和电容的另一端(图2中的S端)均连接第一信号线,以将求和电容上的信号累加(如电压信号、电流信号等),第一信号线通过选择器连接模数转换器。第二组计算子模块中,每行的存算单元的输入端通过延时单元接入输入数据,以使得输入数据输入第二组计算子模块中的存算单元的时间晚于输入数据输入第一组计算子模块中的存算单元的时间,即第一组计算子模块、第二组计算子模块的输入数据进行存算单元时间不同,两者分时进行计算,第二组计算子模块中每行的求和电容的另一端连接第二信号线,以将求和电容上的信号累积,第二信号线通过选择器连接模数转换器。选择器用于将第一信号线(图1中的SL)、第二信号线(图1中的SR)中的一者的信号接入模数转换器,即第一信号线、第二信号线不能同时接入,以基于模数转换器输出的数字信号,得到该列存算模块的计算结果,具体而言,第一组计算子模块、第二组计算子模块会在分别计算后再相加得到存内计算的最终计算结果,从而同一计算周期的计算输入量不变,计算吞吐量不变。由于模拟存内计算的电荷域计算的可靠性和LSB大小成正比,与相关电源波动△V成反比,计算吞吐量与输入的数据数量成正比。本发明将同一列的存算模块至少分为第一组计算子模块、第二组计算子模块,并通过延时单元将第一组计算子模块、第二组计算子模块分时进行存内计算操作。由于每一行电容极板的电压变化幅度为一个最低有效位LSB,而存内运算的LSB等于信号线上的电压VDD除以行数X,即LSB=VDD/X,当同一列的存算模块进行分组后,每次参与存内计算的行数X随存算模块的分组而减少,LSB自然增大。如同一列的存算模块只分为上述两组计算子模块时,每次参与存内计算的行数为X/2,则计算LSB等于AVDD除以X除以2,即LSB=AVDD/X/2=2*AVDD/X,LSB变为现有技术方案的2倍,随着LSB的提高,对模数转换器的分辨率要求降低,模数转换器转换出错的概率也随之降低,自然提高了模数转换器的可靠性,此处即提高了2倍模数转换器的转换可靠性。同时,模数转换器通过对第一组计算子模块、第二组计算子模块等的信号模数转换进行分时选择,由于同一时间需要转换的数据减少,模数转换器比特数减少,从而使模数转换的周期更短、功耗更低,同时,由于同一时间转换数据减少,模拟电源的波动△V也随之降低,从而进一步提高模数转换器的转换可靠性。具体而言,存算模块只分为第一组计算子模块、第二组计算子模块时,由于同一时间需要转换的数据减半,模数转换器比特数减一,从而使模数转换的周期更短、功耗更低,同时,由于同一时间转换数据减半,模拟电源的波动△V也近似减半,相当于模数转换器的转换可靠性再提高2倍。计算吞吐量方面,由于第一组计算子模块、第二组计算子模块等可以并行计算,并会在分别计算后再相加,所以,同一计算周期的计算输入量不变,计算吞吐量不变。As shown in Figures 1 to 3, the present invention provides an analog in-memory calculation circuit, including M columns of storage and calculation modules. 
Each column of the storage and calculation module includes: N rows of storage and calculation submodules for calculating N input data, a selector (preferably a two-choice selector, MUX in Figure 1), and an analog-to-digital converter (ADC in Figure 1) for converting the analog signal output by the selector into a digital signal, and the N rows of storage and calculation submodules and the analog-to-digital converter are connected to the analog power domains AVDD and AVSS to facilitate in-memory calculation. The storage and calculation submodule of each row includes: a storage and calculation unit for storing and multiplying the stored data with the input data, and a summing capacitor for sensing the change in the product voltage output by the storage and calculation unit, and one end of the summing capacitor of each row (the M end in Figure 2) is connected to the output end of the storage and calculation unit of the row. The N-row storage and calculation submodules at least include: a first group of calculation submodules containing several rows of storage and calculation submodules, and a second group of calculation submodules containing several rows of storage and calculation submodules; the input end of the storage and calculation unit of each row in the first group of calculation submodules is connected to the input data, and the other end of the summing capacitor of each row (the S end in FIG2) is connected to the first signal line to accumulate the signal on the summing capacitor (such as a voltage signal, a current signal, etc.), and the first signal line is connected to the analog-to-digital converter through a selector. 
In the second group of calculation submodules, the input end of the storage and calculation unit of each row is connected to the input data through a delay unit, so that the time when the input data is input into the storage and calculation unit in the second group of calculation submodules is later than the time when the input data is input into the storage and calculation unit in the first group of calculation submodules, that is, the input data of the first group of calculation submodules and the second group of calculation submodules are different in the storage and calculation unit time, and the two are calculated in time sharing, and the other end of the summing capacitor of each row in the second group of calculation submodules is connected to the second signal line to accumulate the signal on the summing capacitor, and the second signal line is connected to the analog-to-digital converter through a selector. The selector is used to connect the signal of one of the first signal line (SL in FIG. 1) and the second signal line (SR in FIG. 1) to the analog-to-digital converter, that is, the first signal line and the second signal line cannot be connected at the same time, so as to obtain the calculation result of the storage and calculation module of the column based on the digital signal output by the analog-to-digital converter. Specifically, the first group of calculation submodules and the second group of calculation submodules will be added after calculation respectively to obtain the final calculation result of the in-memory calculation, so that the calculation input amount and the calculation throughput of the same calculation cycle remain unchanged. Since the reliability of the charge domain calculation of the analog in-memory calculation is proportional to the LSB size and inversely proportional to the related power supply fluctuation △V, the calculation throughput is proportional to the amount of input data. 
The present invention divides the storage and calculation modules of the same column into at least a first group of calculation submodules and a second group of calculation submodules, and uses a delay unit to divide the first group of calculation submodules and the second group of calculation submodules into time-sharing in-memory calculation operations. Since the voltage variation amplitude of each row of capacitor plates is a least significant bit LSB, and the LSB of the in-memory operation is equal to the voltage VDD on the signal line divided by the number of rows X, that is, LSB = VDD/X, when the storage and calculation modules in the same column are grouped, the number of rows X participating in the in-memory calculation each time decreases with the grouping of the storage and calculation modules, and the LSB naturally increases. If the storage and calculation modules in the same column are divided into only the above two groups of calculation submodules, the number of rows participating in the in-memory calculation each time is X/2, then the calculated LSB is equal to AVDD divided by X divided by 2, that is, LSB = AVDD/X/2 = 2*AVDD/X, and the LSB becomes twice that of the prior art solution. With the increase of LSB, the resolution requirements for the analog-to-digital converter are reduced, and the probability of the analog-to-digital converter conversion error is also reduced, which naturally improves the reliability of the analog-to-digital converter, and here the conversion reliability of the analog-to-digital converter is improved by 2 times. At the same time, the analog-to-digital converter selects the analog-to-digital conversion of the signals of the first group of computing submodules, the second group of computing submodules, etc. in a time-sharing manner. 
Since the data that needs to be converted at the same time is reduced, the number of bits of the analog-to-digital converter is reduced, thereby making the cycle of the analog-to-digital conversion shorter and the power consumption lower. At the same time, since the conversion data is reduced at the same time, the fluctuation △V of the analog power supply is also reduced, thereby further improving the conversion reliability of the analog-to-digital converter. Specifically, when the storage and calculation module is divided into only the first group of computing submodules and the second group of computing submodules, since the data that needs to be converted at the same time is halved, the number of bits of the analog-to-digital converter is reduced by one, thereby making the cycle of the analog-to-digital conversion shorter and the power consumption lower. At the same time, since the conversion data is halved at the same time, the fluctuation △V of the analog power supply is also approximately halved, which is equivalent to increasing the conversion reliability of the analog-to-digital converter by 2 times. In terms of calculation throughput, since the first group of computing submodules, the second group of computing submodules, etc. can be calculated in parallel and will be added after calculation, the calculation input amount of the same calculation cycle remains unchanged, and the calculation throughput remains unchanged.
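The one-bit reduction in analog-to-digital converter resolution can be illustrated with a simple counting argument. This sketch assumes that each row contributes a 1-bit product (0 or 1), so a conversion over N rows must distinguish the sums 0 through N; that assumption, the function name `adc_bits_needed`, and the row count of 256 are illustrative and not taken from the patent.

```python
def adc_bits_needed(active_rows: int) -> int:
    """Bits needed to represent every possible column sum 0..active_rows.

    Assumes each row contributes a 1-bit product (0 or 1). This is an
    illustrative model, not a statement from the patent.
    """
    if active_rows < 1:
        raise ValueError("active_rows must be positive")
    # int.bit_length() gives the number of bits needed to represent the
    # largest possible sum, which equals active_rows.
    return active_rows.bit_length()


X = 256  # hypothetical row count
full = adc_bits_needed(X)       # all rows converted at once
half = adc_bits_needed(X // 2)  # one of two groups converted at a time

print(f"{X} rows at once: {full} bits; {X // 2} rows at once: {half} bits")
```

For a power-of-two row count, halving the rows per conversion removes exactly one bit under this model, in line with the text.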
As an optional implementation, as shown in FIG. 1, the circuit further includes a delay digital power supply connected to the delay unit; the delay digital power supply (DVDD and DVSS in FIG. 1) powers the delay unit. Powering the delay unit from this supply makes it easy to turn the delay unit on or off, and switching the delay unit staggers the time at which the storage and calculation units in each row of the second group of calculation submodules receive the input data relative to the time at which the storage and calculation units of the first group receive it. With the delay unit powered in this way, the calculations of the first group and the second group of calculation submodules are staggered and alternate, so that the two calculations can operate in parallel.
As an optional implementation, as shown in FIG. 1, the first group of calculation submodules is formed by at least one storage and calculation submodule located in the even rows, namely rows IN[0], IN[2], ..., IN[x-2] in FIG. 1, and the second group of calculation submodules is formed by at least one storage and calculation submodule located in the odd rows, namely rows IN[1], IN[3], ..., IN[x-1] in FIG. 1. The two groups contain the same number of rows, X/2 each, so their calculation times are essentially equal, which makes it convenient to run the two groups in parallel and improves operating efficiency.
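The even/odd grouping can be modeled digitally to confirm that adding the two partial results reproduces the full column sum. This is a hedged, noise-free integer sketch of behavior that the circuit implements in the charge domain; the helper names `split_even_odd` and `grouped_mac` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_even_odd(values: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split per-row values into even rows (0, 2, ...) and odd rows (1, 3, ...)."""
    return list(values[0::2]), list(values[1::2])


def grouped_mac(inputs: Sequence[int], weights: Sequence[int]) -> Tuple[int, int, int]:
    """Return (first-group sum, second-group sum, combined sum) for one column."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same number of rows")
    in_even, in_odd = split_even_odd(inputs)
    w_even, w_odd = split_even_odd(weights)
    first = sum(i * w for i, w in zip(in_even, w_even))   # accumulated on SL
    second = sum(i * w for i, w in zip(in_odd, w_odd))    # accumulated on SR
    return first, second, first + second


# Hypothetical 8-row column with 1-bit inputs and weights.
inputs = [1, 0, 1, 1, 0, 1, 1, 0]
weights = [1, 1, 0, 1, 1, 1, 0, 0]

first, second, total = grouped_mac(inputs, weights)
direct = sum(i * w for i, w in zip(inputs, weights))  # ungrouped reference
print(first, second, total, direct)
```

Because the two groups cover disjoint rows, the combined sum always equals the ungrouped multiply-accumulate result, which is why the grouping leaves the calculation input amount unchanged.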
As an optional implementation, the in-memory calculation process of the first group of calculation submodules includes a first multiplication calculation and charge sampling process (performed by the first group of calculation submodules) and a first signal analog-to-digital conversion process (analog-to-digital conversion of the analog signal on the first signal line). The in-memory calculation process of the second group of calculation submodules includes a delay process, a second multiplication calculation and charge sampling process (performed by the second group of calculation submodules), and a second signal analog-to-digital conversion process (analog-to-digital conversion of the analog signal on the second signal line).
The multiplication calculation and charge sampling process specifically includes: multiplying the input data by the data stored in the storage and calculation unit of the corresponding row, and reflecting the product on the first plate of the corresponding summing capacitor; once the voltage on the first plate of the summing capacitor has been established, starting to sample the charge of the corresponding row and feeding the sampled charge back to the second plate of the summing capacitor; and accumulating the product charge from the second plate of the summing capacitor onto the first signal line or the second signal line. As shown in FIG. 4, the first signal analog-to-digital conversion process of the first group of calculation submodules runs in parallel with the delay process and the second multiplication calculation and charge sampling process of the second group (calculation clock 1 and calculation clock 3), and the second signal analog-to-digital conversion process of the second group runs in parallel with the first multiplication calculation and charge sampling process of the first group (calculation clock 2 and calculation clock 4). Parallel operation means that the two processes execute independently at the same time rather than one after the other, which shortens the overall in-memory calculation time. The delay time of the delay unit is set based on the time required for the analog signal on the first signal line to be converted by the analog-to-digital converter: it is equal to or approximately equal to that conversion time (slightly longer or slightly shorter, depending on the actual use case).
Parallel execution keeps the calculation cycle of the grouped calculation essentially unchanged, so the calculation throughput remains high even as calculation reliability is greatly improved. The highly parallel operation greatly improves calculation efficiency, compensates for the delay introduced by calculating the odd and even row groups separately, and ensures that the calculation clock and calculation throughput remain unchanged.
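The benefit of overlapping one group's conversion with the other group's multiplication and sampling can be estimated with a simple timing model. This sketch is an assumption-laden illustration, not the patent's own analysis: it assumes a single shared analog-to-digital converter, a fixed multiply-and-sample time `t_mac`, a fixed conversion time `t_adc`, and a delay equal to the conversion time; the function name `result_period` and the example times are hypothetical.

```python
def result_period(t_mac: float, t_adc: float, overlapped: bool) -> float:
    """Time to produce one combined result from two row groups sharing one ADC.

    overlapped=True models the FIG. 4 style schedule: in each calculation clock,
    one group's conversion runs alongside the other group's multiply-and-sample.
    overlapped=False models running all four phases strictly one after another.
    """
    if t_mac <= 0 or t_adc <= 0:
        raise ValueError("phase durations must be positive")
    if overlapped:
        # Two calculation clocks per result; each clock lasts as long as the
        # slower of the two phases that run side by side.
        return 2 * max(t_mac, t_adc)
    return 2 * (t_mac + t_adc)


# Hypothetical phase durations in nanoseconds, for illustration only.
T_MAC = 4.0
T_ADC = 6.0

overlapped = result_period(T_MAC, T_ADC, overlapped=True)
sequential = result_period(T_MAC, T_ADC, overlapped=False)
print(f"overlapped: {overlapped} ns per result, sequential: {sequential} ns per result")
```

In this model the overlapped schedule hides the multiply-and-sample time entirely whenever it is no longer than the conversion time, which is consistent with setting the delay approximately equal to the conversion time.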
As an optional implementation, the signal accumulation on the summing capacitors is an accumulation of the charge on the summing capacitors or an accumulation of the current on the summing capacitors. Both charge accumulation and current accumulation can be performed through the first signal line and the second signal line; voltage accumulation can also be chosen, which improves the adaptability of the present invention.
As an optional implementation, in the storage and calculation module, all power supplies other than the supply used by the storage and calculation submodules for calculation, the supply used by the summing capacitors for sampling, and the supply used for charge accumulation on the first and second signal lines are digital power supplies. Because switching fluctuations on the digital power supplies do not disturb the analog power domain of the storage and calculation submodules and the analog-to-digital converter, the voltage fluctuation of the analog power supply involved in the charge calculation is reduced, which further improves the reliability of the charge calculation of the present invention.
As an optional implementation, as shown in FIG. 1, the circuit further includes a charging reset circuit. The charging reset circuit is connected to the first signal line and the second signal line and is used to charge them to the VDD potential or reset them to the VSS potential, preparing the charge on the two signal lines before a calculation so that charge accumulation on them is not corrupted. The charging reset circuit thereby prevents residual charge on the first signal line and the second signal line from affecting the accuracy of the in-memory calculation result.
As an optional implementation, as shown in FIG. 4, the parallel execution process of the present invention is mainly as follows. After the first group of calculation submodules performs multiplication calculation and charge sampling, the resulting product charge is accumulated onto the first signal line; the selector connects the first signal line and disconnects the second signal line, and the analog-to-digital converter begins converting the product charge accumulated on the first signal line into a first digital signal. While that conversion is in progress, the delayed second group of calculation submodules accumulates its calculated product charge onto the second signal line. After the product charge accumulated on the first signal line has been converted into the first digital signal, the selector connects the second signal line and disconnects the first signal line, the first signal line is charged to the VDD potential or reset to the VSS potential, and the analog-to-digital converter begins converting the product charge accumulated on the second signal line into a second digital signal.
While the product charge accumulated on the second signal line is being converted into the second digital signal, the input data for the first group of calculation submodules in the next calculation cycle begins to be applied to the storage and calculation units, and multiplication calculation and charge sampling are performed. After the analog-to-digital converter has converted the product charge accumulated on the second signal line into the second digital signal, the selector connects the first signal line and disconnects the second signal line, the first signal line is charged to the VDD potential or reset to the VSS potential, and the analog-to-digital converter again converts the product charge accumulated on the first signal line into a first digital signal. The circuit of the present invention operates by repeating this cycle. The present invention therefore divides the analog in-memory calculation process into four parallel and alternating phases: the first multiplication calculation and charge sampling process, the first signal analog-to-digital conversion process, the second multiplication calculation and charge sampling process, and the second signal analog-to-digital conversion process. The first and second multiplication calculation and charge sampling processes run in parallel, while the first and second signal analog-to-digital conversion processes alternate. This timing scheme achieves higher calculation efficiency and offsets the additional time introduced by alternating between the first and second signal analog-to-digital conversion processes.
Furthermore, since the array time-sharing calculation reduces the amount of calculation, the calculation speed of the multi-bit analog-to-digital converter is improved, and the calculation power consumption is reduced.
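The alternating selector sequence described above can be traced with a small behavioral model. This is a sketch under stated assumptions, not the patent's circuit: charge is modeled as an ideal integer sum, the analog-to-digital converter is assumed error-free, a signal line is cleared once its value has been converted, and the names `mac` and `run_column` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Cycle = Tuple[Sequence[int], Sequence[int]]  # (inputs, weights) for one column


def mac(inputs: Sequence[int], weights: Sequence[int], start: int) -> int:
    """Sum of products over rows start, start+2, ... (one row group)."""
    return sum(i * w for i, w in zip(inputs[start::2], weights[start::2]))


def run_column(cycles: List[Cycle]) -> Tuple[List[int], List[str]]:
    """Trace the SL/SR alternation and return per-cycle results plus a phase log."""
    results: List[int] = []
    log: List[str] = []
    if not cycles:
        return results, log
    # Before any conversion, the first group (even rows) samples onto SL.
    sl = mac(cycles[0][0], cycles[0][1], 0)
    for k, (inputs, weights) in enumerate(cycles):
        # Selector -> SL: ADC converts SL while the delayed second group
        # (odd rows) accumulates onto SR.
        first_digital = sl
        sr = mac(inputs, weights, 1)
        log.append(f"cycle {k}: ADC converts SL, group 2 samples onto SR")
        # Selector -> SR: SL is reset, ADC converts SR, and the first group of
        # the next cycle (if any) samples onto SL.
        sl = 0
        second_digital = sr
        if k + 1 < len(cycles):
            sl = mac(cycles[k + 1][0], cycles[k + 1][1], 0)
            log.append(f"cycle {k}: ADC converts SR, group 1 samples next inputs onto SL")
        else:
            log.append(f"cycle {k}: ADC converts SR")
        # The two digital signals are added to give the column result.
        results.append(first_digital + second_digital)
    return results, log


# Hypothetical 4-row column over two calculation cycles.
cycles = [
    ([1, 0, 1, 1], [1, 1, 1, 0]),
    ([1, 1, 1, 1], [0, 1, 1, 1]),
]
results, log = run_column(cycles)
print(results)
for line in log:
    print(line)
```

The log makes the two calculation clocks per cycle explicit: the converter is always busy with one signal line while the other line is being prepared, and it never sees both lines at once.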
This embodiment is only a specific example and does not mean that the present invention has only this one implementation.
Embodiment 2:
A processing device includes the analog in-memory computing circuit of Embodiment 1. With that circuit, the present invention divides the storage and calculation modules of the same column into at least a first group of calculation submodules and a second group of calculation submodules, and uses a delay unit so that the two groups perform their in-memory calculation operations in a time-shared manner. The voltage change on the capacitor plate of each row corresponds to one least significant bit (LSB), and the LSB of the in-memory operation equals the voltage VDD on the signal line divided by the number of rows X, that is, LSB = VDD/X. When the storage and calculation modules of the same column are grouped, the number of rows participating in each in-memory calculation decreases, and the LSB increases accordingly. As the LSB increases, the resolution required of the analog-to-digital converter decreases and the probability of a conversion error falls, which improves the reliability of the analog-to-digital converter.
At the same time, the analog-to-digital converter performs time-sharing selection for the analog-to-digital conversion of the signals of the first group of computing submodules, the second group of computing submodules, etc. Since the data that needs to be converted at the same time is reduced, the number of bits of the analog-to-digital converter is reduced, thereby making the cycle of analog-to-digital conversion shorter and the power consumption lower. At the same time, since the conversion data is reduced at the same time, the fluctuation △V of the analog power supply is also reduced, thereby further improving the conversion reliability of the analog-to-digital converter. In terms of calculation throughput, since the first group of computing submodules, the second group of computing submodules, etc. can be calculated in parallel and will be added after calculation, the calculation input amount of the same calculation cycle is not affected, and the calculation throughput is also not affected, that is, the analog in-memory calculation circuit provided by the present invention can support the situation of large input amount, thereby achieving high throughput.
The processing device of the present invention may be a chip or an IP core (a circuit module with an independent function that can be reused in other chip design projects that include it). The processing device provided by the present invention may be, for example, a neural network accelerator (that is, an NPU; as shown in FIG. 5, the NPU includes functional modules such as an in-memory computing matrix, a vector processor, and a compiler, and is used to perform neural network calculations), a speech recognition device containing a neural network accelerator, a video processing device containing a neural network accelerator, a large model processing device, or various other devices. When the processing device is a chip, each module in the chip may be implemented in whole or in part by software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor in a computing device in the form of hardware, or may be stored in a memory of the computing device in the form of software so that the processor can call them and execute the corresponding operations.
The present invention can be applied to fields that demand high-precision, highly reliable, low-power, and low-area-cost computing power and that suit analog in-memory computing chips, such as artificial intelligence and the metaverse, as well as to artificial intelligence training or inference chip products, autonomous driving chips, VR chips, chips built into robots, wearable smart chips, and deep learning applications of parallel computing.
Embodiment 3:
An electronic device includes the analog in-memory computing circuit of Embodiment 1. The present invention can be applied to electronic devices related to computing centers, computing equipment, edge computing, autonomous driving, AR, VR, lidar, and the like, as well as to a range of electronic devices such as smartphones, tablet computers, wearable electronics, smart home electronics, and industrial, medical, or battery-powered devices.
The above describes only preferred embodiments of the present invention. Those skilled in the art will appreciate that various changes or equivalent substitutions may be made to these features and embodiments without departing from the spirit and scope of the present invention. In addition, under the teachings of the present invention, these features and embodiments may be modified to suit particular circumstances and materials without departing from the spirit and scope of the present invention. Therefore, the present invention is not limited to the specific embodiments disclosed herein, and all embodiments falling within the scope of the claims of this application belong to the protection scope of the present invention.
Claims (14)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410570880.XA CN118536563B (en) | 2024-05-09 | 2024-05-09 | Analog in-memory computing circuit, processing device and electronic device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118536563A (en) | 2024-08-23 |
| CN118536563B (en) | 2024-12-20 |
Family
ID=92381793
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410570880.XA (Active; CN118536563B) | Analog in-memory computing circuit, processing device and electronic device | 2024-05-09 | 2024-05-09 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118536563B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119761435A (en) * | 2024-12-11 | 2025-04-04 | 北京邮电大学 | Method and electronic device for deploying neural network in analog in-memory computing NPU |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114546335A (en) * | 2022-04-25 | 2022-05-27 | 中科南京智能技术研究院 | Memory computing device for multi-bit input and multi-bit weight multiplication accumulation |
| CN115910152A (en) * | 2022-11-28 | 2023-04-04 | 安徽大学 | Charge domain memory calculation circuit and calculation circuit with positive and negative number operation function |
| CN116070685A (en) * | 2023-03-27 | 2023-05-05 | 南京大学 | An in-memory computing unit, a memory-calculation array, and a memory-calculation chip |
| WO2023207441A1 (en) * | 2022-04-27 | 2023-11-02 | 北京大学 | Sram storage and computing integrated chip based on capacitive coupling |
| CN117130978A (en) * | 2023-10-12 | 2023-11-28 | 东南大学 | Charge domain in-memory calculation circuit and calculation method based on sparse tracking ADC |
| CN117877553A (en) * | 2023-11-08 | 2024-04-12 | 北京航空航天大学 | In-memory computing circuit for nonvolatile random access memory |
| US20240137035A1 (en) * | 2022-10-12 | 2024-04-25 | Washington University | Scaling-friendly, analog correlators using charge-based margin propagation |
Non-Patent Citations (1)
| Title |
|---|
| PETER DEAVILLE: "A Fully Row/Column-Parallel In-Memory Computing Macro in Foundry MRAM With Differential Readout for Noise Rejection", IEEE, 12 April 2024 (2024-04-12) * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN118536563B (en) | 2024-12-20 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| TWI779285B (en) | Method and apparatus for performing vector-matrix multiplication, and vector-matrix multiplier circuit | |
| US20050125477A1 (en) | High-precision matrix-vector multiplication on a charge-mode array with embedded dynamic memory and stochastic method thereof | |
| CN112698811B (en) | Neural network random number generator shared circuit, shared method, and processor chip | |
| CN117130978A (en) | Charge domain in-memory calculation circuit and calculation method based on sparse tracking ADC | |
| CN110991623A (en) | Neural network computing system based on digital-analog hybrid neurons | |
| CN115390789A (en) | Analog domain full-precision in-memory computing circuit and method based on magnetic tunnel junction computing unit | |
| CN113364462B (en) | Analog storage and calculation integrated multi-bit precision implementation structure | |
| CN110515587A (en) | Multiplier, data processing method, chip and electronic device | |
| CN118536563A (en) | Analog in-memory computing circuit, processing device and electronic device | |
| CN111325334A (en) | Intelligent processor | |
| Liu et al. | A 22nm 0.43 pj/sop sparsity-aware in-memory neuromorphic computing system with hybrid spiking and artificial neural network and configurable topology | |
| US11656988B2 (en) | Memory device and operation method thereof | |
| CN222319483U (en) | Analog in-memory computing circuit, processing device and electronic equipment | |
| CN115756388B (en) | Multi-mode storage and calculation integrated circuit, chip and calculation device | |
| CN118349212A (en) | An in-memory computing method and chip design | |
| Chen et al. | 19.7 A scalable pipelined time-domain DTW engine for time-series classification using multibit time flip-flops with 140Giga-cell-updates/s throughput | |
| CN110046695B (en) | A Configurable Array of Highly Parallel Spiking Neurons | |
| Zhang et al. | An energy-efficient mixed-signal parallel multiply-accumulate (MAC) engine based on stochastic computing | |
| CN118519611B (en) | In-memory computing circuit with redundancy design, method and device and electronic equipment | |
| CN222482729U (en) | In-memory computing circuit with redundancy design, device and electronic equipment | |
| Xuan et al. | HPSW-CIM: A novel ReRAM-based computing-in-memory architecture with constant-term circuit for full parallel hybrid-precision-signed-weight MAC operation | |
| Xuan et al. | AiDAC: A Low-Cost In-Memory Computing Architecture with All-Analog Multi-Bit Compute and Interconnect | |
| Lee et al. | A Bit Serial Accelerator Architecture for Efficient ML Compute in Area, Power and Cost Constrained Sensors | |
| US20250103678A1 (en) | Iterative hybrid matrix multiplier | |
| CN119336292B (en) | A linear multiply-accumulate circuit for image processing and its implementation method and chip |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |