CN119296609B - 8T-SRAM memory computing unit, memory computing array and memory computing circuit - Google Patents
- Publication number
- CN119296609B (CN202411832795.2A)
- Authority
- CN
- China
- Prior art keywords
- memory
- computing
- nmos tube
- bit line
- word line
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/41—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming static cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger
- G11C11/413—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction
- G11C11/417—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction for memory cells of the field-effect type
- G11C11/419—Read-write [R-W] circuits
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7821—Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Biophysics (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Neurology (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Static Random-Access Memory (AREA)
Abstract
The application relates to an 8T-SRAM memory computing unit, an in-memory computing array and an in-memory computing circuit. The in-memory computing array comprises a cell array and a bipolar computing unit, and the cell array comprises a plurality of 8T-SRAM memory computing units distributed in rows. The bipolar computing unit comprises a first inverter, a second inverter, a seventh NMOS transistor, an eighth NMOS transistor, a ninth NMOS transistor, a tenth NMOS transistor, a first capacitor and a second capacitor. The output of the first inverter is connected to the input of the second inverter. The gate, drain and source of the seventh NMOS transistor are respectively connected to the output of the first inverter, the upper plate of the first capacitor and the drain of the eighth NMOS transistor. The gate, drain and source of the ninth NMOS transistor are respectively connected to the output of the second inverter, the upper plate of the second capacitor and the drain of the tenth NMOS transistor. The first capacitor and the second capacitor are further respectively connected to a first computing bit line and a second computing bit line.
Description
Technical Field
The application relates to the field of integrated circuits, in particular to an 8T-SRAM memory calculation unit, an in-memory calculation array and an in-memory calculation circuit.
Background
Convolutional neural networks (CNNs) are the basis for current AI applications. Since CNNs achieved breakthroughs in speech recognition, image recognition and the Internet of Things, their application scenarios have grown explosively. The core operation of a CNN is the convolution of activation values with convolution kernels; at the circuit level, this maps to multiply-and-accumulate (MAC) operations between inputs and pre-trained weights. When a conventional fully digital AI edge processor based on the von Neumann architecture performs MAC operations, moving data between the processing unit and the memory incurs a large energy overhead and delay, which prevents further improvement of chip performance. This is referred to as the "memory wall", as shown in FIG. 1(a).
In-memory computing (Compute in Memory, CIM) overcomes the "memory wall" problem and improves energy efficiency by allowing parallel data processing within the memory modules, which is particularly advantageous for multi-bit MAC operations, as shown in FIG. 1(b). In addition, CNNs demand a large memory capacity and incur a high resource overhead, which makes them poorly suited to edge devices. A binary weight network (BWN) binarizes the weights in the MAC process and thereby greatly improves the energy efficiency of the network at the cost of a small loss of accuracy, so the BWN has become one of the most promising neural networks for edge devices.
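As an illustrative aside, the effect of weight binarization on the MAC operation can be sketched in Python. The sketch assumes that the binary weights are encoded as +1 and -1, which is a common BWN convention but is not specified at this point in the text; the function name `binary_weight_mac` and the example values are hypothetical.

```python
def binary_weight_mac(activations, weights):
    """Multiply-accumulate with weights restricted to +1 or -1 (assumed encoding)."""
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    if any(w not in (+1, -1) for w in weights):
        raise ValueError("binary weights must be +1 or -1")
    # Each product is either the activation or its negation, so the MAC
    # reduces to a positive partial sum minus a negative partial sum.
    positive = sum(a for a, w in zip(activations, weights) if w == +1)
    negative = sum(a for a, w in zip(activations, weights) if w == -1)
    return positive - negative


# Hypothetical example: 3*(+1) + 1*(-1) + 4*(+1) + 2*(+1)
example = binary_weight_mac([3, 1, 4, 2], [+1, -1, +1, +1])
```

Under this assumption no multiplier is needed, and the result is the difference of two accumulated quantities, one for positive products and one for negative products.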
In analog CIM, the operation is performed inside the SRAM bit cell (i.e. the multiplication) and across a column or row of bit cells connected to the same bit line (BL), so that the computation takes place in the memory array. Positive and negative computations correspond to charging and discharging the same bit line: charging uses PMOS transistors and discharging uses NMOS transistors, and because the two types of MOS transistor are fabricated differently, charging and discharging are inconsistent. Analog CIM offers high energy and area efficiency, but it suffers from various analog non-idealities. In particular, CIM macros operating in the voltage domain are affected by inconsistent charge and discharge rates caused by factors such as differing carrier mobility, so the calculation result is inaccurate.
No effective solution has yet been proposed for the problem that the calculation result is inaccurate due to inconsistent charge and discharge rates in the positive and negative computation of conventional SRAM in-memory computing circuits.
Disclosure of Invention
The invention provides an 8T-SRAM memory computing unit, an in-memory computing array and an in-memory computing circuit, which are used for solving the problem that the computing result is inaccurate due to inconsistent charge and discharge rate in the positive and negative computation of the conventional SRAM memory computing circuit.
In a first aspect, the present invention provides an in-memory computing array, comprising a row-distributed cell array and bipolar computing units, wherein the cell array comprises a plurality of row-distributed 8T-SRAM memory computing units;
Each 8T-SRAM memory computing unit comprises a 6T-SRAM memory structure, a fifth NMOS transistor and a sixth NMOS transistor, wherein the source and gate of the fifth NMOS transistor are respectively connected to a first storage node of the 6T-SRAM memory structure and a first symbol input word line, the source and gate of the sixth NMOS transistor are respectively connected to a second storage node of the 6T-SRAM memory structure and a second symbol input word line, and the drain of the fifth NMOS transistor is connected to the drain of the sixth NMOS transistor to form a judging node;
The bipolar computing unit comprises a first inverter, a second inverter, a seventh NMOS transistor, an eighth NMOS transistor, a ninth NMOS transistor, a tenth NMOS transistor, a first capacitor and a second capacitor, wherein the output of the first inverter is connected to the input of the second inverter; the gate, drain and source of the seventh NMOS transistor are respectively connected to the output of the first inverter, the upper plate of the first capacitor and the drain of the eighth NMOS transistor; the gate, drain and source of the ninth NMOS transistor are respectively connected to the output of the second inverter, the upper plate of the second capacitor and the drain of the tenth NMOS transistor; the first capacitor and the second capacitor are further respectively connected to a first computing bit line and a second computing bit line; the lower plates of the first capacitor and the second capacitor are grounded; the gates of the eighth NMOS transistor and the tenth NMOS transistor are both connected to a value input word line; and the sources of the eighth NMOS transistor and the tenth NMOS transistor are grounded;
In each cell array, a plurality of the judging nodes are connected with the input end of the first inverter through the same judging bit line.
In a second aspect, the invention provides an 8T-SRAM memory cell, comprising a first PMOS transistor, a second PMOS transistor, a first NMOS transistor, a second NMOS transistor, a third NMOS transistor, a fourth NMOS transistor, a fifth NMOS transistor and a sixth NMOS transistor;
The gate, source and drain of the first PMOS transistor are respectively connected to the gate of the first NMOS transistor, the power supply voltage and the drain of the first NMOS transistor, and the source of the first NMOS transistor is connected to the power supply ground; the gate, source and drain of the second PMOS transistor are respectively connected to the gate of the second NMOS transistor, the power supply voltage and the drain of the second NMOS transistor, and the source of the second NMOS transistor is connected to the power supply ground;
The gates of the second PMOS transistor and the second NMOS transistor are connected to the drains of the first PMOS transistor and the first NMOS transistor to form a first storage node, and the gates of the first PMOS transistor and the first NMOS transistor are connected to the drains of the second PMOS transistor and the second NMOS transistor to form a second storage node;
The gate, source and drain of the third NMOS transistor are respectively connected to a memory word line, the first storage node and a first bit line, and the gate, source and drain of the fourth NMOS transistor are respectively connected to the memory word line, the second storage node and a second bit line;
The source and gate of the fifth NMOS transistor are respectively connected to the first storage node and a first symbol input word line INP, the source and gate of the sixth NMOS transistor are respectively connected to the second storage node and a second symbol input word line INN, and the drain of the fifth NMOS transistor is connected to the drain of the sixth NMOS transistor to form a judging node.
In a third aspect, the present invention provides an in-memory computing circuit, including a main memory array, where the main memory array includes a pair of main memory modules, each of the main memory modules includes a plurality of sub memory modules distributed in columns, each of the sub memory modules includes a plurality of sub memory arrays distributed in rows, a voltage delay converter, a TDC, a counter, and a digital subtractor, and the sub memory array is the in-memory computing array according to the first aspect.
In a fourth aspect, the present invention provides an in-memory computing chip integrated with the in-memory computing circuit according to the third aspect.
In a fifth aspect, the present invention provides a static random access memory, including the in-memory computing chip according to the fourth aspect.
Compared with the related art, the invention computes the negative and positive results on the first computing bit line and the second computing bit line, respectively. The two computing bit lines have identical discharge paths, the circuit structure is mirrored, and both branches are controlled by the same value input word line, so that positive and negative results of equal absolute value produce exactly the same amount of discharge. This structure greatly improves the accuracy of the MAC operation, and thereby solves the problem that the calculation result is inaccurate due to inconsistent charge and discharge rates in the positive and negative computation of existing SRAM in-memory computing circuits.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
FIG. 1 is a schematic diagram of a prior art von Neumann architecture and in-memory computing architecture;
FIG. 2 is a schematic diagram of an 8T-SRAM memory cell in some embodiments of the present invention;
FIG. 3 is a schematic diagram of a bipolar computation unit in some embodiments of the invention;
FIG. 4 is a schematic diagram of a sub-storage module according to some embodiments of the invention;
FIG. 5 is a schematic diagram of an in-memory computing circuit (two columns shown as an example) according to some embodiments of the invention;
FIG. 6 is a flow chart of the multiplication and multiply-accumulate operations on multi-bit inputs and weights in an in-memory computing circuit in some embodiments of the present invention;
FIG. 7 is a functional simulation of an in-memory computing circuit in accordance with some embodiments of the present invention;
FIG. 8 is a product simulation result of an in-memory computing circuit in some embodiments of the invention;
FIG. 9 is a MAC simulation result of an in-memory computing circuit in some embodiments of the invention;
FIG. 10 is a schematic diagram illustrating a simulation of the linearity of the positive and negative MAC values of the in-memory computing circuit according to some embodiments of the present invention.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples for a clearer understanding of the objects, technical solutions and advantages of the present application.
Unless defined otherwise, technical or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these" and similar terms in this application are not intended to be limiting in number, but may be singular or plural. The terms "comprises," "comprising," "includes," "including," "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, and system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the list of steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this disclosure are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "and/or" describes the association relationship of the association object, and indicates that three relationships may exist, for example, "a and/or B" may indicate that a exists alone, a and B exist simultaneously, and B exists alone. Typically, the character "/" indicates that the associated object is an "or" relationship. The terms "first," "second," "third," and the like, as referred to in this disclosure, merely distinguish similar objects and do not represent a particular ordering for objects.
In an embodiment of the present invention, an in-memory computing array is provided, comprising a row-distributed cell array and a bipolar computing unit, wherein the cell array includes a plurality of 8T-SRAM memory cells distributed in rows.
Referring to fig. 2, each 8T-SRAM memory cell includes a 6T-SRAM memory structure, a fifth NMOS transistor N5 and a sixth NMOS transistor N6, the source and gate of the fifth NMOS transistor N5 are respectively connected to the first storage node Q and the first symbol input word line INP of the 6T-SRAM memory structure, the source and gate of the sixth NMOS transistor N6 are respectively connected to the second storage node QB and the second symbol input word line INN of the 6T-SRAM memory structure, and the drain of the fifth NMOS transistor N5 is connected to the drain of the sixth NMOS transistor N6 and forms the determination node Judge.
The 6T-SRAM memory structure may be a conventional 6T-SRAM memory cell.
In some embodiments, the 6T-SRAM memory structure includes a first inverting structure, a second inverting structure, a third NMOS transistor N3, and a fourth NMOS transistor N4, where the first inverting structure and the second inverting structure are coupled in an inverted cross manner to form a first storage node Q and a second storage node QB, a gate, a source, and a drain of the third NMOS transistor N3 are respectively connected to the memory word line WL, the first storage node Q, and the first bit line BL, and a gate, a source, and a drain of the fourth NMOS transistor N4 are respectively connected to the memory word line WL, the second storage node QB, and the second bit line BLB.
The first inverting structure comprises a first PMOS transistor P1 and a first NMOS transistor N1, and the second inverting structure comprises a second PMOS transistor P2 and a second NMOS transistor N2. The gate, source and drain of the first PMOS transistor are respectively connected to the gate of the first NMOS transistor, the power supply voltage and the drain of the first NMOS transistor, and the source of the first NMOS transistor is connected to the power supply ground. The gate, source and drain of the second PMOS transistor are respectively connected to the gate of the second NMOS transistor, the power supply voltage and the drain of the second NMOS transistor, and the source of the second NMOS transistor is connected to the power supply ground.
The gates of the second PMOS transistor and the second NMOS transistor are connected to the drains of the first PMOS transistor and the first NMOS transistor to form the first storage node, and the gates of the first PMOS transistor and the first NMOS transistor are connected to the drains of the second PMOS transistor and the second NMOS transistor to form the second storage node.
For the above 8T-SRAM memory cell, the 6T-SRAM memory structure provides the basic data read, write and hold functions, and the determination node provides the determination signal Judge for the logical calculation of the input data and the weight data stored in the 6T-SRAM memory structure. Specifically:
Write operation: before writing, the first bit line BL and the second bit line BLB are precharged to VDD. When the external address signal is valid, the address decoding circuit selects the 6T-SRAM memory structure to be written, and the write driving circuit drives the data to be written onto the bit lines. When the third NMOS transistor N3 and the fourth NMOS transistor N4 are turned on, the data is written into the 6T-SRAM memory structure, and the write operation is complete.
Read operation: before reading, the first bit line BL and the second bit line BLB are precharged to VDD. When the external address signal is valid, the address decoding circuit selects the 6T-SRAM memory structure to be read. When the memory word line WL is at a high level, a voltage difference develops between the first bit line BL and the second bit line BLB; this difference is amplified by the sense amplifier and passed to the data output driving circuit, so that the data of the 6T-SRAM memory structure is read out and the read operation is complete.
Hold operation: when the memory word line WL of the 6T-SRAM memory structure is at a low level, the bit line signals are isolated from the storage nodes. The core of the 6T-SRAM memory structure is a latch formed by a pair of inverters, which keeps the first storage node Q and the second storage node QB in a bistable state, thereby holding the stored data.
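The write, read and hold operations described above can be summarised by a minimal behavioural model. The model works at logic level only (it ignores bit-line voltages, sense-amplifier timing and write-driver strength), and the class name `SixTCell` and its method names are hypothetical and not part of the invention.

```python
class SixTCell:
    """Logic-level model of a 6T-SRAM memory structure (idealised assumption)."""

    def __init__(self, q=0):
        self.q = q  # first storage node Q; QB is always its complement

    @property
    def qb(self):
        return 1 - self.q

    def write(self, wl, bl, blb):
        # N3 and N4 conduct only while the memory word line WL is high;
        # the write driver is assumed to force complementary bit-line levels.
        if wl and bl != blb:
            self.q = bl

    def read(self, wl):
        # Both bit lines start precharged to VDD (logic 1).
        bl, blb = 1, 1
        if wl:
            # The side storing 0 pulls its bit line down, creating the
            # difference that the sense amplifier resolves.
            bl, blb = self.q, self.qb
        return bl, blb


cell = SixTCell()
cell.write(wl=1, bl=1, blb=0)   # write a 1
cell.write(wl=0, bl=0, blb=1)   # WL low: hold, the stored value is unchanged
stored_after_hold = cell.q
read_levels = cell.read(wl=1)
```

In this sketch a write attempted while WL is low leaves the stored value untouched, which corresponds to the hold operation.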
The logical function supported by the 8T-SRAM memory cell is an exclusive OR operation. The exclusive OR result appears on the determination node Judge and serves as the symbol judgment result that controls the next stage. The symbol judgment result is obtained as follows:
Take as an example the case where the weight is 1 and the symbol input (the input from the symbol input word lines) is 1. Before the symbol calculation starts, the first symbol input word line INP and the second symbol input word line INN are both at a low level, and neither the fifth NMOS transistor N5 nor the sixth NMOS transistor N6 is turned on. When the symbol input is 1, the first symbol input word line INP goes to a high level while the second symbol input word line INN remains at a low level, so the fifth NMOS transistor N5 is turned on and the sixth NMOS transistor N6 is turned off. The determination node Judge is charged to a high level through the first PMOS transistor P1 and the fifth NMOS transistor N5, so the symbol is determined to be positive. The remaining cases are as follows:
Table 1 symbol judgment example
This completes the description of the 8T-SRAM memory cell.
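The symbol judgment described above can be sketched as a small logic-level function. It follows only the stated connections (N5 passes the first storage node Q when INP is high, N6 passes the second storage node QB when INN is high) and the one worked case given in the text; it is not a reproduction of Table 1. The function name `judge_level`, the assumption that a negative symbol input drives INN high, and the treatment of INP and INN both high as an error are illustrative assumptions. The threshold-voltage drop of an NMOS pass transistor is ignored.

```python
def judge_level(q, inp, inn):
    """Logic level on the Judge node, or None when neither N5 nor N6 conducts."""
    if inp and inn:
        # Assumption: this case is not allowed, since N5 and N6 would
        # connect Q and QB together through the Judge node.
        raise ValueError("INP and INN must not be high at the same time")
    if inp:
        return q        # N5 on: Judge follows storage node Q
    if inn:
        return 1 - q    # N6 on: Judge follows storage node QB
    return None         # both word lines low: Judge is not driven


# Enumerate the four driven cases, keyed by (stored Q, active word line).
cases = {
    (q, line): judge_level(q, inp, inn)
    for q in (0, 1)
    for line, (inp, inn) in {"INP": (1, 0), "INN": (0, 1)}.items()
}
```

With Q = 1 and INP high, the sketch returns a high level, matching the worked case in which the symbol is determined to be positive.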
Referring to FIG. 3, the bipolar computing unit includes a first inverter INV1, a second inverter INV2, a seventh NMOS transistor N7, an eighth NMOS transistor N8, a ninth NMOS transistor N9, a tenth NMOS transistor N10, a first capacitor C1 and a second capacitor C2. The output of the first inverter INV1 is connected to the input of the second inverter INV2. The gate, drain and source of the seventh NMOS transistor N7 are respectively connected to the output of the first inverter INV1, the upper plate of the first capacitor C1 and the drain of the eighth NMOS transistor N8. The gate, drain and source of the ninth NMOS transistor N9 are respectively connected to the output of the second inverter INV2, the upper plate of the second capacitor C2 and the drain of the tenth NMOS transistor N10. The first capacitor C1 and the second capacitor C2 are further respectively connected to the first computing bit line NCC and the second computing bit line PCC, and the lower plates of the first capacitor C1 and the second capacitor C2 are grounded to VSS. The gates of the eighth NMOS transistor N8 and the tenth NMOS transistor N10 are both connected to the value input word line CEN, and the sources of the eighth NMOS transistor N8 and the tenth NMOS transistor N10 are grounded to VSS.
In each cell array, a plurality of determination nodes Judge are connected to an input terminal of the first inverter INV1 through the same determination bit line.
In the bipolar computing unit, the negative and positive results are computed on the first computing bit line NCC and the second computing bit line PCC, respectively. The two computing bit lines have identical discharge paths, the circuit structure is mirrored, and both branches are controlled by the same value input word line CEN, so that positive and negative results of equal absolute value produce exactly the same amount of discharge. This structure greatly improves the accuracy of the MAC operation, and thereby solves the problem that the calculation result is inaccurate due to inconsistent charge and discharge rates in the positive and negative computation of existing SRAM in-memory computing circuits.
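The steering of the discharge onto one of the two computing bit lines can be illustrated with a logic-level sketch. It assumes that both computing bit lines start from the same precharged voltage and that each pulse on the value input word line (CEN in the embodiment) removes a fixed voltage step from the selected line; the step size, the millivolt units and the function name `bipolar_step` are hypothetical.

```python
def bipolar_step(judge, cen, v_ncc_mv, v_pcc_mv, step_mv=50):
    """Apply one value-input pulse and return the new (NCC, PCC) voltages in mV."""
    nce = 1 - judge   # output of INV1, drives the gate of N7
    pce = 1 - nce     # output of INV2, drives the gate of N9
    if cen and nce:
        # N7 and N8 both conduct: C1 on the negative line NCC discharges.
        v_ncc_mv = max(0, v_ncc_mv - step_mv)
    if cen and pce:
        # N9 and N10 both conduct: C2 on the positive line PCC discharges.
        v_pcc_mv = max(0, v_pcc_mv - step_mv)
    return v_ncc_mv, v_pcc_mv


positive = bipolar_step(judge=1, cen=1, v_ncc_mv=900, v_pcc_mv=900)
negative = bipolar_step(judge=0, cen=1, v_ncc_mv=900, v_pcc_mv=900)
idle = bipolar_step(judge=1, cen=0, v_ncc_mv=900, v_pcc_mv=900)
```

Because both branches are NMOS pull-down stacks driven by the same value input, the sketch removes the same step from whichever line is selected, which is the symmetry the mirrored structure is intended to provide.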
The 8T-SRAM memory unit is used for calculating symbols in the binary weight neural network and can complete convolution calculation of the binary weight neural network by matching with the bipolar calculation unit.
Referring to FIG. 4 and FIG. 5, an embodiment of the present invention further provides an in-memory computing circuit, including a main memory computing array, a row decoder, a word line driver, a column decoder, a precharge module, a mode selection module, a PWM, a sense amplifier, an input/output port and an in-memory computing timing control module. The main memory computing array includes a pair of main memory modules, each main memory module includes a plurality of sub memory modules distributed in columns, and each sub memory module includes a plurality of sub memory arrays distributed in rows, a voltage delay converter, a TDC, a counter and a digital subtractor, the sub memory array being the in-memory computing array provided by the present invention. The bipolar computing unit is connected to the voltage delay converter through the computing bit line pair (the first computing bit line and the second computing bit line), and the conversion result of the voltage delay converter is quantized by the TDC and the counter.
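The readout chain named above (voltage delay converter, TDC, counter and digital subtractor) can be sketched under strong simplifying assumptions. The text does not give the transfer functions of these blocks, so the sketch assumes an ideal linear conversion from bit-line voltage drop to a count, and assumes that the digital subtractor outputs the count of the positive line minus the count of the negative line. The function names, the millivolt units and the step size are hypothetical.

```python
def quantize_drop(v_precharge_mv, v_line_mv, lsb_mv):
    """Idealised converter + TDC + counter: whole steps the line has dropped."""
    return round((v_precharge_mv - v_line_mv) / lsb_mv)


def signed_result(v_precharge_mv, v_ncc_mv, v_pcc_mv, lsb_mv):
    """Assumed digital subtractor: positive-line count minus negative-line count."""
    positive_count = quantize_drop(v_precharge_mv, v_pcc_mv, lsb_mv)
    negative_count = quantize_drop(v_precharge_mv, v_ncc_mv, lsb_mv)
    return positive_count - negative_count


# Hypothetical values: PCC has dropped by 250 mV and NCC by 100 mV
# from a 900 mV precharge level, with 50 mV per counted step.
result = signed_result(900, v_ncc_mv=800, v_pcc_mv=650, lsb_mv=50)
```

Under these assumptions equal drops on the two lines cancel to zero, so any mismatch between the two discharge paths would appear directly as an error in the signed result.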
In the main memory module, the 8T-SRAM memory cells of the same row share the same memory word line WL, the first symbol input word line INP and the second symbol input word line INN, the bipolar computation cells of the same row share the same value input word line CEN, the 8T-SRAM memory cells of the same column share the same first bit line BL and the second bit line BLB, and the bipolar computation cells of the same column share the same first computation bit line NCC and the second computation bit line PCC.
In some of these embodiments, the number of sub-storage modules in each main storage module is 64 and the number of sub-storage arrays in each sub-storage module is 16.
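For this configuration, the resulting block counts follow from simple arithmetic, sketched below. The counts of bipolar computing units and readout chains assume one bipolar computing unit per sub-storage array and one voltage delay converter, TDC, counter and digital subtractor per sub-storage module, as the description of the sub memory module suggests; the constant names are hypothetical.

```python
MAIN_MODULES = 2                  # "a pair of main memory modules"
SUB_MODULES_PER_MAIN = 64         # sub-storage modules per main module
SUB_ARRAYS_PER_SUB_MODULE = 16    # sub-storage arrays per sub-storage module

sub_modules = MAIN_MODULES * SUB_MODULES_PER_MAIN
sub_arrays = sub_modules * SUB_ARRAYS_PER_SUB_MODULE

# Assumption: one bipolar computing unit per sub-storage array and
# one readout chain per sub-storage module.
bipolar_units = sub_arrays
readout_chains = sub_modules
```

The number of 8T-SRAM memory cells per cell array is not stated ("a plurality"), so the total cell count is left open.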
It should be noted that the above in-memory computing circuit mainly differs in adopting a new main memory computing array. The row decoder, word line driver, column decoder, precharge module, mode selection module, PWM, sense amplifier, input/output port, in-memory computing timing control module and similar modules are standard modules of an in-memory computing circuit, so they are not described further here.
Referring to fig. 2, an 8T-SRAM memory cell is further provided in an embodiment of the present invention, which includes a first PMOS transistor P1, a second PMOS transistor P2, a first NMOS transistor N1, a second NMOS transistor N2, a third NMOS transistor N3, a fourth NMOS transistor N4, a fifth NMOS transistor N5, and a sixth NMOS transistor N6.
The gate, source and drain of the first PMOS transistor P1 are respectively connected to the gate of the first NMOS transistor N1, the power supply voltage and the drain of the first NMOS transistor N1, and the source of the first NMOS transistor N1 is connected to the power supply ground. The gate, source and drain of the second PMOS transistor P2 are respectively connected to the gate of the second NMOS transistor N2, the power supply voltage and the drain of the second NMOS transistor N2, and the source of the second NMOS transistor N2 is connected to the power supply ground.
The gates of the second PMOS transistor P2 and the second NMOS transistor N2 are connected to the drains of the first PMOS transistor P1 and the first NMOS transistor N1 to form the first storage node Q, and the gates of the first PMOS transistor P1 and the first NMOS transistor N1 are connected to the drains of the second PMOS transistor P2 and the second NMOS transistor N2 to form the second storage node QB.
The gate, source and drain of the third NMOS transistor N3 are respectively connected to the word line WL, the first storage node Q and the first bit line BL, and the gate, source and drain of the fourth NMOS transistor N4 are respectively connected to the word line WL, the second storage node QB and the second bit line BLB.
The source and gate of the fifth NMOS transistor N5 are respectively connected to the first storage node Q and the first symbol input word line INP, the source and gate of the sixth NMOS transistor N6 are respectively connected to the second storage node QB and the second symbol input word line INN, and the drain of the fifth NMOS transistor N5 is connected to the drain of the sixth NMOS transistor N6 and forms the determination node Judge.
The above provides a new 8T-SRAM memory cell, in which the 6T-SRAM memory structure provides the basic data read, write and hold functions, and the determination node provides the determination signal for the logical calculation of the input data and the weight data stored in the 6T-SRAM memory structure. For details, refer to the description of the 8T-SRAM memory cell given for the in-memory computing array.
The in-memory computing circuit and the in-memory computing array provided by the invention are specifically described below by a specific embodiment.
Referring to fig. 2-5, in one embodiment (in which the various devices, bit lines, and word lines are abbreviated for simplicity), a signed in-memory computation circuit is provided that includes a main memory computation array, a bit line group, a word line group, a row decoder, a word line driver, a column decoder, a precharge module, a mode selection module, a PWM, a voltage delay converter, a TDC, a counter, a digital subtractor, a sense amplifier, an input output port, an in-memory computation timing control module, and an in-memory computation module.
The main memory array comprises two main memory modules, each main memory module comprises 64 sub memory modules distributed in columns, each sub memory module comprises 16 sub memory arrays distributed in rows, a voltage delay converter (VTC), a digital subtracting circuit, a TDC and a counter, each sub memory array comprises a cell array distributed in rows and a bipolar computing unit, and the cell array comprises a plurality of 8T-SRAM memory cells.
The 8T-SRAM memory cell consists of a conventional 6T-SRAM cell and a 2T symbol judgment port. It comprises 2 PMOS transistors P1-P2 and 6 NMOS transistors N1-N6, wherein P1, P2, N1, N2, N3 and N4 form a basic 6T-SRAM cell with two storage nodes Q and QB, and the remaining two NMOS transistors N5 and N6 form the symbol judgment port. The symbol judgment port provides the judgment signal Judge for the logic calculation between the input data and the weight data stored in the 6T-SRAM cell.
The 6T-SRAM cell in the 8T-SRAM memory cell is connected as follows: P1 and N1 form one inverter, and P2 and N2 form another inverter; the two inverters are cross-coupled to form the storage nodes Q and QB; the storage node Q is connected to the bit line BL through the pass transistor N3, the storage node QB is connected to the bit line BLB through the pass transistor N4, and the gates of N3 and N4 are connected to the word line WL. The 2T symbol judgment port is connected as follows: the source of N5 is connected to the storage node Q and the gate of N5 is connected to the word line INP; the source of N6 is connected to the storage node QB and the gate of N6 is connected to the word line INN; the drains of N5 and N6 are connected together to form the judgment node Judge.
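The behaviour of the 2T symbol judgment port can be sketched logically as a two-way pass-transistor selector. The following is a minimal Python model, not the patented circuit itself: the function names are hypothetical, and the mapping from the sign bit to INP/INN (here INN = sign, INP = its complement) is an assumption that this text does not state. Analog effects such as the threshold drop across an NMOS pass transistor are ignored.

```python
from typing import Optional


def judge_node(q: int, inp: int, inn: int) -> Optional[int]:
    """Logical value on the Judge node of one 8T-SRAM cell.

    q   : bit stored on node Q (node QB holds its complement)
    inp : gate of N5, which passes Q to Judge when 1
    inn : gate of N6, which passes QB to Judge when 1
    Returns None when neither transistor conducts (node left floating).
    """
    for value in (q, inp, inn):
        if value not in (0, 1):
            raise ValueError("all signals must be 0 or 1")
    if inp and inn:
        # Both transistors on would connect Q to QB through N5 and N6.
        raise ValueError("INP and INN must not be asserted together")
    if inp:
        return q
    if inn:
        return 1 - q
    return None


def judge_from_sign(q: int, sign: int) -> Optional[int]:
    """Assumed encoding (hypothetical): INN = sign, INP = NOT sign."""
    return judge_node(q, inp=1 - sign, inn=sign)


if __name__ == "__main__":
    for q in (0, 1):
        for sign in (0, 1):
            print(f"Q={q} sign={sign} -> Judge={judge_from_sign(q, sign)}")
```

Under this assumed encoding, Judge equals the exclusive OR of the stored weight and the sign bit; the opposite encoding would give the complement.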
The bipolar computing unit is connected as follows: the input of the inverter INV1 is connected to the judging node Judge; the output node NCE of INV1 is connected to the gate of N7 and to the input of the inverter INV2; the output node PCE of INV2 is connected to the gate of N9; the drain of N7 is connected to the upper plate of the capacitor C1, the source of N7 is connected to the drain of N8, and the source of N8 is connected to ground VSS; the drain of N9 is connected to the upper plate of the capacitor C2, the source of N9 is connected to the drain of N10, and the source of N10 is connected to ground VSS; the gates of N8 and N10 are connected to the value input word line CEN; and the lower plates of C1 and C2 are connected to ground VSS.
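The connections above imply a simple steering rule: Judge selects which capacitor may discharge, and CEN gates when it does. A minimal logical sketch in Python follows; the function name and the dictionary keys are hypothetical, and the model only tracks whether each series path (N7 with N8, N9 with N10) conducts, not the analog discharge itself.

```python
def bipolar_unit(judge: int, cen: int) -> dict:
    """Logical state of one bipolar computing unit.

    judge : value on the judging node Judge (input of INV1)
    cen   : value on the value input word line CEN (gates of N8 and N10)
    """
    if judge not in (0, 1) or cen not in (0, 1):
        raise ValueError("judge and cen must be 0 or 1")
    nce = 1 - judge  # output of INV1, drives the gate of N7
    pce = 1 - nce    # output of INV2, drives the gate of N9
    return {
        "NCE": nce,
        "PCE": pce,
        # C1 discharges only when N7 and N8 conduct together.
        "C1_discharging": bool(nce and cen),
        # C2 discharges only when N9 and N10 conduct together.
        "C2_discharging": bool(pce and cen),
    }


if __name__ == "__main__":
    for judge in (0, 1):
        for cen in (0, 1):
            print(judge, cen, bipolar_unit(judge, cen))
```

At most one of the two capacitors discharges at any time, which is what lets the two compute bit lines carry the two polarities separately.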
The word line group includes 256 memory word lines WL, 256 pairs of symbol input word lines INP and INN, and 16 value input word lines CEN.
In each main memory module, the same memory word line WL and the same symbol input word line pair INP and INN are connected to all 8T-SRAM memory cells in the same row, and the same value input word line CEN is connected to all bipolar computing units in the same row.
The bit line group includes 128 pairs of bit lines BL and BLB, 128 pairs of compute bit lines PCC and NCC, and 128 judging bit lines Judge; the capacitor upper plates of the bipolar computing units in each column are connected to the compute bit lines PCC (NCC).
In each sub-memory array, the cells of the cell array are connected to the same bit line pair, the judging nodes of the cell array are connected to the same judging bit line Judge, and the bipolar computing unit is connected to the same compute bit line pair, through which it is connected to the voltage delay converter; the conversion result of the voltage delay converter is quantized by the TDC and the counter.
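The line counts given above can be cross-checked against the module counts with a few lines of arithmetic. This is only a consistency sketch under two assumptions that the text does not state outright: that the 128 bit line pairs are counted across both main memory modules, and that the 256 memory word lines are divided evenly among the 16 sub-memory arrays of a column. All constant names are hypothetical.

```python
# Figures quoted in the embodiment.
MAIN_MODULES = 2
SUB_MODULES_PER_MAIN = 64       # distributed in columns
SUB_ARRAYS_PER_SUB_MODULE = 16  # distributed in rows
MEMORY_WORD_LINES = 256
SYMBOL_WORD_LINE_PAIRS = 256
VALUE_WORD_LINES = 16
BIT_LINE_PAIRS = 128
COMPUTE_BIT_LINE_PAIRS = 128

# Assumption: one bit line pair and one compute bit line pair per column.
columns = MAIN_MODULES * SUB_MODULES_PER_MAIN

# Assumption: word lines are split evenly over the sub-memory arrays.
rows_per_sub_array = MEMORY_WORD_LINES // SUB_ARRAYS_PER_SUB_MODULE
symbol_pairs_per_sub_array = SYMBOL_WORD_LINE_PAIRS // SUB_ARRAYS_PER_SUB_MODULE

if __name__ == "__main__":
    print("columns:", columns)
    print("rows per sub-memory array:", rows_per_sub_array)
    print("symbol pairs per sub-memory array:", symbol_pairs_per_sub_array)
```

Under these assumptions each sub-memory array holds 16 rows of cells and 16 symbol input word line pairs, and there is one value input word line per row of bipolar computing units.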
The row decoder is used for controlling the word line driving of each word line.
The word line driver is used for controlling the opening or closing of each word line according to the decoding result of the row decoder.
The column decoder is used for addressing the target cell for read operations.
The pre-charge module is used for charging each bit line in the bit line group.
The mode selection module is used for selecting a read-write mode or a calculation mode.
PWM is used to convert input data into pulse width delays of a corresponding length.
The voltage delay converter is used for converting the MAC calculation value into a corresponding pulse width delay.
The TDC is used for converting the pulse width delay converted by the voltage delay converter into a corresponding digital quantity.
The counter is used for assisting the TDC to finish converting the delay pulse width.
And the digital subtracter is used for subtracting the quantized two digital values in the digital domain.
The sense amplifier is used to output data stored in any 8T-SRAM memory cell in a read mode.
The input/output port is used for acquiring input data to be written in a write mode and outputting read storage data in a read mode. The in-memory calculation timing control module is used for generating each clock signal required by the MAC operation.
The time sequence control module is used for generating various clock signals required in the process of reading/writing operation and operation.
The multi-bit multiplication logic and circuitry principle of the in-memory computation circuit in this embodiment is as follows:
1. Multi-bit number multiplication.
The arithmetic logic and circuit principle of the multi-bit multiplication in this embodiment will be described below by taking a 5bit×1bit operation as an example:
In a 5bit×1bit multiplication operation, only one bipolar computation unit is needed. IN<0> of the input IN<4:0> serves as the sign bit and is input into the 8T-SRAM for an exclusive OR operation; the result, Judge, is the decision basis for the bipolar computation unit.
The case of a high-level Judge is taken as an example below. Before the "multiply" operation begins, the compute bit lines PCC and NCC are precharged to a high level by the precharge module. When Judge = 1, PCE is high and NCE is low, so N7 is turned off and N9 is turned on. The PWM outputs a pulse width corresponding to IN<4:1>, and N8 and N10 are turned on accordingly when IN<4:1> ≠ 0. The compute bit line PCC then discharges through the ground path formed by N9 and N10, and the discharge amount is determined by the number of delay pulse widths generated by the PWM: with one delay pulse width denoted Δt, each Δt corresponds to a discharge of one ΔV. All cases are as follows:
Table 2 multiplication table
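The multiplication rule described above (one ΔV of discharge per Δt, on the line selected by Judge) can be sketched as follows. This is an idealized, linear model rather than the circuit's actual transfer curve: the function name is hypothetical, the 900 mV precharge level and the 20 mV step are nominal values borrowed from the simulation results reported later in this document, and the Judge = 0 case is inferred by symmetry from the bipolar computing unit's wiring.

```python
from typing import Tuple


def product_levels(magnitude: int, judge: int,
                   vdd_mv: float = 900.0,
                   dv_mv: float = 20.0) -> Tuple[float, float]:
    """Return (PCC, NCC) voltages in mV after one 5bit x 1bit product.

    magnitude : value of IN<4:1>, 0 to 15, i.e. the number of dt pulses
    judge     : 1 discharges PCC (via N9, N10), 0 discharges NCC
    """
    if not 0 <= magnitude <= 15:
        raise ValueError("magnitude must fit in 4 bits")
    if judge not in (0, 1):
        raise ValueError("judge must be 0 or 1")
    drop = magnitude * dv_mv  # each dt pulse removes one dV
    pcc = vdd_mv - drop if judge == 1 else vdd_mv
    ncc = vdd_mv - drop if judge == 0 else vdd_mv
    return pcc, ncc


if __name__ == "__main__":
    for magnitude in (0, 1, 8, 15):
        print(magnitude, product_levels(magnitude, judge=1))
```

For example, `product_levels(15, 1)` returns `(600.0, 900.0)`: PCC has dropped by fifteen steps while NCC stays at the precharge level.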
2. Multi-bit number accumulation.
The above describes the basic logic and principles for implementing multi-bit multiplication, on the basis of which the following further describes the operation of multi-bit multiply-accumulate:
Taking the accumulation of 16 groups of 5bit×1bit products as an example, Judge0 to Judge7 are 1, and Judge8 to Judge15 are 0.
After the MAC operation starts, 16 sets of IN<4:0> are input externally. Since the compute bit lines of the bipolar computation units of each column are connected to the same PCC and NCC, the MAC result is as follows:
The above equation gives the discharge amount of the compute bit line PCC.
The above equation gives the discharge amount of the compute bit line NCC.
To sum up, as shown in fig. 6, the multi-bit input and weight multiplication and multiply-accumulate operation procedure of the signed in-memory computing circuit provided in the present embodiment is as follows:
S0, before the calculation operation starts, the precharge circuit precharges all compute bit lines PCC (NCC) to the supply voltage VDD.
S1, when the calculation operation starts, an external circuit serially inputs 16 groups of activation values IN0[4:0] to IN15[4:0] through the input-output module.
S2, according to the sign bit values IN0[0] to IN15[0] of the activation values, the corresponding INP0 to INP15 and INN0 to INN15 are driven; the sign calculation result of the 8T-SRAM is represented by Judge, which enables PCE (NCE) in each sub-memory array.
S3, according to the data bit values IN0[4:1] to IN15[4:1] of the activation values, the PWM converts them into pulse width delays CEN0[3:0] to CEN15[3:0] of corresponding magnitudes, PCC and NCC are discharged to ground through the bipolar computing units, and the multi-bit MAC result is represented on the compute bit lines PCC (NCC).
S4, the result on each PCC and NCC is converted by the voltage delay converter, the pulse width delays of each PCC and NCC pair are subtracted by the pulse width delay subtracting circuit, and the result is sent to the TDC.
S5, the TDC is matched with a counter to finish the quantization and output of a final result.
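Steps S0 to S5 can be summarized in a purely behavioural sketch that counts discharge pulses instead of modelling voltages. It is a simplification under stated assumptions: the function name is hypothetical, each Judge value is taken as already resolved by the 8T-SRAM cells, the voltage delay converter and TDC are treated as ideal, and the sign convention of the final subtraction (PCC count minus NCC count) is assumed rather than taken from the text. How the analog step scales when 16 units share one compute bit line is not modelled.

```python
from typing import Sequence, Tuple


def signed_mac(activations: Sequence[int],
               judges: Sequence[int]) -> Tuple[int, int, int]:
    """Return (pcc_units, ncc_units, result) for one column.

    activations : 5-bit words IN[4:0]; bits [4:1] carry the magnitude
    judges      : Judge value (0 or 1) of each sub-memory array
    """
    if len(activations) != len(judges):
        raise ValueError("one Judge value is needed per activation")
    pcc_units = 0  # S0: both lines start fully charged (zero discharge)
    ncc_units = 0
    for word, judge in zip(activations, judges):
        if not 0 <= word < 32:
            raise ValueError("activations must be 5-bit values")
        if judge not in (0, 1):
            raise ValueError("judge values must be 0 or 1")
        pulses = word >> 1  # S3: PWM pulse count taken from IN[4:1]
        if judge == 1:
            pcc_units += pulses  # discharge through N9 and N10
        else:
            ncc_units += pulses  # discharge through N7 and N8
    # S4 and S5: ideal conversion, then subtraction in the digital domain.
    return pcc_units, ncc_units, pcc_units - ncc_units


if __name__ == "__main__":
    words = [0b00110] * 16         # magnitude 3 in every group
    judges = [1] * 8 + [0] * 8     # the Judge pattern used in the example
    print(signed_mac(words, judges))
```

With the Judge pattern from the 16-group example and equal magnitudes, the two lines discharge by the same amount and the signed result is zero.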
During a single MAC operation, only one of the 16 symbol input word line pairs (INP0 to INP15 and INN0 to INN15) in each sub-memory array is activated to participate in the calculation, and the corresponding pairs in the 16 sub-memory arrays of each sub-memory module are activated together, so that multiple groups of MAC operations are executed in parallel.
In order to verify the validity of the above embodiment, the signed in-memory computing circuit mentioned in the embodiment is simulated and tested in a simulator and test platform, and the simulation and test results are as follows:
1. "XOR" function simulation.
The experiment is based on a 28 nm process. The symbol judgment function of the 8T-SRAM is simulated, and the simulation result is shown in fig. 7. Analysis shows that the changes of the Judge signal in the four cases are consistent with expectations, so the symbol judgment function can be realized.
2. "Product" simulation.
The product function of the bipolar calculation unit is simulated, the input is 5 bits, the weight is 1bit, and the voltage level after discharging is shown in fig. 8.
Analysis shows that the bipolar calculation unit provided by the experiment can realize the product function in one sub-calculation array. The simulated voltage levels range from 900 mV to 593 mV, and the voltage difference between two adjacent values is about 20 mV, so that the levels can be distinguished and converted by the subsequent quantization unit.
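The reported step of about 20 mV can be checked against the reported end points with a short calculation. The sketch below assumes 16 evenly spaced levels, one for each value of the 4-bit magnitude IN<4:1>; that level count and the even spacing are assumptions, and the names are hypothetical. Only the 900 mV and 593 mV figures come from the text.

```python
V_MAX_MV = 900.0  # reported level with no discharge
V_MIN_MV = 593.0  # reported lowest level in the product simulation
LEVELS = 16       # assumption: one level per 4-bit magnitude value

# Average spacing between adjacent levels implied by the end points.
step_mv = (V_MAX_MV - V_MIN_MV) / (LEVELS - 1)


def ideal_level_mv(code: int) -> float:
    """Evenly spaced level for a magnitude code between 0 and 15."""
    if not 0 <= code < LEVELS:
        raise ValueError("code out of range")
    return V_MAX_MV - code * step_mv


if __name__ == "__main__":
    print(f"average step: {step_mv:.2f} mV")
    print([round(ideal_level_mv(code), 1) for code in range(LEVELS)])
```

The implied average step is a little above 20 mV, which is consistent with the "about 20 mV" spacing quoted above.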
3. "MAC" simulation.
The MAC calculation function of the bipolar calculation unit is simulated. The basic unit of the simulation is one sub-memory module with 16 sub-memory arrays distributed in rows, and the voltage level after discharge is shown in fig. 9. Analysis shows that the bipolar computing unit provided by the experiment can realize the MAC function in a row of sub-computing arrays. The simulated voltage levels range from 900 mV to 597.8 mV, the voltage difference between two adjacent values is about 20 mV, and the voltage level for a given value is consistent with the level for the same value in the product simulation, so that the levels can be distinguished and converted by the subsequent quantization unit.
For the simulation of positive and negative MAC results, the basic unit of the simulation is again one sub-memory module with 16 sub-memory arrays distributed in rows, and the voltage level after discharge is shown in fig. 10. Analysis shows that the voltage levels of the positive and negative values corresponding to the same absolute MAC value are completely consistent, so that higher linearity is realized.
The embodiment of the invention also provides an in-memory computing chip which is integrated with the in-memory computing circuit.
The embodiment of the invention also provides a static random access memory which comprises the in-memory computing chip provided by the invention.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure in accordance with the embodiments provided herein.
It is to be understood that the drawings are merely illustrative of some embodiments of the present application and that it is possible for those skilled in the art to adapt the present application to other similar situations without the need for inventive work. In addition, it should be appreciated that while the development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as a departure from the disclosure.
Claims (9)
1. An in-memory computing array, characterized by comprising a cell array and a bipolar computing unit, wherein the cell array comprises a plurality of 8T-SRAM memory cells distributed in rows;
each 8T-SRAM memory cell comprises a 6T-SRAM memory structure, a fifth NMOS transistor N5 and a sixth NMOS transistor N6; the source and gate of N5 are respectively connected to a first storage node of the 6T-SRAM memory structure and a first symbol input word line; the source and gate of N6 are respectively connected to a second storage node of the 6T-SRAM memory structure and a second symbol input word line; and the drain of N5 is connected to the drain of N6 to form a judging node;
the bipolar computing unit comprises a first inverter INV1, a second inverter INV2, a seventh NMOS transistor N7, an eighth NMOS transistor N8, a ninth NMOS transistor N9, a tenth NMOS transistor N10, a first capacitor C1 and a second capacitor C2, wherein the output of INV1 is connected to the input of INV2; the gate, drain and source of N7 are respectively connected to the output of INV1, the upper plate of C1 and the drain of N8; the gate, drain and source of N9 are respectively connected to the output of INV2, the upper plate of C2 and the drain of N10; the upper plates of C1 and C2 are respectively connected to a first computing bit line and a second computing bit line; the lower plates of C1 and C2 are grounded; the gates of N8 and N10 are connected to a value input word line; and the sources of N8 and N10 are grounded;
in each cell array, a plurality of the judging nodes are connected with the input end of the INV1 through the same judging bit line.
2. The in-memory computing array of claim 1, wherein the 6T-SRAM memory structure comprises a first inverting structure, a second inverting structure, a third NMOS transistor N3, and a fourth NMOS transistor N4, the first inverting structure and the second inverting structure being cross-coupled to form the first storage node and the second storage node, a gate, a source, and a drain of N3 being connected to a memory word line, the first storage node, and a first bit line, respectively, and a gate, a source, and a drain of N4 being connected to the memory word line, the second storage node, and a second bit line, respectively.
3. The in-memory computing array of claim 2, wherein the first inverting structure comprises a first PMOS transistor P1 and a first NMOS transistor N1, and the second inverting structure comprises a second PMOS transistor P2 and a second NMOS transistor N2;
the gate, source and drain of P1 are respectively connected to the gate of N1, the power supply voltage and the drain of N1, and the source of N1 is connected to the power supply ground;
the gates of P2 and N2 and the drains of P1 and N1 are connected and form a first storage node, and the gates of P1 and N1 and the drains of P2 and N2 are connected and form a second storage node.
4. An in-memory computing circuit comprising a main memory array comprising a pair of main memory modules, each of the main memory modules comprising a plurality of sub-memory modules distributed in columns, each of the sub-memory modules comprising a plurality of sub-memory arrays distributed in rows, a voltage delay converter, a TDC, a counter, and a digital subtractor, wherein the sub-memory array is the in-memory computing array of any one of claims 1-3.
5. The in-memory computing circuit of claim 4, wherein the number of sub-memory modules in each of the main memory modules is 64 and the number of sub-memory arrays in each of the sub-memory modules is 16.
6. The in-memory computing circuit of claim 4, wherein in the main memory module, the 8T-SRAM memory cells of a same row share a same memory word line, a first symbol input word line, and a second symbol input word line, the bipolar computing cells of a same row share a same value input word line, the 8T-SRAM memory cells of a same column share a same first bit line and a second bit line, and the bipolar computing cells of a same column share a same first computation bit line and a second computation bit line.
7. The in-memory computation circuit of claim 4, further comprising a row decoder, a word line driver, a column decoder, a precharge module, a mode selection module, PWM, a sense amplifier, an input-output port, an in-memory computation timing control module.
8. An in-memory computing chip, characterized in that an in-memory computing circuit according to any one of claims 4-7 is integrated.
9. A static random access memory comprising the in-memory computing chip of claim 8.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411832795.2A CN119296609B (en) | 2024-12-13 | 2024-12-13 | 8T-SRAM memory computing unit, memory computing array and memory computing circuit |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119296609A (en) | 2025-01-10 |
| CN119296609B (en) | 2025-03-07 |
Family
ID=94165483
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411832795.2A Active CN119296609B (en) | 2024-12-13 | 2024-12-13 | 8T-SRAM memory computing unit, memory computing array and memory computing circuit |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119296609B (en) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117636945A (en) * | 2024-01-26 | 2024-03-01 | 安徽大学 | 5-bit signed bit AND OR accumulation operation circuit and CIM circuit |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7671422B2 (en) * | 2007-05-04 | 2010-03-02 | Taiwan Semiconductor Manufacturing Company, Ltd. | Pseudo 6T SRAM cell |
| US11211115B2 (en) * | 2020-05-05 | 2021-12-28 | Ecole Polytechnique Federale De Lausanne (Epfl) | Associativity-agnostic in-cache computing memory architecture optimized for multiplication |
| CN114360595B (en) * | 2021-11-22 | 2025-05-27 | 安徽大学 | A bidirectional subtraction circuit structure based on row and column in 8T SRAM memory |
| EP4354435A1 (en) * | 2022-10-11 | 2024-04-17 | Semron GmbH | Analog multiplying unit with binary multipliers |
| CN116126778A (en) * | 2022-12-28 | 2023-05-16 | 上海科技大学 | Low-temperature high-energy-efficiency in-memory computing accelerator |
| CN117316237B (en) * | 2023-12-01 | 2024-02-06 | 安徽大学 | Time domain 8T1C-SRAM storage and computing unit and timing tracking and quantization storage and computing circuit |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119296609A (en) | 2025-01-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110414677B (en) | Memory computing circuit suitable for full-connection binarization neural network | |
| CN112992223B (en) | Memory computing unit, memory computing array and memory computing device | |
| CN113255904B (en) | Voltage margin enhanced capacitive coupling storage and computing integrated unit, sub-array and device | |
| CN112558919B (en) | In-memory computing bit unit and in-memory computing device | |
| CN109979503B (en) | Static random access memory circuit structure for realizing Hamming distance calculation in memory | |
| CN112133348B (en) | A 6T cell-based storage unit, storage array and in-memory computing device | |
| CN110176264B (en) | A High-Low-Bit Combining Circuit Structure Based on In-Memory Computing | |
| CN113257306A (en) | Storage and calculation integrated array and accelerating device based on static random access memory | |
| CN112036562B (en) | Bit cell applied to memory computation and memory computation array device | |
| CN114300012B (en) | Decoupling SRAM memory computing device | |
| CN110058839A (en) | A kind of circuit structure based on subtraction in Static RAM memory | |
| CN112599165A (en) | Memory computing unit for multi-bit input and multi-bit weight multiplication accumulation | |
| CN117636945B (en) | 5-bit XOR and XOR accumulation circuit with sign bit, CIM circuit | |
| CN119068948B (en) | Multi-bit multiplication and addition operation circuit based on 6T-SRAM and control method thereof | |
| CN118312468B (en) | In-memory operation circuit with symbol multiplication and CIM chip | |
| CN110941185B (en) | A Double Word Line 6TSRAM Cell Circuit for Binary Neural Network | |
| CN113936717A (en) | Storage and calculation integrated circuit for multiplexing weight | |
| CN117807021B (en) | 2T-2MTJ memory cell and MRAM in-memory computing circuit | |
| CN115954029B (en) | Multi-bit operation module and in-memory computing circuit structure using the module | |
| CN114895869B (en) | Multi-bit memory computing device with symbols | |
| CN114038492A (en) | A Multiphase Sampling In-Memory Computing Circuit | |
| CN119296609B (en) | 8T-SRAM memory computing unit, memory computing array and memory computing circuit | |
| CN114464239A (en) | Memory computing unit | |
| CN116204490B (en) | A 7T storage and calculation circuit and multiplication and accumulation circuit based on low voltage technology | |
| CN113391786B (en) | Computing device for multi-bit positive and negative weights |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||