
WO2023207441A1 - SRAM storage and computing integrated chip based on capacitive coupling - Google Patents

Info

Publication number
WO2023207441A1
Authority
WO
WIPO (PCT)
Prior art keywords
module, input, sram, multiplication, data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/083070
Other languages
English (en)
Chinese (zh)
Inventor
王源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Application filed by Peking University filed Critical Peking University
Publication of WO2023207441A1


Classifications

    • G — PHYSICS
    • G06 — COMPUTING OR CALCULATING; COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 — Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 — Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 — Methods or arrangements for performing computations using exclusively denominational number representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 — Multiplying; Dividing
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates to the technical field of integrated circuit design, and in particular to an SRAM storage and calculation integrated chip based on capacitive coupling.
  • In-memory computing (Compute-In-Memory, CIM) technology transforms the traditional computing-centered architecture into a data-centered architecture that uses memory directly for data processing, integrating data storage and computing on one chip.
  • Such a storage and computing integrated chip can eliminate the bottleneck of the von Neumann computing architecture and reduce the additional power consumption and performance loss caused by data transmission.
  • Static Random Access Memory (SRAM) is widely used to construct storage and computing integrated chips due to its high speed, low power consumption and high robustness.
  • the storage and calculation integrated chip can be used as a hardware implementation of the multiplication and accumulation operation of the neural network model.
  • The existing storage and computing integrated chip usually adopts a charge-based CIM structure, in which a switch array and additional control signals in the analog domain implement a charge-sharing circuit.
  • This sharing control is complex and introduces large delay, which greatly limits the computing performance of the integrated storage and computing chip.
  • the present disclosure provides an SRAM storage and computing integrated chip based on capacitive coupling to solve the defects existing in the prior art.
  • the present disclosure provides an SRAM storage and calculation integrated chip based on capacitive coupling, including: an input module, a bitwise multiplication module, a capacitance attenuation module and an output module.
  • The input module, the bitwise multiplication module, the capacitance attenuation module and the output module are connected in sequence;
  • the input module is used to receive input data
  • the bitwise multiplication module includes a plurality of bitwise multiplication units, and each bitwise multiplication unit is used to multiply the input data with one bit of the bitwise-stored data based on the capacitive coupling principle, to obtain the multiplication result corresponding to that bit of the stored data;
  • the capacitive attenuation module includes a two-layer capacitive attenuator array: each first-type capacitive attenuator in the first-layer array is connected between two adjacent bit multiplication units, and each second-type capacitive attenuator in the second-layer array is connected between two adjacent first-type capacitive attenuators; the capacitive attenuation module is used to accumulate the multiplication results corresponding to each bit of the stored data layer by layer, obtaining a multi-bit data analog accumulation result;
  • the output module is used to determine and output the digital accumulation result corresponding to the multi-bit data analog accumulation result.
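The dataflow described above (bitwise multiplication of the input data with each bit of the stored data, then binary-weighted accumulation and digital readout) can be summarized by a small behavioral model. This is an illustrative sketch, not the disclosed circuit; the function name and numeric values are hypothetical:

```python
# Behavioral sketch of the claimed dataflow: multiply the inputs by each
# bit of a 4-bit stored weight, then combine the per-bit partial sums
# with binary weights (the role played in hardware by the capacitive
# attenuation module).

def bitwise_mac(inputs, weight_4b):
    """Multiply-accumulate of inputs with one 4-bit weight, computed bitwise."""
    total = 0
    for b in range(4):                            # one bit plane per bit multiplication unit
        w_bit = (weight_4b >> b) & 1              # the 1-bit stored datum
        partial = sum(x * w_bit for x in inputs)  # bitwise multiply + accumulate
        total += partial << b                     # binary weighting across bit planes
    return total

# Matches the direct multiply-accumulate result:
inputs = [3, 0, 15, 7]
assert bitwise_mac(inputs, 0b1011) == sum(x * 0b1011 for x in inputs)
```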
  • the input module includes an input sparse sensing module and an input sparse comparison module, and the input sparse sensing module is connected to the bitwise multiplication module;
  • the output module includes a Flash analog-to-digital conversion module, and the input sparse sensing module, the input sparse comparison module, and the Flash analog-to-digital conversion module are connected in sequence;
  • the input sparse sensing module is used to convert the input data into an analog voltage
  • the input sparse comparison module is used to compare the analog voltage with the first reference voltage to obtain a first comparison result
  • the Flash analog-to-digital conversion module is used to compare the multi-bit data analog accumulation result with the second reference voltage based on the first comparison result to obtain a second comparison result, and to use the second comparison result as the digital accumulation result.
  • The working mode of the SRAM storage and computing integrated chip includes a storage operation mode and a computing operation mode;
  • in the storage operation mode, the input module and the output module do not work;
  • in the computing operation mode, the SRAM storage and computing integrated chip performs multiplication and accumulation operations on the input data and the stored data.
  • The input sparse comparison module includes a plurality of first comparators;
  • the Flash analog-to-digital conversion module includes a plurality of Flash analog-to-digital conversion units, each of which includes a plurality of second comparators;
  • the first comparators and the second comparators are connected in a one-to-one correspondence, and the first reference voltage of each first comparator is the same as the second reference voltage of the correspondingly connected second comparator.
  • the number of the Flash analog-to-digital conversion units is the same as the number of the second type of capacitive attenuators.
  • the bit multiplication unit includes a column of 9T1C unit array, and the 9T1C unit array includes a plurality of 9T1C units;
  • the SRAM storage and calculation integrated chip also includes an SRAM read and write external structure, and the SRAM read and write external structure is connected to the 9T1C unit.
  • the 9T1C unit includes six first-type transistors and three second-type transistors, and both the first-type transistors and the second-type transistors are connected to the SRAM read and write external structure;
  • the six first-type transistors are used to store one bit of data of the stored data
  • the three second-type transistors are used to perform a multiplication operation between one bit of the stored data stored by the six first-type transistors and a corresponding bit of the input data.
  • the SRAM read and write external structure includes an SRAM controller, an SRAM peripheral circuit and an address decoding driver;
  • the SRAM controller is respectively connected to the SRAM peripheral circuit and the address decoding driver, and the SRAM peripheral circuit and the address decoding driver are both connected to the 9T1C unit.
  • The SRAM storage and computing integrated chip also includes an in-memory computing controller, which is connected to the input module and the output module respectively.
  • the storage data includes multiple 4-bit weight data in the neural network.
  • The SRAM storage and calculation integrated chip based on capacitive coupling includes: an input module, a bitwise multiplication module, a capacitance attenuation module and an output module.
  • The input data is received through the input module; the multiplication of the input data with the stored data is realized through the bitwise multiplication module to obtain the multiplication results; and the capacitance attenuation module accumulates these results layer by layer with a hierarchical capacitive attenuator structure. The structure is simpler and the calculation time is shorter, so the digital accumulation result can be obtained quickly, improving the energy efficiency and computational throughput of multiply-accumulate operations.
  • Figure 1 is one of the structural schematic diagrams of an SRAM storage and computing integrated chip based on capacitive coupling provided by the present disclosure
  • Figure 2 is a schematic structural diagram of the 4b-DAC in the SRAM storage and calculation integrated chip based on capacitive coupling provided by the present disclosure
  • Figure 3 is a schematic diagram of the connection between the DAC array, the input sparse sensing module and the bitwise multiplication module in the SRAM storage and calculation integrated chip based on capacitive coupling provided by the present disclosure
  • Figure 4 is a schematic structural diagram of each bit multiplication unit in the SRAM storage and calculation integrated chip based on capacitive coupling provided by the present disclosure
  • Figure 5 is a multiplication operation timing diagram of the 9T1C unit in the SRAM storage and calculation integrated chip based on capacitive coupling provided by the present disclosure
  • Figure 6 is a schematic diagram of the connection between the bitwise multiplication module and the capacitance attenuation module when each bitwise multiplication unit in the bitwise multiplication module of the SRAM storage and calculation integrated chip based on capacitive coupling provided by the present disclosure includes a column of 9T1C unit arrays.
  • Figure 7 is the layout of the 9T1C unit and HCA column in the SRAM storage and computing integrated chip based on capacitive coupling provided by the present disclosure
  • Figure 8 is the second structural schematic diagram of the SRAM storage and computing integrated chip based on capacitive coupling provided by the present disclosure
  • Figure 9 is a working timing diagram of the MAC operation with input sparse sensing of the SRAM storage and computing integrated chip based on capacitive coupling provided by the present disclosure
  • Figure 10 is a Monte Carlo simulation schematic diagram of the analog computation transfer function, linear fitting results and process fluctuations of the capacitive coupling-based SRAM storage and computing integrated chip provided by the present disclosure at different temperatures and process corners;
  • Figure 11 is a distribution diagram of Monte Carlo simulation results and MAC operation results at point A in Figure 10;
  • Figure 12 is a distribution diagram of Monte Carlo simulation results and MAC operation results at point B in Figure 10;
  • Figure 13 is a distribution diagram of Monte Carlo simulation results and MAC operation results at point C in Figure 10.
  • Neural networks have been widely used and achieved excellent performance in pattern recognition, automatic control, financial analysis, biomedicine and other fields.
  • Convolutional neural networks perform particularly well in image processing.
  • As task complexity grows, the scale of neural networks keeps increasing, along with their parameter counts and computation loads; this means the hardware resources and power consumed by neural networks mapped onto hardware grow day by day.
  • the core and largest operation in a convolutional neural network is the multiplication and accumulation (Multiply Accumulate, MAC) operation. Therefore, the key to realizing a low-power convolutional neural network lies in the design of a low-power MAC operation unit.
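As a minimal numeric illustration of the MAC operation referred to above (all values arbitrary), each output of a convolution layer is a sum of elementwise products:

```python
# One MAC operation: elementwise products of inputs and weights, summed.
inputs  = [1, 2, 3, 4]
weights = [4, 3, 2, 1]
mac = sum(x * w for x, w in zip(inputs, weights))
assert mac == 20  # 1*4 + 2*3 + 3*2 + 4*1
```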
  • Compute-In-Memory (CIM) technology aims to transform the traditional computing-centered architecture into a data-centered architecture, which directly uses memory for data processing, thereby integrating data storage and computing at the same time.
  • The bottleneck of the von Neumann computing architecture can be completely eliminated, which is especially suitable for large-scale parallel, data-intensive application scenarios such as deep convolutional neural networks (Deep Convolutional Neural Network, DCNN).
  • This system architecture not only retains the storage and read/write functions of the memory circuit itself, but, because the storage units and computing units are integrated together, also supports different logic or multiply-add operations, greatly reducing frequent bus interactions between the central processing unit and the memory circuit and further reducing data movement.
  • It can perform large numbers of parallel calculations with ultra-low power consumption, greatly improving system energy efficiency, and is a highly promising research direction for the energy-efficient computing that high-end artificial intelligence applications require.
  • CIM based on charge-domain calculation suffers less from capacitor mismatch and process variation, and offers better linearity and accuracy.
  • CIM based on charge domain calculations still faces some challenges, including:
  • The design of a storage-and-compute unit that realizes dot multiplication requires a trade-off between transistor count, size and computational dynamic range. For example, an 8T1C unit uses few transistors but suffers a threshold loss in dynamic range, while a 10T1C unit achieves rail-to-rail dynamic range but yields a larger storage-and-compute unit.
  • CIM based on the charge domain uses either a switch array or an operational amplifier in the analog domain for accumulation; the former has complex sharing control and large delay, while the latter has high power consumption and large area.
  • the multi-bit Analog-to-Digital Converter (ADC) that converts the analog MAC operation results into digital encoding consumes a large amount of energy and has a serious impact on the overall energy efficiency.
  • Figure 1 is a schematic structural diagram of an SRAM storage and computing integrated chip based on capacitive coupling provided in an embodiment of the present disclosure.
  • The chip includes: an input module 1, a bitwise multiplication module 2, a capacitive attenuation module 3 and an output module 4;
  • the input module 1, the bitwise multiplication module 2, the capacitance attenuation module 3 and the output module 4 are connected in sequence;
  • the input module 1 is used to receive input data
  • the bitwise multiplication module 2 includes a plurality of bitwise multiplication units 21, each of which is used to multiply the input data with one bit of the bitwise-stored data based on the capacitive coupling principle, obtaining the multiplication result corresponding to that bit of the stored data;
  • the capacitive attenuation module 3 includes two layers of capacitive attenuator arrays: each first-type capacitive attenuator 311 in the first-layer capacitive attenuator array 31 is connected between two adjacent bit multiplication units 21, and each second-type capacitive attenuator 321 in the second-layer capacitive attenuator array 32 is connected between two adjacent first-type capacitive attenuators 311;
  • the capacitance attenuation module 3 is used to accumulate the multiplication results corresponding to each bit of the stored data layer by layer to obtain a multi-bit data simulation accumulation result;
  • the output module 4 is used to determine and output the digital accumulation result corresponding to the multi-bit data analog accumulation result.
  • the input module 1 may include a Digital-to-Analog Converter (DAC) array.
  • the DAC array can include multiple DACs, and the number of bits of each DAC can be determined based on the number of bits of a single storage data stored in each bit multiplication unit and the number of bits of a single input data. The three can be consistent, for example, they can all be 4 bits (i.e. 4b).
  • the DAC array can include multiple 4b-DACs, and each 4b-DAC can be used to receive a 4b of input data.
  • the number of 4b-DACs included in the DAC array can be set as needed, for example, it can be set to 128.
  • Each 4b-DAC obtains its driving voltages through an off-chip external bias.
  • The off-chip bias provides 16 driving voltages for each 4b-DAC, in steps of 1/16 VDD from GND to VDD.
  • The chip mainly implements the 4-to-16 decoder function: a 4b input datum (4b Input) fed into a 4b-DAC yields the decoding result (DAC-OUT).
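The 4b-DAC behavior just described can be sketched as follows. The VDD value and the exact bias-level mapping are assumptions for illustration; only the 4-to-16 selection of externally supplied bias levels is taken from the text:

```python
VDD = 0.9  # supply voltage in volts (assumed for illustration)

# 16 off-chip bias voltages in 1/16 VDD steps from GND toward VDD
bias_levels = [i * VDD / 16 for i in range(16)]

def dac_4b(code):
    """On-chip 4-to-16 decoding: a 4b input code selects one bias level (DAC-OUT)."""
    assert 0 <= code <= 15
    return bias_levels[code]

assert dac_4b(0) == 0.0                    # code 0 selects GND
assert abs(dac_4b(8) - VDD / 2) < 1e-12    # mid code selects half supply
```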
  • the bitwise multiplication module 2 can include multiple bitwise multiplication units 21.
  • The number of bitwise multiplication units 21 in the bitwise multiplication module 2 can be set as needed; for example, it can be set to 64, that is, the bitwise multiplication module 2 can implement bitwise multiplication of a total of 64 4b stored data with the input data.
  • Each bit multiplication unit 21 may include the same number of computing units as there are DACs in the DAC array, so that DACs are connected to computing units in a one-to-one correspondence; each computing unit can store the stored datum of the corresponding bit (that is, 1b of stored data).
  • Each computing unit includes a capacitor; through the capacitive coupling principle, the multiplication of the decoding result output by the connected DAC with the stored datum of the corresponding bit can be realized. Each bit multiplication unit can thus multiply all input data with one bit of the stored data to obtain the multiplication result corresponding to that bit, so four adjacent bit multiplication units can jointly realize the multiplication of all input data with one 4b stored datum stored bitwise.
  • the capacitance attenuator module 3 includes a two-layer capacitance attenuator (CA) array.
  • Each first-type capacitance attenuator 311 in the first-layer capacitance attenuator array 31 is connected between two adjacent bit multiplication units 21.
  • each second type capacitive attenuator 321 in the second layer capacitive attenuator array 32 is respectively connected between two adjacent first type capacitive attenuators 311 .
  • The attenuation coefficients of the first-type capacitive attenuator 311 and the second-type capacitive attenuator 321 can be determined according to the weight ratio between the bits of the stored data.
  • In this way, the multiplication results corresponding to each bit of each stored datum can be accumulated layer by layer to obtain the multi-bit data analog accumulation result of each stored datum.
  • Each first-type capacitive attenuator 311, together with the second-type capacitive attenuators 321, may form a hierarchical capacitor attenuator (HCA) structure. Therefore, the capacitance attenuation module 3 connected to the bitwise multiplication module 2 can include a total of 64 HCA structures, realizing the accumulation of the bitwise multiplication results of 64 4b stored data with the input data.
  • Compared with accumulation by capacitance sharing based on weighted capacitor arrays and compensation capacitor arrays, the HCA structure does not use a switch array for temporary storage and charge sharing of analog data, so the HCA structure is simpler.
  • The capacitance attenuation module 3 can be driven by a strong external voltage to achieve a stable voltage output, whereas the capacitance-sharing accumulation mode reaches a stable output through weak internal voltage rebalancing; the HCA therefore obtains the multi-bit data analog accumulation result in a shorter calculation time.
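Assuming an attenuation factor of 2 between adjacent bit columns (matching the 2x weight ratio between adjacent bits of the stored data), the layer-by-layer HCA accumulation can be modeled as repeated halve-then-add steps. This is a behavioral sketch, not the capacitor network itself:

```python
def hca_accumulate(bit_partials):
    """bit_partials[b]: analog partial sum from the bit-b multiplication
    column (LSB first). Each stage attenuates the lower-order sum by 2
    before coupling in the next column, so for four columns the result
    equals p3 + p2/2 + p1/4 + p0/8: a binary-weighted combination kept
    within a bounded voltage range."""
    acc = 0.0
    for partial in bit_partials:   # LSB column first
        acc = acc / 2 + partial    # attenuate, then couple in the next bit column
    return acc

# Partial sums scaled so every bit contributes equally after weighting:
assert hca_accumulate([8.0, 4.0, 2.0, 1.0]) == 4.0
```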
  • The output module 4 in the chip may include multiple analog-to-digital converters (Analog-to-Digital Converter, ADC), and the number of bits of each ADC can be determined based on the number of bits of a single stored datum stored bitwise in each bit multiplication unit; for example, all can be 4 bits.
  • The output module 4 can include multiple 4b-ADCs; each 4b-ADC is connected to an HCA structure, converts the multi-bit data analog accumulation result corresponding to each stored datum into a digital accumulation result, and outputs the digital accumulation result.
  • The SRAM storage and computing integrated chip based on capacitive coupling includes: an input module, a bitwise multiplication module, a capacitance attenuation module and an output module.
  • The input data is received through the input module; the multiplication of the input data with the stored data is realized through the bitwise multiplication module to obtain the multiplication results; and the capacitance attenuation module accumulates these results layer by layer with a hierarchical capacitive attenuator structure. The structure is simpler, the calculation time is shorter, and the digital accumulation result can be obtained quickly, improving the energy efficiency and computational throughput of multiply-accumulate operations.
  • The input module includes an input sparse sensing module and an input sparse comparison module, and the input sparse sensing module is connected to the bitwise multiplication module;
  • the output module includes a Flash analog-to-digital conversion module, and the input sparse sensing module, the input sparse comparison module, and the Flash analog-to-digital conversion module are connected in sequence;
  • the input sparse sensing module is used to convert the input data into an analog voltage
  • the input sparse comparison module is used to compare the analog voltage with the first reference voltage to obtain a first comparison result
  • the Flash analog-to-digital conversion module is used to compare the multi-bit data analog accumulation result with the second reference voltage based on the first comparison result to obtain a second comparison result, and to use the second comparison result as the digital accumulation result.
  • the input module may also include an input sparse sensing module and an input sparse comparison module.
  • the DAC array, the input sparse sensing module and the bitwise multiplication module are connected.
  • The decoding result obtained by each DAC in the DAC array can be expressed as IA[i], 0 ≤ i ≤ N-1, where N is the number of DACs in the DAC array, which can be 128. Regardless of whether the input module includes an input sparse sensing module and an input sparse comparison module, IA[i] can be input to the bit multiplication units for multiplication operations.
  • The input sparsity sensing module can be an IS-DAC (Input Sparsity sensing DAC), which can include an NMOS transistor 11 and multiple sensing branches.
  • the sensing branches are connected to the DACs in the DAC array in one-to-one correspondence.
  • The IS-DAC can include a total of one NMOS transistor and 128 sensing branches; the NMOS is responsible for discharging, its source can be grounded, and its gate can receive an external reset signal (RST_IS).
  • Each sensing branch includes a switch 12 and a capacitor 13.
  • the DAC, switch 12 and capacitor 13 are connected in sequence.
  • The IS-DAC can include a switch array composed of 128 switches and a capacitor array composed of 128 capacitors; the other plates of all capacitors are connected to the drain of the NMOS transistor and to the input sparse comparison module respectively.
  • The switch array can receive an external switching control signal (IS-Eval).
  • The IS-DAC, combined with the control signal of the switch array, converts all IA[i] through capacitive coupling across the capacitor array into an analog voltage V_IS that represents the input sparsity.
  • the input sparsity comparison module may include an input sparsity comparator array (Input Sparsity Comparator, IS-CA), including a plurality of first comparators.
  • the number of first comparators may be set as needed, for example, 15 may be included.
  • IS-CA is controlled through the external enable signal (IS_SA_EN).
  • The inverting terminal of each first comparator is connected to the first reference voltage Vref[j], 0 ≤ j ≤ M-1, where M is the number of comparators in the IS-CA, which can be 15.
  • Each first comparator in the IS-CA can compare the analog voltage V_IS with the first reference voltage Vref[j] to obtain the first comparison result, that is, a 1b thermometer-code bit DR[j]; the IS-CA can thus output a 15b thermometer code DR⟨0:14⟩.
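The sensing-plus-quantization path can be sketched as below. Modeling the capacitive coupling as an average and spacing the references evenly in millivolts are assumptions for illustration; the 15-bit thermometer output follows the text:

```python
# 15 ascending first reference voltages Vref[j], in mV (spacing assumed)
VREFS_MV = [60 * j for j in range(1, 16)]

def sense_sparsity(ia_mv, vrefs_mv=VREFS_MV):
    """IS-DAC + IS-CA sketch: couple all DAC outputs IA[i] into one
    voltage V_IS (idealized here as their average), then compare against
    the 15 references to produce the thermometer code DR[0:14]."""
    v_is = sum(ia_mv) // len(ia_mv)            # idealized capacitive averaging
    return [1 if v_is > vref else 0 for vref in vrefs_mv]

dr = sense_sparsity([100, 200, 900, 0])        # a sparse input pattern
assert dr == [1, 1, 1, 1] + [0] * 11           # thermometer code with 4 ones
```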
  • the output module can include a Flash analog-to-digital conversion module.
  • The Flash analog-to-digital conversion module can include multiple Flash analog-to-digital conversion units (Flash-ADC); each Flash-ADC can be a 4b-ADC, so the Flash analog-to-digital conversion module can be regarded as a 4b-Flash-ADC array.
  • The number of Flash-ADCs in the Flash analog-to-digital conversion module can be the same as the number of stored data; that is, the module can include a total of 64 Flash-ADCs, recorded as Flash-ADC⟨k⟩, 0 ≤ k ≤ K-1, where K is the number of Flash-ADCs in the Flash analog-to-digital conversion module, which can be 64.
  • Each Flash-ADC can include multiple second comparators.
  • The second comparators in each Flash-ADC are connected to the first comparators in the IS-CA in a one-to-one correspondence, so there can be 15 second comparators in each Flash-ADC.
  • Each second comparator in the Flash-ADC has a second reference voltage.
  • Based on the first comparison result of the corresponding first comparator, a second comparator in the Flash-ADC can compare the multi-bit data analog accumulation result with its second reference voltage to obtain the second comparison result, and output the second comparison result as the digital accumulation result corresponding to the multi-bit data analog accumulation result.
  • the first comparator in IS-CA and the second comparator in Flash-ADC are both strong-arm comparators.
  • The first comparators in the IS-CA are arranged in order of distance from the IS-DAC, nearest to farthest, with the first reference voltage increasing from low to high.
  • The second comparators in each Flash-ADC are arranged in the same order as their connected first comparators in the IS-CA, with the second reference voltage likewise increasing from low to high.
  • the first second comparator can be expressed as L-Comp ⁇ 0>, and its corresponding second reference voltage ranges from 0-400mV.
  • the last second comparator can be represented as H-Comp ⁇ 14>, and its corresponding second reference voltage ranges from 400-900mV.
  • the input sparse sensing module, the input sparse comparison module and the Flash analog-to-digital conversion module are combined to achieve high throughput of the chip.
  • The Flash analog-to-digital conversion module has a rail-to-rail decoding range, but in MAC operations the full dynamic range is rarely reached, especially when the input data is sparse.
  • An input-sparsity-aware strategy, based on the real-time sensing of input sparsity by the input sparse sensing module, is therefore applied in the decoding of the Flash analog-to-digital conversion module to reduce the number of comparisons and thereby save energy. This strategy estimates the sum of the 128 4b input data and quantizes it without considering the stored data; according to the quantization result, redundant comparator work can be skipped.
  • embodiments of the present disclosure provide an SRAM storage and computing integrated chip based on capacitive coupling.
  • The working mode of the SRAM storage and computing integrated chip includes a storage operation mode and a computing operation mode;
  • in the storage operation mode, the input module and the output module do not work;
  • in the computing operation mode, the SRAM storage and computing integrated chip performs multiplication and accumulation operations on the input data and the stored data.
  • the working mode of the SRAM storage and computing integrated chip may include two types, namely the storage operation (SRAM) mode and the computing operation (CIM) mode.
  • the storage operation mode refers to the operation mode in which the storage data is stored in the SRAM storage and calculation integrated chip bit by bit.
  • the storage location can be in the bit multiplication unit.
  • the calculation operation mode refers to the operation mode in which input data and stored data are calculated.
  • The input sparse comparison module includes a plurality of first comparators;
  • the Flash analog-to-digital conversion module includes a plurality of Flash analog-to-digital conversion units, each of which includes a plurality of second comparators;
  • the first comparators and the second comparators are connected in a one-to-one correspondence, and the first reference voltage of each first comparator is the same as the second reference voltage of the correspondingly connected second comparator.
  • Keeping the first reference voltage of each first comparator identical to the second reference voltage of its connected second comparator ensures that the working state of the second comparator is accurately determined from the input sparsity, improving the accuracy of the output results while reducing the number of working second comparators.
  • the number of the Flash analog-to-digital conversion units is the same as the number of the second type capacitive attenuators.
  • The number of Flash-ADCs is the same as the number of second-type capacitive attenuators, and they can be connected in one-to-one correspondence; this ensures that each Flash analog-to-digital conversion unit determines and outputs the digital accumulation result for a corresponding 4b stored datum.
  • the input sparse sensing module can convert the 128 4b input data, through capacitive coupling, into a single analog voltage V IS representing the input sparsity.
  • IS-CA compares V IS with the first reference voltage of the first comparator.
  • the first comparison result is the 15b thermometer code DR[0:14], which represents the quantized input sparsity.
  • the thermometer code DR[0:14] determines the working status and second comparison result of the 15 second comparators in each Flash-ADC during the readout stage.
  • when thermometer code DR[i] is 0, the corresponding second comparator Comp<i> is skipped and its comparison result is set to 0;
  • when thermometer code DR[i] is 1, the corresponding second comparator Comp<i> works normally and generates an output.
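The gating scheme described above can be sketched as a behavioral model. This is a Python sketch under assumptions: the function names, the uniform 15-level reference ladder, and the concrete voltages are illustrative and do not come from the patent.

```python
def sparsity_thermometer(v_is, refs):
    """IS-CA: compare the sparsity voltage V_IS against the 15 first
    reference voltages, producing the thermometer code DR[0:14]."""
    return [1 if v_is > r else 0 for r in refs]

def flash_adc_with_skip(v_hca, refs, dr):
    """4b Flash ADC readout: the second comparator Comp<i> shares its
    reference refs[i] with the first comparator; it runs only when
    DR[i] == 1, and a skipped comparator's result is forced to 0."""
    bits = []
    for i, r in enumerate(refs):
        if dr[i] == 0:
            bits.append(0)                       # Comp<i> skipped
        else:
            bits.append(1 if v_hca > r else 0)   # Comp<i> works normally
    return sum(bits)                             # thermometer -> 4b value (0..15)

refs = [(i + 1) / 16 for i in range(15)]         # assumed uniform ladder
dr = sparsity_thermometer(0.35, refs)            # only 5 comparators stay active
code = flash_adc_with_skip(0.30, refs, dr)       # quantized MAC result
```

Because the MAC result cannot exceed the bound implied by the input sparsity, comparators above the DR threshold would output 0 anyway, so skipping them saves comparison energy without losing accuracy.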
  • the bit multiplication unit includes a column of 9T1C unit array, and the 9T1C unit array includes multiple 9T1C units;
  • the SRAM storage and calculation integrated chip also includes an SRAM read and write external structure, and the SRAM read and write external structure is connected to the 9T1C unit.
  • each bit multiplication unit may include a column of 9T1C unit arrays, and each 9T1C unit array may include multiple 9T1C units.
  • Each 9T1C unit is a computing unit containing 9 transistors T and 1 capacitor (C bitcell ). Storage and multiplication are realized through 9T, and the multiplication results are accumulated on the upper plate of the capacitor through the capacitive coupling principle of the capacitor.
  • the SRAM storage and computing integrated chip can also include an SRAM read and write external structure.
  • the SRAM read and write external structure can be connected to each 9T1C unit through the word line WL and the bit line BL/BLB to realize the driving and control of each 9T1C.
  • a 9T1C cell array is used as a bit multiplication unit to realize the multiplication operation of one bit of stored data and the input data.
  • the design balances the number of transistors against the dynamic range, improving chip performance.
  • the 9T1C unit includes six first-type transistors and three second-type transistors.
  • the first-type transistors are connected to the SRAM read and write external structure;
  • the six first-type transistors are used to store one bit of the stored data
  • the three second-type transistors are used to perform a multiplication operation between one bit of the stored data stored by the six first-type transistors and a corresponding bit of the input data.
  • each bit multiplication unit may include a column of 9T1C unit arrays, and the 9T1C unit array may include multiple 9T1C units, and each 9T1C unit may be used to store one bit of data.
  • the number of 9T1C cells in each 9T1C cell array can be the same as the number of input data, for example, both can be 128.
  • the input of each 9T1C unit is IA[i].
  • the input line IA[i] divides the 9 transistors T into 6T on the upper side and 3T on the lower side.
  • the upper 6T can be the first type of transistor, which is mainly used to store one bit of data.
  • the lower 3T can be the second type of transistor, which is used to multiply the input data and the one bit of data stored in the upper 6T.
  • the 6T includes node Q and node QB, and the stored data bit is held at node Q in the form of a voltage.
  • the first T is parallel to the second T and then connected in series with the third T.
  • QB[i] and Q[i] are the voltages on the lines connected to the gates of the first T and the second T respectively.
  • the capacitor C in each 9T1C unit is parallel to the third T.
  • the upper-plate voltage of the capacitor C can be expressed as Mult[i], which characterizes the product of IA[i] and the one-bit data stored at node Q.
  • each 9T1C unit is connected to the word line WL[i], the bit line BL, and the bit line BLB, and each 9T1C unit includes a calculation line CL.
  • a switch for reset is connected to the calculation line CL. After the switch is turned on, the corresponding calculation line can receive the reset signal (RST_MAC).
  • the 9T1C unit has a rail-to-rail dynamic range that is greater than the 8T1C design.
  • the capacitor used is an approximately 1.33fF MOM capacitor, which can be placed above the 9 transistors during chip fabrication, with a small area overhead.
  • the 1b data in the 4b storage data is stored in each 9T1C unit.
  • the 4b input data is applied to the input line IA[0:127] as an analog voltage generated by the 4b-DAC, which is used to drive all bits on the corresponding row.
  • the multiplication operation of the 9T1C unit can include two stages: reset and output evaluation, as shown in Figure 5.
  • Multiple storage data are stored in multiple 9T1C cells in the 9T1C cell array, multiple 9T1C cells are parallelized to perform multiplication operations, and multiple input data are applied to the IA as driving voltages during the evaluation phase.
  • the generated voltage V_CL is proportional to the bitwise MAC operation result of the 1b data of the stored data and the 4b input data.
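The evaluation-phase result described above can be modeled behaviorally. This is a sketch; the supply voltage, the exact normalization by 128·IA_max, and the function name `v_cl` are assumptions for illustration.

```python
def v_cl(ia, w_bits, vdd=1.0, ia_max=15):
    """Analog voltage on one compute line CL after the evaluation phase.

    Each 9T1C cell couples IA[i] * w[i] onto CL through its unit capacitor
    C_bitcell, so with 128 identical capacitors the result is proportional
    to the bitwise MAC of the 4b inputs and one 1b weight column."""
    assert len(ia) == len(w_bits)
    mac = sum(a * w for a, w in zip(ia, w_bits))   # bitwise dot product
    return vdd * mac / (len(ia) * ia_max)          # capacitive-divider scaling

ia = [15, 7] + [0] * 126     # 128 4b input activations
w  = [1, 0] + [1] * 126      # one column of 1b weight data
v = v_cl(ia, w)              # proportional to 15*1 + 7*0 = 15
```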
  • when each bit multiplication unit in the bitwise multiplication module includes a column of 9T1C units, Figure 6 shows the connection between the bitwise multiplication module and the capacitive attenuation module.
  • Figure 6 only shows the four adjacent bit multiplication units corresponding to one 4b storage data, together with the two first-type capacitive attenuators of the first-layer capacitive attenuator array and the second-layer capacitive attenuator included in one HCA structure.
  • the 4-bit data in the 4b storage data are represented as W[0], W[1], W[2], and W[3] respectively.
  • Each bit of data corresponds to a calculation line, namely CL[0], CL[1], CL[2], and CL[3] respectively.
  • the two first-type capacitive attenuators are C w01 and C w23 , and the second-type capacitive attenuator is C w01/23 .
  • C w01 is connected to CL[0] and CL[1] respectively, C w23 is connected to CL[2] and CL[3] respectively, and C w01/23 is connected to CL[1] and CL[3] respectively.
  • the multiplication results of 128 4b IAs and four 1b data storing data are stored in CL[3], CL[2], CL[1] and CL[0] respectively.
  • the above calculation process determines the attenuation coefficient AC of each capacitive attenuator according to the binary weight ratio between bits, and the process is hierarchical.
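The hierarchical weighting can be illustrated with a small model. This is a sketch: the attenuation coefficients below are chosen so that the two layers realize the 8:4:2:1 bit ratio, which is an interpretation of the text rather than the patent's exact capacitor values.

```python
def hca_combine(cl):
    """Two-layer hierarchical capacitive attenuation of CL[0..3].

    Layer 1: C_w01 (and C_w23) merge adjacent bit lines with a 1:2 ratio.
    Layer 2: C_w01/23 merges the two pair results with a 1:4 ratio.
    Overall: V_HCA = (CL[0] + 2*CL[1] + 4*CL[2] + 8*CL[3]) / 15."""
    lo = (cl[0] + 2 * cl[1]) / 3       # bits 0 and 1
    hi = (cl[2] + 2 * cl[3]) / 3       # bits 2 and 3
    return (lo + 4 * hi) / 5           # binary-weighted combination

# Each compute line contributes with its binary weight:
assert abs(hca_combine([1, 0, 0, 0]) - 1 / 15) < 1e-12   # LSB line
assert abs(hca_combine([0, 0, 0, 1]) - 8 / 15) < 1e-12   # MSB line
```

Because the weighting is done by fixed capacitor ratios rather than sequential charge sharing, no extra switches or multi-phase control are needed.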
  • the multiplication operation of the 9T1C unit can include a reset phase and an evaluation phase.
  • the reset phase the upper and lower plates of the first type capacitor attenuator and the second type capacitor attenuator are discharged to GND.
  • the evaluation stage all 4b input data are input into the 9T1C unit array through the 4b-DAC, and the upper plate of the capacitor in the 9T1C unit is clamped to a fixed voltage generated by the 4b-DAC.
  • the output voltage V HCA of the HCA structure representing the calculation result is generated.
  • V HCA will be quantized by the 4b Flash analog-to-digital conversion module and output the 4b calculation result.
  • V HCA can be calculated by the following formula:
  • IA i is the input of the i-th row
  • w i,j is the one-bit data of the storage data stored in the i-th row and j-th column, which is 0 or 1
  • IA max,i is the maximum input value; for 4b input data, IA max,i is 15.
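The formula referenced above did not survive extraction. From the symbol definitions just given and the 8:4:2:1 weighting of the HCA structure, it is consistent with a normalized form such as the following (a reconstruction, not necessarily the patent's exact expression):

```latex
V_{\mathrm{HCA}} \;\propto\; \frac{1}{15}\sum_{j=0}^{3} 2^{\,j}\,
\frac{\displaystyle\sum_{i=0}^{127} \mathrm{IA}_i \, w_{i,j}}{128 \cdot \mathrm{IA}_{\max,i}}
```

Here the outer factor 1/15 normalizes the binary weights 1+2+4+8 of the four compute lines, and the inner factor normalizes the 128-way capacitive division on each compute line.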
  • the transistor area of the 9T1C unit is 0.7um ⁇ 1.42um
  • the MOM capacitor area of 1.33fF is 0.55um ⁇ 1.42um.
  • Figure 7 shows the layout of the 9T1C units and an HCA column, which includes 4 columns of 9T1C units and 1 HCA structure, showing a multiply-accumulate layout for 4b weight data. Taking into account the symmetry and matching of the layout, the following three improvements are made. First, the C w01 /C w23 and C w01/23 capacitors in the HCA structure are split into 128 and 64 unit capacitances C bitcell at the 9T1C unit level respectively. These small unit capacitors are distributed throughout the layout of the bitwise multiplication module while maintaining the preset ratio. Therefore, for each row, seven small 9T1C-unit-level capacitors with different functions are distributed in the transistor-level layout of four 9T1C units.
  • A is the unit capacitance C bitcell
  • B is the first type of capacitive attenuator
  • C is a dummy capacitor
  • D is the second type of capacitive attenuator.
  • the SRAM read and write external structure may include an SRAM controller (SRAM Controller), SRAM peripheral circuits (SRAM Peripheral Circuits), and an address decoder driver (Address Decoder & Driver).
  • the SRAM controller can be connected to the SRAM peripheral circuits and the address decoder driver respectively to achieve global control of the chip's storage function.
  • the SRAM peripheral circuits and the address decoder driver are connected to the 9T1C units to ensure that the stored data can be written into each 9T1C unit bit by bit.
  • the automatic implementation of the chip storage function can be realized through the SRAM controller.
  • the SRAM storage and computing integrated chip based on capacitive coupling is provided in the embodiment of the present disclosure.
  • the SRAM storage and computing integrated chip may also include an in-memory computing controller (CIM Controller).
  • the in-memory computing controller can be connected to the input module and the output module respectively; through the in-memory computing controller, global control of the chip's computing functions can be achieved.
  • the storage data includes multiple 4-bit weight data in the neural network.
  • the neural network usually includes a large amount of 4-bit weight data, which can be stored bit by bit as storage data in the bit multiplication units of the chip; a multiplication and accumulation operation with the input data of the neural network then realizes the function of a convolution kernel.
  • Figure 8 is a schematic diagram of the complete structure of an SRAM storage and computing integrated chip based on capacitive coupling provided in an embodiment of the present disclosure.
  • This chip can meet the requirement of 64 convolution kernels in a neural network, each performing a 128-input operation in parallel.
  • the chip includes a 128×256 9T1C cell array, an SRAM controller, SRAM peripheral circuits, an address decoder driver, a CIM controller, 128 4b-DACs, an IS-DAC, an IS-CA, a capacitance attenuation module containing 1×64 HCA structures, and a Flash analog-to-digital conversion module containing 1×64 4b-ADCs.
  • the size of the 9T1C cell array is 128 ⁇ 256, and the memory capacity of the chip is 32kb.
  • the 256 columns of the 9T1C cell array are divided into 64 groups. Each group contains 4 columns and is used to store a 4b weight data.
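The column grouping can be sketched as an index mapping. One illustrative assumption here: the LSB-first ordering of the four bit columns within each group is not specified by the text.

```python
ROWS, COLS, GROUP = 128, 256, 4          # 128x256 9T1C array, 4 columns per weight

def weight_column(kernel, bit):
    """Column index holding bit `bit` (0 = LSB) of the 4b weight belonging
    to convolution kernel `kernel` (0..63)."""
    assert 0 <= kernel < COLS // GROUP and 0 <= bit < GROUP
    return kernel * GROUP + bit

assert COLS // GROUP == 64               # 64 groups of 4 columns each
assert ROWS * COLS == 32 * 1024          # 32kb total memory capacity
```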
  • in SRAM mode, the 4b-DAC, IS-DAC, IS-CA, and 4b-ADC do not work; the chip behaves as a 6T-SRAM memory and performs normal read and write operations, and the neural network weight data is written to the SRAM in this mode. In CIM mode, the chip executes 4b MAC operations fully in parallel: in a single cycle, all rows receive 4b input data.
  • the chip can support a total of 128 4b input data, which can be expressed as IN[0][0:3], IN[1][0:3], ..., IN[127][0:3].
  • passing the input data through the corresponding 4b-DACs yields the decoded results IA[0], IA[1], ..., IA[127].
  • the vector matrix multiplication of 128 input data and 64 128 ⁇ 4b weights is calculated in the analog domain through capacitive coupling.
  • the Flash analog-to-digital conversion module converts the analog voltage representing the MAC operation result into a 4b digital code output.
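Putting the pieces together, one CIM-mode cycle can be modeled end to end. This is a behavioral sketch; the full-scale normalization and the 4b quantization step are assumptions chosen so that the maximum possible MAC result maps to the top output code.

```python
def cim_cycle(ia, kernels, vdd=1.0):
    """One fully parallel CIM cycle.

    ia:      128 4b inputs (0..15), one per row.
    kernels: 64 weight vectors, each 128 4b weights (0..15).
    Returns, per kernel, the analog voltage V_HCA and its 4b output code."""
    full_scale = 128 * 15 * 15                 # largest possible dot product
    results = []
    for w in kernels:                          # one 4-column group per kernel
        mac = sum(a * b for a, b in zip(ia, w))
        v_hca = vdd * mac / full_scale         # analog MAC via capacitive coupling
        code = min(int(v_hca / vdd * 16), 15)  # 4b Flash quantization
        results.append((v_hca, code))
    return results

out = cim_cycle([15] * 128, [[15] * 128] * 64)
assert all(code == 15 for _, code in out)      # full-scale input -> top code
```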
  • the SRAM peripheral circuit provides the bit lines BL/BLB for each 9T1C cell in the 9T1C cell array, and the address decoder driver provides the word line WL for each 9T1C cell.
  • a switch controlled by the CIM controller is also connected between the 9T1C unit array and the capacitance attenuation module.
  • the switch can be grounded to realize the reset (RST) of the corresponding calculation line CL[i].
  • the input module includes an input sparse sensing module and an input sparse comparison module
  • the output module includes a Flash analog-to-digital conversion module.
  • the working sequence of the MAC operation with input sparsity sensing is divided into two independent processes, input sparsity sensing (IS) and the MAC operation, which are interconnected through the thermometer code DR[0:14].
  • the IS process is divided into two processes: reset (Reset_IS) and output evaluation (Evaluation_IS).
  • the MAC operation process is also divided into two processes: reset (Reset_MAC) and output evaluation (Evaluation_MAC).
  • CLK is the working clock of the chip.
  • the working clock generates, through the timing control module, RST_IS, EVAL_IS (i.e. IS-Eval in Figure 2), SA_EN_IS (i.e. IS_SA_EN in Figure 3), RST_MAC, EVAL_MAC (the control signal connected to the gates of the switches in the sensing path during the MAC operation), and SA_EN_MAC (the readout enable signal during the MAC operation).
  • RST_IS and EVAL_IS, RST_MAC and EVAL_MAC are two inverted signals.
  • SA_EN_IS and SA_EN_MAC are both used for readout in the Evaluation stage.
  • the RST_IS signal is asserted before the RST_MAC signal, so that DR[0:14] can be generated before the MAC operation stage, thereby controlling the working status and outputs of the second comparators in the Flash analog-to-digital conversion module.
  • the IS-DAC is reset through the RST_IS signal.
  • IS-DAC immediately evaluates the input sparsity in the analog domain, generates V IS , and then prepares to start quantification of V IS through IS-CA.
  • while the Reset_MAC process of the MAC operation is in progress, IS-CA has already generated DR[0:14]; this working timing ensures that adding the input sparsity sensing strategy does not reduce the computational throughput of the chip.
  • the SRAM storage and computing integrated chip based on capacitive coupling adopts a 9T1C unit, which performs multiplication operations in the capacitive domain through capacitive coupling, and realizes the weighted accumulation of the 4b stored data through a hierarchical attenuation capacitive structure.
  • this structure avoids the additional switches, complex control, and long sharing time of the traditional charge-sharing structure, greatly improving the computing throughput of multi-bit weight data computing systems; it uses a Flash analog-to-digital conversion module based on the input sparsity sensing strategy, which reduces the number of AD comparisons and improves the energy efficiency of the system.
  • the chip can support 8192 4b ⁇ 4b MAC operations.
  • the distribution of Monte Carlo simulation results and MAC operation results at point A in Figure 10 is shown in Figure 11.
  • the MAC operation results corresponding to the vertical lines from left to right in Figure 11 are 166.107m, 166.307m, 166.506m, 166.706m, 166.905m, 167.105m, and 167.304m respectively.
  • the distribution of Monte Carlo simulation results and MAC operation results at point B in Figure 10 is shown in Figure 12.
  • the MAC operation results corresponding to the vertical lines from left to right in Figure 12 are 446.451m, 446.721m, 446.992m, 447.262m, 447.532m, 447.802m, and 448.073m respectively.
  • the distribution of Monte Carlo simulation results and MAC operation results at point C in Figure 10 is shown in Figure 13.
  • the MAC operation results corresponding to the vertical lines from left to right in Figure 13 are 726.978m, 727.275m, 727.573m, 727.871m, 728.168m, 728.466m, and 728.763m respectively.
  • the structure of the chip was simulated to calculate the voltage settling time.
  • in the simulation, all 1s are first written into the storage locations of the 9T1C units, and then input patterns are applied in equal increments from small to large, so that the MAC operation result gradually increases.
  • the average settling time of the analog voltage of the MAC operation of 128 input data of 4b and weight data of 4b is 0.2ns.
  • compared with the charge-sharing scheme, the analog voltage settling time of this chip is reduced by 90%, giving the chip a 50% higher computational throughput.
  • the reduction in analog voltage settling time and the improvement in throughput are mainly because the analog voltage of this chip is established by a strong, well-defined externally applied voltage, whereas the analog voltage in the charge-sharing structure is established through the re-equilibration of weak, floating internal potentials.
  • the energy efficiency of the chip under different input sparsity with and without the input sparsity sensing strategy is also compared. In both cases, the chip's energy efficiency increases with input sparsity, in increasingly larger increments.
  • the increasing increments even without the input sparsity sensing strategy indicate that the 9T1C unit saves the driving cost of capacitors whose dot-product results are sparse.
  • at low input sparsity, the energy efficiency with the input sparsity sensing strategy is lower than without it, due to the overhead of the IS-DAC and IS-CA.
  • at higher input sparsity, the energy efficiency with the input sparsity sensing strategy is significantly greater than without it, thanks to the large number of second comparators skipped in the Flash analog-to-digital conversion module during computation.
  • the input sparsity sensing strategy introduced by this chip achieves high energy efficiency of 460 ⁇ 2264.4TOPS/W under input sparsity of 5% to 95%. In terms of average energy efficiency, the results with the input sparse sensing strategy show an improvement of 12.8%, reaching a high energy efficiency of 666TOPS/W.
  • Table 3 shows a performance comparison table between the chip structure provided in the embodiment of the present disclosure and the existing chip structure.
  • the chip structure provided in the embodiment of the present disclosure achieves higher energy efficiency and throughput than existing chip structures, improved by 10 times and 1.84 times respectively; behavioral simulation results show that its classification accuracy on the CIFAR-10 dataset is comparable to other works.
  • the existing chip structures in Table 3 are each identified by the source article in which the structure was presented.
  • 1 means the average area considering the local computing cell
  • 2 means current-domain computation for the 1b weight MAC operation and charge sharing for multi-bit weight accumulation
  • 3 means estimated from the description
  • 4 means estimated from the proposed structure with NMOS as the transmission-gate switch
  • 5 means estimated from the graph
  • 6 means normalized to 4b/4b input/weight operation
  • 7 means one MAC operation is counted as two operations (multiplication and addition)
  • 8 means behavioral simulation result considering the comparator offset voltage

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Semiconductor Integrated Circuits (AREA)
  • Semiconductor Memories (AREA)
  • Analogue/Digital Conversion (AREA)

Abstract

The present disclosure relates to an SRAM storage and computing integrated chip based on capacitive coupling. The SRAM storage and computing integrated chip comprises an input module, a bitwise multiplication module, a capacitive attenuation module, and an output module. The input module is used to receive input data; the bitwise multiplication module is used to perform a multiplication operation on the input data and the storage data to obtain multiplication operation results; and the capacitive attenuation module uses a layered capacitive attenuator structure to realize layered accumulation of the multiplication operation results. The structure is simpler and the computation time is shorter, so that a multi-bit accumulation result can be obtained quickly, thereby improving the energy efficiency of the multiply-accumulate operation and increasing the computational throughput.
PCT/CN2023/083070 2022-04-27 2023-03-22 SRAM storage and computing integrated chip based on capacitive coupling Ceased WO2023207441A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210457425.XA CN115048075B (zh) 2022-04-27 2022-04-27 基于电容耦合的sram存算一体芯片
CN202210457425.X 2022-04-27

Publications (1)

Publication Number Publication Date
WO2023207441A1 true WO2023207441A1 (fr) 2023-11-02

Family

ID=83157158

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/083070 Ceased WO2023207441A1 (fr) 2022-04-27 2023-03-22 Puce intégrée de stockage et de calcul sram basée sur un couplage capacitif

Country Status (2)

Country Link
CN (1) CN115048075B (fr)
WO (1) WO2023207441A1 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117316237A (zh) * 2023-12-01 2023-12-29 安徽大学 时域8t1c-sram存算单元及时序跟踪量化的存算电路
CN118098310A (zh) * 2024-04-25 2024-05-28 南京大学 基于超前补偿型跨阻放大器的光电存算阵列读出电路
CN118298872A (zh) * 2024-06-05 2024-07-05 安徽大学 输入权重比特位可配置的存内计算电路及其芯片
CN118333119A (zh) * 2024-06-13 2024-07-12 温州核芯智存科技有限公司 存算单元、存算方法、存内计算区块及神经网络电路组件
CN118519611A (zh) * 2024-05-09 2024-08-20 上海北湖冰矽科技有限公司 具有冗余设计的存内计算电路、方法、装置及电子设备
CN118536563A (zh) * 2024-05-09 2024-08-23 上海北湖冰矽科技有限公司 模拟存内计算电路、处理装置及电子设备
CN118708152A (zh) * 2024-08-27 2024-09-27 致真存储(北京)科技有限公司 存算一体运算电路

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115048075B (zh) * 2022-04-27 2025-11-14 北京大学 基于电容耦合的sram存算一体芯片
CN115658012B (zh) * 2022-09-30 2023-11-28 杭州智芯科微电子科技有限公司 向量乘加器的sram模拟存内计算装置和电子设备
CN115664422B (zh) * 2022-11-02 2024-02-27 北京大学 一种分布式逐次逼近型模数转换器及其运算方法
CN115794728B (zh) * 2022-11-28 2024-04-12 北京大学 一种存内计算位线钳位与求和外围电路及其应用
CN116126779B (zh) * 2023-02-21 2025-10-17 安徽大学 一种9t存算电路、乘累加运算电路、存内运算电路及芯片
CN116312670B (zh) * 2023-02-24 2025-09-19 安徽大学 一种9t1c存算电路、乘累加运算电路、存内运算电路、芯片
CN116029351B (zh) * 2023-03-30 2023-06-13 南京大学 基于光电存算单元的模拟域累加读出电路
CN117236394B (zh) * 2023-07-03 2025-09-19 南京大学 一种可部署大规模神经网络的存算一体装置及方法
CN119250129B (zh) * 2024-12-05 2025-03-04 北京犀灵视觉科技有限公司 基于感存算一体架构的cnn数据处理方法、装置以及芯片
CN119993237B (zh) * 2025-04-10 2025-08-05 北京大学 存储模块、存储阵列、存储装置及存内计算编程方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144558A (zh) * 2020-04-03 2020-05-12 深圳市九天睿芯科技有限公司 基于时间可变的电流积分和电荷共享的多位卷积运算模组
CN111611529A (zh) * 2020-04-03 2020-09-01 深圳市九天睿芯科技有限公司 电容容量可变的电流积分和电荷共享的多位卷积运算模组
US11176991B1 (en) * 2020-10-30 2021-11-16 Qualcomm Incorporated Compute-in-memory (CIM) employing low-power CIM circuits employing static random access memory (SRAM) bit cells, particularly for multiply-and-accumluate (MAC) operations
CN113658628A (zh) * 2021-07-26 2021-11-16 安徽大学 一种用于dram非易失存内计算的电路
CN115048075A (zh) * 2022-04-27 2022-09-13 北京大学 基于电容耦合的sram存算一体芯片

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3844608A4 (fr) * 2018-08-31 2021-12-08 Flex Logix Technologies, Inc. Circuit multiplicateur-accumulateur, architecture de blocs logiques pour multiplication-accumulation et circuit intégré comprenant une matrice de blocs logiques
CN110288510B (zh) * 2019-06-11 2020-10-16 清华大学 一种近传感器视觉感知处理芯片及物联网传感装置
CN112558917B (zh) * 2019-09-10 2021-07-27 珠海博雅科技有限公司 存算一体电路和基于存算一体电路的数据运算方法
US11662980B2 (en) * 2019-11-06 2023-05-30 Flashsilicon Incorporation In-memory arithmetic processors
CN111431536B (zh) * 2020-05-18 2023-05-02 深圳市九天睿芯科技有限公司 子单元、mac阵列、位宽可重构的模数混合存内计算模组
CN112233712B (zh) * 2020-12-14 2021-03-05 中科院微电子研究所南京智能技术研究院 一种6t sram存算装置、存算系统及存算方法
CN113283593B (zh) * 2021-05-25 2023-09-12 思澈科技(上海)有限公司 一种卷积运算协处理器及基于该处理器的快速卷积方法
CN113343585B (zh) * 2021-06-29 2024-08-23 江南大学 一种用于矩阵乘法运算的权位分立存算阵列设计方法
CN113419705B (zh) * 2021-07-05 2024-08-16 南京后摩智能科技有限公司 存内乘加计算电路、芯片、计算装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144558A (zh) * 2020-04-03 2020-05-12 深圳市九天睿芯科技有限公司 基于时间可变的电流积分和电荷共享的多位卷积运算模组
CN111611529A (zh) * 2020-04-03 2020-09-01 深圳市九天睿芯科技有限公司 电容容量可变的电流积分和电荷共享的多位卷积运算模组
US11176991B1 (en) * 2020-10-30 2021-11-16 Qualcomm Incorporated Compute-in-memory (CIM) employing low-power CIM circuits employing static random access memory (SRAM) bit cells, particularly for multiply-and-accumluate (MAC) operations
CN113658628A (zh) * 2021-07-26 2021-11-16 安徽大学 一种用于dram非易失存内计算的电路
CN115048075A (zh) * 2022-04-27 2022-09-13 北京大学 基于电容耦合的sram存算一体芯片

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117316237A (zh) * 2023-12-01 2023-12-29 安徽大学 时域8t1c-sram存算单元及时序跟踪量化的存算电路
CN117316237B (zh) * 2023-12-01 2024-02-06 安徽大学 时域8t1c-sram存算单元及时序跟踪量化的存算电路
CN118098310A (zh) * 2024-04-25 2024-05-28 南京大学 基于超前补偿型跨阻放大器的光电存算阵列读出电路
CN118519611A (zh) * 2024-05-09 2024-08-20 上海北湖冰矽科技有限公司 具有冗余设计的存内计算电路、方法、装置及电子设备
CN118536563A (zh) * 2024-05-09 2024-08-23 上海北湖冰矽科技有限公司 模拟存内计算电路、处理装置及电子设备
CN118298872A (zh) * 2024-06-05 2024-07-05 安徽大学 输入权重比特位可配置的存内计算电路及其芯片
CN118333119A (zh) * 2024-06-13 2024-07-12 温州核芯智存科技有限公司 存算单元、存算方法、存内计算区块及神经网络电路组件
CN118708152A (zh) * 2024-08-27 2024-09-27 致真存储(北京)科技有限公司 存算一体运算电路

Also Published As

Publication number Publication date
CN115048075B (zh) 2025-11-14
CN115048075A (zh) 2022-09-13

Similar Documents

Publication Publication Date Title
WO2023207441A1 (fr) Puce intégrée de stockage et de calcul sram basée sur un couplage capacitif
US11714749B2 (en) Efficient reset and evaluation operation of multiplying bit-cells for in-memory computing
US11948659B2 (en) Sub-cell, mac array and bit-width reconfigurable mixed-signal in-memory computing module
CN115039177B (zh) 低功耗存储器内计算位单元
US10642922B2 (en) Binary, ternary and bit serial compute-in-memory circuits
US20190370640A1 (en) Architecture of in-memory computing memory device for use in artificial neuron
CN115910152B (zh) 电荷域存内计算电路以及具有正负数运算功能的存算电路
CN115080501A (zh) 基于局部电容电荷共享的sram存算一体芯片
Kang et al. A 481pJ/decision 3.4 M decision/s multifunctional deep in-memory inference processor using standard 6T SRAM array
Zhang et al. HD-CIM: Hybrid-device computing-in-memory structure based on MRAM and SRAM to reduce weight loading energy of neural networks
Tsai et al. RePIM: Joint exploitation of activation and weight repetitions for in-ReRAM DNN acceleration
Cheon et al. A 2941-TOPS/W charge-domain 10T SRAM compute-in-memory for ternary neural network
Kim et al. A charge-domain 10T SRAM based in-memory-computing macro for low energy and highly accurate DNN inference
Zhang et al. In-memory multibit multiplication based on bitline shifting
Xiao et al. A 128 Kb DAC-less 6T SRAM computing-in-memory macro with prioritized subranging ADC for AI edge applications
CN119152906B (zh) 一种基于eDRAM的高密度近存计算与存内计算混合架构及计算方法
Lin et al. An 11T1C Bit-Level-Sparsity-Aware Computing-in-Memory Macro With Adaptive Conversion Time and Computation Voltage
US12249395B2 (en) Memory device supporting in-memory MAC operation between ternary input data and binary weight using charge sharing method and operation method thereof
Lin et al. A 28-nm 9T SRAM-based CIM macro with capacitance weighting module and redundant array-assisted ADC
Bharti et al. Compute-in-memory using 6T SRAM for a wide variety of workloads
CN117877553A (zh) 一种用于非易失性随机存储器的存内计算电路
CN117389466A (zh) 可重构智能存算一体处理器及存算架构设计装置
US12347520B2 (en) Computing-in-memory device including digital-to-analog converter based on memory structure
CN118606267B (zh) 一种模数混合自刷新存内计算簇及模拟辅助数字存内计算电路
US12224008B2 (en) Non-volatile static random access memory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23794876

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.02.2025)

122 Ep: pct application non-entry in european phase

Ref document number: 23794876

Country of ref document: EP

Kind code of ref document: A1