WO2025122977A1 - Hyperdimensional stochastic compute-in-memory with charge-mode DRAM array
- Publication number
- WO2025122977A1 (PCT/US2024/059041)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- transistor
- input
- charge
- edram
- transistors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
- G11C7/1006—Data managing, e.g. manipulating data before writing or reading out, data bus switches or control circuits therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/401—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
- G11C11/403—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells with charge regeneration common to a multiplicity of memory cells, i.e. external refresh
- G11C11/405—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells with charge regeneration common to a multiplicity of memory cells, i.e. external refresh with three charge-transfer gates, e.g. MOS transistors, per cell
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/401—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
- G11C11/4063—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing
- G11C11/407—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing for memory cells of the field-effect type
- G11C11/409—Read-write [R-W] circuits
- G11C11/4096—Input/output [I/O] data management or control circuits, e.g. reading or writing circuits, I/O drivers or bit-line switches
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/54—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using elements simulating biological cells, e.g. neuron
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/56—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using storage elements with more than two stable states represented by steps, e.g. of voltage, current, phase, frequency
- G11C11/565—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using storage elements with more than two stable states represented by steps, e.g. of voltage, current, phase, frequency using capacitive charge storage elements
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/16—Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
Definitions
- a method of performing a compute-in-memory operation on a eDRAM cell comprising a plurality of transistors, the plurality of transistors comprising at least: an input transistor, a charge pump transistor, a write select transistor, a share select transistor, and an output transistor, the plurality of transistors having a respective channel located underneath, the method comprising: performing a program operation on the plurality of transistors, the program operation comprising: coupling the input transistor and the output transistor to an analog supply, and sequentially pulsing the charge pump transistor, the write select transistor, and the share select transistor between ground and the analog supply while the channel underneath the output transistor is programmed between ground and a digital supply; and performing a compute operation on the plurality of transistors, the compute operation comprising: coupling the channel underneath the input transistor to the channel underneath the output transistor, and pulsing the input transistor from ground to the analog supply to transfer charge in the channel underneath the output transistor to the channel underneath the input transistor.
- the analog supply comprises a voltage of between about 1.5V and about 2.1V.
- the digital supply comprises a voltage between about 0.6V and about 0.8V.
- sequentially pulsing the charge pump transistor, the write select transistor, and the share select transistor comprises pulsing the charge pump transistor, the write select transistor, and the share select transistor over a plurality of cycles.
- a second cycle of the plurality of cycles comprises a first phase and a second phase, and wherein: during the first phase of the second cycle of the plurality of cycles, the charge pump transistor and the write select transistor are pulsed and, during the second phase of the second cycle of the plurality of cycles, the share select transistor is pulsed.
- pulsing the input transistor comprises accomplishing a linear matrix-vector multiplication operation by inducing a transfer of charge onto the output transistor proportional to the charge in the channel underneath the input transistor.
- the method further comprises performing a non-destructive readout operation to determine a result of the linear matrix-vector multiplication.
- the plurality of transistors are disposed within a memory cell, and a memory array comprises a plurality of memory cells arranged in a plurality of rows and a plurality of columns.
- the memory array further comprises: a row driver to drive the plurality of rows of memory cells; and a column driver to drive the plurality of columns of memory cells.
- the memory array is configured to perform fully row-parallel and column-parallel matrix-vector multiplication operations.
- the row driver comprises a pseudo-random number generator.
- the row driver comprises a binary-to-stochastic converter configured to convert a binary input into a stochastic representation.
- a system comprising a monolithic 3D silicon arrangement, the monolithic 3D silicon arrangement comprising: a plurality of eDRAM cells, each of the plurality of eDRAM cells comprising: a plurality of transistors, the plurality of transistors comprising at least: an input transistor, a write select transistor, and an output transistor, each of the plurality of transistors having a respective channel located underneath, wherein the plurality of eDRAM cells are disposed in a first direction, a plurality of input lines and a plurality of bitlines couple the plurality of eDRAM cells in a second direction, and a plurality of output lines and a plurality of write select lines couple the plurality of eDRAM cells in a third direction; a first controller configured to drive the plurality of input lines and the plurality of bitlines of the monolithic 3D silicon arrangement; and a second controller configured to drive the plurality of output lines and the plurality of write select lines of the monolithic 3D
- the monolithic 3D silicon arrangement is configured to perform a program operation and a compute operation on the plurality of transistors.
- the program operation comprises: coupling the input transistor and the output transistor to an analog supply, and pulsing the write select transistor while the channel underneath the output transistor is programmed between a multi-level voltage between ground and a digital supply.
- the compute operation comprises: coupling the channel underneath the input transistor to the channel underneath the output transistor, and pulsing the input transistor from ground to the analog supply to transfer charge in the channel underneath the output transistor to the channel underneath the input transistor.
- the compute operation accomplishes a linear matrix-vector multiplication operation by inducing a transfer of charge onto the output transistor proportional to the charge in the channel underneath the input transistor.
- the first direction, second direction, and third direction are orthogonal.
- the first controller comprises a binary-to-stochastic converter configured to convert a binary input into a stochastic representation.
- FIG. 1A shows an illustrative embodiment of a eDRAM cell structure in accordance with the systems and methods described herein;
- FIG. 1B shows an illustrative embodiment of a program operation of a eDRAM cell in accordance with the systems and methods described herein;
- FIG. 1C shows an illustrative embodiment of a switched-capacitor structure equivalent to the programmed eDRAM cell in accordance with the systems and methods described herein;
- FIG. 1D shows the linear multiplication that is enabled by the eDRAM cell in accordance with the systems and methods described herein;
- FIG. 1E shows a graph of leakage of a eDRAM cell versus a negative reverse body bias voltage in accordance with the systems and methods described herein;
- FIGs. 2A-2C show a system architecture for performing the CIM operations in a eDRAM array in accordance with the systems and methods described herein;
- FIGs. 3A-3F illustrate embodiments of peripheral circuits in accordance with the systems and methods described herein;
- FIG. 3G illustrates a timing diagram showing the pre-charge, COMPUTE, and readout operations performed on eDRAM cells in accordance with the systems and methods described herein;
- FIG. 4 illustrates a table showing deterministic to stochastic conversion
- FIGs. 5A-5C illustrate relationships between linearity and input and weight vectors in deterministic and stochastic computing modes in accordance with the systems and methods described herein;
- FIGs. 6A-6D illustrate relationships between eDRAM weight coefficient and programmed weight, measured at four instances in time
- FIG. 7A illustrates an energy recovery logic (ERL) circuit for CV² energy recovery savings achieved by the systems and methods in accordance with embodiments described herein;
- FIG. 7B illustrates the energy recovery measured in accordance with embodiments described herein;
- FIGs. 8A and 8B illustrate monolithic 3D implementations of eDRAM arrays in accordance with embodiments described herein.
- a 768 by (x) 768 crossbar array of double-differential, 4 x 5T charge-mode DRAM compute-in-memory cells performs signed analog-digital matrix-vector multiplication with embedded DAC for 4b multibit programmable analog weights, and stochastic DAC binary encoding of 1b-8b digital inputs and digital 1b-8b encoding of stochastic ADC binary outputs for AI on the edge and cloud.
- the charge-mode computational DRAM (eDRAM) CIM macro may present one or more of the following key advantages over other CIM crossbars for AI on the edge / Tiny ML: i) highly linear multiply-and-accumulate operations with signed weight and input encoding as well as doubled dynamic range; ii) non-destructive read operations with low leakage current leveraging FDSOI body isolation and thick-oxide transistors for extremely small gate and subthreshold leakage allowing lower refresh rates; iii) highly energy-efficient charge-mode sensing from the cell, quantized by a high input-impedance dynamic comparator, thus no static current and IR drop along the array; iv) low device-to-device variability in a standard 22nm FDSOI CMOS process scaling to more advanced technology nodes; v) high-density, high-resolution digitally encoded analog weights in the cell performing area-efficient multiplication; vi) supporting both deterministic and stochastic input encoding and output decoding;
- CIM compute-in-memory
- Typical CIM processors perform multiply-accumulate (MAC) operations in the analog domain by accumulating output currents collected from bit cells subjected to digitally supplied input voltages and digitizing the outputs with a parallel bank of current-sensing or voltage-sensing ADCs.
- MAC multiply-accumulate
- Charge-mode, rather than conductive, CIM elements offer potentially greater overall efficiency by conserving charge in storage and energy in computation. Charge-mode elements provide these benefits by allowing for non-destructive reversible charge transfer triggered by an applied input voltage and by registering a change in output voltage through sensing the charge capacitively coupled to the output line.
- CIM architectures involving SRAMs store multi-bit weights across separate bit cells that couple through capacitors or other gain summing elements to output lines for voltage/current readout, that are binary weighted and combined to obtain a composite MAC sum.
- SRAMs offer superior memory retention but are subject to static power due to subthreshold leakage, gate leakage, and junction leakage.
- Alternative embedded DRAM (1T1C) cell for CIM cores offers greater bit-cell density than SRAM and offers direct charge-mode readout, although readout operations are destructive requiring frequent refresh and incurring peripheral circuit design complexities even greater than SRAM.
- Multi-bit weights are preferred in fixed deployment scenarios for edge AI and tiny ML which has led designers in this application space to explore traditional non-volatile memories such as Flash and emerging nonvolatile memory cells such as RRAM.
- the relatively high write energy and low operational speed of Flash macros limit their practical use in demanding settings.
- sneak paths, IR drop, and cycle-to-cycle and device-to-device variability of RRAM have impeded progress towards practical solutions.
- Production RRAMs have very low resistance and low RON/ROFF ratio which limits the operation of the array to activate only a small number of rows at a time, decreasing the degree of parallelism and the overall throughput of the system.
- RRAM drivers are typically over-dimensioned to support large static currents, which increases the macro area and lowers the on-chip memory density. Both RRAMs and Flash have proven hard to scale to the latest technology nodes because of the challenges in floating gate oxide thickness and memristive material integration.
- the charge-mode computation DRAM (eDRAM) CIM macro for row-parallel and column-parallel signed analog-digital matrix-vector multiplication (MVM) presented here combines the non-destructive charge-conserving readout and adiabatic hot-clock energy recovery of a eDRAM CIM crossbar array, with cell-level embedded functionality for direct DAC multi-bit programming of 1b-4b analog weights, and array-peripheral functionality for stochastic DAC binary encoding of 1b-8b digital inputs and digital 1b-8b encoding of stochastic ADC binary outputs, as a versatile macro for hyperdimensional probabilistic computing and Bayesian AI on the edge.
- MVM analog-digital matrix-vector multiplication
- eDRAM arrays and eDRAM systems presented herein can be dynamically reconfigured for reuse in in-memory/near-memory computations through fast reprogramming (similar to commercial DRAM). This ability of the systems described herein to be dynamically reconfigured contributes to device longevity.
- FIG. 1A illustrates the principle of operation of the macro at the level of a single quadrant of the double-differential eDRAM cell 100, serving weight storage with in-built DAC and MAC functions with a plurality of nMOS transistors 102a, 102b, 102c, 102d, and 102e. While the example of FIG. 1A shows 5 nMOS transistors 102a-102e, any number of transistors could be used. In the example of FIG. 1A, the 5 nMOS transistors comprise a charge pump transistor 102a, a write select transistor 102b, an output transistor 102c, a share select transistor 102d, and an input transistor 102e.
- Each of the plurality of transistors 102a-102e has a respective channel located underneath.
- the output transistor 102c has beneath it an output channel
- the input transistor 102e has beneath it an input channel.
- the channels located underneath each of the plurality of transistors may function as a charge well as charge is transferred between or otherwise shared between different channels.
- the output transistor 102c and the input transistor 102e may have respective lengths that are larger than lengths of the charge pump transistor 102a, the write select transistor 102b, and the share select transistor 102d.
- the lengths of the output transistor 102c and the input transistor 102e are between about two and about six times larger than the lengths of the charge pump transistor 102a, the write select transistor 102b, and the share select transistor 102d.
- the lengths of the output transistor 102c and the input transistor 102e are between about three and about five times larger than the lengths of the charge pump transistor 102a, the write select transistor 102b, and the share select transistor 102d.
- Row drivers supply the bit-line (BL) and input (VIN) horizontal lines 104, while column drivers provide charge-pump (CP), write (WR), and share (SH) vertical lines 106.
- the CP vertical line is used to add additional charge into the shared junction between the charge pump transistor 102a and the write select transistor 102b without fully overwriting the charge well underneath the output transistor 102c and the input transistor 102e. This additional charge can be used to slowly refresh the charge well underneath the output transistor 102c and the input transistor 102e using a subthreshold current mechanism.
- the CP and WR vertical lines are driven to the same voltage.
- the eDRAM cell is configured to be subject to a weight programming operation, also called a program operation or a programming operation.
- the program operation of the eDRAM cell 100 is illustrated in FIG. 1B.
- the input (VIN) and output (VOUT) vertical lines are tied to the analog supply voltage VDDA.
- the bitline voltage (BL) is varied between ground and the digital supply voltage (VDDD) modulated by the programming input.
- the analog supply voltage VDDA and the digital supply voltage VDDD may be configured to allow above subthreshold operations for linear programming of the eDRAM cell 100.
- the analog supply voltage VDDA may be between about 1.5 V and about 2.1V. In certain implementations, the analog supply voltage may be 1.8V.
- the digital supply voltage VDDD may be between about 0.6V and about 0.8V.
- the CP, WR, and SH vertical lines are pulsed as part of the program operation.
- the CP, WR, and SH may be pulsed between ground (GND) and VDDA in synchrony with the sequence of bits Di presented from least significant bit (LSB) to most significant bit (MSB) on the bitline (BL) according to the waveforms shown.
- This pulsing of the CP, WR, and SH vertical lines configures the eDRAM cell as an algorithmic charge-division digital-to-analog converter (DAC) equivalent to the familiar switched-capacitor structure shown in FIG. 1C.
- DAC digital-to-analog converter
- the CP, WR, and SH vertical lines may be pulsed over a plurality of cycles 130a, 130b, ... 130n.
- the write select transistor 102b and share select transistor 102d are pulsed together.
- This shorting of the channel potentials underneath VIN and VOUT serves to reset the memory cell.
- the write select transistor 102b and share select transistor 102d are driven to a lower voltage.
- the write select transistor 102b may be turned off prior to the share select transistor 102d being turned off to ensure a clean reset operation. Further, by turning off the write select transistor 102b prior to turning off the share select transistor 102d, the impact of charge injection on the charge stored in the channel underneath the input transistor 102e and the output transistor 102c caused by turning off the write select transistor 102b and the share select transistor 102d can be minimized.
- the charge pump transistor 102a and the write select transistor 102b are pulsed to a first voltage during a first phase of the second cycle 130b.
- the pulsing of the charge pump transistor 102a and the write select transistor 102b to the first voltage writes into the channel beneath the output transistor 102c the charge that was transferred to the channel underneath VOUT during the first cycle.
- after the charge pump transistor 102a and the write select transistor 102b are driven to the first voltage, they are driven to a second voltage, the second voltage being lower than the first voltage, to disconnect the BL from the charge well underneath VOUT.
- the share select transistor 102d is pulsed so that the charge stored in the channel underneath VIN from the first cycle and the charge stored in the channel underneath VOUT from the first phase of the second cycle are subject to a divide-by-two operation.
- the pulsing pattern of the second pulsing cycle of the plurality of cycles repeats as many times as needed for the input bits to be written to the eDRAM cell.
- Alternating activation of CP/WR and SH during subsequent DAC cycles, with bit Di presented on BL as shown during cycles 130b and 130n of FIG. 1B, writes the corresponding zero or nominal charge in the channel underneath VOUT, followed by sharing the net charge equally across both channels. This results in consecutive addition and divide-by-two operations, establishing algorithmic n-bit DAC of a multibit input to multi-level charge residing in both channels after n cycles.
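The add-then-halve sequence described above can be sketched numerically. This is a minimal behavioral model, not the circuit itself: the function name `algorithmic_dac` and the normalized `q_unit` nominal charge are hypothetical, and the model assumes the CP/WR phase overwrites the charge under VOUT and that the SH phase splits charge exactly in half.

```python
def algorithmic_dac(bits_lsb_first, q_unit=1.0):
    """Per-channel charge after programming, with bits presented LSB first.

    Assumptions (not from the source): ideal charge sharing, and the write
    phase replaces whatever charge sat under VOUT.
    """
    q_in = 0.0   # charge under VIN; the reset cycle is modeled as empty wells
    q_out = 0.0  # charge under VOUT
    for bit in bits_lsb_first:
        # Phase 1 (CP/WR pulsed): write zero or nominal charge under VOUT.
        q_out = q_unit if bit else 0.0
        # Phase 2 (SH pulsed): share the net charge equally across both channels.
        q_in = q_out = (q_in + q_out) / 2.0
    return q_out


# 4-bit code 13 (binary 1101), presented LSB first as 1, 0, 1, 1.
print(algorithmic_dac([1, 0, 1, 1]))
```

Under these assumptions the per-channel charge after n cycles equals the binary code divided by 2^n, in units of `q_unit`, which is why presenting the LSB first gives the MSB the largest weight.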
- CP/WR is deactivated low, and SH remains activated high to continuously couple the two channels underneath VIN and VOUT, sharing their charge in a common well as in a charge injection device.
- the distribution of the shared charge between the two channels underneath VIN and VOUT depends on the input voltage VIN. With VOUT left floating near mid-level potential, all the weight charge stored in the well resides underneath VOUT. If the input D is low during the computational cycle, VIN is maintained at GND potential, and no charge transfer takes place from the channel underneath VOUT to the channel underneath VIN. In contrast, if D is high, VIN is pulsed from GND to VDDA which causes the well charge to transfer from the channel underneath VOUT to the channel underneath VIN.
- this transfer induces an equal transfer of charge onto the VOUT line due to capacitive coupling across the gate oxide of the output transistor, which causes a rise in voltage VOUT proportional to the charge stored in the well, hence accomplishing linear matrix-vector multiplication across the array of eDRAM cells.
- the eDRAM cell 100 of FIG. 1A allows for high-density, high-resolution digitally encoded analog weights in a single dynamic memory cell, allowing the eDRAM cell 100 to perform area-efficient matrix-vector multiplication.
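The charge-transfer description above implies that each output line sums the stored charge of every cell whose input bit is high. The sketch below is an idealized model of that behavior under stated assumptions: complete charge transfer, a single lumped output-line capacitance `c_out` (a hypothetical parameter), and binary inputs. The function names are illustrative only.

```python
def column_voltage_rise(cell_charges, input_bits, c_out=1.0):
    """Voltage rise on one VOUT line.

    Each cell whose input bit D is high transfers its stored charge, which
    couples an equal charge onto the shared output line (idealized).
    """
    transferred = sum(q for q, d in zip(cell_charges, input_bits) if d)
    return transferred / c_out


def charge_mode_mvm(columns, input_bits, c_out=1.0):
    """Apply the same input vector to every column in parallel."""
    return [column_voltage_rise(col, input_bits, c_out) for col in columns]


# Two columns of two cells each; only the first input row is active.
print(charge_mode_mvm([[0.5, 0.25], [1.0, 0.0]], [1, 0]))
```

A cell with input low contributes nothing, so the result per column is the dot product of the binary input vector with the programmed charges, scaled by 1/`c_out`.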
- FIG. 1D illustrates the linear multiplication that is enabled by the eDRAM systems described herein.
- the transistor the input vertical line
- VIN the input vertical line
- VOUT an output signal on the output line
- FIG. 1D illustrates adiabatic and non-adiabatic modes of operation of the eDRAM systems described herein.
- the applied inputs are digital pulses (e.g., 0 or 1).
- a sinusoidal input is applied such that, during the COMPUTE operation, CV² energy consumption is recovered.
- Performing this COMPUTE operation provides linearity of the programmed charge and greater throughput and energy efficiency relative to other CIM technologies.
- refresh rates of the eDRAM systems described herein are reduced relative to conventional CIM systems because low body leakage currents are obtained via fully depleted silicon-on-insulator body isolation.
- thick gate-oxide transistors provide extremely small gate and subthreshold leakage.
- FIG. 1E shows a graph illustrating the leakage of a eDRAM cell, such as the eDRAM cell 100 of FIG. 1A, versus a negative reverse body bias voltage (VRBB).
- the eDRAM cell of FIG. 1E may be manufactured according to a fully depleted silicon-on-insulator process and configured to operate under zero reverse body bias voltage.
- GIDL gate-induced drain leakage
- the leakage due to the GIDL mechanisms is exponential in nature with respect to the difference in the programmed BL bias voltage and the BL bias at which the cell is maintained during the COMPUTE operation.
- the GIDL leakage can be further suppressed or linearized because fully depleted silicon-on-insulator processes have two gate controls.
- the transistor body beneath the channel and the buried oxide layer of the transistor can act as an additional gate to control the GIDL leakage.
- the GIDL leakage can be further linearized by a factor of 10x by driving the body bias to -3V.
- the back bias can be selected between -1V to -5V to lower the GIDL leakage and provide superior retention.
- FIG. 2A illustrates an embodiment of the architecture of a eDRAM system 200 with double-differential encoding in accordance with the systems and methods described herein.
- the eDRAM system 200 of FIG. 2A comprises a eDRAM crossbar array 202, a row controller 204, and a column controller 206.
- the row controller 204 and the column controller 206 may be the same controller.
- the eDRAM crossbar array 202 may comprise a plurality of eDRAM cells 208 (i.e., a first eDRAM cell 208a, a second eDRAM cell 208b, a third eDRAM cell 208c, fourth eDRAM cell 208d, and so on).
- the eDRAM cells 208 may comprise a eDRAM cell having the structure described in FIG. 1A.
- the eDRAM crossbar array 202 may be, for example, a 768 x 768 double-differential eDRAM crossbar array comprising an array with 768 rows of double-differential eDRAM cells and 768 columns of double-differential eDRAM cells.
- the eDRAM crossbar array 202 may comprise a symmetrical structure. The symmetrical structure of the eDRAM crossbar array 202 may facilitate linear MVM owing to better common-mode rejection and offset cancellation due to capacitive coupling.
- the row controller 204 may comprise a plurality of modules configured to control various functions of the rows of the eDRAM crossbar array 202.
- the row controller 204 may comprise a reconfigurable bit precision logic (RBPL) module 204a, a pseudo-random number generator (PRBS) module 204b, and a binary-to-stochastic converter (BTSC) module 204c.
- the column controller 206 may comprise a reconfigurable bit precision logic (RBPL) module 206a and a binary-to-stochastic converter (BTSC) module 206b, both of which may control the various functions of the columns of the eDRAM crossbar array 202.
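The source does not spell out how the BTSC modules work internally. One common way to build a binary-to-stochastic converter is to compare the binary value against a fresh pseudo-random number on every clock cycle, so that the fraction of ones in the bitstream approximates the value. The sketch below illustrates that general technique only; the function names are hypothetical, and Python's `random.Random` stands in for the hardware PRBS module.

```python
import random


def binary_to_stochastic(value, n_bits, length, seed=0):
    """Comparator-style converter: emit 1 when a random draw is below value.

    random.Random is a software stand-in for the PRBS module (assumption).
    The probability of a 1 on each cycle is value / 2**n_bits.
    """
    rng = random.Random(seed)
    full_scale = 1 << n_bits
    return [1 if rng.randrange(full_scale) < value else 0 for _ in range(length)]


def stochastic_to_binary(stream, n_bits):
    """Decode by counting ones and rescaling to the n-bit range."""
    return round(sum(stream) * (1 << n_bits) / len(stream))


stream = binary_to_stochastic(128, n_bits=8, length=4096)
print(stochastic_to_binary(stream, n_bits=8))  # close to 128, not exact
```

Longer streams reduce the decoding error at the cost of more cycles, which is the usual precision-versus-latency trade-off in stochastic computing.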
- the eDRAM crossbar array 202 of the eDRAM system 200 may be configured to perform CIM by first being subject to a pre-charge operation to perform autozeroing. During such a pre-charge operation, the output transistor is driven to a pre-charge voltage, VPRE, while the input transistor is kept at ground.
- the eDRAM crossbar array 202 is subject to a COMPUTE operation. Driving the output well to VPRE while maintaining the input well at Vss establishes a known output voltage for an input vector that is used as a reference during the COMPUTE operation. During the COMPUTE operation, the voltage applied to the input line is changed from Vss to VIN. In the case of a double-differential configuration, the pre-charging operation could be configured to double the output dynamic range during the COMPUTE operation.
- the sensed output is twice the output compared to a single bitcell operation.
- the charge transfer that occurs during the COMPUTE operations to be carried out on the eDRAM system 200 described herein may be described mathematically as follows. It is first assumed that a total charge QPROG is stored in the eDRAM well during the PROGRAM operation. Under this assumption, during the first step of a COMPUTE operation, the input transistor VIN may take a charge aQ PR0 G under its well. After channel inversion, the channel voltage VCII,COMP is defined by
- V Ou t aQ pR 0G (4).
- VOUT is typically pre-charged to a known bias VPRE before the cOut
- VPRE represents a known pre-charge bias that is selected for the eDRAM cell.
- VPRE equals VoDa/2 to allow the voltage swing to go up to VoDa after the charge transfer takes place.
- VPRE is between Vss and VDD 3 .
- Equations (l)-(5) can be manipulated to derive the equation for a, namely: This induces a multiplicative gain on the output line (proportional to VIN and QPROG):
- the parameter a depends on the applied input voltage VIN, and thus describes how much charge is transferred between the channel underneath the input transistor 102e and the channel underneath the output transistor 102c during the COMPUTE operation.
- VIN needs to be higher than a threshold voltage above the channel voltage VCH to allow an above-threshold linear charge transfer operation. In other words, in order for the above-threshold linear charge transfer operation to occur, VIN must satisfy
- $V_{IN} \geq V_{CH} + V_{THRESH}$
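As a rough numerical illustration of the condition above, the sketch below checks whether an input voltage sits at least one threshold above the channel voltage, and estimates the resulting output level. The output model (pre-charge bias plus transferred charge a·QPROG divided by an output capacitance) is an assumption made for this sketch, including its sign and the capacitance term; the function names and every numeric value are hypothetical.

```python
def linear_transfer_ok(v_in, v_ch, v_thresh):
    """True when VIN is at least one threshold above the channel voltage."""
    return v_in >= v_ch + v_thresh


def output_voltage(v_pre, a, q_prog, c_out):
    """ASSUMED model: the output moves from VPRE by the transferred charge over C_OUT."""
    return v_pre + a * q_prog / c_out


# Hypothetical operating point (not from the source).
v_dda = 1.0          # analog supply, volts
v_pre = v_dda / 2    # pre-charge bias at mid-supply
q_prog = 1e-15       # programmed charge, coulombs
c_out = 10e-15       # output line capacitance, farads
a = 0.5              # fraction of the charge transferred

print(linear_transfer_ok(v_in=0.9, v_ch=0.4, v_thresh=0.35))
print(output_voltage(v_pre, a, q_prog, c_out))
```

Pre-charging to mid-supply leaves headroom for the output to move toward VDDA without clipping, which is the motivation the text gives for choosing VPRE = VDDA/2.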
- FIG. 2C illustrates the double differential structure with differential inputs and differential outputs.
- double-differential encoding of the inputs, outputs, and weights across the eDRAM crossbar array 202 supports four-quadrant signed analog-digital MVM, for a four-fold increase in cell size occupying four identical complementary quadrants.
- the symmetrical 4x5T structure offers greater common-mode noise rejection, as well as substantially improved linearity due to charge balancing with uniform capacitive loading along rows and columns across the eDRAM crossbar array 202.
- This improved linearity is also critically important in guaranteeing that resonance between the capacitive load of the eDRAM array and an external inductive tank for hot-clock adiabatic energy-recycling be maintained at constant hot-clock frequency, independent of the state of the inputs and outputs of the eDRAM crossbar array 202.
- This obviates elaborate stochastic modulation schemes to ensure balanced input statistics that have limited the efficiency gains achievable by adiabatic energy recovery in charge-domain CIM due to residual central-limit statistical variations in resonance frequency.
- the eDRAM CIM system 200 provides for stochastic encoding of the inputs and outputs to offer greater versatility in input and output digital formats accommodating both probabilistic and deterministic activations of neural state variables.
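The four-quadrant signed multiplication enabled by double-differential encoding can be sketched numerically. The example below assumes each signed input and weight is split into a non-negative (plus, minus) pair and that the four quadrants accumulate onto two differential outputs; `split_signed` and `double_differential_mac` are hypothetical names, and the sketch models only the arithmetic, not the charge-domain circuit.

```python
def split_signed(v):
    """Encode a signed value as a non-negative (plus, minus) pair."""
    return (v, 0.0) if v >= 0 else (0.0, -v)


def double_differential_mac(inputs, weights):
    """Signed multiply-accumulate built from four non-negative quadrant products."""
    out_plus = 0.0
    out_minus = 0.0
    for x, w in zip(inputs, weights):
        xp, xm = split_signed(x)
        wp, wm = split_signed(w)
        out_plus += xp * wp + xm * wm   # quadrants that raise the + output
        out_minus += xp * wm + xm * wp  # quadrants that raise the - output
    return out_plus - out_minus         # differential readout


inputs = [1.0, -2.0, 0.5]
weights = [0.5, 0.25, -1.0]
print(double_differential_mac(inputs, weights))
```

Because (x+ − x−)(w+ − w−) expands into exactly these four products, the differential result equals the ordinary signed dot product while every individual quadrant only handles non-negative quantities.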
- FIG. 3A illustrates peripheral circuit architecture in accordance with embodiments herein.
- the digital peripheral circuits 300 include reconfigurable input and output bit width for computing at multiple precisions ranging up to 8 bits, with support for multiplexing between both deterministic and stochastic encoding/decoding of inputs and outputs.
- the peripheral circuit 300 comprises a comparator 302, a multiplexer (MUXer) 304, an input chopper 306, energy recovery logic drivers 308 (discussed further below with respect to FIGs. 7A and 7B), a compute module 310, an output chopper 312, a double-tall latch-type dynamic comparator 314, a demultiplexer 316, and an output decoder 318.
- the comparator 302 is a digital comparator that is configured to compare the multi-bit input and generated random number.
- the multiplexer 304 is configured to decide between the deterministic or stochastic datapath for the given inputs.
- the input chopper 306 is configured to perform input bit flipping without changing the actual input register value, which is useful for double-dynamic range operation with complementary inputs during pre-charge and compute operations.
- the energy recovery logic drivers 308 are configured to act as a simple level shifter (from digital supply voltage to analog supply voltage) in non-adiabatic mode of operation. However, in the case of adiabatic supply, the energy recovery logic drivers 308 are configured to act as a continuous analog driver to feed the sinusoidal supply voltage.
- the compute module 310 is configured to execute the core charge-mode compute-in-memory multiply-and-accumulate operation described with respect to FIGs. 1D and 2A.
- the output chopper 312, the demultiplexer 316, and the output decoder 318 are configured to act as a peripheral controller to decode results of MAC operations during stochastic and deterministic computing under different bit-precision compute constraints.
- the dynamic comparator 314 is configured to serve as the building block for single-slope analog-to-digital conversion.
- with the peripheral circuit 300 of FIG. 3A, computing is possible in two modes of operation: the peripheral circuit 300 can support both deterministic and stochastic input encoding and output decoding for probabilistic CIM. In some implementations, it is possible to switch between the deterministic mode and the stochastic mode of operation.
- in the digital peripheral circuit 300, integrated pseudo-random number generators (PRNGs) and digital comparators, one of each provided per row, generate stochastically encoded binary input streams to the array from supplied digital inputs, with unbiased mean and adjustable variance set by the PRNG magnitude.
- Strictly independent and identically distributed (i.i.d.) PRNG channels are generated employing no more than two-bit shift registers and one XOR gate per PRNG bit slice.
- the 1-bit quantized column outputs resolved by the array of dynamic comparators are decimated accordingly to accumulate statistics, producing sigmoidal activation functions of varying spread corresponding to the combined variance of the PRNG additive noise in the central limit.
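The comparator-based stochastic encoding described above can be sketched in software. The example uses a generic 8-bit Fibonacci LFSR (taps 8, 6, 5, 4) as a stand-in random source; this is an assumption for illustration and is not the two-bit shift register and XOR structure the text describes. The names `lfsr8` and `stochastic_encode` and the seed value are hypothetical.

```python
def lfsr8(state):
    """Advance an 8-bit Fibonacci LFSR (taps 8, 6, 5, 4) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return (state >> 1) | (bit << 7)


def stochastic_encode(x, length, seed=0x5A):
    """Return a 0/1 stream whose mean tracks x / 255 for an 8-bit input x."""
    state = seed  # any non-zero seed works; zero would lock the LFSR
    stream = []
    for _ in range(length):
        state = lfsr8(state)
        # Digital comparator: emit 1 when the random number does not exceed x.
        stream.append(1 if state <= x else 0)
    return stream


stream = stochastic_encode(64, 255)
print(sum(stream) / len(stream))
```

Over one full period the LFSR visits every value from 1 to 255 once, so the number of ones in the stream equals the input code and the stream mean is an unbiased estimate of x / 255.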
- FIG. 3A additionally shows a timing diagram 305 that shows the pulsing that occurs during a COMPUTE operation.
- peripheral circuits 320, 330, 340, 350, and 360 can be used to drive the transistors of the eDRAM cells in accordance with embodiments herein.
- peripheral circuit 320 of FIG. 3B is configured to drive the bitlines of the eDRAM cells described herein.
- Peripheral circuit 320 is configured to drive BL+ and BL- to either VBLHI or VBLLO in a complementary fashion, using a user-defined digital-to-analog converter at the periphery.
- the choice of VBLHI and VBLLO is governed by compute-in-memory performance metrics such as the output dynamic range, average refresh time, and multiplication SNR in multilevel storage.
- Peripheral circuit 330 of FIG. 3C is configured to drive the input transistors of the eDRAM cells described herein.
- Peripheral circuit 330 comprises level shifter 331, ERL drivers 332, and control switches 333.
- the level shifter 331 and the ERL drivers 332 ensure the generation of analog bias voltage modulated by the digital input bits.
- the control switches 333 determine whether to bias the IN at a fixed voltage during the PROGRAM operation or to drive the IN lines dynamically based on the digital input bits.
- Peripheral circuit 340 of FIG. 3D is configured to drive the write select transistors of the eDRAM cells described herein.
- Peripheral circuit 350 of FIG. 3E is configured to drive the share select transistors of the eDRAM cells described herein.
- the write select transistors, such as write select transistor 102b of FIG. 1A, and share select transistors, such as share select transistor 102d of FIG. 1A, are driven to VDDA/Hot Clock (HC) or ground (VSS) based on external control bits WR_SEL and SH_SEL, to control the switched-capacitor DAC operation of eDRAM.
- the write select transistors such as write select transistor 102b of FIG. 1A and share select transistors, such as share select transistor 102d of FIG. 1A, are driven to VWRLO and VSHHI respectively, set from another digital-to-analog converter at the periphery.
- VWRLO is governed by the desired leakage of eDRAM
- VSHHI is chosen to maximize the dynamic range of computation, as the share select transistor 102d governs the coupling of the channels beneath the input transistor and the output transistor.
- Peripheral circuit 360 of FIG. 3F is configured to drive the output transistors of the eDRAM cells described herein.
- the eDRAM cell having transistors driven by peripheral circuits 320, 330, 340, 350, and 360 may be, for example, eDRAM cell 100 of FIG. 1A or any of the eDRAM cells 208a, 208b, 208c, and so on, of FIGs. 2A and 2B.
- FIG. 3G illustrates a timing diagram for the pre-charge, COMPUTE, and readout operations performed using the eDRAM cells, such as eDRAM cells 208a, 208b, 208c, and 208d, of FIG. 2A.
- the pre-charge operation is performed on the eDRAM cells by driving the output transistor to a pre-charge voltage, VPRE, while the gate of the input transistor is kept at ground to implement autozeroing.
- COMPUTE operations are performed on the eDRAM cell.
- eDRAM cells such as eDRAM cells 208a, 208b, 208c, and 208d, of FIG. 2A, allow for non-destructive compute/readout operations across multiple cycles, without requiring refresh after every compute cycle.
- readout of the result of the bit-serial COMPUTE operations is performed from a digital register.
- FIG. 4 illustrates a table 400 showing deterministic to stochastic conversion of a bit in accordance with embodiments described herein.
- a deterministic binary or bipolar input is received by the system.
- the input has a resolution as described by column 402 of the table 400.
- the binary or bipolar input may have a representation according to column 404 of the table 400, which may be analogous to an internal representation of the input according to column 406.
- the input may be mapped to an 8-bit internal representation.
- the 8-bit representation of the input may be compared to an 8-bit random number generated by the PRBS module, such as PRBS module 204b of FIG. 2A.
- the input may be mapped to a representation having as many bits as contained in the output of the PRBS module.
- the stochastic mapping of the binary or bipolar input is given in column 408 of the table 400.
- Column 410 of the table 400 displays the expected variance of the stochastic representation of the input. As shown in column 410 of the table 400, the variance of the stochastic representation increases with higher resolution inputs.
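The trend in column 410 can be illustrated with the variance of a single comparator-encoded bit, which is p(1 − p) when the bit is 1 with probability p. The sketch below assumes a hypothetical mapping in which a +1 input is placed at the center code 2^(n−1) of an n-bit range and p = code / (2^n − 1); this mapping is an assumption for illustration and is not the mapping given in table 400.

```python
def bernoulli_variance(code, n_bits):
    """Variance of a 0/1 stream that is 1 with probability code / (2**n_bits - 1)."""
    p = code / (2 ** n_bits - 1)
    return p * (1.0 - p)


def center_code(n_bits):
    """HYPOTHETICAL mapping of a +1 input to the center code of an n-bit range."""
    return 2 ** (n_bits - 1)


resolutions = (1, 2, 4, 8)
variances = [bernoulli_variance(center_code(n), n) for n in resolutions]
for n, v in zip(resolutions, variances):
    print(n, round(v, 4))
```

At 1-bit resolution the encoded bit is deterministic (zero variance), while at higher resolutions the center code approaches a probability of one half, where the per-bit variance approaches its maximum of 0.25.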
- FIGs. 5A-5C illustrate linearity of multiplication achieved by the systems described herein relative to different input and weight vectors.
- the linearity of multiplication and accumulation (also referred to as multiply-accumulate or MAC) is a measure of the signal-to-noise ratio (SNR) of the eDRAM systems described herein. Noise may come from the eDRAM cell, the periphery of the eDRAM cell, or from device variations.
- ADC analog-to-digital conversion
- the threshold voltage of the ADC is varied by pre-charging VOUT+ and VOUT- to separate biases, VPRE+ and VPRE-, respectively.
- the single-slope ramp ADC method allows for highly efficient charge-mode sensing from an eDRAM cell, quantized by a high input-impedance ADC. This eliminates any static current and current-resistance (IR) drops along the eDRAM array.
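A generic single-slope conversion can be sketched as a ramp that is stepped until a comparator flips, with the step count taken as the output code. This is a textbook-style model under stated assumptions, not the circuit of the embodiments; `single_slope_adc` and the voltage range used in the example are hypothetical.

```python
def single_slope_adc(v_in, v_min, v_max, n_bits):
    """Step a ramp from v_min to v_max and return the step at which it reaches v_in."""
    steps = 2 ** n_bits
    lsb = (v_max - v_min) / steps
    for code in range(steps):
        ramp = v_min + (code + 1) * lsb
        if ramp >= v_in:     # comparator decision flips here
            return code
    return steps - 1         # input above full scale: clamp to the top code


# Hypothetical 4-bit conversion over a 0 V to 1 V range.
print(single_slope_adc(0.10, 0.0, 1.0, 4))
print(single_slope_adc(0.99, 0.0, 1.0, 4))
```

Each conversion needs only one comparator decision per ramp step, which is why a dynamic comparator suffices as the building block and no static current has to flow through the array during sensing.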
- FIG. 5A demonstrates the MAC required to effect the flipping of a bit from 0 to 1 at different threshold voltages VThresh.
- FIG. 5A shows the MAC error characterization at various random weights after offset cancellation using autozeroing, for a plurality of different threshold voltages.
- FIG. 5B shows the ADC switching outputs (as described by the cumulative spiking probability) versus the expected MAC transfer function of stochastic encoded inputs at different input precisions (the input precisions being shown by the legend within FIG. 5A).
- FIG. 5B illustrates that higher bit-precision has higher variance when 1-b inputs [-1, +1] are mapped into the center values in multi-bit representation. By mapping 1-b inputs in this manner, the desired noise-shaping on the MAC transfer function can be achieved for different bit precisions.
- FIG. 5C shows the linearity of weight programming by plotting the estimated analog weight coefficients versus the programmed multi-bit digital weights at 4-bit precision.
- FIGs. 5A-5C illustrate that when the entire array, such as eDRAM crossbar array 202 of FIG. 2A, is utilized, the double-differential sensing margin is measured to cover the +175 mV to -175 mV range, substantially larger than the dynamic comparator random threshold variations measured to be within 1 mV, thus allowing for 6b-8b output quantization. Furthermore, linearity and uniformity in the weights also shown in FIGs. 5A-5C are measured to be within 4-bit resolution across the array, minimizing impact of random variations on array-level function in general MVM-based AI tasks.
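The link between the measured sensing margin and the quoted 6b-8b output quantization can be checked with simple arithmetic. The sketch below treats the comparator threshold variation as the smallest resolvable step, which is a simplifying assumption; the variable names are illustrative.

```python
import math

# Figures quoted in the text above.
margin_mv = 175.0 - (-175.0)       # full double-differential sensing range, mV
comparator_variation_mv = 1.0      # random threshold variation bound, mV

# ASSUMPTION: the threshold variation sets the smallest resolvable step.
levels = margin_mv / comparator_variation_mv
bits = math.log2(levels)

print(levels, round(bits, 2))
```

Under this assumption the margin supports roughly 350 distinguishable levels, that is, a little over 8 bits, which is consistent with the stated 6b-8b quantization range.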
- FIGs. 6A-6D show the estimated weight coefficients versus the programmed weights at 4-bit precision.
- FIGs. 6A-6D are measured at four sequential instances in time: FIG. 6A is measured at 0.90 ms, FIG. 6B is measured at 4.05 ms, FIG. 6C is measured at 7.20 ms, and FIG. 6D is measured at 10.12 ms.
- FIG. 7A illustrates an energy recovery logic (ERL) circuit 700 in accordance with embodiments described herein.
- the ERL circuit 700 of FIG. 7A provides CV² energy recovery savings.
- the ERL circuit 700 may be configured to drive lines with a hot clock (HC) 702.
- the hot clock may be a sinusoidal hot clock.
- FIG. 7B illustrates the measured energy recovery in graph 706, which plots both the instantaneous and average power measured versus time.
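The motivation for energy recovery can be illustrated with the textbook comparison between abrupt and slow (adiabatic) charging of a capacitive line. The formulas below are standard first-order approximations for a linear ramp, not measurements or equations from the source, and a sinusoidal hot clock would differ by a constant factor; all component values and function names are hypothetical.

```python
def conventional_dissipation(c, v):
    """Energy lost in the driver resistance when a capacitor is charged abruptly."""
    return 0.5 * c * v ** 2


def adiabatic_dissipation(c, v, r, t_ramp):
    """First-order loss for a linear ramp of duration t_ramp (valid for t_ramp >> r*c)."""
    return (r * c / t_ramp) * c * v ** 2


# Hypothetical line parameters, for illustration only.
c = 1e-12        # line capacitance, farads
v = 1.0          # voltage swing, volts
r = 1e3          # driver and wire resistance, ohms
t_ramp = 100e-9  # ramp time, seconds

e_conv = conventional_dissipation(c, v)
e_adia = adiabatic_dissipation(c, v, r, t_ramp)
print(e_conv, e_adia, e_conv / e_adia)
```

The slower the supply ramps relative to the RC time constant, the smaller the resistive loss, which is why driving the lines from a resonant hot clock can recover much of the CV² energy.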
- FIG. 8A shows a first vertical implementation of a monolithic three-dimensional (3D) silicon arrangement 800.
- the arrangement 800 comprises a plurality of eDRAM cells 801a, 801b, 801c, ...
- the plurality of eDRAM cells 801a, 801b, 801c, ... may be oriented in a three-dimensional arrangement.
- the plurality of eDRAM cells 801a, 801b, 801c, ... may comprise a plurality of transistors, including an input transistor, an output transistor, and a write select transistor.
- the DAC function provided by a charge pump transistor and a share select transistor may be shared at the periphery of the 3D arrangement such that a charge pump transistor and a share select transistor need not be included in the plurality of eDRAM cells 801a, 801b, 801c.
- the plurality of eDRAM cells 801a, 801b, 801c may comprise an input transistor, an output transistor, a write select transistor, a charge pump transistor, and a share select transistor, as is the case with the eDRAM cell 100 of FIG. 1A.
- the input lines VIN1, VIN2, VIN3 of the arrangement are disposed along a first direction
- the output lines VOUT1, VOUT2, VOUT3 and write lines WR1, WR2, and WR3 are disposed along a second direction that is perpendicular to the first direction.
- the first direction, the second direction, and the third direction may be orthogonal.
- the plurality of eDRAM cells 801a, 801b, 801c are disposed in a third direction perpendicular to the first and second directions.
- the first direction is the direction into and out of the page
- the second direction runs horizontally along the page
- the third direction runs vertically along the page.
- FIG. 8B shows an additional vertical implementation of a monolithic three-dimensional (3D) silicon arrangement 810 comprising a plurality of eDRAM cells 802a, 802b, 802c, ....
- the plurality of eDRAM cells 802a, 802b, 802c, ... may be oriented in a three-dimensional arrangement.
- the plurality of eDRAM cells 802a, 802b, 802c, ... may comprise a plurality of transistors, including an input transistor, an output transistor, and a write select transistor.
- the DAC function provided by a charge pump transistor and a share select transistor may be shared at the periphery of the 3D arrangement such that a charge pump transistor and a share select transistor need not be included in the plurality of eDRAM cells 802a, 802b, 802c.
- the plurality of eDRAM cells 802a, 802b, 802c may comprise an input transistor, an output transistor, a write select transistor, a charge pump transistor, and a share select transistor, as is the case with the eDRAM cell 100 of FIG. 1A.
- the input lines VIN1, VIN2, VIN3 of the arrangement are disposed along a first direction
- the output lines VOUT1, VOUT2, VOUT3 and write lines WR1, WR2, and WR3, are disposed along a second direction that is perpendicular to the first direction.
- the plurality of eDRAM cells 802a, 802b, 802c are disposed in a third direction perpendicular to the first and second directions.
- the first direction runs vertically along the page
- the second direction runs into and out of the page
- the third direction runs horizontally along the page.
- the monolithic 3D silicon arrangements 800 and 810 of FIGs. 8A and 8B are manufactured using 3D vNAND-flash process, 3D vNAND-flash-like process, or 3D DRAM process technologies where the transistors in the eDRAM bit cell are implemented vertically or horizontally, respectively, using multi-tier silicon.
- the monolithic three-dimensional (3D) silicon arrangements 800 and 810 can be used to perform PROGRAM and COMPUTE operations with high throughput and energy efficiency.
- the monolithic 3D silicon arrangements 800 and 810 shown in FIGs. 8A and 8B are manufacturable with monolithic 3D silicon processes, which results in extremely high memory capacity and compute density. Because the monolithic 3D silicon arrangements 800 and 810 of FIGs. 8A and 8B can be manufactured with standard CMOS processes that are scalable, the CIM operations described herein can be realized with very low device-to-device variability in multi-level storage and compute.
- One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs, field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof.
- These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
- the programmable system or computing system may include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
- the machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium.
- the machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example, as would a processor cache or other random-access memory associated with one or more physical processor cores.
- one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer.
- feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input.
- Other possible input devices include touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive track pads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.
- phrases such as “at least one of” or “one or more of” may occur followed by a conjunctive list of elements or features.
- the term “and/or” may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features.
- the phrases “at least one of A and B;” “one or more of A and B;” and “A and/or B” are each intended to mean “A alone, B alone, or A and B together.”
- a similar interpretation is also intended for lists including three or more items.
- the phrases “at least one of A, B, and C;” “one or more of A, B, and C;” and “A, B, and/or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together.”
- Use of the term “based on,” above and in the claims is intended to mean, “based at least in part on,” such that an unrecited feature or element is also permissible.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Mathematical Physics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Neurology (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Algebra (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Dram (AREA)
Abstract
In one embodiment, a method of performing a compute-in-memory operation on an eDRAM cell is provided, the eDRAM cell comprising a plurality of transistors comprising at least: an input transistor, a charge pump transistor, a write select transistor, a share select transistor, and an output transistor, each of the plurality of transistors having a respective channel located underneath it. The method comprises: performing a PROGRAM operation on the plurality of transistors by coupling the input and output transistors to an analog supply, and sequentially pulsing the charge pump, write select, and share select transistors between ground and the analog supply while the output channel is programmed between ground and a digital supply; and performing a COMPUTE operation on the plurality of transistors by coupling the input channel to the output channel, and pulsing the input transistor from ground to the analog supply in order to transfer the charge in the output channel to the input channel.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363608055P | 2023-12-08 | 2023-12-08 | |
| US63/608,055 | 2023-12-08 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025122977A1 (fr) | 2025-06-12 |
Family
ID=95980612
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/059041 (Pending; published as WO2025122977A1 (fr)) | Hyperdimensional stochastic compute-in-memory with charge-mode DRAM array | 2023-12-08 | 2024-12-06 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025122977A1 (fr) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190370639A1 (en) * | 2018-06-01 | 2019-12-05 | Arizona Board Of Regents On Behalf Of Arizona State University | Multi-layer vector-matrix multiplication apparatus for a deep neural network |
| US20220012580A1 (en) * | 2020-07-07 | 2022-01-13 | Qualcomm Incorporated | Power-efficient compute-in-memory pooling |
| US20220352893A1 (en) * | 2021-04-29 | 2022-11-03 | POSTECH Research and Business Development Foundation | Ternary logic circuit device |
| US20220398439A1 (en) * | 2021-06-09 | 2022-12-15 | Sandisk Technologies Llc | Compute in memory three-dimensional non-volatile nand memory for neural networks with weight and input level expansions |
| US20230012075A1 (en) * | 2021-07-06 | 2023-01-12 | Unisantis Electronics Singapore Pte. Ltd. | Memory device using semiconductor element |
-
2024
- 2024-12-06 WO PCT/US2024/059041 patent/WO2025122977A1/fr active Pending
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 24901683 Country of ref document: EP Kind code of ref document: A1 |