WO2023224596A1 - Shared column adcs for in-memory-computing macros - Google Patents
- Publication number
- WO2023224596A1 (application PCT/US2022/029438)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bit
- column
- weighted
- signal
- cells
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
- G11C7/1006—Data managing, e.g. manipulating data before writing or reading out, data bus switches or control circuits therefor
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/16—Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/41—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming static cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger
- G11C11/413—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction
- G11C11/417—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction for memory cells of the field-effect type
Definitions
- the present invention relates to the field of in-memory computing and, more particularly, to the scaling, summation, and conversion to digital data of analog signals representing weighted data such as provided by multiple parallel outputs of an array of in-memory computing cells.
- IMC Charge-domain in-memory computing
- bit-cell circuits involve appropriate switching of a local capacitor in a given bit-cell, where that local capacitor is also appropriately coupled to other bit-cell capacitors, to yield an aggregated compute result across the coupled bit-cells.
- In-memory computing is well suited to implementing matrix-vector multiplication, where matrix elements are stored in the memory array, and vector elements are broadcast in parallel fashion over the memory array.
- an IMC computing architecture acquires computational results over many bits stored in memory. This enhances system energy efficiency and speed by reducing the number of data acquisition cycles required.
- a computational result is derived within a memory column, where: parallel input data is provided to the rows, computation (e.g., multiplication) is performed by the memory bit cells with data stored therein; and further computation (e.g., accumulation) is performed on the column bit lines to provide reduction to a single output.
- the reduced output generally has increased dynamic range (i.e., number of signal levels) that need to be resolved, relative to single-bit accessing.
- analog operation is often employed for the column computation, both to fit computation within the constrained memory circuits (e.g., bit cells, bit lines) and to enable the increased dynamic range.
- ADC analog-to-digital converter
- Each bit cell provides at a respective output element (e.g., an output capacitor) a result of an operation during a measurement or evaluation phase, the result having associated with it a weight based upon the position of the bit cell in a row of bit cells (e.g., binary or other weighting from LSB to MSB of a result) such that each bit cell within a column of bit cells is associated with the same weight.
- Analog signals e.g., voltage or charge
- ADC analog to digital conversion
- the scaling phase may comprise disconnecting some of the bit cells within a column of bit cells in accordance with the corresponding weighting value of that column such that, when the charge levels of the remaining bit cells within each column (e.g., their output capacitors) are accumulated to provide the accumulated/summation analog signal, the charge contributed thereto by each column is proportional to the weighting value of that column.
- the scaling phase may comprise a signal divider such as a charge divider or charge divider network wherein the total charge provided by bit cells within a column of bit cells is divided to provide a charge level or analog signal representative thereof in accordance with the corresponding weighting value of that column.
- Some embodiments provide an apparatus for scaling and summing a plurality of weighted-data-representative analog signals, wherein each analog signal comprises a voltage associated with a respective plurality of coupled bit-cell outputs within an in-memory computing (IMC) array of bit-cells, the apparatus comprising: a plurality of charge divider circuits, each charge divider circuit configured to process a respective weighted-data-representative analog signal to produce an output signal across a respective output capacitor of a capacitance value scaled in accordance with the respective weighting value; wherein, during a measurement phase of operation, the output capacitors of the charge divider circuits are coupled to a sample and hold circuit associated with an input of an analog to digital converter (ADC) configured to generate therefrom a digital output representing a summation of the weighted-data-representative analog signals.
- FIG. 1 graphically depicts a typical structure of an in-memory computing architecture
- FIG. 2 depicts a block diagram of a fully row/column-parallel (1152 row x 256 col) array of multiplying bit-cells (M-BCs) of an in-memory-computing (IMC) macro enabling N-bit (5-bit) input processing;
- M-BCs multiplying bit-cells
- IMC in-memory-computing
- FIG. 3 depicts a circuit architecture of a multiplying bit cell suitable for use in the array of M-BCs of FIG. 2;
- FIGS. 4A-4B graphically depict mechanisms for scaling and summation of computational result indicative outputs useful in illustrating the various embodiments
- FIGS. 4C-4D graphically depict mechanisms for scaling and summation of computational result indicative outputs useful in illustrating the various embodiments
- FIG. 5 graphically illustrates an exemplary IMC column within an array of M-BCs
- FIG. 6 graphically depicts an example of binary-weighted scaling within a bit-cell array of an in-memory computing architecture useful in understanding the embodiments;
- FIGS. 7-9 depict circuit diagrams of various embodiments of binary-weighted scaling proximate or within an output ADC of an in-memory computing architecture; and
- FIG. 10 depicts circuit diagrams of various embodiments of binary-weighted current divider scaling circuitry suitable for use in the various embodiments.
- Some of the various embodiments are directed to IMC computing architecture, apparatus, methods, and portions thereof configured to acquire the computational result indicative outputs of multiple parallel columns or bit lines in a manner avoiding the use of individual analog-to-digital converters (ADCs) for each column or bit line. That is, rather than converting the analog output signal associated with each bit line or column to a respective digital representation suitable for further processing within the IMC computing architecture, the various embodiments perform some of this further processing using the analog output signals associated with the bit lines or columns so as to reduce the number of ADCs needed to implement the functions of the IMC computing architecture while retaining analog output signal accuracy (i.e., reducing the impact of ADC quantization errors and other errors).
- FIG. 1 graphically depicts a typical structure of an in-memory computing architecture.
- the in-memory computing architecture 100 of FIG. 1 as depicted consists of a memory array (which could be based on standard bit-cells or modified bit-cells). In-memory computing involves two additional, “perpendicular” sets of signals; namely, (1) input lines; and (2) accumulation lines.
- each of a plurality of in-memory-computing channels 110-1 through 110-N comprises a respective column of bit-cells where each of the bit cells in a channel is associated with a common accumulation line and bit line (column), and a respective input line and word line (row).
- columns and rows of signals are denoted herein as being “perpendicular” with respect to each other simply to indicate a row/column relationship within the context of an array of bit cells such as the two-dimensional array of bit-cells depicted in FIG. 1.
- the term “perpendicular” as used herein is not intended to convey any specific geometric relationship.
- the input/bit and accumulation/bit sets of signals may be physically combined with existing signals within the memory (e.g., word lines, bit lines) or could be separate.
- the matrix elements are first loaded in the memory cells. Then, multiple input-vector elements (possibly all) are applied at once via the input lines. This causes a local compute operation, typically some form of multiplication, to occur at each of the memory bit-cells. The results of the compute operations are then driven onto the shared accumulation lines. In this way, the accumulation lines represent a computational result over the multiple bit-cells activated by input-vector elements. This is in contrast to standard memory accessing, where bit-cells are accessed via bit lines one at a time, activated by a single word line.
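The load, broadcast, multiply, and accumulate sequence described above can be illustrated with a minimal behavioral sketch. It assumes ideal 1-bit cells and treats each accumulation line as an exact integer sum; the function name `imc_matvec` and the example values are hypothetical and are not taken from the publication.

```python
def imc_matvec(weights, inputs):
    """Behavioral model of one in-memory-computing access.

    weights[r][c] is the bit stored in row r, column c (assumed 0 or 1).
    inputs[r] is the input-vector element applied to row r via its input line.
    Returns one accumulated value per column (per accumulation line).
    """
    n_cols = len(weights[0])
    accumulation_lines = [0] * n_cols
    for r, x in enumerate(inputs):           # all rows are driven at once in hardware
        for c in range(n_cols):
            # Local compute in the bit cell: a 1-bit multiplication.
            accumulation_lines[c] += x * weights[r][c]
    return accumulation_lines


# Illustrative 4-row x 3-column array (values are assumptions).
weights = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]
inputs = [1, 0, 1, 1]
column_results = imc_matvec(weights, inputs)
print(column_results)
```

Each entry of `column_results` corresponds to one accumulation line, so a single access yields a result over all activated rows rather than the contents of a single row.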
- In-memory computing as described has a number of important attributes.
- compute is typically analog. This is because the constrained structure of memory and bit-cells requires richer compute models than enabled by simple digital switch-based abstractions.
- the extensions on in-memory computing proposed in the invention are described.
- Bit-parallel compute involves loading the different matrix-element bits in different in-memory-computing columns. The ADC outputs from the different columns are then appropriately bit shifted to represent the corresponding bit weighting, and digital accumulation over a set of the columns is performed to yield the multi-bit matrix-element compute result.
- Bit-serial compute involves applying each bit of the input vector elements one at a time, storing the ADC outputs each time and bit shifting the stored outputs appropriately, before digital accumulation with the next outputs corresponding to subsequent input-vector bits.
- Such a bit-parallel/bit-serial (BPBS) approach, enabling a hybrid of analog and digital compute, is highly efficient since it exploits the high-efficiency low-precision regime of analog (1-b) with the high-efficiency high-precision regime of digital (multi-bit), while overcoming the accessing costs associated with conventional memory operations.
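The bit-parallel/bit-serial shift-and-accumulate scheme can be sketched as follows. This is a simplified model under stated assumptions: operands are unsigned integers, each column's ADC is ideal (it returns the exact 1-bit inner product), and the name `bpbs_dot` is hypothetical.

```python
def bpbs_dot(weights, inputs, w_bits, x_bits):
    """Inner product computed bit-parallel over weight bits, bit-serial over input bits.

    weights, inputs: lists of unsigned integers (an assumption of this sketch).
    w_bits, x_bits: number of bits used to represent weights and inputs.
    """
    total = 0
    for xb in range(x_bits):                       # bit-serial: one input bit per step
        x_slice = [(x >> xb) & 1 for x in inputs]
        for wb in range(w_bits):                   # bit-parallel: one column per weight bit
            w_column = [(w >> wb) & 1 for w in weights]
            # Stand-in for the column's ADC output: an exact 1-bit inner product.
            adc_out = sum(a * b for a, b in zip(x_slice, w_column))
            # Digital bit shift applies the combined bit weighting before accumulation.
            total += adc_out << (wb + xb)
    return total


weights = [5, 3, 7, 2]
inputs = [1, 6, 4, 3]
result = bpbs_dot(weights, inputs, w_bits=3, x_bits=3)
reference = sum(w * x for w, x in zip(weights, inputs))
print(result, reference)
```

With an ideal ADC the shifted partial results reproduce the direct multi-bit inner product exactly; in a real macro, ADC quantization of each column output would limit this agreement.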
- FIG. 2 depicts a block diagram of a fully row/column-parallel (1152 row x 256 col) array of multiplying bit-cells (M-BCs) of an in-memory-computing (IMC) macro enabling N-bit (5-bit) input processing in accordance with an embodiment.
- the exemplary IMC macro of FIG. 2, which may be used to implement structures such as the compute-in-memory array (CIMA) structures previously discussed, was rendered via a 28nm fabrication process and is configured for providing fully row/column-parallel matrix-vector multiplication (MVM), for exploiting precision analog computation based on metal-fringing (wire) capacitors, for extending the binary input-vector elements to 5-bit (5-b) input-vector elements, and for increasing energy efficiency by approximately 16x and throughput by 5x as compared with IMC and CIMA embodiments discussed above.
- MVM matrix-vector multiplication
- FIGS. 2-3 implement MVM operations, which dominate compute-intensive and data-intensive AI workloads, in a manner that reduces compute energy and data movement by orders of magnitude. This is achieved through efficient analog compute in bit cells, and by thus accessing a compute result (e.g., inner product), rather than individual bits, from memory. However, doing so fundamentally introduces an energy/throughput-vs.-SNR tradeoff, where going to analog introduces compute noise and accessing a compute result increases dynamic range (i.e., reducing SNR for a given readout architecture).
- IMC based on metal-fringing capacitors achieves very low noise from analog nonidealities, and thus offers the potential for extremely high dynamic range. At least some of the embodiments exploit this precise capacitor-based compute mechanism to reliably enable the improved dynamic range such as discussed herein.
- FIG. 2 shows a block diagram of an in-memory-computing macro 200 comprising: a 1152(row)x256(col.) array 210 of 10T SRAM multiplying bit cells (M-BCs); periphery for standard writing/reading thereto (e.g., a bit line (BL) decoder 240 and 256 BL drivers 242-1 through 242-256, a word line (WL) decoder 250 and 1152 WL drivers 252-1 through 252-1152, and control block 245 for controlling the decoders 240/250); periphery for providing 5-bit input-vector elements thereto (e.g., 1152 Dynamic-Range Doubling (DRD) DACs 220-1 through 220-1152, and a corresponding controller 225); periphery for digitizing the compute result from each column (e.g., 256 8-bit SAR ADCs 260-1 through 260-256), and column reset mechanisms 265-1 through 265-256 (e.g.,
- the array 210 of 10T SRAM multiplying bit cells (M-BCs) of IMC macro 200 operates in a manner similar to that described above with respect to the various figures.
- MVM operations are typically performed by applying input-vector elements corresponding to neural-network input activations to all or several rows at once.
- each DRD-DAC 220j, in response to a respective 5-bit input-vector element Xj[4:0], generates a respective differential output signal (IAj/IAbj) which is subjected to a 1-bit multiplication with the stored weights (Wij/Wbij) at each M-BCj in the corresponding row of M-BCs, and accumulation through charge-redistribution across M-BC capacitors on the compute line (CL) to yield an inner product in each column, which is then digitized via the respective ADC 260 of each column.
- the operation of individual 10T SRAM M-BCs forming the array 210 will be discussed in more detail below with respect to FIG. 3.
- the M-BC 300 of FIG. 3 comprises a highly dense structure for achieving weight storage and multiplication, thereby minimizing data-broadcast distance and control signals within the context of i-row, j-column arrays implemented using such M-BCs, such as the 1152(row)x256(col.) array 210 of 10T SRAM multiplying bit cells (M-BCs).
- the exemplary M-BC 300 includes a six-transistor bit cell portion 320, a first switch SW1, a second switch SW2, a capacitor C, a word line (WL) 310, a first bit line (BLj) 312, a second bit line (BLbj) 314, and a compute line (CL) 315.
- the six-transistor bit cell portion 320 is depicted as being located in a middle portion of the M-BC 300, and includes six transistors 304a-304f.
- the 6-transistor bit cell portion 320 can be used for storage, and to read and write data.
- the 6-transistor bit cell portion 320 stores the filter weight.
- data is written to the M-BC 300 through the word line (WL) 310, the first bit line (BL) 312, and the second bit line (BLb) 314.
- the multiplying bit-cell 300 includes first CMOS switch SW1 and second CMOS switch SW2.
- First switch SW1 is depicted as being controlled by a first activation signal A (Aij) such that, when closed, SW1 couples one of the received differential output signals provided by the DRD-DAC 220, illustratively IA, to a first terminal of the capacitor C.
- Second switch SW2 is depicted as being controlled by a second activation signal Ab (Abij) such that, when closed, SW2 couples the other one of the received differential output signals of the corresponding DRD-DAC 220, illustratively IAb, to the first terminal of the capacitor C.
- the second terminal of the capacitor C is connected to a compute line (CL).
- the input signals provided to the switches SW1 and SW2 may comprise a fixed voltage (e.g., Vdd), ground, or some other voltage level.
- the M-BC 300 can implement computation on the data stored in the 6-transistor bit cell portion 320.
- the result of a computation is driven as charge on the capacitor C.
- the capacitor C may be positioned above the bit cell 300 and utilize no additional area on the circuit.
- a logic value of either Vdd or ground is driven on the capacitor C.
- the voltage driven on the capacitor C may comprise a positive or negative voltage in accordance with the operation of switches SW1 and SW2, and the output voltage level generated by the corresponding DRD-DAC 220.
- the charge (as a function of the driven voltage) that is stored on the capacitor C is highly stable, since the capacitor C value itself is highly stable and the driven voltage is highly stable (e.g., driven up to the supply voltage or down to ground).
- the capacitor C is a metal-oxide-metal (MOM) finger capacitor, and in some examples, the capacitor C is a 1.2 fF MOM capacitor.
- MOM capacitors have very good matching, temperature, and process characteristics, and thus support highly linear and stable compute operations. Note that other types of logic functions can be implemented using the M-BCs by changing the way the transistors 304 and/or switches SW1 and SW2 are connected and/or operated during the reset and evaluation phases of M-BC operation.
- the 6-transistor bit cell portion 320 is implemented using different numbers of transistors, and may have different architectures.
- the bit cell portion 320 can be a SRAM, DRAM, MRAM, or an RRAM.
- the IMC macro 200 is depicted as using one 8-bit analog to digital converter (ADC) for each of the columns of connected M-BCs within the array 210. That is, the analog output signal provided by each of the illustratively 256 columns is individually converted by a respective 8-bit ADC to a respective 8-bit digital representation prior to further processing as discussed above and in the various related patent applications.
- bit-parallel processing can be employed, where the most-significant bit of the stored data is in the bit cells of one column, the next most-significant bit of the stored data is in the bit cells of the next column, and so on, all the way down to the least-significant bit of the stored data (typically bits of a stored data element will all be in the same row).
- each of the columns represents a component corresponding to a particular bit weighting of the computation output.
- the overall computation output can thus be derived by scaling each column output with a properly binary-weighted co-efficient, and then summing the different scaled column-output components.
- bit-weighting of data stored in the different columns need not be binary; this is readily supported by applying a corresponding scaling coefficient (not necessarily binary weighted) to each column output.
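The scale-then-sum derivation of the overall output can be written out as a short sketch. The helper names `binary_coefficients` and `combine_columns`, the example column outputs, and the non-binary coefficient set are all assumptions chosen only for illustration.

```python
def binary_coefficients(n_columns):
    """Binary weights, most-significant column first: 2^(n-1), ..., 2, 1."""
    return [2 ** (n_columns - 1 - i) for i in range(n_columns)]


def combine_columns(column_outputs, coefficients):
    """Scale each column output by its coefficient and sum the scaled components."""
    if len(column_outputs) != len(coefficients):
        raise ValueError("one coefficient is required per column")
    return sum(k * v for k, v in zip(coefficients, column_outputs))


# Illustrative outputs of four columns, ordered MSB column to LSB column.
column_outputs = [3, 1, 0, 2]

binary_result = combine_columns(column_outputs, binary_coefficients(4))

# Non-binary weighting is handled the same way; these coefficients are hypothetical.
custom_result = combine_columns(column_outputs, [9, 3, 1, 1])
print(binary_result, custom_result)
```

The same function serves both weighting schemes, which reflects the point made above: only the coefficient applied to each column changes, not the combining operation.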
- the scaling and summation of computational result indicative outputs of multiple parallel columns or bit lines may be performed prior to or after the ADC. If done before the ADC, the scaling and summing operations must be applied on the corresponding analog signal, which could be a voltage, current, charge, etc.
- each element from V1 is multiplied by each element from V2 and the totals are accumulated to achieve a result.
- Multiple bits of a vector V1 stored in memory are mapped to multiple columns, and input bits of input vector V2 are sequentially provided to each of the columns for iterations of multiplication and bit shifting.
- Each column comprises the respective total voltage or stored charge associated with a weighted result (e.g., a bit position within a multiple bit word), illustratively a binary weighted result such as a 4-bit binary word (MSB, MSB-1, MSB-2, LSB) representing the result of a 4-bit input vector V2 being multiplied by each of the elements of a stored vector V1.
- various embodiments provide for an analog domain scaling of the total voltage or stored charge associated with each column in accordance with its column weighting or scaling factor (e.g., bit position), an accumulation of the scaled voltage/charge of each column to provide an analog representation of the multiplication result (e.g., an analog voltage/charge level representing the result of the 4-bit input vector V2 being multiplied by each of the elements of a stored vector V1), which accumulated voltage/charge level is then subjected to A/D conversion to provide a digital representation of the final multiplication result.
- FIGS. 4A-4B graphically depict mechanisms for scaling and summation of computational result indicative outputs useful in illustrating the various embodiments.
- each of the mechanisms is depicted as scaling and summing four computational result indicative outputs, where each output represents a respective one of four columns or bit lines presenting a voltage level associated with charge stored on a respective column of connected bit-cell output capacitors, the voltage level representing a respective weighted portion of an accumulated result such as binary-weighted portion of the accumulated result.
- columns b, b+1, b+2, b+3 represent binary-weighted data of an accumulated 4-bit computational result where the most significant bit (MSB) is represented by column b and the least significant bit (LSB) by column b+3.
- MSB most significant bit
- LSB least significant bit
- each of four IMC columns (IMCb through IMCb+3) provides a respective voltage signal or voltage level stored across a respective plurality of bit cell output capacitors forming the column, and representing a respective binary weighted portion of an accumulated result.
- each of four IMC columns may provide a current signal/level or some other type of signal/level to represent for each IMC column the respective binary weighted portion of the accumulated result (e.g., a signal such as a current or voltage signal provided by a buffer circuit, or by a resistor or transistor based voltage or charge divider circuit rather than an IMC output capacitor and/or capacitor-based voltage or charge divider circuit, etc.).
- other embodiments may use other types of weighting and/or scaling depending upon the application, the components selected for the IMC, and/or other factors.
- various embodiments provide a mechanism for selectively attenuating or amplifying the weighted signals (or whatever type used) according to their weighting factors so as to provide, after summation, a total signal level (voltage level, current level, charge level, etc.) representative of the accumulated result.
- the mechanism of FIG. 4A contemplates scaling and summation of accumulated weighted portions of a computational result prior to ADC processing.
- each of four IMC columns provides a respective voltage signal or voltage level stored across a respective plurality of bit cell output capacitors forming the column, and representing a respective binary weighted portion of an accumulated result.
- These voltage signals/levels are scaled to reflect their respective binary weighting with respect to each other.
- the scaled voltage levels are then summed together to provide a voltage level representing the accumulated result, which is converted to a digital representation by an ADC converter.
- FIG. 4B contemplates scaling and summation of accumulated weighted portions of a computational result in conjunction with ADC processing. Specifically, the scaling and summation functions discussed with respect to FIG. 4A are implemented by modifying various parameters of the operation of the ADC, as will be described in more detail below.
- while FIGS. 4A-4B illustrate the case where four columns are combined before or within the ADC, in general any number of columns may be combined in this manner.
- scaling and summing before/within the ADC can be combined with scaling and summing across any number of outputs after the ADC; this involves applying a digital scaling coefficient (which reduces to bit-wise shifting for binary weighting) and summing in the digital domain.
- this enables quantization-error effects to be optimally managed.
- FIGS. 4C-4D graphically depict mechanisms for scaling and summation of computational result indicative outputs useful in illustrating the various embodiments.
- the discussion above with respect to FIGS. 4A-4B is generally applicable to FIGS. 4C-4D.
- FIGS. 4C-4D contemplate a scaling function wherein the LSB column (b+3) is multiplied by a scaling factor of 1/2^0, the next column (b+2) by a scaling factor of 1/2^1, the next column (b+1) by a scaling factor of 1/2^2, and the final column (b) by a scaling factor of 1/2^3.
- the scaled voltage levels are then summed together to provide a voltage level representing the accumulated result, which is converted to a digital representation by an ADC converter.
- FIG. 5 graphically illustrates an exemplary IMC column within an array of M-BCs.
- each of a column of M-BCs 300 (1 through N) performs a multiplication of an input (IA1/IAb1 through IAN/IAbN) by a weighted value (Wb,1 through Wb,N) to provide a respective result as an output voltage stored upon a respective output capacitor, which may be selectively coupled to the output column line CLb.
- FIG. 5 depicts the use of switched capacitors, whereby a column accumulation (reduction) operation is performed via charge redistribution across capacitors in a particular column.
- individual bit-cell capacitors form the legs of a signal divider circuit such as a voltage/charge divider circuit, causing the output voltage (i.e., node coupling all capacitors) to settle to the average across the voltage/charge divider inputs (i.e., driven side of the legs).
- V = Q/C
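The settling of the coupled node to the average of the driven voltages follows from charge conservation and V = Q/C. The sketch below is an idealized model: it assumes the coupled node and all driven sides are reset to 0 V, ignores parasitic capacitance, and uses the 1.2 fF capacitor value mentioned earlier only as an example; `settled_voltage` is a hypothetical name.

```python
def settled_voltage(drive_voltages, capacitances):
    """Voltage of the floating coupled node after the divider legs are driven.

    Assumes an ideal capacitive divider reset to 0 V with no parasitics, so the
    node moves by the capacitance-weighted average of the voltage steps applied
    to the driven sides of the legs.
    """
    total_capacitance = sum(capacitances)
    injected = sum(c * v for c, v in zip(capacitances, drive_voltages))
    return injected / total_capacitance


C_UNIT = 1.2e-15                      # 1.2 fF per bit cell (example value)
drives = [1.0, 0.0, 1.0, 1.0]         # driven-side voltages of four bit cells, in volts
v_out = settled_voltage(drives, [C_UNIT] * len(drives))
print(v_out)                          # close to the plain average of the drives
```

With equal capacitors the result reduces to the plain average of the driven voltages, which is the column accumulation (reduction) operation described above.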
- a capacitor-based analog scaling and summing may be achieved via several approaches as will be discussed below; illustratively, (1) setting and shorting of the column capacitances, and (2) sampling the column voltages on auxiliary capacitance, and then setting and shorting the auxiliary capacitances (where the auxiliary capacitance may be combined with the ADC sample-and-hold circuit).
- Capacitance-based IMC typically involves two phases: (1) resetting, where the charge on all capacitors is reset by shorting the coupled node of the capacitors to a particular reference voltage; (2) evaluation, where the coupled node of the capacitors is released from shorting to the reference voltage, and the input legs of the signal divider circuit such as a voltage/charge divider circuit are driven (through the bit cells). Following this, each column output voltage can be sampled by an ADC for subsequent digitization.
- an additional phase can be added, which is denoted herein as scaling.
- coupling across all the column capacitors can be broken, to yield a remaining capacitance of scaled amount across the columns to be shorted together.
- the shorted capacitance across the columns can be sampled by an ADC for digitization. This approach is depicted in FIG. 6 for the case of binary-weighted scaling, as an example.
- FIG. 6 graphically depicts an example of binary-weighted scaling within the bit-cell array of an in-memory computing architecture useful in understanding the embodiments.
- FIG. 6 depicts an illustrative array of bit cell output capacitors for eight IMC rows (R1 through R8) by four IMC columns (CLb through CLb+3) of multiplying bit-cells, each of the IMC columns being selectively coupled to an input of an ADC via a respective switch (Sb through Sb+3).
- the array further includes additional switches S at each of CLb+3 between rows R7 and R8, CLb+2 between rows R6 and R7, and CLb+1 between rows R4 and R5.
- the additional switches S are introduced into the columns at these locations to break/allow coupling of some of the column capacitors at different points in the columns.
- the additional switches S are closed, thereby enabling coupling of all capacitances in a column.
- the additional switches S are opened and the remaining column capacitances are shorted together by the column switches Sb through Sb+3 and the resulting signal provided to the ADC.
- column CLb with eight bit-cell capacitors is effectively weighted as twice that of column CLb+1 with four bit-cell capacitors, which is effectively weighted as twice that of column CLb+2 with two bit-cell capacitors, which is effectively weighted as twice that of column CLb+3 with one bit-cell capacitor.
- the resulting voltage signal applied to the ADC represents a scaled accumulated output signal and can be digitized directly by the ADC to provide the digital representation of the accumulated output signal.
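The effect of opening the additional switches and then shorting the columns together can be checked numerically. This sketch assumes identical, ideal bit-cell capacitors and no switch parasitics; the column voltages are hypothetical example values, and `shorted_voltage` is an illustrative name.

```python
def shorted_voltage(column_voltages, retained_caps):
    """Voltage seen by the ADC after the scaled columns are shorted together.

    column_voltages: voltage on each column line after the evaluation phase.
    retained_caps: number of unit capacitors left connected in each column
                   after the additional switches are opened.
    Charge conservation on the merged node gives a capacitance-weighted average.
    """
    total_units = sum(retained_caps)
    weighted = sum(n * v for n, v in zip(retained_caps, column_voltages))
    return weighted / total_units


# Columns CLb, CLb+1, CLb+2, CLb+3 retain 8, 4, 2 and 1 capacitors respectively.
retained = [8, 4, 2, 1]
column_voltages = [0.5, 0.25, 0.75, 1.0]   # example evaluation-phase voltages, in volts
v_adc = shorted_voltage(column_voltages, retained)
print(v_adc)
```

The shorted voltage equals (8·Vb + 4·Vb+1 + 2·Vb+2 + 1·Vb+3) / 15, so the binary weighting is applied in the analog domain, up to a common normalization factor, before a single ADC conversion.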
- parasitic offset switches SPO or other structures are added to the array to balance the total switch-related parasitic capacitances in the columns.
- the parasitic offset switches SPO or other structures may comprise functioning or non-functioning switches.
- a similar functioning or non-functioning (e.g., always closed) switch may be included in the substrate (e.g., VLSI substrate) used to form the bit-cell array.
- one or more of the other columns has formed into a corresponding location a parasitic offset switch SPO of similar structure such that column-to-column differences in capacitance are avoided.
- This technique may also be used with embodiments implementing weighting schemes other than binary weighting.
- the number and location of parasitic offset switches SPO may be modified according to fabrication technology and other factors; all that is relevant is that the parasitic offset switches SPO be formed in such a manner as to balance or offset the parasitic capacitances imparted to the circuitry by the additional switches S so as to avoid related scaling errors to the extent possible.
- the voltage of each set of column capacitors is first sampled via an auxiliary sampling capacitor within a signal divider circuit such as a voltage/charge divider circuit (i.e., a capacitor network configured for charge sharing/sampling), wherein the auxiliary sampling capacitor associated with a column has a value selected to produce a scaled output as appropriate to that column.
- the sampling capacitor may comprise an extra capacitor formed for each column, a sample-and-hold capacitor of the ADC itself (integrated within the ADC or separate from the ADC), or some other capacitor.
- signal associated with a particular column is sampled via the auxiliary capacitor of a charge divider circuit associated with that column, which capacitor may be selectively coupled to that column or divider circuit.
- Various embodiments contemplate the processing of signal associated with each column via a weighted input ADC; that is, an ADC with multiple inputs where each of those inputs may be weighted and the resulting weighted signals summed for ADC processing to provide thereby a digital output signal.
- FIGS. 7-9 depict circuit diagrams of various embodiments of binary-weighted scaling proximate or within an output ADC of an in-memory computing architecture. It is noted that while the embodiments of FIGS. 7-9 are generally depicted and described as processing voltage signals provided by charge stored across bit cell output capacitors such as described above, the embodiments may also be used to process other types of signals (e.g., voltage, current, etc.) such as previously discussed with respect to FIGS. 4A-4B.
- FIG. 7 depicts a circuit diagram useful in understanding various embodiments.
- the circuit 700 of FIG. 7 contemplates a plurality of capacitive circuits (e.g., four), each capacitive circuit operative to share a portion of charge stored across a respective plurality of bit cell output capacitors with respective sampling or auxiliary capacitor(s) to provide thereat a respective voltage output signal representative of a respective weighted portion of an accumulated result, wherein a voltage sampled across a sampling or auxiliary capacitor(s) is provided to the ADC for further processing.
- sampling results in scaling of the sampled voltage by a factor of CCOL/(CCOL+CAUX), where CCOL is the total column capacitance and CAUX is the auxiliary sampling capacitance. This makes it important to ensure that CCOL and CAUX are well matched across the columns and that CAUX be adequately discharged at the start, to alleviate errors. Then, CAUX is subsequently broken into binary-weighted components, so that the properly binary-weighted components are then shorted together for accurate scaling and summing.
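- The CCOL/(CCOL+CAUX) factor follows from charge conservation, and the same relation shows why residual charge on CAUX introduces error. The sketch below assumes ideal capacitors and switches; the function name and the numeric values are hypothetical.

```python
def sample_onto_aux(v_col, c_col, c_aux, v_aux_initial=0.0):
    """Voltage after the column capacitance shares charge with the auxiliary capacitor.

    Total charge before sharing: c_col * v_col + c_aux * v_aux_initial.
    After sharing, both capacitors sit at the same voltage.
    """
    return (c_col * v_col + c_aux * v_aux_initial) / (c_col + c_aux)


C_COL = 8.0  # hypothetical total column capacitance (arbitrary units)
C_AUX = 1.0  # hypothetical auxiliary sampling capacitance

# CAUX fully discharged: result equals v_col * CCOL / (CCOL + CAUX).
v_ideal = sample_onto_aux(0.9, C_COL, C_AUX)

# CAUX left with residual voltage from a previous cycle: result is offset.
v_residual = sample_onto_aux(0.9, C_COL, C_AUX, v_aux_initial=0.3)

print(v_ideal, v_residual)
```

With these assumed values the residual charge shifts the sampled voltage upward, which is the kind of error that discharging CAUX before sampling is meant to avoid.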
- the capacitance of the charge divider circuits is important where sharing capacitance is a mechanism of scaling (binary weighted or otherwise) in the case of a charge sharing event, such as sharing of charge stored across a plurality of bit-cell output capacitors with a corresponding capacitor voltage/charge divider circuit.
- load balancing capacitors are used (such as depicted in FIGS. 7-9) to ensure that each capacitor charge divider circuit has substantially the same capacitance.
- scaling is achieved by other means alone or in combination.
- scaling of each weighted-data-representative analog signal may be achieved via charge, voltage, current, or impedance scaling techniques depending upon the nature of the analog signal to be scaled (e.g., using weighted or binary weighted capacitor divider networks, resistor divider networks, and so on).
- charge divider circuits based on capacitive charge sharing or redistribution so as to scale charge-based or voltage-based weighted-data-representative analog signals.
- each of a plurality of weighted-data-representative analog signals (e.g., binary weighted by column) is scaled such that the analog signal contribution (charge, voltage, current, etc.) of a particular weighted-data-representative analog signal to the total or accumulated signal level of all the various weighted-data-representative analog signals is proportional to the weight of that data-representative analog signal (e.g., the weight associated with the column position of that data-representative analog signal).
- the scaling circuits may comprise resistive scaling or signal dividing components, transistor scaling or signal dividing components, or some other scaling or signal dividing components suitable for indicating respective weighting/scaling of charge levels or signals indicative of charge levels (e.g., voltage/charge divider circuit, charge sharing network, and the like).
- the load-balancing capacitors need not be used, since the settled signal does not depend upon capacitive loading.
- the sampled signal from column CLb is given twice the weight as that of column CLb+1, which is given twice the weight as that of column CLb+2, which is given twice the weight as that of column CLb+3.
- the various switches are controlled to cause the capacitance of the voltage/charge divider circuit for each column to be the same (i.e., C), but the sampling or auxiliary capacitor for each voltage/charge divider circuit is different.
- the sampling or auxiliary capacitor for column CLb is C (C/2 + C/2), for column CLb+1 is C/2, for column CLb+2 is C/4, and for column CLb+3 is C/8.
- each of the sampling or auxiliary capacitors represents the respective scaled portion of the accumulated result, and by connecting each of the sampling or auxiliary capacitors of the columns together and providing that signal to the ADC a digital representation of the accumulated result may be generated.
- the capacitance of the voltage/charge divider circuit for each column to be the same so that the error from a charge sharing event is equalized across the voltage/charge divider circuits so as to avoid any relative error between the voltage/charge divider circuits.
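- The FIG. 7 arrangement described above can be sketched in two steps: every column shares charge with a divider of the same total capacitance C, and only the binary-weighted auxiliary portions are then shorted together. This is an idealized model; the load-balancing values (C minus the auxiliary value) are inferred from the statement that each divider presents the same capacitance C, and C_COL and the function name are hypothetical.

```python
C = 1.0      # total divider capacitance per column (assumed unit value)
C_COL = 8.0  # hypothetical total column capacitance

# Auxiliary (sampling) capacitors for CLb, CLb+1, CLb+2, CLb+3.
AUX = [C, C / 2, C / 4, C / 8]
# Load-balancing capacitors that bring every divider up to the same total C.
BALANCE = [C - a for a in AUX]


def fig7_output(column_voltages):
    """Combined voltage presented to the ADC for the four column voltages."""
    # Step 1: each column shares charge with its (initially discharged) divider.
    sampled = [
        v * C_COL / (C_COL + a + b)
        for v, a, b in zip(column_voltages, AUX, BALANCE)
    ]
    # Step 2: only the auxiliary capacitors are shorted together.
    charge = sum(a * v for a, v in zip(AUX, sampled))
    return charge / sum(AUX)


# Contribution of each column alone, driven with the same unit voltage.
contributions = [
    fig7_output([1.0 if i == j else 0.0 for i in range(4)]) for j in range(4)
]
print(contributions)
```

Because every divider presents the same load, the attenuation from step 1 is identical across columns and cancels out of the ratios, leaving contributions in the proportion 8:4:2:1.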
- FIG. 8 depicts a circuit diagram useful in understanding various embodiments.
- FIGS. 8-9 depict the voltage/charge divider circuitry of FIG. 7, wherein the voltage sampled across all the sampling or auxiliary capacitors is combined during a charge sharing event (e.g., during a measurement or evaluation phase of operation) into the sample-and-hold (SH) of a successive-approximation-register (SAR) ADC, wherein the SH also serves as a feedback digital-to-analog converter (DAC).
- FIGS. 8-9 depict an 8-bit ADC receiving an accumulated input voltage associated with only four weighted input signals. If eight weighted input signals were processed by the 8-bit ADC, then each of the eight weighted input signals would be initially scaled by a respective divider circuit.
- each of the additional four (e.g., LSB) voltage/charge divider circuits would also be the same as the initial four (e.g., MSB) voltage/charge divider circuits, and the respective sampling or auxiliary capacitors would be scaled accordingly (e.g., C/16, C/32, C/64, and C/128 assuming four columns representing the next four LSB values of an accumulated result).
- the S/H is integrated into the ADC.
- the SAR ADC comprises a feedback circuit wherein a digital to analog converter (DAC) is adjusted via differing digital input signals provided by SAR logic to ultimately produce a DAC output voltage that corresponds to the analog input voltage provided to the ADC, thereby determining the digital word or bits representing the analog input voltage to the ADC.
- the analog input voltage is sampled at the bottom plate of each of the sampling capacitors of each voltage/charge divider circuit (i.e., capacitors denoted as C, C/2, C/4 and C/8).
- the voltage associated with the feedback code of the DAC is then successively applied to the other plate of the capacitors and, in doing so, causes a binary weighted signal to be produced thereat for comparison purposes (i.e., for determining the ADC output value).
- the circuit 800 of FIG. 8 contemplates that an ADC SH/DAC is partitioned into four segments, for taking inputs from four IMC columns, as an example. Each of the four segments has equal capacitance, to ensure the relative sampling error/scaling is not significant. Each of the four segments is then further divided into a portion that is processed by the ADC for digitization, and a portion that is not processed. The portion that is further processed corresponds to a binary-weighted capacitance across the columns. Each column output is sampled onto one side of each segment, and only the portions that are further processed are then subsequently coupled together on the other side. The remaining portion is left uncoupled (remaining shorted to a reference voltage) on the other side, and is subsequently discharged before future sampling.
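- The SAR feedback loop described above (SAR logic adjusting the DAC code until the DAC output corresponds to the input) can be illustrated with a purely behavioral model. The sketch below assumes an ideal comparator and DAC and does not model the charge-redistribution capacitor array of FIG. 8; the function name and reference voltage are hypothetical.

```python
def sar_convert(v_in, v_ref=1.0, bits=8):
    """Return the largest code whose ideal DAC voltage does not exceed v_in."""
    code = 0
    # Test one bit per cycle, from the most significant bit down.
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        v_dac = v_ref * trial / (1 << bits)
        # Comparator decision: keep the bit if the DAC has not overshot.
        if v_in >= v_dac:
            code = trial
    return code


print(sar_convert(0.5))   # mid-scale input
print(sar_convert(0.3))
```

Each cycle halves the remaining search interval, so an 8-bit conversion takes eight comparator decisions in this model.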
- FIG. 10 depicts circuit diagrams of various embodiments of binary-weighted current divider scaling circuitry suitable for use in the various embodiments. It can be seen that a weighted-data-representative analog signal from MSB column CLb is effectively weighted as twice that of column CLb+1, which is effectively weighted as twice that of column CLb+2, which is effectively weighted as twice that of column CLb+3.
- the above-described embodiments utilize approaches to scaling and summing before the ADC and have a primary benefit of the ADC being shared across summed columns within the context of in-memory computing embodiments. This allows the ADC energy and area consumption to be amortized.
- analog scaling summing is that the total dynamic range of the signal to be digitized by the ADC is increased.
- the ADC, which then performs quantization of the signal to a particular resolution, therefore introduces quantization error.
- the quantization error is mitigated somewhat relative to post-ADC scaling and summing, where each column output incurs quantization (i.e., the analog residue cannot be recovered after each column-output digitization, whereas pre-ADC scaling and summing incurs one quantization event); however, post-ADC scaling and summation has a net benefit on quantization error due to the low energy/area cost of digital bit-growth.
- the quantization error of pre-ADC scaling and summing can be reduced by increasing the ADC resolution, at the cost of ADC energy/area overhead.
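- Where quantization enters in the two approaches can be illustrated with an ideal uniform quantizer. This is only a sketch under stated assumptions: inputs normalized to [0, 1], binary weights normalized to sum to one, and no modeling of energy, area, or analog noise. All names and input values are hypothetical.

```python
def quantize(v, bits):
    """Ideal uniform quantizer on [0, 1] that rounds to the nearest level."""
    levels = (1 << bits) - 1
    return round(v * levels) / levels


# Normalized binary weights for four columns (8:4:2:1).
WEIGHTS = [8 / 15, 4 / 15, 2 / 15, 1 / 15]


def pre_adc(column_voltages, bits):
    """Scale and sum in analog, then quantize once with a shared ADC."""
    return quantize(sum(w * v for w, v in zip(WEIGHTS, column_voltages)), bits)


def post_adc(column_voltages, bits):
    """Quantize every column separately, then scale and sum digitally."""
    return sum(w * quantize(v, bits) for w, v in zip(WEIGHTS, column_voltages))


columns = [0.31, 0.62, 0.17, 0.94]  # hypothetical column voltages
exact = sum(w * v for w, v in zip(WEIGHTS, columns))
err_pre = abs(pre_adc(columns, 8) - exact)
err_post = abs(post_adc(columns, 8) - exact)
print(err_pre, err_post)
```

In this model the pre-ADC path incurs a single quantization event while the post-ADC path incurs one per column, with the digital sum carrying extra bits; raising `bits` in `pre_adc` shrinks its error bound at the cost the text attributes to higher ADC resolution.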
- the various embodiments, while primarily described within the context of binary-weighted scaling factors, are suitable for use in arbitrary analog scaling of column values. That is, while the disclosure primarily describes a structure where the columns feeding a shared ADC have binary-weighted scaling factors, it should be understood that arbitrary scaling factors could be used.
- scaling factors may be configurable. It is noted that a primary benefit of non-binary-weighted scaling factors is that alternate number formats (i.e., non-binary integers) may then be used for the matrix weights stored in the memory cells. This is valuable because quantized neural networks may exploit alternate number formats (e.g., where bit positions represent powers of 1.5, 4, etc., instead of 2) to optimize how weight dynamic-range tradeoffs are managed.
- equal scaling factors may be used to increase the total charge signal relative to a single column computation. In this manner, mitigating the impacts of different sources of charge noise may be achieved.
- configurability of the scaling factor enables the above two features on a dynamic basis, where for instance different in-memory computations scheduled during execution time may be thus optimized.
- Such configurability requires configurable capacitor setting across the columns, which could be achieved using capacitive digital-to-analog converters (DACs) coupled to the different column outputs, thus providing digital configuration control.
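- The scaling-factor options discussed above (binary, other radices such as 1.5 or 4, and equal weights) can be expressed with one small helper. This is an illustrative sketch only; the helper name and the normalization to a unit sum are assumptions, and it says nothing about how a capacitive DAC would realize the ratios.

```python
def column_weights(num_columns, radix=2.0):
    """Normalized column scaling factors, most significant column first.

    Column j (0 = most significant) gets a raw weight of radix ** (num_columns - 1 - j).
    """
    raw = [radix ** (num_columns - 1 - j) for j in range(num_columns)]
    total = sum(raw)
    return [r / total for r in raw]


binary = column_weights(4)            # ratios 8 : 4 : 2 : 1
radix_1p5 = column_weights(4, 1.5)    # ratios 3.375 : 2.25 : 1.5 : 1
equal = column_weights(4, 1.0)        # ratios 1 : 1 : 1 : 1

print(binary, radix_1p5, equal)
```

Setting the radix to 1 reproduces the equal-weight case, in which the columns simply add their charge to raise the total signal relative to a single column.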
- the various embodiments contemplate compensating capacitance mismatch across columns. Specifically, in cases where column scaling is determined by the relative ratio of capacitances across the columns, deviations in the relative ratios due to parasitic capacitances can lead to computation errors. This is overcome in various embodiments via any of a plurality of practical approaches, as discussed herein.
- the critical capacitances are matched through careful layout and parasitic-capacitance estimation.
- the layout features impacting the parasitic capacitances are matched within the array and array periphery, such as on a substrate or layer of a very large scale integrated (VLSI) circuit during fabrication.
- capacitance DACs may be coupled to each of the column outputs to enable trim-able capacitive loading that introduces linearly adjustable voltage attenuation, to compensate mismatches in the parasitic capacitances.
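- One way to picture the trim approach is to choose, per column, a capacitance-DAC code that tops up the parasitic load to a common target so that all columns see the same attenuation. The sketch below is a simplified model, assuming the attenuation of a column is CCOL/(CCOL + parasitic + trim); the function names, unit trim step, and parasitic values are hypothetical.

```python
def trim_codes(parasitics, c_unit):
    """Pick a trim code per column so that every column sees nearly the same extra load."""
    target = max(parasitics)
    return [round((target - p) / c_unit) for p in parasitics]


def attenuation(c_col, c_par, code, c_unit):
    """Voltage attenuation of a column loaded by its parasitic and trim capacitance."""
    return c_col / (c_col + c_par + code * c_unit)


C_COL = 8.0                            # hypothetical column capacitance
C_UNIT = 0.02                          # hypothetical trim step of the capacitance DAC
parasitics = [0.10, 0.14, 0.12, 0.18]  # hypothetical per-column parasitics

codes = trim_codes(parasitics, C_UNIT)
before = [attenuation(C_COL, p, 0, C_UNIT) for p in parasitics]
after = [attenuation(C_COL, p, k, C_UNIT) for p, k in zip(parasitics, codes)]
print(codes, max(before) - min(before), max(after) - min(after))
```

In this model the residual mismatch after trimming is bounded by half a trim step per column, so a finer unit capacitor gives tighter matching.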
- some of the various embodiments are directed to IMC computing architecture, apparatus, methods, and portions thereof configured to acquire the computational result indicative outputs of multiple parallel columns or bit lines in a manner avoiding the use of individual analog-to-digital converters (ADCs) for each column or bit line. That is, rather than converting the analog output signal associated with each bit line or column to a respective digital representation suitable for further processing within the IMC computing architecture, the various embodiments perform some of this further processing using the analog output signals associated with the bit lines or columns so as to reduce the number of ADCs needed to implement the functions of the IMC computing architecture while retaining analog output signal accuracy (i.e., reducing the impact of ADC quantization errors and other errors).
Landscapes
- Engineering & Computer Science (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Computer Hardware Design (AREA)
- Analogue/Digital Conversion (AREA)
- Static Random-Access Memory (AREA)
Priority Applications (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020257033541A KR20250153872A (en) | 2022-05-16 | 2022-05-16 | Shared column adcs for in-memory-computing macros |
| JP2024568086A JP2025519052A (en) | 2022-05-16 | 2022-05-16 | Shared Column ADC for In-Memory Computing Macros |
| PCT/US2022/029438 WO2023224596A1 (en) | 2022-05-16 | 2022-05-16 | Shared column adcs for in-memory-computing macros |
| EP22942895.8A EP4487329A4 (en) | 2022-05-16 | 2022-05-16 | Shared column ADCs for in-memory macros |
| CN202280096051.1A CN119585795A (en) | 2022-05-16 | 2022-05-16 | Shared column ADC for in-memory computation macros |
| KR1020247040610A KR20240175728A (en) | 2022-05-16 | 2022-05-16 | Shared columns for in-memory computing macros |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2022/029438 WO2023224596A1 (en) | 2022-05-16 | 2022-05-16 | Shared column adcs for in-memory-computing macros |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023224596A1 (en) | 2023-11-23 |
Family
ID=88835679
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2022/029438 Ceased WO2023224596A1 (en) | 2022-05-16 | 2022-05-16 | Shared column adcs for in-memory-computing macros |
Country Status (5)
| Country | Link |
|---|---|
| EP (1) | EP4487329A4 (en) |
| JP (1) | JP2025519052A (en) |
| KR (2) | KR20240175728A (en) |
| CN (1) | CN119585795A (en) |
| WO (1) | WO2023224596A1 (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190042199A1 (en) * | 2018-09-28 | 2019-02-07 | Intel Corporation | Compute in memory circuits with multi-vdd arrays and/or analog multipliers |
| US20190080231A1 (en) * | 2017-09-08 | 2019-03-14 | Analog Devices, Inc. | Analog switched-capacitor neural network |
| US20210158854A1 (en) * | 2019-11-27 | 2021-05-27 | Taiwan Semiconductor Manufacturing Company, Ltd. | Compute in memory system |
| US20210240442A1 (en) * | 2020-01-31 | 2021-08-05 | Qualcomm Incorporated | Low-power compute-in-memory bitcell |
| US20210271597A1 (en) * | 2018-06-18 | 2021-09-02 | The Trustees Of Princeton University | Configurable in memory computing engine, platform, bit cells and layouts therefore |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9697877B2 (en) * | 2015-02-05 | 2017-07-04 | The Board Of Trustees Of The University Of Illinois | Compute memory |
| JP7587823B2 (en) * | 2018-06-18 | 2024-11-21 | ザ、トラスティーズ オブ プリンストン ユニバーシティ | Configurable in-memory computing engine, platform, bit cell, and layout therefor |
| EP4091048A4 (en) * | 2020-02-05 | 2024-05-22 | The Trustees of Princeton University | Scalable array architecture for in-memory computing |
| US11586896B2 (en) * | 2020-03-02 | 2023-02-21 | Infineon Technologies LLC | In-memory computing architecture and methods for performing MAC operations |
-
2022
- 2022-05-16 KR KR1020247040610A patent/KR20240175728A/en active Pending
- 2022-05-16 EP EP22942895.8A patent/EP4487329A4/en active Pending
- 2022-05-16 WO PCT/US2022/029438 patent/WO2023224596A1/en not_active Ceased
- 2022-05-16 CN CN202280096051.1A patent/CN119585795A/en active Pending
- 2022-05-16 JP JP2024568086A patent/JP2025519052A/en active Pending
- 2022-05-16 KR KR1020257033541A patent/KR20250153872A/en active Pending
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190080231A1 (en) * | 2017-09-08 | 2019-03-14 | Analog Devices, Inc. | Analog switched-capacitor neural network |
| US20210271597A1 (en) * | 2018-06-18 | 2021-09-02 | The Trustees Of Princeton University | Configurable in memory computing engine, platform, bit cells and layouts therefore |
| US20190042199A1 (en) * | 2018-09-28 | 2019-02-07 | Intel Corporation | Compute in memory circuits with multi-vdd arrays and/or analog multipliers |
| US20210158854A1 (en) * | 2019-11-27 | 2021-05-27 | Taiwan Semiconductor Manufacturing Company, Ltd. | Compute in memory system |
| US20210240442A1 (en) * | 2020-01-31 | 2021-08-05 | Qualcomm Incorporated | Low-power compute-in-memory bitcell |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4487329A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20250153872A (en) | 2025-10-27 |
| JP2025519052A (en) | 2025-06-24 |
| EP4487329A4 (en) | 2025-11-12 |
| CN119585795A (en) | 2025-03-07 |
| EP4487329A1 (en) | 2025-01-08 |
| KR20240175728A (en) | 2024-12-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11714749B2 (en) | Efficient reset and evaluation operation of multiplying bit-cells for in-memory computing | |
| Lee et al. | Fully row/column-parallel in-memory computing SRAM macro employing capacitor-based mixed-signal computation with 5-b inputs | |
| Pelgrom | Analog-to-digital conversion | |
| Hsieh et al. | 7.6 A 70.85-86.27 TOPS/W PVT-insensitive 8b word-wise ACIM with post-processing relaxation | |
| KR101689053B1 (en) | A/d converter | |
| TWI470938B (en) | Capacitive voltage divider | |
| Khaddam-Aljameh et al. | An SRAM-based multibit in-memory matrix-vector multiplier with a precision that scales linearly in area, time, and power | |
| US20120081243A1 (en) | Digital-to-analog converter, analog-to-digital converter including same, and semiconductor device | |
| CN110086468A (en) | A kind of weight calibration method of nonbinary gradual approaching A/D converter | |
| CN115080501A (en) | SRAM (static random Access memory) storage integrated chip based on local capacitance charge sharing | |
| EP4035032A1 (en) | Successive bit-ordered binary-weighted multiplier-accumulator | |
| US20230370082A1 (en) | Shared column adcs for in-memory-computing macros | |
| Jo et al. | DenseCIM: Binary weighted-capacitor SRAM computation-in-memory with column-by-column dynamic range calibration SAR ADC | |
| CN113922819A (en) | One-step two-bit successive approximation type analog-to-digital converter based on background calibration | |
| Kung et al. | A low energy consumption 10-bit 100kS/s SAR ADC with timing control adaptive window | |
| US11032501B2 (en) | Low noise image sensor system with reduced fixed pattern noise | |
| Jun et al. | IC Design of 2Ms/s 10-bit SAR ADC with Low Power | |
| EP4487329A1 (en) | Shared column adcs for in-memory-computing macros | |
| Shin et al. | A charge-domain computation-in-memory macro with versatile all-around-wire-capacitor for variable-precision computation and array-embedded DA/AD conversions | |
| CN110535467B (en) | Capacitor array calibration method and device of stepwise approximation type analog-to-digital conversion device | |
| Rasul et al. | A 128x128 SRAM macro with embedded matrix-vector multiplication exploiting passive gain via MOS capacitor for machine learning application | |
| KR20250161664A (en) | Shared column adcs for in-memory-computing macros | |
| CN114696834B (en) | Successive approximation analog-to-digital converter, test equipment and capacitance weight value calibration method | |
| Wei et al. | A 28nm Static-Power-Free Fully-Parallel RRAM-Based TD CIM Macro With 1982TOPS/W/bit for Edge Applications | |
| US20240211536A1 (en) | Embedded matrix-vector multiplication exploiting passive gain via mosfet capacitor for machine learning application |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22942895; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022942895; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2022942895; Country of ref document: EP; Effective date: 20241004 |
| | WWE | Wipo information: entry into national phase | Ref document number: 202280096051.1; Country of ref document: CN |
| | WWE | Wipo information: entry into national phase | Ref document number: 2024568086; Country of ref document: JP |
| | ENP | Entry into the national phase | Ref document number: 20247040610; Country of ref document: KR; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 1020247040610; Country of ref document: KR |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | Wipo information: published in national office | Ref document number: 202280096051.1; Country of ref document: CN |
| | WWD | Wipo information: divisional of initial pct application | Ref document number: 1020257033541; Country of ref document: KR |
| | WWP | Wipo information: published in national office | Ref document number: 1020257033541; Country of ref document: KR |