US20180341623A1 - Matrix circuits - Google Patents
- Publication number
- US20180341623A1 (application US16/052,516)
- Authority
- US
- United States
- Prior art keywords
- matrix
- values
- memory array
- coupled
- circuit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/24—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using capacitors
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/401—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
- G11C11/4063—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing
- G11C11/407—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing for memory cells of the field-effect type
- G11C11/408—Address circuits
- G11C11/4087—Address decoders, e.g. bit - or word line decoders; Multiple line decoders
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/401—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
- G11C11/4063—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing
- G11C11/407—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing for memory cells of the field-effect type
- G11C11/409—Read-write [R-W] circuits
- G11C11/4093—Input/output [I/O] data interface arrangements, e.g. data buffers
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/401—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
- G11C11/4063—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing
- G11C11/407—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing for memory cells of the field-effect type
- G11C11/409—Read-write [R-W] circuits
- G11C11/4096—Input/output [I/O] data management or control circuits, e.g. reading or writing circuits, I/O drivers or bit-line switches
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C13/00—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
- G11C13/0002—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
- G11C13/0021—Auxiliary circuits
- G11C13/0023—Address circuits or decoders
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C13/00—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
- G11C13/0002—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
- G11C13/0021—Auxiliary circuits
- G11C13/003—Cell access
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C13/00—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
- G11C13/0002—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
- G11C13/0021—Auxiliary circuits
- G11C13/004—Reading or sensing circuits or methods
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C13/00—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
- G11C13/0002—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
- G11C13/0021—Auxiliary circuits
- G11C13/0069—Writing or programming circuits or methods
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
- G11C7/1006—Data managing, e.g. manipulating data before writing or reading out, data bus switches or control circuits therefor
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C8/00—Arrangements for selecting an address in a digital store
- G11C8/16—Multiple access memory array, e.g. addressing one storage element via at least two independent addressing line groups
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C2213/00—Indexing scheme relating to G11C13/00 for features not covered by this group
- G11C2213/70—Resistive array aspects
- G11C2213/74—Array wherein each memory cell has more than one access device
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C2213/00—Indexing scheme relating to G11C13/00 for features not covered by this group
- G11C2213/70—Resistive array aspects
- G11C2213/79—Array wherein the access device being a transistor
Definitions
- Matrices are arrays of elements of any suitable size, and these elements may represent data, relationships between data values, transformations to be applied to data, and more. Owing to their flexibility and utility, matrices are used in a wide range of real-world applications. In computing examples, matrices are used to store and manipulate sensor data for controlling automated manufacturing, scientific data for research and modeling, demographic data, other statistical data, and other data types. Matrices are also used extensively in computer graphics. For example, a bitmap is a specific type of matrix in which each entry is a pixel value. As a further example, a digital filter for image processing may be represented as a matrix in which each entry is a transformation to be applied to a portion of an image or frame. A wide body of algebraic operations has been developed to manipulate and analyze matrices and their contents, and because matrices are utilized with such frequency, computing systems may include dedicated hardware for handling them and performing these operations.
- FIGS. 1A-1D are block diagrams of a circuit at various points in time according to some examples of the present disclosure.
- FIG. 2 is a schematic diagram of a memory array according to some examples of the present disclosure.
- FIG. 3 is a schematic diagram of a resistive memory array according to some examples of the present disclosure.
- FIG. 4 is a flow diagram of a method of performing a matrix operation according to some examples of the present disclosure.
- FIG. 5 is a flow diagram of a method of performing a matrix multiplication operation according to some examples of the present disclosure.
- FIG. 6 is a block diagram of a matrix processor circuit according to some examples of the present disclosure.
- FIG. 7 is a block diagram of a processing circuit according to some examples of the present disclosure.
- FIG. 8 is a block diagram of a matrix processor circuit according to some examples of the present disclosure.
- a processor may include dedicated circuitry for operating on matrices.
- a matrix processor includes a memory array to store a matrix of multi-bit elements and circuitry to multiply the matrix by a vector.
- the memory array may be configured to read a subset of the bits that make up a row's elements, where the values in the subset have a common place value (e.g., 2^0, 2^1, 2^2, etc.).
- a first subset may include the least significant bit of each element in the row
- a second subset may include the next least significant bit of each element in the row, and so on.
- Each subset may be provided to a multiplier that multiplies the subset's values by a corresponding value from a subset of a vector.
- the resultant products of this multiplication are summed by a summing unit to produce a dot product of the matrix subset and the vector subset.
- the sum or dot product is provided to a shifting unit where it is shifted according to the place value of the matrix subset and/or the place value of the vector subset and added to a running total.
- the matrix processor may repeat the process for each subset of each row of the matrix and/or each subset of the vector such that the running totals represent the matrix-vector product.
- the sets of vector values are part of a second matrix, and the process is repeated to multiply the first matrix by the second matrix.
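The read-multiply-sum-shift-accumulate loop described above can be sketched in software. The following is a hypothetical model of the scheme (function and variable names are illustrative, not taken from the patent), not the hardware itself:

```python
def bitserial_matvec(matrix, vector, bits):
    """Software model of the bit-serial matrix-vector scheme.

    For each place value i of the matrix and j of the vector:
    AND the two bit subsets element by element, sum the products
    (their Hamming weight), shift by i + j, and accumulate into
    a running total for each matrix row.
    """
    totals = [0] * len(matrix)
    for i in range(bits):                # matrix-subset place value 2**i
        for j in range(bits):            # vector-subset place value 2**j
            for r, row in enumerate(matrix):
                # element-by-element multiply of single-bit values (an AND)
                products = [((row[c] >> i) & 1) & ((vector[c] >> j) & 1)
                            for c in range(len(vector))]
                # dot product of the subsets = count of ones
                totals[r] += sum(products) << (i + j)
    return totals
```

For example, with 2-bit elements, `bitserial_matvec([[3, 2], [1, 3]], [2, 1], 2)` returns `[8, 5]`, matching an ordinary matrix-vector product.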
- the present disclosure provides substantial real-world improvements to the operation and efficiency of a matrix processor.
- the memory array is structured to store matrices of variable sizes and bit-lengths. This provides additional flexibility without changing the underlying structure of the matrix processor.
- the memory array is structured to support multiple reads between refreshes of the array's memory cells. This may reduce the overhead of maintaining data in the memory array.
- the matrix processor is structured to read and process the matrix a column at a time to parallelize a significant portion of the processing.
- the matrix processor is structured to read multiple columns at a time to further parallelize the matrix operations.
- FIGS. 1A-1D are block diagrams of a circuit 100 at various points in time according to some examples of the present disclosure.
- the figures illustrate an example matrix operation performed by the circuit 100 that includes multiplying a matrix 102 by a vector 104 to produce a result vector 106 , and the figures include graphical representations of the matrix 102 , vector 104 , and result vector 106 for clarity of explanation.
- the matrix 102 includes elements A-H, each of which has two bits indicated by subscripts, although suitable matrices may be of any size and have any bit-length.
- the example vector 104 includes elements I-L, each of which has two bits indicated by subscripts and may also be of any size and bit-length.
- the result vector 106 includes elements M-N, each of which has six bits as determined by the matrix 102 and the vector 104 .
- the matrix 102 , the vector 104 , and the result vector 106 may be of any suitable size.
- the circuit 100 that performs the operation includes a memory array 108 communicatively coupled to a matrix processing unit 110 that includes hardware, software, and/or combinations thereof to perform matrix operations and to perform other processing tasks.
- the matrix processing unit 110 includes a multiplier 112 , a summing unit 114 , a shifting unit 116 , and a result store 118 , each of which is described in more detail below.
- the array 108 includes a plurality of memory cells 120 to store a matrix.
- Each memory cell 120 may include any suitable programmable element, and in various examples, the memory cells 120 include Dynamic Random Access Memory (DRAM) elements, Static RAM (SRAM) elements, memristive elements, non-volatile memory elements, flash and/or other solid-state memory elements, magnetic memory elements, etc.
- the memory array 108 may include any number of memory cells 120 arranged to store a matrix of any size, and the number of memory cells 120 may exceed the size and number of bits of the matrix. For clarity, the matrix values are illustrated within their respective memory cell 120 .
- the matrix elements may be allocated to the memory cells 120 such that when a column of memory cells 120 is read, the values in the column correspond to a row of the matrix (e.g., matrix row A-D) and have the same place value (e.g., 2^1).
- the memory array 108 includes a set of ports 121 to provide the matrix elements on a set of data lines 122 when the memory cells 120 are read.
- the memory array 108 is structured so that when a column of memory cells 120 is read, the corresponding values on the data lines 122 correspond to a row of the matrix and share a common place value. For example, referring to FIG. 1A, a first set of values containing elements A1, B1, C1, and D1 is present on the data lines 122 during a first read.
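The allocation described above can be modeled in software: each bit plane of each matrix row occupies one column of cells, so a single column read yields one row of the matrix at one common place value. A minimal sketch (the column ordering is an assumption made for illustration):

```python
def layout_cells(matrix, bits):
    """Arrange a matrix into cell columns so that reading one column
    yields one matrix row at one common place value.

    Most significant place comes first, matching FIG. 1A, where the
    2^1 bits of row A-D are read before the 2^0 bits.
    """
    columns = []
    for row in matrix:
        for place in range(bits - 1, -1, -1):
            # one column per (row, place value) pair
            columns.append([(v >> place) & 1 for v in row])
    return columns
```

For a one-row matrix `[[3, 1]]` with 2-bit elements, this yields two columns: `[1, 0]` (the 2^1 bits) and `[1, 1]` (the 2^0 bits).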
- the multiplier 112 performs an element-by-element multiplication of the matrix sets received from the data lines 122 by the vector sets.
- the multiplier 112 includes a plurality of AND devices 124 to multiply each value in a given matrix set by a corresponding value in a given vector set. In this way, the multiplier 112 produces a third set of values.
- the multiplier 112 multiplies the first set of values containing elements A1, B1, C1, and D1 by a second set of values that contains elements I1, J1, K1, and L1 to produce the third set of values.
- the summing unit 114 includes any combination of hard-coded logic, programmable logic, and/or software to receive the third set of values and to sum the values therein.
- the summing unit 114 includes a Hamming weight engine, which counts the number of non-zero values in the third set in order to sum the binary values.
- the Hamming weight engine may be implemented as a tree of adders 126 or other suitable structure.
- the summing unit 114 produces a sum that represents the dot product of the sets. Referring to the example of FIG. 1A, the sum represents the dot product of the first set of values containing elements A1, B1, C1, and D1 and the second set of values containing elements I1, J1, K1, and L1.
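Because each AND product is either 0 or 1, summing the third set of values reduces to counting ones, which is why a Hamming weight engine suffices as the summing unit. A minimal software check (names are illustrative):

```python
def hamming_weight_dot(bits_a, bits_b):
    """Dot product of two single-bit vectors via AND plus population count.

    Models the AND devices 124 feeding the tree of adders 126: each
    product is 0 or 1, so summing the products is just counting ones.
    """
    products = [a & b for a, b in zip(bits_a, bits_b)]
    return sum(products)  # Hamming weight of the product vector
```

For instance, `hamming_weight_dot([1, 0, 1, 1], [1, 1, 0, 1])` counts two positions where both bits are 1, which equals the dot product of the two bit vectors.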
- the summing unit 114 provides the sum to the shifting unit 116 , which may shift the sum one or more positions based on the place value of the set of matrix values and/or the place value of the set of vector values.
- the shifting unit 116 shifts the sum once based on the place value of the first set of values (2^1) and once again based on the place value of the second set of values (2^1).
- the shifting unit 116 may then add the shifted sum to a running total.
- the running total is updated as each set of the matrix 102 is multiplied by each set of the vector 104.
- the shifting unit 116 includes any suitable combination of hard-coded logic, programmable logic, and/or software to shift the received sum and to maintain the running total.
- the processes may be repeated for each set of matrix values in the row and each set of vector values in the vector.
- the memory array 108 provides a set of matrix values containing elements A0, B0, C0, and D0 on the data lines 122.
- the multiplier 112 receives the set of matrix values and multiplies it by the set of vector values containing elements I1, J1, K1, and L1.
- the summing unit 114 sums the elements of the resultant set, and the shifting unit 116 shifts the sum once based on the place value of the set of vector values (2^1) before adding the shifted sum to the running total.
- the memory array 108 provides the set of matrix values containing elements A1, B1, C1, and D1 on the data lines 122 again.
- the memory array 108 is configured to read and provide the set multiple times between refreshes of the memory cells 120 that contain the set. This may reduce the overhead of maintaining data in the memory array 108 .
- the multiplier 112 multiplies this set of matrix values by a set of vector values containing elements I0, J0, K0, and L0.
- the summing unit 114 sums the elements of the resultant set, and the shifting unit 116 shifts the sum once based on the place value of the set of matrix values (2^1) before adding the shifted sum to the running total.
- the memory array 108 provides the set of matrix values containing elements A0, B0, C0, and D0 on the data lines 122 again.
- the multiplier 112 multiplies this set of matrix values by the set of vector values containing elements I0, J0, K0, and L0.
- the summing unit 114 sums the elements of the resultant set, and the shifting unit 116 adds the unshifted sum to the running total based on the place value of the set of matrix values and the set of vector values.
- the running total at this point in time represents an element (e.g., element M) of a vector result of the matrix-vector operation, and the running total is provided to a result store 118 for storing.
- the result store 118 may include any suitable programmable element for storing the results. After storing the running total in the result store 118 , the running total may be reset.
- the processes may be repeated for each row in the matrix 102 to determine the remaining element(s) in the result vector 106 .
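The four passes of FIGS. 1A-1D can be traced with concrete numbers. Assuming hypothetical two-bit values A=3, B=1, C=2, D=2 for the first matrix row and I=1, J=2, K=3, L=0 for the vector (these values are illustrative, not from the patent figures), the running total after the four passes equals the dot product of the row and the vector:

```python
row = [3, 1, 2, 2]   # elements A, B, C, D (two bits each)
vec = [1, 2, 3, 0]   # elements I, J, K, L (two bits each)

def bit_plane(values, place):
    """Extract the bit with the given place value from each element."""
    return [(v >> place) & 1 for v in values]

running_total = 0
# The four passes of FIGS. 1A-1D as (matrix place, vector place):
# 1A shifts twice, 1B and 1C shift once, 1D does not shift.
for m_place, v_place in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    products = [a & b for a, b in zip(bit_plane(row, m_place),
                                      bit_plane(vec, v_place))]
    running_total += sum(products) << (m_place + v_place)

# The running total is element M: the dot product of row and vector.
assert running_total == sum(a * b for a, b in zip(row, vec))
```

With these values the running total accumulates 4 + 2 + 4 + 1 = 11, which matches 3·1 + 1·2 + 2·3 + 2·0.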
- the input vector is a subset of a second matrix.
- the circuit 100 repeats the processes for each row in the first matrix and each column in the second matrix.
- FIG. 2 is a schematic diagram of a memory array 200 according to some examples of the present disclosure.
- the memory array 200 is suitable for use as the memory array 108 of FIG. 1 and includes memory cells 201, each of which may be substantially similar to those described above. While the illustrated memory array 200 includes a limited number of memory cells 201, in other examples the memory array 200 may include any number of memory cells 201.
- Each memory cell 201 includes a programmable element such as a capacitor 202 .
- the capacitor 202 may store a charge, and a magnitude or presence of the charge may represent a binary value of a corresponding element of a matrix 102 .
- each memory cell 201 may be coupled to a plurality of control lines within the memory array 200 .
- the memory array 200 may include a set of row write enable lines 204 and a set of column write lines 206 .
- Each memory cell 201 may include an access transistor 208 with a source and drain coupled between the capacitor 202 and a corresponding column write line 206 .
- the gate of the access transistor 208 is coupled to a corresponding row write enable line 204 .
- a write controller 210 of the memory array 200 may program the memory cells 201 a row at a time by enabling the row write enable line 204 for the row and by applying voltages on the column write lines 206 corresponding to the values to be stored in the memory cells 201 .
- control lines include a set of column read enable lines 212 and a set of row read lines 214 .
- Each memory cell 201 may include a pass transistor 216 with a source and drain coupled between a corresponding column read enable line 212 and a corresponding row read line 214 .
- the gate of the pass transistor 216 is coupled to the capacitor 202 such that the mode of operation of the pass transistor 216 is determined by the magnitude of the charge stored in the capacitor 202 .
- a read controller 218 of the memory array 200 may read the memory cells 201 a column at a time by providing a voltage on the corresponding column read enable line 212 and detecting voltage and/or current on the set of row read lines 214 .
- the read controller 218 may amplify, smooth, and/or otherwise shape the signals on the row read lines 214 before providing them at ports 220 of the memory array 200 .
- the ports 220 produce signals on the data lines 222 that represent the values in the corresponding memory cells 201 at voltages and forms suitable for use in driving other circuitry.
- Because the capacitor 202 drives the gate of the pass transistor 216, the capacitor's charge is not significantly dissipated by the reading process, and multiple reads of a memory cell 201 may be performed without refreshing the capacitor 202 charge.
- FIG. 3 is a schematic diagram of a resistive memory array 300 according to some examples of the present disclosure.
- the memory array 300 is suitable for use as the memory array 108 of FIG. 1 and includes any number of memory cells 301 , each of which may be substantially similar to those described above.
- Each memory cell 301 includes a resistive memory device 302 , such as a memristor.
- a resistive memory device 302 may have more than one resistance/conductance state, and the state of the device may be used to store data.
- the conductive state of a resistive memory device represents a binary value of a corresponding element of the matrix 102 .
- the conductive state of the resistive memory device 302 may be set by applying a voltage that exceeds a write threshold to the resistive memory device 302 . Accordingly, the resistive memory device 302 may be coupled to a row read/write line 304 and coupled by an access transistor 306 to a column write line 308 . The gate of the access transistor 306 may be coupled to a column write enable line 310 .
- a write controller 312 of the memory array 300 may program the state of the resistive memory devices 302 a row at a time by applying a voltage differential across the row read/write line 304 and each of the column write lines 308 that exceeds the write threshold of the resistive memory devices 302 . The write controller 312 selects which cells 301 in the row to write by selectively enabling the column write enable lines 310 to activate the respective access transistors 306 .
- the state of a resistive memory device 302 may be read by applying a voltage less than the write threshold to the resistive memory device 302 and detecting the current induced by the voltage.
- the resistive memory device 302 may be coupled to the row read/write line 304 .
- the resistive memory device 302 may be coupled to a row read line 314 by a pass transistor 316 with a gate of the pass transistor 316 coupled to a column read enable line 318 .
- a read controller 320 of the memory array 300 may read the memory cells 301 a column at a time by enabling the column read enable lines 318, applying a voltage differential across each of the row read/write lines 304 and each of the row read lines 314, and detecting voltage and/or current on the set of row read/write lines 304 and/or the set of row read lines 314 due to the applied voltage differential.
- the read controller 320 may amplify, smooth, shape, and/or otherwise translate the detected voltage or current to produce signals at the ports 322 and on the data lines 324 that represent the values in the corresponding memory cells 301 at voltages and forms suitable for use in driving other circuitry.
- FIG. 4 is a flow diagram of a method 400 of performing the matrix operation according to some examples of the present disclosure.
- the description of the method 400 is non-limiting, and steps may be added to and omitted from the method 400 without departing from the disclosure.
- processes of the method 400 may be performed in any order including being performed concurrently by one or more entities.
- a first set of values of a matrix are read from a memory array 108 .
- the first set of values of the matrix are multiplied by a second set of values of a vector to provide a third set of values.
- the elements of the third set of values are summed to produce a sum.
- the sum is shifted based on a place value of the first set of values or a place value of the second set of values, and referring to block 410 , the shifted sum is added to a running total.
- FIG. 5 is a flow diagram of a method 500 of performing a matrix multiplication operation according to some examples of the present disclosure.
- the description of the method 500 is non-limiting, and steps may be added to and omitted from the method 500 without departing from the disclosure. Furthermore, unless noted otherwise, processes of the method 500 may be performed in any order including being performed concurrently by one or more entities.
- the method 500 is suitable for performing using the circuit 100 of FIGS. 1A-1D, 2, and 3 and/or any other suitable circuit or device.
- a read controller of a memory array 108 reads a column of the array's memory cells 120 to produce a first set of values associated with a row of a matrix. Each value may be a bit of a respective element in the row of the matrix and each value in the set shares a common place value.
- the read controller provides the first set of values on a set of data lines 122 .
- a multiplier 112 receives the first set of values and multiplies each value of the first set of values by a corresponding value of a second set of values.
- the second set of values may correspond to a column of a vector such that each value is a bit of a respective element in the column and each value in the second set shares a common place value.
- the multiplier 112 produces a third set of values.
- a summing unit 114 sums each element of the third set of values to produce a sum.
- the sum may represent the dot product of the first set of values and the second set of values.
- a shifting unit 116 shifts the sum based on a place value of the first set of values and/or a place value of the second set of values.
- the shifting unit 116 adds the shifted sum to a running total.
- the processes of blocks 502 - 516 are repeated for each row in the matrix. Similar to block 514 , the number of times that the processes are repeated may depend on the number of rows in the matrix, and the method 500 may be adapted to a matrix of any suitable size by merely adjusting the number of iterations. Accordingly, the memory array 108 and matrix processing unit 110 may support a range of suitable sizes without modifications to the underlying hardware.
- the result store 118 may contain a result vector that is the result of the original matrix multiplied by the vector.
- the vector is part of another matrix (a second matrix).
- the processes of blocks 502 - 518 are repeated for each column vector in the second matrix.
- the result store 118 may contain a third matrix that is the result of the original matrix multiplied by the second matrix.
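The extension of blocks 502-520 to matrix-matrix multiplication can be sketched by treating each column of the second matrix as the vector and filling the corresponding column of the third matrix. This is a hypothetical software model (names are illustrative), not the hardware itself:

```python
def bitserial_matmul(a, b, bits):
    """Multiply matrix a by matrix b by repeating the bit-serial
    matrix-vector product (blocks 502-518) for each column vector
    of the second matrix, as in block 520.
    """
    n, m, p = len(a), len(b), len(b[0])
    result = [[0] * p for _ in range(n)]
    for col in range(p):                 # each column of the second matrix
        vector = [b[r][col] for r in range(m)]
        for i in range(bits):            # matrix-set place value 2**i
            for j in range(bits):        # vector-set place value 2**j
                for r in range(n):
                    # AND the bit subsets and count ones (dot product)
                    dot = sum(((a[r][c] >> i) & 1) & ((vector[c] >> j) & 1)
                              for c in range(m))
                    # shift by the combined place value and accumulate
                    result[r][col] += dot << (i + j)
    return result
```

For 2-bit elements, `bitserial_matmul([[1, 2], [3, 1]], [[2, 0], [1, 3]], 2)` returns `[[4, 6], [7, 3]]`, the ordinary matrix product.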
- FIG. 6 is a block diagram of a matrix processor circuit 600 according to some examples of the present disclosure.
- the matrix processor circuit 600 includes a memory array 108 , a multiplier 112 , a summing unit 114 , and a shifting unit 116 , each of which may be similar to those described above in many aspects.
- the memory array 108 may include a plurality of memory cells 120 to store elements of a matrix.
- Each memory cell 120 may include any suitable programmable element, such as the capacitive memory cell 201 of FIG. 2 and/or the resistive memory cell 301 of FIG. 3 .
- the memory array 108 may include any number of memory cells 120 arranged to store a matrix of any size, and the number of memory cells 120 may exceed the size and number of bits of the matrix.
- the memory array 108 includes a plurality of data lines 122 coupled to the memory cells 120 to provide values stored in the memory cells 120. This may be performed substantially as described in block 402 of FIG. 4 and/or blocks 502 - 504 of FIG. 5 .
- the data lines 122 provide a first set of values of the matrix.
- the first set of values may correspond to a row of the matrix, and each value in the first set may be a bit of a corresponding element in the row.
- the values in the first set have a common place value.
- the multiplier 112 is coupled to the data lines 122 and thereby receives the values from the memory array 108 .
- the multiplier 112 may include any combination of hardware and/or software to multiply each value of the first set of values by a corresponding value within a second set of values to produce a third set of values.
- the second set of values may correspond to a column of a vector, and each value in the second set may be a bit of a corresponding element in the column.
- the values in the second set have a common place value. This may be performed substantially as described in block 404 of FIG. 4 and/or block 506 of FIG. 5 .
- The summing unit 114 receives the third set of values from the multiplier 112 and includes any combination of hardware and/or software to sum the third set of values to produce a sum.
- The sum may represent the dot product of the first set of values and the second set of values. This may be performed substantially as described in block 406 of FIG. 4 and/or block 508 of FIG. 5 .
- The shifting unit 116 receives the sum from the summing unit 114 and includes any combination of hardware and/or software to shift the sum.
- The shifting unit 116 shifts the sum based on a place value of the first set of values and/or a place value of the second set of values. This may be performed substantially as described in block 408 of FIG. 4 and/or block 510 of FIG. 5 .
- The shifting unit 116 may then add the shifted sum to a running total. This may be performed substantially as described in block 410 of FIG. 4 and/or block 512 of FIG. 5 .
- The matrix processor circuit 600 may repeat these processes for each set in the row of the matrix and each set in the column of the vector as described in block 514 of FIG. 5 .
- The running total may be stored in a result store 118 as described in block 516 of FIG. 5 .
- The matrix processor circuit 600 may repeat these processes for each row in the matrix as described in block 518 of FIG. 5 .
- In some examples, the vector is part of a second matrix. In some such examples, the matrix processor circuit 600 repeats these processes for each column in the second matrix as described in block 520 of FIG. 5 .
- The processing circuit 700 includes a memory array 108 coupled to a matrix processing unit 110 , each of which may be similar to those described above in many aspects.
- The array 108 may store a matrix of multi-bit values.
- The matrix may be subdivided into a plurality of matrix sets, where each set corresponds to a row of the matrix and includes a value for each element in the row.
- The values in each set may share a common place value.
- Sets of values may be read from the memory array 108 substantially as described in block 402 of FIG. 4 and/or block 502 of FIG. 5 .
- The unit 110 may include logic to receive each set from the memory array 108 and to multiply each matrix set by a vector set to produce a product.
- The values of the vector set may correspond to a column of a vector, and each value in the vector set may represent a portion of a corresponding element in the column.
- The values in each vector set may have a common place value. This may be performed substantially as described in block 404 of FIG. 4 and/or block 506 of FIG. 5 .
- The matrix processing unit 110 includes logic to sum each element of the product of a matrix set and a vector set to produce a total.
- The total may represent a dot product of the matrix and vector sets. This may be performed substantially as described in block 406 of FIG. 4 and/or block 508 of FIG. 5 .
- The matrix processing unit 110 may include logic to shift the total. The amount that the total is shifted may be based on a place value of the matrix set and/or the vector set. This may be performed substantially as described in block 408 of FIG. 4 and/or block 510 of FIG. 5 . In some examples, the matrix processing unit 110 includes logic to add the shifted total to a running total. This may be performed substantially as described in block 410 of FIG. 4 and/or block 512 of FIG. 5 .
- The matrix processing unit 110 may repeat these processes for each matrix set of the row of the matrix and each vector set in the vector as described in block 514 of FIG. 5 . Similarly, the matrix processing unit may repeat these processes for each row in the matrix as described in block 518 of FIG. 5 .
- In some examples, the vector is part of a second matrix. In some such examples, the matrix processing unit 110 repeats these processes for each column in the second matrix as described in block 520 of FIG. 5 .
- FIG. 8 is a block diagram of a matrix processor circuit 800 according to some examples of the present disclosure.
- The circuit 800 is suitable for performing method 400 and/or method 500 .
- The matrix processor circuit 800 includes a memory array 802 , which may be substantially similar to those described above in many aspects.
- The memory array 802 includes any number of memory cells 804 , each with a programmable element to store an element of a matrix.
- The cells 804 have been annotated to show storing a matrix that includes elements A-H, each of which has two bits indicated by subscripts, although suitable matrices may be of any size and have any bit-length.
- The memory array 802 has a plurality of sets of ports 806 , where each set of ports 806 is configured to produce values stored within a respective column of the memory cells 804 . Having a plurality of sets of ports 806 may allow multiple columns of memory cells 804 to be read from the memory array 802 concurrently. As a column of memory cells 804 may be used to store a row of a matrix, each set of ports 806 may provide a set of values that correspond to a row of the matrix, with each value in the set being a bit of a corresponding element in the row. This may be performed substantially as described in block 402 of FIG. 4 and/or blocks 502-504 of FIG. 5 .
- The sets of ports 806 are coupled to respective multipliers 808 , which may be substantially similar to the multiplier 112 described above.
- The multiplier 808 may include any combination of hardware and/or software to multiply each value provided by the ports by a corresponding value within a set of vector values that correspond to a column of a vector. In an example, the multiplier 808 first multiplies the values [A1:0, B1:0, C1:0, D1:0] by the vector values [I0, J0, K0, L0].
- The multiplier 808 may multiply the values [A1:0, B1:0, C1:0, D1:0] by other bits of the vector (e.g., [I1, J1, K1, L1]).
- In some examples, the multiplier combines the respective values from the sets of ports 806 to form multibit values (e.g., A1:0, B1:0, C1:0, D1:0, etc.) and multiplies the multibit values by the vector. This may be performed substantially as described in block 404 of FIG. 4 and/or block 506 of FIG. 5 .
- The multiplier 808 may provide the multiplication results to a summing unit 810 , which may be substantially similar to the summing unit 114 described above.
- The summing unit 810 includes any combination of hardware and/or software to sum the values provided by the multiplier 808 .
- The sum may represent the dot product of the values received by the multiplier 808 . This may be performed substantially as described in block 406 of FIG. 4 and/or block 508 of FIG. 5 .
- A shifting unit 812 receives the results of the summing units 810 .
- The shifting unit 812 includes any combination of hardware and/or software to shift the received sum based on a place value of the matrix values received from the ports 806 of the memory array 802 and/or a place value of the set of vector values. This may be performed substantially as described in block 408 of FIG. 4 and/or block 510 of FIG. 5 .
- A totaling unit 814 receives the shifted sums from each of the shifting units 812 and adds the shifted sums to a running total. This may be performed substantially as described in block 410 of FIG. 4 and/or block 512 of FIG. 5 .
- The matrix processor circuit 800 may repeat these processes for each set in the row of the matrix and each set in the column of the vector as described in block 514 of FIG. 5 .
- The running total may be stored in a result store 118 as described in block 516 of FIG. 5 .
- The matrix processor circuit 800 may repeat these processes for each row in the matrix as described in block 518 of FIG. 5 .
- In some examples, the vector is part of a second matrix.
- In some such examples, the matrix processor circuit 800 repeats these processes for each column in the second matrix as described in block 520 of FIG. 5 .
- The increased number of read ports 806 , together with the multi-bit multiplier 808 and summing unit 810 , allows multiple columns of the memory array 802 to be read and processed in parallel. In this way, the circuit 800 may increase the computational throughput with a nominal increase in circuit area.
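As a rough behavioral sketch of the FIG. 8 arrangement (Python pseudocode of the data flow, not the patented hardware; the function and variable names are illustrative), one pass with two sets of read ports makes both bit columns of a row available at once, so the multiplier can combine them into multi-bit values (e.g., A1:0) and multiply by one vector bit-plane per pass, halving the number of passes relative to the single-port circuit of FIG. 1:

```python
# Behavioral sketch of one two-port pass: combine the MSB and LSB columns
# into multi-bit values, multiply by one bit-plane of the vector, sum the
# products, shift by the vector bit's place value, and accumulate.

def two_port_pass(msb_set, lsb_set, vector_bits, vector_place, running_total=0):
    multibit = [(hi << 1) | lo for hi, lo in zip(msb_set, lsb_set)]  # e.g., A1:0
    products = [m * b for m, b in zip(multibit, vector_bits)]        # multi-bit multiply
    return running_total + (sum(products) << vector_place)           # sum, shift, accumulate

# Row A-D = [3, 1, 2, 0] split into bit columns, vector bits [I0..L0] = [1, 0, 1, 1]:
msb, lsb = [1, 0, 1, 0], [1, 1, 0, 0]
assert two_port_pass(msb, lsb, [1, 0, 1, 1], 0) == 3 * 1 + 1 * 0 + 2 * 1 + 0 * 1
```

In this sketch the place value of the matrix bits is already folded into the multi-bit combination, so only the vector bit-plane's place value drives the shift.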
Abstract
A circuit is provided. In an example, the circuit includes a memory array that includes a plurality of memory cells to store a matrix and a plurality of data lines coupled to the plurality of memory cells to provide a first set of values of the matrix. The circuit includes a multiplier coupled to the plurality of data lines to multiply the first set of values by a second set of values to produce a third set of values. A summing unit is included that is coupled to the multiplier to sum the third set of values to produce a sum. The circuit includes a shifting unit coupled to the summing unit to shift the sum and to add the shifted sum to a running total.
Description
- Matrices are arrays of elements of any suitable size, and these elements may represent data, relationships between data values, transformations to be applied to data, and more. Owing to their flexibility and utility, matrices are used in a wide range of real-world applications. In computing examples, matrices are used to store and manipulate sensor data for controlling automated manufacturing, scientific data for research and modeling, demographic data, other statistical data, and other data types. Matrices are also used extensively in computer graphics. For example, a bitmap is a specific type of matrix in which each entry is a pixel value. As a further example, a digital filter for image processing may be represented as a matrix in which each entry is a transformation to be applied to a portion of an image or frame. A wide body of algebraic operations has been developed to manipulate and analyze matrices and their contents, and because matrices are utilized with such frequency, computing systems may include dedicated hardware for handling matrices and performing these operations.
- Certain examples are described in the following detailed description with reference to the drawings, of which:
- FIGS. 1A-1D are block diagrams of a circuit at various points in time according to some examples of the present disclosure.
- FIG. 2 is a schematic diagram of a memory array according to some examples of the present disclosure.
- FIG. 3 is a schematic diagram of a resistive memory array according to some examples of the present disclosure.
- FIG. 4 is a flow diagram of a method of performing a matrix operation according to some examples of the present disclosure.
- FIG. 5 is a flow diagram of a method of performing a matrix multiplication operation according to some examples of the present disclosure.
- FIG. 6 is a block diagram of a matrix processor circuit according to some examples of the present disclosure.
- FIG. 7 is a block diagram of a processing circuit according to some examples of the present disclosure.
- FIG. 8 is a block diagram of a matrix processor circuit according to some examples of the present disclosure.
- Throughout the drawings, identical reference numbers may designate similar, but not necessarily identical elements.
- Many computing processors operate on matrices and matrix data, and for this purpose, a processor may include dedicated circuitry for operating on matrices. In some examples, a matrix processor includes a memory array to store a matrix of multi-bit elements and circuitry to multiply the matrix by a vector. To perform operations on a row of the matrix, the memory array may be configured to read a subset of the bits that make up a row's elements, where the values in the subset have a common place value (e.g., 2⁰, 2¹, 2², etc.). In other words, a first subset may include the least significant bit of each element in the row, a second subset may include the next least significant bit of each element in the row, and so on. Each subset may be provided to a multiplier that multiplies the subset's values by a corresponding value from a subset of a vector. In some examples, the resultant products of this multiplication are summed by a summing unit to produce a dot product of the matrix subset and the vector subset. The sum or dot product is provided to a shifting unit where it is shifted according to the place value of the matrix subset and/or the place value of the vector subset and added to a running total. The matrix processor may repeat the process for each subset of each row of the matrix and/or each subset of the vector such that the running totals represent the matrix-vector product. In some such examples, the sets of vector values are part of a second matrix, and the process is repeated to multiply the first matrix by the second matrix.
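The bit-serial shift-and-add scheme described above can be sketched in software. The following is a behavioral model only (not the patented circuit), with illustrative names; it decomposes the matrix and vector into bit-planes of common place value, multiplies bit-plane by bit-plane, sums, shifts by the combined place value, and accumulates:

```python
# Behavioral sketch of the bit-serial matrix-vector product: each pass reads
# one bit-plane of a matrix row (all bits sharing a place value), multiplies
# it by one bit-plane of the vector, sums the products, shifts the sum by the
# combined place value, and adds it into a running total.

def bit_serial_matvec(matrix, vector, bits=2):
    """Multiply a matrix of small unsigned ints by a vector, one bit-plane at a time."""
    result = []
    for row in matrix:
        total = 0
        for m in range(bits):            # place value of the matrix bit-plane (2**m)
            for v in range(bits):        # place value of the vector bit-plane (2**v)
                row_bits = [(a >> m) & 1 for a in row]     # e.g., [A1, B1, C1, D1]
                vec_bits = [(x >> v) & 1 for x in vector]  # e.g., [I1, J1, K1, L1]
                products = [rb & vb for rb, vb in zip(row_bits, vec_bits)]  # AND multiply
                s = sum(products)        # Hamming weight of the product set
                total += s << (m + v)    # shift by both place values, accumulate
        result.append(total)             # one element of the result vector
    return result

# The running totals equal the ordinary matrix-vector product:
matrix = [[1, 2, 3, 0], [2, 3, 1, 1]]    # two-bit elements, as in the example of FIG. 1
vector = [3, 1, 2, 1]
assert bit_serial_matvec(matrix, vector) == [
    sum(a * x for a, x in zip(row, vector)) for row in matrix
]
```

The iteration counts depend only on the bit-lengths and matrix dimensions, which mirrors how the described hardware supports different sizes by adjusting the number of passes.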
- The present disclosure provides substantial real-world improvements to the operation and efficiency of a matrix processor. For instance, in some examples, the memory array is structured to store matrices of variable sizes and bit-lengths. This provides additional flexibility without changing the underlying structure of the matrix processor. In some examples, the memory array is structured to support multiple reads between refreshes of the array's memory cells. This may reduce the overhead of maintaining data in the memory array. In some examples, the matrix processor is structured to read and process the matrix a column at a time to parallelize a significant portion of the processing. In some examples, the matrix processor is structured to read multiple columns at a time to further parallelize the matrix operations. Of course, these are merely some of the improvements that the examples described herein may provide.
- These examples and others are described with reference to the following figures. Unless noted otherwise, the figures and their accompanying description are non-limiting, and no element is characteristic of any particular example. In that regard, features from one example may be freely incorporated into other examples without departing from the spirit and scope of the disclosure.
- FIGS. 1A-1D are block diagrams of a circuit 100 at various points in time according to some examples of the present disclosure. The figures illustrate an example matrix operation performed by the circuit 100 that includes multiplying a matrix 102 by a vector 104 to produce a result vector 106, and the figures include graphical representations of the matrix 102, vector 104, and result vector 106 for clarity of explanation. In the example, the matrix 102 includes elements A-H, each of which has two bits indicated by subscripts, although suitable matrices may be of any size and have any bit-length. The example vector 104 includes elements I-L, each of which has two bits indicated by subscripts and may also be of any size and bit-length. The result vector 106 includes elements M-N, each of which has six bits as determined by the matrix 102 and the vector 104. Of course, in other examples, the matrix 102, the vector 104, and the result vector 106 may be of any suitable size.
- The circuit 100 that performs the operation includes a memory array 108 communicatively coupled to a matrix processing unit 110 that includes hardware, software, and/or combinations thereof to perform matrix operations and to perform other processing tasks. In some examples, the matrix processing unit 110 includes a multiplier 112, a summing unit 114, a shifting unit 116, and a result store 118, each of which is described in more detail below.
- Referring first to the memory array 108, the array 108 includes a plurality of memory cells 120 to store a matrix. Each memory cell 120 may include any suitable programmable element, and in various examples, the memory cells 120 include Dynamic Random Access Memory (DRAM) elements, Static RAM (SRAM) elements, memristive elements, non-volatile memory elements, flash and/or other solid-state memory elements, magnetic memory elements, etc. The memory array 108 may include any number of memory cells 120 arranged to store a matrix of any size, and the number of memory cells 120 may exceed the size and number of bits of the matrix. For clarity, the matrix values are illustrated within their respective memory cell 120. As can be seen, the matrix elements may be allocated to the memory cells 120 such that when a column of memory cells 120 is read, the values in the column correspond to a row of the matrix (e.g., matrix row A-D) and have the same place value (e.g., 2¹).
- The memory array 108 includes a set of ports 121 to provide the matrix elements on a set of data lines 122 when the memory cells 120 are read. In some examples, the memory array 108 is structured so that when a column of memory cells 120 is read, the corresponding values on the data lines 122 correspond to a row of the matrix and share a common place value. For example, referring to FIG. 1A, a first set of values containing elements A1, B1, C1, and D1 is present on the data lines 122 during a first read.
- The data lines 122 are coupled to the multiplier 112. The multiplier 112 includes any combination of hard-coded logic (e.g., CMOS), programmable logic, and/or software to receive the sets of matrix values from the data lines 122 and multiply the sets of matrix values by sets of vector values. In various examples, the vector 104 is stored in the same memory array 108 as the matrix 102, stored in another memory array 108, stored in another suitable programmable structure, and/or is hardcoded for providing to the multiplier 112.
- The multiplier 112 performs an element-by-element multiplication of the matrix sets received from the data lines 122 by the vector sets. In some examples, the multiplier 112 includes a plurality of AND devices 124 to multiply each value in a given matrix set by a corresponding value in a given vector set. In this way, the multiplier 112 produces a third set of values. In the example of FIG. 1A, the multiplier 112 multiplies the first set of values containing elements A1, B1, C1, and D1 by a second set of values that contains elements I1, J1, K1, and L1 to produce the third set of values.
- This third set of values is provided to a summing unit 114. The summing unit 114 includes any combination of hard-coded logic, programmable logic, and/or software to receive the third set of values and to sum the values therein. In some examples, the summing unit 114 includes a Hamming weight engine, which counts the number of non-zero values in the third set in order to sum the binary values. The Hamming weight engine may be implemented as a tree of adders 126 or other suitable structure. The summing unit 114 produces a sum that represents the dot product of the sets. Referring to the example of FIG. 1A, the sum represents the dot product of the first set of values containing elements A1, B1, C1, and D1 and the second set of values containing elements I1, J1, K1, and L1.
- The summing unit 114 provides the sum to the shifting unit 116, which may shift the sum one or more positions based on the place value of the set of matrix values and/or the place value of the set of vector values. In the example of FIG. 1A, the shifting unit 116 shifts the sum once based on the place value of the first set of values (2¹) and once again based on the place value of the second set of values (2¹).
- The shifting unit 116 may then add the shifted sum to a running total. The running total is updated as each set of the matrix 102 is multiplied by each set of the vector 104. In various examples, the shifting unit 116 includes any suitable combination of hard-coded logic, programmable logic, and/or software to shift the received sum and to maintain the running total.
- The processes may be repeated for each set of matrix values in the row and each set of vector values in the vector. For example, referring to FIG. 1B, the memory array 108 provides a set of matrix values containing elements A0, B0, C0, and D0 on the data lines 122. The multiplier 112 receives the set of matrix values and multiplies it by the set of vector values containing elements I1, J1, K1, and L1. The summing unit 114 sums the elements of the resultant set, and the shifting unit 116 shifts the sum once based on the place value of the set of vector values (2¹) before adding the shifted sum to the running total.
- The example continues in FIG. 1C, where the memory array 108 provides the set of matrix values containing elements A1, B1, C1, and D1 on the data lines 122 again. In some examples, the memory array 108 is configured to read and provide the set multiple times between refreshes of the memory cells 120 that contain the set. This may reduce the overhead of maintaining data in the memory array 108.
- The multiplier 112 multiplies this set of matrix values by a set of vector values containing elements I0, J0, K0, and L0. The summing unit 114 sums the elements of the resultant set, and the shifting unit 116 shifts the sum once based on the place value of the set of matrix values (2¹) before adding the shifted sum to the running total.
- Finally, referring to FIG. 1D, the memory array 108 provides the set of matrix values containing elements A0, B0, C0, and D0 on the data lines 122 again. The multiplier 112 multiplies this set of matrix values by the set of vector values containing elements I0, J0, K0, and L0. The summing unit 114 sums the elements of the resultant set, and the shifting unit 116 adds the unshifted sum to the running total based on the place value of the set of matrix values and the set of vector values. In the example, the running total at this point in time represents an element (e.g., element M) of a vector result of the matrix-vector operation, and the running total is provided to a result store 118 for storing. As with the memory array 108, the result store 118 may include any suitable programmable element for storing the results. After storing the running total in the result store 118, the running total may be reset.
- The processes may be repeated for each row in the matrix 102 to determine the remaining element(s) in the result vector 106. Furthermore, in some examples, the input vector is a subset of a second matrix. In such examples, the circuit 100 repeats the processes for each row in the first matrix and each column in the second matrix.
- As described above, the memory cells 120 of the memory array 108 may include any suitable programmable elements, some examples of which are described with reference to FIG. 2. In that regard, FIG. 2 is a schematic diagram of a memory array 200 according to some examples of the present disclosure. The memory array 200 is suitable for use as the memory array 108 of FIG. 1 and includes any number of memory cells 201, each of which may be substantially similar to those described above. While the illustrated memory array 200 includes a limited number of memory cells 201, in other examples, the memory array 200 includes any number of memory cells 201.
- Each memory cell 201 includes a programmable element such as a capacitor 202. The capacitor 202 may store a charge, and a magnitude or presence of the charge may represent a binary value of a corresponding element of a matrix 102.
- To read and write values to the cells 201, each memory cell 201 may be coupled to a plurality of control lines within the memory array 200. For example, the memory array 200 may include a set of row write enable lines 204 and a set of column write lines 206. Each memory cell 201 may include an access transistor 208 with a source and drain coupled between the capacitor 202 and a corresponding column write line 206. The gate of the access transistor 208 is coupled to a corresponding row write enable line 204. A write controller 210 of the memory array 200 may program the memory cells 201 a row at a time by enabling the row write enable line 204 for the row and by applying voltages on the column write lines 206 corresponding to the values to be stored in the memory cells 201.
- In some examples, the control lines include a set of column read enable lines 212 and a set of row read lines 214. Each memory cell 201 may include a pass transistor 216 with a source and drain coupled between a corresponding column read enable line 212 and a corresponding row read line 214. The gate of the pass transistor 216 is coupled to the capacitor 202 such that the mode of operation of the pass transistor 216 is determined by the magnitude of the charge stored in the capacitor 202. A read controller 218 of the memory array 200 may read the memory cells 201 a column at a time by providing a voltage on the corresponding column read enable line 212 and detecting voltage and/or current on the set of row read lines 214.
- The read controller 218 may amplify, smooth, and/or otherwise shape the signals on the row read lines 214 before providing them at ports 220 of the memory array 200. The ports 220 produce signals on the data lines 222 that represent the values in the corresponding memory cells 201 at voltages and forms suitable for use in driving other circuitry. In some examples, because the capacitor 202 drives the gate of the pass transistor 216, the capacitor 202 charge is not significantly dissipated by the reading process and multiple reads of a memory cell 201 may be performed without refreshing the capacitor 202 charge.
- Further examples of suitable memory arrays 108 of the circuit 100 are described with reference to FIG. 3, which is a schematic diagram of a resistive memory array 300 according to some examples of the present disclosure. The memory array 300 is suitable for use as the memory array 108 of FIG. 1 and includes any number of memory cells 301, each of which may be substantially similar to those described above.
- Each memory cell 301 includes a resistive memory device 302, such as a memristor. A resistive memory device 302 may have more than one resistance/conductance state, and the state of the device may be used to store data. In some examples, the conductive state of a resistive memory device represents a binary value of a corresponding element of the matrix 102.
- The conductive state of the resistive memory device 302 may be set by applying a voltage that exceeds a write threshold to the resistive memory device 302. Accordingly, the resistive memory device 302 may be coupled to a row read/write line 304 and coupled by an access transistor 306 to a column write line 308. The gate of the access transistor 306 may be coupled to a column write enable line 310. A write controller 312 of the memory array 300 may program the state of the resistive memory devices 302 a row at a time by applying a voltage differential across the row read/write line 304 and each of the column write lines 308 that exceeds the write threshold of the resistive memory devices 302. The write controller 312 selects which cells 301 in the row to write by selectively enabling the column write enable lines 310 to activate the respective access transistors 306.
- The state of a resistive memory device 302 may be read by applying a voltage less than the write threshold to the resistive memory device 302 and detecting the current induced by the voltage. As noted above, the resistive memory device 302 may be coupled to the row read/write line 304. The resistive memory device 302 may be coupled to a row read line 314 by a pass transistor 316 with a gate of the pass transistor 316 coupled to a column read enable line 318. In such examples, a read controller 320 of the memory array 300 may read the memory cells 301 a column at a time by enabling the column read enable lines 318, applying a voltage differential across each of the row read/write lines 304 and each of the row read lines 314, and detecting voltage and/or current on the set of row read/write lines 304 and/or the set of row read lines 314 due to the applied voltage differential. The read controller 320 may amplify, smooth, shape, and/or otherwise translate the detected voltage or current to produce signals at the ports 322 and on the data lines 324 that represent the values in the corresponding memory cells 301 at voltages and forms suitable for use in driving other circuitry.
- Examples of performing a matrix operation using the circuit 100 are described with reference to FIG. 4, which is a flow diagram of a method 400 of performing the matrix operation according to some examples of the present disclosure. The description of the method 400 is non-limiting, and steps may be added to and omitted from the method 400 without departing from the disclosure. Furthermore, unless noted otherwise, processes of the method 400 may be performed in any order including being performed concurrently by one or more entities.
- Referring to block 402, a first set of values of a matrix is read from a memory array 108. Referring to block 404 of FIG. 4, the first set of values of the matrix is multiplied by a second set of values of a vector to provide a third set of values. Referring to block 406 of FIG. 4, the elements of the third set of values are summed to produce a sum.
- Referring now to
FIG. 5 , illustrated is a flow diagram of amethod 500 of performing a matrix multiplication operation according to some examples of the present disclosure. The description of themethod 500 is non-limiting, and steps may be added to and omitted from themethod 500 without departing from the disclosure. Furthermore, unless noted otherwise, processes of themethod 500 may be performed in any order including being performed concurrently by one or more entities. Themethod 500 is suitable for performing using thecircuit 100 ofFIGS. 1A-1D, 2, and 3 and/or any other suitable circuit or device. - Referring to block 502, a read controller of a
memory array 108 reads a column of the array'smemory cells 120 to produce a first set of values associated with a row of a matrix. Each value may be a bit of a respective element in the row of the matrix and each value in the set shares a common place value. Referring to block 504, the read controller provides the first set of values on a set of data lines 122. - Referring to block 506, a
multiplier 112 receives the first set of values and multiplies each value of the first set of values by a corresponding value of a second set of values. The second set of values may correspond to a column of a vector such that each value is a bit of a respective element in the column and each value in the second set shares a common place value. In multiplying the first set of values by the second set of values, themultiplier 112 produces a third set of values. - Referring to block 508, a summing
unit 114 sums each element of the third set of values to produce a sum. The sum may represent the dot product of the first set of values and the second set of values. Referring to block 510, a shiftingunit 116 shifts the sum based on a place value of the first set of values and/or a place value of the second set of values. Referring to block 512, the shiftingunit 116 adds the shifted sum to a running total. - Referring to block 514, the processes of blocks 502-512 are repeated for each combination of sets in the row of the matrix and sets in the column of the vector. The number of times that the processes are repeated may depend on the bit-length of the elements in the row of the matrix and the bit-length of the elements in the row of the vector, and the
method 500 may be adapted to a matrix and vector of any suitable bit-length by merely adjusting the number of iterations. Thus, thememory array 108 and thematrix processing unit 110 may support a range of suitable bit-lengths without modifications to the underlying hardware. Once the processes have been repeated for each combination, the running total is stored in aresult store 118 as shown inblock 516. The running total maintained by the shiftingunit 116 may be reset after it is stored in theresult store 118. - Referring to block 518, the processes of blocks 502-516 are repeated for each row in the matrix. Similar to block 514, the number of times that the processes are repeated may depend on the number of rows in the matrix, and the
method 500 may be adapted to a matrix of any suitable size by merely adjusting the number of iterations. Accordingly, thememory array 108 andmatrix processing unit 110 may support a range of suitable sizes without modifications to the underlying hardware. At the conclusion ofblock 518, theresult store 118 may contain a result vector that is the result of the original matrix multiplied by the vector. - In some examples, the vector is part of another matrix (a second matrix). In some such examples, referring to block 520, the processes of blocks 502-518 are repeated for each column vector in the second matrix. At the conclusion of
block 520, the result store 118 may contain a third matrix that is the result of the original matrix multiplied by the second matrix. - Further examples of suitable circuits for performing
method 400 and/or method 500 are described with reference to FIG. 6, which is a block diagram of a matrix processor circuit 600 according to some examples of the present disclosure. The matrix processor circuit 600 includes a memory array 108, a multiplier 112, a summing unit 114, and a shifting unit 116, each of which may be similar to those described above in many aspects. - For example, the
memory array 108 may include a plurality of memory cells 120 to store elements of a matrix. Each memory cell 120 may include any suitable programmable element, such as the capacitive memory cell 120 of FIG. 2 and/or the resistive memory cell 120 of FIG. 3. The memory array 108 may include any number of memory cells 120 arranged to store a matrix of any size, and the number of memory cells 120 may exceed the size and number of bits of the matrix. The memory array 108 includes a plurality of data lines 122 coupled to the memory cells 120 to provide values stored in the memory cells 120. This may be performed substantially as described in block 402 of FIG. 4 and/or blocks 502-504 of FIG. 5. In an example, the data lines 122 provide a first set of values of the matrix. The first set of values may correspond to a row of the matrix, and each value in the first set may be a bit of a corresponding element in the row. The values in the first set have a common place value. - The
multiplier 112 is coupled to the data lines 122 and thereby receives the values from the memory array 108. The multiplier 112 may include any combination of hardware and/or software to multiply each value of the first set of values by a corresponding value within a second set of values to produce a third set of values. The second set of values may correspond to a column of a vector, and each value in the second set may be a bit of a corresponding element in the column. The values in the second set have a common place value. This may be performed substantially as described in block 404 of FIG. 4 and/or block 506 of FIG. 5. - The summing
unit 114 receives the third set of values from the multiplier 112 and includes any combination of hardware and/or software to sum the third set of values to produce a sum. The sum may represent the dot product of the first set of values and the second set of values. This may be performed substantially as described in block 406 of FIG. 4 and/or block 508 of FIG. 5. - The shifting
unit 116 receives the sum from the summing unit 114 and includes any combination of hardware and/or software to shift the sum. The shifting unit 116 shifts the sum based on a place value of the first set of values and/or a place value of the second set of values. This may be performed substantially as described in block 408 of FIG. 4 and/or block 510 of FIG. 5. The shifting unit 116 may then add the shifted sum to a running total. This may be performed substantially as described in block 410 of FIG. 4 and/or block 512 of FIG. 5. - The
matrix processor circuit 600 may repeat these processes for each set in the row of the matrix and each set in the column of the vector as described in block 514 of FIG. 5. When each set in the row of the matrix and each set in the column of the vector have been multiplied, the running total may be stored in a result store 118 as described in block 516 of FIG. 5. Furthermore, the matrix processor circuit 600 may repeat these processes for each row in the matrix as described in block 518 of FIG. 5. In some examples, the vector is part of a second matrix. In some such examples, the matrix processor circuit 600 repeats these processes for each column in the second matrix as described in block 520 of FIG. 5. - Referring now to
FIG. 7, illustrated is a block diagram of a processing circuit 700 according to some examples of the present disclosure. The processing circuit 700 includes a memory array 108 coupled to a matrix processing unit 110, each of which may be similar to those described above in many aspects. - With respect to the
memory array 108, the array 108 may store a matrix of multi-bit values. The matrix may be subdivided into a plurality of matrix sets, where each set corresponds to a row of the matrix and includes a value for each element in the row. The values in each set may share a common place value. Sets of values may be read from the memory array 108 substantially as described in block 402 of FIG. 4 and/or block 502 of FIG. 5. - With respect to the
matrix processing unit 110, the unit 110 may include logic to receive each set from the memory array 108 and to multiply each matrix set by a vector set to produce a product. The values of the vector set may correspond to a column of a vector, and each value in the vector set may represent a portion of a corresponding element in the column. The values in each vector set may have a common place value. This may be performed substantially as described in block 404 of FIG. 4 and/or block 506 of FIG. 5. - In some examples, the
matrix processing unit 110 includes logic to sum each element of the product of a matrix set and a vector set to produce a total. The total may represent a dot product of the matrix and vector sets. This may be performed substantially as described in block 406 of FIG. 4 and/or block 508 of FIG. 5. - The
matrix processing unit 110 may include logic to shift the total. The amount that the total is shifted may be based on a place value of the matrix set and/or the vector set. This may be performed substantially as described in block 408 of FIG. 4 and/or block 510 of FIG. 5. In some examples, the matrix processing unit 110 includes logic to add the shifted total to a running total. This may be performed substantially as described in block 410 of FIG. 4 and/or block 512 of FIG. 5. - The
matrix processing unit 110 may repeat these processes for each matrix set of the row of the matrix and each vector set in the vector as described in block 514 of FIG. 5. Similarly, the matrix processing unit may repeat these processes for each row in the matrix as described in block 518 of FIG. 5. In some examples, the vector is part of a second matrix. In some such examples, the matrix processing unit 110 repeats these processes for each column in the second matrix as described in block 520 of FIG. 5. - In further examples, the circuit is structured to read more than one column from the memory array concurrently and to process the values as multi-bit values.
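The bit-serial procedure of blocks 502-518 may be modeled in software as follows. This sketch is illustrative only: the function and variable names are not part of the disclosure, and the hardware operates on sets of bits read from the memory array rather than Python integers.

```python
def bit(value, place):
    """Extract the bit of `value` at the given place (weight 2**place)."""
    return (value >> place) & 1

def matrix_vector_multiply(matrix, vector, bits):
    """Bit-serial matrix-vector multiply, as in blocks 502-518 of method 500.

    For each combination of a matrix place value i and a vector place value j,
    the single-bit products are summed, shifted left by i + j, and added to a
    running total; the total is stored per row (block 516).
    """
    result = []
    for row in matrix:                              # block 518: each matrix row
        running_total = 0
        for i in range(bits):                       # place value of the matrix set
            for j in range(bits):                   # place value of the vector set
                first = [bit(e, i) for e in row]    # blocks 502-504: read a bit set
                second = [bit(v, j) for v in vector]
                third = [a * b for a, b in zip(first, second)]  # block 506
                s = sum(third)                      # block 508
                running_total += s << (i + j)       # blocks 510-512
        result.append(running_total)                # block 516
    return result

# 2-bit elements: [3, 1]·[2, 3] = 9 and [2, 2]·[2, 3] = 10
assert matrix_vector_multiply([[3, 1], [2, 2]], [2, 3], bits=2) == [9, 10]
```

Note that supporting a different bit-length requires only changing the `bits` iteration count, mirroring the description above that no hardware modification is needed.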
FIG. 8 is a block diagram of a matrix processor circuit 800 according to some examples of the present disclosure. The circuit 800 is suitable for performing method 400 and/or method 500. - The
matrix processor circuit 800 includes a memory array 802, which may be substantially similar to those described above in many aspects. The memory array 802 includes any number of memory cells 804, each with a programmable element to store an element of a matrix. In FIG. 8, the cells 804 have been annotated to show storing a matrix that includes elements A-H, each of which has two bits indicated by subscripts, although suitable matrices may be of any size and have any bit-length. - The
memory array 802 has a plurality of sets of ports 806, where each set of ports 806 is configured to produce values stored within a respective column of the memory cells 804. Having a plurality of sets of ports 806 may allow multiple columns of memory cells 804 to be read from the memory array 802 concurrently. As a column of memory cells 804 may be used to store a row of a matrix, each set of ports 806 may provide a set of values that correspond to a row of the matrix, with each value in the set being a bit of a corresponding element in the row. This may be performed substantially as described in block 402 of FIG. 4 and/or blocks 502-504 of FIG. 5. - The sets of
ports 806 are coupled to respective multipliers 808, which may be substantially similar to the multiplier 112 described above. The multiplier 808 may include any combination of hardware and/or software to multiply each value provided by the ports by a corresponding value within a set of vector values that correspond to a column of a vector. The multiplier combines the respective values from the sets of ports 806 to form multi-bit values (e.g., A1:0, B1:0, C1:0, D1:0, etc.) and multiplies the multi-bit values by the vector. In an example, the multiplier 808 first multiplies the values [A1:0, B1:0, C1:0, D1:0] by one set of bits of the vector (e.g., [I0, J0, K0, L0]). Subsequently, the multiplier 808 may multiply the values [A1:0, B1:0, C1:0, D1:0] by other bits of the vector (e.g., [I1, J1, K1, L1]). This may be performed substantially as described in block 404 of FIG. 4 and/or block 506 of FIG. 5. - The
multiplier 808 may provide the multiplication results to a summing unit 810, which may be substantially similar to the summing unit 114 described above. The summing unit 810 includes any combination of hardware and/or software to sum the values provided by the multiplier 808. The sum may represent the dot product of the values received by the multiplier 808. This may be performed substantially as described in block 406 of FIG. 4 and/or block 508 of FIG. 5. - A shifting
unit 812 receives the results of the summing units 810. The shifting unit 812 includes any combination of hardware and/or software to shift the received sum based on a place value of the matrix values received from the ports 806 of the memory array 802 and/or a place value of the set of vector values. This may be performed substantially as described in block 408 of FIG. 4 and/or block 510 of FIG. 5. - A totaling unit 814 receives the shifted sums from each of the shifting
units 812 and adds the shifted sums to a running total. This may be performed substantially as described in block 410 of FIG. 4 and/or block 512 of FIG. 5. - The
matrix processor circuit 800 may repeat these processes for each set in the row of the matrix and each set in the column of the vector as described in block 514 of FIG. 5. When each set in the row of the matrix and each set in the column of the vector have been multiplied, the running total may be stored in a result store 118 as described in block 516 of FIG. 5. Furthermore, the matrix processor circuit 800 may repeat these processes for each row in the matrix as described in block 518 of FIG. 5. - In some examples, the vector is part of a second matrix. In some such examples, the
matrix processor circuit 800 repeats these processes for each column in the second matrix as described in block 520 of FIG. 5. - The increased read
ports 806 and the multi-bit multiplier 808 and summing unit 810 allow multiple columns of the memory array 802 to be read and processed in parallel. In this way, the circuit 800 may increase the computational throughput with a nominal increase in circuit area. - In the foregoing description, numerous details are set forth to provide an understanding of the subject disclosed herein. However, implementations may be practiced without some or all of these details. Other implementations may include modifications and variations from the details discussed above. It is intended that the appended claims cover such modifications and variations.
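The multi-port operation of the matrix processor circuit 800 may be modeled in a similar way. In this sketch, which is illustrative and not part of the disclosure, each matrix element is treated as an already-combined multi-bit operand (as formed by the multiplier 808 from the sets of ports 806), so the loop iterates only over the place values of the vector:

```python
def multiply_row_multibit(row, vector, vector_bits):
    """One dot product where all matrix bits are read concurrently.

    Only the vector is processed one place value at a time; the matrix
    row elements arrive as full multi-bit values from the read ports.
    """
    running_total = 0
    for j in range(vector_bits):                 # place value of the vector set
        vbits = [(v >> j) & 1 for v in vector]   # one bit per vector element
        products = [a * b for a, b in zip(row, vbits)]  # multiplier 808
        total = sum(products)                    # summing unit 810
        running_total += total << j              # shifting unit 812 / totaling unit 814
    return running_total

# Row of 2-bit elements (like A-D in FIG. 8) times a 2-bit vector:
# [3, 1, 2, 0]·[2, 3, 1, 1] = 6 + 3 + 2 + 0 = 11
assert multiply_row_multibit([3, 1, 2, 0], [2, 3, 1, 1], vector_bits=2) == 11
```

Compared with the fully bit-serial flow, the inner loop over matrix place values disappears, which reflects the throughput gain described above at the cost of wider multipliers.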
Claims (20)
1. A circuit comprising:
a memory array including:
a plurality of memory cells to store a matrix; and
a plurality of data lines coupled to the plurality of memory cells to provide a first set of values of the matrix;
a multiplier coupled to the plurality of data lines to multiply the first set of values by a second set of values to produce a third set of values;
a summing unit coupled to the multiplier to sum the third set of values to produce a sum; and
a shifting unit coupled to the summing unit to shift the sum and to add the shifted sum to a running total.
2. The circuit of claim 1, wherein each of the plurality of memory cells includes:
a capacitor;
an access transistor coupled to the capacitor to program the capacitor; and
a pass transistor having a gate coupled to the capacitor to read a value of the matrix that is stored in the capacitor, wherein the value is provided to a corresponding data line of the plurality of data lines.
3. The circuit of claim 2, wherein the access transistor of each of the plurality of memory cells is coupled between the capacitor and a column write line to program the memory array a row at a time.
4. The circuit of claim 3, wherein the pass transistor of each of the plurality of memory cells is coupled to a row read line to read the memory array a column at a time.
5. The circuit of claim 1, wherein each of the plurality of memory cells includes:
a resistive memory device;
an access transistor coupled to the resistive memory device to program the resistive memory device; and
a pass transistor coupled to the resistive memory device to read a value of the matrix that is encoded in the resistive memory device, wherein the value is provided to a corresponding data line of the plurality of data lines.
6. The circuit of claim 1, wherein the memory array, the multiplier, the summing unit, and the shifting unit are to multiply the matrix by a vector that includes the second set of values.
7. The circuit of claim 1, wherein the matrix includes a plurality of multi-bit values, and wherein the first set of values of the matrix have a common place value.
8. The circuit of claim 1, wherein the shifting unit is to shift the sum based on a place value of the first set of values or a place value of the second set of values.
9. A circuit comprising:
a memory array to store a matrix that includes a plurality of multi-bit values, wherein the plurality of multi-bit values includes a plurality of matrix sets, and wherein each set of the plurality of matrix sets contains values having a common place value; and
a matrix processing unit coupled to the memory array and including logic to, for each set of the plurality of matrix sets:
receive the respective set;
multiply the respective set by a vector set;
sum each element of the product of the respective set and the vector set to produce a total;
shift the total; and
add the shifted total to a running total.
10. The circuit of claim 9, wherein the memory array stores the matrix in a plurality of memory cells, and wherein each memory cell includes:
a capacitor;
an access transistor coupled to the capacitor to program the capacitor; and
a pass transistor having a gate coupled to the capacitor to read a value of the matrix that is stored in the capacitor.
11. The circuit of claim 10, wherein the access transistor of each of the plurality of memory cells is coupled between the capacitor and a column line to program the memory array a row at a time.
12. The circuit of claim 11, wherein the pass transistor of each of the plurality of memory cells is coupled to a row line to read the memory array a column at a time.
13. The circuit of claim 9, wherein the memory array stores the matrix in a plurality of memory cells, and wherein each of the plurality of memory cells includes:
a resistive memory device;
an access transistor coupled to the resistive memory device to program the resistive memory device; and
a pass transistor coupled to the resistive memory device to read a value of the matrix that is encoded in the resistive memory device.
14. The circuit of claim 9, wherein the matrix processing unit is to shift the total based on a place value of the respective set of the plurality of matrix sets or a place value of the vector set.
15. A method comprising:
reading a first set of values of a matrix from a memory array;
multiplying the first set of values of the matrix by a second set of values of a vector to provide a third set of values;
summing the third set of values to produce a sum;
shifting the sum based on a place value of the first set of values or a place value of the second set of values; and
adding the shifted sum to a running total.
16. The method of claim 15, comprising writing the matrix to the memory array a row at a time.
17. The method of claim 16, wherein the reading of the first set of values reads from the memory array a column at a time.
18. The method of claim 15, wherein the matrix includes a plurality of multi-bit values, and wherein the first set of values of the matrix have a common place value.
19. The method of claim 15, wherein the memory array includes a plurality of capacitors that store the matrix, and wherein the reading of the first set of values retains the first set of values within the plurality of capacitors.
20. The method of claim 15, wherein the memory array includes a plurality of resistive memory devices that store the matrix.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/052,516 US20180341623A1 (en) | 2017-04-28 | 2018-08-01 | Matrix circuits |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/581,110 US10055383B1 (en) | 2017-04-28 | 2017-04-28 | Matrix circuits |
| US16/052,516 US20180341623A1 (en) | 2017-04-28 | 2018-08-01 | Matrix circuits |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/581,110 Continuation US10055383B1 (en) | 2017-04-28 | 2017-04-28 | Matrix circuits |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20180341623A1 true US20180341623A1 (en) | 2018-11-29 |
Family
ID=63143959
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/581,110 Active US10055383B1 (en) | 2017-04-28 | 2017-04-28 | Matrix circuits |
| US16/052,516 Abandoned US20180341623A1 (en) | 2017-04-28 | 2018-08-01 | Matrix circuits |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/581,110 Active US10055383B1 (en) | 2017-04-28 | 2017-04-28 | Matrix circuits |
Country Status (1)
| Country | Link |
|---|---|
| US (2) | US10055383B1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI737511B (en) * | 2020-09-18 | 2021-08-21 | 旺宏電子股份有限公司 | Memory array and memory structure |
| TWI869441B (en) * | 2019-08-29 | 2025-01-11 | 英商Arm股份有限公司 | Refactoring mac operations |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111542826A (en) * | 2017-12-29 | 2020-08-14 | 斯佩罗设备公司 | Digital architecture supporting analog coprocessors |
| US10242737B1 (en) * | 2018-02-13 | 2019-03-26 | Macronix International Co., Ltd. | Device structure for neuromorphic computing system |
| US10489483B1 (en) * | 2018-09-21 | 2019-11-26 | National Technology & Engineering Solutions Of Sandia, Llc | Circuit arrangement and technique for setting matrix values in three-terminal memory cells |
| US20210073317A1 (en) | 2019-09-05 | 2021-03-11 | International Business Machines Corporation | Performing dot product operations using a memristive crossbar array |
| KR102810470B1 (en) * | 2019-09-20 | 2025-05-19 | 에스케이하이닉스 주식회사 | Semiconductor device performing a mac operation |
| US11244718B1 (en) * | 2020-09-08 | 2022-02-08 | Alibaba Group Holding Limited | Control of NAND flash memory for al applications |
| KR20230020274A (en) * | 2021-08-03 | 2023-02-10 | 에스케이하이닉스 주식회사 | Semiconductor memory apparatus and operating method of semiconductor memory apparatus |
| DE102021124870B4 (en) * | 2021-09-27 | 2023-04-06 | Infineon Technologies Ag | INTEGRATED RADAR SIGNAL PROCESSING CIRCUIT |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5032865A (en) * | 1987-12-14 | 1991-07-16 | General Dynamics Corporation Air Defense Systems Div. | Calculating the dot product of large dimensional vectors in two's complement representation |
| JP2945487B2 (en) * | 1990-12-26 | 1999-09-06 | 株式会社日立製作所 | Matrix multiplier |
| WO2009037684A2 (en) | 2007-09-19 | 2009-03-26 | Provost Fellows And Scholars Of The College Of The Holy And Undivided Trinity Of Queen Elizabeth Near Dublin | Sparse matrix by vector multiplication |
| US8352847B2 (en) | 2009-12-02 | 2013-01-08 | Lsi Corporation | Matrix vector multiplication for error-correction encoding and the like |
| WO2014105154A1 (en) * | 2012-12-24 | 2014-07-03 | Intel Corporation | Systems, methods, and computer program products for performing mathematical operations |
| US9489618B2 (en) | 2014-05-27 | 2016-11-08 | Purdue Research Foundation | Electronic comparison systems |
| US10019229B2 (en) * | 2014-07-02 | 2018-07-10 | Via Alliance Semiconductor Co., Ltd | Calculation control indicator cache |
| US10068652B2 (en) | 2014-09-03 | 2018-09-04 | Micron Technology, Inc. | Apparatuses and methods for determining population count |
| US9898253B2 (en) | 2015-03-11 | 2018-02-20 | Micron Technology, Inc. | Division operations on variable length elements in memory |
- 2017-04-28: US US15/581,110 patent/US10055383B1/en, status: Active
- 2018-08-01: US US16/052,516 patent/US20180341623A1/en, status: Abandoned
Also Published As
| Publication number | Publication date |
|---|---|
| US10055383B1 (en) | 2018-08-21 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10055383B1 (en) | Matrix circuits | |
| US11934798B2 (en) | Counter-based multiplication using processing in memory | |
| US11562229B2 (en) | Convolution accelerator using in-memory computation | |
| TWI815312B (en) | Memory device, compute in memory device and method | |
| KR20180004026A (en) | Vector-matrix multiplications involving negative values | |
| CN110729011B (en) | In-Memory Computing Device for Neural-Like Networks | |
| CN111128279A (en) | Memory computing chip based on NAND Flash and control method thereof | |
| JP7561906B2 (en) | MEMORY SYSTEM AND METHOD FOR OPERATING A MEMORY SYSTEM - Patent application | |
| CN110390074B (en) | Computing system of resistance type memory | |
| US20250094126A1 (en) | In-memory computation circuit and method | |
| US20200175355A1 (en) | Neural network accelerator with systolic array structure | |
| US10853066B1 (en) | Memory processing units and methods of computing DOT products including zero bit skipping | |
| US12009021B2 (en) | Device and method for performing matrix operation | |
| US10340001B2 (en) | Single-readout high-density memristor crossbar | |
| US10902087B2 (en) | Device and method for accelerating matrix multiply operations as a sum of outer products | |
| US20200133992A1 (en) | Device and method for accelerating matrix multiply operations | |
| US20250362875A1 (en) | Compute-in-memory devices and methods of operating the same | |
| KR20190114208A (en) | In DRAM Bitwise Convolution Circuit for Low Power and Fast Computation | |
| US20230057756A1 (en) | Crossbar Mapping Of DNN Weights | |
| TWI815502B (en) | Memory device and compute-in-memory method | |
| US10754581B2 (en) | Identifying outlying values in matrices | |
| US12032959B2 (en) | Non-volatile memory die with latch-based multiply-accumulate components | |
| JP3860548B2 (en) | Image processing apparatus and image processing method | |
| KR20250040544A (en) | Multi-mode compute-in-memory systems and methods for operating the same | |
| US20190286685A1 (en) | Arithmetic processing device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |