Disclosure of Invention
The invention aims to provide a multi-bit in-memory inner product and XOR unit and a vector composed of such units. It is the first to provide an implementation based on FeFETs, and it improves on the only existing work in metrics such as energy consumption, search delay and area.
In order to achieve the purpose, the invention provides the following scheme:
A multi-bit in-memory inner product and XOR unit comprises N 1FeFET1R structures connected in parallel, an input transistor, a first inverter and a second inverter, wherein N is a natural number greater than 1. Each 1FeFET1R structure comprises a FeFET and a resistor that are electrically connected; the resistor of each 1FeFET1R structure is electrically connected with the input transistor; the gate of the input transistor is electrically connected, through the first inverter, with the gate of the FeFET in one 1FeFET1R structure; and the gate of the FeFET in that 1FeFET1R structure is electrically connected, through the second inverter, with the gate of the FeFET in another 1FeFET1R structure.
Further, the resistance of the resistor differs between the 1FeFET1R structures, forming a memory cell whose output currents follow the binary-weighted series $2^{N-1}, 2^{N-2}, \dots, 2^{1}, 2^{0}$.
Further, the 1FeFET1R structure has a resistor electrically connected to the drain or source of the FeFET.
Further, the input transistor operates in the linear region; the weight of a vector element is mapped to a voltage and input to the gate of the corresponding FeFET.
Further, the first inverter is used to input complementary values of vector elements.
Further, the second inverter is used for storing complementary values in the two corresponding FeFETs.
The invention also provides a multi-bit in-memory inner product and XOR vector, which comprises M of the multi-bit in-memory inner product and XOR units connected in parallel.
The invention further provides a method for operating the multi-bit in-memory inner product and XOR vector, comprising:
S1. Each vector element of the stored vector is written into a multi-bit in-memory inner product and XOR unit as follows: each vector element of the stored vector is expressed in binary; for each bit of the vector element to be written, if the bit is '1', a high voltage is applied to the gate of the corresponding FeFET so that the FeFET stores '1'; if the bit is '0', a low voltage is applied to the gate of the corresponding FeFET so that the FeFET stores '0'. At the same time, the complementary value is stored, through an inverter, in the other 1FeFET1R structure used for XOR.
S2. After the vector elements of the stored vector have been written into the units, the following operations are performed when the vector is queried:
S2.1. Each vector element of the query vector is applied as a voltage to the gate of the input transistor of the corresponding unit; at the same time, the vector element of the query vector is passed to the corresponding 1FeFET1R structure through the first inverter;
S2.2. To realize the multi-bit inner product function, a high voltage is applied to the gates of all FeFETs simultaneously, exploiting the fact that a FeFET can implement AND: when the stored value is '0', the output is '0'; when the stored value is '1', the output is '1';
S2.3. To realize the multi-bit inner product function, the two inverters are turned off; to realize the XOR function, the two inverters are connected to the power supply and are in the working state, and a low voltage is applied simultaneously to the gates of the first N-1 FeFETs, i.e., '0' is stored in them.
The invention has the following beneficial effects:
The invention provides, for the first time, a unit based on a nonvolatile memory device that supports both multi-bit in-memory inner product and XOR, together with a vector composed of such units, and achieves better performance on three metrics: search energy consumption, search delay and area.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
Referring to FIGS. 1-6, a multi-bit in-memory inner product and XOR unit includes N 1FeFET1R structures 1 connected in parallel, an input transistor 2, a first inverter 3 and a second inverter 4, where N is a natural number greater than 1. Each 1FeFET1R structure 1 includes a FeFET 100 and a resistor 101, the resistor 101 being electrically connected to the drain or source of the FeFET 100. The resistor 101 of each 1FeFET1R structure 1 is electrically connected to the input transistor 2; the gate of the input transistor 2 is electrically connected to the gate of the FeFET 100 in one 1FeFET1R structure 1 through the first inverter 3; and the gate of the FeFET 100 in that 1FeFET1R structure 1 is electrically connected to the gate of the FeFET 100 in another 1FeFET1R structure 1 through the second inverter 4.
To form an (N+1)-bit inner product unit operating in the multi-bit in-memory inner product mode, only one 1FeFET1R structure 1 needs to be added to the N-bit structure, and the resistor 101 of the added 1FeFET1R structure 1 must give $2^{N}$ times or $2^{-1}$ times the saturated drain-source current. The resistance of the resistor 101 therefore differs between the 1FeFET1R structures 1, forming a memory cell whose output currents follow the binary-weighted series $2^{N-1}, 2^{N-2}, \dots, 2^{1}, 2^{0}$.
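The binary weighting can be illustrated with a minimal behavioural sketch. It assumes ideal branches in which a conducting branch carries a current inversely proportional to its resistance; the function names, the unit resistance and the unit voltage are hypothetical and serve only as an illustration.

```python
def branch_resistances(n_bits, r_msb=1.0):
    """Relative resistance of each branch, index 0 = least significant bit.

    Assumption: the most significant branch has the smallest resistance
    r_msb and each lower bit doubles it, giving binary-weighted currents.
    """
    return [r_msb * 2 ** (n_bits - 1 - i) for i in range(n_bits)]


def unit_current(stored_bits, resistances, v=1.0):
    """Sum of branch currents; stored_bits[i] is the bit of weight 2^i.

    Assumption: a branch storing '1' conducts v / R and a branch
    storing '0' conducts no current.
    """
    return sum(b * v / r for b, r in zip(stored_bits, resistances))


r = branch_resistances(4)
print(r)  # [8.0, 4.0, 2.0, 1.0]

# Stored value 0101 (decimal 5), listed least significant bit first,
# normalised to the current of the least significant branch.
print(unit_current([1, 0, 1, 0], r) / (1.0 / r[0]))  # 5.0
```

Extending the unit by one bit in this model only appends one more branch whose resistance is half that of the current most significant branch, or double that of the least significant one.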
The input transistor 2 operates in the linear region; the weight of a vector element is mapped to a voltage and input to the gate of the corresponding FeFET 100.
The first inverter 3 is used for inputting the complementary value.
The second inverter 4 is used for storing complementary values in the two corresponding FeFETs 100.
Referring to FIG. 2, the invention further provides a multi-bit in-memory inner product and XOR vector, which includes M multi-bit in-memory inner product and XOR units C as described above, connected in parallel to form a vector having M vector elements.
A method for operating the multi-bit in-memory inner product and XOR vector described above comprises:
S1. Each vector element of the stored vector is written into a multi-bit in-memory inner product and XOR unit as follows: each vector element of the stored vector is expressed in binary. For example, for an in-memory inner product unit with N = 4 bits, $W = w_3 w_2 w_1 w_0$, where the high-order bit $w_3$ has weight $2^3$ and the low-order bit $w_0$ has weight $2^0$. For each bit of the vector element to be written, if the bit is '1', a high voltage is applied to the gate of the corresponding FeFET so that the FeFET stores '1'; if the bit is '0', a low voltage is applied to the gate of the corresponding FeFET so that the FeFET stores '0'. At the same time, the complementary value is stored, through an inverter, in the other 1FeFET1R structure used for XOR.
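The bit weighting in S1 corresponds to the usual unsigned binary expansion. Written out, with the summation form added here only as a reading aid:

```latex
W = \sum_{i=0}^{N-1} w_i \, 2^{i}, \qquad w_i \in \{0, 1\}
% Example for N = 4:
W = 8 w_3 + 4 w_2 + 2 w_1 + w_0, \qquad \text{e.g. } 0101 \mapsto 4 + 1 = 5
```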
S2. After the vector elements of the stored vector have been written into the units, the following operations are performed when the vector is queried:
S2.1. Each vector element of the query vector is applied as a voltage to the gate of the input transistor of the corresponding unit; at the same time, the vector element of the query vector is passed to the corresponding 1FeFET1R structure through the first inverter.
S2.2. To realize the multi-bit inner product function, a high voltage is applied to the gates of all FeFETs simultaneously, exploiting the fact that a FeFET can implement AND: when the stored value is '0', the output is '0'; when the stored value is '1', the output is '1'.
S2.3. To realize the multi-bit inner product function, the two inverters are turned off. To realize the XOR function, the two inverters are connected to the power supply and are in the working state, and a low voltage is applied simultaneously to the gates of the first N-1 FeFETs (i.e., V3 to V1 in FIG. 2), i.e., '0' is stored in them, so that only the two rightmost 1FeFET1R structures of the unit operate.
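The two operating modes of S2.2 and S2.3 can be sketched behaviourally as follows. This is an idealised logical model rather than a circuit simulation: the function names are hypothetical, the query is treated as a plain number, and the XOR expression assumes the common two-branch form in which one branch combines the query with the complemented stored bit and the other combines the complemented query with the stored bit.

```python
def inner_product_mode(stored_bits, query):
    """Multi-bit inner product mode (both inverters off).

    stored_bits[i] is the stored bit of weight 2^i. Each FeFET is modelled
    as an AND gate: a stored '0' contributes nothing, a stored '1' passes
    the query scaled by the binary weight of its branch.
    """
    return sum(bit * (2 ** i) * query for i, bit in enumerate(stored_bits))


def xor_mode(stored_bit, query_bit):
    """XOR mode (both inverters powered, first N-1 FeFETs hold '0').

    Assumption: the two remaining branches evaluate
    (query AND NOT stored) OR (NOT query AND stored).
    """
    return (query_bit & (1 - stored_bit)) | ((1 - query_bit) & stored_bit)


# Stored 0101 (decimal 5, least significant bit first) times query 3.
print(inner_product_mode([1, 0, 1, 0], 3))  # 15

# Truth table over (stored, query) = (0,0), (0,1), (1,0), (1,1).
print([xor_mode(s, q) for s in (0, 1) for q in (0, 1)])  # [0, 1, 1, 0]
```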
Description of the unit application and the operation flow of the simulation architecture
The vector composed of multi-bit in-memory inner product and XOR units computes the input of the cosine calculation circuit. As shown in FIG. 1, taking N = 4 bits as an example, the transistors of the memory cells in the memory array are connected to form a vector containing M vector elements. The inner product result is copied by a current mirror as an input to the cosine calculation circuit, while the memory array on the right of FIG. 1 is used to calculate the $L_2$ norm for each cosine value, i.e., the denominator of the cosine expression. The output of the cosine calculation circuit is processed by a winner-take-all circuit to find the stored vector closest to the query vector in cosine distance. The expression of the cosine calculation circuit is as follows:
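For reference, the standard definition of cosine similarity between a query vector and a stored vector is sketched below. The symbols $q$, $s$ and the index $j$ are introduced here for illustration, and it is an assumption that the circuit evaluates a quantity equivalent to this form:

```latex
\cos(q, s) = \frac{\sum_{j=1}^{M} q_j \, s_j}
                  {\sqrt{\sum_{j=1}^{M} q_j^{2}} \; \sqrt{\sum_{j=1}^{M} s_j^{2}}}
```

In this form the numerator is the inner product and the denominator is the product of the two $L_2$ norms.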
The specific operation process of the multi-bit in-memory inner product and XOR unit is as follows:
1. Before the search begins, the stored vector is written into the multi-bit in-memory inner product and XOR units. Taking N = 4 bits as an example, V[3] to V[0] of each unit are written with $w_3$, $w_2$, $w_1$, $w_0$, respectively, while the complementary value is stored in the other 1FeFET1R through an inverter. '0' is written with a -4 V voltage pulse and '1' is written with a +4 V voltage pulse. After the vector elements have been written, the search process can begin.
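The write step can be sketched as a mapping from a stored value to gate pulses. The ±4 V levels come from the description above; the function name is hypothetical, and the choice that the complementary 1FeFET1R stores the inverse of the least significant bit $w_0$ is an assumption made for illustration.

```python
V_WRITE_ONE = 4.0    # volts, pulse that writes '1'
V_WRITE_ZERO = -4.0  # volts, pulse that writes '0'


def write_pulses(value, n_bits=4):
    """Return (pulses for V[n_bits-1]..V[0], pulse for the complementary branch).

    Assumption: the complementary 1FeFET1R receives the inverse of w_0.
    """
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit in n_bits")
    # Most significant bit first, matching the order V[3], V[2], V[1], V[0].
    bits = [(value >> i) & 1 for i in reversed(range(n_bits))]
    pulses = [V_WRITE_ONE if b else V_WRITE_ZERO for b in bits]
    complement = V_WRITE_ZERO if bits[-1] else V_WRITE_ONE
    return pulses, complement


# Stored value 0101: V[3]..V[0] receive -4, +4, -4, +4 V.
print(write_pulses(5))  # ([-4.0, 4.0, -4.0, 4.0], -4.0)
```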
2.1. During the search, when the unit operates in the multi-bit in-memory inner product mode, each 1FeFET1R used for the multi-bit inner product, i.e., V3 to V0 (FIG. 2), is written with a +4 V voltage pulse, i.e., '1' is written; at the same time, the query input is applied to the gate of the input transistor (FIG. 2), with a voltage between 0 V and 1.2 V selected according to the value of the input vector element.
2.2. During the search, when the unit operates in the XOR mode, the first N-1 1FeFET1R structures used for the multi-bit inner product, i.e., V3 to V1 (FIG. 2), are written with a -4 V voltage pulse, i.e., '0' is written.
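The selection of the input-transistor gate voltage in step 2.1 can be sketched as follows. Only the 0 V to 1.2 V range is taken from the description; the linear mapping, the function name and the `max_element` parameter are assumptions made for illustration.

```python
V_MIN = 0.0  # volts, lower end of the input range
V_MAX = 1.2  # volts, upper end of the input range


def query_voltage(element, max_element):
    """Map a query vector element to a gate voltage in [V_MIN, V_MAX].

    Assumption: the mapping is linear in the element value.
    """
    if max_element <= 0 or not 0 <= element <= max_element:
        raise ValueError("element must lie in [0, max_element]")
    return V_MIN + (V_MAX - V_MIN) * element / max_element


print(query_voltage(0, 15))  # 0.0
```

A real design would calibrate this mapping against the transfer characteristic of the input transistor in its linear region.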
The function and effect of the invention are further illustrated and shown by the following simulation experiments:
1. Simulation conditions
In the experiments, a memory array consisting of 1FeFET1R memory cells was simulated using a physics-based circuit model compatible with SPECTRE and SPICE, in which the FeFET is based on the Preisach model. The model enables efficient design and analysis and is widely used in FeFET circuit design. PTM45-HP was used as the simulation model for the remaining transistors.
The simulation architecture is shown in FIG. 1, which implements one application from an artificial intelligence scenario: nearest-neighbor search based on cosine distance. The principle is to find the stored vector closest to the input vector in cosine distance. The memory cell (denoted by C) in FIG. 1 is the multi-bit in-memory inner product and XOR unit of the invention; FIG. 2 shows the multi-bit in-memory inner product and XOR unit for N = 4 bits.
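The search principle can be sketched in software as follows. This is an illustrative stand-in for the analog architecture, not a model of it: the function names are hypothetical, and the winner-take-all stage is represented simply by picking the largest score.

```python
import math


def cosine(query, stored):
    """Cosine similarity: inner product divided by the product of L2 norms."""
    dot = sum(a * b for a, b in zip(query, stored))
    norm = math.sqrt(sum(a * a for a in query)) * math.sqrt(sum(b * b for b in stored))
    return dot / norm if norm else 0.0


def nearest_stored_vector(query, stored_vectors):
    """Index of the stored vector with the largest cosine similarity.

    Taking the maximum here plays the role of the winner-take-all circuit.
    """
    scores = [cosine(query, s) for s in stored_vectors]
    return max(range(len(scores)), key=scores.__getitem__)


stored = [[15, 0, 0, 0], [1, 2, 3, 4], [4, 3, 2, 1]]
# The query is a scaled copy of stored[1], so index 1 is the closest.
print(nearest_stored_vector([2, 4, 6, 8], stored))  # 1
```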
2. Simulation results
(1) For the multi-bit in-memory inner product and XOR unit shown schematically in FIG. 2, with currents at the nanoampere level, SPECTRE simulation shows that the ratio $R_0 : R_1 : R_2 : R_3$ is 8:4:2:1.
(2) The abscissa of FIG. 3(a) is the voltage input to the gate of the transistor in the multi-bit in-memory inner product and XOR unit, which is a continuous value; the curves from bottom to top correspond to 0000 through 1111. FIG. 3(b) shows the results obtained when FeFET process errors (taken from non-patent document 1: T. Soliman et al., "Ultra-low power flexible precision FeFET based analog in-memory computing", IEEE IEDM, 2020), large resistance errors (taken from non-patent document 2: D. Saito et al., "Analog in-memory computing in FeFET-based 1T1R array for edge AI applications", IEEE Symposium on VLSI Circuits, 2021) and transistor errors are considered, i.e., by default a 10% magnitude error and a 10% threshold voltage error. The abscissa of FIG. 3(a) is the voltage input to the gate of the transistor in the multi-bit in-memory inner product and XOR unit, and the curves from bottom to top correspond to the stored values 0001 ($1_{10}$), 0011 ($3_{10}$), 0101 ($5_{10}$), 0111 ($7_{10}$), 1001 ($9_{10}$), 1011 ($11_{10}$), 1101 ($13_{10}$) and 1111 ($15_{10}$). The results show that the inner product of the invention is highly accurate: when the inner product results differ by more than 2 and the result range is large enough (input greater than 0101 ($5_{10}$) and a voltage of about 0.5 V), no overlap occurs in operation.
(3) Energy consumption and delay:
The results of the invention are compared with the SRAM-based multi-bit inner product and XOR cell proposed in non-patent document 3 (M. Ali et al., "IMAC: In-Memory Multi-Bit Multiplication and ACcumulation in 6T SRAM Array", IEEE TCAS-I, 2020). FIG. 6 shows the results obtained when the number of vectors and the vector dimension are scaled, respectively; the invention achieves a reduction of more than $10^4$ times in the energy consumption of each multi-bit in-memory inner product and XOR unit, and a 4.67-fold reduction in output delay.
(4) Area consumption:
Compared with non-patent document 3 above, the area consumption of the invention is significantly reduced, mainly because it uses a new nonvolatile memory device, the FeFET, and is simpler in design than a conventional SRAM. For a single multi-bit in-memory inner product and XOR unit, the area of the invention is 488 times smaller than that of non-patent document 3 (SRAM: 64.9 μm²/cell; the invention: 0.133 μm²/cell).
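The stated factor follows directly from the two cell areas:

```latex
\frac{64.9\ \mu\mathrm{m}^2/\text{cell}}{0.133\ \mu\mathrm{m}^2/\text{cell}} \approx 488
```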
(5) Scalability:
FIG. 4 extends the multi-bit in-memory inner product and XOR unit from N = 4 bits to N = 6 bits and shows that, in the worst case, the result is off by only 1; specifically, for the unit with N = 6, the simulation results indicate that 000111 ($7_{10}$) and 001000 ($8_{10}$) can produce the same current.
FIG. 5 shows that reducing the resistance of the 1FeFET1R and increasing the operating current increases the difference between the currents flowing out of the branches of the multi-bit in-memory inner product and XOR unit; this indicates that the scalability of the invention is further improved in applications where the current magnitude need not be limited, such as Hamming distance calculation.
The above embodiments are intended to illustrate rather than limit the invention; any modifications and variations made to the invention within the spirit of the invention and the scope of the claims fall within the scope of protection of the invention.