
WO2025052292A1 - Systems and methods for providing and using multi-level non-volatile memory elements - Google Patents

Systems and methods for providing and using multi-level non-volatile memory elements

Info

Publication number
WO2025052292A1
Authority
WO
WIPO (PCT)
Prior art keywords
volatile
binary
elements
cell
conductance
Prior art date
Legal status
Pending
Application number
PCT/IB2024/058647
Other languages
French (fr)
Inventor
Dmitry Leshchiner
Denis POTAPOV
Konstantin ZVEZDIN
Current Assignee
Spinedge Ltd
Original Assignee
Spinedge Ltd
Priority date
Filing date
Publication date
Application filed by Spinedge Ltd filed Critical Spinedge Ltd
Publication of WO2025052292A1 publication Critical patent/WO2025052292A1/en


Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C 11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C 11/54 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using elements simulating biological cells, e.g. neuron
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/065 Analogue means
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C 11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C 11/02 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements
    • G11C 11/16 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements using elements in which the storage effect is based on magnetic spin effect
    • G11C 11/165 Auxiliary circuits
    • G11C 11/1659 Cell access
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C 11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C 11/18 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using Hall-effect devices
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C 11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C 11/56 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using storage elements with more than two stable states represented by steps, e.g. of voltage, current, phase, frequency
    • G11C 11/5607 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using storage elements with more than two stable states represented by steps, e.g. of voltage, current, phase, frequency using magnetic storage elements
    • H ELECTRICITY
    • H10 SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
    • H10B ELECTRONIC MEMORY DEVICES
    • H10B 61/00 Magnetic memory devices, e.g. magnetoresistive RAM [MRAM] devices
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C 11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C 11/02 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements
    • G11C 11/16 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements using elements in which the storage effect is based on magnetic spin effect
    • G11C 11/165 Auxiliary circuits
    • G11C 11/1673 Reading or sensing circuits or methods
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C 11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C 11/02 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements
    • G11C 11/16 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements using elements in which the storage effect is based on magnetic spin effect
    • G11C 11/165 Auxiliary circuits
    • G11C 11/1675 Writing or programming circuits or methods

Definitions

  • the disclosure herein relates to systems and methods for providing and using multi-level non-volatile memory elements.
  • the disclosure relates to an apparatus and methods for a reliable and efficient representation of neural network weights by resistive elements within electrical circuits.
  • Neural networks require large computational hardware resources which are both costly and energy inefficient.
  • a more cost effective alternative may be to use a low-powered analogue approximation device to perform approximate neural network inference.
  • for an approximation device to be useful, it must provide an approximate inference result having sufficient accuracy for practical purposes.
  • each neural network weight could be represented by a single nonbinary cell element having multiple resistance levels.
  • binary cell elements remain both the most common and the most advanced type of memory available. The task of representing multi-level neural network weights by individual binary cell elements therefore requires a solution.
  • a system is introduced for providing non-volatile multilevel memory cells.
  • an apparatus is taught for approximating neural network inference using such non-volatile multilevel memory cells to represent neural network weights.
  • the non-volatile multilevel memory cell comprises a set of non-homogeneous non-volatile binary cell elements, each comprising a variable resistive element and a corresponding switching mechanism.
  • the variable resistive element is switchable between at least a first stable state having a higher conductance and a second stable state having a lower conductance, and the switching mechanism is configured to select between the first stable state and the second stable state of the corresponding variable resistive element without changing the state of any of the other non-homogeneous non-volatile binary cell elements of the multilevel memory cell.
  • the difference between the higher conductance of the first stable state and the lower conductance of the second stable state of each non-volatile binary cell element in the set of non-homogeneous non-volatile binary cell elements represents a characteristic conductance gap distinct from other members of the set.
  • the characteristic conductance gap of each non-homogeneous non-volatile binary cell is determined by at least one characteristic physical property of the non-homogeneous non-volatile binary cell.
  • the set of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements wherein each binary cell member of the sequence has twice the conductance gap of the previous binary cell element in the sequence.
  • the characteristic conductance gap of each non-homogeneous non-volatile binary cell is determined by a characteristic size of the non-homogeneous non-volatile binary cell.
  • the variable resistive element may comprise a magnetic tunnel junction comprising an insulating layer, for example a metal oxide layer such as an MgO barrier, sandwiched between a reference layer of ferromagnetic material, such as FeCoB, and a free layer of ferromagnetic material, such that the magnetic tunnel junction is switchable between a parallel stable state having a higher conductance and an antiparallel stable state having a lower conductance.
  • the magnetic tunnel junction of each non-volatile binary cell element in the set of non-homogeneous non-volatile binary cell elements has a characteristic area distinct from the areas of the magnetic tunnel junctions of the other non-volatile binary cell elements of the set.
  • the set of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements wherein the magnetic tunnel junction of each binary cell member of the sequence has twice the area of the magnetic tunnel junction of the previous binary cell element in the sequence.
  • the switching mechanism of the magnetic tunnel junction comprises a Spin-Orbit Torque mechanism comprising a source line MOSFET connected to the free layer of the magnetic tunnel junction and a bit line MOSFET connected to the reference layer of the magnetic tunnel junction, a read line connected to the gate terminal of the bit line MOSFET and a write line connected to the gate terminal of the source line MOSFET.
  • Various other switching mechanisms may be used for the magnetic tunnel junction such as a Spin Transfer Torque mechanism comprising a source line MOSFET connected to the reference layer of the magnetic tunnel junction and a write line connected to the gate terminal of the source line MOSFET.
  • the switching mechanism may be a Voltage-Controlled Magnetic Anisotropy mechanism comprising a source line MOSFET connected to the free layer of the magnetic tunnel junction and a voltage control circuit configured to determine an electrical field direction across the free layer of ferromagnetic material.
  • the non-volatile multilevel memory cell further comprises an input line and an output line configured to pass current through all the individual non-homogeneous non-volatile binary cell elements in the set.
  • the overall conductance of the set of non-homogeneous non-volatile binary cell elements may be selected to represent a numerical value, such as a single neural network weight value, when current passes through the input line and the output line.
  • all the non-homogeneous non-volatile binary cell elements of the set may be connected in parallel between a common input line and a common output line such that the total conductance of the memory cell equals the sum of the individual conductances of all the non-homogeneous non-volatile binary cell elements in the set.
  • each non-homogeneous nonvolatile binary cell of the memory cell may represent a different bit of a multi-bit binary value.
  • each non-homogeneous non-volatile binary cell of the memory cell may represent a different bit of a multi-bit binary value and the characteristic conductance gap of each non-volatile binary cell element may be selected such that the overall conductance gap of the memory cell represents the multi-bit value.
  • the memory cell may comprise a sequence of binary cell elements wherein the value assigned to each binary cell member of the sequence is twice the value assigned to the previous binary cell element in the sequence.
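  • by way of a non-limiting illustration, the behaviour of such a binary-weighted parallel cell may be sketched in Python as follows; the function name, the conductance values and the assumption of zero OFF-state conductance are illustrative only and are not taken from the disclosure:

      # Sketch of a multilevel cell built from non-homogeneous binary elements
      # connected in parallel; conductance values are illustrative placeholders.
      def total_conductance(bits, g_high, g_low):
          """Sum per-element conductances: g_high[k] if bit k is ON, else g_low[k]."""
          return sum(gh if b else gl for b, gh, gl in zip(bits, g_high, g_low))

      g_high = [8.0, 4.0, 2.0, 1.0]   # ON conductances in proportion 8:4:2:1 (MSB first)
      g_low = [0.0, 0.0, 0.0, 0.0]    # idealized OFF conductances
      bits = [1, 0, 1, 1]             # encodes the value 8 + 2 + 1 = 11
      v_in = 0.1                      # input voltage (arbitrary units)
      i_out = v_in * total_conductance(bits, g_high, g_low)   # Ohm's law readout
      print(i_out)                    # ~1.1 (arbitrary units)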
  • the set of non-homogeneous non-volatile binary cell elements may be arranged on one or more memory crossbars.
  • the set of non-homogeneous non-volatile binary cell elements is configured to represent a multi-bit binary value having a number of bits equal to the total number of resistive elements on the plurality of memory crossbars.
  • the non-homogeneous non-volatile binary cells of the set are sufficiently spaced apart that parasitic switching of neighboring variable resistive elements is prevented during targeted switching of a single variable resistive element.
  • the variable resistive element may comprise a magnetic tunnel junction and the non-homogeneous non-volatile binary cells of the set are sufficiently spaced apart that parasitic switching by magnetostatic excitation of neighboring magnetic tunnel junctions is prevented during targeted switching of a single magnetic tunnel junction.
  • Another aspect of the disclosure is to introduce an apparatus for approximating neural network inference comprising at least one non-volatile multilevel memory cell configured to represent a neural network weight.
  • the characteristic conductance gap of each non-homogeneous non-volatile binary cell element is selected such that the total conductance gap of the memory cell represents the neural network weight.
  • the non-homogeneous non-volatile binary cell elements of the set may be connected in parallel between a common input line and a common output line such that the total conductance of the memory cell equals the sum of the conductances of all the individual non-homogeneous non-volatile binary cell elements in the set. Accordingly, an input voltage applied across the common input and common output lines generates a current through the output line higher than a base current by an amount equal to the product of the input voltage difference and the total conductance gap.
  • the method may include measuring at least one physical property of each multilevel memory cell; and applying a digitalization procedure for each multilevel memory cell, the digitalization procedure depending upon the at least one physical property.
  • the method includes adjusting the ADC range based upon a set of samples of neural network input data while the normalization parameters remain unchanged. Additionally or alternatively, the method includes adjusting the normalization parameters based upon a set of samples of neural network input data while the ADC range remains unchanged. Additionally or alternatively, again, the method includes adjusting the ADC range and the normalization parameters jointly based upon a set of samples of neural network input data.
  • a method for approximating a multiply-accumulate operation for neural network inference, or the value of a dot product of a first vector and a second vector, may include providing a weighting apparatus comprising a sequence of adjustable multilevel memory cells, each adjustable multilevel memory cell connected between a cell input line and a common output line; programming the weighting apparatus such that each adjustable multilevel memory cell has a conductivity representing a single neural network weight value; providing an activation vector comprising a sequence of activation values; providing input voltage signals to each cell input line of the weighting apparatus representing a corresponding activation value from the activation vector; and measuring the current through the common output line.
  • the method for approximating a value of a dot product of a first vector and a second vector may include providing the first vector comprising a sequence of first vector values; providing the second vector comprising a sequence of second vector values; providing a weighting apparatus comprising a sequence of adjustable multilevel memory cells, each adjustable multilevel memory cell connected between a cell input line and a common output line; programming the weighting apparatus such that each adjustable multilevel memory cell has a conductivity representing a single first vector value from the first vector; providing input voltage signals to each cell input line of the weighting apparatus representing a corresponding second vector value from the second vector; and measuring the current through the common output line.
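  • a non-limiting simulation sketch of the multiply-accumulate readout described in the two items above follows; the weight and activation values are illustrative and the function name is not taken from the disclosure:

      # Each programmed cell conductance multiplies its input voltage; the
      # resulting currents sum on the common output line (Kirchhoff's current law).
      def mac_readout(conductances, voltages):
          return sum(g * v for g, v in zip(conductances, voltages))

      weights = [3.0, 1.0, 4.0]       # programmed cell conductances (first vector)
      activations = [0.2, 0.5, 0.1]   # input voltages (second vector)
      print(mac_readout(weights, activations))   # 0.6 + 0.5 + 0.4 = ~1.5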
  • a method for encoding a numerical value in a multibit format to be represented by a multilevel memory cell. Such a method may include: measuring at least one resistive state of the multilevel memory cell; defining a digitalization function, the digitalization function depending upon the higher conductance value and the lower conductance value of each said variable resistive element; and applying the digitalization function to the numerical value. Additionally or alternatively, the numerical value is selected from a set of numerical values and the method further comprises scaling each numerical value of the set.
  • the multilevel memory cell comprises a sequence of binary cell elements, each binary cell element comprising a variable resistive element which is switchable between a first stable state having a higher conductance and a second stable state having a lower conductance.
  • the step of measuring at least one resistive state of the multilevel memory cell comprises: measuring the higher conductance value of each binary cell element; and measuring the lower conductance value of each binary cell element.
  • the scaling comprises: determining a lowest numerical value in the set of numerical values; determining a highest numerical value in the set of numerical values; defining a scaling function, the scaling function depending upon the lowest numerical value and the highest numerical value; and applying the scaling function to each numerical value in the set.
  • Fig. 1 schematically represents a multilevel memory cell including a set of non-homogeneous binary cell elements, according to the presently disclosed subject matter
  • Fig. 2 schematically represents a sequence of non-homogeneous binary cell elements connected in parallel, according to the presently disclosed subject matter
  • Fig. 3A is a schematic perspective view of a Spin-Orbit Torque memory write mechanism which does not pass the current through a tunneling barrier, according to the present invention
  • Fig. 3B is a schematic perspective view of a Spin Transfer Torque write mechanism
  • Fig. 4A schematically represents a multilevel memory cell including a set of perpendicular magnetic anisotropy SOT cells
  • Fig. 4B schematically represents a multilevel memory cell including a set of in-plane magnetic anisotropy SOT cells
  • Fig. 5 schematically represents an example of a multilevel memory element architecture for controlling a set of binary cell elements using SOT cells
  • Fig. 6A is a flowchart indicating selected steps of a digitalization procedure for mapping numerical weight values to multilevel memory cells
  • Fig. 6B is a flowchart indicating selected steps of a method for scaling a set of numerical weight values for a multilevel memory cell having a given number n_bit of binary cell elements;
  • Fig. 6C is a flowchart indicating selected steps of a method for associating binary digits with actual conductance levels of binary cell elements of a multilevel memory cell;
  • Fig. 6D is a flowchart indicating selected steps of a method for mapping numerical values to binary digits
  • Fig. 7A is a graph showing how inference accuracy degradation relates to the weight value digitalization method chosen for a neural network
  • Fig. 7B is a table showing how inference accuracy degradation relates to the weight value digitalization method chosen for a neural network
  • Fig. 8 is a graph of the maximal value of an output neuron signal relative to the expected exact neuron output signal (vertical axis), depending on the ratio r/R (different curve shades) of conductance of connection lines to conductance of memory elements, and on the size (horizontal axis) of the neuron layer modeled;
  • Fig. 9A is a diagram of a straight wire crossbar array connection scheme
  • Fig. 9B is a schematic view of a diagram of a balanced analog crossbar array connection scheme, according to the present invention.
  • Fig. 10A is a schematic perspective view of a crossbar connection lines scheme, according to the present invention.
  • Fig. 10B is a schematic perspective view of an overall 3D structure of the crossbar with the connection lines made as binary balanced trees, according to the present invention.
  • Fig. 11A is a graph of the inference accuracy in % for a Modified National Institute of Standards and Technology (MNIST) task depending on the number of bits, and on the size of the neuron layer modeled, according to the present invention
  • Fig. 11B is a graph of the inference accuracy in % for a Street View House Numbers (SVHN) task depending on the number of bits (different curve shades), and on the size of the neuron layer modeled, according to the present invention.
  • Fig. 11C is a graph of the inference accuracy in % for a Canadian Institute For Advanced Research (CIFAR10) task depending on the number of bits (different curve shades), and on the size of the neuron layer modeled, according to the present invention.
  • aspects of the present disclosure relate to systems and methods for providing multi-level non-volatile memory elements.
  • the disclosure relates to an apparatus and methods for a reliable and efficient representation of neural network weights by resistive elements within electrical circuits.
  • a neural network approximation device which may receive an activation vector as an input and generate an output vector of sufficient accuracy.
  • the multi-level non-volatile memory elements which enable the system typically include a sequence of non-volatile binary cell elements which are non-homogeneous and which are characterized by having distinctive physical features such that each binary cell element of the sequence represents a particular bit in a multi-bit value.
  • a set of adjustable multi-level non-volatile memory cells may be programmable to encode, for example, a row of weight values from a weight matrix associated with a layer of neural network weights.
  • Methods are also described for making approximate neural network inference employing an individual adjustment for the ADC ranges and the output normalization parameters, and to use the proposed apparatus for computation of dot products of input vectors at intermediate layers for the approximate neural network inference.
  • one or more tasks as described herein may be performed by a data processor, such as a computing platform or distributed computing system for executing a plurality of instructions.
  • the data processor includes or accesses a volatile memory for storing instructions, data or the like.
  • the data processor may access a non-volatile storage, for example, a magnetic hard-disk, flash-drive, removable media or the like, for storing instructions and/or data.
  • Each non-homogeneous non-volatile binary cell element may be a variable resistive element and may be switchable between two stable states. Accordingly, each of the binary cell elements may represent a binary digit or bit and the two stable states may be used to indicate if the bit is ON or OFF.
  • a first stable state having a higher conductance indicates that the bit represented by the binary cell element is ON and a second stable state having a lower conductance indicates that the bit represented by the binary cell element is OFF.
  • a targeted switching mechanism may be provided for each binary cell element configured to select between the first stable state and the second stable state of only the associated binary cell element without changing the state of any other non-homogeneous non-volatile binary cell element of the multilevel binary cell element.
  • the binary cell elements 10A-D are non-homogeneous.
  • the physical features of the binary cell elements are not identical to one another. Accordingly, the difference between the higher conductance of the first stable state and the lower conductance of the second stable state of each non-volatile binary cell element is distinctive.
  • the distinctive conductive gap of each binary cell element may therefore be used to characterize the binary cell element and the bit it represents.
  • Such an apparatus may be programmed to represent a single neural network weight by selecting bit values for each of the binary cell elements such that the multilevel binary cell element stores a digital value indicative of the neural network weight.
  • current may be passed from a common input line to a common output line via the multi-level binary cell element.
  • where the binary cell elements are connected in parallel between the common input line and the common output line, the overall conductance gap of the multilevel memory element may represent the neural network weight.
  • an array of such multi-level binary cell elements may be arranged into a crossbar to make the approximate neural network inference.
  • the multilevel memory element may represent a single neural network weight value by means of a multi-bit binary representation for the values of the neural network weights.
  • the multilevel memory element may represent separate bits of the multi-bit binary representation for the values of the neural network weights with separate binary cell elements of the plurality of non-homogeneous non-volatile binary cell elements.
  • the multi-bit states representation is provided by a group of cells that were subject to application of distinct voltage patterns or distinct output processing procedures, so that different outputs can be weighted according to the significance of the bit they represent before they are combined.
  • in PCT/IL2023/050328 “Apparatus and methods for approximate neural network inference”, the multibit representation is provided by several separate binary cell elements either located within a single memory crossbar or distributed over several separate crossbars. All these prior art solutions rely on the use of homogeneous differential cells of identical type and properties to represent the different bits of the multi-bit weight value.
  • a group of non-homogeneous cells of distinct conductance characteristics and/or geometry may be coupled together to reproduce a single neural network weight value represented by their overall combined conductance level.
  • the non-volatile binary cell elements of the apparatus may represent separate bits of the multi-bit binary representation for the values of the neural network weights corresponding to each nonvolatile binary cell element’s characteristic conductance gap, which is a difference in overall conductance between the two stable states of the cell.
  • the conductance gap corresponds to the difference between the current levels flowing through the cell in the open and the closed states, at the same voltage.
  • Fig. 2 schematically represents a set of n non-volatile binary cell elements BCE1-n of the multilevel binary cell element connected between a common input line and a common output line. An input voltage VIN may be applied across the multi-level binary cell element such that current passes through all non-volatile binary cell elements in parallel.
  • a given voltage VIN between input line and output line may represent an input signal for example representing an activation value for neural network inference.
  • the currents I1, I2, I3, I4, ... In running through each of the non-volatile binary cell elements depend upon each non-volatile binary cell element's conductance.
  • the total current flowing through the memory element IOUT is the product of the input voltage and the total conductance of the memory element. Therefore, where the total conductance of the multilevel binary cell element represents a weight value of a particular neuron in a neural network layer, the total resulting current flowing through the output line may represent an output inference signal of the neural network layer.
  • the sum of the currents flowing through all the binary cell elements equals the current flowing through the output line.
  • the bit value represented by a given binary cell element by means of the binary cell element’s selected conductive state, affects the output current (at a given voltage) with the weight proportional to the cell's conductance gap.
  • the conductance gap depends on the physical properties of the binary cell element, for example its parameters, its geometry, its dimensions and size.
  • the values of conductance gap levels may depend on the different sizes of the binary cell element.
  • a binary cell element having exactly uniform resistive properties across its full area will have a conductance gap level exactly proportional to the total area of the binary cell element. It is noted that such a construction would be compatible with a manufacturing process in which the binary cell element area is changed and the binary cell element thickness is maintained constant.
  • the apparatus may use conductance gap levels reproducing the value scales in proportion 1:2:4:8 etc., from the lower to the higher bit. That way, it directly reproduces the pertinent scales for the corresponding bits of the neural network weight multi-bit binary representation.
  • Multilevel memory cells of the invention may be realized with a variety of constituent binary cell elements to suit requirements.
  • the multilevel memory cells may be implemented using MTJ (Magnetic Tunnel Junction) technology.
  • Magnetic tunnel junctions are devices in which two ferromagnet layers are separated by an insulating layer sufficiently thin that quantum tunneling enables electrons in one ferromagnet to cross to the other. It is more likely that an electron will cross the insulator when the directions of magnetization in both ferromagnetic layers are parallel. Accordingly, an MTJ has two distinct stable resistance states. It is possible to switch between these stable resistance states by keeping a constant direction of magnetization in one ferromagnetic layer, known as the reference layer, and controlling the direction of magnetization in the other, known as the free layer.
  • the insulating layer may be a metal oxide layer such as an MgO barrier, and the free layer may be of a ferromagnetic material such as FeCoB. Such a magnetic tunnel junction may be switchable between a parallel stable state having a higher conductance and an antiparallel stable state having a lower conductance.
  • Various switching mechanisms can be used to control magnetic tunnel junctions, such as Spin-Orbit Torque (SOT), Spin Transfer Torque (STT), Voltage-Controlled Magnetic Anisotropy (VCMA) and the like.
  • in an SOT mechanism, a source line MOSFET is connected to the free layer and a bit line MOSFET is connected to the reference layer of the magnetic tunnel junction, with a read line connected to the gate terminal of the bit line MOSFET and a write line connected to the gate terminal of the source line MOSFET.
  • in an STT mechanism, a source line MOSFET is connected to the reference layer of the magnetic tunnel junction and a write line is connected to the gate terminal of the source line MOSFET.
  • a VCMA mechanism comprises a source line MOSFET connected to the free layer of the magnetic tunnel junction and a voltage control circuit configured to determine an electrical field direction across the free layer of ferromagnetic material.
  • while any of these switching mechanisms may be used to control the MTJ, it is particularly noted that higher resistance memory elements might be controlled by SOT (Spin-Orbit Torque) switching mechanisms.
  • the resistance levels of SOT MTJ cells/memory elements may reach 1 MΩ or more, while for other types of MTJ memory cells, the resistance normally does not exceed 10 kΩ.
  • the main way to increase the resistance level of the SOT cell/memory elements is by thickening the tunneling barrier, which is made possible because the memory write mechanism of the SOT MTJ does not pass the current through the tunneling barrier, so that high voltages are not required for the write mechanism of the SOT MTJ even when the resistance of the tunneling barrier is very high.
  • the specific properties making memory elements suitable for the apparatus described herein may not be appropriate for use when the memory elements are used for digital memory storage. In some examples, this involves using an element size larger than that expected for digital memory storage (45 nm or larger); nevertheless, the larger size may enable a low power readout. In other examples, a significant percentage of the memory elements used are not usable for memory storage but are still usable as part of a structure of memory elements to perform the reliable instant analogue approximation of the neural network layer output.
  • Computer simulation shows cases when up to 5% of non-functional cells could be tolerated. In yet other examples, this involves the use of memory elements with a significant percentage of read errors, while the ensemble of the memory elements is still usable to perform the reliable instant analogue approximation of the neural network layer output. Computer simulation shows cases when up to 1% of read errors could be tolerated. In still another example, this involves the use of a relatively short memory retention time, while the ensemble of the memory elements is still usable to perform the reliable instant analogue approximation of the neural network layer output within the specified retention time frame. For certain applications, the energy barrier for the analogue approximation memory elements could be lowered to provide a memory retention time as low as 24 hours or even less, depending on the relevant device requirements.
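  • a hedged sketch of the kind of simulation experiment alluded to above is given below; the assumption that a non-functional cell contributes no current, and all numerical values, are illustrative only and are not taken from the disclosure:

      # Randomly disable a fraction of cells and compare the analogue MAC
      # output against the exact dot product.
      import random

      def faulty_mac(weights, activations, dead_fraction):
          return sum(0.0 if random.random() < dead_fraction else g * v
                     for g, v in zip(weights, activations))

      random.seed(0)
      w = [random.uniform(0.0, 1.0) for _ in range(1024)]
      x = [random.uniform(0.0, 1.0) for _ in range(1024)]
      exact = sum(g * v for g, v in zip(w, x))
      approx = faulty_mac(w, x, dead_fraction=0.05)   # 5% non-functional cells
      print(abs(approx - exact) / exact)              # relative output error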
  • Fig. 3 illustrates an example in which non-homogeneous MTJ (magnetic tunnel junction) binary cell elements are used.
  • the multilevel binary cell element may include a set of four circular-shape perpendicular magnetic anisotropy spin-orbit torque (SOT) MTJ cells, for example, arranged within a square-shaped area.
  • Fig. 4 illustrates an alternative example of a multilevel binary cell element in which a set of four elliptic-shape in-plane magnetic anisotropy spin-orbit torque (SOT) MTJ cells are arranged within a square-shaped area.
  • SOT magnetic tunnel junction
  • the plurality of non-homogeneous non-volatile binary cell elements of variable resistance may be implemented with MTJ (magnetic tunnel junction) cells.
  • MTJ magnetic tunnel junction
  • a multilevel binary cell element may be provided having a set of four circular-shaped perpendicular magnetic anisotropy MTJ cells, in which the conductance levels are set using the SOT (spin-orbit torques) to provide a 4-bit device having 16 conductance levels.
  • SOT spin-orbit torques
  • the sizes of the binary cell elements involved may be selected to have areas in proportion 8:4:2:1,
  • the maximum permitted diameter depends upon on the stability of the magnetic domain structure.
  • the distance between SOT cells within one multilevel binary cell element may be of the order of 100 nanometers or less and should be selected to ensure that there is no parasitic switching of adjacent cells during the MTJ cell targeted switching.
  • binary cell elements having diameters of 140 nm, 99 nm, 70 nm, and 49.5 nm, such that their areas are in proportion 8:4:2:1, may be packed into a multi-level binary cell element of 350 nm × 350 nm.
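  • the quoted diameters follow from the fact that the area of a circular cell scales as the square of its diameter, so doubling the area multiplies the diameter by the square root of two; a short check (the relation below is an editorial restatement, not a formula from the disclosure):

      \[ A_k \propto d_k^2 \quad\Rightarrow\quad d_{k+1} = \sqrt{2}\, d_k \]
      \[ 49.5\sqrt{2} \approx 70\ \mathrm{nm}, \qquad 70\sqrt{2} \approx 99\ \mathrm{nm}, \qquad 99\sqrt{2} \approx 140\ \mathrm{nm} \]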
  • a set of four in-plane magnetic anisotropy based SOT cells having oblong shapes, such as near-rectangular with rounded corners or elliptically shaped, for example with an approximate 3:1 axis ratio, may be configured to function together as a 4-bit multilevel binary cell element having 16 different conductance levels.
  • the sizes of the binary cell elements involved may be selected to have areas in proportion 8:4:2:1; by way of example, binary cell elements may be provided having dimensions of 420×140 nm, 297×99 nm, 210×70 nm, and 148.5×49.5 nm.
  • a distance of the order of 200 nanometers or less between SOT cells may ensure that there is no parasitic switching of adjacent cells during the MTJ cell targeted switching.
  • Such oblong binary cell elements could be packed into a multi-level binary cell element of at least 600 nm × 600 nm.
  • FIG. 5 shows an example of a multilevel memory element architecture in which a set of binary cell element SOT cells 510A-D are connected in parallel between a common input line 520 and a common output line 530.
  • the architecture further includes a pair of targeted control lines for each SOT cell via which each binary cell element may be separately switched between its two stable states.
  • the architecture of the multilevel memory element may include an individual pair of cell control lines 515A-D, 517A-D connected to each SOT cell individually for controlling the process of switching between the states of the SOT cell.
  • the control lines may be connected to auxiliary transistors configured to control the open and closed states of connection lines to each SOT cell individually.
  • connections may in some examples be realized within a conventional scheme of the crossbar matrix.
  • the multi-bit representation of the value for a given neural network’s weights matrix may, in some examples, be provided by a composite memory element including several separate multilevel memory elements which may be connected to a single memory crossbar or alternatively distributed across more than one crossbar.
  • the total number of bits in the composite memory element is equal to the sum of the numbers of bits in all the constituent multilevel memory elements.
  • a composite memory element comprising two 4-bit resistive elements each with 16 conductance levels would result in a total number of 8 bits, with 256 levels represented accordingly.
  • the appropriateness of the outcome of using the apparatus for representation and/or reproduction of the value scales for neural network weights may be determined by the accuracy of approximate neural network inference staying within the limits required by the task addressed by the neural network.
  • the appropriateness of the outcome of using the apparatus could be verified and asserted, in particular, by the inference simulation modeling. Accordingly, methods described herein may be tested by measuring the resulting accuracy of the approximate neural network inference.
  • a digitalization procedure converts a precise numerical value into a digital greyscale approximation which can be expressed as a binary number with the number of digits available.
  • a 4-bit memory element may be used to represent any of the 16 integer values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15.
  • the digitalization procedure may map a numerical value to a 4-bit approximation by segmenting a range of expected values into 16 subrange segments each and selecting a value from 0-15 indicating the segment in which the numerical value is found.
  • n-bit approximation may be achieved by segmenting a range of expected values into 2^n subrange segments.
  • various schemes may be used to determine how the range is segmented.
  • One simple example is the direct transmission of the bits of digital binary representation of the neural network weight into the binary values for the binary cell elements involved.
  • the weight of 0.5 may be represented by an 8-bit integer (INT8) value 128, which would produce a digital binary representation of 10000000, and the sequence of binary values (1,0,0,0,0,0,0,0) for the binary cell elements representing the bits of the corresponding 8-bit value, from the higher to the lower bit.
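  • a non-limiting sketch of this direct transmission scheme follows; the scaling of a weight in [0, 1) to an 8-bit integer and the function name are illustrative assumptions, not taken from the disclosure:

      def direct_bits(weight, n_bits=8):
          code = int(weight * (1 << n_bits))     # 0.5 -> 128
          code = min(code, (1 << n_bits) - 1)    # clamp the top of the range
          return [(code >> k) & 1 for k in range(n_bits - 1, -1, -1)]

      print(direct_bits(0.5))   # [1, 0, 0, 0, 0, 0, 0, 0]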
  • a digitalization procedure is described herein which has been found to be particularly appropriate for using multilevel memory elements having imperfect individual binary cell elements, whose conductance gaps may vary from the required proportions, to represent weight values in neural networks. It has been found that the digitalization procedure may provide reliable approximate analogue output for MAC (Multiply-ACcumulate) operations used to model inference.
  • the digitalization method proposed takes into account the actual individual conductance levels of each binary cell element to obtain a more faithful approximation for the overall weight representation as described further herein.
  • the MAC weights matrix may be used to define the required range of expected values to be segmented.
  • a conductance dependent digitalization method for a multilevel memory cell having a given number n_bit of binary cell elements may include scaling weight values, associating binary digits of the memory element with actual conductance levels, and mapping the scaled weight values to binary values according to the actual conductance levels.
  • a scaling function may be defined from a set of weight values, such as a row of the MAC weights matrix, and applied to the weight values of the set.
  • a method for scaling weights may include:
  • determining the lowest numerical weight w_min and the highest numerical weight w_max in the row;
  • computing a scaling factor SCALE = max_int / (w_max - w_min); and
  • for each numerical weight w in the row, assigning a scaled value W given by the product of the scaling factor SCALE and the difference between the numerical weight w and the lowest numerical weight w_min, W = SCALE * (w - w_min).
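  • a hedged Python rendering of this scaling step follows, assuming max_int = 2^n_bit - 1 (e.g. 15 for a 4-bit cell); the function name and example values are illustrative:

      def scale_weights(row, n_bit):
          max_int = (1 << n_bit) - 1
          w_min, w_max = min(row), max(row)
          scale = max_int / (w_max - w_min)
          return [scale * (w - w_min) for w in row]

      print(scale_weights([-0.2, 0.0, 0.3, 0.6], n_bit=4))
      # ~[0.0, 3.75, 9.375, 15.0]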
  • a possible method for associating binary digits with actual conductance levels may include:
  • one method for measuring actual conductance of each binary cell may be by a process of setting each binary cell element individually into high and low conductance states, and then individually measuring the output current values through the chosen cell under the given voltage.
  • b_real_high is a value that tends to lie close to 1; and
  • b_real_low is a value that tends to lie close to 0.
  • the scaled weight values W may be mapped to the binary values of the multilevel memory element for example, by comparing the bit value of each binary cell element in turn with a bit-adjusted scaled weight value and setting the compared bit value to 1 only if the bit-adjusted scaled weight value is higher than the bit value.
  • the highest bit may be compared with the scaled weight value W directly such that if the scaled weight value is higher than the high-bit value of the highest bit then assigning a value of 1 to the highest bit and subtracting the high-bit value of the highest bit from the scaled weight value to obtain a bit-adjusted scaled weight value (or Test Value T);
  • the next bit may be compared with the bit-adjusted scaled weight value T such that if the bit- adjusted scaled weight value T is higher than the high-bit value of the next bit then assigning a value of 1 to the next bit and subtracting the high-bit value of the next bit from the adjusted scaled weight value to obtain another bit-adjusted scaled weight value;
  • if the bit-adjusted scaled weight value is not higher than the high-bit value of the next bit, then assigning a value of 0 to the next bit and subtracting the low-bit value of the next bit from the bit-adjusted scaled weight value to obtain another bit-adjusted scaled weight value.
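  • a hedged sketch of this greedy bit-mapping follows; here bit_high[k] and bit_low[k] stand for the measured contributions of bit k in its high and low conductance states, and all names and numbers are illustrative assumptions:

      def map_to_bits(W, bit_high, bit_low):
          bits, T = [], W                 # T is the bit-adjusted test value
          for high, low in zip(bit_high, bit_low):   # most significant bit first
              if T > high:
                  bits.append(1)
                  T -= high               # subtract the high-bit value
              else:
                  bits.append(0)
                  T -= low                # subtract the low-bit value
          return bits

      # Ideal 4-bit case: bit contributions 8, 4, 2, 1 with zero low states.
      print(map_to_bits(11.2, [8, 4, 2, 1], [0, 0, 0, 0]))   # [1, 0, 1, 1]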
  • In Figs. 7A and 7B, a graph and a chart are presented showing how inference accuracy degradation relates to the digitalization method chosen for a neural network.
  • the vertical axis shows the percentage error rate increase relative to a base accuracy given by the digital inference for the ViT-B/16 384x384 neural network on the ImageNet task at the 10-bit ADC resolution.
  • the horizontal axis indicates the number of weight representation bits and the variation level (i.e. noise-to-signal ratio) for the conductance gap of the cells.
  • inference outcomes depend upon neural network layer output normalization parameters, as well as, in the case of approximate analogue inference, the ranges for analogue-to-digital conversion (ADC) for the neural network layer output.
  • the role of the ADC, in the case of approximating neural network inference using physical conductance-based memory cells, is to measure the output current level and to convert it into a digital format, say a 9-bit integer value.
  • Imin corresponding to the minimal non-zero integer value of 1
  • Imax corresponding to the maximal 9-bit integer value of 511.
  • the range of Imin to Imax should cover the most informative part of the possible neural network layer output values.
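  • a hedged sketch of such a 9-bit conversion follows; the clamping behaviour, the function name and the range endpoints are illustrative assumptions:

      def adc(i_out, i_min, i_max, n_bits=9):
          max_code = (1 << n_bits) - 1            # 511 for 9 bits
          if i_out < i_min:
              return 0                            # below the informative range
          code = 1 + round((i_out - i_min) / (i_max - i_min) * (max_code - 1))
          return min(code, max_code)              # clamp at Imax

      print(adc(0.5, i_min=0.01, i_max=1.0))      # mid-range current -> 253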
  • optimization of normalization parameters and ADC conversion ranges involves the tuning of both the neural network layer output normalization parameters and the analogue-to-digital conversion (ADC) range for the neural network layer output based on the statistics for a set of samples for the neural network training input data.
  • the tuning typically involves a parameter and/or range optimization process aiming to maximize the inference accuracy. It is particularly noted that inference accuracy optimization may be performed using inference simulation modeling and measuring the resulting accuracy of the approximate neural network inference.
  • Another aspect of the current invention is to teach a method for adjusting the ADC ranges and the output normalization parameters individually based on an individual set of samples of the neural network input data. Accordingly, the training experience may be used to determine whether to adjust the ADC ranges and the output normalization parameters separately or jointly.
  • the ADC ranges and the output normalization parameters are adjusted separately on a given subset of parameters and/or ranges while freezing the values for the other parameters and ranges during the adjustment process.
  • the optimal way would be a separate individual adjustment for the ADC ranges and the output normalization parameters, wherein the ADC ranges get adjusted first based on the individual set of samples for the neural network input data, while the output normalization parameters remain frozen and are unchanged during the adjustment process.
  • the output normalization parameters get adjusted based on the same or different individual set of samples for the neural network input data, while the adjusted ADC ranges remain frozen and are unchanged during the adjustment process.
  • the accuracy degradation for approximate inference was measured as a 2.1% increase of the original (digital inference) error rate at the 3% cell conductance gap variation level from one instance of resistive element 50 to another.
  • the accuracy degradation of a 2.1% error rate increase is achieved by performing the ADC ranges adjustment first, based on individual sets of samples of the neural network input data, while the output normalization parameters stay frozen during the adjustment process. After the first adjustment, the output normalization parameters get adjusted based on the same or a different individual set of samples of the neural network input data, while the adjusted ADC ranges stay frozen.
  • the absolute accuracy values for the two above described methods of performing the individual adjustment are 79.08% and 77.83% respectively, relative to the original (digital inference) accuracy of 79.51%.
  • the “ADC ranges adjustment first, output normalization parameters second” way gives an error rate increase of 25.13%
  • the joint adjustment way gives an error rate increase of 10.98% (absolute accuracy values 74.36% and 77.26% respectively, relative to the original accuracy of 79.51%). That optimal choice tendency, “low conductance variation - separate adjustment, high conductance variation - joint adjustment”, stays valid for the value representation proposed above on the ResNetRS50 160x160 network for a wide range of the training setting options.
  • the inference operation requires, besides the MAC operation multiplying the input vectors by the fixed (given and stored) matrix of weights, also a computation of dot products of input vectors in certain intermediate layers, for instance, in transformer type architectures such computation is used for calculation of “self-attention” values.
  • the prior art solution was to rely on digital calculations for dot product computation.
  • the present subject matter proposes a solution that uses the apparatus, providing the output for the MAC operations, to calculate the dot product value of vectors, in the same way as the MAC operation result is calculated with the same device.
  • the solution uses the mathematical equivalence of the result of the MAC operation, multiplying the matrix by the input vector, to the collection of dot product values of the input vector with the matrix rows.
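  • in symbols, this equivalence (an editorial restatement, not a formula from the disclosure) reads:

      \[ (Wx)_i = \sum_j W_{ij}\, x_j = \langle w_i, x \rangle \]

    where w_i denotes the i-th row of the weight matrix W; writing a set of vectors into the crossbar rows therefore turns analogue MAC cycles into batches of dot products.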
  • the write-up of the matrix could be performed row by row, with each row written in parallel, within the same conventional scheme of the crossbar matrix, so the number of writing cycles is equal to the number of vectors.
  • the use of SOT cells would allow the option of very fast (as fast as 0.3 ns) and very energy efficient (as low as 0.03 pJ per cell) write-up.
  • the number of computation cycles in the proposed scheme is equal to the number of vectors as well (one analogue inference cycle per one input vector).
  • the total write-up time is less than 17.5% of the full inference time, while the total write-up energy budget is about 0.05% of the total energy budget.
  • a further apparatus is disclosed to perform a reliable and fast analogue approximation of an inference output for a neural network layer represented by a plurality of memory elements organized into a crossbar such that its size is not limited by the ratio of the conductance of memory elements to the conductance of connection lines.
  • a plurality of non-volatile memory cells of variable conductance which may represent multibit binary values, such as the multilevel memory cells described herein, are organized to perform the required instant analogue approximation. Where appropriate, current distribution may be governed by the conductance of the circuit elements, that is, of memory elements, connection lines, control elements and devices, as well as lines required to organize an ensemble of the non-volatile memory elements of variable conductance.
  • the graph of Fig. 8 illustrates the maximal value of an output neuron signal relative to the expected exact neuron output signal (vertical axis), depending on the ratio r/R (different curve shades) of conductance of connection lines to conductance of memory elements, and on the size (horizontal axis) of the neuron layer modeled.
  • the apparatus may perform reliable and fast analogue approximation of the output of the neural network layer represented by a plurality of memory elements organized into a crossbar by preventing back-currents from flowing through the cells and memory elements in a reverse direction.
  • Computer simulation shows such prevention to limit the influence of parasitic currents that disrupt the accuracy of an analogue approximation of the output for the neural network layer with a crossbar of a large size.
  • Such prevention of reverse currents could, for example, be made with controlling diodes attached to the memory elements input connection lines.
  • Fig. 9A shows a possible straight wire crossbar array connection scheme
  • some systems may achieve reliable fast analogue approximation of the output for the neural network using a configuration or topology of connection lines that involves single straight wire input and output connection lines, ensuring a well-balanced output current distribution.
  • Fig. 10A is a three dimensional representation of a possible crossbar connection lines scheme, according to the present invention.
  • in Fig. 9B, a schematic diagram is shown of a balanced analog crossbar array connection scheme, according to the present invention.
  • the configuration of input and/or output connection lines may involve a multi-level tree structure of connections of input/output line to an array of memory elements allowing an evenly balanced distribution of input/output currents over the connected memory elements.
  • a multi-level tree structure of connections may either be a binary balanced tree of connecting lines or a non-binary tree of connecting lines involving also conventional straight single wire connecting lines at individual levels.
  • FIG. 10B shows a perspective view of an overall 3D structure of the crossbar with the connection lines made as binary balanced trees.
  • a partially balanced (non-binary) tree, also involving conventional straight single wire connecting lines at individual levels, allows a decrease in the number of branching levels while preserving sufficient balance when the square of the length of an individual single wire connecting line does not exceed the ratio of conductance of connection lines to conductance of memory elements. Although a uniform voltage drop will occur, that will not degrade the accuracy as long as the dynamic noise source is controlled, which is well achievable by conventional means.
  • it is also possible for the apparatus to achieve a reliable instant analogue approximation of the output for the neural network layer of a size not practically limited by the ratio of conductance of memory elements to conductance of connection lines by selecting memory elements with specific properties found to be effective.
  • helpful memory element properties include high resistance, large memory element size, acceptance of a higher percentage of memory elements not usable for memory storage, higher percentage of read errors, shorter memory retention time, and/or combinations thereof.
  • sufficiently high resistance of the memory elements may enable a sufficiently low ratio of the conductance of the memory elements to conductance of the connection lines, which in turn removes the main obstacle limiting the size of the neural network layer, which may be represented by a crossbar, the size of which corresponds to the size of the neural network layer. Accordingly, systems having high resistance memory element may be suitable for reliable and fast instant analogue approximation.
  • parts of the plurality of the non-volatile memory elements of the apparatus could be used alternatively as a digital memory/logic device or to perform reliable instant analogue approximation of the neural network layer output.
  • the alternative use of the same part of the ensemble of memory elements as either digital memory/logic device or to perform the reliable instant analogue approximation of the neural network layer output could be controlled by the preprogramming or by the run time reprogramming for the ensemble of the memory elements or for some part of that ensemble.
  • the control mentioned, the preprogramming, or the run time reprogramming could be performed either by direct programming of the inmemory logic with the non-volatile memory elements or by the auxiliary control elements and/or devices, including the digital ones.
  • the separate memory elements of the plurality of non-volatile memory elements of the apparatus could be used to represent separate parts for the multi-bit binary representation for the values of the neural network weights.
  • An example of that would be to use one memory element, representing a 4-bit value, to represent the lower 4 bits of an 8-bit neural network weight value, and to use another memory element, representing a 4-bit value, to represent the higher 4 bits of the same value.
  • the resulting output current distributions for the memory elements of the apparatus that represent separate parts of the multi-bit binary representation of the neural network weight values may be produced in two ways (see the sketch following this list). They may be produced for each part of a multi-bit representation separately and then collected together using an additional circuit, in which case all memory elements, even those representing different parts of a multi-bit representation, could use the same scale of input voltage signals, as the output current distributions for every given part of bits are produced separately. Alternatively, they may be produced for all parts of a multi-bit representation together using different input voltage scales on the memory cells representing different parts, in which case the scales would be approximately proportional to 2^k, where k is the given bit index for the corresponding part.
  • different input voltage scales are applied to the memory elements representing different parts of a 6-bit representation of the NN weights.
  • Fig. 11 A is a graph of the inference accuracy in % for a Modified National Institute of Standards and Technology (MNIST) task depending on the number of bits, and on the size of the neuron layer modeled
  • Fig. 11 B is a graph of the inference accuracy in % for a Street View House Numbers (SVHN) task depending on the number of bits (different curve shades), and on the size of the neuron layer modeled
  • Fig. 11 C is a graph of the inference accuracy in % for a Canadian Institute For Advanced Research (CIFAR10) task depending on the number of bits (different curve shades), and on the size of the neuron layer modeled.
  • the plurality of control elements and/or devices of the apparatus may involve a digital processor core and the plurality of the memory elements that are connected to data input lines from the data source and to the control lines of the aforementioned digital processor core.
  • the digital processor core and/or the plurality of control elements and/or devices could be set up to be powered up by a wake-up controller, connected to the plurality of the memory elements and the digital processor core, in case of a wake-up event only.
  • the plurality of memory elements may also stay permanently in “always-on” standby mode, not consuming energy, while in the event of a data signal coming to the input lines, the signal is initially processed by the plurality of memory elements, which perform the initial analogue neural network approximation procedure to determine the need for the digital processor core and/or the plurality of control elements and/or devices to wake up.
  • the detection of the wake-up event could then be the function performed by non-volatile memory elements, without the digital core involvement.
  • a reliable implementation of an inference for a given pretrained neural network may use separate memory elements of the plurality of non-volatile memory elements to represent separate parts for the multi-bit binary representation for the values of the neural network weights.
  • An example of that would be to use one memory element, representing a 4-bit value, to represent the lower 4 bits of an 8-bit neural network weight value, and to use another memory element, representing a 4-bit value, to represent the higher 4 bits of the same value.
  • the separate memory element representation of separate parts of the multi-bit binary representation of the neural network weight values may be adjusted to optimally, or nearly optimally, fit the specific particular instance of specifications, configuration and/or topology of the memory elements and/or connection lines of a specific particular instance of a device described above.
  • the above representation of separate parts for the multi-bit binary representation for the values of the neural network weights may also be adjusted to optimally, or nearly optimally fit the specific particular instance of manufactured properties of the memory elements and/or connection lines of a specific particular instance of a manufactured device described above.
  • the idea of the method is that the levels of binary representation and/or the particular assignment of specific parts to the specific memory elements of a given device and/or the particular allocation of input and/or output channels could be adjusted to better fit the particular properties of the particular instance of the device and the given neural network. This also includes the individual electrical properties of the device that are affected by its manufacturing process and are less than fully stable from device to device (but are stable for any particular device instance once the device is manufactured).
  • the specific actions involved may include a randomized Monte-Carlo type search for the optimal assignment of specific parts to the specific memory elements and/or for the optimal allocation of input and/or output channels for a particular device instance and a given neural network layer.
  • the term “about” refers to at least ±10%.
  • the terms “comprises”, “comprising”, “includes”, “including”, “having” and their conjugates mean “including but not limited to” and indicate that the components listed are included, but not generally to the exclusion of other components. Such terms encompass the terms “consisting of” and “consisting essentially of”.
  • the phrase “consisting essentially of” means that the composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
  • the singular form “a”, “an” and “the” may include plural references unless the context clearly dictates otherwise.
  • a compound or “at least one compound” may include a plurality of compounds, including mixtures thereof.
  • the word “exemplary” is used herein to mean “serving as an example, instance or illustration”. Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or to exclude the incorporation of features from other embodiments.
  • the word “optionally” is used herein to mean “is provided in some embodiments and not provided in other embodiments”. Any particular embodiment of the disclosure may include a plurality of “optional” features unless such features conflict.
  • a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6, as well as non-integral intermediate values. This applies regardless of the breadth of the range.
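The two output-collection strategies described above can be modeled with a short sketch. This is illustrative only; the function and variable names are assumptions, not anything taken from the disclosure.

```python
# Illustrative sketch of combining output currents from memory elements
# that hold different parts of a multi-bit weight. Names are hypothetical.

def combine_separately(part_currents, bits_per_part):
    """Equal input-voltage scales: each part's output current is read
    separately, then weighted digitally by 2**k (k = lowest bit index of
    the part) and summed by an additional circuit."""
    return sum(i * 2 ** (p * bits_per_part)
               for p, i in enumerate(part_currents))

def scaled_input_voltages(base_v, num_parts, bits_per_part):
    """Joint readout: the input voltage driving part k is scaled by ~2**k,
    so the analog currents already carry the bit significance and can
    simply be summed on a common output line."""
    return [base_v * 2 ** (p * bits_per_part) for p in range(num_parts)]

# Example: an 8-bit weight split into two 4-bit parts.
print(combine_separately([0.3e-6, 0.1e-6], bits_per_part=4))
print(scaled_input_voltages(0.1, num_parts=2, bits_per_part=4))
```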

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Theoretical Computer Science (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Semiconductor Memories (AREA)

Abstract

Systems and methods for providing and using multi-level non-volatile memory elements for a reliable and efficient representation of neural network weights by resistive elements within electrical circuits and analogue approximation of neural network inference. Multi-level non-volatile memory elements include a sequence of non-volatile binary cell elements which are non-homogeneous and which are characterized by having distinctive physical features such that each binary cell element of the sequence represents a particular bit in a multi-bit value.

Description

SYSTEMS AND METHODS FOR PROVIDING AND USING MULTI-LEVEL NON-VOLATILE MEMORY ELEMENTS
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of priority from U.S. Provisional Patent Application No. 63/536,905, filed September 6, 2023, and U.S. Provisional Patent Application No. 62/646,957, filed January 30, 2024, the contents of which are incorporated by reference in their entirety.
FIELD OF THE DISCLOSURE
The disclosure herein relates to systems and methods for providing and using multi-level non-volatile memory elements. In particular, but not exclusively, the disclosure relates to an apparatus and methods for a reliable and efficient representation of neural network weights by resistive elements within electrical circuits.
BACKGROUND
Neural networks require large computational hardware resources which are both costly and energy inefficient. A more cost effective alternative may be to use a low-powered analogue approximation device to perform approximate neural network inference. However for such an approximation device to be useful it must provide an approximate inference result having sufficient accuracy for practical purposes.
Several publications relate to analogue methods for neural network approximation. These include United States Patent Number 8,275,727 titled, “Hardware analogue-digital neural networks”; United States Patent Number 10,339,202B2 titled, “Resistive memory arrays for performing multiply-accumulate operations”; United States Patent Number 10,534,840B1 titled, “Multiplication using non-volatile binary cell elements”, SanDisk; United States Patent Number 9,152,827B2 titled, “Apparatus for performing matrix vector multiplication approximation using crossbar arrays of resistive memory devices”, US Air Force; United States Patent Number 10,740,671 titled, “Convolutional neural networks using resistive processing unit array” and International Patent Application Number PCT/IL2023/050328 titled, “Apparatus and methods for approximate neural network inference”.
None of these publications provides an efficient way to represent the multi-level neural network weights with sufficient accuracy to reproduce inference results.
It has further been suggested that each neural network weight could be represented by a single non-binary cell element having multiple resistance levels. However, memory cells consisting of multiple binary cell elements remain both the most common and the most advanced type of memory available. The task of representing multi-level neural network weights by individual binary cell elements requires a solution.
United States Patent Number 10,643,705, titled “Configurable precision neural network with differential binary non-volatile memory cell structure”, addresses the issue of the multi-bit binary representation for the values of the neural network weights. The applicant’s copending International Patent Application No. PCT/IL2023/050328, titled “Apparatus and methods for approximate neural network inference”, suggests providing multi-bit representation by combining several separate binary cell elements either within a single memory crossbar or distributed across several separate crossbars.
Furthermore, using the conventional realization of analogue approximation, the high ratio of conductance of memory elements to conductance of connection lines poses severe limitations on the size of the network layers, which severely limits the network’s prediction power. The applicant’s copending International Patent Application No. PCT/IL2023/050328 resolves the issue of such limitations on the size of the network layers by providing analogue approximation solutions where the size of the network layers is not practically limited by the ratio of conductance of memory cells to conductance of connection lines. Notably, however, all these solutions represent different bits of a multi-bit weight value using multiple indistinguishable binary cell elements of identical type and properties. It may be possible to distinguish the bits, and therefore to associate different values with each binary digit, by assigning distinct voltage patterns to each of the identical binary cell elements and/or by a distinct type of processing so that output currents for different bits can be weighted according to the value of the bits they represent. However, although the binary digits may then be characterized, for example by such voltage patterns, there is nothing intrinsic about the physical properties of the binary cell elements themselves that would indicate the binary value of the binary digit represented by each binary cell element.
Thus, the need remains for reliable and energy efficient analogue representation of neural network weights. The invention described herein addresses the above-described needs.
SUMMARY OF THE EMBODIMENTS
According to one aspect of the presently disclosed subject matter, a system is introduced for providing non-volatile multilevel memory cells. According to another aspect an apparatus is taught for approximating neural network inference using such non-volatile multilevel memory cells to represent neural network weights.
The non-volatile multilevel memory cell comprises a set of non-homogeneous non-volatile binary cell elements, each comprising a variable resistive element and a corresponding switching mechanism. The variable resistive element is switchable between at least a first stable state having a higher conductance and a second stable state having a lower conductance, and the switching mechanism is configured to select between the first stable state and the second stable state of the corresponding variable resistive element without changing the state of any of the other non-homogeneous non-volatile binary cell elements of the multilevel memory cell.
The difference between the higher conductance of the first stable state and the lower conductance of the second stable state of each non-volatile binary cell element in the set of non-homogeneous non-volatile binary cell elements represents a characteristic conductance gap distinct from other members of the set. Typically, the characteristic conductance gap of each non-homogeneous non-volatile binary cell is determined by at least one characteristic physical property of the non-homogeneous non-volatile binary cell. In particular embodiments of the disclosure the set of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements wherein each binary cell member of the sequence has twice the conductance gap of its previous binary cell element in the sequence. Optionally, the characteristic conductance gap of each non-homogeneous non-volatile binary cell is determined by a characteristic size of the non-homogeneous non-volatile binary cell.
In certain embodiments the variable resistive element may comprise a magnetic tunnel junction comprising an insulating layer, for example a metal oxide layer such as a MgO barrier, sandwiched between a reference layer of ferromagnetic material, such as FeCoB, and a free layer of ferromagnetic material, such that the magnetic tunnel junction is switchable between a parallel stable state having a higher conductance and an antiparallel stable state having a lower conductance. Optionally, the magnetic tunnel junction of each non-volatile binary cell element in the set of non-homogeneous non-volatile binary cell elements has a characteristic area distinct from the areas of the magnetic tunnel junctions of the other non-volatile binary cell elements of the set. Accordingly, the set of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements wherein the magnetic tunnel junction of each binary cell member of the sequence has twice the area of the magnetic tunnel junction of the previous binary cell element in the sequence.
Optionally the switching mechanism of the magnetic tunnel junction comprises a Spin-Orbit Torque mechanism comprising a source line MOSFET connected to the free layer of the magnetic tunnel junction and a bit line MOSFET connected to the reference layer of the magnetic tunnel junction, a read line connected to the gate terminal of the bit line MOSFET and a write line connected to the gate terminal of the source line MOSFET. Various other switching mechanisms may be used for the magnetic tunnel junction such as a Spin Transfer Torque mechanism comprising a source line MOSFET connected to the reference layer of the magnetic tunnel junction and a write line connected to the gate terminal of the source line MOSFET. Alternatively the switching mechanism may be a Voltage-Controlled Magnetic Anisotropy mechanism comprising a source line MOSFET connected to the free layer of the magnetic tunnel junction and a voltage control circuit configured to determine an electrical field direction across the free layer of ferromagnetic material.
Typically, the non-volatile multilevel memory cell further comprises an input line and an output line configured to pass current through all the individual non-homogeneous non-volatile binary cell elements in the set. Accordingly, the overall conductance of the set of non-homogeneous non-volatile binary cell elements may be selected to represent a numerical value, such as a single neural network weight value, when current passes through the input line and the output line. Optionally all the non-homogeneous non-volatile binary cell elements of the set may be connected in parallel between a common input line and a common output line such that the total conductance of the memory cell equals the sum of the individual conductances of all the non-homogeneous non-volatile binary cell elements in the set.
In particular examples, the set of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements and wherein the conductance gap of each binary cell member of the sequence is measured and a value assigned to the binary cell element based upon the measured values such that each binary cell element is assigned twice the value of its previous binary cell element in the sequence.
By way of example, the non-volatile multilevel memory cell may be a quad-level memory cell wherein the set of non-homogeneous non-volatile binary cell elements comprises a first non-volatile binary cell element having a first characteristic conductance gap G; a second non-volatile binary cell element having a second conductance gap 2G having twice the value of the first conductance gap G, a third non-volatile binary cell element having a third conductance gap 4G having twice the value of the second conductance gap 2G; and a fourth non-volatile binary cell element having a fourth conductance gap 8G having twice the value of the third conductance gap 4G.
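As a minimal numerical sketch of this quad-level example (the code names are illustrative, not from the disclosure), the overall conductance gap of the cell is the sum of the gaps of the elements switched to their higher-conductance state:

```python
# Minimal sketch of the quad-level cell above: gaps G, 2G, 4G, 8G for
# bits 0..3; the cell's overall gap encodes a 4-bit value.
G = 1.0  # base conductance gap, arbitrary units (illustrative)
gaps = [G, 2 * G, 4 * G, 8 * G]  # bits 0 (lowest) .. 3 (highest)

def overall_gap(bits):
    """bits: 0/1 states of bits 0..3; sum the gaps of the ON elements."""
    return sum(g for g, b in zip(gaps, bits) if b)

# Storing the value 11 = 0b1011, i.e. bits (b0, b1, b2, b3) = (1, 1, 0, 1):
assert overall_gap([1, 1, 0, 1]) == 11 * G
```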
Where appropriate, the overall conductance of the set of non-homogeneous non-volatile binary cell elements may be selected to represent a multi-bit binary value. Accordingly, each non-homogeneous non-volatile binary cell of the memory cell may represent a different bit of a multi-bit binary value. For example, each non-homogeneous non-volatile binary cell of the memory cell may represent a different bit of a multi-bit binary value and the characteristic conductance gap of each non-volatile binary cell element may be selected such that the overall conductance gap of the memory cell represents the multi-bit value. Thus the memory cell may comprise a sequence of binary cell elements wherein the value assigned to each binary cell member of the sequence is twice the value assigned to its previous binary cell element in the sequence.
Additionally or alternatively, the set of non-homogeneous non-volatile binary cell elements may be arranged on one or more memory crossbars. Optionally the set of non-homogeneous non-volatile binary cell elements is configured to represent a multi-bit binary value having a number of bits equal to the total number of resistive elements on the plurality of memory crossbars.
In particular examples of the multilevel memory cell, the non-homogeneous non-volatile binary cells of the set are sufficiently spaced apart that parasitic switching of neighboring variable resistive elements is prevented during targeted switching of a single variable resistive element. For example, the variable resistive element may comprise a magnetic tunnel junction and the non-homogeneous non-volatile binary cells of the set are sufficiently spaced apart that parasitic switching by magnetostatic excitation of neighboring magnetic tunnel junctions is prevented during targeted switching of a single magnetic tunnel junction.
Another aspect of the disclosure is to introduce an apparatus for approximating neural network inference comprising at least one non-volatile multilevel memory cell configured to represent a neural network weight. Accordingly, the states of each non-homogeneous non-volatile binary cell element are selected such that the total conductance gap of the memory cell represents the neural network weight. The non-homogeneous non-volatile binary cell elements of the set may be connected in parallel between a common input line and a common output line such that the total conductance of the memory cell equals the sum of the conductances of all the individual non-homogeneous non-volatile binary cell elements in the set. Accordingly, an input voltage applied across the common input and the common output lines generates a current through the output line higher than a base current by an amount equal to the product of the input voltage difference and the total conductance gap.
It is still another aspect of the disclosure to teach a method for representing a set of neural network weight values by a set of multilevel memory cells. The method may include measuring at least one physical property of each multilevel memory cell; and applying a digitalization procedure for each multilevel memory cell, the digitalization procedure depending upon the at least one physical property.
It is still another aspect of the disclosure to teach a method for approximating inference of a neural network comprising approximating a multiply-accumulate operation on a neural network inference thereby producing an analog output; applying a digitalization procedure on the analog output, the digitalization procedure depending upon an ADC range thereby producing a digital value; and applying a normalization procedure, the normalization procedure depending upon output normalization parameters. Optionally, the method includes adjusting the ADC range based upon a set of samples of neural network input data while the normalization parameters remain unchanged. Additionally or alternatively, the method includes adjusting the normalization parameters based upon a set of samples of neural network input data while the ADC range remains unchanged. Additionally or alternatively, again, the method includes adjusting the ADC range and the normalization parameters jointly based upon a set of samples of neural network input data.
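A minimal sketch of this digitize-then-normalize pipeline is given below; the function shapes, parameter names, and the uniform-quantization choice are assumptions for illustration only, not the disclosed implementation.

```python
# Hedged sketch (assumed shapes) of the digitalization and normalization
# steps applied to an analog output.

def digitize(analog_out, adc_min, adc_max, n_bits=8):
    """Clip the analog output to the ADC range, then quantize uniformly."""
    levels = 2 ** n_bits - 1
    clipped = min(max(analog_out, adc_min), adc_max)
    return round((clipped - adc_min) / (adc_max - adc_min) * levels)

def normalize(digital_value, scale, offset):
    """Apply the output normalization parameters."""
    return digital_value * scale + offset

# The ADC range (adc_min, adc_max) and the normalization parameters
# (scale, offset) may each be tuned on sample input data, separately
# (one held fixed) or jointly, as described above.
print(normalize(digitize(0.42, adc_min=0.0, adc_max=1.0), scale=0.01, offset=-1.0))
```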
In still another aspect of the disclosure, a method is taught for approximating a multiply-accumulate operation on a neural network inference or the value of a dot product of a first vector and a second vector. The method for approximating a multiply-accumulate operation may include providing a weighting apparatus comprising a sequence of adjustable multilevel memory cells, each adjustable multilevel memory cell connected between a cell input line and a common output line; programming the weighting apparatus such that each adjustable multilevel memory cell has a conductivity representing a single neural network weight value; providing an activation vector comprising a sequence of activation values; providing input voltage signals to each cell input line of the weighting apparatus representing a corresponding activation value from the activation vector; and measuring the current through the common output line.
Similarly, the method for approximating the value of a dot product of a first vector and a second vector may include providing the first vector comprising a sequence of first vector values; providing the second vector comprising a sequence of second vector values; providing a weighting apparatus comprising a sequence of adjustable multilevel memory cells, each adjustable multilevel memory cell connected between a cell input line and a common output line; programming the weighting apparatus such that each adjustable multilevel memory cell has a conductivity representing a single first vector value from the first vector; providing input voltage signals to each cell input line of the weighting apparatus representing a corresponding second vector value from the second vector; and measuring the current through the common output line.
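Both methods reduce to the same analog multiply-accumulate. A toy model under the stated steps (illustrative names, idealized Ohm's-law behavior, no connection-line resistance) might look like:

```python
# Illustrative model of the dot-product approximation described above:
# each multilevel cell's conductance encodes a first-vector value and the
# input voltages encode the second vector; the output-line current is
# their dot product. All names are illustrative assumptions.

def program_conductances(first_vector, g_unit=1e-6):
    """Map first-vector values to cell conductances (siemens per unit)."""
    return [v * g_unit for v in first_vector]

def output_current(conductances, input_voltages):
    """Current through the common output line: sum of per-cell currents
    I_i = V_i * G_i (Ohm's law), i.e. a multiply-accumulate."""
    return sum(g * v for g, v in zip(conductances, input_voltages))

g = program_conductances([1.0, 2.0, 3.0])
# Dot product (1,2,3)·(4,5,6) = 32, scaled by g_unit:
assert abs(output_current(g, [4.0, 5.0, 6.0]) - 32.0e-6) < 1e-12
```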
In yet another aspect of the disclosure, a method is taught for encoding a numerical value in a multi-bit format to be represented by a multilevel memory cell. Such a method may include: measuring at least one resistive state of the multilevel memory cell; defining a digitalization function, the digitalization function depending upon the higher conductance value and the lower conductance value of each variable resistive element; and applying the digitalization function to the numerical value. Additionally or alternatively, the numerical value is selected from a set of numerical values and the method further comprises scaling each numerical value of the set.
Optionally, the multilevel memory cell comprises a sequence of binary cell elements each binary cell element comprising a variable resistive element which is switchable between a first stable state having a higher conductance and a second state having a lower conductance, and the step of measuring at least one resistive state of the multilevel memory cell comprises: measuring the higher conductance value of each binary cell element; and measuring the lower conductance value of each binary cell element.
Where the numerical value is selected from a set of numerical values, the scaling comprises: determining a lowest numerical value in the set of numerical values; determining a highest numerical value in the set of numerical values; defining a scaling function, the scaling function depending upon the lowest numerical value and the highest numerical value; and applying the scaling function to each numerical value.
BRIEF DESCRIPTION OF THE FIGURES
For a better understanding of the embodiments and to show how it may be carried into effect, reference will now be made, purely by way of example, to the accompanying drawings.
With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of selected embodiments only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects. In this regard, no attempt is made to show structural details in more detail than is necessary for a fundamental understanding; the description taken with the drawings making apparent to those skilled in the art how the various selected embodiments may be put into practice. In the accompanying drawings:
Fig. 1 schematically represents a multilevel memory cell including a set of non-homogeneous binary cell elements, according to the presently disclosed subject matter,
Fig. 2 schematically represents of a sequence of non-homogeneous binary cell elements connected in parallel, according to the presently disclosed subject matter,
Fig. 3A is a schematic perspective view of a Spin-Orbit Torque memory write mechanism not passing the current through a tunneling barrier, according to the present invention;
Fig. 3B is a schematic perspective view of a Spin Transfer Torque write mechanism;
Fig. 4A schematically represents a multilevel memory cell including a set of perpendicular magnetic anisotropy SOT cells;
Fig. 4B schematically represents a multilevel memory cell including a set of in-plane magnetic anisotropy SOT cells;
Fig. 5 schematically represents an example of a multilevel memory element architecture for controlling a set of binary cell elements using SOT cells;
Fig. 6A is a flowchart indicating selected steps of a digitalization procedure for mapping numerical weight values to multilevel memory cells;
Fig. 6B is a flowchart indicating selected steps of a method for scaling a set of numerical weight values for a multilevel memory cell having a given number n_bit of binary cell elements;
Fig. 6C is a flowchart indicating selected steps of a method for associating binary digits with actual conductance levels of binary cell elements of a multilevel memory cell;
Fig. 6D is a flowchart indicating selected steps of a method for mapping numerical values to binary digits;
Fig. 7A is a graph showing how inference accuracy degradation relates to the weight value digitalization method chosen for a neural network;
Fig. 7B is a table showing how inference accuracy degradation relates to the weight value digitalization method chosen for a neural network;
Fig. 8 is a graph of the maximal value of an output neuron signal relative to the expected exact neuron output signal (vertical axis), depending on the ratio r/R (different curve shades) of conductance of connection lines to conductance of memory elements, and on the size (horizontal axis) of the neuron layer modeled;
Fig. 9A is a diagram of a straight wire crossbar array connection scheme;
Fig. 9B is a schematic diagram of a balanced analog crossbar array connection scheme, according to the present invention;
Fig. 10A is a schematic perspective view of a crossbar connection lines scheme, according to the present invention; and
Fig. 10B is a schematic perspective view of an overall 3D structure of the crossbar with the connection lines made as binary balanced trees, according to the present invention.
Fig. 11 A is a graph of the inference accuracy in % for a Modified National Institute of Standards and Technology (MNIST) task depending on the number of bits, and on the size of the neuron layer modeled, according to the present invention;
Fig. 11 B is a graph of the inference accuracy in % for a Street View House Numbers (SVHN) task depending on the number of bits (different curve shades), and on the size of the neuron layer modeled, according to the present invention; and
Fig. 11 C is a graph of the inference accuracy in % for a Canadian Institute For Advanced Research (CIFAR10) task depending on the number of bits (different curve shades), and on the size of the neuron layer modeled, according to the present invention.
DETAILED DESCRIPTION
Aspects of the present disclosure relate to systems and methods for providing multi-level non-volatile memory elements. In particular, but not exclusively, the disclosure relates to an apparatus and methods for a reliable and efficient representation of neural network weights by resistive elements within electrical circuits.
It is noted that efficient and accurate representations of neural network weights may enable analogue approximation of neural network inference with sufficient accuracy for practical purposes. Accordingly a neural network approximation device is hereby disclosed which may receive an activation vector as an input and generate an output vector of sufficient accuracy.
The multi-level non-volatile memory elements which enable the system typically include a sequence of non-volatile binary cell elements which are non-homogeneous and which are characterized by having distinctive physical features such that each binary cell element of the sequence represents a particular bit in a multi-bit value.
Accordingly, a set of adjustable multi-level non-volatile memory cells may be programmable to encode, for example, a row of weight values from a weight matrix associated with a layer of a neural network.
It has been found that natural variance of the physical properties of binary cell elements has significantly degraded the accuracy of inference in prior art analogue approximations.
It is a particular feature of the apparatus that the issue of the variance in individual physical properties of the binary cell elements is addressed such that the neural network weight values are represented with sufficient accuracy that proper approximate inference results.
Methods are also described for making approximate neural network inference employing an individual adjustment for the ADC ranges and the output normalization parameters, and to use the proposed apparatus for computation of dot products of input vectors at intermediate layers for the approximate neural network inference.
In various embodiments of the disclosure, one or more tasks as described herein may be performed by a data processor, such as a computing platform or distributed computing system for executing a plurality of instructions. Optionally, the data processor includes or accesses a volatile memory for storing instructions, data or the like. Additionally, or alternatively, the data processor may access a non-volatile storage, for example, a magnetic hard-disk, flash-drive, removable media or the like, for storing instructions and/or data.
It is particularly noted that the systems and methods of the disclosure herein may not be limited in its application to the details of construction and the arrangement of the components or methods set forth in the description or illustrated in the drawings and examples. The systems and methods of the disclosure may be capable of other embodiments, or of being practiced and carried out in various ways and technologies.
Alternative methods and materials similar or equivalent to those described herein may be used in the practice or testing of embodiments of the disclosure. Nevertheless, particular methods and materials are described herein for illustrative purposes only. The materials, methods, and examples are not intended to be necessarily limiting.
Reference is now made to Fig. 1, which schematically represents an example of a multilevel memory cell 50 of the invention. The multilevel memory cell 50 may include a set of non-homogeneous non-volatile binary cell elements 10A-D, an input line 20 and an output line 30.
Each non-homogeneous non-volatile binary cell element may be a variable resistive element and may be switchable between two stable states. Accordingly, each of the binary cell elements may represent a binary digit or bit and the two stable states may be used to indicate if the bit is ON or OFF.
Typically, a first stable state having a higher conductance indicates that the bit represented by the binary cell element is ON and a second stable state having a lower conductance indicates that the bit represented by the binary cell element is OFF. Accordingly, a targeted switching mechanism may be provided for each binary cell element, configured to select between the first stable state and the second stable state of only the associated binary cell element without changing the state of any other non-homogeneous non-volatile binary cell element of the multilevel memory cell.
It is a particular feature of the invention that the binary cell elements 10A-D are non-homogeneous. To this end, the physical features of the binary cell elements are not identical to one another. Accordingly, the difference between the higher conductance of the first stable state and the lower conductance of the second stable state of each non-volatile binary cell element is distinctive. The distinctive conductance gap of each binary cell element may therefore be used to characterize the binary cell element and the bit it represents.
Such an apparatus may be programmed to represent a single neural network weight by selecting bit values for each of the binary cell elements such that the multilevel memory cell stores a digital value indicative of the neural network weight.
In some examples, current may be passed via a common input line and a common output line through the multilevel memory cell. Where the binary cell elements are connected in parallel between the common input line and the common output line, the overall conductance gap of the multilevel memory element may represent the neural network weight. Where required, an array of such multilevel memory cells may be arranged into a crossbar to perform the approximate neural network inference.
It will be appreciated that if the binary memory elements were homogeneous and indistinguishable from each other, then such an arrangement would only allow for the representation of a number of conductance levels equal to one plus the number of the cells involved. It is because the binary memory elements of the invention are non-homogeneous that it is possible to distinctly represent all the conductance levels corresponding to all possible multi-bit states of the multilevel memory element. In some examples, the multilevel memory element may represent a single neural network weight value by means of a multi-bit binary representation for the values of the neural network weights. Alternatively, the multilevel memory element may represent separate bits of the multi-bit binary representation for the values of the neural network weights with separate binary cell elements of the plurality of non-homogeneous non-volatile binary cell elements.
In the prior art, the multi-bit states representation is provided by a group of cells that were subject to application of distinct voltage patterns or distinct output processing procedures, so that different outputs can be weighted according to the significance of the bit they represent before they are combined. In PCT application PCT/IL2023/050328 “Apparatus and methods for approximate neural network inference” the multibit representation is provided by several separate binary cell elements either being located within a single memory crossbar or being distributed to several separate crossbars. All these prior art solutions rely on the use of the homogeneous differential cells of identical type and properties to represent the different bits of the multi-bit weight value. By contrast, in the present invention, a group of non-homogeneous cells of distinct conductance characteristics and/or geometry may be coupled together to reproduce a single neural network weight value represented by their overall combined conductance level.
In some examples, the non-volatile binary cell elements of the apparatus may represent separate bits of the multi-bit binary representation for the values of the neural network weights corresponding to each non-volatile binary cell element’s characteristic conductance gap, which is a difference in overall conductance between the two stable states of the cell. The conductance gap corresponds to the difference between the current levels flowing through the cell in the open and the closed states, at the same voltage. Reference is now made to Fig. 2, which schematically represents a set of n non-volatile binary cell elements BCE1-n of the multilevel memory cell connected between a common input line and a common output line. An input voltage VIN may be applied across the multilevel memory cell such that current passes through all the non-volatile binary cell elements in parallel.
It is noted that a given voltage VIN between the input line and the output line may represent an input signal, for example representing an activation value for neural network inference. For the given voltage VIN, the current I1, I2, I3, I4, … In running through each of the non-volatile binary cell elements depends upon that non-volatile binary cell element’s conductance. The total current IOUT flowing through the memory element is the product of the input voltage and the total conductance of the memory element. Therefore, where the total conductance of the multilevel memory cell represents a weight value of a particular neuron in a neural network layer, the total resulting current flowing through the output line may represent an output inference signal of the neural network layer.
With the parallel connection of the non-homogeneous non-volatile binary cell elements between a designated single input line and single output line, the sum of the currents flowing through all the binary cell elements equals the current flowing through the output line. Because the voltage across each binary cell element is the same, the bit value represented by a given binary cell element, by means of the binary cell element’s selected conductive state, affects the output current (at a given voltage) with a weight proportional to the cell’s conductance gap. The conductance gap, in turn, depends on the physical properties of the binary cell element, for example its parameters, its geometry, its dimensions and size.
In some examples, the values of the conductance gap levels may depend on the different sizes of the binary cell elements. In particular, a binary cell element with exactly uniform resistive properties across its full area will have a conductance gap level exactly proportional to the total area of the binary cell element. It is noted that such a construction would be compatible with a manufacturing process in which the binary cell element area is changed and the binary cell element thickness is maintained constant. In some examples, the apparatus may use conductance gap levels reproducing the value scales in proportion 1:2:4:8, etc., from the lower to the higher bit. That way, it will directly reproduce the pertinent scales for the corresponding bits in the neural network weight multi-bit binary representation.
Multilevel memory cells of the invention may be realized with a variety of constituent binary cell elements to suit requirements. In particular examples, the multilevel memory cells may be implemented using MTJ (Magnetic Tunnel Junction) technology.
Magnetic tunnel junctions are devices in which two ferromagnet layers are separated by an insulating layer sufficiently thin that quantum tunneling enables electrons in one ferromagnet to cross to the other. It is more likely that an electron will cross the insulator when the directions of magnetization in both ferromagnetic layers are parallel. Accordingly, an MTJ has two distinct stable resistance states. It is possible to switch between these stable resistance states by keeping a constant direction of magnetization in one ferromagnetic layer, known as the reference layer, and controlling the direction of magnetization in the other, known as the free layer.
For example, a metal oxide layer such as a MgO barrier may be sandwiched between a reference layer and a free layer of ferromagnetic material, such as FeCoB. Such a magnetic tunnel junction may be switchable between a parallel stable state having a higher conductance and an antiparallel stable state having a lower conductance.
Various switching mechanisms can be used to control magnetic tunnel junctions, such as Spin-Orbit Torque (SOT), Spin Transfer Torque (STT), Voltage-Controlled Magnetic Anisotropy (VCMA) and the like.
Referring to Fig. 3A, which schematically represents a SOT mechanism, a source line MOSFET is connected to the free layer and a bit line MOSFET is connected to the reference layer of the magnetic tunnel junction, a read line connected to the gate terminal of the bit line MOSFET and a write line connected to the gate terminal of the source line MOSFET.
Referring to Fig. 3B, which schematically represents an STT mechanism, a source line MOSFET is connected to the reference layer of the magnetic tunnel junction and a write line is connected to the gate terminal of the source line MOSFET.
Alternatives may be considered, such as the VCMA mechanism comprising a source line MOSFET connected to the free layer of the magnetic tunnel junction and a voltage control circuit configured to determine an electrical field direction across the free layer of ferromagnetic material.
Although any of these switching mechanisms may be used to control the MTJ, it is particularly noted that higher resistance memory elements might be controlled by SOT (Spin-Orbit Torque) switching mechanisms. The resistance levels of SOT MTJ cells/memory elements may reach 1 MΩ or more, while for other types of MTJ memory cells, the resistance normally does not exceed 10 kΩ. The main way to increase the resistance level of the SOT cells/memory elements is by thickening the tunneling barrier, which is made possible because the memory write mechanism of the SOT MTJ does not pass the current through the tunneling barrier, so that high voltages are not required for the write mechanism of the SOT MTJ even when the resistance of the tunneling barrier is very high.
It is noted that computer modeling results show that the higher resistance of SOT MTJ cells provides an increase in the size of the NN layer of up to thirty times relative to the limitation posed by the conventional construction of a crossbar. This is true for crossbars using both the memory elements that are implemented using a multi-state MTJ with the free ferromagnetic region defined by a plurality of ovals, and/or the multilevel memory elements containing a plurality of non-homogeneous non-volatile memory cells of variable resistance that are assembled to represent a single neural network weight. It is a particular feature of the SOT MTJ cell used in the approximation apparatus that, unlike in RAM applications, the construction of such a SOT MTJ cell for the purpose of analogue approximation does not require a read-line transistor or diode. Reducing the number of transistors in the cell allows for both cell area and energy reduction. It will be appreciated that this feature by itself provides a distinctive advantage for the analogue approximation using SOT (Spin-Orbit Torque) memory cells rather than Spin Transfer Torque (STT) memory cells.
Notably, the specific properties making memory elements suitable for the apparatus described herein may not be appropriate when the memory elements are used for digital memory storage. In some examples, this involves using an element size larger than that expected for digital memory storage (45 nm or larger); nevertheless, the larger size may enable a low-power readout. In other examples, a significant percentage of the memory elements used are not usable for memory storage but are still usable as part of a structure of memory elements to perform the reliable instant analogue approximation of the neural network layer output.
Computer simulation shows cases when up to 5% of non-functional cells could be tolerated. In yet other examples, this involves the use of memory elements with a significant percentage of read errors, while the ensemble of the memory elements is still usable to perform the reliable instant analogue approximation of the neural network layer output. Computer simulation shows cases when up to 1% of read errors could be tolerated. In still another example, this involves the use of a relatively short memory retention time, while the ensemble of the memory elements is still usable to perform the reliable instant analogue approximation of the neural network layer output within the specified retention time frame. For certain applications, the energy barrier for the analogue approximation memory elements could be lowered to provide a memory retention time as low as 24 hours or even less, depending on the relevant device requirements.
Fig. 4A illustrates an example in which non-homogeneous MTJ (magnetic tunnel junction) binary cell elements are used. The multilevel memory cell may include a set of four circular-shaped perpendicular magnetic anisotropy spin-orbit torque (SOT) MTJ cells, for example, arranged within a square-shaped area. Fig. 4B illustrates an alternative example of a multilevel memory cell in which a set of four elliptically shaped in-plane magnetic anisotropy spin-orbit torque (SOT) MTJ cells is arranged within a square-shaped area.
In some examples, the plurality of non-homogeneous non-volatile binary cell elements of variable resistance may be implemented with MTJ (magnetic tunnel junction) cells. By way of example, a multilevel memory cell may be provided having a set of four circular-shaped perpendicular magnetic anisotropy MTJ cells, in which the conductance levels are set using SOT (spin-orbit torques) to provide a 4-bit device having 16 conductance levels. The sizes of the binary cell elements involved may be selected to have areas in proportion 8:4:2:1.
The maximum permitted diameter depends upon the stability of the magnetic domain structure. The distance between SOT cells within one multilevel memory cell may be of the order of 100 nanometers or less and should be selected to ensure that there is no parasitic switching of adjacent cells during the MTJ cell targeted switching. By way of example, binary cell elements having diameters of 140 nm, 99 nm, 70 nm, and 49.5 nm, such that their areas are in proportion 8:4:2:1, may be packed into a multilevel memory cell of 350 nm × 350 nm.
Alternatively, a set of four in-plane magnetic anisotropy based SOT cells having oblong shapes, such as near-rectangular with rounded corners or elliptically shaped, for example with an approximate 3:1 axis ratio (Fig. 4B), may be configured to function together as a 4-bit multilevel memory cell having 16 different conductance levels.
The sizes of the binary cell elements involved may be selected to have areas in proportion 8:4:2:1. By way of example, binary cell elements may be provided having dimensions of 420 × 140 nm, 297 × 99 nm, 210 × 70 nm, and 148.5 × 49.5 nm. Here a distance of the order of 200 nanometers or less between SOT cells may ensure that there is no parasitic switching of adjacent cells during the MTJ cell targeted switching. Such oblong binary cell elements could be packed into a multilevel memory cell of at least 600 nm × 600 nm.
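As a quick arithmetic check (illustrative code, not part of the disclosure), the dimensions quoted above do give areas in the stated 8:4:2:1 proportion:

```python
# Verify the 8:4:2:1 area proportions for the example dimensions above.
import math

circular_diameters_nm = [140, 99, 70, 49.5]
circ_areas = [math.pi * (d / 2) ** 2 for d in circular_diameters_nm]
print([a / circ_areas[-1] for a in circ_areas])  # ~[8.0, 4.0, 2.0, 1.0]

oblong_dims_nm = [(420, 140), (297, 99), (210, 70), (148.5, 49.5)]
obl_areas = [w * h for w, h in oblong_dims_nm]
print([a / obl_areas[-1] for a in obl_areas])    # ~[8.0, 4.0, 2.0, 1.0]
```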
FIG. 5 shows an example of a multilevel memory element architecture in which a set of SOT binary cell elements 510A-D are connected in parallel between a common input line 520 and a common output line 530. The architecture further includes a pair of targeted control lines for each SOT cell, via which each binary cell element may be separately switched between its two stable states.
The architecture of the multilevel memory element may include an individual pair of cell control lines 515A-D, 517A-D connected to each SOT cell individually for controlling the process of switching between the states of the SOT cell. The control lines may be connected to auxiliary transistors configured to control the open and closed states of connection lines to each SOT cell individually.
These connections may in some examples be realized within a conventional scheme of the crossbar matrix. The multi-bit representation of the value for a given neural network’s weights matrix may, in some examples, be provided by a composite memory element including several separate multilevel memory elements which may be connected to a single memory crossbar or alternatively distributed across more than one crossbar.
It is noted that, where more than one multilevel memory element is used in combination to form a composite memory element, the total number of bits in the composite memory element is equal to the sum of the numbers of bits in all the constituent multilevel memory elements. For example, a composite memory element comprising two 4-bit resistive elements, each with 16 conductance levels, would result in a total number of 8 bits, with 256 levels represented accordingly.
It has been found that in some cases, even when binary cell elements are selected to form a sequence in which each binary cell element has twice the conductance gap of the previous binary cell element, natural variations in the binary cell elements may result in deviations of their conductance gap levels from the exact proportions of 1:2:4:8, etc. These deviations may depend on the individual physical properties of the cells and may vary from one instance of the multilevel memory element to another. Nevertheless, the apparatus needs to remain fit to represent the particular values assigned to the corresponding bits even in cells with such deviations.
It is another feature of the current invention that even multilevel memory cells with inexact proportions may produce reliable neural network inference approximation.
In the context of neural network inference, the appropriateness of the outcome of using the apparatus for representation and/or reproduction of the value scales for neural network weights may be determined by the accuracy of approximate neural network inference staying within the limits required by the task addressed by the neural network. The appropriateness of the outcome of using the apparatus could be verified and asserted, in particular, by the inference simulation modeling. Accordingly, methods described herein may be tested by measuring the resulting accuracy of the approximate neural network inference.
It has been found that, in some cases, even where there are variations from the required proportions of the conductance gaps of individual binary cell elements within the multilevel memory element, reliable neural network inference approximation may still be maintained when an appropriate digitalization procedure is used.
A digitalization procedure converts a precise numerical value into a digital greyscale approximation which can be expressed as a binary number with the number of digits available. For example, a 4-bit memory element may be used to represent any of the 16 integer values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15. Accordingly, the digitalization procedure may map a numerical value to a 4-bit approximation by segmenting a range of expected values into 16 subrange segments and selecting a value from 0-15 indicating the segment in which the numerical value is found. Similarly, an n-bit approximation may be achieved by segmenting a range of expected values into 2^n subrange segments.
There is flexibility in the digitalization procedure, particularly in the choice of the range of expected values as well as the way the range is segmented. One simple example is the direct transmission of the bits of the digital binary representation of the neural network weight into the binary values for the binary cell elements involved. For instance, the weight 0.5 may be represented by an 8-bit integer (INT8) value 128, which would produce a digital binary representation of 10000000, and the sequence of binary values (1,0,0,0,0,0,0,0) for the binary cell elements representing the bits of the corresponding 8-bit value, from the higher to the lower bit.
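A minimal sketch of this segmentation and direct bit transmission, using the worked 0.5 → 128 example above (the helper names and the assumed [0, 1) weight range are illustrative assumptions):

```python
# Sketch of n-bit digitalization and direct bit transmission.

def digitalize(value, lo, hi, n_bits):
    """Map a value in [lo, hi] to one of 2**n_bits segments (0 .. 2**n - 1)."""
    levels = 2 ** n_bits
    seg = int((value - lo) / (hi - lo) * levels)
    return min(seg, levels - 1)  # the top edge falls into the last segment

def to_bits(level, n_bits):
    """Direct transmission: binary digits of the level, high bit first."""
    return [(level >> k) & 1 for k in reversed(range(n_bits))]

# The worked example from the text: weight 0.5 in an assumed [0, 1) range.
level = digitalize(0.5, 0.0, 1.0, 8)                 # -> 128
assert to_bits(level, 8) == [1, 0, 0, 0, 0, 0, 0, 0]
```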
Where the conductance levels of the individual binary cell elements depend on individual physical properties that vary from one instance of the multilevel memory element to another, such that their conductance gap levels deviate from the exact proportions of 1:2:4:8, etc., such a direct transmission of the digital binary representation does not necessarily provide a sufficiently faithful representation of the weight value.
A digitalization procedure is described herein which has been found to be particularly appropriate for using multilevel memory elements having imperfect individual binary cell elements, whose conductance gaps may vary from the required proportions, to represent weight values in neural networks. It has been found that the digitalization procedure may provide reliable approximate analogue output for MAC (Multiply-ACcumulate) operations used to model inference.
The digitalization method proposed takes into account the actual individual conductance levels of each binary cell element to obtain a more faithful approximation for the overall weight representation as described further herein.
Furthermore, where the method is applied to a given MAC weights matrix of a particular layer of a given particular neural network, the MAC weights matrix may be used to define the required range of expected values to be segmented.
Referring now to Fig. 6A a flowchart is shown indicating selected steps of a digitalization procedure for mapping numerical weight values to multilevel memory cells. A conductance dependent digitalization method for a multilevel memory cell having a given number n_bit of binary cell elements may include scaling weight values, associating binary digits of the memory element with actual conductance levels, and mapping the scaled weight values to binary values according to the actual conductance levels.
In order to scale the weight values, a set of weight values, such as a row of the MAC weights matrix, may be used to define a scaling function which is then applied to the weight values of the set.
Referring to the flowchart of Fig. 6B, indicating selected steps of a method for scaling a set of numerical weight values for a multilevel memory cell having a given number n_bit of binary cell elements, a method for scaling weights may include:
• obtaining a weights matrix;
• for each row of the weights matrix, determining the highest numerical weight w_max in the set of numerical weights and determining the lowest numerical weight w_min in the set of numerical weights;
• scaling the lowest numerical weight w_min to the binary value 0;
• scaling the highest numerical weight w_max to the highest binary value represented by the multilevel memory cell, max_int := 2^(n_bit) - 1;
• defining a scaling factor SCALE as the ratio of the highest binary value to the difference between the highest numerical weight w_max and the lowest numerical weight w_min: SCALE := max_int / (w_max - w_min); and
• for each numerical weight w in the row, assigning a scaled value W given by the product of the scaling factor SCALE and the difference between the numerical weight w and the lowest numerical weight w_min,
W := SCALE * (w - w_min).
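A minimal Python sketch of this scaling step is given below, assuming w_max > w_min for the row (the degenerate case of a constant row is not handled); the function name scale_row is chosen for illustration only.

    def scale_row(weights, n_bit):
        # Scale one row of the MAC weights matrix onto [0, 2**n_bit - 1],
        # following the steps of Fig. 6B.
        w_min, w_max = min(weights), max(weights)
        max_int = 2 ** n_bit - 1
        scale = max_int / (w_max - w_min)              # SCALE := max_int / (w_max - w_min)
        return [scale * (w - w_min) for w in weights]  # W := SCALE * (w - w_min)

    # For example, the row [-0.2, 0.1, 0.7] with n_bit = 4 maps to approximately
    # [0.0, 5.0, 15.0]: the lowest weight goes to 0 and the highest to 15.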
In order to associate binary digits of the memory element with actual conductance levels of the memory element, it is useful to know the expected high and low conductance values and to measure the high and low conductance values of each bit.
Referring to the flowchart of Fig. 6C, indicating selected steps of a method for associating binary digits with actual conductance levels of binary cell elements of a multilevel memory cell, a possible method for associating binary digits with actual conductance levels may include:
• obtaining an expected high conductance value c_high of each binary cell element in a higher stable state;
• obtaining an expected low conductance value c_low of each binary cell element in a lower stable state;
• measuring the actual higher conductance value c_real_high of each binary cell element;
• measuring the actual lower conductance value c_real_low of each binary cell element; and
• assigning actual bit values to each binary digit of the multilevel memory cell based upon the actual conductance values of the binary cell element used to represent it by: o assigning an actual low bit value b_real_low to each binary digit given by the ratio of the difference between the actual lower conductance value c_real_low and its expected low conductance value c_low to the expected conductance gap, b_real_low := (c_real_low - c_low)/(c_high - c_low); and o assigning an actual high bit value b_real_high to each binary digit given by the ratio of the difference between the actual higher conductance value c_real_high and its expected low conductance value c_low to the expected conductance gap, b_real_high := (c_real_high - c_low)/(c_high - c_low).
By way of example, one method for measuring actual conductance of each binary cell may be by a process of setting each binary cell element individually into high and low conductance states, and then individually measuring the output current values through the chosen cell under the given voltage.
It is noted that b_real_high is a value that tends to lie close to 1, and b_real_low is a value that tends to lie close to 0.
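The association step of Fig. 6C therefore amounts to expressing each measured conductance as a fraction of the expected conductance gap. A minimal Python sketch, with the function name chosen for illustration only, might read:

    def real_bit_values(c_real_low, c_real_high, c_low, c_high):
        # Express the measured conductances of one binary cell element as
        # fractions of its expected conductance gap (Fig. 6C).
        gap = c_high - c_low
        b_real_low = (c_real_low - c_low) / gap    # tends to lie close to 0
        b_real_high = (c_real_high - c_low) / gap  # tends to lie close to 1
        return b_real_low, b_real_high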
Referring now to Fig. 6D a flowchart is presented indicating selected steps of a method for mapping numerical values to binary digits. The scaled weight values W may be mapped to the binary values of the multilevel memory element for example, by comparing the bit value of each binary cell element in turn with a bit-adjusted scaled weight value and setting the compared bit value to 1 only if the bit-adjusted scaled weight value is higher than the bit value.
The mapping process may start from the highest bit (with bit number i := n_bit-1).
The highest bit may be compared with the scaled weight value W directly such that if the scaled weight value is higher than the high-bit value of the highest bit then assigning a value of 1 to the highest bit and subtracting the high-bit value of the highest bit from the scaled weight value to obtain a bit-adjusted scaled weight value (or Test Value T);
If the scaled weight value is not higher than the high-bit value of the highest bit, then assigning a value of 0 to the highest bit and subtracting the low-bit value of the highest bit from the scaled weight value to obtain a bit-adjusted scaled weight value. Then the next bit may be compared with the bit-adjusted scaled weight value T such that if the bit-adjusted scaled weight value T is higher than the high-bit value of the next bit, then assigning a value of 1 to the next bit and subtracting the high-bit value of the next bit from the adjusted scaled weight value to obtain another bit-adjusted scaled weight value;
If the bit-adjusted scaled weight value is not higher than the high-bit value of the next bit then assigning a value of 0 to the next bit and subtracting the low-bit value of the next bit from the bit-adjusted scaled weight value to obtain another bit-adjusted weight value.
The process is repeated until the last bit (i=0) is reached, at which point a modified version of the process is applied such that a 1 or 0 is assigned, depending on which of the values b_real_high and b_real_low for the lowest bit (i=0) is closer to the bit-adjusted weight value T.
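A sketch of the complete greedy mapping of Fig. 6D is given below. It assumes, consistently with the 1:2:4:8 proportions described earlier, that the contribution of bit i in scaled-weight units is its actual bit value multiplied by 2**i; the function and argument names are illustrative only.

    def map_to_bits(W, b_real_low, b_real_high, n_bit):
        # Greedy conductance-dependent mapping of a scaled weight W to the
        # binary values of the cell elements (Fig. 6D); b_real_low[i] and
        # b_real_high[i] are the actual bit values of bit i.
        bits = [0] * n_bit
        T = W  # the bit-adjusted scaled weight value (Test Value T)
        for i in range(n_bit - 1, 0, -1):  # from the highest bit down to bit 1
            high = b_real_high[i] * 2 ** i
            low = b_real_low[i] * 2 ** i
            if T > high:
                bits[i] = 1
                T -= high
            else:
                bits[i] = 0
                T -= low
        # Lowest bit (i = 0): assign 1 or 0 depending on which actual bit
        # value lies closer to the remaining bit-adjusted weight value T.
        bits[0] = 1 if abs(T - b_real_high[0]) < abs(T - b_real_low[0]) else 0
        return bits  # bits[i] is the programmed state of the element for bit i

With ideal cells (b_real_low[i] = 0 and b_real_high[i] = 1 for every i) the procedure reduces to an ordinary greedy binary expansion of the scaled weight.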
Referring now to Figs. 7A and 7B a graph and a chart are presented showing how inference accuracy degradation relates to the digitalization method chosen for a neural network. The vertical axis shows the percentage error rate increase relative to a base accuracy given by the digital inference for the ViT-B/16 384x384 neural network on the ImageNet task at the 10-bit ADC resolution. The horizontal axis indicates the number of weight representation bits and the variation level (i.e. noise-to-signal ratio) for the conductance gap of the cells.
There is a clear improvement of the conductance dependent digitalization method described herein over the prior art. Using binary cell elements with a cell conductance gap variation level of 6% and 6-bit weight representation, the prior art digitalization results in a 150% increase in error rate. The performance of the conductance dependent digitalization method is strikingly better, showing an error rate increase of only 22%.
Using 8-bit weight representation, the improvement is even more pronounced, with the prior art digitalization resulting in a 164% increase in error rate whereas the conductance dependent digitalization method has an error rate increase of only 17%.
It is further noted that during training of a neural network there are additional parameters involved in the neural network inference process as a whole which need to be optimized apart from the neural network weights. In particular, inference outcomes depend upon neural network layer output normalization parameters, as well as, in the case of approximate analogue inference, the ranges for analogue-to-digital conversion (ADC) for the neural network layer output.
The function of the ADC, in the case of approximating neural network inference using physical conductance-based memory cells, is to measure the output current level and to convert it into a digital format, say, a 9-bit integer value. In that case, there is a minimal current level Imin corresponding to the minimal non-zero integer value of 1, and there is a maximal current level Imax corresponding to the maximal binary integer value of 511. The range of Imin to Imax should cover the most informative part of the possible neural network layer output values.
Typically, optimization of normalization parameters and ADC conversion ranges involves the tuning of both the neural network layer output normalization parameters and the analogue-to-digital conversion (ADC) range for the neural network layer output based on statistics for a set of samples of the neural network training input data. The tuning typically involves an optimization process over the parameters and/or ranges that tries to maximize the inference accuracy. It is particularly noted that inference accuracy optimization may be performed using inference simulation modeling and measuring the resulting accuracy of the approximate neural network inference.
However, the efficiency of the tuning has been found to be complicated by confounding parameters. In particular, simultaneous optimization of the layer output normalization parameters and of the ADC range for the neural network layer output is not always efficient. Another aspect of the current invention is to teach a method for adjusting the ADC ranges and the output normalization parameters individually based on an individual set of samples for the neural network input data. Accordingly, the best way to perform an individual adjustment of normalization and ADC range parameters could be chosen based on the training experience; the training experience may be used to determine whether to adjust the ADC ranges and the output normalization parameters separately or jointly.
The particular manner in which the individual adjustment is performed, that is, jointly or separately for the ADC ranges and the output normalization parameters, might in some examples be chosen case-by-case based on the training setting parameters (an illustration of the effect of the training setting on such a choice is given below).
In one possible method for separate individual adjustment of the ADC ranges and the output normalization parameters, the ADC ranges and the output normalization parameters are adjusted separately on a given subset of parameters and/or ranges while freezing the values of the other parameters and ranges during the adjustment process. In some examples, the optimal way would be a separate individual adjustment in which the ADC ranges are adjusted first based on an individual set of samples for the neural network input data, while the output normalization parameters remain frozen and unchanged during the adjustment process. Following adjustment of the ADC ranges, the output normalization parameters are adjusted based on the same or a different individual set of samples for the neural network input data, while the adjusted ADC ranges remain frozen and unchanged during the adjustment process.
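A schematic rendering of this separate, two-phase adjustment is sketched below. The callables tune_adc and tune_norm stand in for whatever accuracy-maximizing optimizer is used and are assumptions of the sketch, not components disclosed herein.

    def adjust_separately(adc_ranges, norm_params, samples, tune_adc, tune_norm):
        # Phase 1: adjust the ADC ranges while the output normalization
        # parameters remain frozen.
        adc_ranges = tune_adc(adc_ranges, norm_params, samples)
        # Phase 2: adjust the normalization parameters while the adjusted ADC
        # ranges remain frozen; the same or a different sample set may be used.
        norm_params = tune_norm(norm_params, adc_ranges, samples)
        return adc_ranges, norm_params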
Either the joint or the separate manner of performing the individual adjustment might prove optimal (lead to higher inference accuracy) on certain ranges of the training setting parameters, depending, among other considerations, on the particular neural network involved.
For instance, simulating the ResNetRS50 160x160 inference with the value representation method described above, with separate adjustment, assuming 10-bit ADC resolution and 10-bit weights representation, the accuracy degradation for approximate inference was measured as a 2.1% increase over the original (digital inference) error rate at the 3% cell conductance gap variation level from one instance of resistive element 50 to another. Performing the ADC ranges and output normalization parameter adjustment jointly at the same training setting (neural network involved, value representation method, ADC resolution, weights number of bits, cell conductance gap variation level) results in an accuracy degradation of an 8.2% error increase.
The accuracy degradation of a 2.1% error rate increase is achieved by performing the ADC ranges adjustment first, based on individual sets of samples for the neural network input data, while the output normalization parameters stay frozen during the adjustment process. After the first adjustment, the output normalization parameters are adjusted based on the same or a different individual set of samples for the neural network input data, while the adjusted ADC ranges stay frozen.
The absolute accuracy values for the two above-described methods of performing the individual adjustment, corresponding to the accuracy degradation values of 2.1% and 8.2%, are 79.08% and 77.83% respectively, relative to the original (digital inference) accuracy of 79.51%.
On the other hand, for a much higher cell conductance gap variation level of 10% (with the rest of the training setting the same), the "ADC ranges adjustment first, output normalization parameters second" way gives an error rate increase of 25.13%, while the joint adjustment way gives an error rate increase of 10.98% (absolute accuracy values 74.36% and 77.26% respectively, relative to the original accuracy of 79.51%). That optimal choice tendency, "low conductance variation - separate adjustment, high conductance variation - joint adjustment", stays valid for the value representation proposed above on the ResNetRS50 160x160 network for a wide range of the training setting options.

In some types of neural network architectures (in particular, the transformer type) the inference operation requires, besides the MAC operation multiplying the input vectors by the fixed (given and stored) matrix of weights, also a computation of dot products of input vectors in certain intermediate layers; for instance, in transformer type architectures such computation is used for the calculation of "self-attention" values. The prior art solution was to rely on digital calculations for dot product computation. The present subject matter proposes a solution that uses the apparatus, providing the output for the MAC operations, to calculate the dot product value of vectors, in the same way as the MAC operation result is calculated with the same device. The solution uses the mathematical equivalence of the result of the MAC operation, multiplying the matrix by the input vector, to the collection of dot product values of the input vector with the matrix rows. Thus, to perform the computation of dot product values for a collection of input vectors (each vector with each), it is enough to write them down as rows of the weight matrix, and then to perform the MAC operation with that matrix, for each vector of the collection as an input vector, separately.
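The mathematical equivalence underlying this scheme can be checked with a short numerical model; this is illustrative only and models the arithmetic, not the device, with the example vectors chosen arbitrarily.

    import numpy as np

    # The MAC result W @ v is, row by row, the collection of dot products of v
    # with the rows of W. Writing the vectors of a collection down as the rows
    # of the weight matrix therefore yields all pairwise dot products, one
    # inference cycle (here: one matrix-vector product) per input vector.
    vectors = np.array([[1.0, 2.0, 0.5],
                        [0.0, 1.0, 3.0],
                        [2.0, 2.0, 2.0]])
    W = vectors                                    # write-up: vectors as matrix rows
    gram = np.stack([W @ v for v in vectors])      # one MAC operation per vector
    assert np.allclose(gram, vectors @ vectors.T)  # all pairwise dot products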
The write-up of the matrix could be performed row by row, one row in parallel, with the same conventional scheme of the crossbar matrix, so the number of writing cycles is equal to the number of vectors. The use of SOT cells would allow the option of very fast (down to 0.3 ns) and very energy efficient (down to 0.03 pJ per cell) write-up. The number of computation cycles in the proposed scheme is equal to the number of vectors as well (one analogue inference cycle per input vector). In our estimate of write-up time and energy for the ViT-B/16 384x384 vision transformer network, the total write-up time is less than 17.5% of the full inference time, while the total write-up energy budget is about 0.05% of the total energy budget.
A further apparatus is disclosed to perform a reliable and fast analogue approximation of an inference output for a neural network layer represented by a plurality of memory elements organized into a crossbar such that its size is not limited by the ratio of the conductance of memory elements to the conductance of connection lines. A plurality of non-volatile memory cells of variable conductance which may represent multi-bit binary values, such as the multilevel memory cells described herein, are organized to perform the required instant analogue approximation. Where appropriate, the current distribution may be governed by the conductance of the circuit elements, that is, of the memory elements, connection lines, control elements and devices, as well as the lines required to organize an ensemble of the non-volatile memory elements of variable conductance.
Specific methods and devices allowing these components to ensure the reliable instant analogue approximation of the output for the neural network layer of the size not practically limited by the ratio of conductance of memory elements to conductance of connection lines are described below.
The graph of Fig. 8 presents prior art results for the maximal fall of the output neuron signal relative to the expected exact neuron output signal (vertical axis), depending on the ratio r/R of the conductance of connection lines to the conductance of memory elements (different curve shades), and on the size of the neuron layer modeled (horizontal axis); the line representing log10(r/R) = -2 is the leftmost line, log10(r/R) = -3 is the line just to its right, and so on. This distortion of the output signal, due to the high conductance of memory elements in comparison to the conductance of connection lines, grows with the number of neurons and limits the size of the networks (their number of neurons) available, accessible and/or approachable with analog inference, thus severely limiting the network's prediction power. Conventional designs of MTJ memory cells and the crossbar allow reliable instant analog approximation of the output for neural network layers of about 50 neurons. The reasons for such limitations follow from Kirchhoff's circuit laws, and the solutions are offered in the present invention.

In some examples, the apparatus may perform reliable and fast analogue approximation of the output of the neural network layer represented by a plurality of memory elements organized into a crossbar by preventing back-currents from flowing through the cells and memory elements in a reverse direction. Computer simulation shows such prevention to limit the influence of parasitic currents that disrupt the accuracy of an analogue approximation of the output for the neural network layer with a crossbar of a large size. Such prevention of reverse currents could, for example, be achieved with controlling diodes attached to the memory elements' input connection lines.
Referring to Fig. 9A, showing a possible straight wire crossbar array connection scheme, some systems may achieve reliable fast analogue approximation of the output for the neural network using a configuration or topology of connection lines that involves single straight wire input and output connection lines. This ensures a well-balanced output current distribution.
The reason for the better balance of output current distribution is that the geometry and the topology of input and output connections leading to any specific memory element are less dependent upon the position of the element in the array. Fig. 10A is a three dimensional representation of a possible crossbar connection lines scheme, according to the present invention.
With reference to Fig. 9B, a schematic diagram is shown of a balanced analog crossbar array connection scheme, according to the present invention. The configuration of input and/or output connection lines may involve a multi-level tree structure of connections of the input/output line to an array of memory elements allowing an evenly balanced distribution of input/output currents over the connected memory elements. Moreover, a multi-level tree structure of connections may be either a binary balanced tree of connecting lines or a non-binary tree of connecting lines that also involves conventional straight single wire connecting lines at individual levels.
Referring now to Fig. 10B, which shows a perspective view of an overall 3D structure of the crossbar with the connection lines made as binary balanced trees, a partially balanced (non-binary) tree also involving conventional straight single wire connecting lines at individual levels allows a decrease in the number of branching levels while preserving sufficient balance when the square of the length of an individual single wire connecting line does not exceed the ratio of the conductance of connection lines to the conductance of memory elements. Although a uniform voltage drop will occur, it will not degrade the accuracy as long as the dynamic noise source is controlled, which is well achievable by conventional means.
It is also possible for the apparatus to achieve a reliable instant analogue approximation of the output for the neural network layer of a size not practically limited by the ratio of conductance of memory elements to conductance of connection lines by selecting memory elements with specific properties found to be effective. For example, helpful memory element properties include high resistance, large memory element size, acceptance of a higher percentage of memory elements not usable for memory storage, higher percentage of read errors, shorter memory retention time, and/or combinations thereof. The ways in which these specific methods and devices allow the required quality of analogue approximation are described herein.
It has been found that a sufficiently high resistance of the memory elements may enable a sufficiently low ratio of the conductance of the memory elements to the conductance of the connection lines, which in turn removes the main obstacle limiting the size of the neural network layer, which may be represented by a crossbar whose size corresponds to the size of the neural network layer. Accordingly, systems having high resistance memory elements may be suitable for reliable and fast instant analogue approximation.
Parts of the plurality of the non-volatile memory elements of the apparatus could be used alternatively as a digital memory/logic device or to perform reliable instant analogue approximation of the neural network layer output. The alternative use of the same part of the ensemble of memory elements as either a digital memory/logic device or to perform the reliable instant analogue approximation of the neural network layer output could be controlled by preprogramming or by run time reprogramming of the ensemble of the memory elements or of some part of that ensemble. The control mentioned, the preprogramming, or the run time reprogramming could be performed either by direct programming of the in-memory logic with the non-volatile memory elements or by the auxiliary control elements and/or devices, including digital ones.
The separate memory elements of the plurality of non-volatile memory elements of the apparatus could be used to represent separate parts of the multi-bit binary representation for the values of the neural network weights. An example of that would be to use one memory element, representing a 4-bit value, to represent the lower 4 bits of an 8-bit neural network weight value, and to use another memory element, representing a 4-bit value, to represent the higher 4 bits of the same value. The resulting output current distributions for the memory elements of the apparatus that represent separate parts of the multi-bit binary representation may be produced for each part of the multi-bit representation separately and then collected together using an additional circuit; in that case all memory elements, even those representing different parts of a multi-bit representation, could use the same scale of input voltage signals, as the output current distributions for every given part of bits are produced separately. Alternatively, they may be produced for all parts of a multi-bit representation together using different input voltage scales on the memory cells representing the different parts; in that case the scales would be approximately proportional to 2^k (2 to the power of k), where k is the given bit index for the corresponding part. As illustrated in Fig. 10A, different input voltage scales are applied to the memory elements representing different parts of a 6-bit representation of the NN weights.
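A purely digital model of this splitting and recombination might look as follows. It is a sketch for clarity only; the part-level weighting 2**(k*bits_per_part) corresponds to the per-bit 2^k scales mentioned above, and the function names are illustrative.

    def split_weight(value, parts=2, bits_per_part=4):
        # Split a (parts * bits_per_part)-bit weight value into parts of
        # bits_per_part bits each, lowest part first.
        mask = (1 << bits_per_part) - 1
        return [(value >> (k * bits_per_part)) & mask for k in range(parts)]

    def combine_outputs(part_outputs, bits_per_part=4):
        # Recombine per-part outputs produced with the same input voltage
        # scale: part k is weighted by 2**(k * bits_per_part).
        return sum(p * 2 ** (k * bits_per_part) for k, p in enumerate(part_outputs))

    # Example: the 8-bit value 183 (0xB7) splits into lower 4 bits 7 and
    # higher 4 bits 11, and recombines to the original value.
    assert split_weight(183) == [7, 11]
    assert combine_outputs(split_weight(183)) == 183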
Some examples of the present invention relate to the ways the apparatus may represent and use the separate parts of the multi-bit binary representation for the values of the neural network's weights to provide the required network inference accuracy. Computer modeling results show that 6 bits are sufficient for the tested tasks. For example, Fig. 11A is a graph of the inference accuracy in % for a Modified National Institute of Standards and Technology (MNIST) task depending on the number of bits, and on the size of the neuron layer modeled; Fig. 11B is a graph of the inference accuracy in % for a Street View House Numbers (SVHN) task depending on the number of bits (different curve shades), and on the size of the neuron layer modeled; and Fig. 11C is a graph of the inference accuracy in % for a Canadian Institute For Advanced Research (CIFAR10) task depending on the number of bits (different curve shades), and on the size of the neuron layer modeled.
The plurality of control elements and/or devices of the apparatus may involve a digital processor core and the plurality of the memory elements that are connected to data input lines from the data source and to the control lines of the aforementioned digital processor core.
In the apparatus, the digital processor core and/or the plurality of control elements and/or devices could be set up to be powered up by a wake-up controller, connected to the plurality of the memory elements and the digital processor core, only in the case of a wake-up event. The plurality of memory elements may also stay permanently in an "always-on" standby mode, not consuming energy, while in the event of a data signal coming to the input lines, the signal is initially processed by the plurality of memory elements, which perform the initial analogue neural network approximation procedure to determine the need for the digital processor core and/or the plurality of control elements and/or devices to wake up. The detection of the wake-up event (which implies the presence of a data signal coming to the input lines, but is not implied by the data signal) could then be a function performed by the non-volatile memory elements, without the involvement of the digital core.

In some examples of the invention there are provided methods to determine conditions and/or available solutions related to the task of reliable inference implementation on a given device instance of the apparatus (given device) for a given pre-trained neural network. In particular, there are methods to set up the approximate neural network inference by performing computer modeling for the use of the apparatus described above, modeling a sequence of actions: to determine if an inference for a given pre-trained neural network could be reliably implemented on a given device instance of any example of the apparatus described above; to determine the required properties of a device instance of any example of the apparatus described above that could be used to reliably implement an inference for a given pre-trained neural network; to determine the required characteristics of a given pre-trained neural network so that its inference could be reliably implemented on a given device instance of the same example of the apparatus mentioned above; and to provide a reliable implementation on a given device instance of the same example of the apparatus mentioned above for a given pre-trained neural network that could be reliably implemented on that device instance.
In the methods and apparatus noted above, a reliable implementation of an inference for a given pre-trained neural network may use separate memory elements of the plurality of non-volatile memory elements to represent separate parts of the multi-bit binary representation for the values of the neural network weights. An example of that would be to use one memory element, representing a 4-bit value, to represent the lower 4 bits of an 8-bit neural network weight value, and to use another memory element, representing a 4-bit value, to represent the higher 4 bits of the same value. In that instance, the separate-memory-element representation of the parts of the multi-bit binary representation for the values of the neural network weights may be adjusted to optimally, or nearly optimally, fit the specific particular instance of specifications, configuration and/or topology of the memory elements and/or connection lines of a specific particular instance of a device described above. The above representation may also be adjusted to optimally, or nearly optimally, fit the specific particular instance of manufactured properties of the memory elements and/or connection lines of a specific particular instance of a manufactured device described above. The idea of the method is that the levels of binary representation and/or the particular assignment of specific parts to the specific memory elements of a given device and/or the particular allocation of input and/or output channels could be adjusted to better fit the particular properties of the particular instance of the device and the given neural network. That includes also the individual electrical properties of the device that are affected by its manufacturing process and are less than fully stable from device to device (but are stable for any particular device instance once the device is manufactured). The specific actions involved may include a randomized Monte-Carlo type search for the optimal assignment of specific parts to the specific memory elements and/or for the optimal allocation of input and/or output channels for a particular device instance and a given neural network layer.
Technical and scientific terms used herein should have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains. Nevertheless, it is expected that during the life of a patent maturing from this application many relevant systems and methods will be developed. Accordingly, the scope of the terms such as computing unit, network, display, memory, server and the like are intended to include all such new technologies a priori.
As used herein the term "about" refers to at least ± 10%. The terms "comprises", "comprising", "includes", "including", "having" and their conjugates mean "including but not limited to" and indicate that the components listed are included, but not generally to the exclusion of other components. Such terms encompass the terms "consisting of" and "consisting essentially of". The phrase "consisting essentially of" means that the composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method. As used herein, the singular form "a", "an" and "the" may include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof. The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments or to exclude the incorporation of features from other embodiments. The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". Any particular embodiment of the disclosure may include a plurality of "optional" features unless such features conflict.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "ranging/ranges between" a first indicated number and a second indicated number and "ranging/ranges from" a first indicated number "to" a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween. It should be understood, therefore, that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6 as well as non-integral intermediate values. This applies regardless of the breadth of the range.
It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination or as suitable in any other described embodiment of the disclosure. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments unless the embodiment is inoperative without those elements.
Although the disclosure has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present disclosure. To the extent that section headings are used, they should not be construed as necessarily limiting. The scope of the disclosed subject matter is defined by the appended claims and includes both combinations and sub combinations of the various features described hereinabove as well as variations and modifications thereof, which would occur to persons skilled in the art upon reading the foregoing description.

Claims

1. A non-volatile multilevel memory cell comprising a set of non-homogeneous non-volatile binary cell elements, each non-homogeneous non-volatile binary cell element comprising: a variable resistive element which is switchable between at least a first stable state having a higher conductance and a second stable state having a lower conductance; and a corresponding switching mechanism configured to select between the first stable state and the second stable state of the variable resistive element without changing the state of any other non-homogeneous non-volatile binary cell element of the multilevel memory cell; wherein the difference between the higher conductance of the first stable state and the lower conductance of the second stable state of each non-volatile binary cell element in the set of non-homogeneous non-volatile binary cell elements represents a characteristic conductance gap distinct from other members of the set.
2. The non-volatile multilevel memory cell of claim 1 further comprising an input line and an output line configured to pass current through all the individual non-homogeneous non-volatile binary cell elements in the set.
3. The non-volatile multilevel memory cell of claim 2, wherein the overall conductance of the set of non-homogeneous non-volatile binary cell elements is selected to represent a single neural network weight value when current passes through the input line and the output line.
4. The non-volatile multilevel memory cell of claim 1 wherein all the non-homogeneous non-volatile binary cell elements of the set are connected in parallel between a common input line and a common output line such that the total conductance of the memory cell equals the sum of the conductances of all the individual non-homogeneous non-volatile binary cell elements in the set.
5. The non-volatile multilevel memory cell of claim 1 wherein the overall conductance of the set of non-homogeneous non-volatile binary cell elements is selected to represent a multi-bit binary value.
6. The non-volatile multilevel memory cell of claim 1 wherein each non-homogeneous non-volatile binary cell of the memory cell represents a different bit of a multi-bit binary value.
7. The non-volatile multilevel memory cell of claim 1 wherein each non-homogeneous non-volatile binary cell of the memory cell represents a different bit of a multi-bit binary value and the characteristic conductance gap of each non-volatile binary cell element is selected such that the overall conductance gap of the memory cell represents the multi-bit value.
8. The non-volatile multilevel memory cell of claim 1 wherein the characteristic conductance gap of each non-homogeneous non-volatile binary cell is determined by at least one characteristic physical property of the non-homogeneous non-volatile binary cell.
9. The non-volatile multilevel memory cell of claim 1 wherein the set of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements wherein each binary cell member of the sequence has twice the conductance gap of its previous binary cell element in the sequence.
10. The non-volatile multilevel memory cell of claim 1 wherein the set of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements and wherein the value assigned to each binary cell member of the sequence is twice the value assigned to its previous binary cell element in the sequence.
11. The non-volatile multilevel memory cell of claim 1 wherein the set of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements and wherein the conductance gap of each binary cell member of the sequence is measured and a value assigned to the binary cell element based upon the measured values such that each binary cell element is assigned twice the value of its previous binary cell element in the sequence.
12. The non-volatile multilevel memory cell of claim 1 wherein the characteristic conductance gap of each non-homogeneous non-volatile binary cell is determined by a characteristic size of the non-homogeneous non-volatile binary cell.
13. The non-volatile multilevel memory cell of claim 1 wherein the set of non-homogeneous non-volatile binary cell elements are arranged on at least one memory crossbar.
14. The non-volatile multilevel memory cell of claim 1 wherein the set of non-homogeneous non-volatile binary cell elements is arranged on a plurality of memory crossbars having a total number of resistive elements and is configured to represent a multi-bit binary value having a number of bits equal to the total number of resistive elements on the plurality of memory crossbars.
15. The non-volatile multilevel memory cell of claim 1, wherein the non-homogeneous non-volatile binary cells of the set are sufficiently spaced apart that parasitic switching of neighboring variable resistive elements is prevented during targeted switching of a single variable resistive element.
16. The non-volatile multilevel memory cell of claim 1 wherein the variable resistive element comprises a magnetic tunnel junction.
17. The non-volatile multilevel memory cell of claim 16, wherein the non-homogeneous non-volatile binary cells of the set are sufficiently spaced apart that parasitic switching by magnetostatic excitation of neighboring magnetic tunnel junctions is prevented during targeted switching of a single magnetic tunnel junction.
18. The non-volatile multilevel memory cell of claim 1 wherein the variable resistive element comprises a magnetic tunnel junction comprising an insulating layer sandwiched between a reference layer of ferromagnetic material and a free layer of ferromagnetic material, such that the magnetic tunnel junction is switchable between a parallel stable state having a higher conductance and an antiparallel stable state having a lower conductance.
19. The non-volatile multilevel memory cell of claim 18 wherein the magnetic tunnel junction of each non-volatile binary cell element in the set of non-homogeneous non-volatile binary cell elements has a characteristic area distinct from the areas of the magnetic tunnel junctions of other non-volatile binary cell elements of the set.
20. The non-volatile multilevel memory cell of claim 19 wherein the set of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements wherein the magnetic tunnel junction of each binary cell member of the sequence has twice the area of the magnetic tunnel junction of the previous binary cell element in the sequence.
21. The non-volatile multilevel memory cell of claim 1 comprising a quad-level memory cell wherein the set of non-homogeneous non-volatile binary cell elements comprises: a first non-volatile binary cell element having a first characteristic conductance gap G; a second non-volatile binary cell element having a second conductance gap 2G having twice the value of the first conductance gap G; a third non-volatile binary cell element having a third conductance gap 4G having twice the value of the second conductance gap 2G; and a fourth non-volatile binary cell element having a fourth conductance gap 8G having twice the value of the third conductance gap 4G.
22. The non-volatile multilevel memory cell of claim 1 wherein the insulating layer of the magnetic tunnel junction comprises a metal oxide layer.
23. The non-volatile multilevel memory cell of claim 1 wherein the magnetic tunnel junction comprises a MgO barrier sandwiched between two FeCoB layers.
24. The non-volatile multilevel memory cell of claim 1 wherein the switching mechanism comprises a Spin Transfer Torque mechanism comprising a source line MOSFET connected to the reference layer of the magnetic tunnel junction and a write line connected to the gate terminal of the source line MOSFET.
25. The non-volatile multilevel memory cell of claim 1 wherein the switching mechanism comprises a Voltage-Controlled Magnetic Anisotropy mechanism comprising a source line MOSFET connected to the free layer of the magnetic tunnel junction and a voltage control circuit configured to determine an electrical field direction across the free layer of ferromagnetic material.
26. The non-volatile multilevel memory cell of claim 1 wherein the switching mechanism comprises a Spin-Orbit Torque mechanism comprising a source line MOSFET connected to the free layer of the magnetic tunnel junction and a bit line MOSFET connected to the reference layer of the magnetic tunnel junction, a read line connected to the gate terminal of the bit line MOSFET and a write line connected to the gate terminal of the source line MOSFET.
27. An apparatus for approximating neural network inference comprising at least one non-volatile multilevel memory cell of claim 1 configured to represent a neural network weight.
28. The apparatus of claim 27 wherein all the non-homogeneous non-volatile binary cell elements of the set are connected in parallel between a common input line and a common output line such that the total conductance of the memory cell equals the sum of the conductances of all the individual non-homogeneous non-volatile binary cell elements in the set.
29. The apparatus of claim 28 wherein the states of each non-homogeneous non-volatile binary cell element are selected such that the total conductance gap of the memory cell represents the neural network weight.
30. The apparatus of claim 29 wherein an input voltage applied across the common input and the common output lines generates a current through the output line higher than a base current by an amount equal to the product of the input voltage difference and the total conductance gap.
31. The non-volatile multilevel memory cell of claim 1 wherein the overall conductance of the set of non-homogeneous non-volatile binary cell elements is selected to represent a multi-bit binary value for a single neural network weight value.
32. An apparatus for approximating neural network inference, the apparatus comprising a set of multilevel memory cells, each of the set of multilevel memory cells comprising a sequence of non-homogeneous non-volatile binary cell elements, each non-homogeneous non-volatile binary cell element comprising a variable resistive element which is switchable between at least a first stable state and a second stable state, wherein the sequence of non-homogeneous non-volatile binary cell elements is configured to represent a single neural network weight.
33. The apparatus of claim 32 further comprising an input line and an output line configured to pass current through all the individual non-homogeneous non-volatile binary cell elements in the sequence.
34. The apparatus of claim 33, wherein the overall conductance of the sequence of non-homogeneous non-volatile binary cell elements is selected to represent the single neural network weight value when current passes through the input line and the output line.
35. The apparatus of claim 34 wherein all the non-homogeneous non-volatile binary cell elements of the sequence are connected in parallel between a common input line and a common output line.
36. The apparatus of claim 32 wherein the overall conductance of the sequence of non-homogeneous non-volatile binary cell elements is selected to represent a multi-bit binary value indicating the single neural network weight.
37. The apparatus of claim 32 wherein each non-homogeneous non-volatile binary cell of the memory cell represents a different bit of a multi-bit binary value.
38. The apparatus of claim 33 wherein each non-homogeneous non-volatile binary cell of the memory cell has a characteristic conductance gap distinct from other members of the sequence and the characteristic conductance gap of each non-volatile binary cell element is selected to represent a different bit of a multi-bit binary value such that the overall conductance gap of the memory cell represents the single neural network weight.
39. The apparatus of claim 33 wherein each non-homogeneous non-volatile binary cell of the memory cell has a characteristic conductance gap distinct from other members of the sequence and the characteristic conductance gap of each non-homogeneous non-volatile binary cell is determined by at least one characteristic physical property of the non-homogeneous non-volatile binary cell.
40. The apparatus of claim 33 wherein each non-homogeneous non-volatile binary cell of the memory cell has a characteristic conductance gap distinct from other members of the sequence and the sequence of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements wherein each binary cell member of the sequence has twice the conductance gap of its previous binary cell element in the sequence.
41. The apparatus of claim 33 wherein the sequence of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements and wherein the value assigned to each binary cell member of the sequence is twice the value assigned to its previous binary cell element in the sequence.
42. The apparatus of claim 33 wherein the sequence of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements and wherein the conductance gap of each binary cell member of the sequence is measured and a value assigned to the binary cell element based upon the measured values such that each binary cell element is assigned twice the value of the previous binary cell element in the sequence.
43. The apparatus of claim 33 wherein each non-homogeneous non-volatile binary cell of the memory cell has a characteristic conductance gap distinct from other members of the sequence and the characteristic conductance gap of each non-homogeneous non-volatile binary cell is determined by a characteristic size of the non-homogeneous non-volatile binary cell.
44. The apparatus of claim 33 wherein the sequence of non-homogeneous non-volatile binary cell elements are arranged on at least one memory crossbar.
45. The apparatus of claim 33 wherein the sequence of non-homogeneous non-volatile binary cell elements are arranged on a plurality of memory crossbars having a total number of resistive elements and the single neural network weight is represented by a multi-bit binary value having a number of bits equal to the total number of resistive elements on the plurality of memory crossbars.
46. The apparatus of claim 33, wherein the non-homogeneous non-volatile binary cells of the sequence are sufficiently spaced apart such that parasitic switching of neighboring variable resistive elements is prevented during targeted switching of a single variable resistive element.
47. The apparatus of claim 33 wherein the variable resistive element comprises a magnetic tunnel junction.
48. The apparatus of claim 47, wherein the non-homogeneous non-volatile binary cells of the sequence are sufficiently spaced apart such that parasitic switching by magnetostatic excitation of neighboring magnetic tunnel junctions is prevented during targeted switching of a single magnetic tunnel junction.
49. The apparatus of claim 47 wherein the magnetic tunnel junction of each non-volatile binary cell element in the sequence of non-homogeneous non-volatile binary cell elements has a characteristic area distinct from the areas of the magnetic tunnel junctions of other non-volatile binary cell elements of the sequence.
50. The apparatus of claim 47 wherein the sequence of non-homogeneous non-volatile binary cell elements comprises a sequence of binary cell elements wherein the magnetic tunnel junction of each binary cell member of the sequence has twice the area of the magnetic tunnel junction of the previous binary cell element in the sequence.
51. The apparatus of claim 33 comprising a quad-level memory cell wherein the sequence of non-homogeneous non-volatile binary cell elements comprises: a first non-volatile binary cell element having a first characteristic conductance gap G; a second non-volatile binary cell element having a second conductance gap 2G having twice the value of the first conductance gap G; a third non-volatile binary cell element having a third conductance gap 4G having twice the value of the second conductance gap 2G; and a fourth non-volatile binary cell element having a fourth conductance gap 8G having twice the value of the third conductance gap 4G.
52. The apparatus of claim 33 wherein the insulating layer of the magnetic tunnel junction comprises a metal oxide layer.
53. The apparatus of claim 33 wherein the magnetic tunnel junction comprises a MgO barrier sandwiched between two FeCoB layers.
54. The apparatus of claim 33 wherein the switching mechanism comprises a Spin Transfer Torque mechanism comprising a source line MOSFET connected to the reference layer of the magnetic tunnel junction and a write line connected to the gate terminal of the source line MOSFET.
55. The apparatus of claim 33 wherein the switching mechanism comprises a Voltage-Controlled Magnetic Anisotropy mechanism comprising a source line MOSFET connected to the free layer of the magnetic tunnel junction and a voltage control circuit configured to determine an electrical field direction across the free layer of ferromagnetic material.
56. The apparatus of claim 33 wherein the switching mechanism comprises a Spin-Orbit Torque mechanism comprising a source line MOSFET connected to the free layer of the magnetic tunnel junction and a bit line MOSFET connected to the reference layer of the magnetic tunnel junction, a read line connected to the gate terminal of the bit line MOSFET and a write line connected to the gate terminal of the source line MOSFET.
57. A method for representing a set of neural network weight values by a set of multilevel memory cells, the method comprising: measuring at least one physical property of each multilevel memory cell; and applying a digitalization procedure for each multilevel memory cell, the digitalization procedure depending upon the at least one physical property.
58. The method of claim 57 wherein each multilevel memory cell comprises a sequence of non-homogeneous non-volatile binary cell elements, each non-homogeneous non-volatile binary cell element comprising a variable resistive element which is switchable between at least a first stable state having a higher conductance and a second stable state having a lower conductance, and wherein the at least one physical property comprises a conductance gap between the higher conductance and the lower conductance.
59. A method for approximating inference of a neural network comprising: approximating a multiply-accumulate operation on a neural network inference thereby producing an analogue output; applying a digitalization procedure on the analogue output, the digitalization procedure depending upon an ADC range thereby producing a digital value; and applying a normalization procedure, the normalization procedure depending upon output normalization parameters.
60. The method of claim 59 further comprising adjusting the ADC range and the normalization parameters jointly based upon a set of samples for neural network input data.
61. The method of claim 59 further comprising adjusting the ADC range based upon a set of samples for neural network input data while the normalization parameters remain unchanged.
62. The method of claim 59 further comprising adjusting the normalization parameters based upon a set of samples for neural network input data while the ADC range remains unchanged.
63. A method for approximating a multiply-accumulate operation on a neural network inference, the method comprising: providing a weighting apparatus comprising a sequence of adjustable multilevel memory cells, each adjustable multilevel memory cell connected between a cell input line and a common output line; programming the weighting apparatus such that each adjustable multilevel memory cell has a conductivity representing a single neural network weight value; providing an activation vector comprising a sequence of activation values; providing input voltage signals to each cell input line of the weighting apparatus representing a corresponding activation value from the activation vector; and measuring the current through the common output line.
64. A method for approximating a value of a dot product of a first vector and a second vector, the method comprising: providing the first vector comprising a sequence of first vector values; providing the second vector comprising a sequence of second vector values; providing a weighting apparatus comprising a sequence of adjustable multilevel memory cells, each adjustable multilevel memory cell connected between a cell input line and a common output line; programming the weighting apparatus such that each adjustable multilevel memory cell has a conductivity representing a single first vector value from the first vector; providing input voltage signals to each cell input line of the weighting apparatus representing a corresponding second vector value from the second vector; and measuring the current through the common output line.
65. A method for encoding a numerical value in a multi-bit format to be represented by a multilevel memory cell, the method comprising: measuring at least one resistive state of the multilevel memory cell; defining a digitalization function, the digitalization function depending upon the higher conductance value and the lower conductance value of each variable resistive element of the multilevel memory cell; and applying the digitalization function to the numerical value.
66. The method of claim 65 wherein the multilevel memory cell comprises a sequence of binary cell elements, each binary cell element comprising a variable resistive element which is switchable between a first stable state having a higher conductance and a second stable state having a lower conductance, and the step of measuring at least one resistive state of the multilevel memory cell comprises: measuring the higher conductance value of each binary cell element; and measuring the lower conductance value of each binary cell element.
67. The method of claim 65 wherein the numerical value is selected from a set of numerical values, the method further comprising scaling each numerical value of the set.
68. The method of claim 67 wherein the scaling comprises: determining a lowest numerical value in the set of numerical values; determining a highest numerical value in the set of numerical values; defining a scaling function, the scaling function depending upon the lowest numerical value and the highest numerical value; and applying the scaling function to each numerical value of the set.
69. The method of claim 68 wherein the step of defining a scaling function comprises: scaling the lowest numerical value to the binary value 0; and scaling the highest numerical value to the highest binary value represented by the multilevel memory cell.
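Claims 67 to 69 amount to an affine rescaling of the value set onto the cell's code range followed by rounding. A compact sketch under that reading; the linear form of the scaling function is an assumption, while the endpoints come from claim 69:

```python
import numpy as np

def encode_set(values, n_bits):
    # Scaling function (claim 68): lowest value -> binary 0,
    # highest value -> highest code the multilevel cell can represent (claim 69).
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    top = 2 ** n_bits - 1
    scaled = (values - lo) / (hi - lo) * top
    # Digitization (claim 65): round each scaled value to its multi-bit code.
    return np.round(scaled).astype(int)

codes = encode_set([-0.8, 0.0, 0.35, 1.1], n_bits=3)   # codes in 0..7
```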
70. An apparatus for an approximate neural network inference, the apparatus comprising: a plurality of non-volatile memory elements of variable conductance that represent multi-bit binary values, organized to perform an instant analog approximation for a reliable neural network inference, including the inference for neural network layers; and circuit elements including connection lines, which form circuits together with the non-volatile memory elements, the circuit elements being configured to control a current distribution, governed by the conductance of the parts of the circuit, for the instant analog approximation of an output of the inference, wherein the neural network layers have a size that is not limited, as in common neural network inference practice, by the ratio of the conductance of the memory elements to the conductance of the connection lines.
71. The apparatus of claim 70, wherein the plurality of the non-volatile memory elements of variable conductance is configured by a plurality of control elements and/or devices provided to perform the instant analog approximation of the inference output for the neural network layer of the size not practically limited by the ratio of conductance of memory elements to conductance of connection lines.
72. The apparatus of claim 70, wherein the plurality of the non-volatile memory elements of variable conductance is configured by a plurality of connection lines provided to ensure the reliability of the instant analog approximation of the inference output for the neural network layer of the size not practically limited by the ratio of conductance of memory elements to conductance of connection lines.
73. The apparatus of claim 70, wherein the reliable instant analog approximation of the output for the neural network layer of the size not practically limited by the ratio of conductance of memory elements to conductance of connection lines is secured by preventing current flow through the cells in a reverse direction.
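For context on the limit that claims 70 to 73 claim to remove: in conventional practice the summed cell conductance on one line must stay well below the line conductance, which caps the layer size. A back-of-envelope sketch with assumed numbers, not taken from the specification:

```python
# Rough estimate of the conventional cross-point cap on cells per line.
g_cell = 2e-6      # ON-state cell conductance, siemens (assumed)
g_line = 0.05      # conductance of a connection line, siemens (assumed)
n_max = g_line / g_cell     # rough upper bound on cells sharing one line
print(f"conventional practice caps the line at roughly {n_max:.0f} cells")
```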
74. The apparatus of claim 72, wherein the reliable instant analog approximation of the output for the neural network layer of the size not practically limited by the ratio of conductance of memory elements to conductance of connection lines is ensured by a configuration and/or topology of connection lines distinct from a conventional cross-point connection that involves straight single-wire input and output connection lines.
75. The apparatus of claim 74, wherein the configuration of input and/or output connection lines involves a multi-level tree structure of connections from an input/output line to an array of memory elements.
76. The apparatus of claim 75, wherein the multi-level tree structure of connections is a binary balanced tree of connecting lines.
77. The apparatus of claim 75, wherein the multi-level tree structure of connections is a non-binary tree of connecting lines that also involves conventional straight single-wire connecting lines at individual levels.
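To see why the tree topologies of claims 75 to 77 help, compare the worst-case series wire resistance on a straight single-wire line against a balanced binary tree of connecting lines. The equal-segment model below is purely illustrative; a real tree would size its trunk wires for the merged currents.

```python
import math

def worst_case_wire_resistance(n_cells, r_segment, topology="straight"):
    if topology == "straight":
        # The farthest cell sees every wire segment in series.
        return n_cells * r_segment
    if topology == "binary_tree":
        # In a balanced binary tree every leaf sits ~log2(N) edges from the root.
        return math.ceil(math.log2(n_cells)) * r_segment
    raise ValueError(f"unknown topology: {topology}")

for n in (64, 1024, 65536):
    straight = worst_case_wire_resistance(n, 1.0)
    tree = worst_case_wire_resistance(n, 1.0, topology="binary_tree")
    print(f"N={n}: straight {straight:.0f} segments vs tree {tree:.0f}")
```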
78. The apparatus of claim 70, wherein the reliable instant analog approximation of the output for the neural network layer of the size not practically limited by the ratio of conductance of memory elements to conductance of connection lines is ensured by specific properties of the memory elements such as a high resistance, a large memory element size, acceptance of a higher percentage of memory elements not usable for memory storage, a higher percentage of read errors, a short memory retention time, and/or a combination thereof.
79. The apparatus of claim 78, wherein the reliable instant analog approximation of the output for the neural network layer of the size not practically limited by the ratio of conductance of memory elements to conductance of connection lines is ensured by means of a sufficiently high resistance of the memory elements.
80. The apparatus of claim 79, wherein the memory elements are implemented using Magnetic Tunnel Junction (MTJ) technology.
81. The apparatus of claim 80, wherein the memory elements are implemented using a multi-state MTJ with the free ferromagnetic region defined by a plurality of ovals.
82. The apparatus of claim 80, wherein the memory elements are implemented using a multi-level resistive element containing a plurality of non-homogeneous non-volatile memory cells of variable resistance that are assembled to represent a single neural network weight.
83. The apparatus of claim 80, wherein the property of the high resistance of the memory elements is satisfied by applying a Spin-Orbit Torque memory cell construction, involving a memory write mechanism that does not pass the current through the tunneling barrier.
84. The apparatus of claim 83, wherein the property of the high resistance of the Spin-Orbit Torque memory cell is satisfied by a thicker tunneling barrier than is possible for other MTJ-based memory cell types.
85. The apparatus of claim 83, wherein the construction of the Spin-Orbit Torque cell does not involve a read-line transistor or diode, thus allowing for cell area and energy reduction.
86. The apparatus of claim 78, wherein the specific properties of the memory elements involve properties that would not be appropriate for use of the memory elements as digital memory storage.
87. The apparatus of claim 86, wherein the specific properties of the memory elements involve a large memory element size to ensure low-power readout or to increase the element’s manufacturing stability.
88. The apparatus of claim 86, wherein the specific properties of the memory elements involve a significant percentage of elements not usable for memory storage, while the ensemble of the memory elements is still usable to perform the reliable instant analog approximation of the neural network layer output.
89. The apparatus of claim 86, wherein the specific properties of the memory elements involve a significant percentage of read errors, while the ensemble of the memory elements is still usable to perform the reliable instant analog approximation of the neural network layer output.
90. The apparatus of claim 86, wherein the specific properties of the memory elements involve a relatively short memory retention time, while the ensemble of the memory elements is still usable to perform the reliable instant analog approximation of the neural network layer output within the specified retention time frame.
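Claims 88 to 90 rest on a statistical point: a large ensemble can tolerate a fraction of dead cells and occasional read errors while the summed current stays usable. A Monte-Carlo sketch under a crude assumed fault model; the fault rates and conductance window are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def mac_with_faults(g, v, g_on=2e-6, dead_fraction=0.02, read_error=0.01):
    g = g.copy()
    dead = rng.random(g.size) < dead_fraction
    g[dead] = 0.0                         # unusable cells contribute no current
    flipped = rng.random(g.size) < read_error
    g[flipped] = g_on - g[flipped]        # crude model: state read as its complement
    return np.sum(g * v)

g = rng.uniform(0.1e-6, 2e-6, size=4096)  # programmed conductances (assumed window)
v = rng.uniform(0.0, 0.2, size=4096)      # input voltages
ideal = np.sum(g * v)
noisy = mac_with_faults(g, v)
print(f"relative error: {abs(noisy - ideal) / ideal:.3%}")  # typically a few percent
```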
91. The apparatus of claim 71, wherein parts of the plurality of non-volatile memory elements are used alternatively as a digital memory/logic device or to perform the reliable instant analog approximation of the neural network layer output.
92. The apparatus of claim 91, wherein parts of the plurality of non-volatile memory elements are used alternatively as a digital memory/logic device or to perform the reliable instant analog approximation of the neural network layer output, while their usage is controlled by preprogramming or run-time reprogramming of the ensemble of the memory elements or of some part of that ensemble.
93. The apparatus of claim 70, wherein separate memory elements of the plurality of non-volatile memory elements are used to represent separate parts of the multi-bit binary representation for the values of the neural network weights.
95. The apparatus of claim 93, wherein the output current distributions for memory elements representing separate parts of the multi-bit values for the neural network weights are produced for each part of a multi-bit representation separately and then collected together using an additional circuit.
96. The apparatus of claim 93, wherein the output current distributions for memory elements representing separate parts of the multi-bit values for the neural network weights are produced for all parts of a multi-bit representation together using different input voltage scales on memory elements representing different parts of a multi-bit representation.
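Claims 95 and 96 give two equivalent ways to recombine bit-sliced weights: read each slice's current separately and weight the digital results by powers of two, or scale the input voltage per slice so the currents already sum correctly. A sketch verifying both recover the exact dot product; the bit order and scaling scheme are assumptions:

```python
import numpy as np

def bit_sliced_dot(weight_bits, v, combine="digital"):
    # weight_bits[b, i] is bit b (LSB first) of the i-th weight, in {0, 1}.
    n_bits = weight_bits.shape[0]
    if combine == "digital":
        # Claim 95: one partial current per bit slice, recombined afterwards.
        partial = [np.sum(weight_bits[b] * v) for b in range(n_bits)]
        return sum(p * 2 ** b for b, p in enumerate(partial))
    # Claim 96: scale the input voltage per slice so currents sum directly.
    v_scaled = np.array([v * 2 ** b for b in range(n_bits)])
    return np.sum(weight_bits * v_scaled)

w = np.array([5, 2, 7])                            # 3-bit weights
bits = np.array([(w >> b) & 1 for b in range(3)])  # bit slices, LSB first
v = np.array([0.10, 0.20, 0.05])
assert np.isclose(bit_sliced_dot(bits, v), np.dot(w, v))
assert np.isclose(bit_sliced_dot(bits, v, combine="analog"), np.dot(w, v))
```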
97. The apparatus of claim 71, wherein the plurality of control elements and/or devices involves a digital processor core, and the plurality of memory elements is connected to data input lines from the data source and to the control lines of the digital processor core.
98. The apparatus of claim 97, wherein the digital processor core and/or the plurality of control elements and/or devices is powered up by a wake-up controller, connected to the plurality of memory elements and the digital processor core, only in the case of a wake-up event.
99. The apparatus of claim 98, wherein the plurality of memory elements permanently stays in an “always-on” standby mode, consuming no energy, while in the event of a data signal coming to the input lines, the signal is initially processed by the plurality of memory elements, which performs the initial analog neural network approximation procedure to determine the need for the digital processor core and/or the plurality of control elements and/or devices to wake up.
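Claims 97 to 99 describe an always-on analog front end that gates a digital core behind a wake-up controller. A control-flow sketch of that arrangement; all names and the thresholding rule are illustrative assumptions, not from the specification:

```python
def always_on_front_end(sample, analog_score_fn, threshold, wake_core):
    # The non-volatile array sits in standby and first runs the incoming
    # signal through the analog network approximation (near-zero power).
    score = analog_score_fn(sample)
    if score >= threshold:
        wake_core(sample)       # wake-up event: power up the digital core
    # Otherwise: remain in standby; no digital power is drawn.

always_on_front_end(
    sample=[0.1, 0.4, 0.2],
    analog_score_fn=lambda x: sum(x),   # stand-in for the analog inference
    threshold=0.5,
    wake_core=lambda x: print("digital core woken for", x),
)
```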
100. A method comprising: performing computer modeling for the use of the apparatus of any one of claims 70 to 99; and applying an appropriate sequence of actions to determine conditions and/or available solutions related to the task of reliable inference implementation on a given device instance of that apparatus for a given pre-trained neural network.
101. The method of claim 100, further comprising applying a sequence of actions to determine if the reliable inference implementation for a given pre-trained neural network could be implemented on a given device instance of the apparatus.
102. The method of claim 100, further comprising applying a sequence of actions to determine the required properties of a device instance of the apparatus that could be used to reliably implement an inference for a given pre-trained neural network.
103. The method of claim 100, further comprising applying a sequence of actions to determine the required characteristics of a given pre-trained neural network so that inference of the network could be reliably implemented on a given device instance of the apparatus.
104. The method of claim 100, further comprising applying a sequence of actions to provide a reliable inference implementation on a given device instance of the apparatus for a given pre-trained neural network that could be reliably implemented on that device instance.
105. The method of claim 100, wherein, within the modeling for the use of the apparatus of any one of claims 70 to 99, a reliable implementation of an inference for a given pre-trained neural network uses separate memory elements of the plurality of non-volatile memory elements to represent separate parts of the multi-bit values for the neural network weights.
106. The method of claim 105, wherein, within the modeling for the use of the apparatus of any one of claims 70 to 99, the levels of binary representation, and/or the representation of separate parts of the multi-bit weight values by separate memory elements, and/or the particular allocation of input and/or output channels are adjusted to optimally, or nearly optimally, fit the particular specifications, configuration and/or topology of the memory elements and/or connection lines of a particular instance of a device outlined above.
107. The method of claim 105, wherein, within the modeling for the use of the apparatus of any one of claims 70 to 99, the levels of binary representation, and/or the representation of separate parts of the multi-bit weight values by separate memory elements, and/or the particular allocation of input and/or output channels are adjusted to optimally, or nearly optimally, fit the as-manufactured properties of the memory elements and/or connection lines of a particular manufactured instance of the device.
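Claims 100 to 104 can be read as a feasibility-and-mapping check run in a computer model before deployment. A toy sketch of such a check; every field, threshold, and rule here is an assumed stand-in for device- and network-specific analysis:

```python
def inference_feasible(device, network, margin=0.1):
    # Check layer width against the usable conductance ratio (cf. claim 101)
    # and weight precision against the available bit levels per cell.
    width_ok = (max(network["layer_widths"]) * device["g_cell"]
                <= (1 - margin) * device["g_line"])
    precision_ok = network["weight_bits"] <= device["cell_bits"]
    return width_ok and precision_ok

device = {"g_cell": 1e-6, "g_line": 0.05, "cell_bits": 4}   # assumed device instance
network = {"layer_widths": [512, 1024, 256], "weight_bits": 4}
print(inference_feasible(device, network))                   # True for these numbers
```

Inverting the same check answers the dual questions of claims 102 and 103: solve for the device properties a given network requires, or for the network characteristics a given device can support.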
PCT/IB2024/058647 (priority date 2023-09-06, filing date 2024-09-05): Systems and methods for providing and using multi-level non-volatile memory elements. Status: Pending. Published as WO2025052292A1 (en).

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202363536905P 2023-09-06 2023-09-06
US63/536,905 2023-09-06
US202463626957P 2024-01-30 2024-01-30
US63/626,957 2024-01-30

Publications (1)

Publication Number Publication Date
WO2025052292A1 true WO2025052292A1 (en) 2025-03-13

Family

ID=94923448

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2024/058647 Pending WO2025052292A1 (en) 2023-09-06 2024-09-05 Systems and methods for providing and using multi-level non-volatile memory elements

Country Status (1)

Country Link
WO (1) WO2025052292A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100240152A1 (en) * 2006-11-01 2010-09-23 Avalanche Technology, Inc. Current-Confined Effect of Magnetic Nano-Current-Channel (NCC) for Magnetic Random Access Memory (MRAM)
US20130001718A1 (en) * 2008-09-29 2013-01-03 Seagate Technology Llc Magnetic tunnel junction with electronically reflective insulative spacer
US9349447B1 (en) * 2015-03-02 2016-05-24 HGST, Inc. Controlling coupling in large cross-point memory arrays
US20170229170A1 (en) * 2014-10-23 2017-08-10 Hewlett-Packard Development Company, L.P. Generating a representative logic indicator of grouped memristors
US20200372335A1 (en) * 2019-05-22 2020-11-26 International Business Machines Corporation Closed loop programming of phase-change memory
US20210005262A1 (en) * 2018-03-22 2021-01-07 Micron Technology, Inc. Memory block select circuitry including voltage bootstrapping control
US20230089791A1 (en) * 2021-09-23 2023-03-23 International Business Machines Corporation Resistive memory for analog computing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24862193

Country of ref document: EP

Kind code of ref document: A1