WO2024081827A1 - Thermodynamic computing system for sampling high-dimensional probability distributions - Google Patents
Thermodynamic computing system for sampling high-dimensional probability distributions Download PDFInfo
- Publication number
- WO2024081827A1 WO2024081827A1 PCT/US2023/076759 US2023076759W WO2024081827A1 WO 2024081827 A1 WO2024081827 A1 WO 2024081827A1 US 2023076759 W US2023076759 W US 2023076759W WO 2024081827 A1 WO2024081827 A1 WO 2024081827A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- analog
- network
- distribution
- voltage
- unit cells
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G06N3/065—Analogue means
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01R—MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
- G01R19/00—Arrangements for measuring currents or voltages or for indicating presence or sign thereof
- G01R19/25—Arrangements for measuring currents or voltages or for indicating presence or sign thereof using digital measurement techniques
- G01R19/2506—Arrangements for conditioning or analysing measured signals, e.g. for indicating peak values ; Details concerning sampling, digitizing or waveform capturing
- G01R19/2509—Details concerning sampling, digitizing or waveform capturing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- Aleatoric ambiguity persists independent of exposing an AI or ML system to more data.
- the perception system of an autonomous vehicle includes noisy sensors and imaging.
- Attorney Docket No. NORM-001WO01 Conventional AI addresses uncertainty by assigning a confidence level to its decisions (e.g., 90% confidence that an image shows a particular person or object).
- FIG. 1 illustrates how uncertainty affects a mainstream deep neural network’s ability to assign a meaningful confidence level to its predictions under real-world conditions. It shows idealized and real plots of uncertainty (confidence) in the deep neural network’s predictions, with darker shading indicating higher confidence.
- the deep neural network was trained on the training data (dots) distributed along the semicircles shown in both plots in FIG. 1. The trained deep neural network was then presented with the “out-of- distribution” input data (dots) clustered from x ⁇ 0 to x ⁇ 1.5 at y ⁇ 1.5 in both plots.
- the deep neural network should recognize that its predictions about the input data are highly uncertain/low confidence (light shading), as shown in the left plot. In reality, however, the deep neural network assigns high confidence (dark shading) to its prediction as shown in the right plot even though the input data is far from the training data.
- common AI and ML models tend to be especially overconfident in their predictions when operating on unfamiliar input data.
- probabilistic AI and ML can quantify and account for uncertainty in predictions.
- Probabilistic AI and ML combine AI and ML with probabilistic reasoning, which is the mathematical process of determining uncertainties as a function of prior beliefs, biases, and data. For any level of stakes, probabilistic reasoning should be optimal so long as there exists uncertainty.
- Probabilistic AI and probabilistic reasoning naturally involve continuous distributions, such as Gaussian distributions, to reflect one’s state of knowledge. Unfortunately, performing probabilistic reasoning with these continuous distributions scales poorly in computational difficulty as a function of model complexity, when using standard digital computers. Summary Nature performs probabilistic reasoning through the physical principle of thermodynamic equilibration. Probabilistic AI models are trained and inferred via probabilistic reasoning, usually working in terms of continuous random variables. A thermodynamic computing platform can exploit thermodynamic equilibration to perform probabilistic reasoning for probabilistic artificial intelligence and machine learning.
- thermodynamic equilibration of a system describes dynamics which evolve a physical system until the system can be described by a steady probability distribution over possible states; the inventive technol- ogy uses such thermodynamic equilibration for computation with dynamics engineered so that the final probability distribution over states matches the target probability distribu- Attorney Docket No. NORM-001WO01 tion that is being computed. Moreover, utilizing physical systems that are continuous in nature naturally matches the mathematical framework of continuous distributions used in probabilistic reasoning.
- the inventive technology can be implemented as a thermodynamic system for sam- pling a multivariate distribution, such as a multivariate Gaussian probability distribution. This system may include a network of coupled harmonic oscillators that is operably cou- pled to a digital controller.
- the network of coupled harmonic oscillators is characterized by a Hamiltonian that maps to the multivariate distribution.
- Each coupled harmonic oscillator in the network of coupled harmonic oscillators includes a digitally tunable in- ductor, a digitally tunable capacitor, a digitally tunable voltage source, and a digitally tunable current source.
- the digital controller initializes the digitally tunable inductors, the digitally tunable capacitors, the digitally tunable voltage sources, and the digitally tunable current sources. It iteratively reads voltages and currents at nodes in the network of coupled harmonic oscillators and updates the digitally tunable voltage sources and the digitally tunable current sources based on the voltages and currents.
- thermodynamic system for sampling a target probability distribution.
- This thermodynamic system includes both an analog dynamical system and a digital controller.
- the analog dynamical system is configured to evolve via Hamilton’s equations over a series of time steps to provide samples of the target probability distribution.
- the digital controller is operably coupled to the analog dynamical system and is configured, at each of the time steps, to receive a proposed sample of the target probability distribution and to accept or reject the proposed sample.
- a thermodynamic system for sampling a multivariate distribution such as a multivariate Gaussian probability dis- tribution.
- This system may include a network of analog electrical circuits (e.g., coupled harmonic oscillators) that is operably coupled to a digital controller.
- Each of the analog electrical circuits includes at least one tunable passive electrical component (e.g., a digi- tally tunable inductor, digitally tunable capacitor, and/or digitally tunable resistor).
- the digital controller tunes the tunable passive electrical components accord- ing to the multivariate distribution and samples voltages at points within the network of analog electrical circuits.
- a thermodynamic system may also include a voltage-controlled voltage source, operably coupled to the network of analog electrical circuits, to modify the multivari- ate distribution.
- the voltage-controlled voltage source can employ an artificial neural Attorney Docket No. NORM-001WO01 network to relate an input voltage to the voltage-controlled voltage source to an output voltage of the voltage-controlled voltage source.
- the input voltage can be based on the voltages sampled by the digital controller and the voltage-controlled voltage source can be configured to apply the output voltage to the network of analog electrical circuits so as to change a distribution of potential energy across the network of analog electrical circuits.
- the the multivariate distribution can have a nonzero cumulant of at least third order, in which case the system includes a multi-port device, operably coupled to at least three of the analog electrical circuits in the network of analog electrical circuits.
- This multi-port device can generate a k-body term in a potential energy distribution over the network of analog electrical circuits, where k is an integer greater than or equal to 3.
- the multi-port device can include a transistor, a voltage-controlled capacitor, or both.
- Each analog electrical circuit in the thermodynamic system can also include a stochas- tic noise source, such as a thermal noise source, a shot noise source, or a digital pseudo- random noise source.
- thermodynamic system can generate samples from the multivariate distribution via a Langevin Monte Carlo algorithm and/or a Metropolis Adjusted Langevin Algorithm.
- the multivariate distribution’s covariance matrix can be encoded in at least some of the parameters of the network of analog electrical circuits.
- Still another inventive system for sampling a multivariate distribution includes a network of coupled analog unit cells operably coupled to a digi- tal controller.
- the network of coupled analog unit cells is characterized by a total energy function that maps to the multivariate distribution.
- Each of the coupled analog unit cells includes a digitally tunable capacitor and either a digitally tunable voltage source, a digitally tunable current source, or both a digitally tunable voltage source and a dig- itally tunable current source.
- the digital controller initializes the digitally tunable capacitors, voltage sources, and/or current sources.
- the digital controller iter- atively reads voltages and currents at nodes in the network of coupled analog unit cells and updates the digitally tunable voltage sources and/or the digitally tunable current sources.
- the voltages and the currents correspond to positions and momenta of the total energy function, and the digital controller translates the voltages and the currents into samples of the multivariate distribution.
- the total energy function can be a Hamiltonian in which case the network of cou- pled analog unit cells have dynamics used to run a Hamiltonian Monte Carlo protocol.
- the network of coupled analog unit cells can have dynamics used to run a Langevin Monte Carlo protocol.
- the network of coupled analog unit cells can include one analog unit cell for each Attorney Docket No. NORM-001WO01 dimension of the multivariate distribution.
- Each of the analog unit cells may also include a digitally tunable inductor.
- the analog unit cells can be coupled to each other capacitively, resistively, and/or inductively.
- Yet another inventive system for sampling a multivariate distribution includes a gradi- ent calculator (e.g., an analog neural network or a network of resistive circuit elements), a momentum integrator device operably coupled to the gradient calculator, and a position integrator device operably coupled to the momentum integrator device.
- Still another inventive thermodynamic system for sampling a target probability distri- bution includes an analog dynamical system operably coupled to a digital controller.
- the analog dynamical system is configured to evolve via a Langevin equation and/or Hamil- ton’s equations of motion.
- the digital controller is configured, at each of time step of the Langevin equation and/or Hamilton’s equations of motion, to receive a proposed sample of the target probability distribution from the analog dynamical system and to accept or reject the proposed sample.
- Inventive methods include a method of inverting a matrix. This method involves uploading elements of the matrix to respective values of tunable circuit elements of a network of analog unit cells, allowing the network of analog unit cells to come to ther- mal equilibrium, and computing a covariance matrix based on dynamical variables of the network of analog unit cells.
- This covariance matrix is an inverse of the matrix.
- Com- puting the covariance matrix may include drawing samples from a thermal equilibrium distribution of the dynamical variables of the network of analog unit cells and computing the covariance matrix of the samples.
- Computing the covariance matrix can also include integrating the dynamical variables over time with a plurality of analog integrators.
- Other inventive methods include a method of solving a system of linear equations represented by a matrix and a vector.
- This the method involves uploading elements of a matrix to respective values of tunable circuit elements of a network of analog unit cells, uploading elements of a vector to respective current sources or voltage sources in the analog unit cells, allowing the network of analog unit cells to come to thermal equilibrium, and computing mean values of dynamical variables of the network of analog unit cells.
- the mean values representing a solution to the system of linear equations.
- Computing the mean values can include integrating the dynamical variables over time with a plurality of analog integrators.
- FIG. 1 shows plots of uncertainty for deep neural networks in the idealized and real cases.
- FIG. 2 shows an overview of the thermodynamic computing system.
- a digital con- troller draws samples from a temporally evolving analog system, also called an analog dynamical system, to perform the Hamiltonian Monte Carlo (HMC) method of sampling a target probability distribution.
- FIG. 3 is a circuit diagram for a basic LC oscillator that maps to a one-dimensional normal (i.e., Gaussian) probability distribution.
- FIG. 1 shows plots of uncertainty for deep neural networks in the idealized and real cases.
- FIG. 2 shows an overview of the thermodynamic computing system.
- a digital con- troller draws samples from a temporally evolving analog system, also called an analog dynamical system, to perform the Hamiltonian Monte Carlo (HMC) method of sampling a target probability distribution.
- HMC Hamiltonian Monte Carlo
- FIG. 4 is a circuit diagram for a pair of capacitively coupled oscillators in a network of capacitively coupled harmonic oscillators that maps to a multivariate normal probability distribution.
- FIG. 5 shows a a U-Turn that appears while running HMC on a 2D Gaussian.
- FIG. 6 shows a system level view of the UL hardware implementation. The PC downloads the setup information to the on-board controller, and consequently reads the sampled data from the cells.
- FIG. 7 depicts two coupled LC resonators that form a simple two-cell system. The resonant frequency of each resonator can be independently tuned, and the capacitive coupling can be controlled. The cells are excited with a Gaussian noise current source.
- FIG. 5 shows a a U-Turn that appears while running HMC on a 2D Gaussian.
- FIG. 6 shows a system level view of the UL hardware implementation. The PC downloads the setup information to the on-board controller, and consequently reads
- FIG. 8 shows generic all-to-all coupling in a UL hardware setup.
- FIG. 9a shows a tunable cell where the resonant frequency can be digitally controlled with a current noise source implemented using a digital controller followed by a LPF.
- FIG.9b shows a tunable cell with an initial condition implemented by adding an extra switch separating the inductor from the capacitors.
- FIG.10 shows the capacitor can be initialized while disconnected instead of initializing the capacitor and inductor in each cell. At a given time after connecting it with the inductor, the sought initial condition is achieved. This way, one component is initialized.
- FIG. 11 shows a coupling unit between two cells. The cells are connected at nodes A and B.
- FIG. 12 illustrates an FPGA implementation of the controller: (a) The high level Attorney Docket No. NORM-001WO01 control, where the download and upload operation occur, in addition to setting up each cell and a coupling unit within the FPGA. (b) The cell interface reads the voltages from the ADC, and subsequently stores that data. It also controls the switches of the tunable resonator and controls the noise amplitude. (c) The coupling unit interface turns the proper switches to control the coupling and its polarity. (d) The format of the downloaded data to the FPGA. FIG.
- FIG. 13A shows the current noise source in Figure 7 can be implemented with a since- bit PN generator with a series resistance to emulate a current source.
- the RC circuit also acts as a LPF.
- FIG. 13B shows an example of a linear feedback shift register (LFSR) used for a single-bit pseudonoise (PN) generator.
- FIG. 14 is a visualization of the overall process. Shaded boxes are procedures which happen on the digital device, while white boxes are procedures that happen on the analog device.
- FIG. 15 illustrates coupling between HMC unit cells via a mutual inductance.
- FIG. 16 illustrates direct inductive coupling between HMC unit cells.
- FIG. 17A shows a circuit diagram of a possible physical realization of a single unit cell, consisting of a noisy resistor and a capacitor illustrates direct inductive coupling between HMC unit cells.
- FIG. 17B shows a circuit diagram of a possible means to couple two unit cells, using a coupling resistor.
- FIG. 18 shows an example of a switching capacitor bank controlled by digital logic, where each physical capacitor adds an additional bit of precision for the overall capaci- tance.
- FIG. 19 illustrates two varicaps in opposite directions in a back-to-back configuration. A third port in the middle acts as a tuning voltage to tune the capacitance of the system.
- FIG. 20 is a circuit diagram of the digital-analog scheme for implementing bi-polar coupling.
- FIG. 20 is a circuit diagram of the digital-analog scheme for implementing bi-polar coupling.
- FIG. 21 shows an active inductor based on a gyrator (top) and an equivalent circuit with an inductor (bottom).
- FIG. 22 shows a circuit diagram of a basic symmetric RC unit-cell used for over damped Langevin sampling of one-dimensional normal (i.e. Gaussian) probability distri- bution.
- FIG. 23 shows a circuit diagram of the basic coupling-cell used to achieve negative coupling between two RC unit-cells, without inductors.
- FIG. 24 shows a circuit diagram of the basic coupling-cell used to achieve positive coupling between two RC unit-cells, without inductors.
- FIG. 22 shows a circuit diagram of a basic symmetric RC unit-cell used for over damped Langevin sampling of one-dimensional normal (i.e. Gaussian) probability distri- bution.
- FIG. 23 shows a circuit diagram of the basic coupling-cell used to achieve negative coupling between two RC unit-cells, without inductors.
- FIG. 24
- FIG. 25 illustrates different strategies for achieving local connectivity, including a square lattice, hexagonal lattice, and King’s graph, with boxes representing unit cells.
- Attorney Docket No. NORM-001WO01 FIG. 26 illustrates additional strategies for achieving local connectivity based on clus- ter graphs, including square clusters and hexagonal clusters, with boxes representing unit cells.
- FIG. 27 illustrates a semi-local approach to connecting unit cells (represented by boxes).
- unit cells within a row are fully connected, while the connections between rows are more sparse.
- FIG. 28 shows a circuit schematic of a possible fully-connected architecture.
- FIG.29 is a plot of the success rate of the accept/reject step with increasing number of bits of precision in the switching capacitor bank capacitors (error bars represents standard deviation error). Results of simulating the process and hardware with fully-connected capacitance matrix of dimension 8.
- FIG. 30 is a plot of the success rate of the accept/reject step with increasing capacitor imperfections (shaded region represents standard deviation error). Results of simulating the process and hardware with fully-connected capacitance matrix of dimension 200.
- Capacitors have many levels of tolerances, but one can select capacitors with a tolerance of, say 2.5%, which leads a success rate of approximately 80% according to the plot.
- FIG. 28 shows a circuit schematic of a possible fully-connected architecture.
- FIG.29 is a plot of the success rate of the accept/reject step with increasing number of bits of precision in the switching capacitor bank capacitors (error bars represents standard deviation error). Results of simulating the process and hardware with fully-connected capacitance
- FIG. 31 shows plots illustrating the effect of hardware with limited k-banded connec- tivity on the acceptance rate of the accept/reject step (error bars represent the standard deviation). Results from simulating the process and hardware with k-banded capacitance matrix of increasing dimension.
- the target distribution has a sparse covari- ance matrix with elements decaying with distance from the diagonal
- the target distribution covariance matrix is arbitrarily dense.
- FIG. 33 is a plot showing the L 1 distance between the approximating probability density function f a and the target f t as a function of ⁇ .
- the target distribution is a univariate normal distribution with mean zero and variance 20.
- error mitigation green
- the first derivative of the error goes to zero as ⁇ goes to zero
- error mitigation blue
- the first derivative does not vanish as ⁇ goes to zero.
- FIG. 34 illustrates Maxwell’s demon, which is a concept in thermodynamics whereby an intelligent observer can reduce the entropy of a system by watching the system and appropriately applying forces to the system.
- FIG. 35 shows how the potential energy landscape U(x) can be altered by adding a Maxwell’s demon.
- the bottom panel shows schematically how this alters the probability distribution P (x) that one draws samples from.
- the left panel shows a harmonic oscillator Attorney Docket No. NORM-001WO01 potential corresponding to sampling a Gaussian probability distribution.
- the right panel shows that adding in a Maxwell’s demon system leads to a more complicated potential, allowing one to sample other distributions such as multi-modal distributions.
- FIG. 36 illustrates how the Maxwell’s demon device can be viewed as a voltage- controlled voltage source.
- FIG. 37 illustrates how a digital Maxwell’s demon system (e.g., processed on a CPU or on a Field Programmable Gate Array (FPGA)) could interact with the analog HMC hardware.
- FIG. 38 illustrates a non-correlating approach to constructing an analog Maxwell’s demon.
- the primary unit cell (on the left) is coupled to an auxiliary unit cell (on the right). Specifically, the voltage across the capacitor in the primary unit cell is applied as a voltage source in the auxiliary unit cell. This voltage source drives current through a non-linear element (NLE), and the current is read off via the voltage across a resistor.
- NLE non-linear element
- FIG. 39 shows an alternative construction for the analog Maxwell’s demon without a voltage inverter. Namely, by allowing the resistor in the auxiliary unit cell to be ungrounded on both terminals, then one can pick off its voltage difference and invert the wires before applying it as a voltage source in the primary unit cell
- FIG. 40 illustrates the characteristics of a diode-based Maxwell’s demon.
- the NLE is a diode under forward bias.
- Te first panel shows the current- voltage characteristic of the diode, denoted I(x).
- the second panel shows the force f(x) outputted by the Maxwell’s demon, which we assume applies a negative sign to the I(x) curve.
- the third and fourth panels respectively are the potential energy U(x) and the probability distribution P (x) being sampled.
- FIG. 41 illustrates the characteristics of a Maxwell’s demon based on a tunnel diode. Here we assume that the NLE is a tunnel diode under forward bias.
- the first panel shows the current-voltage characteristic of the tunnel diode, denoted I(x).
- the second panel shows the force f(x) outputted by the Maxwell’s demon, which we assume applies a negative sign to the I(x) curve and then shifts the voltage by a constant, negative amount (as discussed in the text).
- FIG. 42 illustrates an analog Maxwell’s demon for generating correlations amongst the unit cells.
- An analog coupling operation which could, for example, be an affine transformation, is applied to the vector of voltages associated with the capacitors of each unit cell. After the analog coupling operation, the resulting vector is applied component- wise as a voltage source in each auxiliary unit cell, which contains a non-linear element (NLE). The current through each NLE is read off as a voltage, which is then inverted in sign and finally applied as a voltage source in the corresponding primary unit cell.
- NLE non-linear element
- FIG. 43 shows an analog circuit for implementing the matrix-vector multiplication Av. This provides a possible implementation for the analog coupling operation shown in Fig. 42.
- FIG. 44 shows an analog circuit for implementing the function h(v) using transistors.
- the ith component of the circuit for implementing h i (v) as a representative example for other components.
- transistors associated with h i giving a total of d 2 transistors for the entire transformation h(v).
- FIG. 45 illustrates a device that performs analog HMC when the probability distri- bution a user wishes to sample from is represented by a neural network.
- FIG. 47 shows three unit cells coupled via the three ports of a voltage-controlled capacitor.
- FIG. 48 shows four unit cells, coupled via a voltage multiplier (i.e., voltage mixer) and a voltage-controlled capacitor.
- the voltage multiplier can either be implemented on a digital device or with an analog device.
- FIG. 49 illustrates a scheme for achieving local connectivity with 3-body coupling via voltage-controlled capacitors and a square lattice.
- FIG. 50 illustrates a scheme for achieving local connectivity with 3-body coupling via voltage-controlled capacitors (VCCs) and a hexagaonal lattice.
- FIG. 51 shows an LMC Hardware unit cell with a capacitor, an ideal resistor, and a stochastic voltage source.
- FIG. 52 shows two unit cells of LMC hardware coupled with either (A) a resistive bridge or (B) a capacitive bridge.
- FIG. 53 shows two coupled unit cells of LMC hardware for Gaussian sampling.
- the covariance matrix of the Gaussian is encoded into the values of the circuit elements, particularly the resistances of the resistors.
- FIG. 54 shows an alternative analog device based on integrators for implementing LMC. The user interfaces through compilation of the gradient of the log-probability as well as through sample collection.
- FIG. 55 illustrates a possible circuit diagram for two coupled unit cells, which would form the building block of the hardware. In each unit cell, current noise is injected in parallel with the LC oscillator circuit.
- FIG. 56 is a schematic diagram of our Thermodynamic process for solving linear systems of equations. Attorney Docket No.
- FIG.2 shows a thermodynamic computing system 200, also called a thermodynamic com- puting platform, thermodynamic system, or thermodynamic platform, that can sample multivariate Gaussian distributions using the Hamiltonian Monte Carlo (HMC) method, which is a state-of-the-art Markov chain Monte Carlo (MCMC) method.
- the HMC sam- pling method is a Markov chain Monte Carlo (MCMC) method, which samples from the Gibbs distribution.
- the HMC sampling method reduces the problem of sampling from the Gibbs distribution of f to simulating the trajectory of a particle according to the laws of Hamiltonian dynamics where f plays the role of a potential energy function.
- the sam- ples produced by the thermodynamic computing system can be used in probabilistic AI, for example, to weight connections between neurons in different layers of a probabilistic neural network.
- the thermodynamic computing system 200 in FIG.2 is a hybrid digital-analog system that includes a digital system 210 that interfaces with an analog system 220, also called an analog dynamical system or temporally evolving analog system.
- the digital system 210 can be implemented in appropriately programmed digital hardware (e.g., a processor, application-specific integrated circuit, field-programmable gate array, etc.) and executes both a probabilistic model 212, which is an application-level statistical model being in- ferred by the thermodynamic system, and a controller 214 for interfacing with the analog system.
- the controller can be implemented in firmware.
- the controller 214 initializes control parameters 211 of the analog dynamical system 220 and accepts and rejects samples 221 of the target multivariate distribution from the analog system 220.
- the digital system 210 interfaces with the user directly or via another digital computing device (e.g., a personal computer (PC)).
- the analog system 220 includes a dynamical physical system 222 that evolves via Hamilton’s equations over a series of time steps to provide samples of a target multi- variate distribution, such as a multivariate Gaussian distribution, Bernoulli distribution, binomial distribution, Poisson distribution, or other exponential distribution.
- the analog system 220 can sample any distribution that can be defined in terms of an energy function that can be realized in (analog) electronics.
- the target multivariate distribution can be a high dimensional distribution (e.g., a distribution with 10, 100, 1000, 10000, or more dimensions) that represents probability, heat, population, or any other suitable quantity.
- the analog system 220 proposes a sample 221 of the target probability distribution to the controller 214, which accepts the proposed sample or rejects it if it is an outlier.
- the natural time evolution of the analog system 220 Attorney Docket No. NORM-001WO01 maps to Hamiltonian dynamics, meaning that the analog system naturally implements the integration step of the HMC sampling method.
- the controller 214 in the digital system 210 sets the analog system’s input parameters 211, which represent the multivariate distribution, for example, a normal (Gaussian) distribution or other distribution that the analog dynamical system 220 is sampling.
- the input parameters 211 may include the inductances, capacitances, voltages, and currents of adjustable inductors, capacitors, voltage sources, and current sources, respectively, that make up the analog system 220 as described below.
- the controller 214 sets these input parameters 211 via a serial bus or other suitable interface, then allows the analog system 220 to evolve for a predetermined period of time, also called a time step, integration period, or sampling period. Once that time has elapsed, the controller 214 reads the analog system’s output parameters, which may include voltages, currents, and/or other analog values representing the analog system’s final state, via the serial bus. These output parameters represent samples 221 of the multivariate Gaussian probability distribution. The controller 214 rejects errant or outlying parameter readings to correct for analog imperfections in the analog system 220.
- the analog component 220 of the thermodynamic system 200 simulates physical (Hamiltonian) dynamics to sample the multivariate Gaussian distribution or other tar- get probability distribution.
- the Hamiltonian H of the physical system 222 specifies the total energy of the physical system, including both kinetic energy and potential energy.
- the Hamiltonian represents the target probability distribution, with q sample space variables in D dimensions.
- the HMC method traditionally involves taking the derivatives of the Hamiltonian with respect to sample space variables and with respect to the generalized momenta, p, then integrating trajectories with respect to time t over a series of time steps of duration ⁇ .
- the dynamics of a physical system replace both the taking of derivatives and the trajectory integration.
- One such physical system 222 is an analog electronic system—–specifically, a network of coupled harmonic oscillators.
- the kinetic energy of a network of coupled harmonic oscillators maps to the energy function (Hamiltonian) of a multivariate Gaussian probability distribution up to a normalization constant.
- the oscillators voltages and currents represent the position coordinates of the Hamiltonian of the multivariate Gaussian probability distribution, so they can be sampled to provide samples of the distribution.
- Attorney Docket No. NORM-001WO01 2 Sampling a One-Dimensional Normal Probability Distribution
- a single lumped circuit element, such as a basic LC oscillator can be mapped to a one- dimensional (1D) normal (i.e., Gaussian) probability distribution as follows.
- an LC oscillator can be used as the analog dynamical system for sampling a 1D normal probability distribution in a thermodynamic computing platform that performs HMC calculations.
- HMC Hamiltonian Monte Carlo
- HMC relies on two main ingredients to sample from the target distribution: • Hamiltonian dynamics Given a Hamiltonian H and initial conditions x 0 and p 0 for x and p, HMC integrates Hamilton’s equations, Attorney Docket No. NORM-001WO01 for some time. The new value for x and p at the end of integration are the next proposal in the MCMC chain. • Canonical distribution Separately from the question of dynamics, given a Hamiltonian H and temperature T , there is a corresponding canonical probability distribution over x and p, where k B is Boltzmann’s constant.
- HMC The structure of HMC is influence by the property that x and p are independent random variables. This property follows from the fact that U(x) and K(p) have non overlapping support: Connecting the two ingredients is the choice of initial conditions for the integration.
- the initial values x 0 and p 0 set the surface of constant probability density on which the Hamiltonian dynamics travel. Up to normalization, the probability density along the trajectory is ⁇ U(x) ⁇ K(p) e kB T e kB T . (2)
- the independence of the marginals lets us use resampling of p to choose a new constant value for the joint probability density; then, the Hamiltonian dynamics exchange this resampling of momentum into a resampling of position.
- K ⁇ H(P , Q) is also a Hamiltonian
- the stationary paths of both sides of the equation are identical: the location of the stationary paths does not depend on the multiplicative constant ⁇ . 3.3.2 Prove new coordinates With the result of the previous section in hand we have a procedure to check that our transformations are canonical.
- FIG. 6 shows a hardware implementation that primarily achieves the sampling part of the overall computational process.
- This hardware implemenation includes a digital system, implemented here as an FPGA controller 610, operably coupled an analog sys- tem, show here as an LC resonator 620 coupled to an analog-to-digital converter (ADC), digital-to-analog converter (DAC), noise source, and switches 630.
- ADC analog-to-digital converter
- DAC digital-to-analog converter
- noise source and switches 630.
- the samples are volt- age levels across the LC resonators 620 (also called unit cells or cells, detailed in section 4.1) when excited with the noise source 630.
- a user can interact with the hardware im- plementation via a personal computer (PC) 602 that is operably coupled to the FPGA controller 610.
- the PC 602 downloads three different types of information from the FPGA controller 610: (1) cells resonant frequency, (2) coupling sections and their respective polarities, and (3) noise level.
- the FPGA controller 610 (detailed in section 4.3) sets up the circuits ac- cording to the downloaded data.
- An ADC samples the voltages and stores the data in the FPGA’s on-chip memory.
- Coupled Resonators Figure 7 shows coupled resonators in the form of two capacitively coupled, parallel LC circuits. The coupling and the cell capacitance are dictated by the controller. The capacitors can be tuned in an analog or digital fashion, while the inductances can remain constant. Coupling a larger number of resonators improves system performance.
- FIG. 8 shows a conceptual all-to-all coupling between N cells. Attorney Docket No. NORM-001WO01 The exact implementation of the cells and the coupling sections are technology depen- dent. Sections 4.1.1 and 4.1.2 show examples of how to implement cells in technologies compatible with printed circuit boards (PCBs).
- PCBs printed circuit boards
- Figure 9a(a) shows a structure for a switch-tunable resonator. The capacitors can change their values using a shunt switch (e.g., transistor). The minimum capacitance value (C1) does not require a switch.
- the top plate of the capacitors represent the voltage that is sampled and coupled to other cells.
- the noise source can be implemented with a digital controller to reduce or minimize the dependence on analog circuits. Consequently, a Lowpass Filter (LPF) filters out the unnecessary and/or undesired high-frequency components of the output signal.
- LPF Lowpass Filter
- the coupling units between cells can be implemented as shown in Figure 11.
- the series switch (controlled by Coupling enable signal) can turn the coupling ON or OFF.
- the polarity of the coupling is controlled by the signal Coupling polarity.
- the negative and positive polarity is achieved by using the transformer setup as shown in the figure.
- ADC Analog to Digital Converter
- the configuration uses a single ADC channel for each cell. Alternatively, multiple cells can share a single ADC if they are sampled simultaneously. The input impedance of the ADC should be taken into consideration with respect to the resonant frequency of the cells.
- Controller The controller in the PCB handles the following functions: • Download and upload interface with the PC • Interface with the ADC (read and store data, initializations, etc.) • Control the cell switches to tune the resonators • Control the coupling switches to turn them ON or OFF with the proper polarity • Generate a Pseudorandom noise (Section 4.4)
- Figure 12 shows the block diagrams for the functionality of the FPGA, along with the protocol of the setup data from the PC. Attorney Docket No.
- NORM-001WO01 4.4 Noise Injection
- Each unit cell has an independent noise source. Since the cells are set up as a parallel LC circuit, the noise sources in Figure 7 are current sources. One way to implement this is to generate a single-bit pseudo-random noise (PN) and filter the output using an RC circuit as shown in Figure 13A. The presence of the resistor also makes the PRN voltage source appear as a current source (Thevenin to Norton conversion). The amplitude of the noise can be controlled using a Pulse Density Modulation (PDM) for the single-but noise source.
- PDM Pulse Density Modulation
- the single-bit pseudo-random sequence is implemented with a Linear Feedback Shift Register (LFSR) configuration, representing a primitive polynomial.
- LFSR Linear Feedback Shift Register
- Select hyperparameters 1412 Use the Lanczos process to compute the highest and lowest normal mode frequencies of ⁇ , ⁇ max and ⁇ min .
- ⁇ threshold is set by the fact that device speed (set by ⁇ ) and device accuracy (set by the measurement circuitry) exhibit some tradeoff; the exact value is driven by simulations of the overall oscillator system.
- Set the target physical covariance matrix as ⁇ physical (for details, see section 3.5).
- Set the integration time ⁇ 1/(4 ⁇ min ) to be used for our modified HMC sampling process (for details on this choice, see section 3.6).
- Initialize Markov chain 1414 (digital device): Initialize the Markov chain S on the digital device. The initial state is a vector of all zeros of length d, the dimension of the target normal distribution. This chain represents the voltage values. 5.0.2 Draw samples In the sampling stage 1420, the digital device (digital controller) and physical device (network of coupled analog unit cells) collaborate to advance a Markov chain on the volt- age.
- the initial value of the currents I is computed 3.
- Set initial V, I 1423 (analog device): Open switch B and close switches A and C.
- Activate system evolution 1424 (analog device): Flip the switches so that B is closed and A, C are open in each oscillator block; turn off the voltage and current sources. This allows the oscillators to begin evolving freely.
- Measure final V, I 1425 (analog device) after time ⁇ , measure the final tra- jectory values - use each volt meter X to measure the final voltages V final on the coupling spikes J; use each current meter Y to measure the final currents I final running through the oscillator inductors F. 6.
- a multivariate normal probability distribution maps to a network of coupled harmonic oscillators, whether that coupling is inductive, capacitive, or inductive and capacitive.
- the network includes one harmonic oscillator for each dimension of the multivariate normal probability distribution, and the mapping is based on the Lagrangian and Hamiltonian of the circuit.
- the derivation of the Lagrangian and Hamiltonian is conceptually similar to the derivations presented above.
- the resulting Hamiltonian can be mapped to a multivariate normal probability distribution following the derivation presented above.
- FIG. 17(A) shows the typical equivalent noise model for a noisy resistor composed of a stochastic voltage noise source, ⁇ v(t), in series with an ideal (non-noisy) resistor of resistance R.
- the terminal capacitance, C, of the resistor is also added to the equivalent resistor model.
- the voltage on node 1 (labeled simply as v(t) here) is the state variable, whose dynamics obey:
- This stochastic differential equation (SDE) comprises a drift term proportional to v(t) and a diffusion or stochastic term proportional to ⁇ v(t).
- SDE stochastic differential equation
- two unit cells could be coupled through a resistor, as pictured in Fig. 17(B).
- a switching capacitor bank is made of several capacitors that can be connected in parallel in any combination to achieve a number of different values of capacitance.
- a digital signal is used to actuate the switches to connect or disconnect the capacitors.
- the device thus behaves effectively like a tunable capacitor, where the capacitance value is tuned by a digital voltage signal as shown in Fig. 18. Since the device can include any number of capacitors of any value of capacitance, this device can theoretically have arbitrary range and precision, at the cost of large physical area, loss, and parasitic capacitors. Additionally, since this device is made up of regular capacitors, this device has generally good linearity and is well-behaved at a wide range of voltage amplitudes.
- a varicap diode is a semiconductor device that has a voltage-dependent capacitance value. This type of diode has a depletion layer that is particularly sensitive to its voltage bias. The capacitance value can thus be changed by varying the value of the voltage bias across the diode.
- These diodes are specifically designed for the desired range of tunability .
- the Attorney Docket No. NORM-001WO01 varicap has in general a smaller range of tunability than the switching bank of capacitors, but has continuous tunability.
- Another consideration for these semiconductor devices is their non-linearity. This means that the device behaves as a linear capacitor for very small fluctuating voltages. At larger magnitude AC fields, the non-linear distortions become significant.
- FIG. 19 shows a possible approach for constructing a tunable capacitor based on varicaps.
- two varicaps are oriented in opposite directions, with an external source applying a voltage between the two varicaps.
- This approach can allow one to control the overall capacitance with an external voltage, such as a voltage signal from a digital device.
- 8.2 Constructions for bipolar coupling capacitors An arbitrary target covariance will have some elements that are negative, representing a negative correlation between variables. Bipolar (positive and negative) capacitive cou- pling enables expression of these elements in the hardware capacitance matrix.
- the negative coupling between elements can be implemented using a tunable capacitor connected to a transformer.
- the tunable capacitor sets the magnitude of the coupling while the transformer changes the sign.
- the transformer is wound such that it has the effect of negating the sign of the voltage from one side to the other, thus implementing effective negative capacitive coupling.
- a switch that can toggle between the transformer or a wire going to the neighboring cell, is placed between the capacitor and the transformer in order to turn the negative coupling on or off.
- Hybrid digital-analog method An alternative method of achieving bipolar coupling is the hybrid digital-analog, pictured in Figure 20.
- the hardware has uncoupled unit cells, each equipped with a way to measure their voltage and capacitively coupled to an arbitrary voltage source.
- This applied voltage can be negated digitally, thus effectively implementing negative coupling.
- Attorney Docket No. NORM-001WO01 8.3 Tunable constructions for inductors Real inductors have the drawback of taking up a large amount of space on-chip. Thus, having an alternative circuit element that behaves like inductor could be helpful, to reduce the area used on-chip. Active inductors can be employed.
- Fig. 21 shows a circuit diagram for an active inductor based on a gyrator.
- two Gm cells and a capacitor provide a transfer function that mimics the behavior of inductors within a given frequency range.
- the overall inductance can be tuned by modifying the capacitance of the capacitor.
- this active inductor approach includes limited frequency range, low quality factor, increased power consumption, and nonlinearity. Nev- ertheless, this circuit can be useful for reducing the on-chip area for the inductor. 9 Strategies for removing inductors 9.1 Introduction
- our analog computational primitive for Gaussian sampling (and oth- ers) involves encoding variables of interest to an application onto the degrees of freedom of circuit unit cells.
- the correlations between the variables of interest can be encoded through capacitive coupling of the unit cells.
- simple capacitive coupling does not lend itself well to encoding both negative and positive, or bipolar, correlations.
- One way to circumvent this problem is to use a transformer in series with the capacitor wired in such a way as to negate the voltage when one wants to have coupling of opposite sign to pure capacitive coupling.
- This solution however has the downside of using two inductors per coupling element.
- the number of coupling elements grows as the number of unit cells, d, squared, d 2 .
- Equation 24 can be expanded in terms of the node voltages in the following way and equation 25 can be similarly expanded as To simplify the equations, we define a new set of variables and their time derivatives. Now, equation 26 becomes and equation 27 becomes To characterize the asymmetry between the resistances in the unit cell, let us define the asymmetry parameter, ⁇ i , as and the sum of unit-cell resistances as R i ⁇ R ia + R ib .
- This asymmetry parameter is Attorney Docket No. NORM-001WO01 defined in the range on ⁇ 1 ⁇ ⁇ i ⁇ 1 and, in practice, can be kept as close to zero as possible. Then, equation 30 becomes: and equation 31 becomes Finally, replacing Equation 34 in Equation yields
- the dynamics of this unit cell can be simulated by one degree of freedom, V C ⁇ i, and that these dynamics are insensitive to the asymmetry in the resistances, ⁇ i . 9.3 Coupling-cell Bipolar capacitive coupling between unit cells of the type described in Section 9.2 is described below. This circuit contains two switches that are operated such that there are only two possible configurations, one for positive coupling and one for negative coupling.
- the mean As described in section 3.6, we can incorporate the mean as a constant offset to each oscillator variable in post-processing.
- the covariance matrix should be incorporated in the dynamics of the system, which is represented by the capacitance matrix in our physical device.
- a d x d capacitance matrix can be used to fully express the target Gaussian probaility distribution’s covariance matrix, which represents the degree of cor- relation between any two variables.
- This matrix is composed of the d self-capacitances on the diagonal and all the two-oscillator coupling capacitances on the off-diagonals. Building systems of coupled oscillators involves deciding on the degree of connectivity that is feasible to implement.
- the degree of connectivity means how many connections any oscillator in the system has to other oscillators.
- the system of coupled oscillators was assumed to be all-to-all connected. This is the highest degree of connectivity one can achieve for a given coupling scheme: all oscillators are connected to all other oscillators.
- the hardware has order d 2 connections between oscillators, (dimension of the problem). This can become a space bottleneck in larger systems.
- This problem cab be addressed by reducing the connectivity degree to have only local couplings between a set number of oscillators.
- the degree of connectivity can be defined by a given oscillator’s maximum number of connections.
- a simple connectivity scheme is a connectivity architecture of degree 2, where the oscillators are only coupled to two of their nearest neighbors.
- a connectivity less than all-to-all full
- d 5 units.
- k 1. Extending this concept to arbitrary dimensions and connectivity is straightforward. If hardware has limited connectivity, then the effect of the covariance mismatch on the behavior of the overall process should be quantified. Attorney Docket No. NORM-001WO01 10.2 Hardware with local connectivity 10.2.1 Overview If a specific application is targeted, then the properties of the corresponding covariance matrix can be directly mimicked by the hardware.
- hardware with connec- tivity k can be designed for specific classes of applications that have banded covariance matrices.
- the accept/reject step in the process can overcome some amount of mismatch between the target covariance connectivity and the hardware connectivity. For example, if the target covariance matrix is not quite banded, but the matrix elements decay exponentially with distance from the diagonal, the process run on banded k-local hardware should still be able to draw quality samples.
- Figures 25 and 26 show several possible local connectivity hardware architectures that could be implemented for appropriate classes of distributions. Implementing these strategies in hardware enables great savings in terms of on-chip space and complexity, thus unlocking the possibility for larger scale systems, while efficiently sampling from a certain family of covariance matrix.
- Cluster graph connectivity could have application, e.g., for image processing. This is because the covariance matrices for image processing often have a block circulant struc- ture. The blocks in this block circulant structure could correspond to the clusters in the cluster graph connectivity.
- FIG. 27 shows a semi-local connectivity among unit cells organized on a square lattice. Each unit cell is fully connected to other unit cells in the same row, with sparse connections between the rows of unit cells. In this scheme, the total number of connections ⁇ grows asymptotically as D D, which is neither linear nor quadratic in D. The number ⁇ of nearest neighbors n grows asymptotically as D.
- This strategy essentially corresponds to the cluster graph connectivity (discussed ⁇ above) where each cluster is a row and the cluster size grows with D.
- thermodynamic hardware to produce samples for HMC is the native error resilience of this approach. This is due to the presence of the Metropolis- Hastings (MH) step where the sample is accepted or rejected based on the energy dif- ference of the proposed sample with the previous sample. The inclusion of this step in conjunction with analog hardware that proposes the sample can be thought of as ther- modynamic error correction.
- MH Metropolis- Hastings
- the accept/reject step serves to filter out any erroneous samples that are proposed due to hardware limitations such as capacitor imprecision or connectivity constraints as will be discussed below. Therefore, this naturally occurring form of error correction is likely to make the thermodynamic approaches to HMC resilient to the presence of imperfections.
- the MH step is typically performed on digital hardware in our methodology. This allows whatever errors that accumulated during the analog dynamics to be corrected. Thus, the combination of the analog-digital interface (during the MH step) leads to the correction of errors. In some sense, the MH step serves to keep the analog dynamics on the right track.
- thermodynamic error correc- tion serves to push the position-momentum coordinates back towards the desired trajectory.
- acceptance rate also known as the success rate or acceptance probability
- the accept/reject step in the process serves as a form of error cor- rection for the hardware.
- Figure 29 shows the effect of capacitor imprecision on the acceptance rate, based on numerical simulations. Because the acceptance rate is only a weak function of imprecision, one can use capacitors represented by a small number of bits (e.g., 2 or 3 bits) in the hardware, while still maintaining good performance.
- An additional consideration during the design stage is the tolerances of the compo- nents. This refers to the possible deviation of the values of the components from their nominal values. These added deviations can also be corrected by the accept/reject step, see Figure 30. However, these tolerances can be accounted for by a full characterization of the hardware before putting it into service. 11.3 Dealing with limited connectivity Let us consider the case of restricted connectivity, such as the local connectivity discussed in Sec.10.2.
- Limited connectivity in the device leads to several consequences when apply- ing our hardware to HMC.
- the expected improvement over digital methods should be directly related to the connectivity of the device for some applications, such as producing samples from a multivariate Gaussian.
- the nature of the accept/reject step again acts to ensure that the samples that are accepted represent the desired distribution.
- One metric to consider is the acceptance probability and to be confident that this probability does not significantly decrease.
- Another approach to mitigate the possible negative effects of limited hardware con- nectivity can be potentially mitigated by classical preprocessing, which is discussed below for the case of HMC for multivariate Gaussian sampling.
- Direct truncation can be applied during the construction of the covariance matrix and setting some elements to 0 if they are smaller than some cutoff value. Therefore, this can be completed during the calculation of the matrix itself.
- the Cuthill-Mckee process then scales according to how many non-zero elements are left in the sparse matrix O , which should be smaller than O(d 2 ).
- the application of simple truncation followed by Cuthill-Mckee can therefore take a dense covariance matrix to a banded form in steps. The number of non-zero elements will depend on the truncation, which is a parameter to be set by the user.
- Fig- ure 31A shows that the acceptance ratio is roughly constant with increasing dimension.
- Figure 31B shows an exponential decay of the acceptance probability with dimension when attempting to produce samples from an approximation of the true covariance matrix with bandwidth k.
- Thermodynamic Error Mitigation Method Error mitigation can reduce the impact of errors associated with our thermodynamic sam- pling device. Specifically, this error mitigation method is targeting imprecision errors— errors associated with the imprecision of analog hardware components.
- the acronym for our method is THERMIES (THermodynamic ERror Mitigation via Imprecise Ensemble Sampling).
- 12.1 Univariate Protocol We begin with an example in the one-dimensional case in order to explain the essential concept, and then examine the more general case in a supporting document (see sup- porting document: Error Mitigation for Thermodynamic Sampling Hardware). Imagine a Gaussian sampling device which is capable of realizing a random variable X whose Attorney Docket No.
- NORM-001WO01 probability density function is That is, the device can sample a mean-zero normally distributed random variable whose variance is a multiple of ⁇ .
- the mean-zero constraint is not restrictive because a bias may be added after sampling with little computational overhead, and the discretization of variance is a consequence of the digital encoding of parameters which tune the device.
- the samples may be from a distribution having a variance which is not a multiple of ⁇ . For example, suppose we would like to sample the normal distribution N [0, 1.5 ⁇ ] (here and in what follows, N [a, b] denotes a normal distribution with mean a and (co)variance b). This distribution is illustrated in Figure 32.
- the quadratic potential may be prescribed as a function of the position coordinate x.
- other potentials can be realized, and this is the conceptual basis of extending beyond Gaussian sampling.
- the stochastic dynamics of the HMC process ensure that in the long time limit, the position and momentum will be distributed according to a Boltzmann distribution, given Attorney Docket No. NORM-001WO01 in Eq. (2), which is true for any well-behaved choice of potential function U(x).
- the distributions over x and will be independent, with the marginal distribution over x being proportional to .
- appropriately choosing the potential U(x) allows one to engineer the probability distribution that one draws samples from.
- the Boltzmann distribution over position coor- dinates is a Gaussian distribution, and hence sampling the position x samples from a Gaussian ⁇ exp( ⁇ U(x)).
- this quadratic potential is the natural physical potential associated with LC oscillator systems. If, however, we can modify the shape of the existing quadratic potential energy function U(x) provided by the LC oscillator circuit, then we can sample from distributions associated with non-quadratic potentials. In what follows, we provide two approaches for modifying the shape of the potential function: • an approach based on the thermodynamic concept of Maxwell’s demon; and • an approach based on many-body coupling of the LC oscillators.
- Section 14 describes in detail an approach to sampling from non-Gaussian distributions based on the concept of Maxwell’s demon.
- the Maxwell’s demon is a digital or analog device that applies a state-dependent physical force to the harmonic oscillator system.
- the force is state-dependent in the sense that it depends on the oscillators’ state, e.g., it may depend on the voltage across the capacitor in each unit cell. In practice, the force physically corresponds to a voltage source (or a set of voltage sources) applied to each unit cell.
- Section 16 describes in detail an approach to sampling from non-Gaussian distributions based on engineering a many-body coupling (such as a two-body coupling) of the unit cells.
- a multivariate Gaussian distribution can be described by its first two moments, i.e., the mean vector and the covariance matrix.
- a two-port device e.g., a capacitor
- tran- sistors have three ports including a source, drain, and gate. This generates a genuine three-body coupling in the Hamiltonian and hence could generate a third-order moment in the corresponding probability distribution being sampled.
- Varicaps also known as varactors
- the exponential family can be written as: p(x
- ⁇ ) h(x) exp( ⁇ ( ⁇ )T (x) ⁇ A( ⁇ )) (75)
- T (x) x
- ⁇ ( ⁇ ) ⁇
- h(x) 1, and
- A( ⁇ ) ⁇ log ⁇ .
- the other distributions can be recovered from the general formula in Eq. (75).
- Maxwell’s demon approach to sampling non-Gaussians 14.1 Introduction to Maxwell’s demon Sampling from non-Gaussian distributions involves altering the potential energy function of the system. For this purpose we introduce the notion of Maxwell’s demon. Figure 34 illustrates the basic idea of Maxwell’s demon.
- Maxwell Historically, James Clerk Maxwell consid- ered an experiment where an intelligent observer monitors the gas particles in a box with two chambers and selectively opens the door for only fast moving particles, resulting in a separation of fast and slow particles over time. This reduces the entropy of the gas system over time, although it does not violate the second law of thermodynamics, since entropy is created elsewhere.
- the intelligent observer is dubbed Maxwell’s demon. As we elaborate below, the Maxwell’s demon allows the HMC hardware to go beyond Gaussian distributions. There are various ways to physically construct a Maxwell’s demon (MD), including digital and analog ways, as discussed below. Attorney Docket No.
- Maxwell demon modifies the circuit Hamiltonian by changing the potential energy landscape over the circuit variables x.
- the “ideal” Maxwell demon would be able to apply forces to the existing HMC cells, according to any suitably smooth potential energy function U(x).
- the probability distribution is the exponential of the potential energy, up to normalization: P (x) ⁇ exp( ⁇ U(x)) .
- FIG. 36 shows that the Maxwell’s demon device can be viewed as a voltage-controlled voltage source (VCVS).
- VCVS voltage-controlled voltage source
- the diagram is shown for a single unit cell, although the concept can be generalized to multiple unit cells.
- the voltage across the capacitor in the ith unit cell is fed to the Maxwell’s demon (MD) device, which could be a digital or analog device.
- the MD device processes this input by applying some function f to it.
- the output f(x i ) is sent back as a voltage source, which gets applied as a voltage inside the original unit cell.
- the MD device acts as a VCVS, since the voltage output is controlled by the voltage across the capacitor.
- ADCs and DACs are employed to interconvert between analog and digital signals. Specif- ically, the voltages across the capacitors in the unit cells (denoted x) can be converted to digital signals and fed to the digital MD device. The digital MD device then out- puts a digital signal, which is then converted to an analog voltage vector g(x), and the components of this vector are applied as voltage sources in the appropriate unit cells. Attorney Docket No. NORM-001WO01 The main benefit of the digital approach to the MD device is flexibility and pro- grammability. Digital devices provide one with the flexibility to choose essentially any function g(x) for the output. In this sense, the digital approach to the MD device is programmable and flexible, allowing one to sample from a wide range of probability distributions P (x).
- the main drawback of the digital approach to the MD device is that it does not fully exploit the potential speedups that one could get from analog hardware.
- Digital devices can compute the gradient of the potential energy function, which can be challenging for complex potential energy functions.
- the gradient of the potential energy function corresponds to a physical force, and this physical force naturally gets applied to the system of LC oscillators without any computation involved. (This is discussed in more detail below.)
- the Maxwell’s demon (MD) device provides an opportunity to significantly accelerate the computation, if it is constructed in an analog fashion.
- auxiliary unit cell Our non-correlating strategy for constructing an analog Maxwell’s demon includes several building blocks, which are shown in Figure 38, and described as follows: 1. A voltage measurement on the primary unit cell capacitor, which then gets applied as a voltage source to drive current through an auxiliary unit cell. 2. A non-linear circuit element in the auxiliary unit cell to allow for a complicated I-V relationship. Attorney Docket No. NORM-001WO01 3. A voltage measurement across a resistor in the auxiliary unit cell (to read off the current in this cell), which then gets inverted in sign and then applied as a voltage source in the primary unit cell. Figure 38 illustrates these components in a single unit cell.
- the same construction can be copied d times for d unit cells, possibly with different hyperparameters (e.g., different non-linear elements) for each copy.
- the voltage outputted by the auxiliary unit cell is inverted in sign before it is applied to the primary unit cell. This is because the system should have negative feedback (instead of positive feedback) in order to remain stable. Negative feedback provides a restoring force, analogous in physics to how a spring acts to pull a mass backwards towards the equilibrium point. In summary, the voltage inversion helps to maintain the stability of the system. However, Figure 39 shows that a voltage inverter can be avoided by ungrounding the resistor in the auxiliary unit cell.
- an amplifier with tunable gain inserted into the coupling between the two unit cells.
- an amplifier could amplify the voltage reading on the capacitor of the primary unit cell to boost the voltage source that is applied to the auxiliary unit cell.
- an amplifier could amplify the voltage reading on the resistor of the auxiliary unit cell to boost the voltage source that is applied to the primary unit cell.
- an additional voltage source inserted into the primary unit cell, which would be constant (independent of the state variable) but tunable.
- auxiliary unit cell As shown in Figures 38 and 39, the basic components of the auxiliary unit cell are a voltage source (coming from the primary unit cell), a non-linear element, and a resistor.
- One strategy is to use the I-V characteristic of the non-linear element (NLE) as the function generator.
- NLE non-linear element
- the I-V characteristic of the NLE provides one way to engineer this complex functional relationship.
- NLEs Non-linear elements
- NLEs can be used for sampling from unimodal distributions. It turns out that NLEs with monotonic I-V characteristics mathematically give rise to uni- modal probability distributions. Hence, we restrict our attention here to NLEs with monotonic I-V characteristics.
- the second panel shows the force f(x), which is just the negative of I(x) (due to the voltage inverter shown in Fig. 38).
- the third panel shows U(x), and the fourth shows P (x).
- P (x) is non-Gaussian since it is asymmetric, and it has a sharper drop-off than a Gaussian.
- P (x) is non-Gaussian since it is asymmetric, and it has a sharper drop-off than a Gaussian.
- Transistors also provide a nonlinear I-V characteristic.
- the input characteristic is a convex nonlinear function
- the output characteristic is a concave nonlinear function that saturates for large voltages.
- the former case gives rise to a distribution P (x) similar to that of a diode
- the latter case gives rise to a distribution P (x) that has a long slowly decaying tail.
- employing the latter enables a non-Gaussian distribution with a long tail.
- NORM-001WO01 about zero voltage by placing two transistors in parallel oriented in opposite directions. Other configurations (e.g., common emitter) are also possible for transistors. 14.5.4 Non-linear elements for multi-modal distributions Let us now discuss NLEs for sampling from multi-modal distributions. To obtain multiple modes, the potential energy should have multiple minima (e.g., local and global minima). This implies that potential energy’s derivative, the force, should change sign more than once. If the force changes sign multiple times, then it should be a non-monotonic function. Hence, the I-V characteristic of the NLE should be a non-monotonic function (since it is the negative of the force). Thus, let us discuss NLEs with non-monotonic I-V characteristics.
- minima e.g., local and global minima
- Figure 41 shows the associated curves for the current, force, potential energy, and probability distribution.
- the current looks somewhat like a cubic function and clearly is non-monotonic, with a local maximum and local minimum in the first quadrant.
- the resulting f(x) curve is shown in the second panel of Figure 41.
- a potential energy U(x) with two minima and in turn, a probability distribution P (x) that is multi-modal with two modes.
- a Gunn diode As an alternative to the tunnel diode, let us also consider a Gunn diode as a candidate for the NLE.
- the Gunn diode also has a non-monotonic I-V characteristic under forward bias, with a local maximum and a local minimum in the first quadrant.
- it has a non-monotonic I-V characteristic under reverse bias.
- the curve is non- monotonic for both positive and negative voltage. This can give rise to a more complicated probability distribution P (x) than the case of the tunnel diode.
- P (x) can have multiple modes both in the positive voltage region and the negative voltage region.
- Other NLEs could be considered, such as a memristor.
- the overall I-V curve is a composition of the individual I-V curves.
- complex I-V curves can be engineered by combining various circuit elements to form the overall NLE block.
- v out (n L ⁇ h L ⁇ ...n 1 ⁇ h 1 )(v) (89) where the symbol “ ⁇ ” indicates that the operations are composed together sequen- tially.
- h j and n j are respectively the coupling and non-linear transformations for the jth layer. 5.
- the components of v out are inverted in sign to achieve negative feedback for the reasons discussed in Sec. 14.5.1. This can be done either with a voltage inverter or by ungrounding the resistor in the auxiliary unit cell and swapping the wires associated with the voltage difference across this resistor. 6.
- One benefit of such an approach is that it enables representation of more complicated probability distributions P (x), including distributions that would be difficult to sample from with standard digital methods.
- One way to achieve higher-order terms is to replace each resistor in Fig. 43 with a different circuit element, such as an n-port device with n > 2 to achieve higher-order couplings.
- transistors which are three-port devices.
- Figure 44 shows one way of generating higher-order couplings using transistors.
- the prior distribution and/or the likelihood may be parameterized by a neural network.
- the runtime of digital HMC may be limited by the computation of the gradient which is typically calculated using automatic differentiation.
- the runtime of a gradient computation of a neural network depends on the number of parameters of the network as well as the number of layers. For example, for a dense feed forward neural network with the same number of neurons in each layer m, the runtime of calculating the gradient with automatic differentiation is O . In many cases m may be chosen to scale with the input size, hence a runtime of O for neural networks employed in modern applications.
- the gradient calculator 4114 is an analog device whose input is a voltage vector corresponding to the output of the analog neural network device, and whose output is a voltage vector corresponding to the negative gradient ⁇ f ⁇ (x).
- One possible implementation for the gradient calculator 4114 comprises analog adders and subtracters, in order to estimate the gradient using finite-differences.
- Solving Hamilton’s equations may be performed in hardware by feeding in the output voltage vector ⁇ f ⁇ (x) (the negative gradient) of the gradient calculator into a device comprised of d integrators.
- the position is then given by the integral of momentum, which can be performed by feeding in the Attorney Docket No.
- the voltage vector multiplied by the inverse mass matrix MV x after a time increment T is fed to the gradient calculator.
- the mass matrix is diagonal, hence this amounts to mutiplying the voltage vector by a constant value and can be performed in analog by using an affine layer comprised of d resistors, as depicted in Fig. 43.
- Fig. 45 shows a scheme of a Maxwell’s demon device 4110 for sampling from distri- butions represented by neural networks.
- a user 41 first compiles the model f ⁇ onto a hardware analog neural network (ANN) 4112 (as well as its gradient if a closed form exists for it). Then, the output of the gradient is fed into the momentum integrator device 4120, whose output is then fed into the position integrator device 4130.
- ANN hardware analog neural network
- the momentum inte- grator device 4120 is a device comprised of d analog integrators, which outputs a voltage that represents the momentum, as shown in Fig. 45.
- the position integrator device 4130 also comprises d analog integrators and represents the position variables.
- the voltage vector V x corresponding to the position can be fed to an affine layer, which multiplies the voltage vector by M ⁇ 1 . If M is arbitrarily dense, its inversion may cost O(d 3 ) steps.
- the matrix inverter block 4140 is an affine layer as shown in Fig. , where the elements of the matrix A are the elements of M ⁇ 1 . 14.7.1 Calculating the gradient Calculating the gradient in analog is challenging. A simpler way to estimate the gradient of a function is to perform finite differences.
- the gradient calculator feeds a voltage ⁇ ⁇ 1 x, corresponding to ⁇ log p(x), the negative gradient of the multivariate Gaussian distribution.
- ⁇ ⁇ 1 Q
- the precision matrix known as the precision matrix
- a multivariate Gaussian can be defined by its mean and its precision matrix in general.
- Fig. shows an alternative device sampling a multivariate Gaussian with a given mean and precision matrix.
- the gradient calculator may be replaced by a resistive layer 4210 whose resistances correspond to the elements of the matrix Q.
- the voltage corresponding to the negative gradient Qx can be fed into parallel integrators, forming the momentum integrator device 4120, whose output corresponds to the momentum.
- the outputs should be rescaled by 1/RC terms for both the momentum integrator device 4120 and position integrator device 4130. These devices are fully parallel; they are not coupled.
- n different unit cells i.e., n different LC oscillators from the HMC hardware
- This n-port device is called a coupling device.
- networks of transistors or networks of varicaps also known as varactors
- This n-body coupling approach can be either used independently from, or together with, a Maxwell’s demon approach.
- this approach can be either used inde- pendently from, or together with, the two-body capacitive coupling discussed in previous sections.
- Genuine many-body coupling When aiming to design an analog system that is difficult to simulate digitally, it is helpful to use genuine many-body couplings, rather than effective many-body couplings.
- difficulty to simulate digitally we mean the case where the complexity of employing the digital computer involves a scaling that is greater than d 2 (for example, d 3 scaling). 16.1.3 Genuine many-body coupling
- a genuine many- body coupling can be viewed as a many-body coupling of the circuit elements involved. This is in constrast to a many-body coupling that is built from circuit elements that have few-body couplings (e.g., two-body couplings).
- Varicaps also known as a varactors
- each varicap is a two-port device, one can place two varicaps in opposite directions in a back-to-back configuration, as illustrated in Fig. 19.
- the device shown in Fig. 19 then acts as a voltage-controlled capacitor with a total of three ports.
- Another strategy is to use the switching capacitor bank shown in Fig. 18. An external Attorney Docket No. NORM-001WO01 voltage signal controls the capacitance of this capacitor bank, and hence it can be viewed as a voltage-controlled capacitor.
- the transistor, the three-port varicap device, and the switching capacitor bank can all be viewed as voltage-controlled capacitors. Each of these voltage-controlled capacitors can serve as building blocks for higher-order couplings. Namely, we can construct n-port devices by chaining together sets of these three-port devices. These n-port devices then enable engineering of n-body couplings. 16.1.6 Adding tunability or switches to the coupling Adding tunability to the three-body or n-body coupling would allow the hardware device to express a wide variety of probability distributions that have n-body terms.
- FIG. 47 shows a schematic diagram of three-body coupling mediated by a voltage-controlled capacitor. Here, we pick off the voltages associated with each capacitor (relative to ground) in each of the three unit cells. These three voltages are fed as inputs into the three ports of a three-body coupler.
- the three-body coupler is a transistor
- one voltage is fed into the gate, one is fed into the source, and one is fed into the drain of the transistor.
- VCC voltage-controlled capacitor
- we analyze how this coupling affects the probability distribu- tion being sampled. 16.2.1 Mathematical descriptions of three coupled cells For two capacitively coupled unit cells, a contribution to the potential energy takes the form U V 1 V 2 C 12 . Now suppose that C 12 is a function of the voltage V 3 of a third unit cell, which is the case depicted in Fig. 47.
- U is some function that is determined by the physical nature of the voltage-controlled capacitor.
- Attorney Docket No. NORM-001WO01 For example, suppose that this is a linear function, which is often true over a small range of voltages. In this case, we have .
- U V V V C (0) 1 2 3 123 + V 1 V 2 C 12 (98)
- U V V V C (0) 1 2 3 123 + V 1 V 2 C 12 (98)
- FIG. 48 shows a circuit diagram for how four unit cells can be coupled with a four-body coupler.
- the voltage multiplier can either be analog or digital, as elaborated below. This voltage V is then applied to the gate of a voltage-controlled capacitor, leading to a capacitance C 12 (V 3 V 4 ) between unit cells 1 and 2.
- Fig. 48 can be extended in a direct manner to couple more than four cells (n > 4). This involves feeding additional inputs into the voltage multiplier, so that n ⁇ 2 voltages are multiplied together to obtain the voltage that is ultimately applied to the gate of the voltage-controlled capacitor in the n-body coupler. Because of the mathematical connection between n-body couplings in the potential energy and n-th order cumulants in the probability distribution P (x), the resulting distribution is non-Gaussian with an n-th order cumulant (among other possible lower-order cumulants). Attorney Docket No.
- Section 8 discloses using voltage-controlled capacitors in the two-body case, where the digital hardware is used to set the capacitance value.
- the voltage measurement V 3 can be taken from a third unit cell and sent to the digital hardware as an input, and have the digital hardware output a voltage proportional to V 3 and apply this voltage to the capacitor connecting unit cells 1 and 2.
- the digital hardware acts as the intermediary between unit cell 3 and the capacitor between unit cells 1 and 2.
- NORM-001WO01 hardware complexity and the speedup, with more connections leading to greater speedup possibilities but more complex hardware. Due to this trade-off, it may be desirable to consider local connectivity.
- One approach to local connectivity is to leverage the existing infrastructure used for two-body coupling. Namely, one can employ the digital-analog approach to many-body coupling (discussed immediately above), and use whatever connectivity structure is already in place for the two-body coupling as the same structure as for n-body coupling. In this case, the con- nectivity structure for the n-body coupling will match that of the two-body coupling. So if the two body coupling is local, then so will be the n-body coupling.
- the d 5/4 scaling is the optimal scaling for digital implementations of HMC.
- dense matrix inversion such as Cholesky factorization that costs O(d 3 ) steps, can be used.
- the covariance matrix can be written as where the Cholesky decomposition is identified as the last term. Therefore, one can randomly sample directly from u (as the u i are independent) and from there sample from x by multiplying by the Cholesky factor L. Therefore, Cholesky sampling is at least as efficient as computing the full inverse of the covariance matrix in the general case. This largely depends on the structure of the covariance matrix.
- the dense covariance Attorney Docket No. NORM-001WO01 matrix can be made sparse by a preprocessing step. This step takes O(d 2 ) operations to be performed, and enables us to obtain a sparse matrix and potentially banded matrices. This step may be done for digital as well as analog approaches, and may become the bottleneck.
- the Cholesky factorization may be performed in O(k 2 d) steps.
- the potential energy function is a polynomial in d-variables. Every kth order cumulant included in the perturbative expansion introduces a homogeneous polynomial of degree k to the potential energy function.
- the coefficients of this polynomial are represented through a k-th order tensor; as such, if the highest nontrivial cumulant is order K, the potential energy function is a polynomial of leading order K.
- Attorney Docket No. NORM-001WO01 The potential speedups that result from higher-order cumulants are expected to result from that fact that these are introduced by higher-order coupling or equivalently higher- order tensor interactions.
- V (t) originates from Johnson thermal noise; this noise arises from the thermal agitation of a macroscopic collection of electrons through the effective resistor with resistance R, held at some temperature T.
- This distribution is solveable as: If we marginalize over the momentum coordinate p, the equilibrium distribution of the position coordinate (charge on capacitor), is given as the Boltzmann distribution:
- the Boltzmann distribution of the RLC system is a univariate Gaussian with mean 0 and variance kTC; if we allow a dynamical trajectory to converge to equilibrium, repeatedly measuring the position coordinate of the circuit (charge on the capacitor) amounts to sampling from this univariate distribution.
- the matrices C and L as the capacitance and inductance matrices, respectively, of the lossless coupled oscillator of section 3 (That is, like we did for the coupled LC circuit without resistors).
- the system of coordinates follows an underdamped Langevin dynamics over a 2N -dimensional phase space.
- the dynamics are given as a coupled stochastic differential equation: whereR is a drag matrix that depends on the resistances in the network andD a diffusion matrix multiplying an N-dimensional Brownian motion W (t).
- the drag matrix is related to the diffusion matrix in the coupled case; this is because the fluctuation-dissipation remains applicable. Fortunately, we do not need to know the explicit form of the drag and diffusion matri- ces to know that the dynamics of coupled RLC network converges equilibrium regardless. Just as the equilibrium distribution (120) of the single cell does not depend on the fric- tion and diffusion constants, the equilibrium distribution of the dynamics (122) does not depend on the drag or diffusion matrices.
- the sampled Boltzmann distribution will have mean zero and covariance ⁇ .
- the mean of the sampled distribution to vector by applying a constant voltage bias ⁇ i to each of the i cells throughout the dynamics. Taking a sample without a voltage bias and digitally adding to it the vector will also have the same effect.
- the stochastic dynamics of the coupled RLC network can be used to sample from arbitrary multivariate Gaussian distributions.
- Stochastic Gradient HMC addresses these challenges and improves the scalability of HMC.
- SGHMC introduces the use of a stochastic gradient calculation that can be performed with less than the entire data set. However, as a result, the dynamics introduced are no longer those induced by the desired distribution. Therefore, an addi- tional friction term is included which counteracts the effect of the noise injected by the stochastic gradient.
- the gradient is calculated using a minibatch of the whole data set D, such that D ⁇ ⁇ D. The data points calculate Assuming that the observations of x are independent we can apply the central limit Attorney Docket No.
- the friction cannot be tuned out of a realistic oscillator circuit to match the noise profile as is done in digital SGHMC. While we can tune the friction, the fluctuation-dissipation relations says this will proportionally tune the thermal noise of the system; while this need not be true of digital dynamics, it may be unavoidable in circuit dynamics. As such, the friction term of the physical oscillator should damp out the effects of thermal noise as well as misspecification noise, yet can only damp the contributions from thermal noise. To avoid this issue, we can operate in a regime of high friction/resistance.
- the unit cell for LMC has a stochastic noise source. This cor- responds to a voltage source whose voltage output is Gaussian white noise.
- a stochastic noise source for this purpose, such as the following possible methods: 1. Analog thermal noise (also known as Johnson–Nyquist noise) from a resistor. Attorney Docket No. NORM-001WO01 2. Analog shot noise from a diode, such as a Zener diode. 3. Digital pseudo-random noise generated from a field-programmable gate array (FPGA) with analog filtering.
- FPGA field-programmable gate array
- ⁇ v ⁇ C ⁇ 1 ( J v + R ⁇ 1 ⁇ v ) , (132) where Here we have introduced the self-resistance matrix R, the capacitance matrix C, and the conductance matrix J.
- NORM-001WO01 Measure the voltages across each of the capacitors in the unit cells of the LMC hardware. • Feed this voltage vector v to the digital processor. • Quantify the energy associated with this voltage vector on the digital processor. Based on this energy value, apply the accept/reject step of the MH process to this sample. This allows one to either accept or reject the proposed sample, using a digital processor as a judge of the sample quality (via the MH condition). 19.4 Sampling Gaussian distributions with LMC Hardware For a Gaussian distribution, the gradient of the logarithm of the probability distribution is given by an affine function. Specifically, it is given by: ⁇ log p where ⁇ is the mean vector is ⁇ is the covariance matrix.
- Eq. (131) for the special case of a Gaussian as follows:
- Eqs. and (133) are the differential equa- tions for the LMC hardware with resistive and capacitive coupling, respectively.
- Eq. (135) has a similar form to these equations.
- an alternative device for the LMC hardware can be used as follows. Specifically, this alternative device employs integrators instead of unit cells, as shown in Fig. 54.
- the analog device for LMC in Fig. 54 can be used to sample arbitrary probability dis- tributions, including those parametrized by a neural network as described in Section 14.7. To do so, the user should compile their representation of the probability to hardware. For a neural network, this amounts to compiling the network to analog hardware, and then computing its gradient.
- C the Maxwell capacitance matrix
- L the inductance matrix
- R the resistance matrix (which is proportional to an identity matrix)
- ⁇ 0 is a number which scales with the injected noise amplitude, and also determines the effective temperature.
- the Maxwell capacitance matrix encodes the inverse of the covari- ance matrix rather than the covariance matrix itself, so it is most amenable to applications where the precision matrix is specified as opposed to the covariance matrix.
- thermodynamic hardware these methods scale favorably with problem size compared to digital algorithms.
- V (x) is a quadratic form
- p(x) corresponds to a multivariate Gaussian distribution.
- the spatial coordinate x is a Gaussian random variable x ⁇ N [A ⁇ 1 b, ⁇ A ⁇ 1 ].
- Ax ⁇ b 0, which also corresponds to the unique maximum of p(x).
- the maximum of p(x) is also the first moment ⁇ x ⁇ .
- thermodynamic protocol for solving linear systems, which is depicted in Figure 56. Namely, the protocol involves realizing the potential in Eq. (147), waiting for the system to come to equilibrium, and then sampling x to estimate the mean ⁇ x ⁇ of the distribution. This mean can be approximated using a time-average, defined as where t 0 should be large enough to allow for equilibration and ⁇ should be large enough for the average to converge to a desired degree of precision. The eventual convergence of this time average to the mean is the content of the ergodic hypothesis, which is often assumed for quite generic thermodynamic systems.
- the mean could also be approximated as the average of a sequence of samples; however, the integration approach can be implemented convieniently in a completely analog way (for example, using an integrator circuit), which obviates the need for transferring data from the device until the end of the protocol.
- a model of the system’s microscopic dynamics may be in- troduced.
- the system under consideration is composed of harmonic oscillators in contact with a heat bath, then it is natural to allow for damping (i.e., energy exchange with the bath) and stochastic thermal noise, which always accompanies damping due to the fluctuation-dissipation theorem.
- the Langevin equation accounts for these effects, and specifically we consider two common formulations, the overdamped Langevin (ODL) equation and the underdamped Langevin (UDL) equations.
- the station- ary distribution of x is N [A ⁇ 1 b, ⁇ ⁇ 1 A ⁇ 1 ], meaning the inverse of A can be obtained by evaluating the covariance matrix of x.
- This can be accomplished in an entirely analog way, using a combination of analog multipliers, and integrators.
- Analog component can evaluates the product x i (t)x j (t) for each pair 2 resulting in d analog multiplier components. Each of these products is then fed into an analog integrator component, which computes one element of the time-averaged covariance matrix While the equilibration time is the same as for the linear system protocol, the integration time is different, because in general the covariance matrix is slower to converge than the mean.
- Analog component can evaluates the product x i (t)x j (t) for each pair 2 resulting in d analog multiplier components. Each of these products is then fed into an analog integrator component, which computes one element of the time-averaged covariance matrix While the equilibration time is the same as for the linear system protocol, the integration time is different, because in general the covariance matrix is slower to converge than the mean.
- Table 1 Comparison of asymptotic complexities of linear algebra algorithms.
- d is the matrix dimension
- ⁇ is the condition number
- ⁇ is the error.
- the complexity depends on the dynamical regime, i.e., whether the dynamics are overdamped and underdamped.
- the complexity of solving symmetric, positive definite linear systems, matrix inverse, Lyapunov equation, and matrix determinant problems are respectively for algorithms based on: conjugate gradient method, fast matrix multiplication/inverse, Bartels-Stewart algorithm, and Cholesky decomposition.
- ⁇ ⁇ 2.3 denotes the matrix multiplication constant.
- inventive concepts may be embodied as one or more methods, of which an example has been provided.
- the acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
- Mul- tiple components listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the components so conjoined.
- Other components may optionally be present other than the components specifically identified by the “and/or” clause, whether related or unrelated to those components specifically identified.
- Attorney Docket No. NORM-001WO01 a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including compo- nents other than B); in another embodiment, to B only (optionally including components other than A); in yet another embodiment, to both A and B (optionally including other components); etc.
- the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e., “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of,” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law.
- the phrase “at least one,” in reference to a list of one or more components should be understood to mean at least one component selected from any one or more of the components in the list of components, but not necessarily including at least one of each and every component specifically listed within the list of components and not excluding any combinations of components in the list of components.
- “at least one of A and B” can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including components other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including components other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other components); etc.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Neurology (AREA)
- Medical Informatics (AREA)
- Algebra (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Complex Calculations (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
Claims
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23878256.9A EP4602521A1 (en) | 2022-10-14 | 2023-10-13 | Thermodynamic computing system for sampling high-dimensional probability distributions |
| KR1020257015307A KR20250109682A (en) | 2022-10-14 | 2023-10-13 | A thermodynamic computational system for sampling high-dimensional probability distributions |
| JP2025521477A JP2025536288A (en) | 2022-10-14 | 2023-10-13 | A thermodynamic computing system for sampling high-dimensional probability distributions |
Applications Claiming Priority (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263379561P | 2022-10-14 | 2022-10-14 | |
| US63/379,561 | 2022-10-14 | ||
| US202363492832P | 2023-03-29 | 2023-03-29 | |
| US63/492,832 | 2023-03-29 | ||
| US202363518545P | 2023-08-09 | 2023-08-09 | |
| US63/518,545 | 2023-08-09 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024081827A1 true WO2024081827A1 (en) | 2024-04-18 |
Family
ID=90670194
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2023/076759 Ceased WO2024081827A1 (en) | 2022-10-14 | 2023-10-13 | Thermodynamic computing system for sampling high-dimensional probability distributions |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP4602521A1 (en) |
| JP (1) | JP2025536288A (en) |
| KR (1) | KR20250109682A (en) |
| WO (1) | WO2024081827A1 (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119509611A (en) * | 2024-10-21 | 2025-02-25 | 中国大唐集团科技创新有限公司 | Floating offshore wind power multi-body coupling real-time monitoring system and method |
| CN120892660A (en) * | 2025-09-30 | 2025-11-04 | 中国石油大学(北京) | Method and apparatus for determining the governing equations of a high-noise particulate flow system |
| CN120993758A (en) * | 2025-10-22 | 2025-11-21 | 江苏汉药医疗器械有限公司 | Dynamic optimizing control method and system for technological parameters of hot compress patch |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070225581A1 (en) * | 1991-03-07 | 2007-09-27 | Diab Mohamed K | Signal processing apparatus |
| US20090076780A1 (en) * | 2007-08-10 | 2009-03-19 | Fujitsu Limited | Method, apparatus and computer program for molecular simulation |
| US20160267378A1 (en) * | 2015-03-13 | 2016-09-15 | International Business Machines Corporation | Neuromorphic synapses |
| US20180364785A1 (en) * | 2015-12-18 | 2018-12-20 | Hewlett Packard Enterprise Development Lp | Memristor crossbar arrays to activate processors |
| US20210004677A1 (en) * | 2018-02-09 | 2021-01-07 | Deepmind Technologies Limited | Data compression using jointly trained encoder, decoder, and prior neural networks |
| US20210049504A1 (en) * | 2019-08-14 | 2021-02-18 | Rain Neuromorphics Inc. | Analog system using equilibrium propagation for learning |
| US20210256305A1 (en) * | 2020-02-13 | 2021-08-19 | The United States Government As Represented By The Secretary Of The Navy | Noise-Driven Coupled Dynamic Pattern Recognition Device for Low Power Applications |
| US20220076130A1 (en) * | 2020-08-31 | 2022-03-10 | International Business Machines Corporation | Deep surrogate langevin sampling for multi-objective constraint black box optimization with applications to optimal inverse design problems |
-
2023
- 2023-10-13 EP EP23878256.9A patent/EP4602521A1/en active Pending
- 2023-10-13 KR KR1020257015307A patent/KR20250109682A/en active Pending
- 2023-10-13 WO PCT/US2023/076759 patent/WO2024081827A1/en not_active Ceased
- 2023-10-13 JP JP2025521477A patent/JP2025536288A/en active Pending
Patent Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070225581A1 (en) * | 1991-03-07 | 2007-09-27 | Diab Mohamed K | Signal processing apparatus |
| US20090076780A1 (en) * | 2007-08-10 | 2009-03-19 | Fujitsu Limited | Method, apparatus and computer program for molecular simulation |
| US20160267378A1 (en) * | 2015-03-13 | 2016-09-15 | International Business Machines Corporation | Neuromorphic synapses |
| US20180364785A1 (en) * | 2015-12-18 | 2018-12-20 | Hewlett Packard Enterprise Development Lp | Memristor crossbar arrays to activate processors |
| US20210004677A1 (en) * | 2018-02-09 | 2021-01-07 | Deepmind Technologies Limited | Data compression using jointly trained encoder, decoder, and prior neural networks |
| US20210049504A1 (en) * | 2019-08-14 | 2021-02-18 | Rain Neuromorphics Inc. | Analog system using equilibrium propagation for learning |
| US20210256305A1 (en) * | 2020-02-13 | 2021-08-19 | The United States Government As Represented By The Secretary Of The Navy | Noise-Driven Coupled Dynamic Pattern Recognition Device for Low Power Applications |
| US20220076130A1 (en) * | 2020-08-31 | 2022-03-10 | International Business Machines Corporation | Deep surrogate langevin sampling for multi-objective constraint black box optimization with applications to optimal inverse design problems |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119509611A (en) * | 2024-10-21 | 2025-02-25 | 中国大唐集团科技创新有限公司 | Floating offshore wind power multi-body coupling real-time monitoring system and method |
| CN120892660A (en) * | 2025-09-30 | 2025-11-04 | 中国石油大学(北京) | Method and apparatus for determining the governing equations of a high-noise particulate flow system |
| CN120993758A (en) * | 2025-10-22 | 2025-11-21 | 江苏汉药医疗器械有限公司 | Dynamic optimizing control method and system for technological parameters of hot compress patch |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4602521A1 (en) | 2025-08-20 |
| JP2025536288A (en) | 2025-11-05 |
| KR20250109682A (en) | 2025-07-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7164648B2 (en) | Nonlinear calibration of quantum computing devices | |
| Yan et al. | Emerging opportunities and challenges for the future of reservoir computing | |
| Kossaifi et al. | Multi-grid tensorized fourier neural operator for high-resolution pdes | |
| Leenaerts et al. | Piecewise linear modeling and analysis | |
| Medvidović et al. | Neural-network quantum states for many-body physics | |
| EP4602521A1 (en) | Thermodynamic computing system for sampling high-dimensional probability distributions | |
| Low | Hamiltonian simulation with nearly optimal dependence on spectral norm | |
| WO2020231795A1 (en) | Frequency tunable qubit control strategy | |
| EP3398121A1 (en) | Quantum statistic machine | |
| WO2008028290A1 (en) | Method and system for solving integer programming and discrete optimization problems using analog processors | |
| WO2024118915A1 (en) | Thermodynamic artificial intelligence for generative diffusion models and bayesian deep learning | |
| CN117043789A (en) | Quantum generation countermeasure network with provable convergence | |
| KR20250018975A (en) | A method and system for encoding an intended matrix in a quantum circuit | |
| Sidheekh et al. | VQ-Flows: Vector quantized local normalizing flows | |
| Zadeh et al. | Generative ai for analog integrated circuit design: Methodologies and applications | |
| Rossetti et al. | Linear operator approximate message passing (OpAMP) | |
| Marceaux et al. | Streaming quantum gate set tomography using the extended Kalman filter | |
| Livingston | Continuous feedback on quantum superconducting circuits | |
| Hong et al. | Analog Circuit Design Automation via Sequential RL Agents and Gm/ID Methodology | |
| Yang | Input-to-state stable continuous time recurrent neural networks for transient circuit simulation | |
| Bruna et al. | Convergence of a finite volume scheme for a model for ants | |
| Guseynov et al. | Quantum Framework for Simulating Linear PDEs with Robin Boundary Conditions | |
| Alruhaili et al. | Solving Time-Fractional Nonlinear Variable-Order Delay PDEs Using Feedforward Neural Networks | |
| US20250390640A1 (en) | Thermodynamic computing system configured to emulate deep neural diffusion | |
| Petersen | Hypothesis testing via AI: Generating physically interpretable models of scientific data with machine learning (Full Technical Report) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23878256 Country of ref document: EP Kind code of ref document: A1 |
|
| ENP | Entry into the national phase |
Ref document number: 2025521477 Country of ref document: JP Kind code of ref document: A |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2025521477 Country of ref document: JP |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2023878256 Country of ref document: EP |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| ENP | Entry into the national phase |
Ref document number: 2023878256 Country of ref document: EP Effective date: 20250514 |
|
| WWP | Wipo information: published in national office |
Ref document number: 1020257015307 Country of ref document: KR |
|
| WWP | Wipo information: published in national office |
Ref document number: 2023878256 Country of ref document: EP |