
WO2025064997A1 - Anisotropic diffusion-based analog neural network architecture - Google Patents

Anisotropic diffusion-based analog neural network architecture

Info

Publication number
WO2025064997A1
WO 2025/064997 A1 · PCT/US2024/047981
Authority
WO
WIPO (PCT)
Prior art keywords
neighbor node
capacitor
node capacitor
nearest
analog
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/047981
Other languages
French (fr)
Inventor
Dario Pompili
Muhammad Khizar ANJUM
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rutgers State University of New Jersey
Original Assignee
Rutgers State University of New Jersey
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rutgers State University of New Jersey filed Critical Rutgers State University of New Jersey
Publication of WO2025064997A1 publication Critical patent/WO2025064997A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06N 3/065: Physical realisation of neural networks using electronic analogue means
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/048: Activation functions
    • G06F 17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Definitions

  • EEG sensing and brain monitoring can provide valuable insight into the health and function of the brain. It is important to track and monitor EEG data in both clinical and pre-clinical settings to identify anomalies in brain activity and potentially diagnose neurological disorders at an early stage. Early diagnosis can lead to timely treatment and prevention of serious neurological conditions, such as epilepsy and Alzheimer’s disease. Regular clinical appointments for EEG monitoring may not be feasible for everyone, but wearable EEG sensors can provide a cost-effective and convenient way to monitor brain activity in daily life. [0004] Indeed, the trend of analyzing physiological markers for health tracking using wearable sensors is on the rise. However, due to the small size of these wearables, battery life is of paramount concern. Unlike heavy mobile devices, which can be packed with powerful batteries, wearable sensors do not as easily accommodate a power source.
  • Ultra-low power techniques and systems that can perform or assist in analysis of collected data.
  • Anisotropic diffusion-based analog neural network architecture for ultra-low power processing of multimodal physiological data is described. Ultra-low power devices are suitable for scenarios in which the power being consumed is compatible with that generated by energy harvesting capabilities of the node (e.g., vibration energy harvesting without a battery).
  • An analog neural network circuit includes: a convolutional layer including at least one convolutional processing unit (CvPU).
  • Each CvPU can include: a center node capacitor, a set of neighbor node capacitors, a plurality of conductance elements between the center node capacitor and the set of neighbor node capacitors, and a plurality of analog adder circuits, wherein the center node capacitor and a neighbor node capacitor of the set of neighbor node capacitors are connected as inputs to an analog adder circuit of the plurality of analog adder circuits.
  • each neighbor node capacitor is connected to the center node capacitor by two conductance elements and an analog adder circuit, wherein the two conductance elements are disposed in series between the center node capacitor and the neighbor node capacitor, wherein the center node capacitor is connected as a first input to the analog adder circuit, the neighbor node capacitor is connected as a second input to the analog adder circuit, and an output of the analog adder circuit is connected to a node between the two conductance elements.
  • CvPUs are tessellated when forming a convolutional layer of multiple CvPUs (i.e., more than one CvPU).
  • the analog neural network circuit can have multiple convolutional layers with pooling filter layers between.
  • the pooling filter layers can be implemented using adders.
  • a wearable device can include the analog neural network as a battery-less processing device that receives, as input, sensor data from at least one sensor and outputs a trigger signal to a digital processing device that operates on sensor data upon receiving the trigger signal.
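The diffusion behavior of a tessellated 4-DoF CvPU array described above can be sketched in software as a discrete-time update. This is a minimal sketch, not the circuit itself: the time step, conductance values, and zero-flux boundary are illustrative assumptions.

```python
import numpy as np

def diffuse_4dof(v, gN, gS, gE, gW, dt=0.1, steps=10):
    """Discrete-time sketch of a tessellated 4-DoF CvPU array: each node
    capacitor exchanges charge with its four nearest neighbors through
    the corresponding conductances (anisotropic diffusion).

    v         : 2D array of node-capacitor voltages
    gN..gW    : per-direction conductances (illustrative values)
    dt, steps : explicit-Euler step size and number of steps
    """
    v = np.asarray(v, dtype=float).copy()
    for _ in range(steps):
        p = np.pad(v, 1, mode="edge")            # zero-flux boundary
        north, south = p[:-2, 1:-1], p[2:, 1:-1]
        west, east = p[1:-1, :-2], p[1:-1, 2:]
        v = v + dt * (gN * (north - v) + gS * (south - v)
                      + gE * (east - v) + gW * (west - v))
    return v
```

With equal conductances the update conserves total charge and smooths the voltage map, which is the behavior the architecture exploits for convolutional filtering.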
  • FIGS. 2A-2F illustrate an example three-dimensional analog convolutional processing unit.
  • Figure 3 illustrates an example analog neural network formed of convolutional processing units such as described with respect to Figure 1 and Figures 2A-2F.
  • Figure 4A shows an example diode-based activation function.
  • Figure 4B shows a single voltage-based resistive processing unit (VRPU).
  • Figure 5 illustrates a controller and control method for operating an analog neural network formed of convolutional processing units.
  • Figure 7 illustrates an example wearable device implementation of the described analog neural network circuit.
  • Figure 8 is a concept visualization for ultra-low-power hybrid analog-digital processing via multiple nanosensors forming an implantable nanonetwork.
  • Figure 9 is an example implementation of an analog neural network that can be fit into an EEG nanosensor such as described with respect to Figure 8.
  • Figure 10 shows a conceptual design of hybrid analog-digital architecture for continuous and energy-efficient seizure detection on nanosensors.
  • Figure 11 shows simulations of 3×3 and 4×4 anisotropic diffusion arrays in SPICE.
  • Figure 12 shows Mean Squared Error (MSE) for different sized arrays as simulated in SPICE.
  • Figure 13 shows plots of Peak Signal to Noise Ratio (PSNR) as the number of layers varies for different sizes of CvPU arrays stacked on top of each other.
  • Figure 14 shows the confusion matrix for the analog convolutional neural network simulated using anisotropic diffusion-based 4-DoF convolution as described herein.
  • Figure 15 provides a noise analysis for 3×3 and 4×4 arrays for temperatures ranging from −25°C to 125°C, with a step of 25°C.
  • DETAILED DESCRIPTION [0028] Anisotropic diffusion-based analog neural network architecture for ultra-low power processing of multimodal physiological data is described.
  • Ultra-low power devices are suitable for scenarios in which the power being consumed is compatible with that generated by energy harvesting capabilities of the node (e.g., vibration energy harvesting without a battery).
  • the described ultra-low (e.g., on the order of nano- or pico-watts or less) powered approach is suitable for battery-less processing on the wearable devices themselves, which can help reduce the amount of energy expended on transmission and computation while still providing insights about an individual’s health.
  • all-analog in-situ processing can be performed at the sensor level, the output of which may be used for more robust analysis, for example, by digital devices.
  • An analog neural network circuit is presented that includes convolutional layers formed of convolutional processing units (CvPUs).
  • the CvPUs use the properties of anisotropic diffusion in electrical circuits.
  • An example, which applies diffusion in vision/image processing, is provided in the section entitled “Convolutional Kernel Architecture”. It should be understood that while the architecture utilizes an approach suitable for vision/image processing, the architecture presented herein is not specific to image pixels and can be applied to any general input array.
  • the described CvPU circuitry can be utilized wherever an acceleration of neural network processing (especially CNNs) is needed.
  • the CvPU circuitry can be used for image processing, image segmentation, natural language processing, facial recognition, recommendation systems, speech recognition, and various other applications. Due to the CvPU’s low power requirements and neural network's inherent resilience to imprecise computation, analog acceleration of CNNs using CvPUs could contribute towards energy savings at datacenters and other computation servers.
  • an analog neural network circuit can include a convolutional layer formed of at least one CvPU that includes a center node capacitor, a set of neighbor node capacitors, a plurality of conductance elements between the center node capacitor and the set of neighbor node capacitors, and a plurality of analog adder circuits, wherein the center node capacitor and a neighbor node capacitor of the set of neighbor node capacitors are connected as inputs to an analog adder circuit of the plurality of analog adder circuits.
  • Figure 1 illustrates an example analog convolutional processing unit (CvPU). Referring to Figure 1, each CvPU 100 includes: a center node capacitor 110 and a set of neighbor node capacitors 120.
  • Each neighbor node capacitor (e.g., 120-W) is connected to the center node capacitor 110 by two conductance elements (e.g., 130, 140) and an analog adder circuit 150 (example implementation of the analog adder is shown in the inset).
  • the two conductance elements (e.g., 130, 140) are disposed in series between the center node capacitor 110 and the neighbor node capacitor (e.g., 120-W).
  • the center node capacitor 110 is connected as a first input to the analog adder circuit 150, the neighbor node capacitor (e.g., 120-W) is connected as a second input to the analog adder circuit 150, and an output of the analog adder circuit 150 is connected to a node (e.g., node 160) between the two conductance elements (e.g., 130, 140).
  • the set of neighbor node capacitors includes: a north nearest-neighbor node capacitor 120-N, a south nearest-neighbor node capacitor 120-S, an east nearest-neighbor node capacitor 120-E, and a west nearest-neighbor node capacitor 120-W. This configuration provides 4 degrees of freedom.
  • 8 degrees of freedom can be garnered by including a north-east nearest-neighbor node capacitor, a south-east nearest-neighbor node capacitor, a south-west nearest-neighbor node capacitor, and a north-west nearest-neighbor node capacitor.
  • the additional capacitors are connected to the center node capacitor 110 through the two conductance elements and analog adder circuit as described with respect to the four shown in Figure 1.
  • a three-dimensional (3D) 8 degrees of freedom (8-DoF) implementation is shown in Figures 2A-2F.
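The 4-DoF and 8-DoF neighbor sets described above can be enumerated as grid offsets. A sketch; the (row, col) sign convention is an assumption for illustration:

```python
def neighbor_offsets(dof):
    """Nearest-neighbor offsets (d_row, d_col) for CvPU connectivity:
    4-DoF uses the cardinal directions N/S/E/W, while 8-DoF adds the
    four diagonals NE/SE/SW/NW."""
    cardinal = [(-1, 0), (1, 0), (0, 1), (0, -1)]    # N, S, E, W
    diagonal = [(-1, 1), (1, 1), (1, -1), (-1, -1)]  # NE, SE, SW, NW
    if dof == 4:
        return cardinal
    if dof == 8:
        return cardinal + diagonal
    raise ValueError("dof must be 4 or 8")
```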
  • each capacitor of the center node capacitor and set of neighbor node capacitors receives a voltage corresponding to an input at a particular time.
  • FIGS. 2A-2F illustrate an example three-dimensional analog convolutional processing unit.
  • a 3D 8-DoF CvPU 200 design can include five CvPU layers: a first capacitor array 210, a first circuit layer 220, a second circuit layer 230, a third circuit layer 240, and a second capacitor array 250.
  • a “circuit layer” refers to a die with an integrated circuit or other fabricated component.
  • the first capacitor array and the second capacitor array can be on their own dies or be fabricated on one of the circuit layers.
  • the 3D design of the CvPU 200 can be in the form of a 3D IC or 3D packaging.
  • the 3D structure mediates connections from neighboring inputs to the (i, j)-th input through a node holding the summation of the two, connected by the relevant conductance.
  • Figures 2B-2F show details of each of the five CvPU layers of the 3D 8-DoF CvPU design centered on input (i, j), which can be tessellated to create arbitrarily large input arrays for convolution.
  • a 4-DoF CvPU with a design similar to that shown in Figure 1 can be implemented by using the first capacitor array 210 and the first circuit layer 220.
  • Docket No. RTG-013XC1PCT [0038]
  • a total of 13 capacitors (9 in the first capacitor array 210 and 4 in the second capacitor array 250) are connected to nodes N, O, P, Q, R, S, T, U, V, W, X, Y, and Z.
  • capacitors are used for input and output, whereby the voltages v written underneath each capacitor are the initial voltages for each node.
  • Through connections (e.g., which may be accomplished using through-silicon vias or other connectors) to the circuit layers are marked by letters associated with the nodes. Only node N is connected to all three of the circuit layers 220, 230, and 240 shown in Figures 2C, 2D, and 2E, respectively.
  • the nodes marked with a dot (•) are connected to a capacitor element, while the nodes marked with a cross (×) are left floating. Floating nodes only exist in the first circuit layer 220, as shown in Figure 2C.
  • An analog adder of the plurality of analog adder circuits (not shown in Figures 2A-2F) forming part of the CvPU can connect to nodes similarly to the design shown in Figure 1, but instead of a floating node for where the output of the analog adder circuit is input, the output of the analog adder circuit for the 3D design is connected to a node that has a capacitor (which is charged to the resultant added voltage before diffusion begins).
  • a center node capacitor and a set of neighbor node capacitors are part of a first capacitor array forming a first CvPU layer.
  • the set of neighbor node capacitors comprises a north nearest-neighbor node capacitor, a south nearest-neighbor node capacitor, an east nearest-neighbor node capacitor, a west nearest-neighbor node capacitor, a north-east nearest-neighbor node capacitor, a south-east nearest-neighbor node capacitor, a south-west nearest-neighbor node capacitor, and a north-west nearest-neighbor node capacitor.
  • the initial voltages of neighboring capacitors are the sum of the principal (i, j)-th voltage and the neighboring voltage.
  • neighboring capacitors are connected together by relevant conductances (through nodes labeled V, O, S, R, N, P, U, Q, and T).
  • a first set of conductance elements of a plurality of conductance elements are part of a first circuit layer 220 forming a second CvPU layer, wherein the center node capacitor, the north nearest- neighbor node capacitor, the south nearest-neighbor node capacitor, the east nearest-neighbor node capacitor, and the west nearest-neighbor node capacitor are coupled to nodes of the first circuit layer 220.
  • a second set of conductance elements of the plurality of conductance elements are part of a second circuit layer 230 forming a third CvPU layer, wherein the center node capacitor, the north-east nearest-neighbor node capacitor, and the south-west nearest-neighbor node capacitor are coupled to nodes of the second circuit layer 230.
  • a third set of conductance elements of the plurality of conductance elements are part of a third circuit layer 240 forming a fourth CvPU layer, wherein the center node capacitor, the north-west nearest-neighbor node capacitor, and the south-east nearest-neighbor node capacitor are coupled to nodes of the third circuit layer 240.
  • the second capacitor array 250 contains capacitors that are relevant for connections of other inputs (through nodes labeled Z, W, Y, and X).
  • the capacitors of second capacitor array 250 are coupled to nodes of the second circuit layer 230 and the third circuit layer 240.
  • node S mediates connection between the (i, j)-th and (i−1, j+1)-th inputs (with the initial voltage of the capacitor being the sum v_i,j + v_i−1,j+1) and is disposed in alignment between the first capacitor array 210 and the second circuit layer 230.
  • the CvPU architecture is scalable.
  • the given architecture for a singular node can be extended to an arbitrarily large array for computation. It is also scalable in the other direction, whereby if only the first capacitor layer and the first circuit layer (providing the conductances) are used, a 4-DoF CvPU can be implemented.
  • an analog neural network circuit that includes convolutional layers formed of one or more convolutional processing units (CvPUs) can include at least one additional convolutional layer, where each additional convolution layer also includes at least one CvPU.
  • a pooling filter formed of an analog adder can be connected between adjacent convolutional layers.
  • the analog neural network circuit can further include a delay stage including multiple delay elements to generate temporal inputs for each input channel to the convolutional layer.
  • Figure 3 illustrates an example analog neural network formed of convolutional processing units such as described with respect to Figure 1 and Figures 2A-2F.
  • an example analog neural network 300 includes multiple convolutional layers 310 formed of CvPUs 315 as described above with respect to Figure 1 and Figures 2A-2F, each followed by a pooling layer 320.
  • two or more of the convolutional layers 310 are able to be implemented using the same underlying CvPUs 315.
  • An example implementation is described with respect to Figures 5 and 6.
  • the pooling layer may be implemented using analog adders. In certain example implementations, 2×2 and 3×3 average pooling filters are used with a square lattice structure of 4-DoF.
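Average pooling realized by analog adders amounts to block-summing node voltages and scaling. A software sketch; the block size and non-overlapping stride are assumptions:

```python
import numpy as np

def avg_pool(x, k=2):
    """k x k average pooling as an analog adder array would realize it:
    each output node holds the scaled sum of a k x k block of inputs."""
    h, w = x.shape
    h2, w2 = h - h % k, w - w % k                 # drop ragged edges
    blocks = x[:h2, :w2].reshape(h2 // k, k, w2 // k, k)
    return blocks.mean(axis=(1, 3))
```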
  • a dense layer 330 can be used as the last layer of the analog neural network 300.
  • a diode-based activation function can be included after each convolutional layer.
  • Figure 4A shows an example diode-based activation function (400).
  • the dense layer 330 includes an array of resistive processing units (RPUs).
  • Figure 4B shows a single voltage-based resistive processing unit (VRPU). An array of VRPUs can be used for the dense layer 330.
  • a VRPU 450 is composed of three transistors with a capacitor, referred to as a 3T1C structure.
  • a first PMOS transistor is coupled to receive a weight at its gate;
  • a first capacitor is coupled at a first end to the drains of the first NMOS transistor and the first PMOS transistor.
  • a read PMOS transistor is coupled at its gate to the first end of the first capacitor.
  • a load (e.g., resistor) is at a drain of the read PMOS transistor.
  • a high pass filter is at the drain of the read PMOS transistor.
  • the capacitor is responsible for storing the weight, while the two transistors, an NMOS and PMOS pair, are designed to tune the weight stored on the capacitor.
  • a load is designed (e.g., R1) such that the voltage at the drain of the last transistor can be used.
  • the high pass filter is included to block the DC voltage.
  • the illustrated RPU can be used to perform matrix multiplications at the heart of neural network computation.
  • a VRPU includes a first PMOS transistor coupled to receive a weight at its gate; a first NMOS transistor coupled to receive the weight at its gate and coupled by its drain to a drain of the first PMOS transistor; a first capacitor coupled at a first end to the drains of the first NMOS transistor and the first PMOS transistor; a read PMOS transistor coupled at its gate to the first end of the first capacitor; a load at a drain of the read PMOS transistor; and a high pass filter at the drain of the read PMOS transistor.
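The matrix multiplication performed by an RPU/VRPU array can be modeled as a conductance crossbar: each output line sums the currents G_ij · v_j (Ohm's law plus Kirchhoff's current law). A sketch with illustrative values, not the circuit-level implementation:

```python
import numpy as np

def rpu_matvec(G, v_in):
    """Dense-layer sketch: G is the matrix of stored weights (modeled as
    conductances), v_in the input voltage vector; each output line's
    current is the corresponding row of the matrix-vector product."""
    G = np.asarray(G, dtype=float)
    v_in = np.asarray(v_in, dtype=float)
    return G @ v_in
```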
  • Figure 5 illustrates a controller and control method for operating an analog neural network formed of convolutional processing units.
  • one arbitrarily sized CvPU array 510 can be used to implement a convolutional array in a neural network, followed by further circuits for nonlinearities (e.g., reader + non-linearity circuits 520).
  • a controller architecture such as shown in Figure 5 can be implemented to reuse the CvPU array 510 for consecutive neural network layers.
  • the 1st and Lth layers are considered input and output layers, with array size depending on the nature of inputs.
  • careful temporal orchestration is needed.
  • controller architecture 500 for multi-layer computation using a limited fixed-sized CvPU array includes controller 530 providing four signals: H_IN, H_OUT, H_RW, and H_W for executing input, output, read/write, and weight-change operations, respectively.
  • the controller 530 orchestrates the read/write operations to execute multiple consecutive layers sequentially.
  • a capacitor can be used as the basic memory element.
  • a weights-memory 540 holds the values of weights required for sequential processing.
  • Weights-memory 540 can include an array of capacitors read/written by signals from the controller 530. Weights from the weights-memory 540 are loaded to the CvPU array 510 via loader 550.
  • An input switch 560 connects an input signal for the first convolutional layer to a sampler 565 so as to be input to the adder array 570, which provides a pooling layer and is used to load inputs to the CvPU array 510.
  • the input switch 560 is open for remaining folds in a processing cycle.
  • H_OUT controls the tri-state switch 575 so as to select between a first position for feeding back the output of one fold to be input to the next fold and a second position for forwarding the output of the last fold to fully-connected layers 580 to output the final decision by the network.
  • the conductances are made zero so that the capacitors can either be charged (written to) or read from.
  • H_W changes between n_F discrete levels during the processing window T_P, to load the weights into the loader circuit for sequential computation.
  • T_F denotes the time taken to process one fold.
  • the processing of the signal can involve either single or multi-layer computation (e.g., if the CvPU array contains multiple stacked convolutional layers), denoted by n_lpf (layers processed per fold).
  • the input control signal (H_IN) and the output control signal (H_OUT) have a period equal to the number of convolutional layers implemented by the folded layer and a pulse length of the amount of time taken to process a single convolutional layer.
  • H_IN is HIGH (connecting to the 1st layer output) for the first fold and LOW (connecting to feedback) for all other folds in a processing cycle.
  • H_OUT is HIGH (connecting to the Lth layer input) for the last fold, and LOW (connecting to feedback) for all other folds in a processing cycle.
  • the input control signal (H_IN) is high during a first convolutional layer of the hidden layers and low during other convolutional layers of the hidden layers.
  • the weight-change control signal (H_W) controls the application of weights for each convolutional layer’s processing time.
  • the weight-change control signal (H_W) can change between discrete levels.
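The fold sequencing of the input and output selects described above can be sketched as boolean schedules over one processing cycle. A simplified model that ignores the read/write and weight-change timing:

```python
def fold_schedule(n_folds):
    """Per-fold selects for the folded CvPU controller: H_IN is HIGH only
    on the first fold (take the external input) and H_OUT is HIGH only on
    the last fold (forward to the fully-connected layers); otherwise both
    are LOW, routing the feedback path between consecutive folds."""
    h_in = [fold == 0 for fold in range(n_folds)]
    h_out = [fold == n_folds - 1 for fold in range(n_folds)]
    return h_in, h_out
```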
  • the described analog neural network circuit can be used in a wearable device.
  • Figure 7 illustrates an example wearable device implementation of the described analog neural network circuit.
  • the wearable device 700 can include a battery-less processing device 710 including an analog neural network circuit that receives, as input, sensor data from at least one sensor 720 and outputs a trigger signal; and a digital processing device 730 that operates on sensor data upon receiving the trigger signal.
  • the digital processing device 730 can be or include a field programmable gate array (FPGA).
  • the analog neural network circuit can be implemented as described above with respect to Figures 1-6.
  • the battery-less processing device 710 can be fabricated in a complementary metal oxide semiconductor (CMOS) process.
  • the sensor(s) includes an electroencephalogram (EEG) helmet. An example implementation of such a case is described in the section entitled EXAMPLE IMPLEMENTATION.
  • EXAMPLE – Convolutional Kernel Architecture [0063] There exists a choice of c(·) which is equivalent to convolving the input array I with an arbitrary convolutional kernel K, i.e., finding the choices of c(x, y, t) for which the anisotropic diffusion equation ∂I(x, y, t)/∂t = div(c(x, y, t) ∇I(x, y, t)) is satisfied and yields I * K, where I(x, y, t) is an image at time t, c(x, y, t) is the conduction (diffusion) operator, and * denotes convolution. [0064] This is shown by looking at the (i, j)-th input, corresponding to voltage v_ij, as shown in Figures 1 and 2A-2F.
  • adjacent inputs in the array are connected as the summation of adjacent inputs, where conductances are defined according to cardinal directions and the conductance elements are alternated along their respective directions in the array.
  • the conductances c are controllable and directly depend on the I-V curves of conductance elements.
  • [0068] Since this term is a constant for a given choice of conductances (and lies in [0.5, 1] for a convolutional filter), the above equation is equivalent to convolution with a 3×3 kernel with 8 DoF.
  • the convolution can be expressed as a 3×3 kernel whose diagonal entries are non-existent in the 4-DoF square lattice, culminating in missing kernel connections. Missing connections can be implemented by extending the current square-shaped lattice to an octagonal lattice (8 neighboring connections) for an 8-DoF convolutional kernel implementation.
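One explicit diffusion step over the 4-DoF lattice is equivalent to applying a 3×3 kernel whose off-center taps are the directional conductances. A sketch; the time step dt and conductance values are illustrative:

```python
import numpy as np

def diffusion_kernel_4dof(cN, cS, cE, cW, dt=1.0):
    """3x3 kernel realized by one discrete diffusion step with
    per-direction conductances. The corner (diagonal) taps are zero in
    the 4-DoF square lattice; an octagonal (8-DoF) lattice is needed to
    populate them. Applied as cross-correlation over the input array."""
    center = 1.0 - dt * (cN + cS + cE + cW)
    return np.array([[0.0,     dt * cN, 0.0],
                     [dt * cW, center,  dt * cE],
                     [0.0,     dt * cS, 0.0]])
```

Because the taps sum to 1, the kernel acts as a weighted average, matching the charge-conserving behavior of the diffusion array.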
  • the conductance elements are controllable by voltage at runtime when implemented, for example, using Mead’s Horizontal Resistor circuit (HRES), also referred to as Mead’s saturating resistor.
  • a capacitance ratio of 10 achieved the desired discharge timing. For example, 10 μF for principal nodes and 100 μF for the mediator nodes enabled the capacitors at the principal nodes to discharge sufficiently more slowly than the capacitors at the mediator nodes.
  • FIG. 8 is a concept visualization for ultra-low-power hybrid analog-digital processing via multiple nanosensors forming an implantable nanonetwork. EEG data is processed in-situ at multiple nanonodes and subsequent post-processing is done at gateways.
  • the communication between nanonodes and gateways is achieved by terahertz band (0.1–10 THz) electromagnetic communication.
  • a two-tier wireless sensor network architecture is presented.
  • the lower tier consists of multiple EEG nanosensors implanted in the head/on a helmet to sense and process EEG signals.
  • the upper tier includes fewer gateways, such as mobile phones or medical devices, that collect data/pre-processed decisions from the nanosensor nodes and forward them to the cloud or personal devices that compile them into meaningful information.
  • Electroencephalography involves non-invasively sensing electrical signals on the scalp.
  • Figure 9 is an example implementation of an analog neural network that can be fit into an EEG nanosensor such as described with respect to Figure 8.
  • the scheme illustrated in Figure 9 provides for co-designed analog spatio-temporal processing of EEG signals.
  • the spatial array of signals is gradually delayed to offer insight into the temporal behavior of the instantaneous signal.
  • This spatio-temporal picture is then processed using multiple layers of analog convolutional layers until a decision is reached in analog.
  • the delayed copies are indexed by k ∈ {1, 2, . . . , M}.
  • This array is then input into successive 4-DoF convolutional layers with 25, 50, 100, and 150 convolutional filters, respectively.
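The gradual-delay construction described above can be sketched in software. A one-sample delay step and zero initial state are illustrative assumptions:

```python
import numpy as np

def spatio_temporal(x, M):
    """Stack M progressively delayed copies of a multichannel signal
    x (channels x time), so each instantaneous column carries M steps of
    temporal context for the convolutional layers that follow."""
    x = np.asarray(x, dtype=float)
    C, T = x.shape
    out = np.zeros((M, C, T))
    for k in range(M):                 # k-th copy delayed by k samples
        out[k, :, k:] = x[:, :T - k] if k else x
    return out
```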
  • 4-DoF CvPU is used for simplicity in simulation and implementation, but the concept is still valid for 8-DoF CvPU.
  • the pooling is achieved by simply using analog voltage adders in between the successive convolutional layers, making them equivalent to average pooling.
  • the ReLU non-linearities are implemented using a Diode Pair (DP) architecture after each layer.
  • a dense layer (implemented using the Resistive Processing Unit (RPU)) maps the input from the filters into a vector of length 2, to which softmax is applied to determine the final decision from the network.
  • the 3×3 kernel weights are constrained to values explained in the section entitled “EXAMPLE – Convolutional Kernel Architecture”.
  • Our implementation is based on the PyTorch library, and we train the network using the ADAM optimizer with a batch size of 32.
  • the scheme illustrated in Figure 9 can be integrated into an electromagnetic nanonetwork.
  • Figure 10 shows a conceptual design of hybrid analog-digital architecture for continuous and energy-efficient seizure detection on nanosensors.
  • the hybrid architecture can communicate triggers/sensor data to micro-gateways.
  • the analog inference is designed to have a very low False Negative (FN) rate to not miss any probable event, while any False Positives (FP) are further analyzed by the digital system (woken up only when needed, i.e., just in time).
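The just-in-time wake-up logic can be sketched as threshold gating. The score scale and threshold value here are illustrative, chosen low so that false negatives are rare:

```python
def digital_wakeups(scores, threshold=0.2):
    """Hybrid analog-digital gating sketch: the analog stage wakes the
    digital stage whenever its anomaly score crosses a deliberately low
    threshold; the extra false positives this admits are filtered by the
    digital stage, keeping the false-negative rate very low."""
    triggers = [s >= threshold for s in scores]
    return triggers, sum(triggers)
```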
  • the hybrid analog-digital HW/SW co-designed system provides numerous advantages for EEG seizure detection.
  • the analog part of the system (CvPU) acts as a pre-stage only, using sensors to collect EEG data and filter out irrelevant information. This stage is energy-efficient and less accurate, but is useful in identifying interesting cases that require further analysis.
  • the digital part of the system includes an FPGA that processes the EEG data in greater detail, providing more accurate results. Since this stage is less energy-efficient, to conserve energy, the digital stage is triggered only when the analog system’s output goes high, indicating a positive indication of an EEG seizure.
  • the digital side is implemented using Seizure-Net (U. Asif, S. Roy, J. Tang, and S. Harrer, “SeizureNet: Multi-spectral deep feature learning for seizure type classification,” in Proc. Mach. Learn. Clin. Neuroimaging Radiogenomics Neuro-Oncol., Lima, Peru, Oct. 2020, pp. 77–87) with Multi-Spectral Feature Sampling.
  • the trigger/sensor data can be communicated to gateways via antennas composed of graphene (e.g., carbon nanotubes (CNTs)) when an anomaly is detected by an individual nanosensor.
  • a supercapacitor can be used as an energy source, coupled with piezoelectric energy harvesting for battery-less operation.
  • the system divides the scalp into regions named according to their position and laterality relative to the midline.
  • the electrodes are labeled with letters and numbers. For example, electrodes placed on the midline of the forehead are labeled Fz, while those on the left and right sides of the forehead are labeled F3 and F4, respectively. Similarly, electrodes placed on the left and right sides of the temporal region are labeled T3 and T4, respectively.
  • the system also includes electrodes placed on the mastoid processes behind the ears (M1 and M2) and on the back of the head (O1 and O2) to serve as reference and ground electrodes, respectively. Overall, the placement of EEG sensors follows a standardized system to ensure that recordings can be compared across studies and that results are consistent and reliable.
  • Figure 12 shows Mean Squared Error (MSE) for different sized arrays as simulated in SPICE.
  • Figure 12(a) illustrates the behavior for a 3×3 array
  • Figure 12(b) for a 5×5 array
  • Figure 12(c) for a more extensive 20×20 array.
  • Figure 13 shows plots of Peak Signal to Noise Ratio (PSNR) as the number of layers varies for different sizes of CvPU arrays stacked on top of each other. PSNR is proportional to the log of the inverse of Mean Squared Error (MSE). A swift decline in PSNR is seen when no activation functions are present, challenging the viability of multi-layer analog computations for larger networks. However, with the inclusion of nonlinear activation functions like the Sigmoid, the performance remains stable despite increasing layers.
  • Figure 14 shows the confusion matrix for the analog convolutional neural network simulated using anisotropic diffusion-based 4-DoF convolution as described herein.
  • Figure 14(a) shows a confusion matrix for detection using the analog neural network (using CvPU arrays)
  • Figure 14(b) shows a confusion matrix for classification using a power-intensive digital network
  • Figure 14(c) shows the performance of the end-to-end joint analog-digital system which uses the analog network as a continuous detector (simulated using Python).
  • the Seizure-Net trained on the TUH seizure corpus attains an accuracy of 86.7% using the resource-intensive digital implementation of the convolutional network.
  • the final system outperforms the digital system because the analog part helps it filter out much of the data, achieving an accuracy of 88% as a whole. Moreover, using an energy-efficient analog filter helps reduce power consumption by many orders of magnitude.
  • the digital part is then woken up only 27% of the time, extending the lifetime of the device by ~4×, i.e., from 3–10 hours to 12–40 hours, providing enough time for a natural charging/recharging schedule, or even allowing the said power to be harvested from piezoelectric harvesters.
  • a small nanosensor does not process all the EEG channels in conjunction, meaning the power consumption will be on the order of a few hundred µW for a small-enough array, and it could be powered using piezoelectric energy harvesters.
  • efficient FPGAs capable of executing neural networks usually consume between 0.5 W (TinyFPGA BX) and 10 W (Xilinx Ultra96v2), which is 1 to 3 orders of magnitude greater than the analog power consumption. This is why analog computation is invaluable for energy-efficient continuous monitoring.
  • Figure 15 provides a noise analysis for 3×3 and 4×4 arrays for temperatures ranging from −25°C to 125°C, with a step of 25°C. Referring to Figure 15, it can be seen that the noise is comparatively higher (although still on the order of nano-volts) at lower frequencies (0–10 Hz).
  • An analog neural network circuit comprising: a convolutional layer comprising at least one convolutional processing unit (CvPU), each CvPU comprising: a center node capacitor; a set of neighbor node capacitors; a plurality of conductance elements between the center node capacitor and the set of neighbor node capacitors; and a plurality of analog adder circuits, wherein the center node capacitor and a neighbor node capacitor of the set of neighbor node capacitors are connected as inputs to an analog adder circuit of the plurality of analog adder circuits.
  • each RPU comprises: a first PMOS transistor coupled to receive a weight at its gate; a first NMOS transistor coupled to receive the weight at its gate and coupled by its drain to a drain of the first PMOS transistor; a first capacitor coupled at a first end to the drains of the first NMOS transistor and the first PMOS transistor; a read PMOS transistor coupled at its gate to the first end of the first capacitor; a load at a drain of the read PMOS transistor; and a high pass filter at the drain of the read PMOS transistor.
  • the analog neural network circuit of clause 7, wherein the set of neighbor node capacitors comprises: a north nearest-neighbor node capacitor, a south nearest-neighbor node capacitor, an east nearest-neighbor node capacitor, and a west nearest-neighbor node capacitor.
  • the set of neighbor node capacitors comprises a north nearest-neighbor node capacitor, a south nearest-neighbor node capacitor, an east nearest-neighbor node capacitor, a west nearest-neighbor node capacitor, a north-east nearest-neighbor node capacitor, a south-east nearest-neighbor node capacitor, a south-west nearest-neighbor node capacitor, and a north-west nearest-neighbor node capacitor, wherein the center node capacitor and the set of neighbor node capacitors are part of a first capacitor array forming a first CvPU layer; wherein a first set of conductance elements of the plurality of conductance elements are part of a first circuit layer forming a second CvPU layer, wherein the center node capacitor, the north nearest-neighbor node capacitor, the south nearest-neighbor node capacitor, the east nearest-neighbor node capacitor, and the west nearest-neighbor node capacitor are coupled to nodes of the first circuit layer; where
  • a wearable device comprising: a battery-less processing device comprising an analog neural network circuit that receives, as input, sensor data from at least one sensor and outputs a trigger signal
  • the analog neural network circuit comprises: a convolutional layer comprising at least one convolutional processing unit (CvPU), each CvPU comprising: a center node capacitor; a set of neighbor node capacitors; a plurality of conductance elements between the center node capacitor and the set of neighbor node capacitors; and a plurality of analog adder circuits, wherein the center node capacitor and a neighbor node capacitor of the set of neighbor node capacitors are connected as inputs to an analog adder circuit of the plurality of analog adder circuits; and a digital processing device that operates on sensor data upon receiving the trigger signal.
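The decision stage noted in the bullets above (a dense layer mapping the filter outputs to a length-2 vector, followed by softmax) can be sketched in ordinary Python. The logit values and class labels below are hypothetical, chosen only for illustration; they are not taken from the source.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(logits, labels=("no-seizure", "seizure")):
    """Map the length-2 dense-layer output to a final decision."""
    probs = softmax(logits)
    return labels[probs.index(max(probs))], probs

# Hypothetical dense-layer output: the second logit dominates.
label, probs = decide([0.3, 1.9])
```

In the described network this computation would be realized by the RPU dense layer and a softmax readout rather than in software; the sketch only fixes the arithmetic.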
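The PSNR metric in the Figure 13 discussion above follows the standard definition, in which PSNR is proportional to the log of the inverse of the MSE. The peak value of 1.0 assumed below corresponds to normalized signals and is my assumption, not stated in the source.

```python
import math

def psnr(mse, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    if mse == 0:
        return float("inf")  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

# A tenfold reduction in MSE raises PSNR by 10 dB,
# which is why stacking lossy analog layers without
# nonlinearities shows up as a steady PSNR decline.
```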
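The ~4× lifetime extension quoted above is consistent with a simple duty-cycle estimate. The sketch below assumes, as a simplification not made explicit in the source, that the digital stage dominates power consumption and the always-on analog draw is negligible.

```python
def lifetime_gain(duty_cycle, analog_fraction=0.0):
    """Battery-lifetime multiplier when the dominant (digital) load is
    active only `duty_cycle` of the time; `analog_fraction` is the
    always-on analog draw relative to digital power (assumed ~0)."""
    return 1.0 / (duty_cycle + analog_fraction)

# Digital stage woken only 27% of the time -> roughly 3.7x longer life,
# in line with the ~4x (3-10 h -> 12-40 h) figure given in the text.
gain = lifetime_gain(0.27)
```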


Abstract

An analog neural network may include a convolutional layer including at least one convolutional processing unit (CvPU). Each CvPU can include: a center node capacitor, a set of neighbor node capacitors, a plurality of conductance elements between the center node capacitor and the set of neighbor node capacitors, and a plurality of analog adder circuits, wherein the center node capacitor and a neighbor node capacitor of the set of neighbor node capacitors are connected as inputs to an analog adder circuit of the plurality of analog adder circuits.

Description

Docket No. RTG-013XC1PCT ANISOTROPIC DIFFUSION-BASED ANALOG NEURAL NETWORK ARCHITECTURE GOVERNMENT SUPPORT [0001] This invention was made with government support under Grant No. 1937403 awarded by the National Science Foundation (NSF RTML). The Government has certain rights in the invention. CROSS-REFERENCE TO RELATED APPLICATION [0002] This application claims the benefit of U.S. Provisional Patent Application No. 63/539,894, filed September 22, 2023. BACKGROUND [0003] Real-time acquisition, analysis, and processing of electroencephalogram (EEG) signals have become increasingly important in recent years, not only for clinical and research purposes but also for continuous monitoring. EEG sensing and brain monitoring can provide valuable insight into the health and function of the brain. It is important to track and monitor EEG data in both clinical and pre-clinical settings to identify anomalies in brain activity and potentially diagnose neurological disorders at an early stage. Early diagnosis can lead to timely treatment and prevention of serious neurological conditions, such as epilepsy and Alzheimer’s disease. Regular clinical appointments for EEG monitoring may not be feasible for everyone, but wearable EEG sensors can provide a cost-effective and convenient way to monitor brain activity in daily life. [0004] Indeed, the trend of analyzing physiological markers for health tracking using wearable sensors is on the rise. However, due to the small size of these wearables, battery life is of paramount concern. Unlike heavy mobile devices, which can be packed with powerful batteries, wearable sensors do not as easily accommodate a power source. Furthermore, simply collecting raw EEG or other multimodal physiological data is not enough. The data needs to be analyzed and translated into meaningful insights about an individual’s health. [0005] Thus, there is a need for ultra-low power techniques and systems that can perform or assist in analysis of collected data. 
BRIEF SUMMARY [0006] Anisotropic diffusion-based analog neural network architecture for ultra-low power processing of multimodal physiological data is described. Ultra-low power devices are suitable for scenarios in which the power being consumed is compatible with that generated by energy harvesting capabilities of the node (e.g., vibration energy harvesting without a battery). Indeed, the described ultra-low powered approach is suitable for battery-less processing on the wearable devices themselves, which can help reduce the amount of energy expended on transmission and computation while still providing insights about an individual’s health. [0007] An analog neural network circuit is presented that includes: a convolutional layer including at least one convolutional processing unit (CvPU). Each CvPU can include: a center node capacitor, a set of neighbor node capacitors, a plurality of conductance elements between the center node capacitor and the set of neighbor node capacitors, and a plurality of analog adder circuits, wherein the center node capacitor and a neighbor node capacitor of the set of neighbor node capacitors are connected as inputs to an analog adder circuit of the plurality of analog adder circuits. [0008] In some cases, each neighbor node capacitor is connected to the center node capacitor by two conductance elements and an analog adder circuit, wherein the two conductance elements are disposed in series between the center node capacitor and the neighbor node capacitor, wherein the center node capacitor is connected as a first input to the analog adder circuit, the neighbor node capacitor is connected as a second input to the analog adder circuit, and an output of the analog adder circuit is connected to a node between the two conductance elements. [0009] CvPUs are tessellated when forming a convolutional layer of multiple CvPUs (i.e., more than one CvPU).
The analog neural network circuit can have multiple convolutional layers with pooling filter layers between. The pooling filter layers can be implemented using adders. [0010] A wearable device can include the analog neural network as a battery-less processing device that receives, as input, sensor data from at least one sensor and outputs a trigger signal to a digital processing device that operates on sensor data upon receiving the trigger signal. [0011] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. BRIEF DESCRIPTION OF THE DRAWINGS [0012] Figure 1 illustrates an example analog convolutional processing unit. [0013] Figures 2A-2F illustrate an example three-dimensional analog convolutional processing unit. [0014] Figure 3 illustrates an example analog neural network formed of convolutional processing units such as described with respect to Figure 1 and Figures 2A-2F. [0015] Figure 4A shows an example diode-based activation function. [0016] Figure 4B shows a single voltage-based resistive processing unit (VRPU). [0017] Figure 5 illustrates a controller and control method for operating an analog neural network formed of convolutional processing units. [0018] Figure 6 shows a timing diagram/plot of controller output for 3-fold computation (TP = 3TF) using an arbitrary-sized CvPU array in a neural network. [0019] Figure 7 illustrates an example wearable device implementation of the described analog neural network circuit. [0020] Figure 8 is a concept visualization for ultra-low-power hybrid analog-digital processing via multiple nanosensors forming an implantable nanonetwork.
[0021] Figure 9 is an example implementation of an analog neural network that can be fit into an EEG nanosensor such as described with respect to Figure 8. [0022] Figure 10 shows a conceptual design of hybrid analog-digital architecture for continuous and energy-efficient seizure detection on nanosensors. [0023] Figure 11 shows simulations of 3×3 and 4×4 anisotropic diffusion arrays in SPICE. [0024] Figure 12 shows Mean Squared Error (MSE) for different sized arrays as simulated in SPICE. [0025] Figure 13 shows plots of Peak Signal to Noise Ratio (PSNR) as the number of layers varies for different sizes of CvPU arrays stacked on top of each other. [0026] Figure 14 shows the confusion matrix for the analog convolutional neural network simulated using anisotropic diffusion-based 4-DoF convolution as described herein. [0027] Figure 15 provides a noise analysis for 3×3 and 4×4 arrays for temperatures ranging from −25°C to 125°C, with a step of 25°C. DETAILED DESCRIPTION [0028] Anisotropic diffusion-based analog neural network architecture for ultra-low power processing of multimodal physiological data is described. Ultra-low power devices are suitable for scenarios in which the power being consumed is compatible with that generated by energy harvesting capabilities of the node (e.g., vibration energy harvesting without a battery). Indeed, the described ultra-low (e.g., on the order of nano- or pico-watts or less) powered approach is suitable for battery-less processing on the wearable devices themselves, which can help reduce the amount of energy expended on transmission and computation while still providing insights about an individual’s health. [0029] Accordingly, all-analog in-situ processing can be performed at the sensor level, the output of which may be used for more robust analysis, for example, by digital devices.
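A discrete-time software model can help build intuition for the anisotropic diffusion that the CvPU performs in the charge domain: node voltages exchange charge with their four nearest neighbors through conductances. The uniform conductance and step size below are illustrative choices, not values from the circuit, and the plain explicit-Euler update is a stand-in for the continuous charge dynamics of the hardware.

```python
def diffusion_step(v, c=0.2, dt=0.5):
    """One explicit-Euler step of 4-DoF diffusion on a 2-D grid of node
    voltages v: each node exchanges charge with its N/S/E/W neighbors
    through a uniform conductance c, with zero-flux boundaries."""
    rows, cols = len(v), len(v[0])
    out = [row[:] for row in v]
    for i in range(rows):
        for j in range(cols):
            flux = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    flux += c * (v[ni][nj] - v[i][j])
            out[i][j] += dt * flux
    return out

# A single charged node spreads into its neighbors while total charge
# is conserved (symmetric coupling, zero-flux boundaries).
grid = [[0.0] * 3 for _ in range(3)]
grid[1][1] = 1.0
smoothed = diffusion_step(grid)
```

With spatially varying (anisotropic) conductances in place of the scalar `c`, the same update realizes the direction-dependent kernels described in this document.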
[0030] An analog neural network circuit is presented that includes convolutional layers formed of convolutional processing units (CvPUs). The CvPUs use the properties of anisotropic diffusion in electrical circuits. An example, which applies diffusion in vision/image processing, is provided in the section entitled “Convolutional Kernel Architecture”. It should be understood that while the architecture utilizes an approach suitable for vision/image processing, the architecture presented herein is not specific to image pixels and can be applied to any general input array. [0031] In addition to applications for processing EEG signals such as described in the section entitled “Example Implementation”, the described CvPU circuitry can be utilized wherever an acceleration of neural network processing (especially CNNs) is needed. For example, the CvPU circuitry can be used for image processing, image segmentation, natural language processing, facial recognition, recommendation systems, speech recognition, and various other applications. Due to the CvPU’s low power requirements and neural network's inherent resilience to imprecise computation, analog acceleration of CNNs using CvPUs could contribute towards energy savings at datacenters and other computation servers. [0032] As described herein, an analog neural network circuit can include a convolutional layer formed of at least one CvPU that includes a center node capacitor, a set of neighbor node capacitors, a plurality of conductance elements between the center node capacitor and the set of neighbor node capacitors, and a plurality of analog adder circuits, wherein the center node capacitor and a neighbor node capacitor of the set of neighbor node capacitors are connected as inputs to an analog adder circuit of the plurality of analog adder circuits. [0033] Figure 1 illustrates an example analog convolutional processing unit (CvPU).
Referring to Figure 1, each CvPU 100 includes: a center node capacitor 110 and a set of neighbor node capacitors 120. Each neighbor node capacitor (e.g., 120-W) is connected to the center node capacitor 110 by two conductance elements (e.g., 130, 140) and an analog adder circuit 150 (an example implementation of the analog adder is shown in the inset). The two conductance elements (e.g., 130, 140) are disposed in series between the center node capacitor 110 and the neighbor node capacitor (e.g., 120-W). The center node capacitor 110 is connected as a first input to the analog adder circuit 150, the neighbor node capacitor (e.g., 120-W) is connected as a second input to the analog adder circuit 150, and an output of the analog adder circuit 150 is connected to a node (e.g., node 160) between the two conductance elements (e.g., 130, 140). In the illustrative example, the set of neighbor node capacitors includes: a north nearest-neighbor node capacitor 120-N, a south nearest-neighbor node capacitor 120-S, an east nearest-neighbor node capacitor 120-E, and a west nearest-neighbor node capacitor 120-W. This configuration provides 4 degrees of freedom. [0034] 8 degrees of freedom can be garnered by including a north-east nearest-neighbor node capacitor, a south-east nearest-neighbor node capacitor, a south-west nearest-neighbor node capacitor, and a north-west nearest-neighbor node capacitor. The additional capacitors are connected to the center node capacitor 110 through the two conductance elements and analog adder circuit as described with respect to the four shown in Figure 1. A three-dimensional (3D) 8 degrees of freedom (8-DoF) implementation is shown in Figures 2A-2F. [0035] In operation, each capacitor of the center node capacitor and set of neighbor node capacitors receives a voltage corresponding to an input at a particular time.
In particular, magnitudes of the input signal can be encoded as the capacitor voltages for processing, and the weights are encoded as conductances. In certain implementations, the conductances can be programmed during processing through various control methods. An example controller and control method is shown in Figure 5. [0036] Figures 2A-2F illustrate an example three-dimensional analog convolutional processing unit. Referring to Figure 2A, a 3D 8-DoF CvPU 200 design can include five CvPU layers: a first capacitor array 210, a first circuit layer 220, a second circuit layer 230, a third circuit layer 240, and a second capacitor array 250. Here, a “circuit layer” refers to a die with an integrated circuit or other fabricated component. The first capacitor array and the second capacitor array can be on their own dies or be fabricated on one of the circuit layers. The 3D design of the CvPU 200 can be in the form of a 3D IC or 3D packaging. The 3D nature mediates connections of neighboring inputs to the (i, j)-th input by the summation of the two, connected by the relevant conductance. [0037] While Figures 2B-2F show details of each of the five CvPU layers of the 3D 8-DoF CvPU design centered on input (i, j), which can be tessellated to create arbitrarily large input arrays for convolution, a 4-DoF CvPU with a design similar to that shown in Figure 1 can be implemented by using the first capacitor array 210 and the first circuit layer 220. [0038] Turning to Figures 2B and 2F, a total of 13 capacitors (9 in the first capacitor array 210 and 4 in the second capacitor array 250) are connected to nodes N, O, P, Q, R, S, T, U, V, W, X, Y, and Z. These capacitors are used for input and output, whereby the voltages v written underneath each capacitor are the initial voltages for each node.
Through-connections (e.g., which may be accomplished using through-silicon vias or other connectors) to appropriate circuit layers are marked by letters associated with the nodes. Only node N is connected to all three of the circuit layers 220, 230, and 240 shown in Figures 2C, 2D, and 2E, respectively. Furthermore, the nodes marked with a dot (・) are connected to a capacitor element, while the nodes marked with a cross (×) are left floating. Floating nodes only exist in the first circuit layer 220, as shown in Figure 2C. Finally, 8 unique conductance elements (relating to the convolutional kernel) are defined according to spatial directions: cN, cE, cS, cW, cNE, cNW, cSE, and cSW. An analog adder of the plurality of analog adder circuits (not shown in Figures 2A-2F) forming part of the CvPU can connect to nodes similarly to the design shown in Figure 1, but instead of a floating node for where the output of the analog adder circuit is input, the output of the analog adder circuit for the 3D design is connected to a node that has a capacitor (which is charged to the resultant added voltage before diffusion begins). [0039] Referring to Figure 2B, in the 3D 8-DoF CvPU 200 design, a center node capacitor and a set of neighbor node capacitors are part of a first capacitor array forming a first CvPU layer. Here, the set of neighbor node capacitors comprises a north nearest-neighbor node capacitor, a south nearest-neighbor node capacitor, an east nearest-neighbor node capacitor, a west nearest-neighbor node capacitor, a north-east nearest-neighbor node capacitor, a south-east nearest-neighbor node capacitor, a south-west nearest-neighbor node capacitor, and a north-west nearest-neighbor node capacitor. [0040] It can be seen that the initial voltages of neighboring capacitors are the sum of the principal (i, j)-th voltage and the neighboring voltage.
Using the through-connections to internal CvPU layers, neighboring capacitors are connected together by relevant conductances (through nodes labeled V, O, S, R, N, P, U, Q, and T). For example, as shown in Figure 2C, a first set of conductance elements of a plurality of conductance elements are part of a first circuit layer 220 forming a second CvPU layer, wherein the center node capacitor, the north nearest-neighbor node capacitor, the south nearest-neighbor node capacitor, the east nearest-neighbor node capacitor, and the west nearest-neighbor node capacitor are coupled to nodes of the first circuit layer 220. In addition, as shown in Figure 2D, a second set of conductance elements of the plurality of conductance elements are part of a second circuit layer 230 forming a third CvPU layer, wherein the center node capacitor, the north-east nearest-neighbor node capacitor, and the south-west nearest-neighbor node capacitor are coupled to nodes of the second circuit layer 230. Further, as shown in Figure 2E, a third set of conductance elements of the plurality of conductance elements are part of a third circuit layer 240 forming a fourth CvPU layer, wherein the center node capacitor, the north-west nearest-neighbor node capacitor, and the south-east nearest-neighbor node capacitor are coupled to nodes of the third circuit layer 240. [0041] Referring to Figure 2F, the second capacitor array 250 contains capacitors that are relevant for connections of other inputs (through nodes labeled Z, W, y, and x). The capacitors of second capacitor array 250 are coupled to nodes of the second circuit layer 230 and the third circuit layer 240. For example, referring to Figures 2B and 2D, node S mediates the connection between the (i, j)-th and (i−1, j+1)-th inputs (with the initial voltage of the capacitor being the sum vi,j + vi−1,j+1) and is disposed in alignment between the first capacitor array 210 and the second circuit layer 230.
However, this same position collides with the placement of node W, shown in the third circuit layer 240 (see Figure 2E), which mediates the connection between the (i−1, j)-th and (i, j+1)-th inputs. To resolve this, W is placed in the second capacitor array 250 as shown in Figure 2F, and the relevant connections of both nodes in the internal layers are shown using the letters. [0042] Advantageously, the CvPU architecture is scalable. The given architecture for a singular node can be extended to an arbitrarily large array for computation. It is also scalable in the other direction, whereby if only the first capacitor layer and the first circuit layer (providing the conductances) are used, a 4-DoF CvPU can be implemented. [0043] Indeed, CvPUs such as described with respect to Figures 1 and 2A-2F can be tessellated when forming a convolutional layer of multiple CvPUs (i.e., more than one CvPU), allowing for an arbitrary size of an input array for convolution. [0044] In some cases, an analog neural network circuit that includes convolutional layers formed of one or more convolutional processing units (CvPUs) can include at least one additional convolutional layer, where each additional convolutional layer also includes at least one CvPU. In addition, a pooling filter formed of an analog adder can be connected between adjacent convolutional layers. In some cases, the analog neural network circuit can further include a delay stage including multiple delay elements to generate temporal inputs for each input channel to the convolutional layer. Figure 3 illustrates an example analog neural network formed of convolutional processing units such as described with respect to Figure 1 and Figures 2A-2F. [0045] Referring to Figure 3, an example analog neural network 300 includes multiple convolutional layers 310 formed of CvPUs 315 as described above with respect to Figure 1 and Figures 2A-2F, each followed by a pooling layer 320.
In some cases, two or more of the convolutional layers 310 are able to be implemented using the same underlying CvPUs 315. An example implementation is described with respect to Figures 5 and 6. [0046] As previously mentioned, the pooling layer may be implemented using analog adders. In certain example implementations, 2×2 and 3×3 average pooling filters are used with a square lattice structure of 4-DoF. A dense layer 330 can be used as the last layer of the analog neural network 300. [0047] In some cases, a diode-based activation function can be included after each convolutional layer. Figure 4A shows an example diode-based activation function (400). [0048] In some cases, the dense layer 330 includes an array of resistive processing units (RPUs). Figure 4B shows a single voltage-based resistive processing unit (VRPU). An array of VRPUs can be used for the dense layer 330. Conceptually, a cross-bar array of RPUs has row and column connections. [0049] Referring to Figure 4B, a VRPU 450 is composed of three transistors with a capacitor, referred to as a 3T1C structure. In particular, a first PMOS transistor is coupled to receive a weight at its gate; a first NMOS transistor is coupled to receive the weight at its gate (e.g., VBP = VBN = a particular weight) and coupled by its drain to a drain of the first PMOS transistor. A first capacitor is coupled at a first end to the drains of the first NMOS transistor and the first PMOS transistor. A read PMOS transistor is coupled at its gate to the first end of the first capacitor. A load (e.g., resistor) is at a drain of the read PMOS transistor. A high pass filter is at the drain of the read PMOS transistor. In the 3T1C structure, the capacitor is responsible for storing the weights, and the two transistors, an NMOS and PMOS pair, are designed to tune the weight of the capacitor.
As the input signal is sent to the drain of the last transistor, the last transistor will multiply the input signal and the voltage on the capacitor to output the current at its source. Rather than directly using the output current, a load is designed (e.g., R1) such that the voltage at the drain of the last transistor can be used. The high pass filter is included to block the DC voltage. The illustrated RPU can be used to perform matrix multiplications at the heart of neural network computation. [0050] Accordingly, a VRPU includes a first PMOS transistor coupled to receive a weight at its gate; a first NMOS transistor coupled to receive the weight at its gate and coupled by its drain to a drain of the first PMOS transistor; a first capacitor coupled at a first end to the drains of the first NMOS transistor and the first PMOS transistor; a read PMOS transistor coupled at its gate to the first end of the first capacitor; a load at a drain of the read PMOS transistor; and a high pass filter at the drain of the read PMOS transistor. [0051] Figure 5 illustrates a controller and control method for operating an analog neural network formed of convolutional processing units. Of note herein, one arbitrarily sized CvPU array 510 can be used to implement a convolutional array in a neural network, followed by further circuits for nonlinearities (e.g., reader + non-linearity circuits 520). However, since neural networks typically have multiple layers, and in order to save space, a controller architecture such as shown in Figure 5 can be implemented to reuse the CvPU array 510 for consecutive neural network layers. [0052] For a convolutional neural network with L layers, the 1st and Lth are considered input and output layers with array size depending on the nature of inputs. In order to reuse the same CvPU array for multiple convolutional layers, careful temporal orchestration is needed.
Through use of a controller as shown and described with respect to Figure 5, consecutive convolutional layers can be executed sequentially from the same neural network, with each use of the same CvPU referred to as a “fold”. That is, the basic component of the convolutional computation is a single fold. [0053] Referring to Figure 5, controller architecture 500 for multi-layer computation using limited fixed-sized CvPU array includes controller 530 providing four signals: HIN, HOUT, HRW, and HW for executing input, output, write/read, and weight-change operations respectively. The controller 530 orchestrates the read/write operations to execute multiple consecutive layers sequentially. [0054] Since the weights in the CvPU array 510 implementing a convolutional layer are voltage-controlled, a capacitor can be used as the basic memory element. Here, a weights-memory 540 holds the values of weights required for sequential processing. Weights-memory 540 can include an array of capacitors read/written by signals from the controller 530. Weights from the weights-memory 540 are loaded to the CvPU array 510 via loader 550. [0055] An input switch 560 connects an input signal for the first convolutional layer to a sampler 565 so as to be input to the adder array 570, which provides a pooling layer and is used to load inputs to the CvPU array 510. The input switch 560 is open for remaining folds in a processing cycle. HOUT controls the tri-state switch 575 so as to select between a first position for feeding back the output of one fold to be input to the next fold and a second position for forwarding the output of the last fold to fully-connected layers 580 to output the final decision by the network. For both reading and writing, the conductances are made zero so that the capacitors can either be charged (written to) or read from. 
Finally, HW changes between nF discrete levels during the processing window TP, to load the weights into the loader circuit for sequential computation. A sample output for controller signals is shown in Figure 6 for nF = 3. [0056] Let the time taken to process one fold be TF, which corresponds to the time-window of a signal being processed at once. The processing of the signal can involve either single or multi-layer computation (e.g., if the CvPU array contains multiple stacked convolutional layers), denoted by nlpf (layers processed per fold). nlpf depends on the realized circuit design and does not change once implemented. One processing cycle—through the whole network—requires nF folds, resulting in a processing time of TP := nFTF. Lastly, the time taken for each fold depends upon the time it takes to charge input- and summation-capacitors in the CvPU array (TW), the time it takes for diffusion to happen (TC), and then finally, the time it takes to read the outputs of the array (TR). [0057] Figure 6 shows a timing diagram/plot of controller output for 3-fold computation (TP = 3TF) using an arbitrary-sized CvPU array in a neural network. As can be seen from the timing diagram, the input control signal (Hin) and the output control signal (Hout) have a period equal to the number of convolutional layers implemented by the folded layer and a pulse length equal to the amount of time taken to process a single convolutional layer. HIN is HIGH (connecting to 1st layer output) for the first fold and LOW (connecting to feedback) for all other folds in a processing cycle. HOUT is HIGH (connecting to Lth layer input) for the last fold, and LOW (connecting to feedback) for all other folds in a processing cycle. When HOUT is HIGH, the output is forwarded to the fully-connected layers. HRW controls the read/write operation to the CvPU array, i.e., whether input is being fed to the CvPU array or output is being read from it. 
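The timing bookkeeping above (TF = TW + TC + TR per fold, TP = nF·TF per processing cycle) can be sketched as follows. This is an illustrative model only; the function name, dictionary layout, and example durations are ours, not from the disclosure.

```python
# Illustrative fold-timing bookkeeping for reusing one CvPU array.
# Per fold: T_F = T_W + T_C + T_R (write, diffusion, read phases);
# per processing cycle: T_P = n_F * T_F.
def fold_schedule(T_W, T_C, T_R, n_F):
    T_F = T_W + T_C + T_R
    folds = []
    for k in range(n_F):
        start = k * T_F
        folds.append({
            "fold": k,
            "write": (start, start + T_W),                # HRW on: charge capacitors
            "diffuse": (start + T_W, start + T_W + T_C),  # HRW off: diffusion settles
            "read": (start + T_W + T_C, start + T_F),     # HRW on: read outputs
        })
    return folds, n_F * T_F

# Example with arbitrary (assumed) durations and n_F = 3 as in Figure 6:
folds, T_P = fold_schedule(T_W=1.0, T_C=4.0, T_R=1.0, n_F=3)
# T_P = 3 * (1 + 4 + 1) = 18
```

The per-fold phase boundaries correspond directly to the HRW waveform described above: on while charging, off while diffusion happens, and on again while reading.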
The read/write control signal (HRW) is on for a time to charge the capacitors, off for the time it takes for diffusion to happen, and back on for the time it takes to read the outputs of the array. The input control signal (Hin) is high during a first convolutional layer of the hidden layers and low during other convolutional layers of the hidden layers. The weight-change control signal (Hw) controls the application of weights for each convolutional layer’s processing time. The weight-change control signal (Hw) can change between discrete levels. [0058] The described analog neural network circuit can be used in a wearable device. Figure 7 illustrates an example wearable device implementation of the described analog neural network circuit. Referring to Figure 7, the wearable device 700 can include a battery-less processing device 710 including an analog neural network circuit that receives, as input, sensor data from at least one sensor 720 and outputs a trigger signal; and a digital processing device 730 that operates on sensor data upon receiving the trigger signal. The digital processing device 730 can be or include a field programmable gate array (FPGA). [0059] The analog neural network circuit can be implemented as described above with respect to Figures 1-6. Advantageously, the battery-less processing device 710 can be fabricated in a complementary metal oxide semiconductor (CMOS) process. [0060] In some cases, the sensor(s) includes an electroencephalogram (EEG) helmet. An example implementation of such a case is described in the section entitled EXAMPLE IMPLEMENTATION. [0061] EXAMPLE – Convolutional Kernel Architecture [0062] There exists a choice of c(.), which is equivalent to convolving input array I with an arbitrary convolutional kernel K, i.e., finding the choices of c(x, y, t) for which the following equation is satisfied (where *
denotes the convolution operation):

∂I(x, y, t)/∂t = (K ∗ I)(x, y, t)

[0063] Here, I(x, y, t) is an image at time t and c(x, y, t) is the diffusion operator, with the evolution governed by the anisotropic diffusion equation ∂I/∂t = div(c(x, y, t)∇I) = c(x, y, t)ΔI + ∇c · ∇I.
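To make the diffusion–convolution equivalence concrete, the following numerical sketch applies one explicit 4-DoF (Perona–Malik-style) diffusion update over the interior of an array and compares it against convolution with a 3 × 3 kernel whose cardinal entries are λcn and whose center is 1 − λΣcn. The λ and conductance values are arbitrary illustration values chosen by us, not taken from the disclosure.

```python
import numpy as np

lam = 0.2
cN, cS, cE, cW = 0.3, 0.25, 0.4, 0.2   # example conductances (assumed values)

def diffusion_step(I):
    # One explicit anisotropic-diffusion update on the interior nodes:
    # I <- I + lam*(cN*gradN + cS*gradS + cE*gradE + cW*gradW)
    out = I.copy()
    for i in range(1, I.shape[0] - 1):
        for j in range(1, I.shape[1] - 1):
            out[i, j] = I[i, j] + lam * (cN * (I[i - 1, j] - I[i, j])
                                         + cS * (I[i + 1, j] - I[i, j])
                                         + cE * (I[i, j + 1] - I[i, j])
                                         + cW * (I[i, j - 1] - I[i, j]))
    return out

# Equivalent 3x3 kernel: cardinal entries lam*c_n, corners 0 (4-DoF),
# center 1 - lam*(cN + cS + cE + cW).
K = lam * np.array([[0.0, cN, 0.0],
                    [cW, 0.0, cE],
                    [0.0, cS, 0.0]])
K[1, 1] = 1.0 - lam * (cN + cS + cE + cW)

def conv_interior(I, K):
    # Direct 3x3 convolution on interior nodes (no flipping; K is as laid out).
    out = I.copy()
    for i in range(1, I.shape[0] - 1):
        for j in range(1, I.shape[1] - 1):
            out[i, j] = np.sum(K * I[i - 1:i + 2, j - 1:j + 2])
    return out

rng = np.random.default_rng(0)
I = rng.random((5, 5))
assert np.allclose(diffusion_step(I), conv_interior(I, K))
```

The assertion holds by construction, which is exactly the kernel–conductance correspondence the following paragraphs derive for the CvPU lattice.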
[0064] This is shown by looking at (i, j)-th input, corresponding to voltage vij, as shown in Figures 1 and 2A-2F. As described above, adjacent inputs in the array are connected as the summation of adjacent inputs, where conductances are defined according to cardinal directions and the conductance elements are alternated along their respective directions in the array. [0065] Based on the illustrated architecture, the discrete solution for the evolution of Ii,j is ,
Ii,j(t+1) = Ii,j(t) + λ[cN∇NI + cS∇SI + cE∇EI + cW∇WI]i,j,

where ∇nI denotes the nearest-neighbor difference of I in cardinal direction n ∈ {N, S, E, W}, and where Ii,j corresponds to the voltage
vi,j in practice. The conductances c are controllable and directly depend on the I-V curves of the conductance elements. [0068] Since λ is a constant for a given choice of conductances (and λ ∈ [0.5, 1] for an implementable
convolutional filter), the above equation is equivalent to the convolution with a 3 × 3 kernel with 8 DoF. For example, let K be a 3 × 3 kernel with indices V = {(p, q) | 1 ≤ p ≤ 3 ∧ 1 ≤ q ≤ 3}. For a pixel Ii,j, the convolution can be expressed as

(K ∗ I)i,j = ∑(p,q)∈V Kp,q Ii+p−2,j+q−2.

[0071] Matching terms with the discrete diffusion update maps the off-center kernel entries directly to the conductances, [0072] e.g., K1,2 = λcN, K2,1 = λcW, K2,3 = λcE, and K3,2 = λcS, with the center entry K2,2 = 1 − λ∑n∈N(i,j) cn, and [0073] the corner entries K1,1, K1,3, K3,1, and K3,3 corresponding to diagonal connections that, in the square lattice, are non-existent, culminating in a 4-DoF kernel. Missing connections can be implemented
by extending the current square-shaped lattice to an octagonal lattice (8 neighboring connections) for an 8-DoF convolutional kernel implementation. The conductance elements are controllable by voltage at runtime when implemented, for example, using Mead’s Horizontal Resistor circuit (HRES), also referred to as Mead’s saturating resistor. [0074] It is desirable that the capacitors at principal nodes Ii,j discharge slower than the mediator nodes, whose initial capacitor voltages are the summation of the two neighboring nodes. In experiments, such as provided in the example implementations described below, it was found that a capacitance ratio of 10 achieved the desired discharge timing. For example, 10μF for principal nodes and 100μF for the mediator nodes enabled the capacitors at the principal nodes to discharge sufficiently slower than the capacitors at the mediator nodes. Since operation of a CvPU is equivalent to convolution with the conductance elements cn∀n ∈ N(i, j), during implementation, the conductance values can be chosen accordingly on-the-fly based on the desired kernel to convolve with. In addition, the conductance elements can be implemented using a voltage-controlled resistor, e.g., using HRES resistor element Mead’s saturating resistor). [0075] EXAMPLE IMPLEMENTATION [0076] Figure 8 is a concept visualization for ultra-low-power hybrid analog-digital processing via multiple nanosensors forming an implantable nanonetwork. EEG data is processed in-situ at multiple nanonodes and subsequent post-processing is done at gateways. The communication between nanonodes and gateways is achieved by terahertz band (0.1–10 THz) electromagnetic communication. In the illustrated concept, a two-tier wireless sensor network architecture is presented. The lower tier consists of multiple EEG nanosensors implanted in the head/on a helmet to sense and process EEG signals. The upper tier includes Docket No. 
RTG-013XC1PCT fewer gateways, such as mobile phones or medical devices, that collect data/pre-processed decisions from the nanosensor nodes and forward them to the cloud or personal devices that compile them into meaningful information. [0077] Electroencephalogram (EEG) involves sensing non-invasive electrical signals on the scalp. It has been used for tasks as diverse as motor-imagery inference, emotion recognition, and speech comprehension, as well as preliminary diagnosis for neurobiological conditions such as cognitive impairment, Parkinson’s, schizophrenia, and dementia. Thus, it can be helpful to be able to provide in-situ processing of EEG signals to alert users of potential issues. [0078] Figure 9 is an example implementation of an analog neural network that can be fit into an EEG nanosensor such as described with respect to Figure 8. The scheme illustrated in Figure 9 provides for co-designed analog spatio-temporal processing of EEG signals. The spatial array of signals is gradually delayed to offer insight into the temporal behavior of the instantaneous signal. This spatio-temporal picture is then processed using multiple layers of analog convolutional layers until a decision is reached in analog. An EEG Helmet is shown for validation purposes on available datasets, but the approach is trivially extendable to CvPU- based processing on individual implanted nanosensors. [0079] For modeling the temporal behavior, convolutional connections work well as they model the relation between future and past time-steps as well as adjacent channels. Mathematically, given an analog output A(t), the corresponding time-delayed signal is given by Aτ(t) = A(t െ τ ). Furthermore, as we have multiple delay elements with delays τ1, τ2, . . . , τM with M being the total number of delay-elements, the set of temporal inputs to the convolutional array becomes{A(t – ∑ki=1 τi)|k = 1, 2, . . . ,M}. 
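The delay-line construction just described — temporal taps {A(t − ∑i=1..k τi) | k = 1, …, M} across N channels — can be sketched in the sample domain as follows. This is an illustrative sketch; the function name, the uniform unit delays, and the random signals are our assumptions, not from the disclosure.

```python
import numpy as np

def spatio_temporal_window(signals, delays_samples, t):
    # signals: (N, T) array of sampled channel signals.
    # delays_samples: length-M list of per-stage delays (in samples).
    # Row k of the output holds A(t - sum_{i<=k} tau_i) for all N channels.
    offsets = np.cumsum(delays_samples)            # cumulative delays tau_1..tau_k
    rows = [signals[:, t - d] for d in offsets]    # one delayed snapshot per tap
    return np.stack(rows)                          # shape (M, N)

rng = np.random.default_rng(1)
sig = rng.random((19, 1000))                       # N = 19 EEG channels (toy data)
win = spatio_temporal_window(sig, [1] * 300, t=500)  # M = 300 unit delays
# win.shape == (300, 19), matching the 300 x 19 input size chosen in the text
```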
The spatial resolution, on the other hand, depends on the number of channels we have available at our disposal, denoted by N. All in all, the first input to the analog array is M × N, where M denotes the number of temporal components (proportional to the length of the time-window), and N denotes the number of spatial components. For this work, we choose M × N = 300 × 19. This can be seen clearly in Figure 9, where multiple delay-line elements give rise to an array with channels and time as its dimensions. This array is then input into successive 4-DoF convolutional layers with 25, 50, 100, and 150 convolutional filters, respectively. Here, 4-DoF CvPU is used for simplicity in simulation and implementation, but the concept is still valid for 8-DoF CvPU. In the analog domain, the pooling is achieved by simply using analog voltage adders in between the successive convolutional layers, making them equivalent to average pooling. Furthermore, Docket No. RTG-013XC1PCT the ReLU non-linearities are implemented using a Diode Pair (DP) architecture after each layer. At last, a dense layer (implemented using the Resistive Processing Unit (RPU)) maps the input from the filters into a vector of length 2, to which softmax is applied to determine the final decision from the network. [0080] For training, given that we have a training dataset of input signal windows and labels (d, y) ∈ (D, Y), where each sample belongs to one of K classes (Y = 1, 2, . . . , K). Our objective is to determine a function f(D) : D → Y that maps each input d to a label y. To train our model, we use the parameterized function f(D, θ), where θ are the learned parameters obtained by minimizing the training objective function: θ = arg minθ LCE(y, f(D, θ)). Here, LCE represents the Cross-Entropy loss, which is applied to the outputs of the ensemble with respect to the ground truth labels. 
Mathematically, LCE can be expressed as: LCE = −∑k=1…K I(k = yi) log σ(Oe)k, where Oe = (1/Ne)∑k=1…Ne Ok denotes the combined logits produced by the ensemble, Ok denotes the logits produced by an individual sub-network, I is the indicator function, and σ is the SoftMax operation given by σ(z)j = exp(zj)/∑k=1…K exp(zk). To initialize the network weights, we use zero-mean Gaussian distributions with a standard deviation of 0.01 and set the biases to 0.
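A minimal numerical sketch of this ensemble loss follows: combined logits as the mean of sub-network logits, cross-entropy of the softmax at the ground-truth class. Function names and the example logits are ours, for illustration only.

```python
import numpy as np

def softmax(z):
    # Numerically stabilized softmax: sigma(z)_j = exp(z_j) / sum_k exp(z_k).
    e = np.exp(z - z.max())
    return e / e.sum()

def ensemble_cross_entropy(logits_per_subnet, y):
    # O_e = (1/N_e) * sum_k O_k, then CE at the ground-truth class y.
    O_e = np.mean(logits_per_subnet, axis=0)
    return -np.log(softmax(O_e)[y])

logits = np.array([[2.0, 0.5],     # sub-network 1 logits (K = 2 classes)
                   [1.5, 1.0]])    # sub-network 2 logits
loss = ensemble_cross_entropy(logits, y=0)
# O_e = [1.75, 0.75]; loss = -ln(sigmoid(1.0)) ~= 0.3133
```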
The 3×3 kernel weights are constrained to values explained with respect to the section entitled “EXAMPLE – Convolutional Kernel Architecture”. We train the network for 400 epochs with a starting learning rate of 0.001, which is divided by 10 at 50% and 75% of the total number of epochs. We also apply a parameter decay of 0.0005 on the weights and biases. Our implementation is based on the PyTorch library, and we train the network using the ADAM optimizer with a batch size of 32. [0081] The scheme illustrated in Figure 9 can be integrated into an electromagnetic nanonetwork. [0082] Figure 10 shows a conceptual design of hybrid analog-digital architecture for continuous and energy-efficient seizure detection on nanosensors. The hybrid architecture can communicate triggers/sensor data to micro-gateways. Here, the analog inference is designed to have a very low False Negative (FN) rate to not miss any probable event, while any False Positives (FP) are further analyzed by the digital system (woken up only when needed, i.e., just in time). [0083] Referring to Figure 10, the hybrid analog-digital HW/SW co-designed system provides numerous advantages for EEG seizure detection. The analog part of the system (CvPU) acts as a pre-stage only, using sensors to collect EEG data and filter out irrelevant information. This stage is energy-efficient and less accurate, but is useful in identifying interesting cases that require further analysis. The digital part of the system includes an FPGA that processes the EEG data in greater detail, providing more accurate results. Since this stage is less energy-efficient, to conserve energy, the digital stage is triggered only when the analog system’s output goes high, indicating a positive indication of an EEG seizure. In the example implementation, the digital side is implemented using Seizure-Net (U. Asif, S. Roy, J. Tang, and S. 
Harrer, “SeizureNet: Multi-spectral deep feature learning for seizure type classification,” in Proc. Mach. Learn. Clin. Neuroimaging Radiogenomics Neuro-Oncol., Lima, Peru, Oct. 2020, pp. 77–87) with Multi-Spectral Feature Sampling. In the example implementation, the trigger/sensor data can be communicated to gateways via antennas composed of graphene, (e.g., carbon nanotubes (CNTs) when an anomaly is detected by an individual nanosensor. Furthermore, a supercapacitor can be used as an energy source, coupled with piezoelectric energy harvesting for battery-less operation. [0084] PERFORMANCE EVALUATION OF EXAMPLE IMPLEMENTATION [0085] A. Experimental Setup [0086] 1) EEG Channels Used: [0087] There is an internationally recognized system for placing EEG sensors on the scalp, known as the 10-20 system. In this system, electrodes are placed at specific locations on the scalp relative to the landmarks on the skull. The system divides the scalp into regions named according to their position and laterality relative to the midline. The electrodes are labeled with letters and numbers. For example, electrodes placed on the midline of the forehead are labeled Fz, while those on the left and right sides of the forehead are labeled F3 and F4, respectively. Similarly, electrodes placed on the left and right sides of the temporal region are labeled T3 and T4, respectively. The system also includes electrodes placed on the mastoid processes behind the ears (M1 and M2) and on the back of the head (O1 and O2) to serve as reference and ground electrodes, respectively. Overall, the placement of EEG sensors follows a standardized system to ensure that recordings can be compared across studies and that results are consistent and reliable. For the experiments, a total of 19 channels were used, including channels FP1, FP2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T3, T4, T5, T6, CZ, A1, and A2. 
The inputs to the experiments were the outputs subtracted from the reference average of all EEG channels. [0088] 2) Dataset: [0089] The TUH EEG Seizure Corpus (TUH-EEGSC) was utilized as the source of data for the study. It is the largest publicly available dataset of seizure recordings with type annotations worldwide. The dataset was released in three versions, with TUH-EEGSC v1.4.0 Docket No. RTG-013XC1PCT being released in October 2018, TUH-EEGSC v1.5.2 being released in May 2020, and v2.0.0 being released in March 2022. The TUHEEGSC v2.0.0 was used for results presented herein. This corpus has EEG signals that have been manually annotated data for seizure events (start time, stop, channel, and seizure type). Table I provides an overview of TUH-EEGSC’s statistics regarding various seizure types and patient numbers. [0090] Table 1 [0091] To ensure statistical significance, Myoclonic (MC) seizures were excluded from the study since they had only two seizures, as indicated in Table I. Seizure-level cross- validation was performed for evaluations. For training & evaluation, the dataset was sampled of 250 Hz, with a stride of 100, yielding the input-size of 300 × 19. [0092] 3) Simulation Setup: [0093] In the study, digital and analog simulations were conducted using different tools and software. For digital simulations, Python (version 3.10.10) was used. For analog simulations, LTspice (version XVII) and ngspice (version 36)– both based on the SPICE3 simulator published by the University of California, Berkeley – were used. [0094] B. Analog Convolution [0095] 1) SPICE Simulation of Anisotropic Convolution: [0096] Figure 11 shows simulations of 3 × 3 and 4 × 4 anisotropic diffusion arrays in SPICE. These arrays were simulated with capacitors of 50μF, and the conductances cN, cE, cS, and cW represented by resistances of 20kΩ, 10kΩ, 15kΩ, and 5kΩ respectively. 
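A quick check of these simulation values: each node capacitor sees the four branch resistances in parallel, so the time constant follows directly from the listed components. (The derivation itself is a sketch by us; the component values are from the text.)

```python
# Time constant for the simulated arrays: effective parallel resistance of
# the four branch resistances (cN, cE, cS, cW), times the node capacitance.
R = [20e3, 10e3, 15e3, 5e3]        # ohms, as in the SPICE simulation
C = 50e-6                          # 50 uF node capacitors
R_eff = 1 / sum(1 / r for r in R)  # 2400 ohm
tau = R_eff * C                    # 2400 * 50e-6 = 0.12 s
```

This reproduces the τ = 0.12 s quoted for the simulated arrays.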
All arrays are normalized to be in the range [0, 1], and τ represents the time constant for the arrays (simulated using ngspice).Using the effective parallel resistance that each capacitor sees, and the capacitance (same for all C), we derive the time-constant for the arrays to be τ = 0.12s. Looking at the first row of the input kernel, we see that the analog and correct outputs are very close for the 3×3 array, with mean-squared error as low as 0.08%, and the capacitors’ voltages Docket No. RTG-013XC1PCT to become steady around 4 τ . We see a similar trend for the 4 × 4 array in the third row, where we see that the error is < 1% again, and the final voltages show a very smooth trend towards fixed values. For the second row, however, we observe a higher error-rate (∼ 4%). One salient difference here is that the input array is variable, but in spite of higher error, we see that the general structure of the output is preserved. Thus, the overall trend is clear that anisotropic diffusion-based convolution is possible using circuit elements in SPICE as well. [0097] 2) SPICE Simulations for Larger Arrays: [0098] Figure 12 shows Mean Squared Error (MSE) for different sized arrays as simulated in SPICE. As can be seen in the plots of Figure 12, the MSE, which measures the gap between expected and actual results, first decreases, hitting its lowest at 4τ, and then gradually rises due to voltage loss in the circuit. Figure 12(a) illustrates the behavior for a 3× 3 array, Figure 12(b) for a 5 × 5 array, and Figure 12(c) for a more extensive 20 × 20 array. Each depicted trend is averaged from 50 random input scenarios to ensure a broad understanding of the MSE patterns. [0099] 3) SPICE Simulations for Multi-layer Convolution: [0100] Figure 13 shows plots of Peak Signal to Noise Ratio (PSNR) as the number of layers varies for different sizes of CvPU arrays stacked on top of each other. PSNR is proportional to the log of the inverse of Mean Squared Error (MSE). 
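The stated PSNR–MSE relationship can be written out explicitly for signals normalized to [0, 1], the normalization used for the arrays above (the helper name is ours):

```python
import math

def psnr_db(mse, peak=1.0):
    # PSNR in dB: 10 * log10(peak^2 / MSE), i.e., proportional to the log
    # of the inverse of the mean-squared error, as stated in the text.
    return 10 * math.log10(peak ** 2 / mse)

# Each 10x reduction in MSE corresponds to a 10 dB gain in PSNR,
# e.g., psnr_db(0.01) ~= 20 dB and psnr_db(0.001) ~= 30 dB.
```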
A swift decline in PSNR is seen when no activation functions are present, challenging the viability of multi-layer analog computations for larger networks. However, with the inclusion of nonlinear activation functions like the Sigmoid, the performance remains stable despite increasing layers. [0101] C. End-to-End Evaluation [0102] An evaluation of the performance of the proposed analog CvPU architecture was conducted against other approaches from the literature, including efficiency with respect to power and metrics such as accuracy and false positives. [0103] Figure 14 shows the confusion matrix for the analog convolutional neural network simulated using anisotropic diffusion-based 4-DoF convolution as described herein. In particular, Figure 14(a) shows a confusion matrix for detection using the analog neural network (using CvPU arrays), Figure 14(b) shows a confusion matrix for classification using a power-intensive digital network, and Figure 14(c) shows the performance of the end-to-end joint analog-digital system which uses the analog network as a continuous detector (simulated using python). Docket No. RTG-013XC1PCT [0104] Although simulated in python, additive random Gaussian noise is added to the analog convolution to induce a target mean-squared error in it. This is done to make sure that the errors due to convolution are also included in this simulation. However, as shown in Figure 13, the non-linearities help greatly in the non-deterioration of performance in consecutive layers even in the presence of imperfect analog computation. It can be seen that the network does quite well at detecting the onset of seizures in given windows with an accuracy of 78.5% with a false-negative rate of only 16% (see Figure 14(a)). Furthermore, the Seizure-Net trained on the TUH seizure corpus attains an accuracy of 86.7% using the resource-intensive digital implementation of the convolutional network. 
However, as shown in Figure 14(c), we note that the final system outperforms the digital system as the analog part helps it in filtering out a lot of the data, achieving an accuracy of 88% as a whole. This is not all, as using an energy- efficient analog filter helps reduce power consumption by many orders of magnitude. We can estimate the energy savings by considering that almost 99% of the TUH seizure corpus data is non-seizure and assuming a similar ratio holds in real-life deployments. If the analog detector outputs (correctly or incorrectly) ∼ 73% of the time that the data is non-seizure (background EEG), the digital part is then woken up only 27% of the times, elongating the lifetime of the device by ∼ 4×, i.e., from 3 to 10 hours to 12 to 40 hours, providing enough time for a natural charging/recharging schedule, or even being able to harvest the said power from piezoelectric harvesters. [0105] D. Scalability, Power & Noise Analysis [0106] 1) Space Scalability: [0107] Using the enhanced CvPU architecture as the building block of convolutional layers, we can calculate the number of resistors and capacitors one needs to build a convolutional layer with the input size of M ×N. For a 4-DoF 3×3 kernel and a convolutional layer with an output size of (M + 2) × (N + 2), the number of resistors comes out to be (2M + 2)(2N + 3) + (2N + 2)(2M + 3) while the number of capacitors used comes out to be (2N + 3)(M + 2)+(M+1)(N +2). Both of these numbers are linear in area (MN), which means that the simple circuit scales linearly as the input size is increased. For an input size of 300 × 19 (same as the EEG dataset), the number of resistors and capacitors required for one layer comes out to be 48802 and 18703, respectively. Using a 5 nm process, which features around 130–230 million transistors per square millimeter, all the resistors (implemented using 10 transistors) can be furnished in a few μm2. 
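The component-count formulas above can be checked directly for the 300 × 19 EEG input size (the function name is ours; the formulas and quoted totals are from the text):

```python
def component_counts(M, N):
    # Resistor and capacitor counts for a 4-DoF 3x3 convolutional layer
    # with input size M x N, per the formulas in the text.
    resistors = (2 * M + 2) * (2 * N + 3) + (2 * N + 2) * (2 * M + 3)
    capacitors = (2 * N + 3) * (M + 2) + (M + 1) * (N + 2)
    return resistors, capacitors

r, c = component_counts(300, 19)
# r == 48802 and c == 18703, matching the figures quoted for one layer,
# and both counts grow linearly in the area M*N.
```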
Furthermore, the above-mentioned silicon process also offers a capacitance density of around 300 fF/μm2, meaning that we could implement the full capacitor Docket No. RTG-013XC1PCT array using an area of a few mm2. However, since individual nanosensors do not need full capacitor array to be implemented as they will not be processing all 19 leads of EEG, a smaller array can be furnished in a space of the order of μm2, making the CvPU feasible for a nanosensor-based application. A few layers of this circuit, hence, may be incorporated into a nano-implantable device that could do in-situ processing and continuous monitoring. We have, unfortunately, not realized a prototype yet and are working actively towards it. [0108] 2) Power Scalability: [0109] As anisotropic diffusion is essentially a rearrangement of voltage in the capacitors in a particular fashion that mimics convolution, the overall power consumption comes only from the leakage currents in capacitors, currents through resistors during diffusion, and adders. For our simulated 3 × 3 and 4 × 4 arrays in SPICE, we observed the current and voltage curves of transient simulations for all individual elements. By multiplying both to get power consumption and then adding for all components, we concluded that the 3×3 and 4×4 arrays consume about 130.8μW and 221.6μW, respectively. An adder only consumes 0.9μW of power, and there are 12 and 24 adders used in 3 × 3 and 4 × 4 arrays, respectively, making a total of 10.8μW and 21.6μW respectively. Furthermore, for a 300×19 array (with all 19 EEG channels in conjunction), the power consumption is estimated to be around 55.58mW. A small nanosensor, however, does not process all the EEG channels in conjunction, meaning the power consumption will be of the order of a few hundred μW for a small-enough array, and it could be powered using piezoelectric energy harvesters. 
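The adder-power arithmetic above can be tallied directly (variable names are ours; the 0.9 μW per-adder figure and adder counts are from the text):

```python
# Adder power budget for the simulated arrays: 0.9 uW per analog adder,
# with 12 adders in the 3x3 array and 24 in the 4x4 array.
ADDER_UW = 0.9
total_3x3_uW = 12 * ADDER_UW   # ~10.8 uW
total_4x4_uW = 24 * ADDER_UW   # ~21.6 uW

# For scale, the estimated full 300x19 layer draw quoted in the text:
full_layer_mW = 55.58
```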
A detailed model for energy consumption and harvesting, however, is out-of-scope for this article and will be investigated in future studies. On the other hand, efficient FPGAs capable of executing neural networks usually consume from 0.5 (TinyFPGA BX)–10 (Xilinx Ultra96v2) Watts, which is 1 to 3 orders of magnitude greater than the analog power consumption. This is why analog computation is invaluable for energy-efficient continuous monitoring. It is important to note here that the power might also depend further on the capacitance as it dictates how quickly the charge accumulates on the capacitor, dictating how energy-efficient it is (with a lower time-constant). Therefore, more analysis is needed for determining energy consumption with respect to the frequency and the operating point of the circuit. [0110] 3) Noise Analysis: [0111] Figure 15 provides a noise analysis for 3×3 and 4 × 4 arrays for temperatures ranging from െ25°C to 125°C, with a step of 25°C. Referring to Figure 15, it can be seen that the noise is comparatively higher (although still in the order of nano-volts) at lower frequencies Docket No. RTG-013XC1PCT (0-10 Hz). However, the operating point for EEG signal processing is around 250 Hz, meaning that the operation of the proposed circuit is resilient against noise introduced from external sources. [0112] Example implementations can include the following. [0113] Clause 1. An analog neural network circuit comprising: a convolutional layer comprising at least one convolutional processing unit (CvPU), each CvPU comprising: a center node capacitor; a set of neighbor node capacitors; a plurality of conductance elements between the center node capacitor and the set of neighbor node capacitors; and a plurality of analog adder circuits, wherein the center node capacitor and a neighbor node capacitor of the set of neighbor node capacitors are connected as inputs to an analog adder circuit of the plurality of analog adder circuits. [0114] Clause 2. 
The analog neural network circuit of clause 1, further comprising: at least one additional convolutional layer, each additional convolution layer comprising at least one CvPU. [0115] Clause 3. The analog neural network circuit of clause 2, further comprising: a pooling filter between adjacent convolutional layers, each pooling filter formed of an analog adder. [0116] Clause 4. The analog neural network circuit of clauses 2 or 3, further comprising: a diode-based activation function after each convolutional layer. [0117] Clause 5. The analog neural network circuit of any of clauses 2-4, further comprising: a dense layer after the convolutional layers, the dense layer comprising an array of resistive processing units (RPUs). [0118] Clause 6. The analog neural network circuit of clause 5, wherein each RPU comprises: a first PMOS transistor coupled to receive a weight at its gate; a first NMOS transistor coupled to receive the weight at its gate and coupled by its drain to a drain of the first PMOS transistor; a first capacitor coupled at a first end to the drains of the first NMOS transistor and the first PMOS transistor; a read PMOS transistor coupled at its gate to the first end of the first capacitor; a load at a drain of the read PMOS transistor; and a high pass filter at the drain of the read PMOS transistor. [0119] Clause 7. The analog neural network circuit of clause 1, wherein the neighbor node capacitor of the set of neighbor node capacitors is connected to the center node capacitor by two conductance elements of the plurality of conductance elements and the analog adder circuit of the plurality of analog adder circuits, wherein the two conductance elements are disposed in series between the center node capacitor and the neighbor node capacitor, wherein Docket No. 
RTG-013XC1PCT the center node capacitor is connected as a first input to the analog adder circuit, the neighbor node capacitor is connected as a second input to the analog adder circuit, and an output of the analog adder circuit is connected to a node between the two conductance elements. [0120] Clause 8. The analog neural network circuit of clause 7, wherein the set of neighbor node capacitors comprises: a north nearest-neighbor node capacitor, a south nearest- neighbor node capacitor, an east nearest-neighbor node capacitor, and a west nearest-neighbor node capacitor. [0121] Clause 9. The analog neural network circuit of clause 8, wherein the set of neighbor node capacitors of each CvPU further comprises: a north-east nearest-neighbor node capacitor, a south-east nearest-neighbor node capacitor, a south-west nearest-neighbor node capacitor, and a north-west nearest-neighbor node capacitor. [0122] Clause 10. The analog neural network circuit of clause 1, wherein the set of neighbor node capacitors comprises a north nearest-neighbor node capacitor, a south nearest- neighbor node capacitor, an east nearest-neighbor node capacitor, a west nearest-neighbor node capacitor, a north-east nearest-neighbor node capacitor, a south-east nearest-neighbor node capacitor, a south-west nearest-neighbor node capacitor, and a north-west nearest-neighbor node capacitor, wherein the center node capacitor and the set of neighbor node capacitors are part of a first capacitor array forming a first CvPU layer; wherein a first set of conductance elements of the plurality of conductance elements are part of a first circuit layer forming a second CvPU layer, wherein the center node capacitor, the north nearest-neighbor node capacitor, the south nearest-neighbor node capacitor, the east nearest-neighbor node capacitor, and the west nearest-neighbor node capacitor are coupled to nodes of the first circuit layer; wherein a second set of conductance elements of the plurality of conductance 
elements are part of a second circuit layer forming a third CvPU layer, wherein the center node capacitor, the north-east nearest-neighbor node capacitor, and the south-west nearest-neighbor node capacitor are coupled to nodes of the second circuit layer; and wherein a third set of conductance elements of the plurality of conductance elements are part of a third circuit layer forming a fourth CvPU layer, wherein the center node capacitor, the north-west nearest-neighbor node capacitor, and the south-east nearest-neighbor node capacitor are coupled to nodes of the third circuit layer.

[0123] Clause 11. The analog neural network circuit of clause 10, further comprising a second capacitor array forming a fifth CvPU layer, the second capacitor array comprising a second set of capacitors coupled to nodes of the second circuit layer and the third circuit layer.

[0124] Clause 12. The analog neural network circuit of any preceding clause, further comprising: a delay stage comprising multiple delay elements to generate temporal inputs for each input channel to the convolutional layer.

[0125] Clause 13. The analog neural network circuit of any preceding clause, wherein each capacitor of the center node capacitor and set of neighbor node capacitors receives a voltage corresponding to an input at a particular time.

[0126] Clause 14.
The analog neural network circuit of clause 1, further comprising: a control circuit for providing timing signals to control signal paths, including a feedback signal path to reuse the at least one CvPU of the convolutional layer for at least two cycles, wherein the control circuit generates an input control signal, an output control signal, a read/write signal, and a weight-change control signal, wherein the input control signal and the output control signal have a period equal to a number of convolutional layers implemented by the at least one CvPU of the convolutional layer and a pulse length of an amount of time taken to process a single convolutional layer.

[0127] Clause 15. A wearable device comprising: a battery-less processing device comprising an analog neural network circuit that receives, as input, sensor data from at least one sensor and outputs a trigger signal, wherein the analog neural network circuit comprises: a convolutional layer comprising at least one convolutional processing unit (CvPU), each CvPU comprising: a center node capacitor; a set of neighbor node capacitors; a plurality of conductance elements between the center node capacitor and the set of neighbor node capacitors; and a plurality of analog adder circuits, wherein the center node capacitor and a neighbor node capacitor of the set of neighbor node capacitors are connected as inputs to an analog adder circuit of the plurality of analog adder circuits; and a digital processing device that operates on sensor data upon receiving the trigger signal.

[0128] Clause 16. The wearable device of clause 15, wherein the at least one sensor comprises an electroencephalogram (EEG) helmet.

[0129] Clause 17. The wearable device of clause 15 or 16, wherein the digital processing device comprises a field programmable gate array (FPGA).

[0130] Clause 18.
The wearable device of any of clauses 15-17, wherein the battery-less processing device further comprises: a delay stage comprising multiple delay elements to generate temporal inputs for each input channel to the convolutional layer.

[0131] Clause 19. The wearable device of any of clauses 15-18, wherein each capacitor of the center node capacitor and set of neighbor node capacitors receives a voltage corresponding to an input at a particular time.

[0132] Clause 20. The wearable device of any of clauses 15-19, wherein the battery-less processing device is fabricated in a complementary metal oxide semiconductor (CMOS) process.

[0133] Clause 21. A wearable device such as described in any of clauses 15-20, including an analog neural network circuit according to any of clauses 1-14.

[0134] Although the subject matter has been described in language specific to structural features and/or acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as examples of implementing the claims, and other equivalent features and acts are intended to be within the scope of the claims.
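In the CvPU described in the clauses above, convolution is carried out physically: charge diffuses between the center node capacitor and its neighbor node capacitors through the conductance elements, whose values set the directional (anisotropic) kernel weights, while the analog adders shape the exchange. The following Python sketch is an illustrative behavioral model only, not the patented circuit; the grid size, conductance values, and time constant are all hypothetical:

```python
import numpy as np

def cvpu_diffusion_step(v, g, dt_over_c=0.1):
    """One discrete-time anisotropic-diffusion update of grid node voltages.

    v: 2-D array of capacitor node voltages (one entry per node capacitor).
    g: per-direction conductances of the nearest-neighbor links; these play
       the role of the (hypothetical) anisotropic kernel weights.
    dt_over_c: time step divided by node capacitance (illustrative value).
    """
    dv = np.zeros_like(v)
    # Charge flows into each node from each neighbor in proportion to the
    # link conductance and the voltage difference across the link.
    dv[1:, :] += g["north"] * (v[:-1, :] - v[1:, :])    # from the node above
    dv[:-1, :] += g["south"] * (v[1:, :] - v[:-1, :])   # from the node below
    dv[:, 1:] += g["west"] * (v[:, :-1] - v[:, 1:])     # from the node to the left
    dv[:, :-1] += g["east"] * (v[:, 1:] - v[:, :-1])    # from the node to the right
    return v + dt_over_c * dv

# Hypothetical example: a unit of charge on the center capacitor of a 5x5
# grid spreads toward its four nearest neighbors in one step.
v0 = np.zeros((5, 5))
v0[2, 2] = 1.0
g = {"north": 1.0, "south": 1.0, "east": 1.0, "west": 1.0}
v1 = cvpu_diffusion_step(v0, g)
```

With uniform conductances the step reduces to isotropic smoothing; making, say, `g["north"]` differ from `g["east"]` biases diffusion along one axis, which is the anisotropy the title refers to.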

Claims

Docket No. RTG-013XC1PCT

CLAIMS

What is claimed is:

1. An analog neural network circuit comprising: a convolutional layer comprising at least one convolutional processing unit (CvPU), each CvPU comprising: a center node capacitor; a set of neighbor node capacitors; a plurality of conductance elements between the center node capacitor and the set of neighbor node capacitors; and a plurality of analog adder circuits, wherein the center node capacitor and a neighbor node capacitor of the set of neighbor node capacitors are connected as inputs to an analog adder circuit of the plurality of analog adder circuits.

2. The analog neural network circuit of claim 1, further comprising: at least one additional convolutional layer, each additional convolutional layer comprising at least one CvPU.

3. The analog neural network circuit of claim 2, further comprising: a pooling filter between adjacent convolutional layers, each pooling filter formed of an analog adder.

4. The analog neural network circuit of claim 2, further comprising: a diode-based activation function after each convolutional layer.

5. The analog neural network circuit of claim 2, further comprising: a dense layer after the convolutional layers, the dense layer comprising an array of resistive processing units (RPUs).

6. The analog neural network circuit of claim 5, wherein each RPU comprises: a first PMOS transistor coupled to receive a weight at its gate; a first NMOS transistor coupled to receive the weight at its gate and coupled by its drain to a drain of the first PMOS transistor; a first capacitor coupled at a first end to the drains of the first NMOS transistor and the first PMOS transistor; a read PMOS transistor coupled at its gate to the first end of the first capacitor; a load at a drain of the read PMOS transistor; and a high pass filter at the drain of the read PMOS transistor.

7.
The analog neural network circuit of claim 1, wherein the neighbor node capacitor of the set of neighbor node capacitors is connected to the center node capacitor by two conductance elements of the plurality of conductance elements and the analog adder circuit of the plurality of analog adder circuits, wherein the two conductance elements are disposed in series between the center node capacitor and the neighbor node capacitor, wherein the center node capacitor is connected as a first input to the analog adder circuit, the neighbor node capacitor is connected as a second input to the analog adder circuit, and an output of the analog adder circuit is connected to a node between the two conductance elements.

8. The analog neural network circuit of claim 7, wherein the set of neighbor node capacitors comprises: a north nearest-neighbor node capacitor, a south nearest-neighbor node capacitor, an east nearest-neighbor node capacitor, and a west nearest-neighbor node capacitor.

9. The analog neural network circuit of claim 8, wherein the set of neighbor node capacitors of each CvPU further comprises: a north-east nearest-neighbor node capacitor, a south-east nearest-neighbor node capacitor, a south-west nearest-neighbor node capacitor, and a north-west nearest-neighbor node capacitor.

10. The analog neural network circuit of claim 1, wherein the set of neighbor node capacitors comprises a north nearest-neighbor node capacitor, a south nearest-neighbor node capacitor, an east nearest-neighbor node capacitor, a west nearest-neighbor node capacitor, a north-east nearest-neighbor node capacitor, a south-east nearest-neighbor node capacitor, a south-west nearest-neighbor node capacitor, and a north-west nearest-neighbor node capacitor, wherein the center node capacitor and the set of neighbor node capacitors are part of a first capacitor array forming a first CvPU layer; wherein a first set of conductance elements of the plurality of conductance elements are part of a first circuit layer forming a second CvPU layer, wherein the center node capacitor, the north nearest-neighbor node capacitor, the south nearest-neighbor node capacitor, the east nearest-neighbor node capacitor, and the west nearest-neighbor node capacitor are coupled to nodes of the first circuit layer; wherein a second set of conductance elements of the plurality of conductance elements are part of a second circuit layer forming a third CvPU layer, wherein the center node capacitor, the north-east nearest-neighbor node capacitor, and the south-west nearest-neighbor node capacitor are coupled to nodes of the second circuit layer; and wherein a third set of conductance elements of the plurality of conductance elements are part of a third circuit layer forming a fourth CvPU layer, wherein the center node capacitor, the north-west nearest-neighbor node capacitor, and the south-east nearest-neighbor node capacitor are coupled to nodes of the third circuit layer.

11. The analog neural network circuit of claim 10, further comprising a second capacitor array forming a fifth CvPU layer, the second capacitor array comprising a second set of capacitors coupled to nodes of the second circuit layer and the third circuit layer.

12. The analog neural network circuit of claim 1, further comprising: a delay stage comprising multiple delay elements to generate temporal inputs for each input channel to the convolutional layer.

13. The analog neural network circuit of claim 1, wherein each capacitor of the center node capacitor and set of neighbor node capacitors receives a voltage corresponding to an input at a particular time.

14.
The analog neural network circuit of claim 1, further comprising: a control circuit for providing timing signals to control signal paths, including a feedback signal path to reuse the at least one CvPU of the convolutional layer for at least two cycles, wherein the control circuit generates an input control signal, an output control signal, a read/write signal, and a weight-change control signal, wherein the input control signal and the output control signal have a period equal to a number of convolutional layers implemented by the at least one CvPU of the convolutional layer and a pulse length of an amount of time taken to process a single convolutional layer.

15. A wearable device comprising: a battery-less processing device comprising an analog neural network circuit that receives, as input, sensor data from at least one sensor and outputs a trigger signal, wherein the analog neural network circuit comprises: a convolutional layer comprising at least one convolutional processing unit (CvPU), each CvPU comprising: a center node capacitor; a set of neighbor node capacitors; a plurality of conductance elements between the center node capacitor and the set of neighbor node capacitors; and a plurality of analog adder circuits, wherein the center node capacitor and a neighbor node capacitor of the set of neighbor node capacitors are connected as inputs to an analog adder circuit of the plurality of analog adder circuits; and a digital processing device that operates on sensor data upon receiving the trigger signal.

16. The wearable device of claim 15, wherein the at least one sensor comprises an electroencephalogram (EEG) helmet.

17. The wearable device of claim 15, wherein the digital processing device comprises a field programmable gate array (FPGA).

18.
The wearable device of claim 15, wherein the battery-less processing device further comprises: a delay stage comprising multiple delay elements to generate temporal inputs for each input channel to the convolutional layer.

19. The wearable device of claim 15, wherein each capacitor of the center node capacitor and set of neighbor node capacitors receives a voltage corresponding to an input at a particular time.

20. The wearable device of claim 15, wherein the battery-less processing device is fabricated in a complementary metal oxide semiconductor (CMOS) process.
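Claim 14 (clause 14 above) recites a control circuit that time-multiplexes a single CvPU across several convolutional layers: the input and output control signals have a period of one full pass (the number of layers times the per-layer processing time) and a pulse length of one layer. As an illustrative sketch of that timing relationship only, not the patented control circuit, assuming a discrete model in which one step equals one layer-processing interval:

```python
def input_control(num_layers, step):
    """Illustrative layer-reuse control waveform (assumed behavior).

    High for one layer-processing interval out of every num_layers
    intervals: period = num_layers layers, pulse length = one layer.
    """
    return 1 if step % num_layers == 0 else 0

# With 3 layers implemented on one reused CvPU (hypothetical numbers),
# new input is admitted on steps 0, 3, 6, ... while the intermediate
# steps feed the CvPU's own output back through the feedback path.
waveform = [input_control(3, s) for s in range(7)]
```

The output control signal would follow the same period with its pulse aligned to the last layer of each pass; the read/write and weight-change signals (not modeled here) would toggle between passes to load the next layer's weights.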
PCT/US2024/047981, filed 2024-09-23 (priority date 2023-09-22): Anisotropic diffusion-based analog neural network architecture. Status: Pending. Published as WO2025064997A1 (en).

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363539894P 2023-09-22 2023-09-22
US63/539,894 2023-09-22

Publications (1)

Publication Number Publication Date
WO2025064997A1 true WO2025064997A1 (en) 2025-03-27

Family

ID=95072218


Country Status (1)

Country Link
WO (1) WO2025064997A1 (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200020393A1 (en) * 2018-07-11 2020-01-16 Sandisk Technologies Llc Neural network matrix multiplication in memory cells

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Byun, Sung-June, et al., "A Low-Power Analog Processor-in-Memory-Based Convolutional Neural Network for Biosensor Applications," Sensors, vol. 22, no. 12, 1 June 2022, CH, pp. 1-15, ISSN 1424-8220, DOI: 10.3390/s22124555 *
Lou, Qiuwen, et al., "A Mixed Signal Architecture for Convolutional Neural Networks," arXiv, 2 May 2019, pp. 1-26, XP093295489, retrieved from the Internet <URL:https://arxiv.org/pdf/1811.02636> [retrieved on 2024-12-16], DOI: 10.48550/arXiv.1811.02636 *

Similar Documents

Publication Publication Date Title
Zhang et al. A novel IoT-perceptive human activity recognition (HAR) approach using multihead convolutional attention
Zhang et al. An efficient and compact compressed sensing microsystem for implantable neural recordings
Suo et al. Energy-efficient multi-mode compressed sensing system for implantable neural recordings
Gholamiangonabadi et al. Personalized models for human activity recognition with wearable sensors: deep neural networks and signal processing
Balakrishnan et al. Multilayer perceptrons for the classification of brain computer interface data
Olokodana et al. Real-time automatic seizure detection using ordinary kriging method in an edge-IoMT computing paradigm
Omar et al. Enhancing EEG signals classification using LSTM‐CNN architecture
Sikder et al. Human action recognition based on a sequential deep learning model
Uzzaman et al. LRCN based human activity recognition from video data
Trotta et al. Edge human activity recognition using federated learning on constrained devices
Agrawal et al. On-chip implementation of ECoG signal data decoding in brain-computer interface
Chen et al. Smart phone based human activity recognition
Oman et al. Bcl: A branched cnn-lstm architecture for human activity recognition using smartphone sensors
Shiri et al. Recognition of student engagement and affective states using ConvNeXtlarge and ensemble GRU in E-learning
WO2025064997A1 (en) Anisotropic diffusion-based analog neural network architecture
Rojanavasu et al. Improving Inertial Sensor-based Human Activity Recognition using Ensemble Deep Learning
Mekruksavanich et al. Human activity recognition in logistics using wearable sensors and deep residual network
Kumari et al. Effect of reduced dimensionality on deep learning for human activity recognition
CN118021279A (en) Brain effect connection learning method based on space-time diagram multi-head attention
Chen et al. Temporal shift residual network for EEG-based emotion recognition: a 3D feature image sequence approach
Anjum et al. Anisotropic diffusion-based analog CNN architecture for continuous EEG monitoring
Kadar et al. Distracted driver behavior recognition using modified capsule networks
Shome RestHAR: residual feature learning transformer for human activity recognition from multi-sensor data
Anjum et al. Battery-Less Implantable Continuous EEG Monitoring via Anisotropic Diffusion
Sowmya et al. Human Behavior Classification using 2D–Convolutional Neural Network, VGG16 and ResNet50

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24869379

Country of ref document: EP

Kind code of ref document: A1