Disclosure of Invention
In view of the foregoing, and to at least partially solve the above technical problems, embodiments of the present invention provide a high-performance, low-power data processing method, together with a system, an apparatus, and a medium capable of implementing the method.
In one aspect, a technical solution of the present application provides a data processing method, including the following steps:
acquiring a sensor signal, and sending the sensor signal to a read bit line of an SRAM array, where the sensor signal is a voltage signal obtained by binarizing a pixel of an input feature map;
determining that the voltage on the read bit line has stabilized, and enabling a read word line of the SRAM array;
acquiring, via the read word line, a weight value, acquiring the sensor signal on the read bit line, multiplying the sensor signal by the weight value to obtain a first output voltage, and outputting the first output voltage to a shared bit line; the first output voltage is the voltage signal of a pixel of the output feature map;
and converting the first output voltage to obtain second output data, and obtaining a convolution operation result from the second output data.
In a possible embodiment of the present disclosure, the read bit line includes a first read bit line and a second read bit line, and the step of acquiring the weight value via the read word line, acquiring the sensor signal on the read bit line, multiplying the sensor signal by the weight value to obtain the first output voltage, and outputting the first output voltage to the shared bit line includes:
determining that the weight value is a first value, turning off the first transistor, turning on the second transistor, and sending the sensor signal on the first read bit line to the shared bit line;
determining that the weight value is a second value, turning on the first transistor, turning off the second transistor, and sending the sensor signal on the second read bit line to the shared bit line;
and determining that the weight value is a third value, turning off both the first transistor and the second transistor, and sending the sensor signals on the first read bit line and the second read bit line to the shared bit lines.
In a possible embodiment of the present disclosure, the step of converting the first output voltage to obtain second output data, and obtaining a convolution operation result from the second output data includes:
and subtracting the second output data obtained from the second shared bit line from the second output data obtained from the first shared bit line, to obtain the convolution operation result.
In a possible embodiment of the present disclosure, the data processing method further includes the following steps:
storing, on a first capacitor, the first output voltage obtained by the multiplication operation when its value is positive;
and storing, on a second capacitor, the first output voltage obtained by the multiplication operation when its value is negative.
In a possible embodiment of the present disclosure, the convolution operation result satisfies the following formula:

$$\mathrm{OUT} = \sum_{i=1}^{N} W_i \cdot V_i$$

where OUT is the result of the convolution operation, W_i is a weight value, V_i is the voltage value of the sensor signal, N is the number of SRAM arrays, i = 1, 2, 3, …, N, and N is a positive integer.
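As a brief numeric illustration of this formula (values are hypothetical, chosen only to show the arithmetic):

$$N = 3,\quad W = (1,\,-1,\,0),\quad V = (0.4,\,0.3,\,0.5) \;\Rightarrow\; \mathrm{OUT} = 1\cdot 0.4 + (-1)\cdot 0.3 + 0\cdot 0.5 = 0.1$$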
On the other hand, the technical solution of the present application further provides a data processing system, which includes:
the sensor storage array is used for acquiring a sensor signal and sending the sensor signal to a read bit line of the SRAM array; the sensor signal is a voltage signal obtained by binarizing a pixel of an input feature map;
the readout computing circuit is used for acquiring, via the read word line, a weight value, acquiring the sensor signal on the read bit line, multiplying the sensor signal by the weight value to obtain a first output voltage, and outputting the first output voltage to the shared bit line; the first output voltage is the voltage signal of a pixel of the output feature map;
and the analog-to-digital conversion circuit is used for converting the first output voltage to obtain second output data and obtaining a convolution operation result from the second output data.
In a possible embodiment of the present disclosure, the sensor storage array is a ternary SRAM; the ternary static random access memory includes a first ternary inverter and a second ternary inverter, where the output terminal of the first ternary inverter is connected to the input terminal of the second ternary inverter, and the output terminal of the second ternary inverter is connected to the input terminal of the first ternary inverter.
In a possible embodiment of the present disclosure, the first ternary inverter includes a thin gate NMOS transistor, a thick gate NMOS transistor, a thin gate PMOS transistor, and a thick gate PMOS transistor;
when the input terminal of the first ternary inverter receives a high-level signal, the thick-gate NMOS transistor and the thin-gate NMOS transistor are turned on, and the output terminal of the first ternary inverter outputs a low-level signal;
when the input terminal of the first ternary inverter receives a low-level signal, the thick-gate PMOS transistor and the thin-gate PMOS transistor are turned on, and the output terminal of the first ternary inverter outputs a high-level signal;
when the signal amplitude at the input terminal of the first ternary inverter is half of the high-level amplitude, the thin-gate NMOS transistor and the thin-gate PMOS transistor are turned on, and the signal amplitude at the output terminal of the first ternary inverter is half of the high-level amplitude;
The second ternary inverter has the same structure as the first ternary inverter.
On the other hand, the technical solution of the present application further provides a data processing apparatus, which includes:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one program causes the at least one processor to implement the data processing method according to the first aspect.
On the other hand, the technical solution of the present application further provides a storage medium storing a processor-executable program; when executed by a processor, the program performs the data processing method according to the first aspect.
Advantages and benefits of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention:
The technical solution of the present application is based on a sensor-oriented compute-in-memory convolution acceleration architecture. Data can be fed directly from the sensor without the usual chain of analog-to-digital and digital-to-analog conversion steps, which greatly reduces the hardware overhead, power consumption, and latency of data transfer. By adopting an SRAM capable of storing multiple weight values, the accuracy of the neural network algorithm is effectively improved while the latency and power consumption caused by data movement are reduced. The solution thus meets the neural network's demand for a hardware implementation architecture with reduced cost, including power consumption and hardware overhead, while providing high performance such as low latency and high bandwidth.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
In the intelligent era, science and technology represented by Artificial Intelligence (AI) have greatly advanced human society. In recent years, AI technology has developed rapidly and has been widely applied in the consumer and industrial fields, for example in image recognition, industrial robots, autonomous driving, the metaverse, and medical image analysis. Meanwhile, as the curtain rises on the era of the intelligent Internet of Things, more and more data flow among the cloud, the edge, and end devices. Exponentially growing data volumes place higher demands on the computing power and power consumption of existing computing architectures. Due to the Memory Wall and the Power Wall, the limitations of the von Neumann computing architecture are increasingly prominent. Therefore, a new computing architecture is needed to address the challenges of future application scenarios. Against this background, the concept of integrating storage and computation has re-entered the view of academia and industry.
The success of artificial neural network algorithms and breakthroughs in the underlying hardware have together driven the rapid development of the artificial intelligence revolution. In recent years, artificial neural networks have shown great advantages in many application scenarios such as object detection, wearable devices, and natural language processing. At the software level, artificial neural network algorithms have achieved great success; to support efficient deployment of artificial neural network models from the cloud to edge devices, researchers in academia and industry have begun to design hardware accelerators dedicated to artificial neural networks. Currently, the mainstream platform for accelerating artificial neural network algorithms is the Graphics Processing Unit (GPU), which offers high computational accuracy and very flexible programming. Training of artificial neural network algorithms is typically done on GPU clusters, which are very energy-hungry, consuming hundreds to thousands of watts. To improve energy efficiency for both large data centers and edge devices, researchers have begun designing Application Specific Integrated Circuit (ASIC) architectures, such as Google's Tensor Processing Unit (TPU). However, the real problem of artificial neural network accelerators is the frequent data movement between the computing unit and the memory unit, i.e., the memory-wall problem of the traditional von Neumann architecture. Most of the operations in artificial neural network processing are Vector-Matrix Multiplications (VMM) between input vectors and weight matrices, which essentially perform multiply-accumulate (MAC) operations. Compute-in-memory (CIM) is therefore considered the most efficient solution with the potential to break the von Neumann architecture bottleneck.
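To make this concrete, a vector-matrix multiplication decomposes into one multiply-accumulate reduction per output; the following NumPy sketch (illustrative only, not part of the claimed system) shows the equivalence:

```python
import numpy as np

# A vector-matrix multiplication (VMM) is one multiply-accumulate (MAC)
# reduction per output column: out[j] = sum_i x[i] * W[i, j].
x = np.array([1.0, -1.0, 1.0])               # input vector (e.g., activations)
W = np.array([[ 1.0, -1.0],
              [ 0.0,  1.0],
              [-1.0,  1.0]])                 # weight matrix

out_vmm = x @ W                              # the VMM in one step
out_mac = np.array([np.sum(x * W[:, j]) for j in range(W.shape[1])])
assert np.allclose(out_vmm, out_mac)         # identical up to float rounding
```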
Furthermore, sensor systems are an important component of artificial intelligence devices. Traditional sensor systems have become increasingly unsuitable for smart devices, because their energy consumption cannot support long-term continuous data acquisition. In conventional solutions, the sensor system is physically separated from the computing unit, since their functional requirements and manufacturing technologies differ: sensors work primarily in the noisy analog domain, while computing units are typically implemented digitally on a traditional von Neumann computing architecture. The sensor terminal collects a large amount of raw data locally and then transmits it to the computing unit of the local system. As shown in fig. 1, in a conventional intelligent system, analog data collected by a sensor is first converted into a digital signal by an analog-to-digital converter (ADC), temporarily stored in memory, and then fetched from memory by the processing unit for processing. In other words, between data collection at the sensor and processing at the computing unit there is a chain of data conversion and data transmission steps, and published data show that the ADC and data storage dominate the power consumption of the overall system. This system architecture therefore inevitably causes significant problems in energy consumption, processing speed, and communication bandwidth.
Based on the foregoing theoretical basis, as shown in fig. 2, the technical solution of the present application provides a near-sensor, compute-in-memory convolution acceleration architecture. In a first aspect, an embodiment of the present application provides a data processing system based on this architecture; the system mainly includes a sensor storage array, a readout computing circuit, and an analog-to-digital conversion circuit.
The overall architecture of the system is shown in fig. 3. The storage array is composed of SRAM; it acquires a sensor signal through the connected sensor and sends the signal to a read bit line of the SRAM array, and it also stores the weights of the neural network model. The readout computing circuit determines that the voltage on the read bit line has stabilized and enables a read word line of the SRAM array; it then obtains the weight value via the read word line, multiplies the sensor signal by the weight value to obtain a first output voltage, and outputs the first output voltage to the shared bit line. The analog-to-digital conversion circuit converts the first output voltage into second output data and subtracts the second output data to obtain the convolution operation result.
In some alternative embodiments, the sensor storage array in the system is a ternary SRAM; the ternary static random access memory includes a first ternary inverter and a second ternary inverter, where the output terminal of the first ternary inverter is connected to the input terminal of the second ternary inverter, and the output terminal of the second ternary inverter is connected to the input terminal of the first ternary inverter.
In the related art, the SRAM used in mainstream compute-in-memory (CIM) architectures can store only 2 weight values W_i ∈ {1, -1}: the high level V_dd represents a weight of 1 and the low level V_GND represents a weight of -1. By contrast, as shown in fig. 3, the embodiment system of the present application adopts a ternary SRAM that can store 3 weight values W_i. In fig. 3, the storage node Q at the high level V_dd represents a weight of -1, Q at the low level V_GND represents a weight of 1, and Q at 1/2 V_dd represents a weight of 0. The power consumption of the ternary SRAM is slightly higher than that of an ordinary six-transistor SRAM, but the additional weight value markedly improves the recognition accuracy of the neural network model.
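The storage-node encoding of fig. 3 can be written as a small behavioral model; this is a sketch under a normalized-voltage assumption, and the function name is ours, not the embodiment's:

```python
VDD = 1.0    # high level (normalized)
VGND = 0.0   # low level

def q_voltage_to_weight(v_q: float) -> int:
    """Map the ternary SRAM storage-node voltage Q to the stored weight:
    Q = Vdd -> -1, Q = VGND -> +1, Q = Vdd/2 -> 0 (per fig. 3)."""
    if v_q == VDD:
        return -1
    if v_q == VGND:
        return 1
    if v_q == VDD / 2:
        return 0
    raise ValueError("Q must be one of the three legal ternary levels")
```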
In some alternative embodiments, as shown in fig. 4, the configuration of the ternary inverter (including the first ternary inverter and the second ternary inverter) in the embodiment system mainly includes a thin gate NMOS transistor, a thick gate NMOS transistor, a thin gate PMOS transistor, and a thick gate PMOS transistor.
When the input terminal of the ternary inverter receives a high-level signal, the thick-gate NMOS transistor and the thin-gate NMOS transistor are turned on, and the output terminal outputs a low-level signal; when the input terminal receives a low-level signal, the thick-gate PMOS transistor and the thin-gate PMOS transistor are turned on, and the output terminal outputs a high-level signal; when the input signal amplitude is half of the high-level amplitude, the thin-gate NMOS transistor and the thin-gate PMOS transistor are turned on, and the output signal amplitude is half of the high-level amplitude.
As shown in fig. 5, the SRAM in the embodiment can store three weight values thanks to the design of the ternary inverter (STI); the embodiment system uses multi-threshold CMOS technology in the STI to realize the ternary switching operation. The thick-gate NMOS transistor conducts only when its gate voltage is V_dd, and the thick-gate PMOS transistor conducts only when its gate voltage is V_GND; the thinner-gate NMOS transistor conducts when its gate voltage is V_dd or 1/2 V_dd, and the thinner-gate PMOS transistor conducts when its gate voltage is V_GND or 1/2 V_dd. When the input voltage In is V_dd, the output voltage Out is V_GND; when the input voltage In is V_GND, the output voltage Out is V_dd; when the input voltage In is 1/2 V_dd, the output voltage Out is 1/2 V_dd.
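The STI transfer characteristic described above amounts to a three-level inversion; a minimal behavioral sketch (normalized voltages assumed; `sti_out` is an illustrative name):

```python
VDD = 1.0
VGND = 0.0

def sti_out(v_in: float) -> float:
    """Standard ternary inverter (STI): Vdd -> VGND, VGND -> Vdd, Vdd/2 -> Vdd/2."""
    if v_in == VDD:
        return VGND        # thick- and thin-gate NMOS conduct, pulling the output low
    if v_in == VGND:
        return VDD         # thick- and thin-gate PMOS conduct, pulling the output high
    if v_in == VDD / 2:
        return VDD / 2     # only the thin-gate pair conducts; the mid level is held
    raise ValueError("input must be a legal ternary level")
```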
More specifically, as shown in fig. 6, when the system provided in the present application performs convolution calculation, the voltage value V_i can be input directly from the sensor; the data collected by the sensor need not pass through the intermediate chain of analog-to-digital converter (ADC), digital-to-analog converter (DAC), and memory. This saves most of the energy consumption, greatly reduces the latency of intermediate data, and improves the processing speed of the overall system. The convolution calculation (multiply-accumulate, MAC) is then performed in the CIM architecture.
Furthermore, based on the data processing system with the near-sensor compute-in-memory architecture proposed in the first aspect, the present application further provides a data processing method; the method mainly completes the multiply-accumulate computation of the convolutional layers and fully connected layers of a binarized convolutional neural network (CNN). The artificial neural network model on which the embodiment is based is the LeNet-5 neural network model, and a binarized neural network model is constructed by applying a binarization operation to LeNet-5. Although a full-precision floating-point CNN can provide high recognition accuracy, the cost behind the high recognition rate is a huge amount of computation, high power consumption, and high hardware overhead, which is burdensome for low-power, hardware-resource-constrained embedded edge devices. From the viewpoint of hardware friendliness, the CNN is binarized using a sign function that determines the result from the sign of the floating-point number: positive numbers binarize to 1 and negative numbers to -1. The specific binarization formula is as follows:

$$W_b = \operatorname{sign}(W) = \begin{cases} +1, & W \ge 0 \\ -1, & W < 0 \end{cases}$$
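In code, the binarization is a one-liner; a sketch (mapping zero to +1 follows the usual binarized-network convention and is our assumption, since the text only specifies strictly positive and negative inputs):

```python
import numpy as np

def binarize(w: np.ndarray) -> np.ndarray:
    """Sign-function binarization: positive -> +1, negative -> -1."""
    return np.where(w >= 0, 1.0, -1.0)

print(binarize(np.array([0.37, -1.2, 0.0])))   # [ 1. -1.  1.]
```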
Regarding the network structure, the LeNet-5 neural network model is a CNN model for handwriting recognition. Excluding the input layer and the output layer, it has 6 layers in total: C1 and C3 are convolutional layers, S2 and S4 are pooling layers, and F5 and F6 are fully connected layers. Binarizing the LeNet-5 model greatly reduces the amount of computation and the hardware overhead. In the whole LeNet-5 model, most of the computation occurs in the convolutional layers and the fully connected layers, and the operation of these two kinds of layers is essentially a multiply-accumulate (MAC) operation. Therefore, completing these two kinds of layers in the compute-in-memory (CIM) architecture proposed herein effectively reduces the power consumption of the whole system and accelerates the whole neural network model. Based on the foregoing theoretical basis, as shown in fig. 7, the embodiment method includes steps S100-S400:
S100, acquiring a sensor signal and sending the sensor signal to a read bit line of the SRAM array, where the sensor signal is a voltage signal obtained by binarizing a pixel of an input feature map.
Illustratively, in an embodiment, the near-sensor memory-bank architecture includes 64 SRAM cells in a row, meaning that 64 multiplications can be done in parallel and their results accumulated. First, the embodiment sends the voltage value V_i output by the sensor to the read bit lines of the SRAM array.
S200, determining that the voltage on the read bit line has stabilized, and enabling a read word line of the SRAM array.
In an embodiment, after the voltage applied to the read bit line has stabilized, the read word line (RWL) of the SRAM is enabled to begin reading the weight value W_i pre-stored in the SRAM.
S300, acquiring, via the read word line, a weight value, acquiring the sensor signal on the read bit line, multiplying the sensor signal by the weight value to obtain a first output voltage, and outputting the first output voltage to a shared bit line; the first output voltage is the voltage signal of a pixel of the output feature map.
In an embodiment, the read bit lines include a first read bit line and a second read bit line (RBL and RBLB, respectively), and the step of obtaining the weight value via the read word line and multiplying by the weight value to obtain the first output voltage on the shared bit line may include steps S310-S330:
S310, determining that the weight value is a first value, turning off the first transistor, turning on the second transistor, and sending the sensor signal on the first read bit line to the shared bit line.
As shown in fig. 6, when the weight value W_i is 1, i.e., the voltage at Q is V_GND and the voltage at QB is V_dd, the transistor N4 (i.e., the second transistor) is turned on and the voltage on RBLB is discharged to ground, while the transistor N3 (i.e., the first transistor) is turned off and the voltage on RBL remains unchanged. The EN_p switch is then closed and the voltage on RBL is transferred to the shared bit line V_p, completing the multiplication 1 × V_i and yielding the first output voltage.
S320, determining that the weight value is a second value, turning on the first transistor, turning off the second transistor, and sending the sensor signal on the second read bit line to the shared bit line.
As shown in fig. 6, when the weight value W_i is -1, i.e., the voltage at Q is V_dd and the voltage at QB is V_GND, the transistor N3 is turned on and the voltage on RBL is fully discharged to ground, while the transistor N4 is turned off and the voltage on RBLB remains unchanged. The EN_n switch is then closed and the voltage on RBLB is transferred to the shared bit line V_n, completing the multiplication -1 × V_i and yielding the first output voltage.
S330, determining that the weight value is a third value, turning off both the first transistor and the second transistor, and sending the sensor signals on the first read bit line and the second read bit line to the shared bit lines.
As shown in fig. 6, when the weight value W_i is 0, i.e., the voltages at both Q and QB are 1/2 V_dd, the transistors N3 and N4 are both turned off and the voltages on RBL and RBLB remain unchanged. The EN_p and EN_n switches are then closed, so the voltage on RBL is transferred to the shared bit line V_p and the voltage on RBLB is transferred to the shared bit line V_n, completing the multiplication 0 × V_i and yielding the first output voltage.
In an embodiment, the step of converting the first output voltage to obtain second output data and obtaining a convolution operation result from the second output data includes subtracting the second output data obtained from the second shared bit line from the second output data obtained from the first shared bit line, to obtain the convolution operation result.
In the embodiment, although the voltages on the two read bit lines are transferred to the shared bit lines V_p and V_n respectively when the weight is 0, their difference is 0, thereby achieving the effect of the 0 × V_i multiplication.
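Steps S310-S330 and the differential readout can be condensed into a behavioral model. The sketch below is illustrative only (ideal switches and lossless charge transfer are assumed, and the function name is ours):

```python
def ternary_multiply(w: int, v_i: float) -> tuple[float, float]:
    """Behavioral model of one ternary SRAM cell's multiplication step.
    Returns the contributions placed on the shared bit lines (V_p, V_n):
    w = +1: N4 on, N3 off -> RBL holds v_i; EN_p closes, v_i reaches V_p.
    w = -1: N3 on, N4 off -> RBLB holds v_i; EN_n closes, v_i reaches V_n.
    w =  0: N3 and N4 off -> v_i reaches both lines and cancels out."""
    if w == 1:
        return v_i, 0.0
    if w == -1:
        return 0.0, v_i
    if w == 0:
        return v_i, v_i
    raise ValueError("weight must be -1, 0, or 1")

# The differential result of one multiplication is W_i * V_i:
for w in (1, -1, 0):
    v_p, v_n = ternary_multiply(w, 0.5)
    print(w, v_p - v_n)   # 0.5, -0.5, 0.0
```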
In some possible embodiments, after each multiplication result is obtained, the voltages on the read bit lines (RBL, RBLB) are transferred to the shared bit lines V_p and V_n based on capacitive coupling and charge-sharing schemes, and the accumulated voltages are stored on the capacitors C_p and C_n: the capacitor C_p on the shared bit line V_p holds the positive multiplication results, and the capacitor C_n on the shared bit line V_n holds the negative multiplication results.
S400, converting the first output voltage to obtain second output data, and obtaining a convolution operation result from the second output data.
Specifically, in the embodiment, the voltage value on each shared bit line is converted into a binary digital result by the ADC, and the two results are then subtracted to obtain the final result of the convolution operation:

$$\mathrm{OUT} = \sum_{i=1}^{N} W_i \cdot V_i$$

where OUT is the result of the convolution operation, W_i is a weight value, V_i is the voltage value of the sensor signal, N is the number of SRAM arrays, i = 1, 2, 3, …, N, and N is a positive integer.
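Putting the row together, the accumulate-convert-subtract flow can be sketched end to end. Everything below is an illustrative assumption (an idealized 8-bit ADC, a 64-cell row, normalized voltages); it models the embodiment's arithmetic, not its circuit:

```python
import numpy as np

def adc(v: float, v_ref: float = 64.0, bits: int = 8) -> int:
    """Idealized ADC: quantize an accumulated shared-bit-line voltage to a code."""
    return round((v / v_ref) * (2**bits - 1))

def row_mac(weights: np.ndarray, v_in: np.ndarray) -> float:
    """Accumulate per-cell products on C_p / C_n, convert each, then subtract."""
    c_p = float(v_in[weights >= 0].sum())   # w = +1 and w = 0 charge C_p
    c_n = float(v_in[weights <= 0].sum())   # w = -1 and w = 0 charge C_n
    lsb = 64.0 / (2**8 - 1)                 # volts per ADC code
    return (adc(c_p) - adc(c_n)) * lsb      # OUT ~= sum_i W_i * V_i

rng = np.random.default_rng(0)
w = rng.choice([-1, 0, 1], size=64)         # one row of ternary weights
v = rng.choice([0.0, 1.0], size=64)         # binarized sensor voltages
print(row_mac(w, v), float(w @ v))          # agree up to quantization error
```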
On the other hand, the technical solution of the present application further provides a data processing apparatus, which includes:
at least one processor; at least one memory for storing at least one program; and, when the at least one program is executed by the at least one processor, the at least one processor is caused to execute the data processing method according to the first aspect.
An embodiment of the present invention further provides a storage medium storing a corresponding executable program; when the program is executed by a processor, the data processing method of the first aspect is implemented.
From the above implementation process, it can be concluded that, compared with the prior art, the technical solution provided by the present invention has the following advantages:
The technical solution of the present application provides a sensor-oriented compute-in-memory convolution acceleration architecture. First, at the input layer, data processed by the CIM architecture are input directly from the sensor, without a chain of analog-to-digital and digital-to-analog conversion steps, greatly reducing the hardware overhead, power consumption, and latency of data transfer. Second, by adopting an SRAM capable of storing three weight values, the accuracy of the neural network algorithm is effectively improved. In addition, the CIM structure breaks the bottleneck of the traditional von Neumann architecture: the multiply-accumulate (MAC) operation is completed directly in the SRAM, which reduces the latency and power consumption caused by data movement. This sensor-oriented compute-in-memory convolution acceleration architecture is well suited as a hardware implementation architecture for the convolution operations of neural network algorithms; it meets the neural network's demand for reduced cost, including power consumption and hardware overhead, while providing high performance such as low latency and high bandwidth.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise specified to the contrary, one or more of the functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be understood that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer given the nature, function, and interrelationships of the modules. Accordingly, those of ordinary skill in the art will be able to practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is to be determined from the appended claims along with their full scope of equivalents.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.