
CN114758699A - Data processing method, system, device and medium - Google Patents


Info

Publication number
CN114758699A
Authority
CN
China
Prior art keywords
bit line
sensor signal
output
ternary
voltage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210276205.7A
Other languages
Chinese (zh)
Inventor
徐方磊
虞志益
陈伟冲
赵贵华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202210276205.7A priority Critical patent/CN114758699A/en
Publication of CN114758699A publication Critical patent/CN114758699A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11CSTATIC STORES
    • G11C11/00Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/21Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
    • G11C11/34Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
    • G11C11/40Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
    • G11C11/41Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming static cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger
    • G11C11/413Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction


Abstract



A data processing method, system, device, and medium provided by the present invention mainly include the following steps: acquiring a sensor signal and sending it to a read bit line of an SRAM array; determining that the voltage value of the read bit line is stable and turning on a read word line of the SRAM array; obtaining the weight value of the sensor signal through the read word line and performing a multiplication operation according to the weight value to obtain a first output voltage, which is output to a shared bit line; and binarizing the first output voltage to obtain second output data, from which the convolution operation result is obtained. The scheme eliminates the intermediate series of analog-to-digital and digital-to-analog conversion steps, greatly reducing the hardware overhead, power consumption, and delay of data transmission. It can effectively improve the accuracy of neural network algorithms, reduce the delay and power consumption caused by data movement, and at the same time meet high-performance requirements such as low delay and high bandwidth. It can be widely applied in the field of embedded technology.


Description

Data processing method, system, device and medium
Technical Field
The present invention relates to the field of embedded system technologies, and in particular, to a data processing method, system, apparatus, and medium.
Background
In low power embedded systems, the gap between complex computing requirements and hardware resource availability is increasing. Meanwhile, for application scenarios such as the internet of things and edge computing, high performance and low power consumption are increasingly required to meet the requirements of instantaneity and convenience.
Under the traditional von Neumann computer architecture, the computing unit and the storage unit are separated in physical space, and data are transmitted between them through a data bus: the computing unit reads data from the memory according to the instructions and stores the results back to the memory after computation is completed. This architecture is one of the major bottlenecks restricting the performance of computing systems, as it not only increases computing delay but also brings huge energy-consumption overhead. It is estimated that under the von Neumann architecture, data transmission accounts for more than 50% of the total power consumed by data computation. In addition, the Memory Wall problem is becoming more significant. Because computing and storage are separated, processors and memories have evolved in different directions: processors pursue high frequency and high performance, while memory targets low cost and high density, so its performance has developed relatively slowly. For a long time, the data-access latency of memory has improved by only about 10% per year, compared with an annual performance improvement of about 55% for processors. Meanwhile, under the requirement of mass data transmission, the limited bandwidth of the data bus also severely restricts the performance and efficiency of system computation.
Disclosure of Invention
In view of the foregoing, to at least partially solve one of the above technical problems, embodiments of the present invention provide a high-performance and low-power data processing method, and a system, an apparatus, and a medium capable of implementing the method.
In one aspect, a technical solution of the present application provides a data processing method, including the following steps:
acquiring a sensor signal, and sending the sensor signal to a read bit line of an SRAM array; the sensor signal is a voltage signal obtained by binarizing the pixel points of an input feature map;
determining that the voltage value of the read bit line is stable, and starting a read word line of the SRAM array;
acquiring a weight value of the sensor signal in the read word line, acquiring the sensor signal in the read bit line, and multiplying the sensor signal by the weight value to obtain a first output voltage, which is output to a shared bit line; the first output voltage is the voltage signal of a pixel point in the output feature map;
and converting the first output voltage to obtain second output data, and obtaining a convolution operation result from the second output data.
In a possible embodiment of the present disclosure, the read bit line includes a first read bit line and a second read bit line, and the step of obtaining the weight value of the sensor signal, obtaining the sensor signal in the read bit line, and multiplying the sensor signal by the weight value to obtain a first output voltage output to the shared bit line includes:
determining that the weight value is a first numerical value, turning off the first transistor, turning on the second transistor, and sending the sensor signal in the first read bit line to the shared bit line;
determining that the weight value is a second numerical value, turning on the first transistor, turning off the second transistor, and sending the sensor signal in the second read bit line to the shared bit line;
and determining that the weight value is a third numerical value, turning off both the first transistor and the second transistor, and sending the sensor signals in the first read bit line and the second read bit line to the shared bit line.
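The three weight cases above can be sketched in Python. This is an illustrative, idealized model (not part of the patent): both read bit lines are assumed to carry the sensor voltage V_i, and switching is lossless.

```python
def ternary_multiply(weight, v_i):
    """Idealized model of one ternary multiplication W_i * V_i.

    Both read bit lines (RBL, RBLB) are assumed to carry the sensor
    voltage v_i; the weight decides which voltage reaches which shared
    bit line. Returns (contribution to V_p, contribution to V_n).
    """
    if weight == 1:      # first value: RBLB drained, RBL -> shared line V_p
        return v_i, 0.0
    if weight == -1:     # second value: RBL drained, RBLB -> shared line V_n
        return 0.0, v_i
    if weight == 0:      # third value: both lines passed, difference is 0
        return v_i, v_i
    raise ValueError("ternary weight must be 1, -1 or 0")

# The differential readout V_p - V_n then equals weight * v_i:
for w in (1, -1, 0):
    v_p, v_n = ternary_multiply(w, 0.6)
    assert abs((v_p - v_n) - w * 0.6) < 1e-12
```

The routing itself never computes a product; the sign of the weight is encoded purely in which shared bit line receives the voltage.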
In a possible embodiment of the present disclosure, the step of converting the first output voltage to obtain second output data, and obtaining a convolution operation result from the second output data includes:
and subtracting the second output data in the first shared bit line from the second output data in the second shared bit line to obtain the convolution operation result.
In a possible embodiment of the present disclosure, the data processing method further includes the following steps:
storing a first output voltage which is obtained by the multiplication operation and has a positive value in a first capacitor;
and storing a first output voltage which is obtained by the multiplication operation and has a negative value in a second capacitor.
In a possible embodiment of the present disclosure, the convolution operation result satisfies the following formula:
OUT = Σ_{i=1}^{N} W_i · V_i
wherein OUT is the result of the convolution operation, W_i is a weight value, V_i is the voltage value of the sensor signal, N is the number of SRAM cells taking part in the accumulation, and i = 1, 2, 3, …, N, where N is a positive integer.
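The convolution result defined here can be checked with a short Python snippet (illustrative names; the weights are restricted to the ternary set the SRAM can store):

```python
def convolution_out(weights, voltages):
    """OUT = sum_{i=1}^{N} W_i * V_i, with ternary weights W_i in {1, 0, -1}."""
    assert len(weights) == len(voltages)
    assert all(w in (1, 0, -1) for w in weights)
    return sum(w * v for w, v in zip(weights, voltages))

# e.g. four cells: 1*0.5 + (-1)*0.3 + 0*0.9 + 1*0.2 = 0.4
out = convolution_out([1, -1, 0, 1], [0.5, 0.3, 0.9, 0.2])
assert abs(out - 0.4) < 1e-12
```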
On the other hand, the technical solution of the present application further provides a data processing system, which includes:
the sensor storage array is used for acquiring a sensor signal and sending it to a read bit line of the SRAM array; the sensor signal is a voltage signal obtained by binarizing the pixel points of an input feature map;
the readout calculation circuit is used for acquiring a weight value of the sensor signal in the read word line, acquiring the sensor signal in the read bit line, and multiplying the sensor signal by the weight value to obtain a first output voltage, which is output to the shared bit line; the first output voltage is the voltage signal of a pixel point in the output feature map;
and the analog-to-digital conversion circuit is used for converting the first output voltage to obtain second output data and obtaining a convolution operation result from the second output data.
In a possible embodiment of the solution of the present application, the sensor memory array is a ternary SRAM; the ternary static random access memory comprises a first ternary inverter and a second ternary inverter, wherein the output end of the first ternary inverter is connected to the input end of the second ternary inverter, and the output end of the second ternary inverter is connected to the input end of the first ternary inverter.
In a possible embodiment of the present disclosure, the first ternary inverter includes a thin gate NMOS transistor, a thick gate NMOS transistor, a thin gate PMOS transistor, and a thick gate PMOS transistor;
when the input end of the first ternary inverter is a high-level signal, the thick gate NMOS transistor and the thin gate NMOS transistor conduct, and the output end of the first ternary inverter is a low-level signal;
when the input end of the first ternary inverter is a low-level signal, the thick gate PMOS transistor and the thin gate PMOS transistor conduct, and the output end of the first ternary inverter is a high-level signal;
when the signal amplitude at the input end of the first ternary inverter is half of the high-level amplitude, the thin gate NMOS transistor and the thin gate PMOS transistor conduct, and the signal amplitude at the output end of the first ternary inverter is half of the high-level amplitude;
The second ternary inverter has the same structure as the first ternary inverter.
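The inverter behaviour described above amounts to a three-valued truth table. A small Python model (the ideal rail values and function names are illustrative assumptions) also shows why two cross-coupled STIs can hold all three levels:

```python
VDD, VGND = 1.0, 0.0  # illustrative ideal rail voltages

def sti(v_in, tol=1e-9):
    """Ideal transfer function of the ternary inverter (STI) described above."""
    if abs(v_in - VDD) < tol:       # high input: thick- and thin-gate NMOS conduct
        return VGND
    if abs(v_in - VGND) < tol:      # low input: thick- and thin-gate PMOS conduct
        return VDD
    if abs(v_in - VDD / 2) < tol:   # half-swing input: thin-gate NMOS and PMOS conduct
        return VDD / 2
    raise ValueError("STI input must be VDD, VGND or VDD/2")

# Inverting twice returns the stored value for every level, so the
# cross-coupled pair is stable at 1, 0, and 1/2 VDD without refresh.
for level in (VDD, VGND, VDD / 2):
    assert sti(sti(level)) == level
```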
On the other hand, the technical solution of the present application further provides a data processing apparatus, which includes:
at least one processor;
at least one memory for storing at least one program;
when executed by the at least one processor, the at least one program causes the at least one processor to carry out a data processing method according to any one of the first aspect.
On the other hand, the present technical solution also provides a storage medium, in which a program executable by a processor is stored, and when the program executable by the processor is executed by the processor, the program is used to execute a data processing method according to any one of the first aspect.
Advantages and benefits of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention:
the technical scheme of the application is based on a storage-computation integrated convolution-computation acceleration architecture for the sensor. Input can be taken directly from the sensor without a series of analog-to-digital and digital-to-analog conversion steps, greatly reducing the hardware overhead, power consumption, and delay of data transmission. By adopting an SRAM capable of storing multiple weights, the accuracy of the neural network algorithm can be effectively improved, and the delay and power consumption caused by data movement can be reduced. The scheme meets the neural network's requirement of reducing the cost of a hardware implementation architecture, including power consumption and hardware overhead, while providing high performance such as low delay and high bandwidth.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a diagram of a perceptual computing architecture commonly used in the related art;
fig. 2 is a schematic diagram of a storage-computation integrated convolution computation acceleration architecture for a proximity sensor in the technical solution of the present application;
fig. 3 is a schematic structural diagram of a data processing system according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a three-valued SRAM in the system according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a ternary inverter STI in the system according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a convolution calculation system in a system according to an embodiment of the present application;
fig. 7 is a flowchart illustrating steps of a data processing method according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
In the intelligent era, scientific technology represented by Artificial Intelligence (AI) has greatly promoted the progress of human society. In recent years, artificial intelligence technology has developed rapidly and has been widely applied in the consumer and industrial production fields, for example: image recognition, industrial robots, autonomous driving, the metaverse, medical image analysis, and the like. Meanwhile, as the curtain rises on the era of the intelligent internet of things, more and more data flow among the cloud, the edge, and end devices. Exponentially increasing amounts of data place higher demands on the computing power and power consumption of existing computing architectures. Due to the Memory Wall and the Power Wall, the limitations of the von Neumann computing architecture are increasingly prominent. Therefore, a new computing architecture is needed to address the challenges of future application scenarios. In this context, the concept of integrated storage and computation has re-entered the view of academia and industry.
The success of artificial neural network algorithms and breakthroughs in powerful hardware have together promoted the rapid development of the artificial intelligence revolution. In recent years, artificial neural networks have shown great advantages in many application scenarios such as target detection, wearable devices, and natural language processing. At the software level, artificial neural network algorithms have achieved great success; to support the effective deployment of artificial neural network models from the cloud to edge devices, researchers in academia and industry have begun to design hardware accelerators dedicated to artificial neural networks. Currently, the mainstream platform for accelerating artificial neural network algorithms is the Graphics Processing Unit (GPU), which has the advantages of high computational accuracy and very flexible programming. Training of artificial neural network algorithms is typically done in GPU clusters, which are very energy-consuming, drawing hundreds to thousands of watts of power. To improve energy efficiency in large data-center environments, researchers have begun customizing Application-Specific Integrated Circuit (ASIC) solutions for cloud and edge devices, such as Google's Tensor Processing Unit (TPU). However, the real problem for artificial neural network accelerators is the frequent data movement between the computing unit and the memory unit, which is the memory-wall problem of the traditional von Neumann architecture. Most of the operations in artificial neural network processing are Vector-Matrix Multiplications (VMM) between input vectors and weight matrices, which essentially perform multiply-accumulate (MAC) operations. Computing-in-Memory (CIM) is therefore considered the most efficient solution with the potential to break the von Neumann architecture bottleneck.
Furthermore, sensor systems are an important component of artificial intelligence devices. Traditional sensor systems have become increasingly unsuitable for smart devices, as their energy consumption cannot support long-term continuous data-acquisition tasks. In conventional solutions, the sensor system is physically separated from the computing unit, since their functional requirements and manufacturing technologies differ. Sensors work primarily in the noisy analog domain, while computing units are typically implemented digitally based on the traditional von Neumann computing architecture. The sensor terminal collects a large amount of raw data locally and then transmits the data to the computing unit of the local system. As shown in fig. 1, in a conventional intelligent system, analog data collected by a sensor is first converted into a digital signal by an analog-to-digital converter (ADC), temporarily stored in a memory, and then fetched from the memory by a processing unit for processing. From data collection at the sensor to processing by the computing unit, there is thus a series of intermediate steps such as data conversion and data transmission, and published data show that the ADC and data storage dominate the power consumption of the overall system. Such a system architecture inevitably causes significant problems in energy consumption, processing speed, communication bandwidth, and the like.
Based on the foregoing theoretical basis, as shown in fig. 2, the technical solution of the present application provides a storage and computation integrated convolution computation acceleration architecture design for a proximity sensor. In a first aspect, an embodiment of the present application provides a data processing system based on this computation-integrated convolution computation acceleration architecture, where the system mainly includes a sensor storage array, a readout computation circuit, and an analog-to-digital conversion circuit.
The overall architecture of the system is shown in fig. 3. The storage array is composed of SRAM cells; it acquires a sensor signal through a connected sensor and sends the sensor signal to a read bit line of the SRAM array, and it also stores the weights of the neural network model. The readout calculation circuit in the system is used for determining that the voltage value of the read bit line is stable, starting a read word line of the SRAM array, obtaining the weight value of the sensor signal through the read word line, and performing a multiplication operation according to the weight value to obtain a first output voltage output to the shared bit line. The analog-to-digital conversion circuit in the system is used for converting the first output voltage to obtain second output data and subtracting the second output data to obtain the convolution operation result.
In some alternative embodiments, the sensor memory array in the system is a ternary SRAM; the ternary static random access memory comprises a first ternary inverter and a second ternary inverter, wherein the output end of the first ternary inverter is connected to the input end of the second ternary inverter; the output end of the second ternary inverter is connected to the input end of the first ternary inverter.
In the related art, the SRAM used in mainstream CIM (compute-in-memory) architectures can only store 2 weight values W_i (1, -1): a high level V_dd represents W_i = 1, and a low level V_GND represents W_i = -1. In the technical solution of the present application, as shown in fig. 3, the embodiment system adopts a ternary SRAM that can store 3 weight values W_i. In fig. 3, the Q point at high level V_dd represents W_i = -1, the Q point at low level V_GND represents W_i = 1, and Q at 1/2 V_dd represents W_i = 0. The power consumption of the ternary SRAM is slightly higher than that of an ordinary six-transistor SRAM, but the additional weight value considerably increases the recognition accuracy of the neural network model.
In some alternative embodiments, as shown in fig. 4, the configuration of the ternary inverter (including the first ternary inverter and the second ternary inverter) in the embodiment system mainly includes a thin gate NMOS transistor, a thick gate NMOS transistor, a thin gate PMOS transistor, and a thick gate PMOS transistor.
When the input end of the ternary inverter is a high-level signal, the thick gate NMOS transistor and the thin gate NMOS transistor conduct, and the output end of the ternary inverter is a low-level signal; when the input end is a low-level signal, the thick gate PMOS transistor and the thin gate PMOS transistor conduct, and the output end is a high-level signal; when the signal amplitude at the input end is half of the high-level amplitude, the thin gate NMOS transistor and the thin gate PMOS transistor conduct, and the signal amplitude at the output end is half of the high-level amplitude.
As shown in fig. 5, the SRAM in the embodiment can store three weight values thanks to the design of the ternary inverter (STI). In the embodiment system, multi-threshold CMOS technology is used in the STI to achieve the ternary switching operation. The thick gate NMOS transistor conducts only when its gate voltage is V_dd, and the thick gate PMOS transistor conducts only when its gate voltage is V_GND; the thin gate NMOS transistor conducts when its gate voltage is V_dd or 1/2 V_dd, and the thin gate PMOS transistor conducts when its gate voltage is V_GND or 1/2 V_dd. When the input voltage In is V_dd, the output voltage Out is V_GND; when In is V_GND, Out is V_dd; when In is 1/2 V_dd, Out is 1/2 V_dd.
As shown in fig. 6, more specifically, when the system provided in the present application performs convolution calculation, the voltage value V_i may be directly input from the sensor. The data collected by the sensor does not need an intermediate series of analog-to-digital (ADC) and digital-to-analog (DAC) conversions and storage; therefore, most of the energy consumption can be saved, the delay of intermediate data is greatly reduced, and the processing speed of the whole system is improved. The convolution calculation (multiply-accumulate, MAC) is then performed in the CIM architecture.
Furthermore, based on the near-sensor storage-computation integrated data processing system proposed in the first aspect, the present application further provides a data processing method, which mainly completes the multiply-accumulate computation of the convolutional layers and fully connected layers of a binarized convolutional neural network (CNN). First, the artificial neural network model on which the embodiment is based is the LeNet-5 neural network model, and a binarized neural network model is constructed by performing a binarization operation on it. Although a full-precision floating-point CNN can provide high recognition accuracy, the cost behind the high recognition rate is a huge amount of data computation, high power consumption, and high hardware cost, which is burdensome for low-power, hardware-resource-constrained embedded edge devices. From the viewpoint of hardware friendliness, the binarization operation on the CNN is performed as follows: a binarization result is determined from the sign of the floating-point number using a sign function. In short, positive numbers are binarized to 1, and negative numbers to -1; the specific binarization formula is as follows:
sign(x) = 1   (x ≥ 0)
sign(x) = -1  (x < 0)
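A minimal Python sketch of this binarization. The mapping of exactly 0 to 1 is an assumption; the text only specifies positive and negative inputs:

```python
def binarize(x):
    """Sign binarization for BNN weights: positive -> 1, negative -> -1.

    x == 0 is mapped to 1 here by convention (an assumption; the patent
    text only covers strictly positive and negative values).
    """
    return 1 if x >= 0 else -1

# Applied element-wise to a row of full-precision weights:
weights = [0.37, -1.2, 0.0, -0.05]
binary_weights = [binarize(w) for w in weights]   # [1, -1, 1, -1]
```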
The LeNet-5 neural network model is a CNN model used for handwriting recognition. Excluding the input layer and the output layer, it has a total of 6 layers: C1 and C3 are convolutional layers, S2 and S4 are pooling layers, and F5 and F6 are fully connected layers. Binarizing the LeNet-5 neural network model greatly reduces the computation amount and the hardware overhead. In the whole model, most of the computation occurs in the convolutional and fully connected layers, and the convolution operation of these two layer types is essentially a multiply-accumulate (MAC) operation. Therefore, completing these two layers of computation in the proposed compute-in-memory (CIM) architecture will effectively reduce the power consumption of the whole system and accelerate the whole neural network model. Based on the foregoing, as shown in fig. 7, the embodiment method includes steps S100-S400:
S100, acquiring a sensor signal and sending it to a read bit line of the SRAM array; the sensor signal is a voltage signal obtained by binarizing the pixel points of an input feature map.
Illustratively, in an embodiment, a memory bank of the near-sensor architecture includes 64 SRAM cells in a row, meaning that 64 multiplications can be done in parallel and the results accumulated. First, the voltage value V_i output by the sensor is sent to the read bit line of the SRAM array.
S200, determining that the voltage value of the read bit line is stable, and starting a read word line of the SRAM array.
In an embodiment, after the voltage applied to the read bit line has stabilized, the Read Word Line (RWL) of the SRAM is turned on to start reading the weight value W_i pre-stored in the SRAM.
S300, acquiring a weight value of a sensor signal in the read word line, acquiring the sensor signal in the read bit line, and multiplying the sensor signal by the weight value to obtain a first output voltage and outputting the first output voltage to a shared bit line; the first output voltage is a voltage signal of a pixel point in the output characteristic diagram.
In an embodiment, the read bit lines include a first read bit line and a second read bit line (RBL and RBLB), and the step of obtaining the weight value of the sensor signal through the read word line and performing a multiplication operation according to the weight value to obtain a first output voltage output to the shared bit line may include steps S310-S330:
S310, determining that the weight value is a first numerical value, turning off the first transistor, turning on the second transistor, and sending the sensor signal in the first read bit line to the shared bit line.
As shown in fig. 6, when the weight value W_i is 1, i.e., the voltage value of Q is V_GND and the voltage value of QB is V_dd, the transistor N4 (i.e., the second transistor) is turned on and the voltage on RBLB is discharged to ground, while the transistor N3 (i.e., the first transistor) is turned off and the voltage on RBL remains unchanged. The EN_p switch is then closed, and the voltage on RBL is transferred to the V_p shared bit line, completing the multiplication operation 1 × V_i and obtaining the first output voltage.
S320, determining that the weight value is a second numerical value, turning on the first transistor, turning off the second transistor, and sending the sensor signal in the second read bit line to the shared bit line.
As shown in fig. 6, when the weight value W_i is -1, i.e., the voltage value of Q is V_dd and the voltage value of QB is V_GND, the transistor N3 is turned on and the voltage on RBL is completely discharged to ground, while the transistor N4 is turned off and the voltage on RBLB remains unchanged. The EN_n switch is then closed, and the voltage on RBLB is transferred to the V_n shared bit line, completing the multiplication operation -1 × V_i and obtaining the first output voltage.
S330, determining that the weight value is a third numerical value, turning off the first transistor, turning off the second transistor, and sending the sensor signals in the first read bit line and the second read bit line to the shared bit line.
As shown in fig. 6, when the weight value W_i is 0, i.e. the voltage values of Q and QB are both 1/2·V_dd, the transistors N3 and N4 are both turned off and the voltages on RBL and RBLB remain unchanged. The EN_p and EN_n switches are then closed, so the voltage on RBL is transferred to the V_p shared bit line and the voltage on RBLB is transferred to the V_n shared bit line, completing the multiplication operation 0×V_i and obtaining the multiplication result.
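The three weight cases above reduce to a simple selection rule: the ternary weight decides which read-bit-line voltage reaches which shared bit line. The following is a minimal behavioral sketch of that rule; names such as `ternary_multiply` are illustrative and not from the patent, and the model is not circuit-accurate:

```python
def ternary_multiply(weight, v_signal):
    """Behavioral model of one ternary SRAM cell multiplication.

    weight = +1: N4 on, N3 off -> the RBL voltage (the sensor signal)
                 is passed to shared bit line V_p.
    weight = -1: N3 on, N4 off -> the RBLB voltage is passed to V_n.
    weight =  0: both off      -> the signal reaches both shared lines,
                 so it cancels in the difference V_p - V_n.
    Returns the (v_p, v_n) contribution of this cell.
    """
    if weight == 1:
        return (v_signal, 0.0)       # +1 * V_i
    if weight == -1:
        return (0.0, v_signal)       # -1 * V_i, carried on the V_n side
    if weight == 0:
        return (v_signal, v_signal)  # equal contributions, net zero
    raise ValueError("ternary weight must be -1, 0, or +1")
```

The zero-weight case makes the differential scheme clear: both shared lines receive the same voltage, so the subtraction later removes its contribution.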
In an embodiment, the step of converting the first output voltage to obtain second output data and obtaining a convolution operation result from the second output data includes: subtracting the second output data in the second shared bit line from the second output data in the first shared bit line to obtain the convolution operation result.
In this embodiment, although the voltages on the two read bit lines are transferred to the shared bit lines V_p and V_n respectively, their difference is 0, thereby achieving the effect of the 0×V_i multiplication.
In some possible embodiments, after each multiplication result is obtained, the voltage on the read bit lines (RBL, RBLB) is transferred to the shared bit lines V_p and V_n based on capacitive coupling and charge sharing, and the accumulated voltage is stored in the Cp and Cn capacitors: the positive multiplication results on shared bit line V_p are stored on capacitor Cp, and the negative multiplication results on shared bit line V_n are stored on capacitor Cn.
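Under ideal charge sharing, shorting N equal unit capacitors onto a shared bit line conserves total charge, so the settled voltage is the average of the per-multiplication contributions. A minimal sketch of that accumulation onto Cp and Cn, assuming identical unit capacitors and no parasitics (the function name is illustrative):

```python
def share_onto_capacitors(vp_contribs, vn_contribs):
    """Idealized charge-sharing accumulation onto Cp and Cn.

    Shorting N equal unit capacitors onto a shared bit line conserves
    total charge over total capacitance N * C_unit, so the settled
    voltage is the average of the per-multiplication contributions.
    Parasitics and capacitor mismatch are ignored in this sketch.
    """
    n = len(vp_contribs)
    v_p = sum(vp_contribs) / n  # voltage left on Cp (shared line V_p)
    v_n = sum(vn_contribs) / n  # voltage left on Cn (shared line V_n)
    return v_p, v_n
```

Because both lines are scaled by the same 1/N factor, the sign and relative magnitude of the difference V_p - V_n are preserved.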
S400, converting the first output voltage to obtain second output data, and obtaining a convolution operation result from the second output data.
Specifically, in this embodiment, the voltage value on each shared bit line can be converted into a binary digital result by an analog-to-digital converter (ADC), and the two results are then subtracted to obtain the final result of the convolution operation:
OUT = Σ_{i=1}^{N} W_i · V_i
where OUT is the result of the convolution operation, W_i is the weight value, V_i is the voltage value of the sensor signal, N is the number of SRAM arrays, i = 1, 2, 3, …, N, and N is a positive integer.
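The formula above can be checked behaviorally: summing the per-cell contributions on V_p and V_n and subtracting reproduces OUT = Σ W_i·V_i. The sketch below omits the common 1/N charge-sharing scale factor, and performs the subtraction in software, whereas the circuit subtracts after ADC conversion; function and variable names are illustrative:

```python
def convolution_out(weights, signals):
    """Check the differential readout against the direct dot product.

    Per-cell contributions are summed on V_p and V_n and subtracted,
    which reproduces OUT = sum_i W_i * V_i for ternary weights in
    {-1, 0, +1}. Zero weights put equal charge on both lines and
    cancel in the difference.
    """
    v_p = v_n = 0.0
    for w, v in zip(weights, signals):
        if w == 1:
            v_p += v
        elif w == -1:
            v_n += v
        else:        # w == 0: equal charge on both lines, cancels
            v_p += v
            v_n += v
    out = v_p - v_n
    direct = sum(w * v for w, v in zip(weights, signals))
    assert abs(out - direct) < 1e-9  # the two formulations agree
    return out
```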
On the other hand, the technical scheme of the application also provides a data processing device; it includes:
at least one processor; at least one memory for storing at least one program; when the at least one program is executed by the at least one processor, the at least one processor is caused to execute a data processing method as in the first aspect.
An embodiment of the present invention further provides a storage medium, where a corresponding execution program is stored, and the program is executed by a processor, so as to implement the data processing method in the first aspect.
From the above implementation process, it can be concluded that, compared with the prior art, the technical solution provided by the present invention has the following advantages or beneficial effects:
The technical scheme of the application provides a sensor-oriented computing-in-memory (CIM) convolution acceleration architecture. First, at the input layer, the data processed by the CIM architecture is input directly from the sensor, without a chain of analog-to-digital and digital-to-analog conversions, which greatly reduces the hardware overhead, power consumption, and latency of data transmission. Second, the scheme adopts an SRAM capable of storing three weight values, which effectively improves the accuracy of the neural network algorithm. In addition, the CIM structure breaks through the bottleneck of the traditional von Neumann architecture: the multiply-accumulate (MAC) operations are completed directly in the SRAM, which reduces the latency and power consumption caused by data movement. This sensor-oriented computing-in-memory convolution acceleration architecture is well suited as a hardware implementation of the convolution operations of a neural network algorithm: it reduces the cost of the hardware implementation, including power consumption and hardware overhead, while meeting high-performance requirements such as low latency and high bandwidth.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise specified to the contrary, one or more of the functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be understood that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer given the nature, function, and interrelationships of the modules. Accordingly, those of ordinary skill in the art will be able to practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is to be determined from the appended claims along with their full scope of equivalents.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A data processing method, comprising the steps of:
acquiring a sensor signal and sending the sensor signal to a read bit line of an SRAM array, wherein the sensor signal is a voltage signal obtained by converting a pixel point of an input feature map through binarization processing;
determining that the voltage value of the read bit line is stable, and starting a read word line of the SRAM array;
acquiring a weight value of a sensor signal in the read word line, acquiring the sensor signal in the read bit line, and multiplying the sensor signal by the weight value to obtain a first output voltage and outputting the first output voltage to a shared bit line; the first output voltage is a voltage signal of a pixel point in the output feature map;
and converting the first output voltage to obtain second output data, and obtaining a convolution operation result from the second output data.
2. The data processing method of claim 1, wherein the read bit lines comprise a first read bit line and a second read bit line, the obtaining a weight value of the sensor signal in the read word line, obtaining the sensor signal in the read bit line, and performing a multiplication operation with the sensor signal according to the weight value to obtain a first output voltage to be output to the shared bit line comprises:
determining that the weight value is a first numerical value, so that the first transistor is turned off, the second transistor is turned on, and the sensor signal in the first read bit line is sent to the shared bit line;
determining the weight value to be a second value, enabling the first transistor to be turned on, enabling the second transistor to be turned off, and sending the sensor signal in the second read bit line to the shared bit line;
and determining that the weight value is a third value, turning off the first transistor, turning off the second transistor, and sending the sensor signals in the first read bit line and the second read bit line to the shared bit line.
3. The data processing method according to claim 2, wherein the step of converting the first output voltage to obtain second output data and obtaining the convolution operation result from the second output data comprises:
and subtracting the second output data in the first shared bit line from the second output data in the second shared bit line to obtain the convolution operation result.
4. A data processing method according to claim 3, characterized in that it further comprises the steps of:
storing a first output voltage which is obtained by the multiplication operation and has a positive value to a first capacitor;
And storing the first output voltage which is obtained by the multiplication operation and has a negative value into the second capacitor.
5. A data processing method according to claim 1, wherein the convolution operation result satisfies the following formula:
OUT = Σ_{i=1}^{N} W_i · V_i
wherein OUT is the result of the convolution operation, W_i is the weight value, V_i is the voltage value of the sensor signal, N is the number of SRAM arrays, i = 1, 2, 3, …, N, and N is a positive integer.
6. A data processing system, characterized in that the system comprises:
the sensor storage array is used for acquiring a sensor signal and sending the sensor signal to a read bit line of the SRAM array;
the sensor signal is a voltage signal obtained by converting a pixel point of an input feature map through binarization processing;
the reading calculation circuit is used for determining that the voltage value of the read bit line is stable and starting a read word line of the SRAM array; acquiring a weight value of a sensor signal in the read word line, acquiring the sensor signal in the read bit line, and performing a multiplication operation on the sensor signal according to the weight value to obtain a first output voltage and outputting the first output voltage to a shared bit line; the first output voltage is a voltage signal of a pixel point in the output feature map;
And the analog-to-digital conversion circuit is used for converting the first output voltage to obtain second output data, and subtracting the second output data to obtain a convolution operation result.
7. A data processing system according to claim 6, wherein said sensor storage array is a ternary SRAM; the ternary static random access memory comprises a first ternary inverter and a second ternary inverter, wherein the output end of the first ternary inverter is connected to the input end of the second ternary inverter; the output end of the second ternary inverter is connected to the input end of the first ternary inverter.
8. The data processing system of claim 7, wherein the first ternary inverter comprises a thin gate NMOS transistor, a thick gate NMOS transistor, a thin gate PMOS transistor, and a thick gate PMOS transistor;
when the input end of the first ternary inverter is a high level signal, the thick gate NMOS transistor and the thin gate NMOS transistor are turned on, and the output end of the first ternary inverter is a low level signal;
when the input end of the first ternary inverter is a low level signal, the thick gate PMOS transistor and the thin gate PMOS transistor are turned on, and the output end of the first ternary inverter is a high level signal;
when the signal amplitude at the input end of the first ternary inverter is half of the high level signal amplitude, the thin gate NMOS transistor and the thin gate PMOS transistor are turned on, and the signal amplitude at the output end of the first ternary inverter is half of the high level signal amplitude;
the second ternary inverter has the same structure as the first ternary inverter.
9. A data processing apparatus, characterized by comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to carry out a data processing method according to any one of claims 1 to 6.
10. A storage medium in which a processor-executable program is stored, wherein the processor-executable program, when executed by a processor, is adapted to perform a data processing method according to any one of claims 1 to 6.
CN202210276205.7A 2022-03-21 2022-03-21 Data processing method, system, device and medium Pending CN114758699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210276205.7A CN114758699A (en) 2022-03-21 2022-03-21 Data processing method, system, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210276205.7A CN114758699A (en) 2022-03-21 2022-03-21 Data processing method, system, device and medium

Publications (1)

Publication Number Publication Date
CN114758699A true CN114758699A (en) 2022-07-15

Family

ID=82327332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210276205.7A Pending CN114758699A (en) 2022-03-21 2022-03-21 Data processing method, system, device and medium

Country Status (1)

Country Link
CN (1) CN114758699A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115860079A (en) * 2023-01-30 2023-03-28 深圳市九天睿芯科技有限公司 Neural network acceleration device, method, chip, electronic device, and storage medium
WO2024144172A1 (en) * 2022-12-26 2024-07-04 울산과학기술원 Memory device comprising ternary memory cell and performing ternary operation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040083421A1 (en) * 2002-10-29 2004-04-29 Richard Foss Method and circuit for error correction in CAM cells
US20180158515A1 (en) * 2016-12-07 2018-06-07 Ningbo University Ternary sense amplifier and sram array realized by the ternary sense amplifier
US20180315473A1 (en) * 2017-04-28 2018-11-01 Arizona Board Of Regents On Behalf Of Arizona State University Static random access memory (sram) cell and related sram array for deep neural network and machine learning applications
US20190087719A1 (en) * 2017-09-21 2019-03-21 Arizona Board Of Regents On Behalf Of Arizona State University Static random-access memory for deep neural networks
US20210089272A1 (en) * 2019-09-25 2021-03-25 Purdue Research Foundation Ternary in-memory accelerator

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040083421A1 (en) * 2002-10-29 2004-04-29 Richard Foss Method and circuit for error correction in CAM cells
US20180158515A1 (en) * 2016-12-07 2018-06-07 Ningbo University Ternary sense amplifier and sram array realized by the ternary sense amplifier
US20180315473A1 (en) * 2017-04-28 2018-11-01 Arizona Board Of Regents On Behalf Of Arizona State University Static random access memory (sram) cell and related sram array for deep neural network and machine learning applications
US20190087719A1 (en) * 2017-09-21 2019-03-21 Arizona Board Of Regents On Behalf Of Arizona State University Static random-access memory for deep neural networks
US20210089272A1 (en) * 2019-09-25 2021-03-25 Purdue Research Foundation Ternary in-memory accelerator

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZENG JIANMIN; ZHANG ZHANG; YU ZHIYI; ET AL.: "Application of an SRAM-based general-purpose computing-in-memory architecture platform in the Internet of Things", Journal of Electronics & Information Technology, vol. 43, no. 6, 23 June 2021 (2021-06-23), pages 1574 - 1586 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024144172A1 (en) * 2022-12-26 2024-07-04 울산과학기술원 Memory device comprising ternary memory cell and performing ternary operation
CN115860079A (en) * 2023-01-30 2023-03-28 深圳市九天睿芯科技有限公司 Neural network acceleration device, method, chip, electronic device, and storage medium
CN115860079B (en) * 2023-01-30 2023-05-12 深圳市九天睿芯科技有限公司 Neural network acceleration device, method, chip, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110807519B (en) Parallel acceleration method, processor, and device for memristor-based neural network
Welser et al. Future computing hardware for AI
Choi et al. An energy-efficient deep convolutional neural network training accelerator for in situ personalization on smart devices
KR102207909B1 (en) Computation in memory apparatus based on bitline charge sharing and operating method thereof
WO2023056779A1 (en) Computing-in-memory edram accelerator for convolutional neural network
CN108090565A (en) Accelerated method is trained in a kind of convolutional neural networks parallelization
CN111105023A (en) Data stream reconstruction method and reconfigurable data stream processor
US20230168891A1 (en) In-memory computing processor, processing system, processing apparatus, deployment method of algorithm model
CN115879530B (en) A method for array structure optimization of RRAM in-memory computing system
CN115910152A (en) Charge domain memory calculation circuit and calculation circuit with positive and negative number operation function
JP2024525332A (en) Computation-in-Memory (CIM) Architecture and Data Flow Supporting Deeply Convolutional Neural Networks (CNN)
CN114758699A (en) Data processing method, system, device and medium
US20210150311A1 (en) Data layout conscious processing in memory architecture for executing neural network model
CN117795473A (en) Partial and managed and reconfigurable systolic flow architecture for in-memory computation
US11705171B2 (en) Switched capacitor multiplier for compute in-memory applications
Yang et al. A parallel processing cnn accelerator on embedded devices based on optimized mobilenet
Song et al. Hardware for deep learning acceleration
Hu et al. A co-designed neuromorphic chip with compact (17.9 KF 2) and weak neuron number-dependent neuron/synapse modules
CN117765334A (en) Image classification method, system, equipment and medium based on depth binary neural network model
CN110717580B (en) Calculation array based on voltage modulation and oriented to binarization neural network
Tabrizchi et al. APRIS: Approximate Processing ReRAM In-Sensor Architecture Enabling Artificial-Intelligence-Powered Edge
CN110363292A (en) A Mixed Signal Binary CNN Processor
Ma et al. HPA: A hybrid data flow for PIM architectures
Lei et al. Low power AI ASIC design for portable edge computing
CN117037877A (en) Memory computing chip based on NOR Flash and control method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination