Disclosure of Invention
In view of this, the present invention provides a software-definable integrated memory chip, a method and an electronic device. A plurality of flash memory processing subarrays, a plurality of programmable arithmetic operation units and a control module cooperate so that the circuit structure of the chip is dynamically configured according to actual application requirements and can be flexibly adjusted according to the actual task. Peripheral circuits such as the ADC, the DAC, the registers and the programmable arithmetic operation units can be multiplexed, thereby reducing the circuit area and meeting the needs of integration and miniaturization.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, a software-definable memory chip is provided, comprising a flash memory processing array, a programmable arithmetic operation module, and a control module coupled to the flash memory processing array and the programmable arithmetic operation module, wherein:
the flash memory processing array comprises a plurality of flash memory processing subarrays for respectively executing different analog vector-matrix multiplication operations;
the programmable arithmetic operation module includes a plurality of programmable arithmetic operation units for respectively implementing different arithmetic operations;
the control module carries out combined configuration on a plurality of flash memory processing subarrays and a plurality of programmable arithmetic operation units according to configuration information, and realizes dynamic configuration of a circuit structure in a chip.
Further, the software-definable memory chip further includes:
an input interface module for receiving external input data;
an input register file, connected with the input interface module and used for storing the external input data or data to be processed;
a digital-to-analog conversion module, whose input end is connected with the input register file and whose output end is connected with the flash memory processing array, used for converting the external input data or the data to be processed into analog signals and outputting the analog signals to the flash memory processing array, wherein the flash memory processing array performs an analog vector-matrix multiplication operation on the analog signals and outputs the operation result;
an analog-to-digital conversion module, whose input end is connected with the flash memory processing array and whose output end is connected with the programmable arithmetic operation module, used for converting the analog vector-matrix multiplication result into a digital signal and outputting the digital signal to the programmable arithmetic operation module, wherein the programmable arithmetic operation module performs an arithmetic operation on the digital signal and outputs an arithmetic operation result;
an output register file, connected with the programmable arithmetic operation module and the input register file, used for temporarily storing the arithmetic operation result and either outputting it or passing it to the input register file as the data to be processed;
an output interface module, connected with the output register file, which receives output data of the output register file and outputs the output data externally;
wherein the control module is connected with the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module and is used for dynamically configuring these circuit modules according to actual application requirements.
Further, the output end of the input register file is also connected with the programmable arithmetic operation module.
Further, a plurality of the programmable arithmetic operation units are connected in series, each of the programmable arithmetic operation units including a demultiplexer, an arithmetic operation subunit, and a multiplexer;
the input end of the demultiplexer is connected with the previous programmable arithmetic operation unit or the analog-to-digital conversion module, one output end of the demultiplexer is connected with the arithmetic operation subunit, the other output end of the demultiplexer and the output end of the arithmetic operation subunit are connected with the next programmable arithmetic operation unit or the output register file through the multiplexer, and the control end of the demultiplexer is connected with the control module.
Further, the software-definable memory chip further comprises a programming circuit connected with the control module, wherein the programming circuit is connected with the source, the gate and/or the substrate of each flash memory cell in the flash memory processing array and is used for regulating the threshold voltage of the flash memory cell;
the programming circuit includes a voltage generating circuit for generating a programming voltage or an erasing voltage and a voltage control circuit for applying the programming voltage to a selected flash memory cell.
Further, the software-definable memory chip further includes:
a row-column decoder, connected with the flash memory processing array and the control module and used for decoding the rows and the columns of the flash memory processing array under the control of the control module.
Further, the control module dynamically configures each circuit module connected with the control module according to configuration information, wherein the configuration information comprises configuration information of a flash memory processing subarray, configuration information of a programmable arithmetic operation unit, configuration information of a digital-to-analog conversion module, configuration information of an analog-to-digital conversion module, configuration information of an input interface module, configuration information of an output interface module, configuration information of an input register file and configuration information of an output register file, and the dynamically configuring each circuit module connected with the control module according to the configuration information comprises:
dividing the flash memory processing array into a plurality of flash memory processing subarrays according to the configuration information of the flash memory processing subarrays, and controlling the working time sequence of the plurality of flash memory processing subarrays;
controlling the working states of the demultiplexer and the multiplexer corresponding to each programmable arithmetic unit according to the configuration information of the programmable arithmetic unit, so that the programmable arithmetic units realize any combination operation;
controlling the on-off state of a digital-to-analog conversion circuit participating in an actual task according to the configuration information of the digital-to-analog conversion module;
Controlling the on-off state of an analog-digital conversion circuit participating in an actual task according to the configuration information of the analog-digital conversion module;
controlling the switching state of an input interface circuit participating in an actual task according to the configuration information of the input interface module;
controlling the switching state of an output interface circuit participating in an actual task according to the configuration information of the output interface module;
controlling, according to the configuration information of the input register file, whether the data to be stored in the input register file comes from the input data of the input interface module or from the data to be processed in the output register file;
and controlling, according to the configuration information of the output register file, whether the output register file outputs its data externally or passes it to the input register file as data to be processed.
In a second aspect, a software defining method of a software-definable integrated memory chip is provided, and is applied to the software-definable integrated memory chip, where the software defining method includes:
acquiring configuration information and finite state machine information;
configuring an input interface module, an input register file, a digital-to-analog conversion module, a flash memory processing array, an analog-to-digital conversion module, an output register file, a programmable arithmetic operation module and an output interface module according to the configuration information, thereby realizing dynamic configuration of the circuit structure in the chip;
and controlling the working time sequence of the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module according to the finite state machine information.
Further, the software defining method includes:
Dividing the flash memory processing array into a plurality of flash memory processing subarrays according to the configuration information of the flash memory processing subarrays, and controlling the working time sequence of the plurality of flash memory processing subarrays according to the finite state machine information;
The working states of the selectors corresponding to the programmable arithmetic units are controlled according to the configuration information of the programmable arithmetic units, so that the programmable arithmetic units realize arbitrary combination operation, and the working time sequences of the programmable arithmetic units are controlled according to the finite state machine information.
In a third aspect, an electronic device is provided that includes the software-definable memory chip described above.
The invention provides a software-definable integrated memory chip, a method and an electronic device. The flash memory processing array of the chip comprises a plurality of flash memory processing subarrays for respectively executing different analog vector-matrix multiplication operations, and the programmable arithmetic operation module comprises a plurality of programmable arithmetic operation units for respectively realizing different arithmetic operations. The control module performs combined configuration of the modules of the chip according to the configuration information and the finite state machine information of the actual application, thereby dynamically configuring the circuit structure in the chip, so that the chip can flexibly adjust its circuit structure according to the actual task. Peripheral circuits such as the ADC (analog-to-digital converter), the DAC (digital-to-analog converter), the registers and the programmable arithmetic operation units can be multiplexed, which reduces the circuit area, meets the needs of integration and miniaturization, and effectively reduces the cost of the chip.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments, as illustrated in the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Once the existing integrated chip architecture is customized, the circuit structure is fixed, flexible adjustment cannot be performed according to actual tasks, and circuit modules cannot be shared, so that the circuit area is large.
In order to solve the above-mentioned problems in the prior art, the embodiments of the invention provide a software-definable integrated memory chip, a method and an electronic device. The flash memory processing array of the chip includes a plurality of flash memory processing subarrays for respectively executing different analog vector-matrix multiplication operations, and the programmable arithmetic operation module includes a plurality of programmable arithmetic operation units for respectively implementing different arithmetic operations. The control module performs combined configuration of the modules of the chip according to the configuration information and the finite state machine information of the actual application, so as to dynamically configure the circuit structure in the chip and let the chip flexibly adjust its circuit structure according to the actual task. Peripheral circuits such as the ADC, the DAC, the registers and the programmable arithmetic operation units can be multiplexed, thereby reducing the circuit area and meeting the needs of integration and miniaturization.
FIG. 1 is a block diagram of a software-definable memory chip according to an embodiment of the present invention. As shown in fig. 1, the software-definable memory chip includes a flash memory processing array 20, a programmable arithmetic operation module 30, and a control module 10 coupled to the flash memory processing array 20 and the programmable arithmetic operation module 30.
The flash processing array 20 includes a plurality of flash processing sub-arrays (not shown in fig. 1) for respectively performing different analog vector-matrix multiplication operations.
The multiple flash memory processing subarrays may be flash memory processing subarrays with the same structure, or the structures of the flash memory processing subarrays may be set to be different according to actual application requirements, for example, the number of rows and the number of columns of the flash memory processing subarrays may be set according to the actual application requirements, which is not limited in the embodiment of the present invention.
The programmable arithmetic operation module 30 includes a plurality of programmable arithmetic operation units (not shown in fig. 1) for respectively realizing different arithmetic operations.
The programmable arithmetic unit is implemented in hardware for performing specific arithmetic operations.
The arithmetic operation comprises one or a combination of a plurality of multiplication operation, addition operation, subtraction operation, division operation, shift operation, activation function, maximum value taking, minimum value taking, average value taking, pooling and the like.
The control module 10 performs combined configuration on an input interface module, an input register file, a digital-to-analog conversion module, a flash memory processing array, an analog-to-digital conversion module, an output register file, a programmable arithmetic operation module and an output interface module in the chip according to configuration information and finite state machine information, so as to realize dynamic configuration of a circuit structure in the chip.
The configuration information and the finite state machine information can be obtained through a compiling tool according to actual application requirements.
The configuration information is typically static: it specifies, for example, the state of each module participating in the task and the configured size of each unit, is typically stored in memory, and is applied before the task is executed. The finite state machine information, by contrast, is typically dynamic and controls the timing and state of the actual task while it is running.
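As a rough illustration of this distinction, the sketch below models configuration information as a static record fixed before the task starts, and finite state machine information as a transition table that is stepped through at run time. The field names, the state names and the `run_fsm` helper are all hypothetical; the source does not specify any encoding for either kind of information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigInfo:
    """Static settings fixed before the task runs (hypothetical fields)."""
    active_subarrays: tuple    # indices of sub-arrays taking part in the task
    subarray_shape: tuple      # (rows, columns) configured for each sub-array
    active_arith_units: tuple  # indices of arithmetic units taking part


# Finite state machine information as a state -> next-state table.
# The state names are assumptions chosen to mirror the chip's data path.
FSM_TRANSITIONS = {
    "LOAD": "DAC",
    "DAC": "VMM",
    "VMM": "ADC",
    "ADC": "ARITH",
    "ARITH": "STORE",
    "STORE": "DONE",
}


def run_fsm(start="LOAD"):
    """Walk the transition table and return the sequence of visited states."""
    state, trace = start, [start]
    while state in FSM_TRANSITIONS:
        state = FSM_TRANSITIONS[state]
        trace.append(state)
    return trace


config = ConfigInfo(active_subarrays=(0, 2), subarray_shape=(128, 64),
                    active_arith_units=(1,))
trace = run_fsm()
```

The frozen dataclass reflects that configuration does not change during a task, whereas the trace produced by `run_fsm` stands in for the run-time sequencing.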
Specifically, the control module 10 performs a combined configuration on the plurality of flash memory processing sub-arrays and the plurality of programmable arithmetic units according to the configuration information, selects the flash memory processing sub-arrays and the programmable arithmetic units that are put into operation, and controls a combined pairing manner of the flash memory processing sub-arrays and the programmable arithmetic units to implement a specific operation.
It can be understood that each of the plurality of programmable arithmetic units may implement one or more kinds of arithmetic operations, and the plurality of programmable arithmetic units may be arranged to combine a plurality of kinds of complex operations, and cooperate with the plurality of flash memory processing sub-arrays, thereby implementing a plurality of kinds of combination configurations and further implementing complex operation functions.
As can be seen from the above description, in the software-definable integrated memory chip provided in the embodiments of the present invention, the flash memory processing array includes a plurality of flash memory processing subarrays for respectively performing different analog vector-matrix multiplication operations, and the programmable arithmetic operation module includes a plurality of programmable arithmetic operation units for respectively implementing different arithmetic operations. The control module performs combined configuration of the plurality of flash memory processing subarrays and the plurality of programmable arithmetic operation units according to the configuration information, so as to dynamically configure the chip architecture. The chip architecture can therefore be flexibly adjusted according to actual tasks and can implement a variety of complex operation functions. Peripheral circuits such as the ADC, the DAC, the registers and the programmable arithmetic operation units can be multiplexed, thereby reducing the circuit area and meeting the needs of integration and miniaturization.
In an alternative embodiment, referring to FIG. 2, the software-definable unified memory chip may further include an input interface module 40, an input register file 50, a digital-to-analog conversion module 60, an analog-to-digital conversion module 70, an output register file 80, and an output interface module 90.
The input terminal of the input interface module 40 is connected to an external device, and is used for receiving input data (i.e. data requiring operation) from the external device.
The input end of the input register file 50 is connected to the output end of the input interface module 40, and is used for temporarily storing the input data or data to be processed.
The input end of the digital-to-analog conversion module 60 is connected to the output end of the input register file 50, and the output end is connected to the input end of the flash memory processing array 20, so as to convert the external input data or the data to be processed output from the input register file 50 into analog signals and output the analog signals to the flash memory processing array 20, and the flash memory processing array 20 performs analog vector-matrix multiplication operation on the analog signals and outputs the analog vector-matrix multiplication operation result.
The input end of the analog-to-digital conversion module 70 is connected to the flash memory processing array 20, the output end is connected to the programmable arithmetic operation module 30, and the analog-to-digital conversion module is used for converting the analog vector-matrix multiplication result into a digital signal and outputting the digital signal to the programmable arithmetic operation module 30, and the programmable arithmetic operation module 30 performs arithmetic operation on the digital signal and outputs an arithmetic operation result.
The output register file 80 has an input terminal connected to the programmable arithmetic operation module 30 and an output terminal connected to the input register file 50, and is used for temporarily storing the arithmetic operation result and outputting the arithmetic operation result or outputting the arithmetic operation result as the data to be processed to the input register file 50.
An input terminal of the output interface module 90 is connected to an output terminal of the output register file 80, receives output data of the output register file 80, and outputs the output data to an external device.
The control module 10 is connected to the input interface module 40, the input register file 50, the digital-to-analog conversion module 60, the flash processing array 20, the analog-to-digital conversion module 70, the output register file 80, the programmable arithmetic operation module 30 and the output interface module 90, and is configured to dynamically configure the above circuit modules according to configuration information.
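The data path described above can be sketched as a chain of small functions, one per module. This is only a behavioural illustration under stated assumptions: the 8-bit resolution, the reference voltage, the ADC full-scale value, the example weights and the offset-and-clamp arithmetic step are all hypothetical and are not taken from the source.

```python
def dac(codes, v_ref=1.0, bits=8):
    """Digital-to-analog conversion: unsigned codes to voltages in [0, v_ref]."""
    max_code = (1 << bits) - 1
    return [v_ref * code / max_code for code in codes]


def flash_vmm(voltages, weights):
    """Analog vector-matrix multiplication: one summed output per column."""
    n_cols = len(weights[0])
    return [sum(v * row[j] for v, row in zip(voltages, weights))
            for j in range(n_cols)]


def adc(values, full_scale, bits=8):
    """Analog-to-digital conversion: quantize [0, full_scale] to integer codes."""
    max_code = (1 << bits) - 1
    return [max(0, min(max_code, round(max_code * x / full_scale)))
            for x in values]


def arithmetic(codes, offset):
    """Example digital post-processing: subtract an offset, then clamp at zero."""
    return [max(0, code - offset) for code in codes]


# Contents of the input register file (assumed 8-bit codes).
input_register_file = [0, 255]
# Equivalent analog weights stored in the flash cells (assumed values).
weights = [[1.0, 0.5], [1.0, 2.0]]

analog_in = dac(input_register_file)
analog_out = flash_vmm(analog_in, weights)
digital = adc(analog_out, full_scale=3.0)
output_register_file = arithmetic(digital, offset=100)
```

Each function corresponds to one module in fig. 2, so reordering or skipping a call mirrors what the control module does when it reconfigures the data path.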
The control module 10 dynamically configures each circuit module connected to it according to the configuration information. The configuration information includes configuration information of the flash memory processing subarrays 20_1 to 20_n, configuration information of the programmable arithmetic operation units 30_1 to 30_n, configuration information of the digital-to-analog conversion module 60, configuration information of the analog-to-digital conversion module 70, configuration information of the input interface module 40, configuration information of the output interface module 90, configuration information of the input register file 50, and configuration information of the output register file 80. Dynamically configuring each circuit module connected to the control module according to the configuration information may include:
The flash memory processing array 20 is divided into a plurality of flash memory processing subarrays 20_1 to 20_n according to the configuration information of the flash memory processing subarrays 20_1 to 20_n, and the operation timing of the plurality of flash memory processing subarrays 20_1 to 20_n is controlled.
According to the configuration information of the programmable arithmetic operation units 30_1 to 30_n, the working state of the selectors corresponding to each programmable arithmetic operation unit is controlled, so that the plurality of programmable arithmetic operation units can take part in the work in any combination.
Controlling the on-off state of a digital-to-analog conversion circuit participating in an actual task according to the configuration information of the digital-to-analog conversion module 60;
Controlling the on-off state of an analog-digital conversion circuit participating in an actual task according to the configuration information of the analog-digital conversion module 70;
controlling the on-off state of an input interface circuit participating in an actual task according to the configuration information of the input interface module 40;
Controlling the switching state of an output interface circuit participating in an actual task according to the configuration information of the output interface module 90;
controlling, according to the configuration information of the input register file 50, whether the data to be stored in the input register file 50 comes from the input data of the input interface module 40 or from the data to be processed in the output register file 80;
controlling, according to the configuration information of the output register file 80, whether the output register file 80 outputs its data externally or passes it to the input register file 50 as data to be processed.
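The first of these steps, dividing the flash memory processing array into subarrays, can be illustrated with a simple tiling routine. The sketch assumes uniform rectangular tiles described by a hypothetical `(sub_rows, sub_cols)` pair; the source also allows subarrays of differing sizes, which this sketch does not cover.

```python
def partition_array(n_rows, n_cols, sub_rows, sub_cols):
    """Return (row_slice, col_slice) pairs tiling an n_rows x n_cols array.

    Tiles at the right and bottom edges are clipped to the array bounds.
    """
    tiles = []
    for r in range(0, n_rows, sub_rows):
        for c in range(0, n_cols, sub_cols):
            tiles.append((slice(r, min(r + sub_rows, n_rows)),
                          slice(c, min(c + sub_cols, n_cols))))
    return tiles


# A 4 x 6 array split into 2 x 3 sub-arrays gives four tiles.
tiles = partition_array(4, 6, 2, 3)
```

Each returned pair can be used to index the rows and columns that belong to one subarray.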
Specifically, the input of the input register file 50 is connected to the output of the input interface module 40 and the output of the output register file 80 through a multiplexer (MUX) 100 to selectively receive external input data from the input interface module 40 or data to be processed from the output register file 80. The control module 10 is connected to the multiplexer 100 and controls the multiplexer 100 according to the configuration information, thereby controlling whether the input register file 50 receives the external input data or the data to be processed.
The digital-to-analog conversion module 60 is selectively coupled to the plurality of flash memory processing subarrays (20_1 to 20_n) via a demultiplexer (DEMUX) 120. The control module 10 is connected to the demultiplexer 120 and controls the demultiplexer 120 according to the configuration information, so as to select which flash memory processing subarray participates in the operation.
The outputs of the flash memory processing subarrays (20_1 to 20_n) are coupled to the analog-to-digital conversion module 70 via a multiplexer 130. The control module 10 is connected to the multiplexer 130 and controls the multiplexer 130 according to the configuration information, so as to select which flash memory processing subarray output is connected to the input terminal of the analog-to-digital conversion module 70, i.e. the output of the flash memory processing subarray participating in the operation is connected to the input terminal of the analog-to-digital conversion module 70.
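The routing through the demultiplexer 120 and the multiplexer 130 can be sketched as follows. The `demux` and `mux` helpers, the use of `None` for an idle path and the example weights are assumptions made for illustration; the sketch only shows that a single select value steers both the input and the output side toward the same subarray.

```python
def demux(value, select, n_outputs):
    """Route value to output number `select`; the other outputs stay idle."""
    return [value if i == select else None for i in range(n_outputs)]


def mux(inputs, select):
    """Forward the input number `select` to the single output."""
    return inputs[select]


def vmm(voltages, weights):
    """Vector-matrix multiplication: one summed output per column."""
    n_cols = len(weights[0])
    return [sum(v * row[j] for v, row in zip(voltages, weights))
            for j in range(n_cols)]


# Two sub-arrays, each with two rows and one column (assumed weights).
subarrays = [
    [[1.0], [2.0]],
    [[3.0], [4.0]],
]

select = 1  # taken from the configuration information
routed = demux([1.0, 1.0], select, len(subarrays))
outputs = [vmm(v, w) if v is not None else None
           for v, w in zip(routed, subarrays)]
to_adc = mux(outputs, select)
```

Only the selected subarray receives input and produces a result, which is why a single DAC and a single ADC can be shared among all subarrays.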
An input of the programmable arithmetic operation module 30 is connected to an output of the demultiplexer 110 and an output of the analog-to-digital conversion module 70 through a multiplexer 140.
The programmable arithmetic operation units 30_1 to 30_n of the programmable arithmetic operation module 30 are connected in series, and each of them includes a demultiplexer 30a, an arithmetic operation subunit 30b and a multiplexer 30c, see fig. 3.
The input end of the demultiplexer 30a is connected to the previous programmable arithmetic operation unit or to the analog-to-digital conversion module 70. One output end of the demultiplexer 30a is connected to the arithmetic operation subunit 30b; the output end of the arithmetic operation subunit 30b and the other output end of the demultiplexer 30a are connected, through the multiplexer 30c, to the next programmable arithmetic operation unit or to the output register file 80. In addition, the control ends of the demultiplexer 30a and the multiplexer 30c are connected to the control module 10.
Specifically, the input terminal of the demultiplexer in the first programmable arithmetic operation unit 30_1 is connected to the output terminal of the analog-to-digital conversion module 70, one of its output terminals is connected to the input terminal of the arithmetic operation subunit in the first programmable arithmetic operation unit 30_1, its other output terminal and the output terminal of the arithmetic operation subunit are connected to the input terminal of the second programmable arithmetic operation unit 30_2 through a multiplexer, and the control terminals of the demultiplexer and the multiplexer are connected to the control module 10.
The input terminal of the demultiplexer in the second programmable arithmetic operation unit 30_2 is connected to the output terminal of the first programmable arithmetic operation unit 30_1, one of its output terminals is connected to the input terminal of the arithmetic operation subunit in the second programmable arithmetic operation unit 30_2, its other output terminal and the output terminal of the arithmetic operation subunit are connected to the input terminal of the third programmable arithmetic operation unit 30_3 through a multiplexer, and the control terminals of the demultiplexer and the multiplexer are connected to the control module 10. And so on, up to the nth programmable arithmetic operation unit 30_n: the input terminal of the demultiplexer in the nth programmable arithmetic operation unit 30_n is connected to the output terminal of the (n-1)th programmable arithmetic operation unit 30_(n-1), one of its output terminals is connected to the input terminal of the arithmetic operation subunit in the nth programmable arithmetic operation unit 30_n, its other output terminal and the output terminal of the arithmetic operation subunit are connected to the input terminal of the output register file 80 through a multiplexer, and the control terminals of the demultiplexer and the multiplexer are connected to the control module 10.
The control module 10 is connected to the demultiplexer and the multiplexer in each programmable arithmetic operation unit and controls them according to the configuration information to select whether the arithmetic operation subunit in that programmable arithmetic operation unit participates in the operation. In this way the plurality of programmable arithmetic operation units can be arranged and combined to realize different complex operations, and the arithmetic operation functions can be flexibly configured.
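The series connection with per-unit bypass can be modelled in a few lines. In this sketch each unit either applies its operation or passes the value through unchanged, which stands in for the demultiplexer and multiplexer pair selecting the subunit path or the bypass path. The three example operations and the `enables` tuple are hypothetical.

```python
def make_unit(operation):
    """Build one programmable arithmetic operation unit (behavioural model)."""
    def unit(value, enabled):
        # enabled=True: the DEMUX feeds the subunit and the MUX takes its output.
        # enabled=False: the DEMUX and MUX select the bypass path.
        return operation(value) if enabled else value
    return unit


# Three units in series (assumed operations): scale, add bias, clamp at zero.
units = [
    make_unit(lambda x: x * 2),
    make_unit(lambda x: x + 3),
    make_unit(lambda x: max(x, 0)),
]


def run_chain(value, enables):
    """Pass value through every unit in order, honouring each enable bit."""
    for unit, enabled in zip(units, enables):
        value = unit(value, enabled)
    return value


scaled_then_clamped = run_chain(-5, (True, False, True))
scaled_then_biased = run_chain(4, (True, True, False))
all_bypassed = run_chain(4, (False, False, False))
```

With n units the enable bits give 2^n subsets of operations, applied in the fixed series order.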
In an alternative embodiment, each of the programmable arithmetic operation units may include a plurality of arithmetic operators disposed side by side, such as one or more of a multiplier, an adder, a subtractor, a divider, a shifter, an activation function, a maximum operator, a minimum operator, an average operator, and a pooler. The arithmetic operators are connected in parallel: their inputs are respectively connected to the outputs of the corresponding demultiplexer, and their outputs are respectively connected to the inputs of the corresponding multiplexer, see fig. 4.
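A bank of parallel operators selected by a single control value can be sketched as a lookup table. The operator names and the `arithmetic_subunit` function are hypothetical, and only three of the operators listed above are modelled.

```python
# Parallel operators inside one unit; the key plays the role of the
# select signal driving the demultiplexer and the multiplexer.
OPERATORS = {
    "max": max,
    "min": min,
    "mean": lambda values: sum(values) / len(values),
}


def arithmetic_subunit(values, select):
    """Apply the operator chosen by `select` to the input values."""
    return OPERATORS[select](values)


largest = arithmetic_subunit([1, 5, 3], "max")
average = arithmetic_subunit([1, 5, 3], "mean")
```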
The process by which the programmable arithmetic operation module performs the compound operation is shown in fig. 5.
The output of the output register file 80 is selectively coupled to either the input of the output interface module 90 or the input of the input register file 50 via a demultiplexer 150. The control module 10 is connected to the demultiplexer 150 and controls the working state of the demultiplexer 150 according to the configuration information to select whether the output result of the output register file 80 is sent to the output interface module 90 or to the input register file 50. Sending the output result of the output register file 80 to the input register file 50 means that a new round of operation processing will be performed on that result.
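The feedback path makes multi-round processing possible with one set of hardware. The sketch below runs several rounds, feeding each intermediate result back as the next input and releasing only the last one. The per-round weights, the clamp-at-zero arithmetic step and the `run_rounds` helper are assumptions; the conversion steps are omitted for brevity.

```python
def vmm(vector, weights):
    """Vector-matrix multiplication: one summed output per column."""
    n_cols = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vector, weights))
            for j in range(n_cols)]


def clamp_at_zero(values):
    """Example arithmetic step applied after each multiplication."""
    return [max(0.0, v) for v in values]


def run_rounds(external_input, weights_per_round):
    """Run one round per weight matrix, reusing the same data path."""
    # First round: the input register file takes data from the input interface.
    input_register_file = list(external_input)
    output_register_file = []
    for round_index, weights in enumerate(weights_per_round):
        output_register_file = clamp_at_zero(vmm(input_register_file, weights))
        is_last = round_index == len(weights_per_round) - 1
        if not is_last:
            # Feedback: the result becomes the data to be processed next.
            input_register_file = output_register_file
    # Last round: the result goes to the output interface instead.
    return output_register_file


result = run_rounds(
    [1.0, 2.0],
    [
        [[1.0, -1.0], [1.0, 1.0]],  # round 1: 2 inputs -> 2 outputs
        [[2.0], [-1.0]],            # round 2: 2 inputs -> 1 output
    ],
)
```

This is the pattern a multi-layer computation would follow, with each layer mapped to one round.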
In an alternative embodiment, the output end of the input register file 50 may be selectively connected to the input end of the digital-to-analog conversion module 60 or the input end of the programmable arithmetic operation module 30 through a demultiplexer 110. The control module 10 is connected to the demultiplexer 110 and controls the working state of the demultiplexer 110 according to the configuration information, so as to select whether the output end of the input register file 50 is connected to the input end of the digital-to-analog conversion module 60 or to the input end of the programmable arithmetic operation module 30. When the output end of the input register file 50 is connected to the input end of the digital-to-analog conversion module 60, an analog vector-matrix multiplication operation and an arithmetic operation are performed on the output of the input register file 50. When it is connected to the input end of the programmable arithmetic operation module 30, only an arithmetic operation is performed on the output of the input register file 50. This further increases the flexibility of the chip architecture.
In an alternative embodiment, each of the flash memory processing sub-arrays employs a source-coupled, drain-summed topology, see FIG. 6, comprising a plurality of programmable semiconductor devices (also referred to as flash memory cells) arranged in an array.
In this topology, the sources of all the programmable semiconductor devices of each column are connected to the same analog voltage input terminal, so that the plurality of columns of programmable semiconductor devices are correspondingly connected to a plurality of analog voltage input terminals; the drains of all the programmable semiconductor devices of each column are connected to the same analog current output terminal, so that the plurality of columns are correspondingly connected to a plurality of analog current output terminals; and the gates of all the programmable semiconductor devices of each row are connected to the same bias voltage input terminal, so that the plurality of rows are correspondingly connected to a plurality of bias voltage input terminals. The threshold voltage of each programmable semiconductor device can be adjusted.
In another alternative embodiment, each of the flash memory processing subarrays includes a plurality of programmable semiconductor devices arranged in an array. The gates of all the programmable semiconductor devices of each row are connected to the same analog voltage input terminal, so that the plurality of rows of programmable semiconductor devices are correspondingly connected to a plurality of analog voltage input terminals. The drains of all the programmable semiconductor devices of each column are connected to the same first terminal, so that the plurality of columns are correspondingly connected to a plurality of first terminals, and the sources of all the programmable semiconductor devices of each column are connected to the same second terminal, so that the plurality of columns are correspondingly connected to a plurality of second terminals. The threshold voltage of each programmable semiconductor device is adjustable. When the first terminal is a bias voltage input terminal and the second terminal is an analog current output terminal, a gate-coupled, source-summed topology is obtained, see FIG. 7; when the first terminal is an analog current output terminal and the second terminal is a bias voltage input terminal, a gate-coupled, drain-summed topology is obtained, see FIG. 8.
Specifically, by adjusting the threshold voltage of each programmable semiconductor device, the flash memory processing subarray treats each device as a variable equivalent analog weight, so that the array as a whole is equivalent to analog matrix data. Applying analog voltages to the array of programmable semiconductor devices then realizes the matrix multiplication operation.
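The weight-matrix view described above can be illustrated with a minimal numerical sketch. It assumes an idealized, perfectly linear cell (cell current equals weight times input voltage) and the arrangement in which each row shares one input and each column shares one summed output; the function name `subarray_vmm` and the example values are hypothetical and are not taken from the embodiment.

```python
# Idealized model of one flash memory processing subarray.
# Assumption: each cell acts as a linear weight, so its contribution to the
# column current is weight * input voltage. Real devices are not this ideal.

def subarray_vmm(voltages, weights):
    """Return one summed output current per column.

    voltages: analog input values, one per row.
    weights:  weights[i][j] is the equivalent analog weight of the cell in
              row i, column j (set in hardware via the threshold voltage).
    """
    n_rows = len(weights)
    n_cols = len(weights[0])
    if len(voltages) != n_rows:
        raise ValueError("need exactly one input voltage per row")
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Currents of all cells on the same column add up on the shared line.
            currents[j] += voltages[i] * weights[i][j]
    return currents


example_weights = [[1.0, 2.0],
                   [3.0, 4.0]]
print(subarray_vmm([1.0, 0.5], example_weights))  # prints [2.5, 4.0]
```

Each output entry is the dot product of the input vector with one column of weights, which is the vector-matrix multiplication the subarray is meant to perform in the analog domain.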
In an alternative embodiment, the software-definable memory chip may also include programming circuitry 22.
The programming circuit 22 is coupled to the source, gate and/or substrate of each flash memory cell in the flash memory processing array for regulating the threshold voltage of the flash memory cell.
The programming circuit comprises a voltage generating circuit for generating a programming voltage or an erasing voltage and a voltage control circuit for loading the programming voltage to a selected flash memory cell.
Specifically, to increase the threshold voltage of a flash memory cell, the programming circuit uses the hot electron injection effect: according to the threshold voltage requirement data of the flash memory cell, it applies a high voltage to the source of the cell so that channel electrons are accelerated to a high speed.
To reduce the threshold voltage of a flash memory cell, the programming circuit uses the tunneling effect: according to the threshold voltage requirement data of the flash memory cell, it applies a high voltage to the gate or the substrate of the cell.
In addition, the control module 10 is connected to the programming circuit for controlling the programming circuit according to the configuration information to adjust the weight stored in the flash memory processing array 20.
In an alternative embodiment, the software-definable memory chip may further include a row-column decoder.
The row-column decoder is connected to the flash memory processing array 20 and the control module 10, and is used for decoding the rows and columns of the flash memory processing array 20 under the control of the control module 10.
In an alternative embodiment, the programmable semiconductor device may be implemented with a floating gate transistor.
The flash memory processing array may be, for example, a NOR-type flash memory processing array or a NAND-type flash memory processing array, and the invention is not limited thereto.
Based on the above, the present application provides a scenario in which the software-definable memory chip according to the embodiment of the present application implements a neural network operation, so as to illustrate the workflow of the chip.
The neural network operates on data P and comprises R layers of neurons. Each layer of neurons mainly implements a vector-matrix multiplication operation, and the layers are connected through a certain arithmetic operation. (Because the application focuses on the software-definable memory chip and its software definition method, the operation of the neural network is not described in depth herein; only the operation architecture is described, to illustrate the workflow of the chip by way of example and not to limit the application.)
For the neural network operation, the workflow of the software-definable memory chip is as follows:
The control module 10 obtains configuration information and finite state machine information covering R periods, where the R periods correspond to the operations (e.g., convolution, pooling, etc.) of the R layers of neurons of the neural network, and each period corresponds to the operation of one layer of neurons. The configuration information of each period includes configuration information of the flash memory processing subarray, configuration information of the programmable arithmetic operation unit, configuration information of the output register file, configuration information of the input register file, and the like. The control module 10 divides the flash memory processing array 20 into R flash memory processing subarrays according to the configuration information, each flash memory processing subarray corresponding to one period, that is, each flash memory processing subarray implements the operation of one layer of the neural network. The control module 10 then controls the operation timing of each circuit module according to the finite state machine information.
The input interface module 40 receives the data P;
According to the configuration information and the finite state machine information of the first period, the control module 10 controls the multiplexer (MUX) A at the front end of the input register file 50, so that the input interface module 40 communicates with the input register file 50; controls the demultiplexer (DEMUX) Q at the front end of the flash memory processing array 20, so that the digital-to-analog conversion module 60 communicates with the flash memory processing subarray 1 corresponding to the first layer of the neural network; controls the multiplexer B at the rear end of the flash memory processing array 20, so that the flash memory processing subarray 1 communicates with the analog-to-digital conversion module 70; controls the selectors of each programmable arithmetic operation unit of the programmable arithmetic operation module 30, so as to realize the arithmetic operation 1 corresponding to the first layer of the neural network; and, after the data P has been input to the input register file 50, controls the demultiplexer W at the output end of the output register file 80 and the multiplexer (MUX) A at the front end of the input register file 50, so that the input end of the input register file 50 communicates with the output end of the output register file 80, thereby realizing the operation architecture configuration of the first period;
The data P is temporarily stored in the input register file 50 and then input to the digital-to-analog conversion module 60, where it is converted into an analog signal and input to the flash memory processing subarray 1. The flash memory processing subarray 1 performs the analog vector-matrix multiplication operation 1 (such as a matrix multiplication operation) on the analog signal, and the analog vector-matrix multiplication operation result 1 is converted into a digital signal by the analog-to-digital conversion module 70. The digital signal passes through the programmable arithmetic operation module 30 to obtain the arithmetic operation result 1, which is input to the input register file 50 via the output register file 80, completing the operation of the first layer of the neural network;
At this point the control module 10 is automatically triggered. According to the configuration information and the finite state machine information of the second period, the control module 10 controls the demultiplexer (DEMUX) Q at the front end of the flash memory processing array 20, so that the digital-to-analog conversion module 60 communicates with the flash memory processing subarray 2 corresponding to the second layer of the neural network; controls the multiplexer B at the rear end of the flash memory processing array 20, so that the flash memory processing subarray 2 communicates with the analog-to-digital conversion module 70; and controls the selectors of each programmable arithmetic operation unit of the programmable arithmetic operation module 30, so as to realize the arithmetic operation 2 corresponding to the second layer of the neural network, thereby realizing the operation architecture configuration of the second period.
The arithmetic operation result 1 of the first layer of the neural network is temporarily stored in the input register file 50 and then input to the digital-to-analog conversion module 60, where it is converted into an analog signal and input to the flash memory processing subarray 2. The flash memory processing subarray 2 performs an analog vector-matrix multiplication operation (such as a matrix multiplication operation) on the analog signal, and the result is converted into a digital signal by the analog-to-digital conversion module 70. The digital signal passes through the programmable arithmetic operation module 30 to obtain the arithmetic operation result 2, which is input to the input register file 50 via the output register file 80, completing the operation of the second layer of the neural network. The process continues in the same way up to the last layer of the neural network. When the last layer of the neural network is configured, the demultiplexer W at the output end of the output register file 80 is controlled so that the output end of the output register file 80 is connected to the input end of the output interface module 90, and the operation result of the whole neural network is output to an external device through the output interface module 90.
It will be understood by those skilled in the art that when a certain layer of the neural network needs only an arithmetic operation and no analog vector-matrix multiplication operation, the control module 10, when configuring the circuit, only needs to control the demultiplexer E at the output end of the input register file 50 so that the output end of the input register file 50 communicates with the input end of the programmable arithmetic operation module 30; the other configuration processes are not repeated here.
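The period-by-period workflow above can be summarized in a short behavioral sketch. It is only an illustration under simplifying assumptions: the digital-to-analog and analog-to-digital conversions are modeled as lossless, each period is described by a hypothetical dictionary with the keys `weights` and `op`, and a period whose `weights` entry is `None` stands for a layer that needs only an arithmetic operation. None of these names come from the embodiment.

```python
# Behavioral sketch of the R-period workflow (not a circuit model).
# Assumptions: lossless DAC/ADC; one subarray and one arithmetic op per period.

def vmm(vector, matrix):
    """Vector-matrix product, standing in for one flash processing subarray."""
    return [sum(v * row[j] for v, row in zip(vector, matrix))
            for j in range(len(matrix[0]))]


def relu(values):
    return [max(0.0, v) for v in values]


def double(values):
    return [2.0 * v for v in values]


def identity(values):
    return list(values)


def run_periods(data_p, periods):
    """Run data P through all periods and return the final result."""
    input_register = list(data_p)            # data P arrives via the input interface
    for cfg in periods:
        if cfg["weights"] is not None:
            # Input register file -> DAC -> selected subarray -> ADC.
            digital = vmm(input_register, cfg["weights"])
        else:
            # Arithmetic-only layer: route straight to the arithmetic module.
            digital = input_register
        output_register = cfg["op"](digital)  # programmable arithmetic module
        input_register = output_register      # fed back for the next period
    return input_register                     # last period goes to the output interface


example_periods = [
    {"weights": [[1.0, 0.0], [0.0, 1.0]], "op": relu},
    {"weights": None, "op": double},
    {"weights": [[1.0], [1.0]], "op": identity},
]
print(run_periods([1.0, -2.0], example_periods))  # prints [2.0]
```

The loop reuses the same converter and arithmetic stages in every period and only changes which weights and which operation are selected, which mirrors the multiplexing of peripheral circuits described in the text.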
According to the above technical scheme, in the software-definable memory chip provided by the embodiment of the invention, the control module cooperates with the flash memory processing subarrays and the programmable arithmetic operation units, so that the chip architecture can be flexibly combined according to actual application requirements and complex operation tasks can be realized. The chip is suitable for various application scenarios such as voice processing, image processing, machine processing, artificial intelligence (AI) and the like. Peripheral circuits such as the ADC, DAC, registers, programmable arithmetic operation units and the like can be multiplexed, thereby reducing the circuit area, meeting the requirements of integration and miniaturization, and effectively reducing the chip cost.
FIG. 9 is a third block diagram of a software-definable memory chip according to an embodiment of the present invention. As shown in FIG. 9, on the basis of the software-definable memory chip shown in FIG. 2, the input end of the input register file 50 is connected to the output end of the input interface module 40 and the output end of the output register file 80 through a multiplexer (MUX) 100, so as to selectively receive external input data from the input interface module 40 or data to be processed from the output register file 80. The control module 10 is connected to the multiplexer (MUX) 100.
The digital-to-analog conversion module 60 is selectively connected to the plurality of flash memory processing subarrays (20_1 to 20_n) via a demultiplexer (DEMUX) 120, denoted Q in the workflow described above. The control module 10 is connected to the demultiplexer Q.
The outputs of the flash memory processing subarrays (20_1 to 20_n) are connected to the analog-to-digital conversion module 70 via a multiplexer 130, denoted B in the workflow described above. The control module 10 is connected to the multiplexer B.
An input of the programmable arithmetic operation module 30 is connected to an output of the demultiplexer 110 and an output of the analog-to-digital conversion module 70 through a multiplexer 140.
The plurality of programmable arithmetic operation units 30_1 to 30_n of the programmable arithmetic operation module 30 are connected in series, each of the programmable arithmetic operation units including a selector 30a and an arithmetic operation subunit 30b.
The input end of the selector 30a is connected to the preceding programmable arithmetic operation unit or to the analog-to-digital conversion module 70; one output end of the selector 30a is connected to the arithmetic operation subunit 30b; the other output end of the selector 30a and the output end of the arithmetic operation subunit 30b are connected, through a selector, to the next programmable arithmetic operation unit or to the output register file 80; and the control end of the selector 30a is connected to the control module 10.
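The series connection of units, each with a selector that either routes data through its arithmetic subunit or bypasses it, can be sketched as follows. This is a hedged behavioral illustration only: the helper names `make_unit` and `run_chain`, the boolean select bits, and the three example operations are assumptions made for the example and do not describe the actual circuit.

```python
# Behavioral sketch of serially connected programmable arithmetic units.
# Assumption: each selector is modeled as one boolean chosen by the control
# module; True routes data through the subunit, False bypasses it.

def make_unit(operation):
    """Model one programmable arithmetic operation unit."""
    def unit(value, enabled):
        return operation(value) if enabled else value
    return unit


def run_chain(units, select_bits, value):
    """Pass a value through the chain under the given selector settings."""
    for unit, enabled in zip(units, select_bits):
        value = unit(value, enabled)
    return value


example_units = [
    make_unit(lambda x: x + 3),       # addition
    make_unit(lambda x: x * 2),       # multiplication
    make_unit(lambda x: max(x, 0)),   # a simple activation function
]

print(run_chain(example_units, [True, False, True], -5))  # prints 0
print(run_chain(example_units, [True, True, False], 1))   # prints 8
```

With n units there are 2^n selector settings in this model, so a small number of fixed subunits can be combined into many compound operations by changing only the configuration.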
The output end of the output register file 80 is selectively connected to either the input end of the output interface module 90 or the input end of the input register file 50 via a demultiplexer 150, denoted W in the workflow described above. The control module 10 is connected to the demultiplexer W and controls its working state according to the configuration information, so as to select whether the output result of the output register file 80 is output to the output interface module 90 or to the input register file 50. When the output result of the output register file 80 is output to the input register file 50, a new round of operation processing will be performed on that result.
The output end of the input register file 50 is selectively connected to the input end of the digital-to-analog conversion module 60 or the input end of the programmable arithmetic operation module 30 through a demultiplexer 110, denoted E in the workflow described above. The control module 10 is connected to the demultiplexer E and controls its working state according to the configuration information, so as to select whether the output end of the input register file 50 is connected to the input end of the digital-to-analog conversion module 60 or to the input end of the programmable arithmetic operation module 30. When the output end of the input register file 50 is connected to the input end of the digital-to-analog conversion module 60, both an analog vector-matrix multiplication operation and an arithmetic operation are performed on the output of the input register file 50; when it is connected to the input end of the programmable arithmetic operation module 30, only a certain arithmetic operation is performed on that output. This further increases the flexibility of the chip architecture.
In addition, as can be appreciated by those skilled in the art, when the configuration information is generated according to the actual application requirement, it may be generated according to a preset instruction-architecture correspondence table.
It should be noted that, when the configuration information is generated according to the actual application requirement, the number of flash memory processing subarrays to be put into operation and the scale of each flash memory processing subarray are known. A dividing instruction for the flash memory processing array can therefore be obtained from the actual application requirement, and the flash memory processing array is then divided into a plurality of flash memory processing subarrays according to the dividing instruction, so as to match the scales of a plurality of matrix multiplications.
It will be understood by those skilled in the art that when the software-definable memory chip according to the embodiment of the present invention performs operations over a plurality of periods, the flash memory processing subarray corresponding to each period may be programmed within that period, or all the flash memory processing subarrays may be programmed together according to a programming instruction before the operations of the periods are performed.
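One possible way to turn a list of required subarray scales into a division of the array is sketched below. The text does not specify a placement scheme, so this example assumes the simplest one (subarrays stacked along the rows, each starting at column 0); the function name `partition_array` and the dictionary layout of the result are hypothetical.

```python
# Sketch of dividing a flash memory processing array into subarrays.
# Assumption: subarrays are stacked row-wise and all start at column 0.
# The source does not prescribe how the division is actually laid out.

def partition_array(total_rows, total_cols, shapes):
    """Return one region per requested (rows, cols) subarray shape.

    Each region gives half-open row and column ranges inside the full array.
    """
    regions = []
    next_row = 0
    for rows, cols in shapes:
        if cols > total_cols or next_row + rows > total_rows:
            raise ValueError("subarray does not fit in the flash memory processing array")
        regions.append({"rows": (next_row, next_row + rows), "cols": (0, cols)})
        next_row += rows
    return regions


# Two matrix multiplications of scale 3x4 and 2x2 mapped onto an 8x4 array.
print(partition_array(8, 4, [(3, 4), (2, 2)]))
```

A compiling tool could emit the resulting regions as part of the dividing instruction; a denser packing scheme would use the array more efficiently but is outside the scope of this sketch.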
FIG. 10 is a flowchart of a software definition method according to an embodiment of the present invention, where the software definition method is applied to the software-definable memory chip described above. As shown in FIG. 10, the software definition method includes the following steps:
Step S1001, acquiring configuration information and finite state machine information.
The configuration information and the finite state machine information can be obtained through a compiling tool according to actual application requirements.
Step S1002, configuring the plurality of flash memory processing subarrays, the plurality of programmable arithmetic operation units, the output register file and other circuit modules according to the configuration information, so as to realize the dynamic configuration of the chip architecture.
Step S1003, controlling the operation timing of the flash memory processing array, the programmable arithmetic operation module, the output register file and other circuit modules according to the finite state machine information.
Specifically, the plurality of flash memory processing sub-arrays and the plurality of programmable arithmetic units are configured in a combined manner according to the configuration information, the flash memory processing sub-arrays and the programmable arithmetic units which are put into operation are selected, and a combined pairing mode of the flash memory processing sub-arrays and the programmable arithmetic units is controlled to realize specific operation.
Because each of the plurality of programmable arithmetic operation units can realize one or more arithmetic operations, the plurality of programmable arithmetic operation units can be arranged and combined to form a plurality of compound operations, and the compound operations are matched with the plurality of flash memory processing subarrays, so that a plurality of combined configurations can be realized, and further, complex operation functions are realized.
The arithmetic operations include one of, or a combination of more than one of, multiplication, addition, subtraction, division, shifting, an activation function, taking a maximum value, taking a minimum value, taking an average value, pooling, and the like.
The analog operation implemented by each flash memory processing subarray is mainly an analog vector-matrix multiplication operation.
The software definition method provided by the embodiment of the invention can carry out combined configuration on the flash memory processing subarrays and the programmable arithmetic operation units according to actual application requirements, realize dynamic configuration of a chip architecture, flexibly adjust the chip architecture according to actual tasks, realize multiplexing of peripheral circuits such as ADC, DAC, register, programmable arithmetic operation units and the like, further reduce circuit area, adapt to the requirements of integration and miniaturization, and effectively reduce the chip cost.
In an alternative embodiment, the step S1002 includes:
Step 1, dividing the flash memory processing array into a plurality of flash memory processing subarrays according to the configuration information of the flash memory processing subarrays, and controlling the working timing of the plurality of flash memory processing subarrays according to the finite state machine information;
Step 2, controlling the working state of the selector corresponding to each programmable arithmetic operation unit according to the configuration information of the programmable arithmetic operation units, so that the programmable arithmetic operation units can realize any combined operation, and controlling the working timing of the programmable arithmetic operation units according to the finite state machine information;
Step 3, controlling the output register file 80, according to the configuration information of the output register file 80, either to output the data it holds or to send that data, as data to be processed, to the input register file 50.
Based on the above, the present application provides a scenario in which the software-definable memory chip is configured by the software definition method according to the embodiment of the present application to implement a neural network operation, so as to describe the workflow of the software definition method.
The neural network operates on data P and comprises R layers of neurons. Each layer of neurons mainly implements a matrix multiplication operation, and the layers are connected through a certain arithmetic operation. (Because this example focuses on the software definition method, the operation of the neural network is not described in depth herein; only the operation architecture is described, to illustrate the flow of the software definition method by way of example and not to limit the invention.)
For the neural network operation, the workflow of the software definition method is as follows:
(1) Configuration information and finite state machine information are obtained. The configuration information includes configuration information of R periods, where the R periods correspond to the operations (such as convolution, pooling, etc.) of the R layers of neurons of the neural network, and each period corresponds to the operation of one layer of neurons. The configuration information of each period includes configuration information of the flash memory processing subarray, configuration information of the programmable arithmetic operation unit, configuration information of the output register file, configuration information of the input register file, and the like. The control module 10 divides the flash memory processing array 20 into R flash memory processing subarrays according to the configuration information, each flash memory processing subarray corresponding to one period, that is, each flash memory processing subarray implements the operation of one layer of the neural network.
(2) According to the configuration information and the finite state machine information of the first period, controlling the multiplexer (MUX) A at the front end of the input register file 50, so that the input interface module 40 communicates with the input register file 50; controlling the demultiplexer (DEMUX) Q at the front end of the flash memory processing array 20, so that the digital-to-analog conversion module 60 communicates with the flash memory processing subarray 1 corresponding to the first layer of the neural network; controlling the multiplexer B at the rear end of the flash memory processing array 20, so that the flash memory processing subarray 1 communicates with the analog-to-digital conversion module 70; controlling the selectors of each programmable arithmetic operation unit of the programmable arithmetic operation module 30, so as to realize the arithmetic operation 1 corresponding to the first layer of the neural network; and, after the data P has been input to the input register file 50, controlling the demultiplexer W at the output end of the output register file 80 and the multiplexer (MUX) A at the front end of the input register file 50, so that the input end of the input register file 50 communicates with the output end of the output register file 80, thereby realizing the operation architecture configuration of the first period;
(3) According to the configuration information and the finite state machine information of the second period, controlling the demultiplexer (DEMUX) Q at the front end of the flash memory processing array 20, so that the digital-to-analog conversion module 60 communicates with the flash memory processing subarray 2 corresponding to the second layer of the neural network; controlling the multiplexer B at the rear end of the flash memory processing array 20, so that the flash memory processing subarray 2 communicates with the analog-to-digital conversion module 70; and controlling the selectors of each programmable arithmetic operation unit of the programmable arithmetic operation module 30, so as to realize the arithmetic operation 2 corresponding to the second layer of the neural network, thereby realizing the operation architecture configuration of the second period. The configuration continues in the same way up to the last layer of the neural network. When the last layer of the neural network is configured, the demultiplexer W at the output end of the output register file 80 is controlled so that the output end of the output register file 80 is connected to the input end of the output interface module 90, and the operation result of the whole neural network is output to an external device through the output interface module 90.
It will be understood by those skilled in the art that when a certain layer of the neural network needs only an arithmetic operation and no analog vector-matrix multiplication operation, the circuit configuration only needs to control the demultiplexer E at the output end of the input register file 50 so that the output end of the input register file 50 communicates with the input end of the programmable arithmetic operation module 30; the other configuration processes are not repeated here.
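The per-period switch configuration described in steps (1) to (3) can be written down as a small lookup, shown here as a hedged sketch. The switch letters A, E, Q, B and W follow the text; everything else (the function name `switch_settings`, the string labels, and the simplification that A is reported only by its setting at the start of the period) is an assumption made for illustration.

```python
# Sketch of the switch settings chosen for one period (1-based index).
# Assumptions: subarray k serves period k, as in the example scenario, and
# the setting of A is the one in force when the period starts.

def switch_settings(period, total_periods, uses_array=True):
    """Return symbolic settings for switches A, E, Q, B and W."""
    if not 1 <= period <= total_periods:
        raise ValueError("period out of range")
    return {
        # A: first period loads data P; later periods take fed-back results.
        "A": "input_interface" if period == 1 else "output_register_file",
        # E: route to the DAC, or straight to the arithmetic module when the
        # layer needs no analog vector-matrix multiplication.
        "E": "dac" if uses_array else "arithmetic_module",
        # Q and B: select the subarray used in this period, if any.
        "Q": period if uses_array else None,
        "B": period if uses_array else None,
        # W: feed results back, except in the last period.
        "W": "output_interface" if period == total_periods else "input_register_file",
    }


for k in range(1, 4):
    print(k, switch_settings(k, 3, uses_array=(k != 2)))
```

In this example the second of three periods is an arithmetic-only layer, so E points at the arithmetic module and no subarray is selected, while only the third period routes W to the output interface.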
The embodiment of the invention also provides an electronic device capable of executing a neural network algorithm, wherein the neural network comprises a plurality of layers of neurons, each layer of neurons performs a corresponding operation according to the output result of the previous layer of neurons, and the electronic device comprises the software-definable memory chip described above.
The embodiment of the invention also provides another electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the software definition method.
The electronic device may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiments of the present invention also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the software defined method described above.
The principles and embodiments of the present invention have been described above with reference to specific examples. These examples are provided only to aid in understanding the principles and concepts of the present invention. Those of ordinary skill in the art may vary the embodiments in many ways in light of the teachings of the present invention, and the above description should not be construed as limiting the invention.