CN111611195B - Software-definable storage-computing integrated chip and software-defining method thereof - Google Patents

Info

Publication number
CN111611195B
Authority
CN
China
Prior art keywords
module
register file
flash memory
arithmetic operation
analog
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910143132.2A
Other languages
Chinese (zh)
Other versions
CN111611195A (en)
Inventor
王绍迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Zhicun Computing Technology Co ltd
Original Assignee
Hangzhou Zhicun Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Zhicun Computing Technology Co ltd filed Critical Hangzhou Zhicun Computing Technology Co ltd
Priority to CN201910143132.2A priority Critical patent/CN111611195B/en
Priority to PCT/CN2019/081339 priority patent/WO2020172951A1/en
Publication of CN111611195A publication Critical patent/CN111611195A/en
Application granted granted Critical
Publication of CN111611195B publication Critical patent/CN111611195B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • G06F15/7882Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS for self reconfiguration
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Logic Circuits (AREA)
  • Microcomputers (AREA)

Abstract

The present invention provides a software-definable storage-computing integrated chip and a software-defining method thereof. The flash memory processing array of the chip includes a plurality of flash memory processing sub-arrays for respectively executing different analog vector-matrix multiplication operations, and the programmable arithmetic operation module includes a plurality of programmable arithmetic operation units for respectively implementing different arithmetic operations. The control module performs combined configuration of the modules in the chip according to the configuration information of the actual application and the finite state machine information, thereby dynamically configuring the circuit structure in the chip. The chip can therefore flexibly adjust its circuit structure according to the actual task, and peripheral circuits such as the ADC, DAC, registers and programmable arithmetic operation units can be reused. This reduces the circuit area, meets the needs of integration and miniaturization, and effectively reduces the chip cost.

Description

Software-definable storage-computing integrated chip and software-defining method thereof
Technical Field
The present invention relates to the field of semiconductor integrated circuits, and more particularly, to a software-definable integrated memory chip and a software definition method thereof.
Background
Flash memory is a non-volatile memory that stores data by regulating the threshold voltage of the flash transistor. Flash memories are broadly classified into NOR-type and NAND-type flash memories according to differences in the flash transistors and array structures. NAND-type flash memory has large capacity and low cost and is widely used in large-scale standalone memory. NOR-type flash memory supports random access of data but, compared with NAND-type flash memory, has lower density, smaller capacity and higher cost, and is mainly used in embedded memory.
In recent years, the Computing-In-Memory (CIM) chip architecture has attracted wide attention as a way to overcome the bottleneck of the traditional von Neumann computing architecture. Its basic idea is to use the memory itself to perform logic computation, which reduces the amount and distance of data transfer between the memory and the processor, lowers power consumption and improves performance.
Once an existing storage-computing integrated chip architecture has been customized, its circuit structure is fixed and cannot be flexibly adjusted according to the actual task. Its circuit modules also cannot be shared, so the circuit area is large and the chip cannot meet the requirements of integration and miniaturization.
Disclosure of Invention
In view of this, the present invention provides a software-definable integrated memory chip, method, device and apparatus, in which a plurality of flash memory processing sub-arrays, a plurality of programmable arithmetic units and a control module are adopted to cooperate, so that the circuit structure of the chip is dynamically configured according to the actual application requirements, flexible adjustment can be performed according to the actual tasks, and peripheral circuits such as ADC, DAC, register, programmable arithmetic units and the like can be multiplexed, thereby reducing the circuit area and adapting to the needs of integration and miniaturization.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, a software-definable memory chip is provided, comprising a flash memory processing array, a programmable arithmetic operation module, and a control module coupled to the flash memory processing array and the programmable arithmetic operation module, wherein:
the flash memory processing array comprises a plurality of flash memory processing subarrays for respectively executing different analog vector-matrix multiplication operations;
the programmable arithmetic operation module includes a plurality of programmable arithmetic operation units for respectively implementing different arithmetic operations;
the control module performs combined configuration of the plurality of flash memory processing subarrays and the plurality of programmable arithmetic operation units according to configuration information, thereby realizing dynamic configuration of the circuit structure in the chip.
Further, the software-definable memory chip further includes:
the input interface module is used for receiving external input data;
the input register file is connected with the input interface module and used for storing the external input data or the data to be processed;
The input end of the digital-to-analog conversion module is connected with the input register file, the output end of the digital-to-analog conversion module is connected with the flash memory processing array, the digital-to-analog conversion module is used for converting the external input data or the data to be processed into analog signals and outputting the analog signals to the flash memory processing array, and the flash memory processing array performs analog vector-matrix multiplication operation on the analog signals and outputs operation results;
The input end of the analog-to-digital conversion module is connected with the flash memory processing array, the output end of the analog-to-digital conversion module is connected with the programmable arithmetic operation module and is used for converting the analog vector-matrix multiplication result into a digital signal and outputting the digital signal to the programmable arithmetic operation module, and the programmable arithmetic operation module carries out arithmetic operation on the digital signal and outputs an arithmetic operation result;
The output register file is connected with the programmable arithmetic operation module and the input register file and is used for temporarily storing the arithmetic operation result and outputting the arithmetic operation result or outputting the arithmetic operation result to the input register file as the data to be processed;
The output interface module is connected with the output register file, receives output data of the output register file and outputs the output data outwards;
The control module is connected with the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module and is used for dynamically configuring the circuit modules according to actual application requirements.
Further, the output end of the input register file is also connected with the programmable arithmetic operation module.
Further, a plurality of the programmable arithmetic operation units are connected in series, each of the programmable arithmetic operation units including a demultiplexer, an arithmetic operation subunit, and a multiplexer;
the input end of the demultiplexer is connected with a programmable arithmetic operation unit or the analog-to-digital conversion module, one output end of the demultiplexer is connected with the arithmetic operation subunit, the other output end of the demultiplexer and the output end of the arithmetic operation subunit are connected with the next programmable arithmetic operation unit or the output register file through the multiplexer, and the control end of the demultiplexer is connected with the control module.
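For illustration only, the serial chain of programmable arithmetic operation units described above can be modelled in software. The following is a minimal Python sketch, assuming each unit has exactly one bypass path around its arithmetic operation subunit. The class name, the `enabled` select flag and the three example operations are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProgrammableArithmeticUnit:
    """One unit: demultiplexer -> (arithmetic subunit | bypass) -> multiplexer."""
    operation: Callable[[float], float]  # stands in for the arithmetic operation subunit
    enabled: bool = False                # hypothetical select line driven by the control module

    def process(self, value: float) -> float:
        # Demultiplexer: route the input to the subunit or to the bypass path.
        if self.enabled:
            return self.operation(value)  # multiplexer forwards the subunit output
        return value                      # multiplexer forwards the bypassed input


def run_chain(units: List[ProgrammableArithmeticUnit], adc_sample: float) -> float:
    """Pass one digitized sample through the serially connected units."""
    value = adc_sample
    for unit in units:
        value = unit.process(value)
    return value


# Hypothetical chain: scale, add a bias, then apply a ReLU-style activation.
chain = [
    ProgrammableArithmeticUnit(lambda x: x * 2.0),
    ProgrammableArithmeticUnit(lambda x: x + 1.0),
    ProgrammableArithmeticUnit(lambda x: max(x, 0.0)),
]

# "Configuration": enable scaling and activation, bypass the bias unit.
chain[0].enabled = True
chain[2].enabled = True
result = run_chain(chain, -3.0)
```

Changing only the `enabled` flags yields a different composite operation from the same chain, which is the software analogue of reusing the same hardware units across tasks.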
Further, the software-defined integrated memory chip further comprises a programming circuit connected with the control module, wherein the programming circuit is connected with the source electrode, the grid electrode and/or the substrate of each flash memory unit in the flash memory processing array and is used for regulating and controlling the threshold voltage of the flash memory unit;
the programming circuit includes a voltage generating circuit for generating a programming voltage or an erasing voltage and a voltage control circuit for applying the programming voltage to a selected flash memory cell.
Further, the software-definable memory chip further includes:
And the row-column decoder is connected with the flash memory processing array and the control module and is used for decoding the rows and the columns of the flash memory processing array under the control of the control module.
Further, the control module dynamically configures each circuit module connected with the control module according to configuration information, wherein the configuration information comprises configuration information of a flash memory processing subarray, configuration information of a programmable arithmetic operation unit, configuration information of a digital-to-analog conversion module, configuration information of an analog-to-digital conversion module, configuration information of an input interface module, configuration information of an output interface module, configuration information of an input register file and configuration information of an output register file, and the dynamically configuring each circuit module connected with the control module according to the configuration information comprises:
dividing the flash memory processing array into a plurality of flash memory processing subarrays according to the configuration information of the flash memory processing subarrays, and controlling the working time sequence of the plurality of flash memory processing subarrays;
controlling the working states of the demultiplexer and the multiplexer corresponding to each programmable arithmetic unit according to the configuration information of the programmable arithmetic unit, so that the programmable arithmetic units realize any combination operation;
controlling the on-off state of the digital-to-analog conversion circuit participating in an actual task according to the configuration information of the digital-to-analog conversion module;
Controlling the on-off state of an analog-digital conversion circuit participating in an actual task according to the configuration information of the analog-digital conversion module;
controlling the switching state of an input interface circuit participating in an actual task according to the configuration information of the input interface module;
controlling the switching state of an output interface circuit participating in an actual task according to the configuration information of the output interface module;
controlling, according to the configuration information of the input register file, whether the data stored in the input register file comes from the input data of the input interface module or from the data to be processed in the output register file;
and controlling, according to the configuration information of the output register file, whether the output register file outputs its data externally or passes that data to the input register file as data to be processed.
In a second aspect, a software defining method of a software-definable integrated memory chip is provided, and is applied to the software-definable integrated memory chip, where the software defining method includes:
acquiring configuration information and finite state machine information;
configuring the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module according to the configuration information, so as to realize dynamic configuration of the circuit structure in the chip;
and controlling the working time sequence of the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module according to the finite state machine information.
Further, the software defining method includes:
Dividing the flash memory processing array into a plurality of flash memory processing subarrays according to the configuration information of the flash memory processing subarrays, and controlling the working time sequence of the plurality of flash memory processing subarrays according to the finite state machine information;
The working states of the selectors corresponding to the programmable arithmetic units are controlled according to the configuration information of the programmable arithmetic units, so that the programmable arithmetic units realize arbitrary combination operation, and the working time sequences of the programmable arithmetic units are controlled according to the finite state machine information.
In a third aspect, an electronic device is provided that includes the software-definable memory chip described above.
The invention provides a software-defined integrated memory chip, a method and electronic equipment, wherein a flash memory processing array of the software-defined integrated memory chip comprises a plurality of flash memory processing subarrays for respectively executing different analog vector-matrix multiplication operations, a programmable arithmetic operation module comprises a plurality of programmable arithmetic operation units for respectively realizing different arithmetic operations, a control module carries out combined configuration on each module of the integrated memory chip according to configuration information of practical application and finite state machine information, dynamic configuration of a circuit structure in the chip is realized, the chip can flexibly adjust the circuit structure in the chip according to practical tasks, peripheral circuits such as an ADC (analog-digital converter), a DAC (digital-analog converter), a register, a programmable arithmetic operation unit and the like can realize multiplexing, thereby reducing circuit area, adapting to the requirements of integration and miniaturization, and effectively reducing the cost of the chip.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments, as illustrated in the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of a software-definable memory chip according to an embodiment of the present invention;
FIG. 2 is a second block diagram of a software-definable memory chip according to an embodiment of the present invention;
FIG. 3 is a block diagram of a programmable arithmetic unit 30 in a software-definable memory chip according to an embodiment of the present invention;
FIG. 4 is a block diagram of a programmable arithmetic operator unit in a software-definable unified memory chip according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a programmable arithmetic operation module in a software-definable memory chip for implementing a compound operation according to an embodiment of the present invention;
FIG. 6 is a block diagram of a flash processing sub-array in a software-definable unified memory chip according to an embodiment of the present invention;
FIG. 7 is a second block diagram of a flash processing sub-array in a software-definable unified memory chip according to an embodiment of the present invention;
FIG. 8 is a third block diagram of a flash processing sub-array in a software-definable unified memory chip according to an embodiment of the present invention;
FIG. 9 is a third block diagram of a software-definable memory chip according to an embodiment of the present invention;
FIG. 10 is a flow chart of a software-defining method according to an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Once the existing integrated chip architecture is customized, the circuit structure is fixed, flexible adjustment cannot be performed according to actual tasks, and circuit modules cannot be shared, so that the circuit area is large.
In order to solve the above-mentioned problems in the prior art, the embodiment of the invention provides a software-definable integrated chip, a method and an electronic device, where a flash memory processing array of the software-definable integrated chip includes a plurality of flash memory processing subarrays for respectively executing different analog vector-matrix multiplication operations, a programmable arithmetic operation module includes a plurality of programmable arithmetic operation units for respectively implementing different arithmetic operations, and a control module performs combined configuration on each module of the integrated chip according to configuration information of practical application and finite state machine information, so as to implement dynamic configuration of circuit structures in the chip, and enable the chip to flexibly adjust circuit structures in the chip according to practical tasks, and peripheral circuits such as an ADC, a DAC, a register, a programmable arithmetic operation unit and the like can implement multiplexing, thereby reducing circuit area and adapting to the needs of integration and miniaturization.
FIG. 1 is a block diagram of a software-definable memory chip according to an embodiment of the present invention. As shown in FIG. 1, the software-definable memory chip includes a flash memory processing array 20, a programmable arithmetic operation module 30, and a control module 10 coupled to the flash memory processing array 20 and the programmable arithmetic operation module 30.
The flash processing array 20 includes a plurality of flash processing sub-arrays (not shown in fig. 1) for respectively performing different analog vector-matrix multiplication operations.
The multiple flash memory processing subarrays may be flash memory processing subarrays with the same structure, or the structures of the flash memory processing subarrays may be set to be different according to actual application requirements, for example, the number of rows and the number of columns of the flash memory processing subarrays may be set according to the actual application requirements, which is not limited in the embodiment of the present invention.
The programmable arithmetic operation module 30 includes a plurality of programmable arithmetic operation units (not shown in fig. 1) for respectively realizing different arithmetic operations.
The programmable arithmetic unit is implemented in hardware for performing specific arithmetic operations.
The arithmetic operations include one of, or a combination of several of, multiplication, addition, subtraction, division, shifting, activation functions, maximum, minimum, averaging, pooling and the like.
The control module 10 performs combined configuration on an input interface module, an input register file, a digital-to-analog conversion module, a flash memory processing array, an analog-to-digital conversion module, an output register file, a programmable arithmetic operation module and an output interface module in the chip according to configuration information and finite state machine information, so as to realize dynamic configuration of a circuit structure in the chip.
The configuration information and the finite state machine information can be obtained through a compiling tool according to actual application requirements.
The configuration information is typically static: it specifies, for example, the state of each module participating in the task and the configured size of each unit. It is typically stored in memory and is applied before the task is executed. The finite state machine information, by contrast, is typically dynamic and controls the timing and state of the actual task while it is running.
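The distinction between static configuration information and dynamic finite state machine information can be sketched in software. The following Python example is only an illustration under stated assumptions: the field names of `ChipConfig`, the four processing states and the linear transition table are all hypothetical, since the text does not specify the format of either kind of information.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ChipConfig:
    """Static configuration, fixed before the task starts (hypothetical fields)."""
    active_subarrays: Tuple[int, ...]
    active_arithmetic_units: Tuple[int, ...]
    subarray_shape: Tuple[int, int]  # (rows, columns)


class State(Enum):
    LOAD_INPUT = auto()
    VMM = auto()
    ARITHMETIC = auto()
    WRITE_OUTPUT = auto()
    DONE = auto()


# Hypothetical "finite state machine information": sequences the task at run time.
TRANSITIONS: Dict[State, State] = {
    State.LOAD_INPUT: State.VMM,
    State.VMM: State.ARITHMETIC,
    State.ARITHMETIC: State.WRITE_OUTPUT,
    State.WRITE_OUTPUT: State.DONE,
}


def run_task(config: ChipConfig) -> List[Tuple[str, Tuple[int, ...]]]:
    """Step through the state machine, recording which configured modules are used."""
    trace = []
    state = State.LOAD_INPUT
    while state is not State.DONE:
        if state is State.VMM:
            trace.append((state.name, config.active_subarrays))
        elif state is State.ARITHMETIC:
            trace.append((state.name, config.active_arithmetic_units))
        else:
            trace.append((state.name, ()))
        state = TRANSITIONS[state]
    return trace


config = ChipConfig(active_subarrays=(0, 2),
                    active_arithmetic_units=(1,),
                    subarray_shape=(64, 64))
trace = run_task(config)
```

In this sketch the frozen dataclass cannot change while the task runs, whereas the state variable advances at every step, mirroring the static and dynamic roles described above.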
Specifically, the control module 10 performs a combined configuration on the plurality of flash memory processing sub-arrays and the plurality of programmable arithmetic units according to the configuration information, selects the flash memory processing sub-arrays and the programmable arithmetic units that are put into operation, and controls a combined pairing manner of the flash memory processing sub-arrays and the programmable arithmetic units to implement a specific operation.
It can be understood that each of the plurality of programmable arithmetic units may implement one or more kinds of arithmetic operations, and the plurality of programmable arithmetic units may be arranged to combine a plurality of kinds of complex operations, and cooperate with the plurality of flash memory processing sub-arrays, thereby implementing a plurality of kinds of combination configurations and further implementing complex operation functions.
As can be seen from the above description, in the software-definable integrated memory chip provided in the embodiments of the present invention, the flash memory processing array includes a plurality of flash memory processing sub-arrays for respectively performing different analog vector-matrix multiplication operations, and the programmable arithmetic operation module includes a plurality of programmable arithmetic operation units for respectively implementing different arithmetic operations. The control module performs a combined configuration of the plurality of flash memory processing sub-arrays and the plurality of programmable arithmetic operation units according to configuration information, thereby dynamically configuring the chip architecture. The chip architecture can therefore be flexibly adjusted according to actual tasks, and a variety of complex operation functions can be implemented. In addition, peripheral circuits such as the ADC, DAC, registers and programmable arithmetic operation units can be multiplexed, which reduces circuit area and meets the needs of integration and miniaturization.
In an alternative embodiment, referring to FIG. 2, the software-definable unified memory chip may further include an input interface module 40, an input register file 50, a digital-to-analog conversion module 60, an analog-to-digital conversion module 70, an output register file 80, and an output interface module 90.
The input terminal of the input interface module 40 is connected to an external device, and is used for receiving input data (i.e. data requiring operation) from the external device.
The input end of the input register file 50 is connected to the output end of the input interface module 40, and is used for temporarily storing the input data or data to be processed.
The input end of the digital-to-analog conversion module 60 is connected to the output end of the input register file 50, and the output end is connected to the input end of the flash memory processing array 20, so as to convert the external input data or the data to be processed output from the input register file 50 into analog signals and output the analog signals to the flash memory processing array 20, and the flash memory processing array 20 performs analog vector-matrix multiplication operation on the analog signals and outputs the analog vector-matrix multiplication operation result.
The input end of the analog-to-digital conversion module 70 is connected to the flash memory processing array 20, the output end is connected to the programmable arithmetic operation module 30, and the analog-to-digital conversion module is used for converting the analog vector-matrix multiplication result into a digital signal and outputting the digital signal to the programmable arithmetic operation module 30, and the programmable arithmetic operation module 30 performs arithmetic operation on the digital signal and outputs an arithmetic operation result.
The output register file 80 has an input terminal connected to the programmable arithmetic operation module 30 and an output terminal connected to the input register file 50, and is used for temporarily storing the arithmetic operation result and outputting the arithmetic operation result or outputting the arithmetic operation result as the data to be processed to the input register file 50.
An input terminal of the output interface module 90 is connected to an output terminal of the output register file 80, receives output data of the output register file 80, and outputs the output data to an external device.
The control module 10 is connected to the input interface module 40, the input register file 50, the digital-to-analog conversion module 60, the flash processing array 20, the analog-to-digital conversion module 70, the output register file 80, the programmable arithmetic operation module 30 and the output interface module 90, and is configured to dynamically configure the above circuit modules according to configuration information.
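The data path through these modules can be illustrated with a small numerical model. The Python sketch below is an idealized software analogue, not the circuit itself: the 8-bit converter resolution, the reference voltage, the clipping behaviour of the ADC and all function names are assumptions made for the example, and the flash array is modelled simply as a weight matrix multiplied by the input vector.

```python
from typing import Callable, List, Sequence


def dac(codes: Sequence[int], v_ref: float = 1.0, bits: int = 8) -> List[float]:
    """Idealized DAC: map integer codes to voltages in [0, v_ref]."""
    full_scale = (1 << bits) - 1
    return [v_ref * code / full_scale for code in codes]


def flash_vmm(voltages: Sequence[float],
              weights: Sequence[Sequence[float]]) -> List[float]:
    """Idealized analog vector-matrix multiply: one input per row, one output per column."""
    columns = len(weights[0])
    return [sum(v * row[j] for v, row in zip(voltages, weights))
            for j in range(columns)]


def adc(values: Sequence[float], full_scale: float, bits: int = 8) -> List[int]:
    """Idealized ADC: clip to [0, full_scale] and quantize to integer codes."""
    levels = (1 << bits) - 1
    return [round(min(max(x, 0.0), full_scale) / full_scale * levels)
            for x in values]


def run_pass(input_codes: Sequence[int],
             weights: Sequence[Sequence[float]],
             arithmetic: Callable[[int], int],
             adc_full_scale: float) -> List[int]:
    """Input register file -> DAC -> flash array -> ADC -> arithmetic -> output register file."""
    analog_in = dac(input_codes)
    analog_out = flash_vmm(analog_in, weights)
    digital = adc(analog_out, adc_full_scale)
    return [arithmetic(code) for code in digital]


# Hypothetical task: identity weights followed by a right shift (halving).
identity = [[1.0, 0.0], [0.0, 1.0]]


def halve(code: int) -> int:
    return code >> 1


first_pass = run_pass([255, 0], identity, halve, adc_full_scale=1.0)
# Feedback: the output register file supplies the data to be processed for the next pass.
second_pass = run_pass(first_pass, identity, halve, adc_full_scale=1.0)
```

The second call reuses the same converters and arithmetic function on fed-back data, which is the kind of peripheral reuse the feedback connection from the output register file to the input register file makes possible.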
The control module 10 dynamically configures each circuit module connected to it according to configuration information. The configuration information includes configuration information of the flash memory processing sub-arrays 20_1 to 20_n, configuration information of the programmable arithmetic operation units 30_1 to 30_n, configuration information of the digital-to-analog conversion module 60, configuration information of the analog-to-digital conversion module 70, configuration information of the input interface module 40, configuration information of the output interface module 90, configuration information of the input register file 50, and configuration information of the output register file 80. Dynamically configuring each circuit module connected to the control module according to the configuration information may include:
The flash processing array 20 is divided into a plurality of flash processing sub-arrays 20_1 to 20_n according to the configuration information of the flash processing sub-arrays 20_1 to 20_n, and the operation timing of the plurality of flash processing sub-arrays 20_1 to 20_n is controlled.
According to the configuration information of the programmable arithmetic operation unit 30 1~30n, the working state of the selector corresponding to each programmable arithmetic operation unit is controlled, so that any combination of the programmable arithmetic operation units can participate in the operation;
controlling the on-off state of a digital-to-analog conversion circuit participating in an actual task according to the configuration information of the digital-to-analog conversion module 60;
Controlling the on-off state of an analog-digital conversion circuit participating in an actual task according to the configuration information of the analog-digital conversion module 70;
controlling the on-off state of an input interface circuit participating in an actual task according to the configuration information of the input interface module 40;
Controlling the switching state of an output interface circuit participating in an actual task according to the configuration information of the output interface module 90;
controlling the data to be stored in the input register to be derived from the input data of the input interface module or the data to be processed in the output register file according to the configuration information of the input register file 50;
The output register file 80 is controlled to output data therein or as data to be processed to the input register file 50 according to configuration information of the output register file 80.
Specifically, the input of the input register file 50 is connected to the output of the input interface module 40 and the output of the output register file 80 through a Multiplexer (MUX) 100 to selectively receive external input data from the input interface module 40 or data to be processed from the output register file 80. The control module 10 is connected to the multiplexer 100, and controls the multiplexer 100 according to the configuration information, thereby controlling whether the input register file 50 receives the external input data or the data to be processed.
The digital-to-analog conversion module 60 is selectively coupled to the plurality of flash processing sub-arrays (20 1~20n) via a Demultiplexer (DEMUX) 120. The control module 10 is connected to the demultiplexer 120 and controls the demultiplexer 120 according to the configuration information, so as to select which flash processing sub-array participates in the operation.
The outputs of the flash processing sub-arrays (20 1~20n) are coupled to the analog-to-digital conversion module 70 via a multiplexer 130. The control module 10 is connected to the multiplexer 130, and controls the multiplexer 130 according to the configuration information, so as to select which flash processing sub-array output is connected to the input terminal of the analog-to-digital conversion module 70, i.e. the output of the flash processing sub-array participating in the operation is connected to the input terminal of the analog-to-digital conversion module 70.
An input of the programmable arithmetic operation module 30 is connected to an output of the demultiplexer 110 and an output of the analog-to-digital conversion module 70 through a multiplexer 140.
A plurality of the programmable arithmetic operation units 30 1~30n of the programmable arithmetic operation module 30, each of which includes a demultiplexer 30a, an arithmetic operation subunit 30b, and a multiplexer 30c, see fig. 3, are connected in series.
The input end of the demultiplexer 30a is connected to the preceding programmable arithmetic operation unit or to the analog-to-digital conversion module 70, and one of its output ends is connected to the arithmetic operation subunit 30b. The output end of the arithmetic operation subunit 30b and the other output end of the demultiplexer 30a are connected to the next programmable arithmetic operation unit or to the output register file 80 through the multiplexer 30c. In addition, the control ends of the demultiplexer 30a and the multiplexer 30c are connected to the control module 10.
Specifically, the input terminal of the demultiplexer in the first programmable arithmetic operation unit 30 1 is connected to the output terminal of the analog-to-digital conversion module 70, one of its output terminals is connected to the input terminal of the arithmetic operation subunit in the first programmable arithmetic operation unit 30 1, the other output terminal and the output terminal of the arithmetic operation subunit are connected to the input terminal of the second programmable arithmetic operation unit 30 2 through a multiplexer, and the control terminals of the demultiplexer and the multiplexer are connected to the control module 10.
The input of the demultiplexer in the second programmable arithmetic operation unit 30 2 is connected to the output of the first programmable arithmetic operation unit 30 1, one of its outputs is connected to the input of the arithmetic operation subunit in the second programmable arithmetic operation unit 30 2, the other output and the output of the arithmetic operation subunit are connected to the input of the third programmable arithmetic operation unit 30 3 through a multiplexer, and the control terminals of the demultiplexer and the multiplexer are connected to the control module 10. And so on, up to the nth programmable arithmetic operation unit 30 n: the input of the demultiplexer in the nth programmable arithmetic operation unit 30 n is connected to the output of the (n-1)th programmable arithmetic operation unit 30 n-1, one of its outputs is connected to the input of the arithmetic operation subunit in the nth programmable arithmetic operation unit 30 n, the other output and the output of the arithmetic operation subunit are connected to the input of the output register file 80 through a multiplexer, and the control terminals of the demultiplexer and the multiplexer are connected to the control module 10.
The control module 10 is connected to the demultiplexer and the multiplexer in each programmable arithmetic operation unit, and controls them according to the configuration information to select whether the arithmetic operation subunit in that programmable arithmetic operation unit participates in the operation. In this way the plurality of programmable arithmetic operation units can be arranged and combined to realize different complex operations, so that the arithmetic operation functions are flexibly configurable.
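The bypass behaviour of the serially connected units described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `ArithmeticUnit`, the functions `configure` and `run_chain`, and the three example operations are hypothetical names chosen for illustration and do not appear in the patent; each unit is modelled as operating on a single number.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ArithmeticUnit:
    """Model of one programmable arithmetic operation unit (30a, 30b, 30c)."""
    op: Callable[[float], float]  # stands in for the arithmetic operation subunit 30b
    enabled: bool = False         # selector state written by the control module

    def process(self, x: float) -> float:
        # Enabled: the demultiplexer routes the data through the subunit.
        # Disabled: the data takes the bypass path straight to the multiplexer.
        return self.op(x) if self.enabled else x


def configure(units: Sequence[ArithmeticUnit], enable_bits: Sequence[int]) -> None:
    """Apply one enable bit per unit (hypothetical configuration format)."""
    if len(units) != len(enable_bits):
        raise ValueError("one enable bit is required per unit")
    for unit, bit in zip(units, enable_bits):
        unit.enabled = bool(bit)


def run_chain(units: List[ArithmeticUnit], x: float) -> float:
    """Pass a value through the serially connected units in order."""
    for unit in units:
        x = unit.process(x)
    return x


units = [
    ArithmeticUnit(op=lambda v: v * 2.0),      # multiplier
    ArithmeticUnit(op=lambda v: v + 1.0),      # adder
    ArithmeticUnit(op=lambda v: max(v, 0.0)),  # simple activation function
]
configure(units, [1, 0, 1])     # multiplier and activation participate, adder is bypassed
result = run_chain(units, -3.0)
```

Changing only the enable bits yields a different compound operation from the same hardware chain, which is the point of the selector arrangement.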
In an alternative embodiment, each of the programmable arithmetic operation subunits may include a plurality of arithmetic operators disposed side by side, such as one or more of a multiplier, an adder, a subtractor, a divider, a shifter, an activation function, a maximum operator, a minimum operator, an average operator, and a pooler, where the arithmetic operators are connected in parallel, their inputs are respectively connected to the outputs of the corresponding demultiplexer, and their outputs are respectively connected to the inputs of the corresponding multiplexer, see fig. 4.
The process by which the programmable arithmetic operation module performs the compound operation is shown in fig. 5.
The output of the output register file 80 is selectively coupled to either the input of the output interface module 90 or the input of the input register file 50 via a demultiplexer 150. The control module 10 is connected to the demultiplexer 150, and controls the working state of the demultiplexer 150 according to the configuration information to select whether the output result of the output register file 80 is output to the output interface module 90 or to the input register file 50. When the output result of the output register file 80 is output to the input register file 50, a new round of operation processing will be performed on that result.
In an alternative embodiment, the output end of the input register file 50 may be selectively connected to the input end of the digital-to-analog conversion module 60 or the input end of the programmable arithmetic operation module 30 through a demultiplexer 110. The control module 10 is connected to the demultiplexer 110, and controls the working state of the demultiplexer 110 according to the configuration information, so as to select whether the output end of the input register file 50 is connected to the input end of the digital-to-analog conversion module 60 or the input end of the programmable arithmetic operation module 30. When the output end of the input register file 50 is connected to the input end of the digital-to-analog conversion module 60, an analog vector-matrix multiplication operation followed by an arithmetic operation is performed on the output of the input register file 50; when the output end of the input register file 50 is connected to the input end of the programmable arithmetic operation module 30, only an arithmetic operation is performed on the output of the input register file 50. This further increases the flexibility of the chip architecture.
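The selector settings described above determine which modules a piece of data passes through. The following sketch lists that path for a given set of selections. The record `RoutingConfig`, its field names, and the function `data_path` are hypothetical and serve only to illustrate the text; the patent does not specify a configuration format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RoutingConfig:
    take_external_input: bool  # multiplexer 100: input interface 40 or output register file 80
    use_analog_path: bool      # demultiplexer 110: DAC 60 or programmable arithmetic module 30
    subarray_index: int        # demultiplexer 120 and multiplexer 130: participating sub-array
    feed_back: bool            # demultiplexer 150: input register file 50 or output interface 90


def data_path(cfg: RoutingConfig) -> List[str]:
    """Return the ordered list of modules that the data traverses."""
    path = [
        "input interface 40" if cfg.take_external_input else "output register file 80",
        "input register file 50",
    ]
    if cfg.use_analog_path:
        # Analog vector-matrix multiplication in the selected sub-array.
        path += ["DAC 60", f"flash sub-array {cfg.subarray_index}", "ADC 70"]
    path += ["arithmetic module 30", "output register file 80"]
    path.append("input register file 50" if cfg.feed_back else "output interface 90")
    return path


first_layer = data_path(RoutingConfig(True, True, 1, True))
arithmetic_only = data_path(RoutingConfig(False, False, 0, False))
```

In the arithmetic-only case the DAC, the flash sub-array and the ADC do not appear in the path, which corresponds to connecting the input register file 50 directly to the programmable arithmetic operation module 30.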
In an alternative embodiment, each of the flash memory processing sub-arrays employs a source-coupled, drain-summed topology, see FIG. 6, comprising a plurality of programmable semiconductor devices (also referred to as flash memory cells) arranged in an array.
In this topology, the sources of all the programmable semiconductor devices of each column are connected to the same analog voltage input terminal, and the plurality of columns of programmable semiconductor devices are correspondingly connected to a plurality of analog voltage input terminals; the drains of all the programmable semiconductor devices of each column are connected to the same analog current output terminal; the gates of all the programmable semiconductor devices of each row are connected to the same bias voltage input terminal, and the plurality of rows of programmable semiconductor devices are correspondingly connected to a plurality of bias voltage input terminals; and the threshold voltage of each programmable semiconductor device is adjustable.
In another alternative embodiment, each of the flash memory processing subarrays includes a plurality of programmable semiconductor devices arranged in an array, gates of all the programmable semiconductor devices of each row are connected to a same analog voltage input terminal, a plurality of rows of the programmable semiconductor devices are correspondingly connected to a plurality of analog voltage input terminals, drains of all the programmable semiconductor devices of each column are connected to a same first terminal, a plurality of columns of the programmable semiconductor devices are correspondingly connected to a plurality of first terminals, sources of all the programmable semiconductor devices of each column are correspondingly connected to a same second terminal, a plurality of columns of the programmable semiconductor devices are correspondingly connected to a plurality of second terminals, and threshold voltages of each of the programmable semiconductor devices are adjustable, wherein the first terminal is a bias voltage input terminal, the second terminal is an analog current output terminal, a topology structure of gate coupling and source summing is achieved, see fig. 7, or the first terminal is an analog current output terminal, the second terminal is a bias voltage input terminal, and a topology structure of gate coupling and drain summing is achieved, see fig. 8.
Specifically, by adjusting the threshold voltage of each programmable semiconductor device, the flash memory processing subarray treats each device as a variable equivalent analog weight, so that the array as a whole stores analog matrix data; applying analog voltages to the programmable semiconductor device array then realizes the matrix multiplication operation function.
In an alternative embodiment, the software-definable storage-computing integrated chip may also include a programming circuit 22.
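The matrix multiplication function described above can be written out numerically. The sketch below assumes an idealized linear model in which each cell contributes the product of its input voltage and its equivalent weight, and the contributions sharing an output terminal add up. The linearity, the function name `vector_matrix_multiply`, and the row and column orientation are assumptions made for illustration and are not taken from the patent.

```python
from typing import List, Sequence


def vector_matrix_multiply(voltages: Sequence[float],
                           weights: Sequence[Sequence[float]]) -> List[float]:
    """Idealized analog vector-matrix multiplication.

    voltages[i] is the analog input applied to input line i.
    weights[i][j] is the equivalent analog weight of the cell linking
    input line i to output line j (set through its threshold voltage).
    The result holds one summed output value per output line.
    """
    if not weights or len(voltages) != len(weights):
        raise ValueError("one input voltage is required per weight row")
    n_outputs = len(weights[0])
    if any(len(row) != n_outputs for row in weights):
        raise ValueError("all weight rows must have the same length")
    return [
        sum(voltages[i] * weights[i][j] for i in range(len(weights)))
        for j in range(n_outputs)
    ]


currents = vector_matrix_multiply([1.0, 2.0], [[0.5, 1.0], [0.25, 0.0]])
```

Every multiply and every sum in this loop corresponds to something the array does in parallel in the analog domain, which is why a single read of the sub-array yields a whole output vector.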
The programming circuit 22 is coupled to the source, gate and/or substrate of each flash memory cell in the flash memory processing array for regulating the threshold voltage of the flash memory cell.
The programming circuit comprises a voltage generating circuit for generating a programming voltage or an erasing voltage and a voltage control circuit for loading the programming voltage to a selected flash memory cell.
Specifically, using the hot electron injection effect, the programming circuit applies a high voltage to the source of the flash memory cell according to the threshold voltage requirement data of the flash memory cell, accelerating channel electrons to a high speed and thereby increasing the threshold voltage of the flash memory cell.
In addition, using the tunneling effect, the programming circuit applies a high voltage to the gate or the substrate of the flash memory cell according to the threshold voltage requirement data of the flash memory cell, thereby reducing the threshold voltage of the flash memory cell.
In addition, the control module 10 is connected to the programming circuit for controlling the programming circuit according to the configuration information to adjust the weight stored in the flash memory processing array 20.
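The two programming mechanisms described above move the threshold voltage in opposite directions, so the choice between them follows from comparing the present and the required threshold voltage. The sketch below shows only that decision. The function name `select_programming_action`, the tolerance parameter, and the returned labels are hypothetical; the patent does not describe how the threshold voltage is measured or how close to the target it must be.

```python
def select_programming_action(current_vt: float, target_vt: float,
                              tolerance: float = 0.01) -> str:
    """Choose the mechanism that moves the threshold voltage toward its target.

    Hot electron injection raises the threshold voltage and tunneling lowers it,
    as stated in the description. The tolerance (in volts) is an illustrative
    assumption.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    delta = target_vt - current_vt
    if abs(delta) <= tolerance:
        return "hold"                     # already within tolerance, no pulse needed
    if delta > 0:
        return "hot_electron_injection"   # threshold voltage must increase
    return "tunneling"                    # threshold voltage must decrease


action = select_programming_action(current_vt=1.0, target_vt=1.5)
```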
In an alternative embodiment, the software-definable storage-computing integrated chip may further include a row-column decoder.
The row-column decoder is connected to the flash memory processing array 20 and the control module 10, and is used for decoding the rows and columns of the flash memory processing array 20 under the control of the control module 10.
In an alternative embodiment, the programmable semiconductor device may be implemented with a floating gate transistor.
The flash memory processing array may be a NOR-type flash memory processing array or a NAND-type flash memory processing array, and the invention is not limited thereto.
Based on the above, the present application provides a scenario for implementing neural network operation by using the software-defined and integrated memory chip according to the embodiment of the present application, so as to illustrate a workflow of the software-defined and integrated memory chip.
The neural network is used for realizing operation on the data P, the neural network comprises R layers of neurons, each layer of neurons mainly realizes vector-matrix multiplication operation, and the neurons of each layer are connected through a certain arithmetic operation (because the application focuses on a software-defined storage integrated chip and a software definition method thereof, the operation of the neural network is not deeply described herein, and only the operation architecture is described to exemplarily illustrate the workflow of the software-defined storage integrated chip, but not limit the application).
For the neural network operation, the workflow of the software-definable memory integrated chip is as follows:
The control module 10 obtains configuration information and finite state machine information, where the configuration information and finite state machine information include R cycles of configuration information and finite state machine information, where R cycles correspond to operations (e.g., convolution, pooling, etc.) of R-layer neurons of the neural network, and each cycle corresponds to an operation of one-layer neuron. The configuration information of each period includes configuration information of the flash memory processing subarray, configuration information of the programmable arithmetic operation unit, configuration information of the output register file, configuration information of the input register file, and the like. The control module 10 divides the flash memory processing array 20 into R flash memory processing sub-arrays according to the configuration information, each flash memory processing sub-array corresponds to a period, that is, each flash memory processing sub-array implements operation of one layer of the neural network, and then the control module 10 controls the operation timing sequence of each circuit module according to the finite state machine information.
The input interface module 40 receives the data P;
The control module 10 controls a multiplexer (MUX) A at the front end of the input register file 50 according to the configuration information and the finite state machine information of the first period, so that the input interface module 40 is communicated with the input register file 50; controls a demultiplexer (DEMUX) Q at the front end of the flash memory processing array 20, so that the digital-to-analog conversion module 60 is communicated with the flash memory processing subarray 1 corresponding to the first layer of the neural network; controls a multiplexer B at the rear end of the flash memory processing array 20, so that the flash memory processing subarray 1 is communicated with the analog-to-digital conversion module 70; controls the selectors of each programmable arithmetic operation unit of the programmable arithmetic operation module 30, so as to realize arithmetic operation 1 corresponding to the first layer of the neural network; and, after the data P is input to the input register file 50, controls a demultiplexer W at the output end of the output register file 80 and the multiplexer (MUX) A at the front end of the input register file 50, so that the input end of the input register file 50 is communicated with the output end of the output register file 80, thereby realizing the operation architecture configuration of the first period;
The data P is temporarily stored in the input register file 50 and then input to the digital-to-analog conversion module 60, where it is converted into an analog signal and input to the flash processing sub-array 1. The flash processing sub-array 1 performs analog vector-matrix multiplication operation 1 (such as a matrix multiplication operation) on the analog signal, the analog vector-matrix multiplication operation result 1 is converted into a digital signal by the analog-to-digital conversion module 70, arithmetic operation result 1 is obtained from the digital signal by the programmable arithmetic operation module 30, and arithmetic operation result 1 is input to the input register file 50 through the output register file 80, which completes the operation of the first layer of the neural network;
At this time, the control module 10 is automatically triggered. According to the configuration information and the finite state machine information of the second period, the control module 10 controls the demultiplexer (DEMUX) Q at the front end of the flash memory processing array 20, so that the digital-to-analog conversion module 60 is communicated with the flash memory processing sub-array 2 corresponding to the second layer of the neural network; controls the multiplexer B at the rear end of the flash memory processing array 20, so that the flash memory processing sub-array 2 is communicated with the analog-to-digital conversion module 70; and controls the selectors of each programmable arithmetic operation unit of the programmable arithmetic operation module 30 to realize the arithmetic operation 2 corresponding to the second layer of the neural network, thereby realizing the operation architecture configuration of the second period.
The arithmetic operation result 1 of the first layer of neural network is temporarily stored through the input register file 50 and then is input to the digital-to-analog conversion module 60, and is converted into an analog signal and then is input to the flash processing sub-array 2, the flash processing sub-array 2 performs analog vector-matrix multiplication operation (such as matrix multiplication operation) on the analog signal, the analog vector-matrix multiplication operation result is converted into a digital signal through the analog-to-digital conversion module 70, the arithmetic operation result 2 is obtained through the programmable arithmetic operation module 30, the digital signal is input to the input register file 50 after passing through the output register file 80, so as to finish the operation of the second layer of neural network, and so on until the last layer of neural network, wherein when the last layer of neural network is configured, the demultiplexer W at the output end of the output register file 80 is controlled, so that the output end of the output register file 80 is connected with the input end of the output interface module 90, and the operation result of the whole neural network is output to the external equipment through the output interface module 90.
It will be understood by those skilled in the art that when a certain layer of the neural network needs only an arithmetic operation and no analog vector-matrix multiplication operation, the control module 10, when configuring the circuit, only needs to control the demultiplexer E at the output of the input register file 50 so that the output end of the input register file 50 is communicated with the input end of the programmable arithmetic operation module 30; the other configuration processes are not repeated here.
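The period-by-period workflow above can be summarized as a loop in which the output of one period becomes the input of the next. The following sketch is a behavioural model only, under several assumptions: digital-to-analog and analog-to-digital conversion are treated as ideal (no quantization), the analog multiplication is linear, and the names `PeriodConfig`, `vmm` and `run_network` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Vector = List[float]


@dataclass
class PeriodConfig:
    """Configuration of one period, i.e. one layer (hypothetical format)."""
    weights: Optional[Sequence[Sequence[float]]]  # None: bypass the analog path via demultiplexer E
    arithmetic: Callable[[Vector], Vector]        # compound operation of the arithmetic module


def vmm(x: Vector, w: Sequence[Sequence[float]]) -> Vector:
    """Ideal vector-matrix product standing in for DAC, flash sub-array and ADC."""
    if len(x) != len(w):
        raise ValueError("input length must match the number of weight rows")
    return [sum(x[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]


def run_network(data: Vector, periods: Sequence[PeriodConfig]) -> Vector:
    register = list(data)                     # contents of the input register file 50
    for cfg in periods:                       # one iteration per period
        if cfg.weights is not None:
            register = vmm(register, cfg.weights)
        register = cfg.arithmetic(register)   # programmable arithmetic operation module 30
        # The result passes through the output register file 80 and is fed back
        # to the input register file 50 for the next period.
    return register                           # after the last period: to the output interface 90


relu = lambda v: [max(e, 0.0) for e in v]
layers = [
    PeriodConfig(weights=[[1.0, 0.0], [0.0, -1.0]], arithmetic=relu),
    PeriodConfig(weights=None, arithmetic=lambda v: [sum(v)]),  # arithmetic-only layer
]
output = run_network([2.0, 3.0], layers)
```

The second entry shows the arithmetic-only case: its `weights` field is `None`, so the data skips the analog path and goes straight to the arithmetic module.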
According to the technical scheme, the software-defined memory integrated chip provided by the embodiment of the invention can flexibly combine the chip architecture according to actual application requirements by matching the control module with the flash memory processing subarrays and the programmable arithmetic operation units, can realize complex operation tasks, is suitable for various application occasions such as voice processing, image processing, machine processing, artificial Intelligence (AI) and the like, and can realize multiplexing of peripheral circuits such as ADC, DAC, registers, programmable arithmetic operation units and the like, thereby reducing the circuit area, adapting to the requirements of integration and miniaturization and effectively reducing the chip cost.
FIG. 9 is a third block diagram of a software-definable storage-computing integrated chip according to an embodiment of the present invention. As shown in fig. 9, on the basis of the software-definable storage-computing integrated chip shown in fig. 2, the input of the input register file 50 is connected to the output of the input interface module 40 and the output of the output register file 80 through a multiplexer (MUX) 100 to selectively receive external input data from the input interface module 40 or data to be processed from the output register file 80. The control module 10 is coupled to the multiplexer (MUX) 100.
The digital-to-analog conversion module 60 is selectively coupled to the plurality of flash processing sub-arrays (20 1~20n) via a demultiplexer (DEMUX) 120. The control module 10 is connected to the demultiplexer 120 (demultiplexer Q).
The outputs of the flash processing sub-arrays (20 1~20n) are coupled to the analog-to-digital conversion module 70 via a multiplexer 130. The control module 10 is connected to the multiplexer B.
An input of the programmable arithmetic operation module 30 is connected to an output of the demultiplexer 110 and an output of the analog-to-digital conversion module 70 through a multiplexer 140.
A plurality of the programmable arithmetic operation units 30 1~30n of the programmable arithmetic operation module 30 are connected in series, each of the programmable arithmetic operation units including a selector 30a and an arithmetic operation subunit 30b.
The input end of the selector 30a is connected to the preceding programmable arithmetic operation unit or to the analog-to-digital conversion module 70, one output end is connected to the arithmetic operation subunit 30b, the other output end and the output end of the arithmetic operation subunit 30b are connected to the next programmable arithmetic operation unit or to the output register file 80 through a selector, and the control end is connected to the control module 10.
The output of the output register file 80 is selectively coupled to either the input of the output interface module 90 or the input of the input register file 50 via a demultiplexer 150 (demultiplexer W). The control module 10 is connected to the demultiplexer W, and controls the working state of the demultiplexer W according to the configuration information to select whether the output result of the output register file 80 is output to the output interface module 90 or to the input register file 50. When the output result of the output register file 80 is output to the input register file 50, a new round of operation processing will be performed on that result.
The output end of the input register file 50 is selectively connected to the input end of the digital-to-analog conversion module 60 or the input end of the programmable arithmetic operation module 30 through a demultiplexer 110 (demultiplexer E). The control module 10 is connected to the demultiplexer E, and controls the working state of the demultiplexer E according to the configuration information to select whether the output end of the input register file 50 is connected to the input end of the digital-to-analog conversion module 60 or the input end of the programmable arithmetic operation module 30. When the output end of the input register file 50 is connected to the input end of the digital-to-analog conversion module 60, an analog vector-matrix multiplication operation followed by an arithmetic operation is performed on the output of the input register file 50; when the output end of the input register file 50 is connected to the input end of the programmable arithmetic operation module 30, only an arithmetic operation is performed on the output of the input register file 50. This further increases the flexibility of the chip architecture.
It will be understood by those skilled in the art that when a certain layer of the neural network needs only an arithmetic operation and no analog vector-matrix multiplication operation, the control module 10, when configuring the circuit, only needs to control the demultiplexer E at the output of the input register file 50 so that the output end of the input register file 50 is communicated with the input end of the programmable arithmetic operation module 30; the other configuration processes are not repeated here.
In addition, as can be appreciated by those skilled in the art, when generating the configuration information according to the actual application requirement, the configuration information may be implemented according to a preset instruction-architecture correspondence table.
It should be noted that, when the configuration information is generated according to the actual application requirement, the number of the flash memory processing subarrays to be input and the scale of each flash memory processing subarray can be known, at this time, the dividing instruction of the flash memory processing array can be obtained according to the actual application requirement, and then the flash memory processing array is divided into a plurality of flash memory processing subarrays according to the dividing instruction, so as to correspond to the multiplication scale of a plurality of matrixes.
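As an illustration of turning the known matrix scales into a division of the array, the sketch below assigns each matrix a rectangular region. It assumes, purely for illustration, that sub-arrays are stacked row-wise from the top of the array and start at column 0; the patent does not specify a placement strategy, and the function name `partition_array` and the tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (first_row, first_col, rows, cols)


def partition_array(total_rows: int, total_cols: int,
                    shapes: Sequence[Tuple[int, int]]) -> List[Region]:
    """Assign one region of the flash processing array to each matrix.

    shapes lists the (rows, cols) scale of every matrix multiplication,
    one entry per sub-array to be put into operation.
    """
    regions: List[Region] = []
    next_row = 0
    for rows, cols in shapes:
        if rows <= 0 or cols <= 0:
            raise ValueError("matrix dimensions must be positive")
        if cols > total_cols or next_row + rows > total_rows:
            raise ValueError("the requested matrices do not fit in the array")
        regions.append((next_row, 0, rows, cols))
        next_row += rows  # the next sub-array starts below this one
    return regions


regions = partition_array(total_rows=8, total_cols=4, shapes=[(3, 4), (2, 2)])
```

A compiling tool could emit such a region list as the dividing instruction; more compact placements are possible but would need a real packing strategy.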
It will be understood by those skilled in the art that, when the software-definable storage-computing integrated chip according to the embodiment of the present invention performs operations over a plurality of periods, the flash memory processing sub-array corresponding to a period may be programmed within that period, or all the flash memory processing sub-arrays may be programmed together according to a programming instruction before the period operations are performed.
FIG. 10 is a flowchart of a software defining method according to an embodiment of the present invention, where the software defining method is applied to the software-definable memory integrated chip. As shown in fig. 10, the software definition method includes the following:
step S1001, acquiring configuration information and finite state machine information.
The configuration information and the finite state machine information can be obtained through a compiling tool according to actual application requirements.
Step S1002, the plurality of flash memory processing subarrays, the plurality of programmable arithmetic operation units, the output register file and other circuit modules are configured according to the configuration information, so that the dynamic configuration of the chip architecture is realized.
Step S1003 is to control the operation timing of the flash memory processing array, the programmable arithmetic operation module, the output register file, and other circuit modules according to the finite state machine information.
Specifically, the plurality of flash memory processing sub-arrays and the plurality of programmable arithmetic units are configured in a combined manner according to the configuration information, the flash memory processing sub-arrays and the programmable arithmetic units which are put into operation are selected, and a combined pairing mode of the flash memory processing sub-arrays and the programmable arithmetic units is controlled to realize specific operation.
Because each of the plurality of programmable arithmetic operation units can realize one or more arithmetic operations, the plurality of programmable arithmetic operation units can be arranged and combined to form a plurality of compound operations, and the compound operations are matched with the plurality of flash memory processing subarrays, so that a plurality of combined configurations can be realized, and further, complex operation functions are realized.
The arithmetic operation comprises one or a combination of a plurality of multiplication operation, addition operation, subtraction operation, division operation, shift operation, activation function, maximum value taking, minimum value taking, average value taking, pooling and the like.
The analog operation realized by the flash memory processing subarray mainly comprises an analog vector-matrix multiplication operation.
The software definition method provided by the embodiment of the invention can carry out combined configuration on the flash memory processing subarrays and the programmable arithmetic operation units according to actual application requirements, realize dynamic configuration of a chip architecture, flexibly adjust the chip architecture according to actual tasks, realize multiplexing of peripheral circuits such as ADC, DAC, register, programmable arithmetic operation units and the like, further reduce circuit area, adapt to the requirements of integration and miniaturization, and effectively reduce the chip cost.
In an alternative embodiment, the step S1002 includes:
Step 1, dividing the flash memory processing array into a plurality of flash memory processing subarrays according to configuration information of the flash memory processing subarrays, and controlling working time sequences of the plurality of flash memory processing subarrays according to finite state machine information;
step 2, controlling the working state of the selector corresponding to each programmable arithmetic unit according to the configuration information of the programmable arithmetic unit, enabling the programmable arithmetic units to realize any combination operation, and controlling the working time sequence of the programmable arithmetic units according to the finite state machine information;
and 3, controlling the output register file 80 to output the data in the output register file 80 or to be processed data to the input register file 50 according to the configuration information of the output register file 80.
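The role of the finite state machine information in sequencing the configured modules can be pictured with a small state machine. The stage names, the function `next_state`, and the per-period flag list below are hypothetical and chosen only to mirror the workflow in the description; the patent does not define the actual states or their encoding.

```python
from enum import Enum, auto
from typing import List, Sequence, Tuple


class Stage(Enum):
    CONFIGURE = auto()   # apply the configuration information of the period
    LOAD_INPUT = auto()  # data enters the input register file
    ANALOG_VMM = auto()  # DAC, flash processing sub-array, ADC
    ARITHMETIC = auto()  # programmable arithmetic operation module
    WRITE_BACK = auto()  # output register file feeds back or outputs
    DONE = auto()


State = Tuple[Stage, int]  # (stage, period index)


def next_state(stage: Stage, period: int, analog_flags: Sequence[bool]) -> State:
    """Return the stage and period that follow the current ones."""
    if stage is Stage.CONFIGURE:
        return Stage.LOAD_INPUT, period
    if stage is Stage.LOAD_INPUT:
        # Periods that need no vector-matrix multiplication skip the analog path.
        return (Stage.ANALOG_VMM if analog_flags[period] else Stage.ARITHMETIC), period
    if stage is Stage.ANALOG_VMM:
        return Stage.ARITHMETIC, period
    if stage is Stage.ARITHMETIC:
        return Stage.WRITE_BACK, period
    if stage is Stage.WRITE_BACK and period + 1 < len(analog_flags):
        return Stage.CONFIGURE, period + 1   # start the next period
    return Stage.DONE, period


def trace(analog_flags: Sequence[bool]) -> List[State]:
    """List every state visited for the given periods."""
    if not analog_flags:
        raise ValueError("at least one period is required")
    state: State = (Stage.CONFIGURE, 0)
    visited = [state]
    while state[0] is not Stage.DONE:
        state = next_state(state[0], state[1], analog_flags)
        visited.append(state)
    return visited


schedule = trace([True, False])  # period 0 uses the analog path, period 1 does not
```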
Based on the above, the present application provides a scenario in which the software-definable storage-computing integrated chip is software-defined, using the software definition method of the embodiment of the present application, to implement a neural network operation, so as to illustrate the workflow of the software definition method.
The neural network operates on data P and comprises R layers of neurons. Each layer mainly performs a matrix multiplication operation, and adjacent layers are connected through a certain arithmetic operation. (Because this example focuses on the software definition method, the operation of the neural network itself is not described in depth; only the operation architecture is described. The flow of the software definition method is given as an example and does not limit the invention.)
For the neural network operation, the workflow of the software definition method is as follows:
(1) Configuration information and finite state machine information are obtained. The configuration information includes configuration information for R cycles, where the R cycles correspond to the operations (such as convolution, pooling, etc.) of the R layers of neurons of the neural network, and each cycle corresponds to the operation of one layer of neurons. The configuration information of each cycle includes configuration information of the flash memory processing subarray, configuration information of the programmable arithmetic operation unit, configuration information of the output register file, configuration information of the input register file, and the like. The control module 10 divides the flash memory processing array 20 into R flash memory processing sub-arrays according to the configuration information, and each flash memory processing sub-array corresponds to one cycle, that is, each flash memory processing sub-array implements the operation of one layer of the neural network.
(2) According to the configuration information and the finite state machine information of the first cycle, the multiplexer (MUX) A at the front end of the input register file 50 is controlled so that the input interface module 40 communicates with the input register file 50; the demultiplexer (DEMUX) Q at the front end of the flash memory processing array 20 is controlled so that the digital-to-analog conversion module 60 communicates with the flash memory processing sub-array 1 corresponding to the first layer of the neural network; the multiplexer B at the rear end of the flash memory processing array 20 is controlled so that the flash memory processing sub-array 1 communicates with the analog-to-digital conversion module 70; and the selector of each programmable arithmetic operation unit of the programmable arithmetic operation module 30 is controlled to realize the arithmetic operation 1 corresponding to the first layer of the neural network. After the data P has been input to the input register file 50, the demultiplexer W at the output end of the output register file 80 and the multiplexer (MUX) A at the front end of the input register file 50 are controlled so that the input end of the input register file 50 communicates with the output end of the output register file 80. This completes the configuration of the operation architecture of the first cycle;
(3) According to the configuration information and the finite state machine information of the second cycle, the demultiplexer (DEMUX) Q at the front end of the flash memory processing array 20 is controlled so that the digital-to-analog conversion module 60 communicates with the flash memory processing sub-array 2 corresponding to the second layer of the neural network; the multiplexer B at the rear end of the flash memory processing array 20 is controlled so that the flash memory processing sub-array 2 communicates with the analog-to-digital conversion module 70; and the selector of each programmable arithmetic operation unit of the programmable arithmetic operation module 30 is controlled to realize the arithmetic operation 2 corresponding to the second layer of the neural network. This completes the configuration of the operation architecture of the second cycle. The remaining layers are configured in the same way, up to the last layer of the neural network. When the last layer is configured, the demultiplexer W at the output end of the output register file 80 is controlled so that the output end of the output register file 80 is connected with the input end of the output interface module 90, and the operation result of the whole neural network is output to an external device through the output interface module 90.
It will be understood by those skilled in the art that when a certain layer of the neural network needs only an arithmetic operation and no analog vector-matrix multiplication operation, the circuit configuration only needs to control the demultiplexer E at the output of the input register file 50 so that the output end of the input register file 50 communicates with the input end of the programmable arithmetic operation module 30; the other configuration steps are as described above and are not repeated.
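The layer-by-layer workflow described above, including the case in which a layer bypasses the flash memory processing array, can be mimicked by a short simulation. This is a sketch under stated assumptions rather than the chip's control logic: the array is modelled as an ideal matrix product, timing is ignored, and the names vmm, run_network and the dictionary keys "weights" and "op" are hypothetical.

```python
def vmm(x, w):
    """Ideal vector-matrix product: y[j] = sum over i of x[i] * w[i][j]."""
    if len(x) != len(w):
        raise ValueError("input length must match the number of rows")
    return [sum(x[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]


def run_network(p, layers):
    """Run data p through the layers, one configuration cycle per layer.

    Each layer is a dict with:
      "weights" -- matrix held by that layer's subarray, or None when the layer
                   needs only an arithmetic operation (array bypassed)
      "op"      -- arithmetic operation applied by the programmable units
    """
    if not layers:
        raise ValueError("at least one layer is required")
    # First cycle: multiplexer A connects the input interface to the input register file.
    input_register = list(p)
    output_register = None
    for cycle, layer in enumerate(layers, start=1):
        data = input_register
        if layer["weights"] is not None:
            # Demultiplexer Q and multiplexer B select this layer's subarray.
            data = vmm(data, layer["weights"])
        # Otherwise demultiplexer E sends the data straight to the arithmetic module.
        output_register = layer["op"](data)
        if cycle < len(layers):
            # Demultiplexer W feeds the result back as data to be processed.
            input_register = output_register
    # Last cycle: demultiplexer W connects to the output interface instead.
    return output_register


layers = [
    {"weights": [[1, -1], [2, 0]], "op": lambda v: [max(0, e) for e in v]},
    {"weights": None, "op": lambda v: [e + 1 for e in v]},  # arithmetic only
    {"weights": [[1], [1]], "op": lambda v: v},
]
result = run_network([1, 2], layers)  # [7]
```

The loop reuses the same registers and arithmetic stage in every cycle, which mirrors the multiplexing of peripheral circuits that the description credits with reducing circuit area.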
The embodiment of the invention also provides an electronic device that can execute a neural network algorithm, wherein the neural network comprises a plurality of layers of neurons, each layer of neurons performs its corresponding operation on the output of the neurons of the previous layer, and the electronic device comprises the above software-definable storage-computing integrated chip.
The embodiment of the invention also provides another electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the software definition method.
The electronic device may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiment of the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the software definition method described above.
The principles and embodiments of the present invention have been described above with reference to specific examples. These examples are provided only to aid in understanding the method and core concepts of the present invention; those of ordinary skill in the art may, in light of the teachings of the present invention, vary the specific embodiments and the scope of application. Accordingly, the above description should not be construed as limiting the invention.

Claims (10)

1. A software-definable memory chip, characterized by comprising an input interface module, an input register file, a digital-to-analog conversion module, a flash memory processing array, an analog-to-digital conversion module, an output register file, a programmable arithmetic operation module, an output interface module, and a control module connected with the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module, wherein:
the flash memory processing array comprises a plurality of flash memory processing subarrays for respectively executing different analog vector-matrix multiplication operations;
the programmable arithmetic operation module includes a plurality of programmable arithmetic operation units for respectively implementing different arithmetic operations;
the control module configures the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module according to configuration information to realize dynamic configuration of the circuit structure in the chip; and
the control module controls the working time sequences of the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module according to finite state machine information.
2. The software-definable memory chip of claim 1, wherein:
the input interface module is used for receiving external input data;
the input register file is connected with the input interface module and is used for storing the external input data or data to be processed;
the input end of the digital-to-analog conversion module is connected with the input register file, the output end of the digital-to-analog conversion module is connected with the flash memory processing array, the digital-to-analog conversion module is used for converting the external input data or the data to be processed into analog signals and outputting the analog signals to the flash memory processing array, and the flash memory processing array performs an analog vector-matrix multiplication operation on the analog signals and outputs an operation result;
the input end of the analog-to-digital conversion module is connected with the flash memory processing array, the output end of the analog-to-digital conversion module is connected with the programmable arithmetic operation module, the analog-to-digital conversion module is used for converting the analog vector-matrix multiplication result into a digital signal and outputting the digital signal to the programmable arithmetic operation module, and the programmable arithmetic operation module performs an arithmetic operation on the digital signal and outputs an arithmetic operation result;
the output register file is connected with the programmable arithmetic operation module and the input register file, and is used for temporarily storing the arithmetic operation result and either outputting the arithmetic operation result or passing it to the input register file as the data to be processed;
and the output interface module is connected with the output register file, receives the output data of the output register file and outputs the output data externally.
3. The software-definable memory chip of claim 2 wherein the output of the input register file is further coupled to the programmable arithmetic operation module.
4. The software-definable memory chip according to claim 2, wherein a plurality of said programmable arithmetic operation units are connected in series, each of said programmable arithmetic operation units comprising a demultiplexer, an arithmetic operation subunit and a multiplexer;
the input end of the demultiplexer is connected with the previous programmable arithmetic operation unit or the analog-to-digital conversion module, one output end of the demultiplexer is connected with the arithmetic operation subunit, the other output end of the demultiplexer and the output end of the arithmetic operation subunit are connected with the next programmable arithmetic operation unit or the output register file through the multiplexer, and the control end of the demultiplexer is connected with the control module.
5. The software-definable memory chip according to claim 4, further comprising a programming circuit connected with the control module, the programming circuit being connected with the source, gate and/or substrate of each flash memory cell in the flash memory processing subarray and used for regulating a threshold voltage of the flash memory cell;
the programming circuit comprises a voltage generating circuit for generating a programming voltage or an erasing voltage and a voltage control circuit for loading the programming voltage to a selected flash memory cell.
6. The software-definable memory chip of claim 1 or 2, further comprising:
a row-column decoder connected with the flash memory processing array and the control module and used for performing row-column decoding on the flash memory processing array under the control of the control module.
7. The software-definable memory chip according to claim 4, wherein the configuration information includes configuration information of the flash memory processing subarrays, configuration information of the programmable arithmetic operation units, configuration information of the digital-to-analog conversion module, configuration information of the analog-to-digital conversion module, configuration information of the input interface module, configuration information of the output interface module, configuration information of the input register file, and configuration information of the output register file,
and wherein configuring the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module according to the configuration information to realize dynamic configuration of the circuit structure in the chip comprises:
dividing the flash memory processing array into a plurality of flash memory processing subarrays according to the configuration information of the flash memory processing subarrays, and controlling the working time sequences of the plurality of flash memory processing subarrays;
controlling the working states of the demultiplexer and the multiplexer corresponding to each programmable arithmetic operation unit according to the configuration information of the programmable arithmetic operation units, so that the plurality of programmable arithmetic operation units realize any combined operation;
controlling the switching state of a digital-to-analog conversion circuit participating in an actual task according to the configuration information of the digital-to-analog conversion module;
controlling the switching state of an analog-to-digital conversion circuit participating in an actual task according to the configuration information of the analog-to-digital conversion module;
controlling the switching state of an input interface circuit participating in an actual task according to the configuration information of the input interface module;
controlling the switching state of an output interface circuit participating in an actual task according to the configuration information of the output interface module;
controlling, according to the configuration information of the input register file, whether the data to be stored in the input register file comes from the input data of the input interface module or from the data to be processed in the output register file;
and controlling the output register file, according to the configuration information of the output register file, either to output the data it holds or to pass that data to the input register file as data to be processed.
8. A software-defined method of a software-definable memory chip, applied to the software-definable memory chip according to any one of claims 1 to 7, the software-defined method comprising:
acquiring the configuration information and the finite state machine information;
configuring the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module according to the configuration information, so as to realize dynamic configuration of the circuit structure in the chip;
and controlling the working time sequences of the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module according to the finite state machine information.
9. The software-defined method of a software-definable memory chip of claim 8, comprising:
dividing the flash memory processing array into a plurality of flash memory processing subarrays according to the configuration information of the flash memory processing subarrays, and controlling the working time sequences of the plurality of flash memory processing subarrays according to the finite state machine information;
and controlling the working states of the selectors corresponding to the programmable arithmetic operation units according to the configuration information of the programmable arithmetic operation units, so that the programmable arithmetic operation units realize any combined operation, and controlling the working time sequences of the programmable arithmetic operation units according to the finite state machine information.
10. An electronic device comprising a software-definable memory chip according to any one of claims 1 to 7.
CN201910143132.2A 2019-02-26 2019-02-26 Software-definable storage-computing integrated chip and software-defining method thereof Active CN111611195B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910143132.2A CN111611195B (en) 2019-02-26 2019-02-26 Software-definable storage-computing integrated chip and software-defining method thereof
PCT/CN2019/081339 WO2020172951A1 (en) 2019-02-26 2019-04-03 Software-definable computing-in-memory chip and software definition method therefor

Publications (2)

Publication Number Publication Date
CN111611195A CN111611195A (en) 2020-09-01
CN111611195B true CN111611195B (en) 2025-06-03

Family

ID=72202924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910143132.2A Active CN111611195B (en) 2019-02-26 2019-02-26 Software-definable storage-computing integrated chip and software-defining method thereof

Country Status (2)

Country Link
CN (1) CN111611195B (en)
WO (1) WO2020172951A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11392325B2 (en) * 2020-09-28 2022-07-19 Quanta Computer Inc. Method and system for parallel flash memory programming
CN112395247B (en) * 2020-11-18 2024-05-03 北京灵汐科技有限公司 Data processing method and storage-computing integrated chip
CN112306931B (en) * 2020-11-20 2023-07-04 广州安凯微电子股份有限公司 Method, system and storage medium for realizing usb host controller by software
CN112989273B (en) * 2021-02-06 2023-10-27 江南大学 Method for carrying out memory operation by utilizing complementary code coding
WO2022217575A1 (en) * 2021-04-16 2022-10-20 尼奥耐克索斯有限私人贸易公司 Low-loss computing circuit and operation method therefor
CN113918233B (en) * 2021-09-13 2024-12-20 山东产研鲲云人工智能研究院有限公司 AI chip control method, electronic device and AI chip
CN114242137A (en) * 2021-11-09 2022-03-25 厦门半导体工业技术研发有限公司 Configuration circuit and chip of array and configuration method of array
CN114564439A (en) * 2022-02-18 2022-05-31 北京航空航天大学 Self-adaptive configuration and storage integrated array SOC chip and configuration method
CN116670643A (en) * 2023-02-22 2023-08-29 声龙(新加坡)私人有限公司 Integrated memory chip and chip control method
CN117289896B (en) * 2023-11-20 2024-02-20 之江实验室 A basic computing device integrating storage and calculation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763163A (en) * 2018-08-02 2018-11-06 北京知存科技有限公司 Simulate vector-matrix multiplication operation circuit
CN108777155A (en) * 2018-08-02 2018-11-09 北京知存科技有限公司 flash memory chip
CN209388304U (en) * 2019-02-26 2019-09-13 北京知存科技有限公司 Can software definition deposit the integrated chip of calculation and electronic equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4005979B2 (en) * 2004-04-28 2007-11-14 株式会社東芝 Programmable gate array and circuit function change control method
US8140738B2 (en) * 2006-07-20 2012-03-20 Stmicroelectronics Pvt. Ltd. Flash memory interface device
CN102306141B (en) * 2011-07-18 2015-04-08 清华大学 Method for describing configuration information of dynamic reconfigurable array
CN103390070A (en) * 2012-05-07 2013-11-13 北京大学深圳研究生院 Reconfigurable operator array structure
CN103631754B (en) * 2013-09-22 2016-07-06 中国科学院电子学研究所 Programmable signal processing unit
CN107430586B (en) * 2015-07-31 2018-08-21 吴国盛 Adaptive chip and configuration method
US11064019B2 (en) * 2016-09-14 2021-07-13 Advanced Micro Devices, Inc. Dynamic configuration of inter-chip and on-chip networks in cloud computing system
CN109379087B (en) * 2018-10-24 2022-03-29 江苏华存电子科技有限公司 Method for LDPC to modulate kernel coding and decoding rate according to error rate of flash memory component

Also Published As

Publication number Publication date
WO2020172951A1 (en) 2020-09-03
CN111611195A (en) 2020-09-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Country or region after: China
Address after: Room 213-175, 2nd Floor, Building 1, No. 180 Kecheng Street, Qiaosi Street, Linping District, Hangzhou City, Zhejiang Province, 311100
Applicant after: Hangzhou Zhicun Computing Technology Co.,Ltd.
Address before: 1416, shining building, No. 35, Xueyuan Road, Haidian District, Beijing 100083
Applicant before: BEIJING WITINMEM TECHNOLOGY Co.,Ltd.
Country or region before: China
GR01 Patent grant