
CN111667051B - Neural network accelerator applicable to edge equipment and neural network acceleration calculation method - Google Patents

Neural network accelerator applicable to edge equipment and neural network acceleration calculation method

Info

Publication number
CN111667051B
CN111667051B
Authority
CN
China
Prior art keywords
processing
feature map
network
pooling
convolution
Prior art date
Legal status
Active
Application number
CN202010462707.XA
Other languages
Chinese (zh)
Other versions
CN111667051A (en)
Inventor
王维
伍骏
Current Assignee
Shanghai Saifang Technology Co ltd
Original Assignee
Shanghai Saifang Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Saifang Technology Co ltd filed Critical Shanghai Saifang Technology Co ltd
Priority to CN202010462707.XA priority Critical patent/CN111667051B/en
Publication of CN111667051A publication Critical patent/CN111667051A/en
Application granted granted Critical
Publication of CN111667051B publication Critical patent/CN111667051B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a neural network accelerator applicable to edge devices and a neural network acceleration calculation method, and relates to the technical field of neural networks. The network accelerator comprises a configuration unit, a data buffer unit, a processing matrix assembly (PMs), a post-processing unit and a main controller. The main controller writes the characteristic parameters of different types of network layers into the registers of the configuration unit to control how the operation logic of each network layer is mapped onto the processing matrix hardware, so that the processing matrix assembly can be multiplexed; that is, a single hardware circuit accelerates the operations of different types of network layers in the neural network without additional hardware resources. The different types of network layers include standard convolution layers and pooling network layers. The multiplexing accelerator provided by the invention achieves the same functionality while consuming fewer hardware resources, achieving a higher hardware reuse rate and lower power consumption, offering high concurrency and high multiplexing, and retaining strong structural extensibility.

Description

Neural network accelerator applicable to edge equipment and neural network acceleration calculation method
Technical Field
The invention belongs to the technical field of neural networks, and particularly relates to a neural network accelerator applicable to edge devices and a neural network acceleration calculation method applicable to edge devices.
Background
As a computationally intensive application, neural network inference is characterized by complex computation, a huge amount of arithmetic and high latency, and dedicated neural network accelerators have therefore been developed to speed up the inference process. Most existing neural network accelerators complete operation acceleration through direct mapping: the operation logic of each type of network layer in the neural network is mapped directly onto a dedicated functional circuit, such as a circuit for accelerating standard convolution layers, a circuit for accelerating pooling layers, a circuit for accelerating fully connected layers, and so on.
An accelerator assembled from such special-purpose circuits improves the operating efficiency of the neural network, but consumes more hardware resources and has higher power consumption. On the other hand, edge devices generally have limited available power, so neural network inference on edge devices requires an accelerator structure and acceleration method with low power consumption, complete functionality and guaranteed operating efficiency. Related prior art includes Chinese patent CN110322001A, entitled "Deep learning accelerator and method for accelerating deep learning operations". Compared with that reference, the present invention is a complete accelerator architecture, whereas the cited document covers only part of an accelerator, roughly corresponding to the buffer module, processing matrix assembly and accumulation unit of this patent; the problems to be solved also differ: this patent reduces hardware resource usage by multiplexing one piece of hardware to complete the operations of multiple network layers, while the cited document reduces external data accesses by adding caches.
The technical problem to be solved by the invention is to provide a multiplexing neural network inference accelerator suitable for edge devices, with low hardware resource usage, a high hardware reuse rate, low power consumption and high concurrency, together with a neural network acceleration calculation method implemented with the accelerator.
Disclosure of Invention
The invention provides a neural network accelerator applicable to edge devices and a neural network acceleration calculation method, which solve the above problems.
In order to solve the technical problems, the invention is realized by the following technical scheme:
the neural network accelerator applicable to the edge equipment comprises:
configuration unit: comprises a register file for configuring the computation of different types of network layers;
a memory: a static random access memory (SRAM), mainly used for holding the input feature maps, weight coefficients, bias data and the like;
a data buffer unit: caches the input feature maps and weight coefficients required for the operation according to the size of the processing matrix, and a data distribution unit selects appropriate cached data according to the stride configuration information and distributes it to the processing matrix assembly (PMs);
a processing matrix assembly (PMs): the main functional unit that performs the operations on the input feature maps and weight coefficients, comprising one or more independently operating processing matrices (PM); each processing matrix (PM) is formed by a plurality of processing units (PE) interconnected in a certain topology and is used to realize different operation logics; each processing unit (PE) contains basic arithmetic electronic devices such as a multiplier, an adder, registers and a comparator to implement multiplication, addition and comparison, and receives input feature values and weight coefficients as operands for calculation;
a post-processing unit: comprises an output buffer, an accumulator, and normalization, activation and quantization processing; the accumulator is used to support network layers that require accumulation operations;
a main controller: controls the mapping of the operation logic of different network layers onto the processing matrix hardware by writing the characteristic parameters of the different types of network layers into the registers of the configuration unit, so that the processing matrix assembly can be multiplexed, i.e. one hardware circuit accelerates the operations of different types of network layers in the neural network without additional hardware resources; the different types of network layers include standard convolution layers and pooling network layers.
Further, the registers of the configuration unit include two types:
one type is network layer attribute configuration: a standard convolution layer requires configuration of the number of input feature map channels, the input feature map size, the number of output feature map channels, the convolution kernel size, the convolution kernel stride, the start address of the input feature map in memory, the start address of the convolution kernels in memory, the start address of the bias coefficients in memory and the like; a pooling layer requires configuration of the number of input feature map channels, the number of output feature map channels, the pooling type, the pooling filter size, the pooling filter stride, the start address of the input feature map in memory, the start address of the output feature map in memory and the like;
the other type is control command configuration for neural network operation, such as starting an operation and interconnect interface interrupts; the accelerator main controller writes the network configuration parameters into the configuration unit according to the number of network layers, the network types and the connection order contained in the target neural network, thereby realizing layer-by-layer calculation.
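To make the register interface concrete, the following Python sketch models the two register types as plain data structures; every field name (for example in_channels, kernel_size, start_op) is an illustrative assumption rather than the patent's actual register map.

```python
from dataclasses import dataclass
from enum import Enum

class LayerType(Enum):
    STANDARD_CONV = 0   # network type flag: standard convolution layer
    POOLING = 1         # network type flag: pooling layer

@dataclass
class LayerAttributeConfig:
    """First register type: per-layer attribute configuration (hypothetical field names)."""
    layer_type: LayerType
    in_channels: int
    out_channels: int
    in_height: int = 0
    in_width: int = 0
    kernel_size: int = 0        # convolution kernel or pooling filter size
    stride: int = 1             # moving step length
    ifmap_addr: int = 0         # start address of the input feature map in memory
    weight_addr: int = 0        # start address of the convolution kernels in memory
    bias_addr: int = 0          # start address of the bias coefficients in memory
    ofmap_addr: int = 0         # start address of the output feature map in memory
    pooling_type: str = "max"   # "max" or "avg", only meaningful for pooling layers

@dataclass
class ControlConfig:
    """Second register type: control command configuration (hypothetical field names)."""
    start_op: bool = False      # start the operation of the configured layer
    irq_enable: bool = False    # enable the interconnect interface interrupt on completion
```

In this model the main controller would fill one LayerAttributeConfig per network layer and then pulse start_op through the control registers to trigger layer-by-layer calculation.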
Further, the multiplier and adder are used for supporting standard convolution calculation and average value pooling operation, and the comparator is used for supporting maximum value pooling operation.
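As a behavioral illustration only (not RTL, and with assumed mode names), the sketch below models a single processing unit (PE) whose multiplier/adder path serves standard convolution and average pooling while its comparator path serves maximum pooling:

```python
class ProcessingUnit:
    """Behavioral model of one PE: multiply-add path or compare path, selected by mode."""

    def __init__(self):
        self.mode = "conv"
        self.acc = 0.0          # local register holding the running partial result

    def reset(self, mode: str):
        self.mode = mode
        # Maximum pooling starts from -inf so the first comparison always wins.
        self.acc = float("-inf") if mode == "max_pool" else 0.0

    def step(self, feature: float, weight: float = 1.0):
        if self.mode in ("conv", "avg_pool"):
            # Multiplier + adder: weight is the kernel coefficient for convolution,
            # and 1/window_size (or 1 with a later divide) for average pooling.
            self.acc += feature * weight
        elif self.mode == "max_pool":
            # Comparator: keep the larger of the stored value and the new feature.
            self.acc = max(self.acc, feature)
        return self.acc
```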
The neural network acceleration calculation method of the above neural network accelerator for edge devices includes a method for multiplexing the processing matrix assembly to accelerate standard convolution layer operations, comprising the following steps (a behavioral sketch of this flow is given after the step list):
S01. according to the characteristics of the current standard convolution layer, the accelerator main controller writes the network attribute information, such as the number of input feature map channels, the input feature map size, the number of output feature map channels, the convolution kernel size, the convolution kernel stride, the start address of the input feature map in memory, the start address of the convolution kernels in memory and the start address of the bias coefficients in memory, into the configuration unit, and sets the network type flag register to standard convolution, so that the operation can correctly identify which network type is multiplexing the processing matrix assembly;
S02. the main controller judges, according to the available capacity of the on-chip memory, whether the input feature map of the standard convolution layer needs to be partitioned into blocks, and writes the blocking information into the configuration unit so that the accumulator in the post-processing unit can buffer and accumulate the current results; the main controller then writes the block, or all of the data, into the on-chip memory;
S03. the data buffer unit selects data from the on-chip memory and caches the input feature map and convolution kernel data into the feature map buffer and the weight buffer respectively; the feature map distributor and the weight distributor in the data distribution unit are configured according to the number of processing matrices in the processing matrix assembly, the processing matrix size, the convolution kernel size, the convolution kernel stride and the amount of padding, select the input feature maps and convolution kernels of the specified number of channels from the data buffer unit, and distribute them to the processing matrix assembly;
S04. the processing matrices in the processing matrix assembly execute the standard convolution operation in parallel; the convolution operation mainly occupies the multiplier and adder in each processing unit, while the other electronic devices, such as the comparator, are in a closed state;
S05. the processing matrix assembly outputs the convolution results of each processing matrix in parallel to the accumulation unit; the accumulation unit judges, according to the blocking information and the number of input channels of the input feature map recorded in the configuration unit, whether the current convolution result needs to be buffered in preparation for accumulation with the convolution result of the next block or channel; if so, the previous partial sum in the output buffer is taken out, accumulated with the convolution result of the current block or channel, and written back into the output buffer to await the next accumulation or direct output; if not, an initial output feature map is generated;
S06. the other processing units in the post-processing unit further process the initial output feature map to generate the final output feature map.
The neural network acceleration calculation method of the above neural network accelerator for edge devices also includes a method for multiplexing the processing matrix assembly to accelerate pooling layer operations, comprising the following steps (a behavioral sketch is given after the step list):
T01. according to the characteristics of the current pooling layer, the accelerator main controller writes attribute information such as the number of input feature map channels, the number of output feature map channels, the pooling type, the pooling filter size, the pooling filter stride, the start address of the input feature map in memory and the start address of the output feature map in memory into the configuration unit, and sets the network type flag register to pooling, so that the operation can correctly identify which network type is multiplexing the processing matrix assembly;
T02. the main controller judges, according to the available capacity of the on-chip memory, whether the input feature map needs to be partitioned, and then writes the block, or all of the feature maps, into the on-chip memory; because pooling is a two-dimensional operation and the pooling results of feature maps on different input channels are independent, the blocking information does not need to be written into the configuration unit;
T03. because the two-dimensional filter in the pooling operation is a unit filter whose coefficients are all 1, the data buffer unit only needs to select the input feature map data from the memory and cache it in the feature map buffer; the feature map distributor in the data distribution unit then distributes one input feature map to each processing matrix from the data buffer unit, according to the number of processing matrices in the processing matrix assembly, the processing matrix size, the pooling filter stride and the amount of padding;
T04. the processing matrices in the processing matrix assembly execute the pooling operation in parallel; the pooling operation occupies different arithmetic devices in each processing unit depending on the pooling layer type, namely average pooling occupies the multiplier and adder while maximum pooling occupies the comparator, and the remaining electronic devices are in a closed state;
T05. the processing matrix assembly outputs the result of each processing matrix in parallel to the buffer of the post-processing unit, and the final output feature map is generated after normalization, quantization or activation.
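Analogously, the sketch referenced in the step list above models T01-T05; because pooling is two-dimensional and channel-independent, each channel is processed on its own, and the helper below is an illustrative software model rather than the hardware's actual control flow.

```python
import numpy as np

def accelerate_pooling(ifmap, filter_size=2, stride=2, pooling_type="max"):
    """T01-T05 as a software model: ifmap is (C, H, W); each channel pools independently."""
    C, H, W = ifmap.shape
    out_h = (H - filter_size) // stride + 1
    out_w = (W - filter_size) // stride + 1
    ofmap = np.zeros((C, out_h, out_w))

    for c in range(C):                       # T02/T03: channels are independent, no blocking info
        for y in range(out_h):
            for x in range(out_w):
                window = ifmap[c, y*stride:y*stride+filter_size,
                                  x*stride:x*stride+filter_size]
                if pooling_type == "max":    # T04: comparator path
                    ofmap[c, y, x] = window.max()
                else:                        # T04: multiplier/adder path (average pooling)
                    ofmap[c, y, x] = window.mean()
    return ofmap                             # T05: handed to the post-processing buffer
```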
Compared with the prior art, the invention has the following beneficial effects:
1. In the invention, the processing matrix assembly contains one or more processing matrices (PM), each formed by a number of processing units (PE) interconnected in a certain topology. A processing unit usually contains basic arithmetic electronic devices such as multipliers, adders, registers and comparators, so that multiplication, addition, comparison and similar operations can be realized, and a processing matrix can therefore realize different operation logics. By writing the characteristic parameters of different types of network layers into the configuration unit registers to control the mapping of the operation logic of the different network layers onto the processing matrix hardware, the processing matrix assembly can be multiplexed; that is, one hardware circuit accelerates the operations of different types of network layers in the neural network without additional hardware resources. Compared with a neural network accelerator that uses direct mapping and implements each type of network with a dedicated special-purpose circuit, the multiplexing accelerator provided by the invention achieves the same functionality while consuming fewer hardware resources, achieving a higher hardware reuse rate and lower power consumption.
2. Every processing matrix (PM) in the processing matrix assembly operates independently, without interference from the others, so the computational parallelism of the accelerator can be increased simply by increasing the number of processing matrices, improving operating efficiency without modifying the overall architecture of the accelerator. In addition, each processing matrix is not limited by the number or size of the input feature maps, i.e. a processing matrix can handle a network layer whose input feature maps are more numerous or larger than the processing matrix itself, so the accelerator provided by the invention offers high concurrency, high multiplexing and strong structural extensibility.
Of course, it is not necessary for any one product to practice the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an internal block diagram of a neural network accelerator for edge devices in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of a processing matrix assembly according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a single processing matrix configuration in accordance with an embodiment of the present invention;
FIG. 4 is a step diagram of a method for accelerating operation of standard convolutional layer operations using the neural network accelerator multiplexing matrix component according to an embodiment of the present invention;
FIG. 5 is a diagram of the internal structure of a processing matrix for accelerating standard convolution operations based on the method for accelerating standard convolution operations according to an embodiment of the present invention;
FIG. 6 is a step diagram of a method for accelerating operations on a pooled layer by using the neural network accelerator multiplexing matrix component according to an embodiment of the present invention;
FIG. 7 is a diagram of the internal structure of a processing matrix for accelerating the pooling operation based on the method for accelerating the operation of the pooling layer operation according to the present invention;
FIG. 8 is a schematic diagram of a 3x3 standard convolution operation;
FIG. 9 is a 2x2 max pooling operation diagram;
FIG. 10 is a schematic illustration of a computational flow of a given neural network layer;
FIG. 11 is a schematic diagram illustrating a calculation process for a network layer input feature map number greater than the processing matrix size in accordance with an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The standard convolution layer is the most important network layer in a deep neural network, and the acceleration achieved on standard convolution operations is a basic index for evaluating the performance of a dedicated neural network accelerator. The standard convolution layer operation is a convolution of the input feature map with the convolution kernels. The input feature map is typically a three-dimensional structure: each input feature map has a two-dimensional size HxW; for image inputs, it represents a picture consisting of H pixels in the vertical direction and W pixels in the horizontal direction. The third dimension is the channel direction, i.e. there are C input feature maps of size HxW, each representing the input information of one channel. To convolve the three-dimensional input feature map, the convolution kernel (or filter) also has a three-dimensional structure CxVxH, where C is the number of input feature map channels and VxH is the two-dimensional size of the kernel, such as 1x1, 3x3 or 5x5; the values within a two-dimensional convolution kernel are not all the same. The kernel on each channel is called a convolution kernel component, and the two-dimensional size of a kernel component is usually smaller than the two-dimensional size of the feature map. Convolving the input feature map with kernels of different sizes extracts features of different granularities and types. The output of the standard convolution layer is called the output feature map; it is the convolution result of the input feature map and the convolution kernels and is also three-dimensional, i.e. each output channel contains a two-dimensional feature map. The number of convolution kernels matches the number of output feature map channels; therefore, performing standard convolution on a C-channel input feature map with N convolution kernels of size CxVxH produces an N-channel output feature map. When a three-dimensional convolution kernel is convolved with a multi-channel input feature map, two processes are involved. First, the two-dimensional kernel on each channel performs a multiply-add with the feature points of the input feature map that fall within the kernel window, and the multiply-add results of all channels are accumulated to obtain one feature point of the output feature map. Second, the kernel components on all input channels slide once, synchronously, in the horizontal or vertical direction by the specified stride, and the first process is repeated to obtain the second feature point of the output feature map; continuing in this way finally yields the multi-channel output feature map. Fig. 8 illustrates a 3x3 standard convolution operation.
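As a concrete illustration of the two processes just described (per-channel multiply-add within the kernel window, accumulation across channels, then a slide by the stride), the short sketch below computes two adjacent output feature points; the sizes and values are made up purely for illustration.

```python
import numpy as np

# Hypothetical example: C=2 input channels, 4x4 feature maps, one 2x3x3 kernel, stride 1.
ifmap = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
kernel = np.ones((2, 3, 3), dtype=float)        # one kernel component per input channel

# Process 1: multiply-add each channel's 3x3 window with its kernel component,
# then accumulate across channels -> one feature point of the output feature map.
point_00 = sum(np.sum(ifmap[c, 0:3, 0:3] * kernel[c]) for c in range(2))

# Process 2: slide all kernel components by the stride (here one column to the right)
# and repeat process 1 to obtain the next output feature point.
point_01 = sum(np.sum(ifmap[c, 0:3, 1:4] * kernel[c]) for c in range(2))

print(point_00, point_01)   # output feature map size is (4-3)/1 + 1 = 2 in each dimension
```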
The pooling network layer is another typical and widely used network layer. The pooling layer's arithmetic logic is similar to standard convolution: a two-dimensional filter of variable size slides over the input feature map in the horizontal or vertical direction with a specified stride. Unlike a two-dimensional convolution kernel, however, the pooling filter is a unit filter whose coefficients are all 1, and it computes, for the feature points falling within the filter window, an average value or a maximum value; the common pooling layer types are therefore the maximum pooling layer and the average pooling layer. Fig. 9 shows a 2x2 maximum pooling operation; as the figure shows, after one calculation the pooling filter slides once in the horizontal direction by the specified stride S (S=2 in the figure) in preparation for the next operation. On the other hand, the standard convolution operation is three-dimensional while the pooling operation is two-dimensional, i.e. the pooling results of the feature maps on different input channels are unrelated. Because of this difference in computation dimensionality, most existing neural network accelerators accelerate standard convolution layer operations and pooling layer operations with separately designed dedicated circuits.
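A minimal numeric sketch of the 2x2 maximum pooling just described, with stride S=2 and made-up values, showing that the unit filter reduces each window to a single comparison result:

```python
import numpy as np

x = np.array([[1., 3., 2., 0.],
              [4., 6., 1., 2.],
              [5., 2., 9., 7.],
              [0., 1., 3., 8.]])

S = 2                                   # pooling filter stride, as in fig. 9
out = np.array([[x[i:i+2, j:j+2].max()  # comparator only: the filter coefficients are all 1
                 for j in range(0, 4, S)]
                for i in range(0, 4, S)])
print(out)                              # [[6. 2.], [5. 9.]]
```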
From the hardware implementation perspective, the standard convolution layer and pooling layer described above, as well as other types of network layers, conform to the general neural network layer calculation flow shown in fig. 10: a processing matrix assembly containing a large number of multiply-accumulate units (MACs) receives the input feature map and weight coefficients from memory and computes on them, and the result of the processing matrix assembly then undergoes a series of post-processing steps, such as accumulation, regularization, activation functions and quantization, to finally obtain the output feature map.
For the above hardware processing flow, fig. 1 depicts one embodiment of the neural network accelerator architecture proposed by the invention. The configuration unit contains a register file for configuring the computation of the different types of network layers. The configuration registers fall into two types. One type is network layer attribute configuration: for a standard convolution layer, the number of input feature map channels, the input feature map size, the number of output feature map channels, the convolution kernel size, the convolution kernel stride, the start address of the input feature map in memory, the start address of the convolution kernels in memory, the start address of the bias coefficients in memory and the like need to be configured; for a pooling layer, the number of input feature map channels, the number of output feature map channels, the pooling type, the pooling filter size, the pooling filter stride, the start address of the input feature map in memory, the start address of the output feature map in memory and the like need to be configured. The other type is control command configuration for neural network operation, such as starting an operation and interconnect interface interrupts. The accelerator main controller writes the network configuration parameters into the configuration unit according to the number of network layers, the network types and the connection order contained in the target neural network, thereby realizing layer-by-layer calculation.
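Building on the register sketch given earlier, a layer-by-layer driver could look like the following; the accel object and its write_config, write_control and wait_irq calls are hypothetical stand-ins for the main controller's behavior, not an API defined by the patent.

```python
def run_network(accel, layers):
    """Drive the accelerator layer by layer in the network's connection order.

    `layers` is a list of LayerAttributeConfig objects (see the earlier sketch),
    ordered according to the target neural network's connection sequence.
    """
    for idx, cfg in enumerate(layers):
        accel.write_config(cfg)                               # layer attribute registers
        accel.write_control(start_op=True, irq_enable=True)   # control command registers
        accel.wait_irq()                                      # wait for the interface interrupt
        # The layer's output feature map (at cfg.ofmap_addr) becomes the next layer's input.
        if idx + 1 < len(layers):
            layers[idx + 1].ifmap_addr = cfg.ofmap_addr
```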
The memory in the embodiment of fig. 1 is typically a Static Random Access Memory (SRAM) and is mainly used for receiving input feature maps, weight coefficients, bias data, and the like.
The data buffer unit in the embodiment of fig. 1 caches the input feature maps and weight coefficients required for the operation according to the size of the processing matrix, and the data distribution unit selects the appropriate cached data according to the stride configuration information and distributes it to the processing matrix assembly.
The processing matrix assembly (PMs) in the embodiment of fig. 1 is the main functional unit that performs the operations on the input feature maps and weight coefficients. The processing matrix assembly is typically made up of a plurality of processing matrices (PM).
Fig. 2 depicts the structure of the processing matrix assembly in the above embodiment. For a given neural network layer, the processing matrices in the assembly compute in parallel and run independently of one another, so the more processing matrices the assembly contains, the higher the parallelism of the accelerator and the higher its operating efficiency. A processing matrix (PM) is formed by a number of processing units (PE) connected in a certain topology, and the processing units receive input feature values and weight coefficients as operands for calculation.
Fig. 3 depicts the internal composition of the processing matrix in some embodiments. Clearly, the larger the processing matrix, i.e. the more processing units it contains, the greater its parallelism and the higher its operating efficiency. The internal structure of a processing unit (PE) depends on which network types are chosen to multiplex the processing matrix; here the standard convolution layer and the pooling layer multiplex the processing matrix assembly for accelerated operation, so the processing unit contains at least a multiplier, an adder and a comparator, where the multiplier and adder support standard convolution and average pooling and the comparator supports maximum pooling. The basis on which different types of network layers can multiplex the same hardware for their operations is that their operation characteristics are similar: if the operations are, or can be converted into, vector operations, then the different network layer types follow the same operation rule as far as the operation itself is concerned, differing only in their operators and operands. In the accelerator structure proposed by the invention, the operation rule determines the structure of the processing matrix (PM), and the fusion of the different operators is embodied in the internal structure of the processing unit (PE).
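The point that different layer types share one operation rule and differ only in their operators can be made explicit in software: in the sketch below the sliding-window loop is fixed and only the per-window reduction changes (a minimal model assuming "valid" padding; the function names are illustrative).

```python
import numpy as np

def sliding_window_op(fmap, window, stride, reduce_fn):
    """One operation rule for all layer types: slide a window and apply reduce_fn to it."""
    H, W = fmap.shape
    out_h = (H - window) // stride + 1
    out_w = (W - window) // stride + 1
    out = np.empty((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = reduce_fn(fmap[y*stride:y*stride+window, x*stride:x*stride+window])
    return out

fmap = np.random.rand(6, 6)
k = np.random.rand(3, 3)

conv2d   = sliding_window_op(fmap, 3, 1, lambda w: np.sum(w * k))   # multiplier + adder
avg_pool = sliding_window_op(fmap, 2, 2, np.mean)                   # multiplier + adder
max_pool = sliding_window_op(fmap, 2, 2, np.max)                    # comparator
```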
The post-processing unit in the embodiment of fig. 1 includes an output buffer, an accumulator, and normalization, activation and quantization processing. The accumulator is mainly used to support network layers that require accumulation, such as the standard convolution layer, and in particular to support the input feature map blocking operation. For networks such as convolutional neural networks (CNNs), the input feature map is typically large, e.g. 1920x1080 or 4K, which produces a large amount of intermediate-layer data, so it is difficult to store all of the input and output data of every network layer in the on-chip memory. On the other hand, the size of the processing matrix assembly may not meet the calculation requirements of some large input feature maps, which must then be partitioned. To resolve these conflicts, the input feature map can be processed in blocks, convolved in batches, and the partial sums of all the blocks integrated in the accumulator. Fig. 11 depicts the calculation flow when the number of input feature maps of a network layer is greater than the processing matrix size, in accordance with some embodiments.
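The blocking-plus-accumulation idea can be checked with a few lines of NumPy: splitting the input channels into blocks, convolving each block separately and letting an accumulator add the partial sums reproduces the unblocked result; the block size and helper name are assumptions for illustration.

```python
import numpy as np

def conv_channels(ifmap, kernel):
    """Valid 2D convolution summed over the given channels; ifmap (C,H,W), kernel (C,kh,kw)."""
    C, H, W = ifmap.shape
    _, kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(ifmap[:, y:y+kh, x:x+kw] * kernel)
    return out

ifmap  = np.random.rand(16, 8, 8)          # 16 input channels
kernel = np.random.rand(16, 3, 3)

full = conv_channels(ifmap, kernel)        # all channels at once

acc = np.zeros_like(full)                  # role of the accumulator in the post-processing unit
for c0 in range(0, 16, 4):                 # process 4-channel blocks in batches
    acc += conv_channels(ifmap[c0:c0+4], kernel[c0:c0+4])

assert np.allclose(full, acc)              # blocked partial sums equal the unblocked result
```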
As shown in fig. 4, based on the above neural network accelerator, the method in this embodiment for multiplexing the processing matrix assembly to accelerate standard convolution layer operations includes the following steps:
s01, the accelerator main controller writes the network attribute information such as the channel number of the input feature map, the size of the input feature map, the channel number of the output feature map, the size of the convolution kernel, the moving step length of the convolution kernel, the initial address of the input feature map in the memory, the initial address of the convolution kernel in the memory, the initial address of the offset coefficient in the memory and the like into the configuration unit according to the characteristics of the current standard convolution layer, and sets a network type mark register as standard convolution, so that the operation can effectively distinguish the network type of the multiplexing processing matrix component;
s02, judging whether the feature map blocking operation is required to be executed by the input feature map of the standard convolution layer by the main controller according to the available capacity of the on-chip memory, writing blocking information into the configuration unit so that an accumulator in the post-processing unit can perform cache accumulation on the current result, and writing blocking or all data into the on-chip memory by the main controller;
s03, selecting data from an on-chip memory by a data buffer unit, respectively caching input feature images and convolution kernel data into a feature image buffer and a weight buffer, configuring the feature image distributor and the weight distributor in the data distribution unit according to the number of processing matrixes, the size of the processing matrixes, the size of convolution kernels, the moving step length of the convolution kernels and the filling number in a processing matrix assembly, selecting input feature images and convolution kernels with the number of specified channels from the data buffer unit, and distributing the input feature images and the convolution kernels to the processing matrix assembly, wherein fig. 8 describes a processing matrix structure of standard convolution in the embodiment;
s04, processing matrixes in the processing matrix assembly execute standard convolution operation in parallel, wherein the convolution operation mainly occupies a multiplier, an adder, a comparator and other electronic devices in a processing unit to be in a closed state;
s05, the processing matrix component outputs convolution results of all processing matrixes to an accumulation unit in parallel, the accumulation unit judges whether to buffer the current convolution result according to the block information and the input channel number information of the input feature map in the configuration unit, and prepares to accumulate with the convolution result of the next block or channel, if yes, the previous accumulation sum in the output buffer is taken out to accumulate with the convolution result of the current block or channel, and the previous accumulation sum is rewritten into the output buffer to wait for the next accumulation or direct output, if not, an initial output feature map is generated;
s06, the other processing units in the post-processing unit further process the initial output characteristic diagram to generate a final output characteristic diagram.
As can be seen from the 3x3 standard convolution operation shown in fig. 8, the input is a C-channel input feature map together with N Cx3x3 convolution kernels; each kernel has C 3x3 components matching the feature maps of the C input channels and corresponds to one of the N output feature maps, i.e. each column corresponding to a convolution kernel finally generates one output feature map. In the horizontal direction, each row of input channel feature maps in fig. 8 corresponds to N convolution kernel components. Therefore, when there is more than one processing matrix in the processing matrix assembly, the multi-channel input feature map of a single processing matrix shown in fig. 5 can be shared with the other processing matrices, which not only shows that the processing matrix structure of the above accelerator matches the characteristics of standard convolution, but also improves the reuse rate of the input feature map. On the other hand, if each row of a processing matrix consists of PW processing units and there are PH rows, one processing matrix can process at most PW/3 3x3 standard convolution operations in parallel, and if the processing matrix assembly consists of K processing matrices, the whole assembly can compute at most (PWxK)/3 3x3 standard convolution operations, which greatly improves the computational efficiency of the neural network accelerator provided by the invention.
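For a sense of scale under assumed numbers, with PW = 24 processing units per row and K = 4 processing matrices, one matrix maps PW/3 = 8 kernels of a 3x3 convolution and the whole assembly maps (PW x K)/3 = 32 in parallel; the snippet below merely restates that arithmetic.

```python
PW, K = 24, 4                          # assumed row width and number of processing matrices
kernels_per_matrix = PW // 3           # 3x3 kernels handled by one processing matrix -> 8
kernels_per_assembly = (PW * K) // 3   # 3x3 kernels handled by the whole assembly -> 32
print(kernels_per_matrix, kernels_per_assembly)
```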
As shown in fig. 6, based on the above neural network accelerator, the method in this embodiment for multiplexing the processing matrix assembly to accelerate pooling layer operations includes the following steps:
T01. according to the characteristics of the current pooling layer, the accelerator main controller writes attribute information such as the number of input feature map channels, the number of output feature map channels, the pooling type, the pooling filter size, the pooling filter stride, the start address of the input feature map in memory and the start address of the output feature map in memory into the configuration unit, and sets the network type flag register to pooling, so that the operation can correctly identify which network type is multiplexing the processing matrix assembly;
T02. the main controller judges, according to the available capacity of the on-chip memory, whether the input feature map needs to be partitioned, and then writes the block, or all of the feature maps, into the on-chip memory; because pooling is a two-dimensional operation and the pooling results of feature maps on different input channels are independent, the blocking information does not need to be written into the configuration unit;
T03. because the two-dimensional filter in the pooling operation is a unit filter whose coefficients are all 1, the data buffer unit only needs to select the input feature map data from the memory and cache it in the feature map buffer; the feature map distributor in the data distribution unit then distributes one input feature map to each processing matrix from the data buffer unit, according to the number of processing matrices in the processing matrix assembly, the processing matrix size, the pooling filter stride and the amount of padding; fig. 7 depicts the processing matrix structure for pooling in this embodiment;
T04. the processing matrices in the processing matrix assembly execute the pooling operation in parallel; the pooling operation occupies different arithmetic devices in each processing unit depending on the pooling layer type, namely average pooling occupies the multiplier and adder while maximum pooling occupies the comparator, and the remaining electronic devices are in a closed state;
T05. the processing matrix assembly outputs the result of each processing matrix in parallel to the buffer of the post-processing unit, and the final output feature map is generated after normalization, quantization or activation.
As previously described, the pooling layer operation and the standard convolution layer operation can be accelerated on the same processing matrix assembly because they share a similar operation rule. Comparing fig. 8 and fig. 9, it can be seen that, on each input channel in fig. 8, the convolution of the feature map with K convolution kernel components on that channel resembles the maximum-value operation of shifting the pooling filter of fig. 9 K times in the horizontal direction; the differences are that the operator of the convolution operation is multiply-add while that of pooling is comparison, and that the pooling filter is a unit filter with all coefficients equal to 1 while the coefficients of a convolution kernel are not all the same. Therefore, in fig. 5, assuming the processing matrix consists of PW processing units in the horizontal direction and PH processing units in the vertical direction, the standard convolution multiplies and adds 3 rows of feature points from each of PH/3 input channel feature maps with PW/3 convolution kernels; in fig. 7, assuming the processing matrix size is PWxPH, the pooling operation compares PH/2 rows of feature points of one input channel feature map against PW/2 unit filters. In addition, one processing matrix can process several pooling filter operations of an input feature map in parallel, and when the processing matrix assembly consists of multiple processing matrices, the computational parallelism is further increased and the computational efficiency improved.
Key technical points
(1) A dedicated neural network accelerator is provided, which comprises a processing matrix assembly (PMs) formed by a plurality of processing matrices; each processing matrix (PM) is internally formed by a plurality of processing units (PE), and each processing unit (PE) internally contains several types of arithmetic devices, such as multipliers, adders, registers and multiplexers, to support different types of logic operations. By writing the neural network type and attribute characteristics into the accelerator configuration unit, operand selection and arithmetic-device selection for the processing matrix assembly can be realized, completing the accelerated operation of different network layers on the same hardware.
(2) Based on the special neural network accelerator, a hardware multiplexing acceleration method for a standard convolutional network layer and a pooling network layer is provided.
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims (5)

1. Neural network accelerator suitable for edge device, characterized by comprising:
configuration unit: comprises a register file for configuring the computation of different types of network layers;
a memory: a static random access memory (SRAM), mainly used for holding the input feature map, weight coefficients and bias data;
a data buffer unit: caches the input feature map and weight coefficients required for the operation according to the size of the processing matrix, and a data distribution unit selects appropriate cached data according to the stride configuration information and distributes it to a processing matrix assembly (PMs);
the processing matrix assembly (PMs): the main functional unit that performs the operations on the input feature map and weight coefficients, comprising one or more independently operating processing matrices (PM); each processing matrix (PM) is formed by a plurality of processing units (PE) interconnected in a certain topology and is used to realize different operation logics; each processing unit (PE) contains basic arithmetic electronic devices comprising a multiplier, an adder, registers and a comparator to implement multiplication, addition and comparison, and receives input feature values and weight coefficients as operands for calculation;
a post-processing unit: comprises an output buffer, an accumulator, and normalization, activation and quantization processing; the accumulator is used to support network layers that require accumulation operations;
a main controller: controls the mapping of the operation logic of different network layers onto the processing matrix hardware by writing the characteristic parameters of the different types of network layers into the registers of the configuration unit, so that the processing matrix assembly can be multiplexed, i.e. one hardware circuit accelerates the operations of different types of network layers in the neural network without additional hardware resources; the different types of network layers include standard convolution layers and pooling network layers.
2. The edge device-compliant neural network accelerator of claim 1, wherein the registers of the configuration unit comprise two types:
one type is network layer attribute configuration: a standard convolution layer requires configuration of the number of input feature map channels, the input feature map size, the number of output feature map channels, the convolution kernel size, the convolution kernel stride, the start address of the input feature map in memory, the start address of the convolution kernels in memory, the start address of the bias coefficients in memory and the like; a pooling layer requires configuration of the number of input feature map channels, the number of output feature map channels, the pooling type, the pooling filter size, the pooling filter stride, the start address of the input feature map in memory and the start address of the output feature map in memory;
the other type is control command configuration of neural network operation, including operation starting and interconnection interface interruption; the accelerator main controller writes network configuration parameters into the configuration unit according to the network layer number, the network type and the network connection sequence contained in the target neural network, so as to realize layer-by-layer calculation.
3. The edge-device-compliant neural network accelerator of claim 1, wherein the multiplier and adder are configured to support standard convolution calculations and average pooling operations, and the comparator is configured to support maximum pooling operations.
4. A neural network acceleration calculation method for the neural network accelerator for an edge device according to any one of claims 1-3, comprising a method for multiplexing the processing matrix assembly to accelerate standard convolution layer operations, comprising the following steps:
S01. according to the characteristics of the current standard convolution layer, the accelerator main controller writes network attribute information comprising the number of input feature map channels, the input feature map size, the number of output feature map channels, the convolution kernel size, the convolution kernel stride, the start address of the input feature map in memory, the start address of the convolution kernels in memory and the start address of the bias coefficients in memory into the configuration unit, and sets a network type flag register to standard convolution, so that the operation can correctly identify which network type is multiplexing the processing matrix assembly;
S02. the main controller judges, according to the available capacity of the on-chip memory, whether the input feature map of the standard convolution layer needs to be partitioned into blocks, and writes the blocking information into the configuration unit so that the accumulator in the post-processing unit can buffer and accumulate the current results; the main controller then writes the block, or all of the data, into the on-chip memory;
S03. the data buffer unit selects data from the on-chip memory and caches the input feature map and convolution kernel data into a feature map buffer and a weight buffer respectively; the feature map distributor and the weight distributor in the data distribution unit are configured according to the number of processing matrices in the processing matrix assembly, the processing matrix size, the convolution kernel size, the convolution kernel stride and the amount of padding, select the input feature maps and convolution kernels of the specified number of channels from the data buffer unit, and distribute them to the processing matrix assembly;
S04. the processing matrices in the processing matrix assembly execute the standard convolution operation in parallel; the convolution operation mainly occupies the multiplier and adder in each processing unit, while the other electronic devices, including the comparator, are in a closed state;
S05. the processing matrix assembly outputs the convolution results of each processing matrix in parallel to the accumulation unit; the accumulation unit judges, according to the blocking information and the number of input channels of the input feature map recorded in the configuration unit, whether the current convolution result needs to be buffered in preparation for accumulation with the convolution result of the next block or channel; if so, the previous partial sum in the output buffer is taken out, accumulated with the convolution result of the current block or channel, and written back into the output buffer to await the next accumulation or direct output; if not, an initial output feature map is generated;
S06. the other processing units in the post-processing unit further process the initial output feature map to generate the final output feature map.
5. A neural network acceleration calculation method for the neural network accelerator for edge devices according to any one of claims 1-3, comprising a method of multiplexing the processing matrix assembly to accelerate pooling layer operations, comprising the following steps:
t01, the accelerator main controller writes attribute information, comprising the number of channels of the input feature map, the number of channels of the output feature map, the pooling type, the size of the pooling filter, the moving step length of the pooling filter, the initial address of the input feature map in the memory and the initial address of the output feature map in the memory, into the configuration unit according to the characteristics of the current pooling layer, and sets the network type flag register to pooling, so that the network type currently multiplexing the processing matrix assembly can be distinguished during operation;
t02, the main controller judges, according to the available capacity of the on-chip memory, whether the input feature map needs to be divided into blocks, and then writes the blocks, or all of the feature maps, into the on-chip memory; since the pooling operation is a two-dimensional operation and the pooling results of the feature maps of different input channels are independent of one another, no block information needs to be written into the configuration unit;
t03, as described above, the two-dimensional filter in the pooling operation is a unit filter whose coefficients are all 1, so the data buffer unit only needs to select the input feature map data from the memory and cache it in the feature map buffer; the feature map distributor in the data distribution unit then distributes one input feature map to each processing matrix from the data buffer unit according to the number of processing matrices in the processing matrix assembly, the size of the processing matrices, the moving step length of the pooling filter and the configured padding number;
t04, the processing matrices in the processing matrix assembly execute the pooling operation in parallel; the pooling operation occupies different arithmetic devices in the processing unit depending on the pooling type, namely average pooling occupies the multipliers and adders while maximum pooling occupies the comparators, and the other electronic devices are in a closed state (see the sketch following this claim);
t05, the processing matrix assembly outputs the results of each processing matrix in parallel to the buffer of the post-processing unit, and the final output feature map is generated after normalization, quantization or activation.
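Steps t01–t05 reuse the same processing matrices for pooling by treating the pooling filter as a unit filter: average pooling maps onto the multiplier/adder path (multiply each input by a coefficient of 1, accumulate, then divide by the window size), while maximum pooling maps onto the comparator path. The sketch below, in the same hypothetical Python style as above, illustrates that mapping for a single channel; pool_window, pool_feature_map and their parameters are illustrative names only and do not describe the actual circuit.

```python
def pool_window(window, pool_type):
    """One pooling window, handled as steps t03-t04 describe: average pooling
    on the multiplier/adder path, maximum pooling on the comparator path."""
    if pool_type == "average":
        acc = 0.0
        for x in window:
            acc += 1.0 * x               # unit-filter coefficient (1) times input, then add
        return acc / len(window)
    best = window[0]
    for x in window[1:]:
        best = x if x > best else best   # comparator: keep the larger value
    return best

def pool_feature_map(fmap, filter_size, stride, pool_type):
    """Slide the pooling filter over one 2-D input feature map (one channel).
    Channels are pooled independently, which is why step t02 writes no block
    information into the configuration unit."""
    rows, cols = len(fmap), len(fmap[0])
    out = []
    for i in range(0, rows - filter_size + 1, stride):
        out_row = []
        for j in range(0, cols - filter_size + 1, stride):
            window = [fmap[i + di][j + dj]
                      for di in range(filter_size) for dj in range(filter_size)]
            out_row.append(pool_window(window, pool_type))
        out.append(out_row)
    return out
```

For example, pool_feature_map([[1, 2], [3, 4]], 2, 2, "max") returns [[4]] and the "average" variant returns [[2.5]]; each per-channel result would then pass through normalization, quantization or activation in the post-processing unit, as in step t05.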
CN202010462707.XA 2020-05-27 2020-05-27 Neural network accelerator applicable to edge equipment and neural network acceleration calculation method Active CN111667051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010462707.XA CN111667051B (en) 2020-05-27 2020-05-27 Neural network accelerator applicable to edge equipment and neural network acceleration calculation method

Publications (2)

Publication Number Publication Date
CN111667051A (en) 2020-09-15
CN111667051B (en) 2023-06-06

Family

ID=72384944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010462707.XA Active CN111667051B (en) 2020-05-27 2020-05-27 Neural network accelerator applicable to edge equipment and neural network acceleration calculation method

Country Status (1)

Country Link
CN (1) CN111667051B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing a sparse convolutional neural network accelerator
US10186011B2 (en) * 2017-04-28 2019-01-22 Intel Corporation Programmable coarse grained and sparse matrix compute hardware with advanced scheduling
CN109993297A (en) * 2019-04-02 2019-07-09 南京吉相传感成像技术研究院有限公司 Load-balanced sparse convolutional neural network accelerator and acceleration method thereof

Also Published As

Publication number Publication date
CN111667051A (en) 2020-09-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant