
WO2019196223A1 - Acceleration method and accelerator used for convolutional neural network - Google Patents

Acceleration method and accelerator used for convolutional neural network

Info

Publication number
WO2019196223A1
WO2019196223A1 · PCT/CN2018/095365 · CN2018095365W
Authority
WO
WIPO (PCT)
Prior art keywords
density
feature maps
sparse
neural network
preset threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/095365
Other languages
French (fr)
Chinese (zh)
Inventor
刘勇攀
袁哲
岳金山
杨华中
李学清
王智博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Publication of WO2019196223A1


Classifications

    • G: PHYSICS; G06: COMPUTING OR CALCULATING; COUNTING; G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00: Computing arrangements based on biological models; G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/0495: Quantised networks; Sparse networks; Compressed networks
    • G06N3/048: Activation functions
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • The invention belongs to the technical field of computation optimization, and more particularly relates to an acceleration method and an accelerator applied to a convolutional neural network.
  • A Convolutional Neural Network is a feedforward neural network whose artificial neurons respond to surrounding units within part of their receptive field; it is well suited to processing large images.
  • Convolutional neural networks are widely used in image recognition, speech recognition and other fields, but their computational cost is very high.
  • Some existing methods skip the multiplication when the input data is 0, thereby reducing computation.
  • These methods all focus on processing a sparse neural network itself and take the sparsity of the network as a premise.
  • In practice, the output feature maps of each layer of a convolutional neural network may be sparse or non-sparse.
  • The density of the weight data and feature maps of each layer of a convolutional neural network is generally distributed between 5% and 90%.
  • A sparse matrix is a matrix in which the number of zero-valued elements is far larger than the number of non-zero elements and the non-zero elements are irregularly distributed.
  • In the prior art, on the one hand, only sparse convolutional neural networks can be processed, so the amount of computation is large and the operation speed is low when the convolutional neural network is not sparse; on the other hand, the prior art can only handle the case where either the weight data or the feature maps are sparse, and cannot handle the case where both the weight data and the feature maps are sparse.
  • the present invention provides an acceleration method and an accelerator applied to a convolutional neural network.
  • an acceleration method applied to a convolutional neural network comprising:
  • step S1 specifically includes:
  • the number of non-zero elements in the feature map and the total number of all elements in the feature map are counted;
  • the ratio between the number of non-zero elements in the feature map and the total number of all elements in the feature map is taken as the density of the feature map.
  • the preset threshold includes a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
  • step S2 specifically includes: if the density of a feature map is less than the first preset threshold, encoding that feature map into a sparse matrix storage format; if its density is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in that feature map; if its density is greater than or equal to the second preset threshold, not sparsely encoding that feature map.
  • before step S3, the method further includes: calculating the density of each convolution kernel in the trained convolutional network; if the density of a convolution kernel is less than the first preset threshold, encoding that convolution kernel into a sparse matrix storage format; if its density is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in that convolution kernel; if its density is greater than or equal to the second preset threshold, not sparsely encoding that convolution kernel.
  • step S3 specifically includes: when a feature map or a convolution kernel contains the mark, not calculating the elements corresponding to the mark in that feature map or convolution kernel.
  • an accelerator for a convolutional neural network comprising: a neural network computing array module and a dynamic sparse adjustment module;
  • the dynamic sparse adjustment module is configured to calculate the density of each feature map output by each layer of the convolutional neural network, compare the density of each feature map with a plurality of preset thresholds, and sparsely encode each feature map according to the comparison result, wherein different comparison results correspond to different sparse coding modes;
  • the neural network computing array module is configured to perform a convolution operation on each sparsely encoded feature map and each convolution kernel in the pre-sparsely encoded convolutional neural network.
  • the dynamic sparse adjustment module includes an online density identification module, an output temporary registration module, a dynamic coding module, and a dynamic sparse control module;
  • the online density identification module is configured to count, for any feature map, the number of 0 elements in the feature map and the total number of all elements in the feature map, and to use the ratio between that number and the total number of all elements as the density of the feature map;
  • the output temporary registration module is configured to store each of the feature maps output by each layer in the convolutional neural network
  • the dynamic sparse control module is configured to compare the density of each of the feature maps output by the online density identification module with a plurality of preset thresholds;
  • the dynamic encoding module is configured to perform sparse encoding on each of the feature maps in the output temporary registration module according to a comparison result.
  • the preset threshold includes a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
  • the dynamic coding module is specifically configured to: encode a feature map into a sparse matrix storage format if its density is less than the first preset threshold; mark the 0 elements in a feature map if its density is greater than or equal to the first preset threshold and less than the second preset threshold; and not sparsely encode a feature map if its density is greater than or equal to the second preset threshold.
  • the dynamic coding module is further configured to: encode a convolution kernel into a sparse matrix storage format if its pre-computed density is less than the first preset threshold; mark the 0 elements in a convolution kernel if its density is greater than or equal to the first preset threshold and less than the second preset threshold; and not sparsely encode a convolution kernel if its density is greater than or equal to the second preset threshold.
  • the neural network computing array module is specifically configured to: when a feature map or a convolution kernel contains the mark, not calculate the elements corresponding to the mark in that feature map or convolution kernel.
  • the present invention provides an acceleration method and an accelerator applied to a convolutional neural network. The method compares the density of each feature map output by each layer of the convolutional neural network with a plurality of preset thresholds to obtain the sparse state of each feature map, sparsely encodes feature maps in different sparse states in different ways, and then performs a convolution operation on the sparsely encoded feature maps and the convolution kernels of the pre-sparsely encoded convolutional neural network in the convolution layer following each layer, thereby reducing the amount of computation of the convolution operations in the convolutional neural network and increasing the operation speed.
  • FIG. 1 is a schematic overall flow chart of an acceleration method applied to a convolutional neural network according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of an overall structure of an accelerator applied to a convolutional neural network according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of the ultimate energy efficiency test results of an accelerator applied to a convolutional neural network according to an embodiment of the present invention;
  • FIG. 4 is a schematic comparison diagram of the ultimate energy efficiency test results of an accelerator applied to a convolutional neural network according to an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of an overall acceleration method applied to a convolutional neural network according to an embodiment of the present invention, where the method includes:
  • The convolutional neural network may or may not include a pooling layer.
  • The convolutional neural network is trained first. After training is completed, the convolution kernels in the convolutional neural network no longer change, so they do not need online dynamic sparse coding; a single offline sparse coding pass is enough.
  • Here, "online" means located on the accelerator chip, and "offline" means not located on the accelerator chip.
  • At each convolution operation, the sparsely encoded convolution kernels are read directly to perform the convolution calculation.
  • When the original image data is input, it is sparsely encoded, and the sparsely encoded raw data and the sparsely encoded convolution kernels are then fed to the first convolution layer of the convolutional neural network for the convolution calculation. Since the original image data is generally not sparse, it may also be input directly without sparse encoding.
  • the sparse coding is to store the data in a sparse format.
  • Because the density of the feature maps output by each layer of the convolutional neural network differs, and the feature maps output by different layers change dynamically, the density also changes dynamically.
  • the density indicates the degree of sparsity of each of the feature maps.
  • To better improve the operation speed of the convolutional neural network, the density of each feature map output by each layer is calculated, so that each feature map output by that layer can be sparsely encoded according to its density.
  • Each sparsely encoded feature map and each convolution kernel of the pre-sparsely encoded convolutional neural network are used as the input of the convolution layer following that layer, and a convolution operation is performed. The result of the convolution operation is then used as the input of the next convolution layer, and the feature maps output by that convolution layer again undergo the sparse coding and convolution operations described above, until the last layer of the convolutional neural network outputs its feature maps.
  • This embodiment does not restrict the sparse coding mode used for the convolution kernels.
  • The density of each feature map output by each layer of the convolutional neural network is compared with a plurality of preset thresholds to obtain the sparse state of each feature map; feature maps in different sparse states are sparsely encoded in different ways, and the sparsely encoded feature maps and the convolution kernels of the pre-sparsely encoded convolutional neural network are then convolved in the convolution layer following each layer, which reduces the amount of computation of the convolution operations in the convolutional neural network and increases the operation speed.
  • the step S1 specifically includes: counting the number of non-zero elements in the feature map and the total number of all the elements in the feature map for any of the feature maps. The ratio between the number of non-zero elements in the feature map and the total number of all elements in the feature map is taken as the density of the feature map.
  • the density of each feature map is a ratio between the number of non-zero elements in each feature map and the total number of all elements in each feature map. For example, if the number of non-zero elements in a feature map is 10, and the total number of all elements in the feature map is 100, the density of the feature map is 0.1.
  • The preset thresholds include a first preset threshold and a second preset threshold, wherein the first preset threshold is smaller than the second preset threshold.
  • Step S2 specifically includes: if the density of a feature map is less than the first preset threshold, encoding that feature map into a sparse matrix storage format; if its density is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in that feature map; if its density is greater than or equal to the second preset threshold, not sparsely encoding that feature map.
  • the preset threshold in this embodiment includes a first preset threshold th1 and a second preset threshold th2.
  • Feature maps whose density is less than the first preset threshold are classified into a completely sparse state S, feature maps whose density is greater than or equal to the first preset threshold and less than the second preset threshold are classified into a medium sparse state M, and feature maps whose density is greater than or equal to the second preset threshold are classified into a completely non-sparse state D.
  • If a feature map is in the completely sparse state S, it is encoded into a sparse matrix storage format; the sparse matrix storage format contains the non-zero data (activ) and the sparse index (index) of the feature map, for example coordinate encoding or compressed sparse row encoding.
  • If a feature map is in the medium sparse state M, its 0 elements are marked with a guard flag that identifies them. Marked elements need not take part in computation or storage, which reduces power consumption; marking the 0 elements of a feature map is therefore also a form of sparse coding. If a feature map is in the completely non-sparse state D, dynamic coding is not needed and the non-sparse data of the feature map is output directly.
  • Before step S3, the method further includes: calculating the density of each convolution kernel in the trained convolutional network; if the density of a convolution kernel is less than the first preset threshold, encoding that convolution kernel into a sparse matrix storage format; if its density is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in that convolution kernel; if its density is greater than or equal to the second preset threshold, not encoding that convolution kernel.
  • the density of each convolution kernel is the ratio between the number of non-zero elements in each convolution kernel and the total number of all elements in each convolution kernel.
  • the state WS of each convolution kernel is divided into three states as in the feature map. Each state corresponds to a different sparse coding scheme. Since there are three states in the feature map and the convolution kernel, there are 9 states after the combination, so that the density of the convolutional neural network is more finely divided.
  • the step S3 in the embodiment specifically includes: when each of the feature maps or each of the convolution kernels has the mark, each of the feature maps or each The elements corresponding to the tags in the convolution kernel are not evaluated.
  • FIG. 2 is a schematic diagram of the overall structure of an accelerator applied to a convolutional neural network according to an embodiment of the present invention, comprising a neural network computing array module and a dynamic sparse adjustment module. The dynamic sparse adjustment module is configured to calculate the density of each feature map output by each layer of the convolutional neural network, compare the density of each feature map with a plurality of preset thresholds, and sparsely encode each feature map according to the comparison result, where different comparison results correspond to different sparse coding modes. The neural network computing array module is configured to perform a convolution operation on each sparsely encoded feature map and each convolution kernel in the pre-sparsely encoded convolutional neural network.
  • The convolutional neural network may or may not include a pooling layer. The convolutional neural network is trained first; after training is completed, the convolution kernels no longer change, so they do not need online dynamic sparse coding and a single offline sparse coding pass is enough.
  • the neural network computational array module directly reads the offline sparsely encoded convolution kernel for convolution calculations each time a convolution operation is performed.
  • When the convolutional neural network receives the original image data, the dynamic sparse adjustment module sparsely encodes it, and the neural network computing array module then performs the convolution calculation on the sparsely encoded raw data and the sparsely encoded convolution kernels. Since the original image data is generally not sparse, it may also be input directly without sparse encoding.
  • the sparse coding is to store the data in a sparse format.
  • The dynamic sparse adjustment module calculates the density of each feature map output by each layer and sparsely encodes each feature map according to its density.
  • The dynamic sparse adjustment module obtains the sparse state of each feature map output by a layer according to the plurality of preset thresholds, so that feature maps in different sparse states undergo different forms of sparse coding rather than a single coding scheme. In the prior art, all feature maps output by each layer are sparsely encoded, which requires a large amount of computation.
  • the neural network computing array module performs a convolution operation according to each of the feature maps after sparse coding and each convolution kernel in the pre-sparse coded convolutional neural network. If the pooling module is included, the pooling module performs a pooling operation on the result of the convolution operation.
  • The accelerator further includes an intermediate data memory module, a main chip controller, and an on-chip/off-chip data exchange module. The main controller controls the operation and timing of the entire accelerator chip.
  • The on-chip/off-chip data exchange module is used to read data from memory outside the chip or to write data computed by the chip to external storage.
  • Under the control of the main controller, the chip reads the original image data and the initial convolution kernels from the external memory through the on-chip/off-chip data exchange module.
  • the intermediate data storage module is configured to store intermediate results in the calculation process of the neural network computing array module.
  • The dynamic sparse adjustment module compares the density of each feature map output by each layer of the convolutional neural network with a plurality of preset thresholds, obtains the sparse state of each feature map, and sparsely encodes feature maps in different sparse states in different ways, so that the neural network computing array module can perform convolution operations on the sparsely encoded feature maps and the convolution kernels of the pre-sparsely encoded convolutional neural network. On the one hand, this reduces the amount of computation of the convolution operations in the convolutional neural network and increases the computation speed; on the other hand, the processing state of the accelerator is switched dynamically according to the sparse state, which improves the flexibility of the accelerator.
  • The dynamic sparse adjustment module in this embodiment includes an online density identification module, an output temporary registration module, a dynamic coding module, and a dynamic sparse control module. The online density identification module is configured to count, for any feature map, the number of 0 elements in the feature map and the total number of all elements in the feature map, and to use the ratio between that number and the total number of all elements as the density of the feature map. The output temporary registration module is configured to store each feature map output by each layer of the convolutional neural network. The dynamic sparse control module is configured to compare the density of each feature map output by the online density identification module with a plurality of preset thresholds. The dynamic coding module is configured to sparsely encode each feature map in the output temporary registration module according to the comparison result.
  • the dynamic sparse adjustment module specifically includes four modules.
  • the online density identification module is used to count the number of non-zero elements in each feature map during the calculation process to calculate the density of each feature map.
  • the output temporary registration module is configured to temporarily store the feature map outputted by each layer in the convolutional neural network in a non-sparse format.
  • the dynamic sparse control module is configured to control a sparse state of the feature map by using a preset plurality of preset thresholds.
  • the dynamic encoding module performs sparse encoding on each of the feature maps in the output temporary registration module according to a sparse state of each of the feature maps, thereby increasing the speed of the convolution operation.
  • The preset thresholds include a first preset threshold and a second preset threshold, wherein the first preset threshold is smaller than the second preset threshold.
  • The dynamic coding module is specifically configured to: encode a feature map into a sparse matrix storage format if its density is less than the first preset threshold; mark the 0 elements in a feature map if its density is greater than or equal to the first preset threshold and less than the second preset threshold; and not encode a feature map if its density is greater than or equal to the second preset threshold.
  • the preset threshold in this embodiment includes a first preset threshold th1 and a second preset threshold th2.
  • The dynamic sparse control module divides the feature state AS of each feature map into three states according to the first preset threshold and the second preset threshold: feature maps whose density is less than the first preset threshold are classified into a completely sparse state S, feature maps whose density is greater than or equal to the first preset threshold and less than the second preset threshold are classified into a medium sparse state M, and feature maps whose density is greater than or equal to the second preset threshold are classified into a completely non-sparse state D.
  • If a feature map is in the completely sparse state S, the dynamic coding module encodes that feature map in the output temporary registration module into a sparse matrix storage format, which contains the non-zero data (activ) and the sparse index (index) of the feature map, for example coordinate encoding or compressed sparse row encoding. Encoding a feature map in a sparse matrix storage format saves a large amount of storage space and a large amount of computation time.
  • If a feature map is in the medium sparse state M, the dynamic coding module adds a guard flag to the 0 elements of that feature map in the output temporary registration module; marked elements need not take part in computation or storage, which reduces power consumption. If a feature map is in the completely non-sparse state D, dynamic coding is not needed and the dynamic coding module directly outputs the non-sparse data of the feature map.
  • The dynamic coding module in this embodiment is further configured to: encode a convolution kernel into a sparse matrix storage format if its pre-computed density is less than the first preset threshold; mark the 0 elements in a convolution kernel if its density is greater than or equal to the first preset threshold and less than the second preset threshold; and not encode a convolution kernel if its density is greater than or equal to the second preset threshold.
  • the density of each convolution kernel is the ratio between the number of non-zero elements in each convolution kernel and the total number of all elements in each convolution kernel.
  • the state WS of each convolution kernel has three states as in the feature map. Each state corresponds to a different sparse coding scheme. Since there are three states in the feature map and the convolution kernel, there are 9 states after the combination, so that the density of the convolutional neural network is more finely divided.
  • The neural network computing array module in this embodiment is specifically configured such that, when a feature map or a convolution kernel contains the mark, the elements corresponding to the mark in that feature map or convolution kernel are not calculated.
  • When a feature map or a convolution kernel is in the completely sparse state S, the 0 elements are removed before it is input into the neural network computing array module, which reduces storage space and avoids computing on the 0 elements; when a feature map or a convolution kernel is in the medium sparse state M, its 0 elements are stored, but the elements corresponding to the mark are not calculated, which reduces computation.
  • The chip of the accelerator is fabricated in a TSMC 65 nm process; the chip has an area of 3 mm × 4 mm, an operating frequency of 20-200 MHz, and a power consumption of 20.5-248.4 mW.
  • As shown in FIG. 3, the ultimate energy efficiency rises rapidly as the density of the feature maps and convolution kernels decreases.
  • When the density of the feature maps and convolution kernels is 5%, the ultimate energy efficiency reaches 62.1 TOPS/W, which is 6.2 times the ultimate energy efficiency achieved when the accelerator of this embodiment is not used.
  • Compared with an implementation that supports only sparse feature data, the energy efficiency of this embodiment can be increased by 4.3 times.
  • The energy efficiency of the present invention can be increased by 2.8 times.
  • Compared with an implementation with variable quantization precision but without variable density control, the energy efficiency of the present invention can be increased by a factor of 2.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention provides an acceleration method and accelerator used for a convolutional neural network, the method comprising: S1, for any layer of the convolutional neural network, calculating the density of each feature map output by this layer; S2, comparing the density of each feature map output by this layer with a plurality of preset thresholds, and performing sparse coding on each feature map according to the comparison results, different comparison results corresponding to different sparse coding schemes; and S3, performing, in the convolution layer of the layer following this layer, convolution on each sparsely coded feature map and each pre-sparsely coded convolution kernel in the convolutional neural network. The present invention reduces the amount of calculation of the convolution operations in the convolutional neural network and improves operation speed.

Description

Acceleration method and accelerator applied to a convolutional neural network

Cross reference

This application cites Chinese Patent Application No. 2018103065773, entitled "Acceleration method and accelerator applied to a convolutional neural network" and filed on April 8, 2018, which is incorporated herein by reference in its entirety.

Technical field

The invention belongs to the technical field of computation optimization, and more particularly relates to an acceleration method and an accelerator applied to a convolutional neural network.

Background

A Convolutional Neural Network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within part of their receptive field; it is well suited to processing large images. Convolutional neural networks are widely used in fields such as image recognition and speech recognition, but their computational cost is very high.

The activation function ReLU (Rectified Linear Unit) in a convolutional neural network produces a large number of sparse feature maps, and training a convolutional neural network with methods such as pruning produces a large amount of sparse weight data. Exploiting the sparsity of the feature maps and weight data can greatly improve the computational efficiency of a convolutional neural network. At present, many methods improve computation speed based on the sparsity of the feature maps and weight data in convolutional neural networks. These methods can be roughly divided into two categories. One category focuses on skipping zero values; for example, some methods remove the zero values in the input, thereby eliminating invalid computations on zero inputs. The other category ignores zero values; for example, some methods skip the multiplication when the input data is zero, thereby reducing computation. However, all of these methods focus on processing a sparse neural network itself and take the sparsity of the network as a premise. In practice, the output feature maps of each layer of a convolutional neural network may be sparse or non-sparse, and the density of the weight data and feature maps of each layer is generally distributed between 5% and 90%.

A sparse matrix is a matrix in which the number of zero-valued elements is far larger than the number of non-zero elements and the non-zero elements are irregularly distributed. In the prior art, on the one hand, only sparse convolutional neural networks can be processed, so the amount of computation is large and the operation speed is low when the convolutional neural network is not sparse; on the other hand, the prior art can only handle the case where either the weight data or the feature maps of a convolutional neural network are sparse, and cannot handle the case where both the weight data and the feature maps are sparse.

Summary of the invention

In order to overcome the above problem of the low operation speed of convolutional neural networks, or at least partially solve the above problem, the present invention provides an acceleration method and an accelerator applied to a convolutional neural network.

According to a first aspect of the present invention, there is provided an acceleration method applied to a convolutional neural network, comprising:

S1, for any layer of the convolutional neural network, calculating the density of each feature map output by that layer;

S2, comparing the density of each feature map output by that layer with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result, wherein different comparison results correspond to different sparse coding modes;

S3, performing a convolution on each sparsely encoded feature map and each convolution kernel in the pre-sparsely encoded convolutional neural network in the convolution layer of the layer following that layer.

Specifically, step S1 includes:

for any feature map, counting the number of non-zero elements in the feature map and the total number of all elements in the feature map;

using the ratio between the number of non-zero elements in the feature map and the total number of all elements in the feature map as the density of the feature map.

Specifically, the preset thresholds include a first preset threshold and a second preset threshold, wherein the first preset threshold is smaller than the second preset threshold;

correspondingly, step S2 specifically includes:

if the density of a feature map is less than the first preset threshold, encoding that feature map into a sparse matrix storage format;

if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in that feature map;

if the density of a feature map is greater than or equal to the second preset threshold, not sparsely encoding that feature map.

Specifically, before step S3, the method further includes:

calculating the density of each convolution kernel in the trained convolutional network;

if the density of a convolution kernel is less than the first preset threshold, encoding that convolution kernel into a sparse matrix storage format;

if the density of a convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in that convolution kernel;

if the density of a convolution kernel is greater than or equal to the second preset threshold, not sparsely encoding that convolution kernel.

Specifically, step S3 includes:

when a feature map or a convolution kernel contains the mark, not calculating the elements corresponding to the mark in that feature map or convolution kernel.

According to another aspect of the present invention, there is provided an accelerator applied to a convolutional neural network, comprising a neural network computing array module and a dynamic sparse adjustment module;

wherein the dynamic sparse adjustment module is configured to calculate the density of each feature map output by each layer of the convolutional neural network, compare the density of each feature map with a plurality of preset thresholds, and sparsely encode each feature map according to the comparison result, wherein different comparison results correspond to different sparse coding modes;

the neural network computing array module is configured to perform a convolution operation on each sparsely encoded feature map and each convolution kernel in the pre-sparsely encoded convolutional neural network.

Specifically, the dynamic sparse adjustment module includes an online density identification module, an output temporary registration module, a dynamic coding module, and a dynamic sparse control module;

wherein the online density identification module is configured to count, for any feature map, the number of 0 elements in the feature map and the total number of all elements in the feature map, and to use the ratio between that number and the total number of all elements as the density of the feature map;

the output temporary registration module is configured to store each feature map output by each layer of the convolutional neural network;

the dynamic sparse control module is configured to compare the density of each feature map output by the online density identification module with a plurality of preset thresholds;

the dynamic coding module is configured to sparsely encode each feature map in the output temporary registration module according to the comparison result.

Specifically, the preset thresholds include a first preset threshold and a second preset threshold, wherein the first preset threshold is smaller than the second preset threshold;

correspondingly, the dynamic coding module is specifically configured to:

if the density of a feature map is less than the first preset threshold, encode that feature map into a sparse matrix storage format;

if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, mark the 0 elements in that feature map;

if the density of a feature map is greater than or equal to the second preset threshold, not sparsely encode that feature map.

Specifically, the dynamic coding module is further configured to:

if the pre-computed density of a convolution kernel is less than the first preset threshold, encode that convolution kernel into a sparse matrix storage format;

if the density of a convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, mark the 0 elements in that convolution kernel;

if the density of a convolution kernel is greater than or equal to the second preset threshold, not sparsely encode that convolution kernel.

Specifically, the neural network computing array module is specifically configured to:

when a feature map or a convolution kernel contains the mark, not calculate the elements corresponding to the mark in that feature map or convolution kernel.

The present invention provides an acceleration method and an accelerator applied to a convolutional neural network. The method compares the density of each feature map output by each layer of the convolutional neural network with a plurality of preset thresholds to obtain the sparse state of each feature map, sparsely encodes feature maps in different sparse states in different ways, and then performs a convolution operation on the sparsely encoded feature maps and the convolution kernels of the pre-sparsely encoded convolutional neural network in the convolution layer following each layer, thereby reducing the amount of computation of the convolution operations in the convolutional neural network and increasing the operation speed.

Description of the drawings

FIG. 1 is a schematic overall flowchart of an acceleration method applied to a convolutional neural network according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the overall structure of an accelerator applied to a convolutional neural network according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the ultimate energy efficiency test results of an accelerator applied to a convolutional neural network according to an embodiment of the present invention;

FIG. 4 is a schematic comparison diagram of the ultimate energy efficiency test results of an accelerator applied to a convolutional neural network according to an embodiment of the present invention.

Detailed description

The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments are intended to illustrate the present invention but not to limit its scope.

One embodiment of the present invention provides an acceleration method applied to a convolutional neural network. FIG. 1 is a schematic overall flowchart of the acceleration method applied to a convolutional neural network according to this embodiment of the present invention; the method includes:

S1, for any layer of the convolutional neural network, calculating the density of each feature map output by that layer;

Specifically, the convolutional neural network may or may not include a pooling layer. The convolutional neural network is trained first; after training is completed, the convolution kernels in the convolutional neural network no longer change, so they do not need online dynamic sparse coding, and a single offline sparse coding pass is enough. Here, "online" means located on the accelerator chip, and "offline" means not located on the accelerator chip. At each convolution operation, the sparsely encoded convolution kernels are read directly to perform the convolution calculation. When the original image data is input, it is sparsely encoded, and the sparsely encoded raw data and the sparsely encoded convolution kernels are then fed to the first convolution layer of the convolutional neural network for the convolution calculation. Since the original image data is generally not sparse, it may also be input directly without sparse encoding. Sparse coding means storing the data in a sparse format.

In S1, because the density of the feature maps output by each layer of the convolutional neural network differs, and the feature maps output by different layers change dynamically, the density also changes dynamically. The density indicates the degree of sparsity of each feature map. To better improve the operation speed of the convolutional neural network, the density of each feature map output by each layer is calculated, so that each feature map output by that layer can be sparsely encoded according to its density.

S2, comparing the density of each feature map output by that layer with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result, wherein different comparison results correspond to different sparse coding modes;

In S2, the prior art sparsely encodes all feature maps output by each layer, which requires a large amount of computation. In this embodiment, the sparse state of each feature map output by the layer is obtained according to the preset thresholds, so that feature maps in different sparse states undergo different forms of sparse coding.

S3, performing a convolution on each sparsely encoded feature map and each convolution kernel in the pre-sparsely encoded convolutional neural network in the convolution layer of the layer following that layer.

In S3, each sparsely encoded feature map and each convolution kernel of the pre-sparsely encoded convolutional neural network are used as the input of the convolution layer following that layer, and a convolution operation is performed. The result of the convolution operation is then used as the input of the next convolution layer, and the feature maps output by that convolution layer again undergo the sparse coding and convolution operations described above, until the last layer of the convolutional neural network outputs its feature maps. This embodiment does not restrict the sparse coding mode used for the convolution kernels.

In this embodiment, the density of each feature map output by each layer of the convolutional neural network is compared with a plurality of preset thresholds to obtain the sparse state of each feature map; feature maps in different sparse states are sparsely encoded in different ways, and the sparsely encoded feature maps and the convolution kernels of the pre-sparsely encoded convolutional neural network are then convolved in the convolution layer following each layer, which reduces the amount of computation of the convolution operations in the convolutional neural network and increases the operation speed.
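
For illustration only, the following minimal Python sketch shows the control flow of steps S1-S3 applied layer by layer. The function names, the threshold values th1 = 0.3 and th2 = 0.7, and the naive stand-in convolution are assumptions for the sketch and are not part of the embodiment or the accelerator datapath.

```python
import numpy as np

def density(x: np.ndarray) -> float:
    # Density = number of non-zero elements / total number of elements.
    return np.count_nonzero(x) / x.size

def sparse_state(d: float, th1: float, th2: float) -> str:
    # S: completely sparse, M: medium sparse, D: completely non-sparse.
    return "S" if d < th1 else ("M" if d < th2 else "D")

def run_layers(x: np.ndarray, kernels: list, th1: float = 0.3, th2: float = 0.7) -> np.ndarray:
    for k in kernels:
        state = sparse_state(density(x), th1, th2)   # S1 + S2: pick a coding mode per map
        # A real implementation would store x in the format selected by `state`;
        # only the control flow is shown here.
        kh, kw = k.shape
        out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
        for i in range(out.shape[0]):                # S3: convolution in the next layer
            for j in range(out.shape[1]):
                out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
        x = np.maximum(out, 0.0)                     # ReLU tends to make later maps sparser
    return x

# Example: two 3x3 kernels applied to an 8x8 input.
result = run_layers(np.random.rand(8, 8), [np.eye(3), np.ones((3, 3))])
```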

On the basis of the above embodiment, in this embodiment step S1 specifically includes: for any feature map, counting the number of non-zero elements in the feature map and the total number of all elements in the feature map, and using the ratio between the number of non-zero elements and the total number of all elements as the density of the feature map.

Specifically, the density of a feature map is the ratio between the number of non-zero elements in the feature map and the total number of all elements in the feature map. For example, if the number of non-zero elements in a feature map is 10 and the total number of all elements in the feature map is 100, the density of the feature map is 0.1.
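
As a short illustrative sketch (the function name is hypothetical), the density computation can be written as follows; the array constructed below simply reproduces the 10-out-of-100 example above.

```python
import numpy as np

def feature_map_density(fmap: np.ndarray) -> float:
    # Density = number of non-zero elements / total number of elements.
    return np.count_nonzero(fmap) / fmap.size

# The example from the text: 10 non-zero elements out of 100 gives a density of 0.1.
fmap = np.zeros((10, 10))
fmap.flat[:10] = 1.0
assert feature_map_density(fmap) == 0.1
```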

On the basis of the above embodiment, in this embodiment the preset thresholds include a first preset threshold and a second preset threshold. Correspondingly, step S2 specifically includes: if the density of a feature map is less than the first preset threshold, encoding that feature map into a sparse matrix storage format; if its density is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in that feature map; if its density is greater than or equal to the second preset threshold, not sparsely encoding that feature map.

Specifically, the preset thresholds in this embodiment include a first preset threshold th1 and a second preset threshold th2. According to the first preset threshold and the second preset threshold, the feature state AS of each feature map is divided into three states: feature maps whose density is less than the first preset threshold are classified into a completely sparse state S, feature maps whose density is greater than or equal to the first preset threshold and less than the second preset threshold are classified into a medium sparse state M, and feature maps whose density is greater than or equal to the second preset threshold are classified into a completely non-sparse state D. If a feature map is in the completely sparse state S, it is encoded into a sparse matrix storage format; the sparse matrix storage format contains the non-zero data (activ) and the sparse index (index) of the feature map, for example coordinate encoding or compressed sparse row encoding. Encoding a feature map in a sparse matrix storage format saves a large amount of storage space and a large amount of computation time. If a feature map is in the medium sparse state M, its 0 elements are marked with a guard flag that identifies them; marked elements need not take part in computation or storage, which reduces power consumption, and marking the 0 elements of a feature map is therefore also a form of sparse coding. If a feature map is in the completely non-sparse state D, dynamic coding is not needed and the non-sparse data of the feature map is output directly.
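
The three-way classification and the corresponding coding modes can be sketched in Python as follows. The function names and the example thresholds th1 = 0.3 and th2 = 0.7 are illustrative assumptions, and the coordinate format stands in for the activ/index sparse storage described above.

```python
import numpy as np

def classify(density: float, th1: float, th2: float) -> str:
    # S: completely sparse, M: medium sparse, D: completely non-sparse.
    if density < th1:
        return "S"
    if density < th2:
        return "M"
    return "D"

def encode_feature_map(fmap: np.ndarray, th1: float = 0.3, th2: float = 0.7):
    density = np.count_nonzero(fmap) / fmap.size
    state = classify(density, th1, th2)
    if state == "S":
        # Sparse matrix storage: non-zero data (activ) plus a coordinate index.
        idx = np.nonzero(fmap)
        return state, {"activ": fmap[idx], "index": np.stack(idx, axis=1)}
    if state == "M":
        # Keep the dense data but mark (guard) the zero elements.
        return state, {"data": fmap, "guard": fmap == 0}
    # D: the non-sparse data is output unchanged.
    return state, {"data": fmap}
```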

On the basis of the above embodiment, in this embodiment the method further includes, before step S3: calculating the density of each convolution kernel in the trained convolutional network; if the density of a convolution kernel is less than the first preset threshold, encoding that convolution kernel into a sparse matrix storage format; if its density is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in that convolution kernel; if its density is greater than or equal to the second preset threshold, not encoding that convolution kernel.

Specifically, the density of a convolution kernel is the ratio between the number of non-zero elements in the convolution kernel and the total number of all elements in the convolution kernel. The state WS of each convolution kernel is divided into the same three states as the feature maps, and each state corresponds to a different sparse coding mode. Since the feature maps and the convolution kernels each have three states, there are 9 states after combination, so that the density of the convolutional neural network is divided at a finer granularity.
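
Since the kernel state WS uses the same three-way split as the feature maps, the combined activation/weight state space can be enumerated directly; the small sketch below uses hypothetical names.

```python
import numpy as np
from itertools import product

def kernel_density(kernel: np.ndarray) -> float:
    # Same definition as for feature maps: non-zero count / total element count.
    return np.count_nonzero(kernel) / kernel.size

# AS (feature-map state) and WS (kernel state) each take one of S, M, D,
# giving 9 combined processing states for a layer.
STATES = ("S", "M", "D")
COMBINED_STATES = list(product(STATES, STATES))
assert len(COMBINED_STATES) == 9
```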

On the basis of the above embodiments, step S3 in this embodiment specifically includes: when a feature map or a convolution kernel contains the marks, the marked elements of that feature map or convolution kernel are not computed.

Specifically, when a feature map or convolution kernel is in the fully sparse state S, the 0 elements are removed before input, which reduces storage space and avoids computing on 0 elements; when a feature map or convolution kernel is in the medium sparse state M, the 0 elements are still stored, but the marked elements are not computed, which reduces the amount of computation.
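As a rough software analogy of this skip-on-guard behavior (the actual design does this in hardware; all names here are illustrative), a multiply-accumulate loop could simply bypass marked operands:

```python
def guarded_mac(activ, act_guard, weights, w_guard):
    """Multiply-accumulate that skips operands marked as zero (state M).

    activ / weights: element-wise aligned activation and weight values.
    act_guard / w_guard: boolean masks marking the 0 elements, or None (state D).
    """
    acc = 0
    for i in range(len(activ)):
        if act_guard is not None and act_guard[i]:
            continue                   # marked activation: no multiply, no accumulate
        if w_guard is not None and w_guard[i]:
            continue                   # marked weight: no multiply, no accumulate
        acc += activ[i] * weights[i]
    return acc
```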

In another embodiment of the present invention, an accelerator applied to a convolutional neural network is provided. FIG. 2 is a schematic diagram of the overall structure of the accelerator applied to a convolutional neural network according to an embodiment of the present invention, which includes a neural network computing array module and a dynamic sparse adjustment module. The dynamic sparse adjustment module is configured to calculate the density of each feature map output by each layer of the convolutional neural network, compare the density of each feature map with a plurality of preset thresholds, and sparsely encode each feature map according to the comparison result, where different comparison results correspond to different sparse coding modes. The neural network computing array module is configured to perform convolution operations on the sparsely encoded feature maps and the pre-sparsely-encoded convolution kernels of the convolutional neural network.

Specifically, the convolutional neural network may or may not include a pooling layer. The convolutional neural network is trained first; after training is complete, the convolution kernels no longer change, so they do not need on-line dynamic sparse coding and can simply be sparsely encoded once off-line. In each convolution operation, the neural network computing array module directly reads the off-line sparsely encoded convolution kernels for the convolution computation. When raw image data is input to the convolutional neural network, the dynamic sparse adjustment module sparsely encodes the raw image data, and the neural network computing array module then performs the convolution computation on the sparsely encoded raw data and the sparsely encoded convolution kernels. Since raw image data is generally not sparse, the raw image data may also be input directly without sparse encoding. Sparse coding here means storing the data in a sparse format.

Because the density of the feature maps output by the different layers of the convolutional neural network differs, and the feature maps output by different layers change dynamically, the density also changes dynamically. The density indicates the degree of sparsity of each feature map. To better improve the operation speed of the convolutional neural network, the dynamic sparse adjustment module calculates the density of each feature map output by each layer, so that the feature maps output by each layer can be sparsely encoded according to their density.

The dynamic sparse adjustment module obtains the sparsity state of each feature map output by the layer according to the plurality of preset thresholds, so that feature maps in different sparsity states undergo different forms of sparse coding rather than a single one. In the prior art, all feature maps output by each layer are sparsely encoded, which requires a large amount of computation.

The neural network computing array module performs the convolution operation on the sparsely encoded feature maps and the pre-sparsely-encoded convolution kernels of the convolutional neural network. If a pooling module is included, the pooling module performs a pooling operation on the result of the convolution operation. In addition, the accelerator further includes an intermediate data memory module, a main chip controller, and an on-chip/off-chip data exchange module. The main controller controls the operation and timing of the entire accelerator chip. The on-chip/off-chip data exchange module is used to read data from the external memory of the chip or to write data computed by the chip to external storage. For example, after initialization, the chip reads the raw image data and the initial convolution kernels from the external memory through the on-chip/off-chip data exchange module under the control of the main controller. The intermediate data memory module is used to store intermediate results produced during computation by the neural network computing array module.
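A purely behavioral sketch of this top-level dataflow is given below; it assumes helper callables standing in for the dynamic sparse adjustment module, the computing array, and the optional pooling module, and it models only the order of operations, not the chip's timing or memory hierarchy.

```python
def run_accelerator(raw_maps, offline_encoded_kernels, sparse_encode, conv_array, pool=None):
    """Illustrative top-level dataflow (behavioral sketch, not the chip design).

    raw_maps: input channels read from external memory via the data exchange module.
    offline_encoded_kernels: per-layer kernels, sparsely encoded once off-line after training.
    sparse_encode / conv_array / pool: assumed helpers standing in for the dynamic
    sparse adjustment module, the neural network computing array, and pooling.
    """
    feature_maps = [sparse_encode(m) for m in raw_maps]    # raw images may also skip encoding
    for kernels in offline_encoded_kernels:                # one entry per convolutional layer
        out = conv_array(feature_maps, kernels)            # convolution on encoded operands
        if pool is not None:
            out = pool(out)                                # optional pooling
        feature_maps = [sparse_encode(m) for m in out]     # re-encode each output map by density
    return feature_maps
```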

In this embodiment, the dynamic sparse adjustment module compares the density of each feature map output by each layer of the convolutional neural network with a plurality of preset thresholds, obtains the sparsity state of each feature map, and applies a different sparse coding mode to feature maps in different sparsity states, so that the neural network computing array module can perform convolution operations on the sparsely encoded feature maps and the pre-sparsely-encoded convolution kernels. On the one hand, this reduces the amount of computation of the convolution operations in the convolutional neural network and improves the operation speed; on the other hand, the processing state of the accelerator is switched dynamically according to the sparsity state, which improves the flexibility of the accelerator.

On the basis of the foregoing embodiment, the dynamic sparse adjustment module in this embodiment includes an on-line density identification module, an output temporary register module, a dynamic coding module, and a dynamic sparse control module. The on-line density identification module is configured to, for any feature map, count the number of non-zero elements in the feature map and the total number of all elements in the feature map, and take the ratio between the two as the density of the feature map. The output temporary register module is configured to store the feature maps output by each layer of the convolutional neural network. The dynamic sparse control module is configured to compare the density of each feature map output by the on-line density identification module with a plurality of preset thresholds. The dynamic coding module is configured to sparsely encode the feature maps in the output temporary register module according to the comparison result.

Specifically, the dynamic sparse adjustment module includes four sub-modules. The on-line density identification module counts the number of non-zero elements in each feature map during computation in order to calculate the density of each feature map. The output temporary register module temporarily stores the feature maps output by each layer of the convolutional neural network in a non-sparse format. The dynamic sparse control module controls the sparsity state of the feature maps through the plurality of preset thresholds. The dynamic coding module sparsely encodes the feature maps in the output temporary register module according to their sparsity states, thereby increasing the speed of the convolution operation.
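These four sub-modules could be modeled behaviorally as follows; the class, its methods, and the default thresholds are illustrative assumptions rather than the hardware design itself.

```python
import numpy as np

class DynamicSparseAdjust:
    """Behavioral model of the dynamic sparse adjustment module (not hardware RTL)."""

    def __init__(self, th1=0.2, th2=0.8):
        self.th1, self.th2 = th1, th2        # thresholds used by the dynamic sparse control
        self.output_buffer = []              # output temporary register module

    def capture(self, feature_maps):
        # Output temporary register module: hold a layer's outputs in non-sparse form.
        self.output_buffer = list(feature_maps)

    def control_state(self, fmap):
        # On-line density identification + dynamic sparse control: density -> AS state.
        d = np.count_nonzero(fmap) / fmap.size
        return "S" if d < self.th1 else ("M" if d < self.th2 else "D")

    def encode_layer(self):
        # Dynamic coding module: apply the encoding selected by the control decision.
        encoded = []
        for fmap in self.output_buffer:
            state = self.control_state(fmap)
            if state == "S":
                encoded.append(("S", fmap[fmap != 0], np.argwhere(fmap != 0)))
            elif state == "M":
                encoded.append(("M", fmap, fmap == 0))   # guard mask marks the zeros
            else:
                encoded.append(("D", fmap, None))
        return encoded
```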

On the basis of the foregoing embodiment, the preset thresholds in this embodiment include a first preset threshold and a second preset threshold. Correspondingly, the dynamic coding module is specifically configured to: if the density of a feature map is less than the first preset threshold, encode the feature map into a sparse matrix storage format; if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, mark the 0 elements in the feature map; and if the density of a feature map is greater than or equal to the second preset threshold, not encode the feature map.

Specifically, the preset thresholds in this embodiment include a first preset threshold th1 and a second preset threshold th2. The dynamic sparse control module divides the feature state AS of each feature map into three states according to the first and second preset thresholds: a feature map whose density is less than the first preset threshold is classified into the fully sparse state S; a feature map whose density is greater than or equal to the first preset threshold and less than the second preset threshold is classified into the medium sparse state M; and a feature map whose density is greater than or equal to the second preset threshold is classified into the fully non-sparse state D.

If a feature map is in the fully sparse state S, the dynamic coding module encodes the feature map in the output temporary register module into a sparse matrix storage format that contains the non-zero data activ and the sparse index index of the feature map, for example coordinate (COO) encoding or compressed sparse row (CSR) encoding. Encoding a feature map into a sparse matrix storage format saves a large amount of storage space as well as a large amount of computation time. If a feature map is in the medium sparse state M, the dynamic coding module adds a guard mark to each 0 element of the feature map in the output temporary register module; the marked elements need not take part in computation or storage, which reduces power consumption. If a feature map is in the fully non-sparse state D, no dynamic encoding is needed and the dynamic coding module directly outputs the non-sparse data of the feature map.

On the basis of the foregoing embodiment, the dynamic coding module in this embodiment is further configured to: if the pre-computed density of a convolution kernel is less than the first preset threshold, encode the convolution kernel into a sparse matrix storage format; if the density of a convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, mark the 0 elements in the convolution kernel; and if the density of a convolution kernel is greater than or equal to the second preset threshold, not encode the convolution kernel.

Specifically, the density of a convolution kernel is the ratio of the number of non-zero elements in the kernel to the total number of elements in the kernel. The state WS of each convolution kernel has the same three states as the feature maps, and each state corresponds to a different sparse coding scheme. Since both the feature maps and the convolution kernels have three possible states, there are 9 combined states, which allows a finer-grained partition of the density of the convolutional neural network.

On the basis of the above embodiments, the neural network computing array module in this embodiment is specifically configured to: when a feature map or a convolution kernel contains the marks, not compute the marked elements in that feature map or convolution kernel.

Specifically, when a feature map or convolution kernel is in the fully sparse state S, the 0 elements are removed before it is input into the neural network computing array module, which reduces storage space and avoids computing on 0 elements; when a feature map or convolution kernel is in the medium sparse state M, the 0 elements are still stored, but the marked elements are not computed, which reduces the amount of computation.

For example, a chip implementing the accelerator is fabricated in a TSMC 65 nm process, with a chip area of 3 mm × 4 mm, an operating frequency of 20-200 MHz, and a power consumption of 20.5-248.4 mW. In this embodiment, the peak energy efficiency rises rapidly as the density of the feature maps and convolution kernels decreases, as shown in FIG. 3. When the density of both the feature maps and the convolution kernels is 5%, the peak energy efficiency reaches 62.1 TOPS/W, which is 6.2 times the peak energy efficiency achieved without the accelerator of this embodiment. As shown in FIG. 4, compared with an implementation that supports only feature-data sparsity, the energy efficiency of this embodiment is improved by 4.3 times. Compared with an implementation without adaptive sparse control, the energy efficiency of the present invention is improved by 2.8 times. Compared with an implementation without density control but with variable quantization precision, the energy efficiency of the present invention is improved by 2 times.

Finally, the method of the present application is only a preferred embodiment and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.

Claims (10)

1. An acceleration method applied to a convolutional neural network, comprising:
S1, for any layer in the convolutional neural network, separately calculating the density of each feature map output by the layer;
S2, comparing the density of each feature map output by the layer with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result, wherein different comparison results correspond to different sparse coding modes; and
S3, performing convolution, based on the convolutional layer following the layer, on the sparsely encoded feature maps and the pre-sparsely-encoded convolution kernels of the convolutional neural network.
2. The method according to claim 1, wherein step S1 specifically comprises:
for any feature map, counting the number of non-zero elements in the feature map and the total number of all elements in the feature map; and
taking the ratio between the number of non-zero elements in the feature map and the total number of all elements in the feature map as the density of the feature map.
3. The method according to claim 1, wherein the preset thresholds comprise a first preset threshold and a second preset threshold, the first preset threshold being smaller than the second preset threshold;
correspondingly, step S2 specifically comprises:
if the density of a feature map is less than the first preset threshold, encoding the feature map into a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the feature map; and
if the density of a feature map is greater than or equal to the second preset threshold, not sparsely encoding the feature map.
4. The method according to claim 3, wherein before step S3 the method further comprises:
calculating the density of each convolution kernel in the trained convolutional network;
if the density of a convolution kernel is less than the first preset threshold, encoding the convolution kernel into a sparse matrix storage format;
if the density of a convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the convolution kernel; and
if the density of a convolution kernel is greater than or equal to the second preset threshold, not sparsely encoding the convolution kernel.
5. The method according to claim 3 or 4, wherein step S3 specifically comprises:
when the marks exist in a feature map or a convolution kernel, not computing the marked elements in the feature map or the convolution kernel.
6. An accelerator applied to a convolutional neural network, comprising a neural network computing array module and a dynamic sparse adjustment module;
wherein the dynamic sparse adjustment module is configured to calculate the density of each feature map output by each layer of the convolutional neural network, compare the density of each feature map with a plurality of preset thresholds, and sparsely encode each feature map according to the comparison result, wherein different comparison results correspond to different sparse coding modes; and
the neural network computing array module is configured to perform convolution operations on the sparsely encoded feature maps and the pre-sparsely-encoded convolution kernels of the convolutional neural network.
7. The accelerator according to claim 6, wherein the dynamic sparse adjustment module comprises an on-line density identification module, an output temporary register module, a dynamic coding module, and a dynamic sparse control module;
wherein the on-line density identification module is configured to, for any feature map, count the number of non-zero elements in the feature map and the total number of all elements in the feature map, and take the ratio between the two as the density of the feature map;
the output temporary register module is configured to store the feature maps output by each layer of the convolutional neural network;
the dynamic sparse control module is configured to compare the density of each feature map output by the on-line density identification module with a plurality of preset thresholds; and
the dynamic coding module is configured to sparsely encode the feature maps in the output temporary register module according to the comparison result.
8. The accelerator according to claim 7, wherein the preset thresholds comprise a first preset threshold and a second preset threshold, the first preset threshold being smaller than the second preset threshold;
correspondingly, the dynamic coding module is specifically configured to:
if the density of a feature map is less than the first preset threshold, encode the feature map into a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, mark the 0 elements in the feature map; and
if the density of a feature map is greater than or equal to the second preset threshold, not sparsely encode the feature map.
9. The accelerator according to claim 8, wherein the dynamic coding module is further configured to:
if the pre-computed density of a convolution kernel is less than the first preset threshold, encode the convolution kernel into a sparse matrix storage format;
if the density of a convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, mark the 0 elements in the convolution kernel; and
if the density of a convolution kernel is greater than or equal to the second preset threshold, not sparsely encode the convolution kernel.
10. The accelerator according to claim 7 or 8, wherein the neural network computing array module is specifically configured to:
when the marks exist in a feature map or a convolution kernel, not compute the marked elements in the feature map or the convolution kernel.
PCT/CN2018/095365 2018-04-08 2018-07-12 Acceleration method and accelerator used for convolutional neural network Ceased WO2019196223A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810306577.3A CN108510063B (en) 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network
CN201810306577.3 2018-04-08

Publications (1)

Publication Number Publication Date
WO2019196223A1 true WO2019196223A1 (en) 2019-10-17

Family

ID=63380995

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/095365 Ceased WO2019196223A1 (en) 2018-04-08 2018-07-12 Acceleration method and accelerator used for convolutional neural network

Country Status (2)

Country Link
CN (1) CN108510063B (en)
WO (1) WO2019196223A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389043B (en) * 2018-09-10 2021-11-23 中国人民解放军陆军工程大学 Crowd density estimation method for aerial picture of unmanned aerial vehicle
CN109409518B (en) * 2018-10-11 2021-05-04 北京旷视科技有限公司 Neural network model processing method and device and terminal
US20200143226A1 (en) * 2018-11-05 2020-05-07 Samsung Electronics Co., Ltd. Lossy compression of neural network activation maps
CN109784484A (en) * 2019-01-31 2019-05-21 深兰科技(上海)有限公司 Neural network accelerated method, device, neural network accelerate chip and storage medium
CN113853616A (en) * 2019-02-19 2021-12-28 瀚博控股公司 Methods and systems for multimodal hardware accelerators for convolutional models
CN110163340B (en) * 2019-03-08 2025-04-08 腾讯科技(深圳)有限公司 Method, device and computer-readable storage medium for computing using convolutional neural network
CN110097172B (en) * 2019-03-18 2021-10-29 中国科学院计算技术研究所 A convolutional neural network data processing method and device based on winograd convolution operation
CN109858575B (en) * 2019-03-19 2024-01-05 苏州市爱生生物技术有限公司 Data classification method based on convolutional neural network
CN110443357B (en) * 2019-08-07 2020-09-15 上海燧原智能科技有限公司 Convolutional neural network calculation optimization method and device, computer equipment and medium
CN110909801B (en) * 2019-11-26 2020-10-09 山东师范大学 Data classification method, system, medium and equipment based on convolutional neural network
CN111291230B (en) * 2020-02-06 2023-09-15 北京奇艺世纪科技有限公司 Feature processing method, device, electronic equipment and computer readable storage medium
CN113537465B (en) * 2021-07-07 2024-10-08 深圳市易成自动驾驶技术有限公司 LSTM model optimization method, accelerator, device and medium
CN115293339A (en) * 2022-07-29 2022-11-04 浙江大学 Intelligent training acceleration method and system based on data density characteristic in neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184362A (en) * 2015-08-21 2015-12-23 中国科学院自动化研究所 Depth convolution neural network acceleration and compression method based on parameter quantification
US20170103309A1 (en) * 2015-10-08 2017-04-13 International Business Machines Corporation Acceleration of convolutional neural network training using stochastic perforation
CN107239825A (en) * 2016-08-22 2017-10-10 北京深鉴智能科技有限公司 Consider the deep neural network compression method of load balancing
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator
CN107609641A (en) * 2017-08-30 2018-01-19 清华大学 Sparse neural network framework and its implementation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401554A (en) * 2020-03-12 2020-07-10 交叉信息核心技术研究院(西安)有限公司 Accelerator of convolutional neural network supporting multi-granularity sparsity and multi-mode quantization
CN111401554B (en) * 2020-03-12 2023-03-24 交叉信息核心技术研究院(西安)有限公司 Accelerator of convolutional neural network supporting multi-granularity sparsity and multi-mode quantization
WO2023164855A1 (en) * 2022-03-03 2023-09-07 Intel Corporation Apparatus and method for 3d dynamic sparse convolution
WO2025095929A1 (en) * 2023-10-30 2025-05-08 Google Llc Controllable neural network sparsity through dynamic activation functions

Also Published As

Publication number Publication date
CN108510063B (en) 2020-03-20
CN108510063A (en) 2018-09-07

Similar Documents

Publication Publication Date Title
WO2019196223A1 (en) Acceleration method and accelerator used for convolutional neural network
CN115841596B (en) Multi-label image classification method and model training method and device
KR20200060302A (en) Processing method and apparatus
CN109214353B (en) Training method and device for rapid detection of face image based on pruning model
CN109389171B (en) Medical image classification method based on multi-granularity convolutional denoising autoencoder technology
CN105488563A (en) Sparse adaptive neural network, algorithm and implementation device for deep learning
CN103413143B (en) Video target tracking method based on dynamic sparse projection
Zong et al. ASHF-Net: Adaptive sampling and hierarchical folding network for robust point cloud completion
Fan et al. HFPQ: deep neural network compression by hardware-friendly pruning-quantization
CN104318271B (en) Image classification method based on adaptability coding and geometrical smooth convergence
Jain et al. Symmetric $ k $-means for deep neural network compression and hardware acceleration on FPGAs
CN110728303A (en) Dynamic Adaptive Computing Array Based on Convolutional Neural Network Data Complexity
CN119849576A (en) Parameter fine tuning method, device, equipment and medium of pre-training model
Wei et al. Automatic group-based structured pruning for deep convolutional networks
CN116343340B (en) Human skeleton data behavior recognition method and system based on improvement of Transformer
CN102663681A (en) Gray scale image segmentation method based on sequencing K-mean algorithm
CN113705784B (en) A neural network weight encoding method and hardware system based on matrix sharing
Chung et al. Using quantization-aware training technique with post-training fine-tuning quantization to implement a MOBILENET hardware accelerator
CN110610227B (en) Artificial neural network adjusting method and neural network computing platform
CN116258907A (en) Image classification method of pulse residual neural network based on gate mechanism
CN116091849B (en) Tire pattern classification method, system, medium and equipment based on grouping decoder
CN118781417A (en) Small-sample target detection method based on correlation region proposal network and transformer encoder-decoder structure
CN118569323A (en) A hardware-friendly weight compression encoding method for visual Transformer
CN104951651B (en) It is a kind of that the non-negative view data dimension reduction method optimized with A is constrained based on Hessen canonical
CN114581879A (en) Image recognition method, image recognition device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18914271

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18914271

Country of ref document: EP

Kind code of ref document: A1