WO2018107383A1

WO2018107383A1 - Neural network convolution computation method and device, and computer-readable storage medium

Info

Publication number: WO2018107383A1
Application number: PCT/CN2016/109862
Authority: WO
Inventors: 陈云霁; 庄毅敏; 刘少礼; 郭崎; 陈天石
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2016-12-14
Filing date: 2016-12-14
Publication date: 2018-06-21
Anticipated expiration: 2019-06-14

Abstract

A neural network convolution computation method and device, used for achieving convolution computation of a weight matrix and neurons in a neural network in a matrix multiplication manner. The method comprises: first performing winograd transform on a neuron matrix and a weight matrix to obtain a transformed neuron matrix and a transformed weight matrix (step 1); then performing a matrix multiplication operation on the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix (step 2); and finally performing winograd inverse transform on the multiplication matrix to obtain the computation result (step 3).

Description

Neural network convolution operation method, device and computer readable storage medium

Technical field

本发明涉及人工神经网络技术领域，具体涉及一种神经网络的卷积运算方法、装置及计算机可读存储介质。The present invention relates to the field of artificial neural network technologies, and in particular, to a convolution operation method and apparatus for a neural network, and a computer readable storage medium.

Background technique

多层人工神经网络被广泛应用于模式识别、图像处理、函数逼近和优化计算等领域，多层人工网络在近年来由于其较高的识别准确度和较好的可并行性，受到学术界和工业界越来越广泛的关注。Multi-layer artificial neural networks are widely used in the fields of pattern recognition, image processing, function approximation and optimization calculation. Multi-layer artificial networks have been accepted by academia in recent years due to their high recognition accuracy and good parallelism. The industry is getting more and more attention.

为了适应越来越来高的任务需求，神经网络的规模变得越来越庞大，目前大型的卷积神经网络已经包含了上百层的网络层结构。随之带来的问题神经网络需要做更大量的运算，特别是卷积神经网络，大量的卷积运算降低了神经网络的运算速度，影响神经网络在实际应用场合的使用。In order to adapt to the increasingly high task requirements, the scale of neural networks has become more and more large. At present, large convolutional neural networks already contain hundreds of layers of network layers. The resulting problems require a lot of operations on neural networks, especially convolutional neural networks. A large number of convolution operations reduce the computational speed of neural networks and affect the use of neural networks in practical applications.

发明内容Summary of the invention

(一)要解决的技术问题(1) Technical problems to be solved

本发明的目的在于，提供一种神经网络的卷积运算方法、装置及计算机可读存储介质，可以通过矩阵乘法的方式实现神经网络中的权值矩阵与神经元的卷积运算，从而减少卷积所需的运算量，提高神经网络的运算速度，大幅提高数据处理的效率。The object of the present invention is to provide a convolution operation method and device for a neural network and a computer readable storage medium, which can realize a convolution operation of a weight matrix and a neuron in a neural network by matrix multiplication, thereby reducing a volume The amount of computation required for the product increases the computational speed of the neural network and greatly improves the efficiency of data processing.

(二)技术方案(2) Technical plan

本发明一方面提供一种神经网络的卷积运算方法，用于以矩阵乘法的方式实现神经网络中的权值矩阵与神经元的卷积运算，方法包括：An aspect of the present invention provides a convolution operation method for a neural network, which is used to implement a convolution operation of a weight matrix and a neuron in a neural network by matrix multiplication, and the method includes:

S1，对神经元矩阵和权值矩阵进行winograd变换，得到变换后神经元矩阵和变换后权值矩阵；S1, performing a winograd transformation on the neuron matrix and the weight matrix to obtain the transformed nerve a metamatrix and a transformed weight matrix;

S2，将变换后神经元矩阵和变换后权值矩阵进行矩阵乘法操作，得到乘法矩阵；S2, performing a matrix multiplication operation on the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix;

S3，将乘法矩阵进行winograd反变换，得到运算结果。S3, the multiplication matrix is inverse-transformed by winograd, and the operation result is obtained.

本发明另一方面提供一种神经网络的卷积运算装置，用于以矩阵乘法的方式实现神经网络中的权值矩阵与神经元的卷积运算，装置包括：Another aspect of the present invention provides a convolution operation device for a neural network, which is used to implement a convolution operation of a weight matrix and a neuron in a neural network by matrix multiplication, and the device includes:

存储器，用于存储指令；a memory for storing instructions;

控制器，用于对指令进行译码；a controller for decoding an instruction;

处理器，用于执行译码后的指令，以执行：a processor, configured to execute the decoded instruction to perform:

对神经元矩阵和权值矩阵进行winograd变换，得到变换后神经元矩阵和变换后权值矩阵；Performing a winograd transformation on the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix;

将变换后神经元矩阵和变换后权值矩阵进行矩阵乘法操作，得到乘法矩阵；Performing a matrix multiplication operation on the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix;

将乘法矩阵进行winograd反变换，得到运算结果。The inverse matrix is inverse-transformed by the multiplication matrix to obtain the operation result.

本发明另一方面提供一种计算机可读存储介质，其存储有指令，该指令可被处理器执行，以使得该处理器执行本发明的方法。Another aspect of the invention provides a computer readable storage medium storing instructions executable by a processor to cause the processor to perform the methods of the present invention.

(3) Beneficial effects

本发明能够将复杂的卷积操作变为稀疏矩阵乘法操作，并且变换与反变换过程可用位操作实现，通过这种方法可以大量减少卷积所需的运算量，提高神经网络的运算速度，大幅提高数据处理的效率，同时，采用稀疏序列可以减少存储网络参数所需的存储空间，降低内存访问的带宽。The invention can turn a complex convolution operation into a sparse matrix multiplication operation, and the transform and inverse transform processes can be realized by bit operations. By this method, the amount of calculation required for convolution can be greatly reduced, and the operation speed of the neural network can be improved. Improve the efficiency of data processing. At the same time, using sparse sequences can reduce the storage space required to store network parameters and reduce the bandwidth of memory access.

DRAWINGS

图1示意性示出了本发明实施例的神经网络的卷积运算方法的流程图。FIG. 1 is a flow chart schematically showing a convolution operation method of a neural network according to an embodiment of the present invention.

图2示意性示出了本发明实施例的神经网络的卷积运算装置的结构示意图。 FIG. 2 is a schematic block diagram showing the structure of a convolution operation device of a neural network according to an embodiment of the present invention.

图3示意性示出了本发明实施例的处理器的结构示意图。FIG. 3 is a schematic block diagram showing the structure of a processor according to an embodiment of the present invention.

图4示意性示出了卷积运算的示意图。Fig. 4 schematically shows a schematic diagram of a convolution operation.

图5结合本发明实施例所描述的装置，示意性示出本发明实施例执行图4卷积运算的过程。FIG. 5 is a schematic diagram showing the process of performing the convolution operation of FIG. 4 according to an embodiment of the present invention in conjunction with the apparatus described in the embodiment of the present invention.

detailed description

根据结合附图对本发明示例性实施例的以下详细描述，本发明的其它方面、优势和突出特征对于本领域技术人员将变得显而易见。Other aspects, advantages, and salient features of the present invention will become apparent to those skilled in the <

在本发明中，术语“包括”和“含有”及其派生词意为包括而非限制；术语“或”是包含性的，意为和/或。In the present invention, the terms "include" and "including" and their derivatives are intended to be inclusive and not limiting; the term "or" is inclusive, meaning and/or.

在本说明书中，下述用于描述本发明原理的各种实施例知识说明，不应该以任何方式解释为限制发明的范围。参照附图的下述描述用于帮助全面理解由权利要求及其等同物限定的本发明的示例性实施例。下述描述包括多种具体细节来帮助理解，但这些细节应认为仅仅是示例性的。因此，本领域普通技术人员应认识到，在不背离本发明的范围和精神的情况下，可以对本文中描述的实施例进行多种改变和修改。此外，为了清楚和简洁起见，省略了公知功能和结构的描述。此外，贯穿附图，相同参考数字用于相思功能和操作。In the present specification, the following description of various embodiments for describing the principles of the present invention should not be construed as limiting the scope of the invention in any way. The following description of the invention is intended to be understood as The description below includes numerous specific details to assist the understanding, but these details should be considered as merely exemplary. Accordingly, it will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness. Moreover, throughout the figures, the same reference numerals are used for the acacia function and operation.

附图中示出了一些方框图和/或流程图。应理解，方框图和/或流程图中的一些方框或其组合可以由计算机程序指令来实现。这些计算机程序指令可以提供给通用计算机、专用计算机或其他可编程数据处理装置的处理器，从而这些指令在由该处理器执行时可以创建用于实现这些方框图和/或流程图中所说明的功能/操作的装置。Some block diagrams and/or flowcharts are shown in the drawings. It will be understood that some blocks or combinations of the block diagrams and/or flowcharts can be implemented by computer program instructions. These computer program instructions may be provided to a general purpose computer, a special purpose computer or a processor of other programmable data processing apparatus such that when executed by the processor, the instructions may be used to implement the functions illustrated in the block diagrams and/or flowcharts. / operating device.

因此，本公开的技术可以硬件和/或软件(包括固件、微代码等)的形式来实现。另外，本公开的技术可以采取存储有指令的计算机可读介质上的计算机程序产品的形式，该计算机程序产品可供指令执行系统使用。在本公开的上下文中，计算机可读介质可以是能够包含、存储、传送、传播或传输指令的任意介质。例如，计算机可读介质可以包括但不限于电、磁、光、电磁、红外或半导体系统、装置、器件或传播介质。计算机可读介质的具体示例包括：磁存储装置，如磁带或硬盘(HDD)；光存储装置，如光盘(CD-ROM)；存储器，如随机存取存储器(RAM)或闪存；和/或有线/无线通信链路。Thus, the techniques of this disclosure may be implemented in the form of hardware and/or software (including firmware, microcode, etc.). Additionally, the techniques of the present disclosure may take the form of a computer program product on a computer readable medium storing instructions for use by an instruction execution system. In the context of the present disclosure, a computer readable medium can be any medium that can contain, store, communicate, propagate or transport the instructions. For example, a computer readable medium can include but not Limited to electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices, devices, or propagation media. Specific examples of the computer readable medium include: a magnetic storage device such as a magnetic tape or a hard disk (HDD); an optical storage device such as a compact disk (CD-ROM); a memory such as a random access memory (RAM) or a flash memory; and/or a wired /Wireless communication link.

图1示意性示出了本发明实施例的神经网络的卷积运算方法的流程图，如图1所示，方法包括：FIG. 1 is a flow chart schematically showing a convolution operation method of a neural network according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

步骤1，对神经元矩阵和权值矩阵进行winograd变换，得到变换后神经元矩阵和变换后权值矩阵。Step 1. Perform a winograd transformation on the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix.

在本步骤中，采用下式对神经元矩阵d₀和权值矩阵w₀进行winograd变换，得到变换后神经元矩阵d和变换后权值矩阵w：In this step, the neuron matrix d ₀ and the weight matrix w _{0 are} subjected to winograd transformation using the following formula to obtain the transformed neuron matrix d and the transformed weight matrix w:

d＝C^Td₀C，w＝Gw₀G^T，d=C ^T d ₀ C,w=Gw ₀ G ^T ,

其中，C为神经元矩阵d₀的变换矩阵，C^T为C的转置矩阵，G为权值矩阵w₀的变换矩阵，G^T为G的转置矩阵。Where C is the transformation matrix of the neuron matrix d ₀ , C ^T is the transposed matrix of C, G is the transformation matrix of the weight matrix w ₀ , and G ^T is the transposed matrix of G.

另外，神经元矩阵和权值矩阵中的数值为二进制，并且，变换矩阵C、G数值为2ⁿ，例如1，-0.5，0，0.5，1等。这样，本发明实施例采用位操作实现winograd变换，通过左移和右移实现乘2与除2的操作。例如，神经元矩阵d₀中的一个数值与0.5相乘时，即将该数值向右移一位，与-0.5相乘时，即将该数值向左移一位并将最高位取反。因此，本发明实施例通过位操作来实现winograd变换，减少了运算量，提高了运算速度。In addition, the values in the neuron matrix and the weight matrix are binary, and the values of the transformation matrix C, G are 2 ⁿ , such as 1, -0.5, 0, 0.5, 1, and the like. Thus, the embodiment of the present invention implements a winograd transform using a bit operation, and implements operations of multiplying 2 and dividing 2 by left shift and right shift. For example, when a value in the neuron matrix d ₀ is multiplied by 0.5, the value is shifted to the right by one bit. When multiplied by -0.5, the value is shifted to the left by one bit and the highest bit is inverted. Therefore, in the embodiment of the present invention, the winograd transformation is realized by the bit operation, the calculation amount is reduced, and the operation speed is improved.

神经元矩阵d₀和权值矩阵w₀的变换矩阵C和G是采用winograd算法得到的。The transformation matrices C and G of the neuron matrix d ₀ and the weight matrix w ₀ are obtained using the winograd algorithm.

winograd算法利用矩阵的分块相乘以减小矩阵乘法的乘法次数，有多种不同的矩阵分块方法，一种winograd算法如下所示The winograd algorithm uses the block multiplication of the matrix to reduce the number of multiplications of the matrix multiplication. There are many different matrix blocking methods. A winograd algorithm is shown below.

计算矩阵乘法C＝AB，对各矩阵进行分块，有Calculate matrix multiplication C=AB, block each matrix, there is

记Remember

S₁＝A₂₁+A₂₂，S₂＝S₁-A₁₁，S₃＝A₁₁-A₂₁，S₄＝A₁₂-S₂ S ₁ = A ₂₁ + A ₂₂ , S ₂ = S _{1 -} A ₁₁ , S ₃ = A _{11 -} A ₂₁ , S ₄ = A _{12 -} S ₂

S₅＝B₁₂-B₁₁，S₆＝B₂₂-S₅，S₇＝B₂₂-B₁₂，S₈＝S₆-B₂₁ S ₅ =B ₁₂ -B ₁₁ ,S ₆ =B ₂₂ -S ₅ ,S ₇ =B ₂₂ -B ₁₂ ,S ₈ =S ₆ -B ₂₁

M₁＝S₂S₆，M₂＝A₁₁B₁₁，M₃＝A₁₂B₂₁，M₄＝S₃S₇ M ₁ =S ₂ S ₆ , M ₂ =A ₁₁ B ₁₁ , M ₃ =A ₁₂ B ₂₁ , M ₄ =S ₃ S ₇

M₅＝S₁S₅，M₆＝S₄B₂₂，M₇＝A₂₂S₈ M ₅ =S ₁ S ₅ , M ₆ =S ₄ B ₂₂ , M ₇ =A ₂₂ S ₈

T₁＝M₁+M₂，T₂＝T₁+M₄ T ₁ = M ₁ + M ₂ , T ₂ = T ₁ + M ₄

则then

C₁₁＝M₂+M₃+M₆，C₁₂＝T₁+M₅ C ₁₁ =M ₂ +M ₃ +M ₆ , C ₁₂ =T ₁ +M ₅

C₂₁＝T₂-M₇，C₂₂＝T₂+M₅ C ₂₁ =T ₂ -M ₇ , C ₂₂ =T ₂ +M ₅

通过上述的winograd算法，获得卷积所需的变换矩阵，例如，对于一维卷积[d₁，d₂，d₃]*[w₁，w₂]，假设每次卷积滑动为1，可将卷积扩展成矩阵相乘的形式The transformation matrix required for convolution is obtained by the above winograd algorithm, for example, for a one-dimensional convolution [d ₁ , d ₂ , d ₃ ]*[w ₁ , w ₂ ], assuming that each convolution slip is 1, The convolution can be extended into a matrix multiplied form

通过winograd算法可获得Available through the winograd algorithm

M₁＝(-a₁+a₂+a₃)b₁，M₂＝a₁b₁，M₃＝a₂b₂，M₄＝0M ₁ = (-a ₁ + a ₂ + a ₃ ) b ₁ , M ₂ = a ₁ b ₁ , M ₃ = a ₂ b ₂ , M ₄ =0

M₅＝(a₂+a₃)(-b₁)，M₆＝0，M₇＝a₃(b₁-b₂)M ₅ = (a ₂ + a ₃ ) (-b ₁ ), M ₆ =0, M ₇ = a ₃ (b ₁ - b ₂ )

output₁＝M₂+M₃+M₆，output₂＝M₁+M₂+M₄-M₇ Output ₁ = M ₂ + M ₃ + M ₆ , output ₂ = M ₁ + M ₂ + M _{4 -} M ₇

去除其中的0值项，和未用到部分可改写为Remove the 0 value item, and the unused part can be rewritten as

m₁＝(-a₁+a₂+a₃)b₁，m₂＝a₁b₁，m₃＝a₂b₂，m₄＝a₃(b₁-b₂)m ₁ = (-a ₁ + a ₂ + a ₃ ) b ₁ , m ₂ = a ₁ b ₁ , m ₃ = a ₂ b ₂ , m ₄ = a ₃ (b ₁ - b ₂ )

output₁＝m₂+m₃，output₂＝m₁+m₂-m₄ Output ₁ = m ₂ + m ₃ , output ₂ = m ₁ + m _{2 -} m ₄

从而可获得卷积的变换矩阵Thus a convolutional transformation matrix can be obtained

对于高维的矩阵，可通过多次矩阵分块获得其卷积变换矩阵。winograd算法有不同的矩阵分块方式，对同一种矩阵分块方式，变换矩阵的具体数值及维度由输入神经元与权值矩阵的维度决定以及卷积滑动步长决定。For high-dimensional matrices, the convolutional transformation matrix can be obtained by multiple matrix partitioning. The winograd algorithm has different matrix blocking methods. For the same matrix blocking method, the specific values and dimensions of the transformation matrix are determined by the dimensions of the input neurons and the weight matrix and the convolution sliding step size.

从上述算法可以看出变换矩阵的具体数值及维度由输入神经元与权值矩阵的维度决定，具体影响因素包括输入神经元的维度、权值矩阵的维度和每次卷积操作的滑动步长，当这三个因素确定后，各变换矩阵的数值及维度也随之确定，由于在神经网络结构中，三个影响因素是事先设定好的，因此本实施例在线下操作以完成对于各变换矩阵的设定。It can be seen from the above algorithm that the specific value and dimension of the transformation matrix are determined by the dimensions of the input neuron and the weight matrix. The specific influencing factors include the dimension of the input neuron, the dimension of the weight matrix, and the sliding step size of each convolution operation. When these three factors are determined, the values and dimensions of the transformation matrices are also determined, because in the neural network structure, the three influencing factors are things. It is set first, so this embodiment operates offline to complete the setting for each transformation matrix.

步骤2，将变换后神经元矩阵和变换后权值矩阵进行矩阵乘法操作，得到乘法矩阵t：Step 2: performing a matrix multiplication operation on the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix t:

t＝w⊙d。t=w⊙d.

需要说明的是，在通常的卷积过程中，参与运算的两个矩阵可能具有不同规模，故需要通过滑动操作，进行多次矩阵乘法运算，而在本发明实施例中，转换后的神经元矩阵d和权值矩阵w符合矩阵乘法规则，即只进行一次矩阵乘法运算，这样大大节省了计算量。It should be noted that, in the usual convolution process, the two matrices participating in the operation may have different scales, so multiple matrix multiplication operations need to be performed by a sliding operation, and in the embodiment of the present invention, the converted neurons are The matrix d and the weight matrix w conform to the matrix multiplication rule, that is, only one matrix multiplication operation is performed, which greatly saves the calculation amount.

另外，当两个矩阵相乘时，如果已知一个矩阵的部分元素的数值为0，其与另一矩阵相应元素相乘得到的数值必然是0。那么，在实际数据计算过程中，上述过程其实可以不用参与运算，这样可以省去不必要的计算量。所以，本发明实施例将变换后权值矩阵映射成“0”和“1”组成的稀疏序列，其中，“0”对应变换后权值矩阵中数值为“0”的元素，“1”对应变换后权值矩阵中数值不为0的元素。在执行矩阵乘法操作时，根据稀疏序列记录的“1”，提取变换后神经元矩阵中对应位置的元素，以与变换后权值矩阵中对应元素相乘。In addition, when two matrices are multiplied, if the value of a part of a matrix is known to be 0, the value obtained by multiplying the corresponding element of another matrix must be 0. Then, in the actual data calculation process, the above process can actually not participate in the operation, which can save unnecessary calculations. Therefore, in the embodiment of the present invention, the transformed weight matrix is mapped into a sparse sequence consisting of “0” and “1”, where “0” corresponds to an element whose value is “0” in the transformed weight matrix, and “1” corresponds to An element whose value is not 0 in the transformed weight matrix. When the matrix multiplication operation is performed, the elements of the corresponding positions in the transformed neuron matrix are extracted according to the "1" recorded by the sparse sequence to be multiplied by the corresponding elements in the transformed weight matrix.

例如：E.g:

其中，w对应的稀疏序列为1110111011101100(一行一行读取)，在执行矩阵乘法操作时，根据该序列，可知变换后神经元矩阵中[d₀₃，d₁₃，d₂₃，d₃₂，d₃₃]不参与运算。因此，采用稀疏序列可以进一步减少矩阵乘法运算的运算量。The sparse sequence corresponding to w is 1110111011101100 (read line by line). When performing matrix multiplication operation, according to the sequence, it can be known that the transformed neuron matrix [d ₀₃ , d ₁₃ , d ₂₃ , d ₃₂ , d ₃₃ ] Do not participate in the operation. Therefore, the use of sparse sequences can further reduce the amount of computation of matrix multiplication operations.

步骤3，将乘法矩阵进行winograd反变换，得到运算结果。In step 3, the multiplication matrix is inverse-transformed by winograd to obtain an operation result.

在本步骤中，采用下式将乘法矩阵t进行winograd反变换，得到运算结果output：In this step, the multiplication matrix t is inverse-transformed by winograd using the following formula to obtain an operation result:

output＝A^TtA，Output=A ^T tA,

其中，A为反变换矩阵，A^T为A的转置矩阵。 Where A is the inverse transformation matrix and A ^T is the transposed matrix of A.

需要说明的是，反变换矩阵A与C、G一样，是采用winograd算法得到的，其具体过程在此就不再赘述，另外，反变换矩阵A的数值也为2ⁿ，同样通过位操作实现数值间的运算。It should be noted that the inverse transformation matrix A is the same as C and G, and is obtained by using the winograd algorithm. The specific process is not repeated here. In addition, the value of the inverse transformation matrix A is also 2 ⁿ , which is also realized by bit operations. The operation between values.

图2示意性示出了本发明实施例的神经网络的卷积运算装置的结构示意图，如图2所示，装置包括：FIG. 2 is a schematic structural diagram of a convolution operation device of a neural network according to an embodiment of the present invention. As shown in FIG. 2, the device includes:

数据访问单元1，用于从外部地址空间获取神经元矩阵和权值矩阵，并提供至处理器5，还可从外部获取指令，并提供给存储器2。The data access unit 1 is configured to acquire a neuron matrix and a weight matrix from an external address space, and provide the same to the processor 5, and can also obtain an instruction from the outside and provide it to the memory 2.

存储器2，用于通过数据访问单元1读取指令，并缓存读入指令。The memory 2 is configured to read an instruction through the data access unit 1 and cache the read instruction.

控制器3，用于读取存储器2中的指令，并对读取的指令进行译码，得到控制相应模块的微指令，并将微指令发送给相应的模块。The controller 3 is configured to read an instruction in the memory 2, decode the read instruction, obtain a micro instruction that controls the corresponding module, and send the micro instruction to the corresponding module.

数据缓存单元4，用于存储数据处理所需的数据，以及运算过程中的缓存数据。The data buffer unit 4 is configured to store data required for data processing, and cache data during the operation.

处理器5，用于在所述控制器单元的控制下，执行相应的运算操作，处理器5从数据缓存单元4或者通过数据访问单元1获取数据，其运算的结果输出至数据缓存单元4或通过数据访问单元1输出。The processor 5 is configured to perform a corresponding operation operation under the control of the controller unit, and the processor 5 acquires data from the data buffer unit 4 or through the data access unit 1, and the result of the operation is output to the data buffer unit 4 or Output through the data access unit 1.

图3示意性示出了本发明实施例的处理器的结构示意图，如图3所示，处理器5包括运算控制单元51、矩阵乘法单元52及稀疏化处理单元53，其中，稀疏化处理单元53具体包括一个映射单元531，其中，矩阵乘法单元52也执行图1方法所示的操作，在此就不再赘述。FIG. 3 is a schematic structural diagram of a processor according to an embodiment of the present invention. As shown in FIG. 3, the processor 5 includes an operation control unit 51, a matrix multiplication unit 52, and a thinning processing unit 53, wherein the thinning processing unit 53 specifically includes a mapping unit 531, wherein the matrix multiplication unit 52 also performs the operations shown in the method of FIG. 1, and details are not described herein again.

稀疏化处理单元53包含映射单元531，映射单元531实现矩阵与稀疏序列的映射，以将变换后权值矩阵映射成“0”和“1”组成的稀疏序列，其中，“0”对应变换后权值矩阵中数值为“0”的元素，“1”对应变换后权值矩阵中数值不为0的元素；矩阵运算单元在执行矩阵乘法操作时，稀疏化单元根据稀疏序列记录的“1”，提取变换后神经元矩阵中对应位置的元素，以与变换后权值矩阵中对应元素相乘。The thinning processing unit 53 includes a mapping unit 531, and the mapping unit 531 implements mapping between the matrix and the sparse sequence to map the transformed weight matrix into a sparse sequence consisting of “0” and “1”, where “0” corresponds to the transformed An element whose value is "0" in the weight matrix, "1" corresponds to an element whose value is not 0 in the transformed weight matrix; when the matrix operation unit performs the matrix multiplication operation, the thinning unit records "1" according to the sparse sequence And extracting the elements of the corresponding positions in the transformed neuron matrix to be multiplied by the corresponding elements in the transformed weight matrix.

其中，矩阵乘法具体实现如下，稀疏序列值为0的部分对乘法无贡献，不参与运算，稀疏序列值为1的部分，通过映射单元531读取对应的权值数据，通过处理器531读取对应的神经元数据，完成乘法操作，稀疏序列完权值矩阵一行时，对乘法运算获得的值做加法运算。由于矩阵乘法中的大部分乘加运算互不影响，因此在本实施方式中，多个乘加运算是并行进行的。The matrix multiplication is specifically implemented as follows. The portion with the sparse sequence value of 0 does not contribute to the multiplication, does not participate in the operation, and the portion with the sparse sequence value of 1, reads the corresponding weight data through the mapping unit 531, and is read by the processor 531. Corresponding neuron data, the multiplication operation is completed, and when the sparse sequence completes the weight matrix, one row is added, and the value obtained by the multiplication operation is added. Due to moment Most of the multiply-and-accumulate operations in the matrix multiplication do not affect each other. Therefore, in the present embodiment, a plurality of multiply-and-accumulate operations are performed in parallel.

在神经网络实际应用中，稀疏序列离线完成，并且稀疏序列占用存储空间相对于稀疏带来的存储空间减少是非常小的，因此该过程并不影响神经网络的运算速度与存储空间。In the practical application of the neural network, the sparse sequence is completed offline, and the storage space occupied by the sparse sequence relative to the sparse storage space is very small, so the process does not affect the operation speed and storage space of the neural network.

图4示意性示出了卷积运算的示意图，如图4所示，卷积核是一个3×3的矩阵，卷积核在输入图像上滑动，其中，图中卷积核即为本发明的层权值矩阵，输入图像即为本发明的神经元矩阵。4 is a schematic diagram showing a convolution operation. As shown in FIG. 4, the convolution kernel is a 3×3 matrix, and the convolution kernel slides on the input image, wherein the convolution kernel in the figure is the present invention. The layer weight matrix, the input image is the neuron matrix of the present invention.

对于通常神经网络中采用的卷积操作，假设每次滑动一个像素点，则总共需要做4次卷积操作，每次卷积操作，卷积核与对应的数据图像数据做乘加操作。因此，对于同个输出特征图上的不同的输出神经元，所需要的输入神经元不同，而权值和连接关系是相同的。例如，在图3中，第一个卷积结果的计算过程为：1*1+1*0+1*1+0*0+1*1+1*0+0*1+0*0+1*1＝4，第二个卷积结果的计算过程为：1*1+0*1+1*0+1*0+1*1+1*0+0*1+1*0+1*1＝3，以此类推。For the convolution operation commonly used in neural networks, it is assumed that each time a pixel is slid, a total of four convolution operations are required, and each convolution operation, the convolution kernel and the corresponding data image data are multiplied and added. Therefore, for different output neurons on the same output feature map, the required input neurons are different, and the weights and connection relationships are the same. For example, in Figure 3, the calculation process of the first convolution result is: 1*1+1*0+1*1+0*0+1*1+1*0+0*1+0*0+ 1*1=4, the calculation process of the second convolution result is: 1*1+0*1+1*0+1*0+1*1+1*0+0*1+1*0+1 *1=3, and so on.

图5结合本发明实施例所描述的装置，示意性示出本发明实施例执行图4卷积运算的过程，如图5所示：FIG. 5 is a schematic diagram showing a process for performing the convolution operation of FIG. 4 according to an embodiment of the present invention, as shown in FIG. 5:

步骤S1，控制器3从存储器2读取一条指令。In step S1, the controller 3 reads an instruction from the memory 2.

步骤S2，控制器3译出微指令，数据访问单元1根据该微指令从外部地址空间读入执行卷积运算所需的数据，包括神经元矩阵d₀、权值矩阵w₀。然后，获取变换矩阵C、G，反变换矩阵A，在图4的示例中：In step S2, the controller 3 decodes the microinstruction, and the data access unit 1 reads the data required to perform the convolution operation from the external address space according to the microinstruction, including the neuron matrix d ₀ and the weight matrix w ₀ . Then, the transformation matrices C, G, and the inverse transform matrix A are obtained, in the example of FIG. 4:

步骤S3，处理器5从数据缓存单元4中读取神经元矩阵d₀和权值矩阵w₀，并对神经元矩阵d₀和权值矩阵w₀做winograd变换，即：In step S3, the processor 5 reads the neuron matrix d ₀ and the weight matrix w ₀ from the data buffer unit 4, and performs a winograd transformation on the neuron matrix d ₀ and the weight matrix w ₀ , namely:

步骤S4，稀疏化处理单元53通过映射单元531获得变换后权值矩阵w的稀疏序列，即[1110111011101100]。其中，稀疏序列的创建由映射单元完成，映射单元通过遍历权值矩阵，对于权值矩阵非零值用比特1标志，对于零值用比特0标志，最后得到一个比特序列作为稀疏序列，比特序列的长度与权值矩阵的数值个数一致。In step S4, the thinning processing unit 53 obtains the sparse sequence of the transformed weight matrix w by the mapping unit 531, that is, [1110111011101100]. The creation of the sparse sequence is performed by the mapping unit. The mapping unit traverses the weight matrix, and the non-zero value of the weight matrix is marked with a bit 1 and the zero value is marked with a bit 0, and finally a bit sequence is obtained as a sparse sequence, the bit sequence. The length is the same as the number of values of the weight matrix.

步骤S5，处理器5根据稀疏序列选择相应的神经元与权值做乘法操作，完成输入神经元与权值的矩阵对位相乘，其中根据索引序列，变换后神经元矩阵d中[d₀₃，d₁₃，d₂₃，d₃₂，d₃₃]不参与运算，最后得到运算结果，即：Step S5, the processor 5 selects the corresponding neuron and the weight according to the sparse sequence to perform a multiplication operation, and completes the matrix multiplication of the input neuron and the weight, wherein the transformed neuron matrix d is based on the index sequence [d ₀₃ , d ₁₃ , d ₂₃ , d ₃₂ , d ₃₃ ] do not participate in the operation, and finally get the operation result, namely:

步骤S6，处理器5对矩阵相乘的结果做winograd反变换操作，获得输出如下Step S6, the processor 5 performs a winograd inverse transform operation on the result of multiplying the matrix, and obtains the output as follows

本发明提供的计算机可读存储介质，例如可以是能够包含、存储、传送、传播或传输指令的任意介质。例如，可读存储介质可以包括但不限于电、磁、光、电磁、红外或半导体系统、装置、器件或传播介质。可读存储介质的具体示例包括：磁存储装置，如磁带或硬盘(HDD)；光存储装置，如光盘(CD-ROM)；存储器，如随机存取存储器(RAM)或闪存；和/或有线/无线通信链路。The computer readable storage medium provided by the present invention may be, for example, any medium capable of containing, storing, transmitting, transmitting or transmitting instructions. For example, a readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the readable storage medium include: a magnetic storage device such as a magnetic tape or a hard disk (HDD); an optical storage device such as a compact disk (CD-ROM); a memory such as a random access memory (RAM) or a flash memory; and/or a wired /Wireless communication link.

可读存储介质包括计算机可执行指令，该指令在由处理器执行时使得处理器可以执行例如上面结合图1和图5方法流程及其任何变形。The readable storage medium includes computer executable instructions that, when executed by a processor, cause the processor to perform, for example, the method flow described above in connection with FIGS. 1 and 5, and any variations thereof.

以上所述的具体实施例，对本发明的目的、技术方案和有益效果进行了进一步详细说明，所应理解的是，以上所述仅为本发明的具体实施例而已，并不用于限制本发明，凡在本发明的精神和原则之内，所做的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。 The specific embodiments of the present invention have been described in detail, and are not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and scope of the present invention are intended to be included within the scope of the present invention.

Claims

A convolution operation method of a neural network, which is used for realizing a convolution operation of a weight matrix and a neuron in a neural network by matrix multiplication, wherein the method comprises:

S1, performing a winograd transformation on the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix;

S2, performing matrix multiplication operation on the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix;

S3, performing inverse winograd transformation on the multiplication matrix to obtain an operation result.

The convolution operation method of the neural network according to claim 1, wherein the step S1 comprises:

The neuron matrix d ₀ and the weight matrix w _{0 are} subjected to winograd transformation using the following formula to obtain a transformed neuron matrix d and a transformed weight matrix w:

d=C ^T d ₀ C,w=Gw ₀ G ^T ,

Where C is the transformation matrix of the neuron matrix d ₀ , C ^T is the transposed matrix of C, G is the transformation matrix of the weight matrix w ₀ , and G ^T is the transposed matrix of G.

The convolution operation method of the neural network according to claim 2, wherein the step S3 comprises:

The multiplication matrix t is inverse-transformed by winograd using the following formula to obtain an operation result:

Output=A ^T tA,

Where A is the inverse transformation matrix and A ^T is the transposed matrix of A.

The convolution operation method of a neural network according to claim 3, wherein the values in the neuron matrix and the weight matrix are binary, and in the transformation matrix C, G and the inverse transformation matrix A The value is 2 ⁿ and n is an integer;

Wherein, the winograd transform and the winograd inverse transform are implemented by using a bit operation.

The convolution operation method of a neural network according to claim 1, wherein the transformed weight matrix is mapped into a sparse sequence consisting of "0" and "1", wherein "0" corresponds to the transformed weight An element with a value of "0" in the value matrix, and a "1" corresponding to the transformed weight matrix An element whose value is not 0;

When performing the matrix multiplication operation, elements corresponding to positions in the transformed neuron matrix are extracted according to "1" of the sparse sequence record, and multiplied by corresponding elements in the transformed weight matrix.

A convolution operation device for a neural network, which is used for realizing a convolution operation of a weight matrix and a neuron in a neural network by matrix multiplication, wherein the device comprises:

a memory for storing instructions;

a controller for decoding the instruction;

a processor, configured to execute the decoded instruction to perform:

Performing a winograd transformation on the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix;

Performing a matrix multiplication operation on the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix;

The multiplication matrix is inverse-transformed by winograd to obtain an operation result.

The apparatus for convolution operation of a neural network according to claim 6, wherein said processor comprises a matrix operation unit for performing winograd on said neuron matrix d ₀ and weight matrix w ₀ using the following formula; Transform to obtain the transformed neuron matrix d and the transformed weight matrix w:

d=C ^T d ₀ C,w=Gw ₀ G ^T ,

The convolution operation device of the neural network according to claim 7, wherein the matrix operation unit is further configured to: inversely transform the multiplication matrix t by winograd using the following formula to obtain an operation result:

Output=A ^T tA,

The convolution operation device of a neural network according to claim 8, wherein values in said neuron matrix and weight matrix are binary, and wherein said transformation matrix C, G and inverse transformation matrix A The value is 2 ⁿ and n is an integer;

The matrix operation unit implements the winograd transform and the winograd inverse transform by using a bit operation.

The apparatus for convolution operation of a neural network according to claim 6, wherein the processor further comprises a thinning processing unit for mapping the transformed weight matrix to "0" and "1" a sparse sequence composed of “0” corresponding to an element whose value is “0” in the transformed weight matrix, and “1” corresponding to an element whose value is not 0 in the transformed weight matrix;

When the matrix operation unit performs the matrix multiplication operation, the thinning unit extracts an element of a corresponding position in the transformed neuron matrix according to the “1” of the sparse sequence record, and the transformed weight The corresponding elements in the matrix are multiplied.

The convolution operation device of a neural network according to claim 6, further comprising a data access unit for acquiring a neuron matrix and a weight matrix from an external address space and providing the same to the processor.

The convolution operation device of the neural network according to claim 6, further comprising a data buffer unit for buffering data generated by said processor.

A computer readable storage medium storing instructions, wherein the instructions are executable by a processor to cause the processor to perform the method of any of claims 1-5.