
WO2018107383A1 - Neural network convolution computation method and device, and computer-readable storage medium - Google Patents


Info

Publication number
WO2018107383A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
transformed
neuron
neural network
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/109862
Other languages
French (fr)
Chinese (zh)
Inventor
陈云霁
庄毅敏
刘少礼
郭崎
陈天石
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to PCT/CN2016/109862
Publication of WO2018107383A1
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs

Definitions

  • the present invention relates to the field of artificial neural network technologies, and in particular, to a convolution operation method and apparatus for a neural network, and a computer readable storage medium.
  • Multi-layer artificial neural networks are widely used in the fields of pattern recognition, image processing, function approximation and optimization calculation.
  • In recent years, multi-layer artificial networks have received increasingly broad attention from both academia and industry because of their high recognition accuracy and good parallelism.
  • the object of the present invention is to provide a convolution operation method and device for a neural network and a computer readable storage medium, which can realize the convolution operation of a weight matrix and neurons in a neural network by matrix multiplication, thereby reducing the amount of computation required for convolution, increasing the operation speed of the neural network, and greatly improving the efficiency of data processing.
  • An aspect of the present invention provides a convolution operation method for a neural network, which is used to implement a convolution operation of a weight matrix and a neuron in a neural network by matrix multiplication, and the method includes:
  • Another aspect of the present invention provides a convolution operation device for a neural network, which is used to implement a convolution operation of a weight matrix and a neuron in a neural network by matrix multiplication, and the device includes:
  • a memory for storing instructions
  • a processor configured to execute the decoded instruction to perform:
  • the multiplication matrix is inverse-transformed by winograd to obtain the operation result.
  • Another aspect of the invention provides a computer readable storage medium storing instructions executable by a processor to cause the processor to perform the methods of the present invention.
  • the invention can turn a complex convolution operation into a sparse matrix multiplication operation, and the transform and inverse transform processes can be realized by bit operations. In this way the amount of calculation required for convolution is greatly reduced, the operation speed of the neural network is improved, and the efficiency of data processing is greatly increased; in addition, the use of sparse sequences reduces the storage space required to store network parameters and reduces the bandwidth of memory access.
  • FIG. 1 is a flow chart schematically showing a convolution operation method of a neural network according to an embodiment of the present invention.
  • FIG. 2 is a schematic block diagram showing the structure of a convolution operation device of a neural network according to an embodiment of the present invention.
  • FIG. 3 is a schematic block diagram showing the structure of a processor according to an embodiment of the present invention.
  • FIG. 4 schematically shows a convolution operation.
  • FIG. 5 is a schematic diagram showing the process of performing the convolution operation of FIG. 4 according to an embodiment of the present invention in conjunction with the apparatus described in the embodiment of the present invention.
  • the techniques of this disclosure may be implemented in the form of hardware and/or software (including firmware, microcode, etc.). Additionally, the techniques of the present disclosure may take the form of a computer program product on a computer readable medium storing instructions for use by an instruction execution system.
  • a computer readable medium can be any medium that can contain, store, communicate, propagate or transport the instructions.
  • a computer readable medium can include, but is not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, devices, or propagation media.
  • specific examples of the computer readable medium include: a magnetic storage device such as a magnetic tape or a hard disk (HDD); an optical storage device such as a compact disk (CD-ROM); a memory such as a random access memory (RAM) or a flash memory; and/or a wired/wireless communication link.
  • FIG. 1 is a flow chart schematically showing a convolution operation method of a neural network according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
  • Step 1 Perform a winograd transformation on the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix.
  • the neuron matrix d₀ and the weight matrix w₀ are subjected to a winograd transformation using the formula d = Cᵀd₀C, w = Gw₀Gᵀ, to obtain the transformed neuron matrix d and the transformed weight matrix w, where:
  • C is the transformation matrix of the neuron matrix d₀, Cᵀ is the transposed matrix of C, G is the transformation matrix of the weight matrix w₀, and Gᵀ is the transposed matrix of G.
  • the values in the neuron matrix and the weight matrix are binary, and the values of the transformation matrices C and G are powers of two (2ⁿ), such as -1, -0.5, 0, 0.5, 1, and the like.
  • the embodiment of the present invention implements the winograd transform using bit operations, realizing multiplication and division by 2 as left and right shifts. For example, when a value in the neuron matrix d₀ is multiplied by 0.5, the value is shifted to the right by one bit; when it is multiplied by -0.5, the value is shifted to the right by one bit and the sign bit is inverted. Therefore, in the embodiment of the present invention, the winograd transformation is realized by bit operations, which reduces the amount of calculation and improves the operation speed.
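As a hedged illustration of these bit operations (a minimal sketch for fixed-point integer values; not the patent's hardware implementation, and the function name is ours):

```python
def mul_pow2(x: int, n: int) -> int:
    """Multiply integer x by 2**n using shifts; a negative n divides by 2**(-n).

    Python's >> is an arithmetic (sign-preserving) shift, so negative values
    keep their sign, as a hardware right shift with sign extension does.
    """
    return x << n if n >= 0 else x >> (-n)

# x * 2 is a left shift by one bit; x * 0.5 is a right shift by one bit.
assert mul_pow2(12, 1) == 24
assert mul_pow2(12, -1) == 6
assert mul_pow2(-8, -1) == -4
```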
  • the transformation matrices C and G of the neuron matrix d₀ and the weight matrix w₀ are obtained using the winograd algorithm.
  • the winograd algorithm uses the block multiplication of the matrix to reduce the number of multiplications of the matrix multiplication. There are many different matrix blocking methods. A winograd algorithm is shown below.
  • M₅ = S₁S₅, M₆ = S₄B₂₂, M₇ = A₂₂S₈
  • T₁ = M₁ + M₂, T₂ = T₁ + M₄
  • the transformation matrix required for convolution is obtained by the above winograd algorithm; for example, for a one-dimensional convolution [d₁, d₂, d₃] * [w₁, w₂], assuming that each convolution slide is 1, the convolution can be expanded into matrix-multiplication form.
  • M₁ = (-a₁ + a₂ + a₃)b₁, M₂ = a₁b₁, M₃ = a₂b₂, M₄ = 0
  • after removing the zero-valued and unused terms: m₁ = (-a₁ + a₂ + a₃)b₁, m₂ = a₁b₁, m₃ = a₂b₂, m₄ = a₃(b₁ - b₂)
  • for a high-dimensional matrix, the convolutional transformation matrix can be obtained by multiple matrix partitionings.
  • the winograd algorithm has different matrix blocking methods.
  • for a given blocking method, the specific values and dimensions of the transformation matrices are determined by the dimension of the input neurons, the dimension of the weight matrix, and the sliding step size of each convolution operation.
  • once these three factors are determined, the values and dimensions of the transformation matrices are also determined; because in the neural network structure these three factors are set in advance, this embodiment completes the setting of each transformation matrix offline.
  • Step 2: perform a matrix multiplication operation on the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix t.
  • in an ordinary convolution, the two matrices participating in the operation may have different scales, so multiple matrix multiplication operations need to be performed by a sliding operation; in the embodiment of the present invention, the transformed neuron matrix d and the transformed weight matrix w conform to the matrix multiplication rule, so only one matrix multiplication operation is performed, which greatly reduces the amount of calculation.
  • the transformed weight matrix is mapped into a sparse sequence consisting of "0" and "1", where "0" corresponds to an element whose value is 0 in the transformed weight matrix, and "1" corresponds to an element whose value is not 0 in the transformed weight matrix.
  • when the matrix multiplication operation is performed, the elements at the positions that the sparse sequence records as "1" are extracted from the transformed neuron matrix and multiplied by the corresponding elements in the transformed weight matrix.
  • the sparse sequence corresponding to w is 1110111011101100 (read line by line).
  • the use of sparse sequences can further reduce the amount of computation of matrix multiplication operations.
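A minimal sketch of this mapping and the masked multiplication in plain Python (the function names are ours, for illustration only; the hardware units of the patent are not modeled):

```python
def sparse_sequence(w):
    """Read the transformed weight matrix row by row: '1' for non-zero, '0' for zero."""
    return ''.join('1' if v != 0 else '0' for row in w for v in row)

def masked_multiply(d, w, seq):
    """Elementwise product that skips every position the sparse sequence marks '0'."""
    cols = len(w[0])
    return [[d[i][j] * w[i][j] if seq[i * cols + j] == '1' else 0
             for j in range(cols)] for i in range(len(w))]

w = [[2, 0], [1, 3]]            # a small transformed weight matrix with one zero
seq = sparse_sequence(w)
assert seq == "1011"
assert masked_multiply([[5, 6], [7, 8]], w, seq) == [[10, 0], [7, 24]]
```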
  • step 3 the multiplication matrix is inverse-transformed by winograd to obtain an operation result.
  • the multiplication matrix t is inverse-transformed by winograd using the following formula to obtain an operation result:
  • A is the inverse transformation matrix, and Aᵀ is the transposed matrix of A.
  • the inverse transformation matrix A, like C and G, is obtained by using the winograd algorithm; the specific process is not repeated here.
  • the values of the inverse transformation matrix A are also powers of two (2ⁿ), so the operations involving its values are likewise realized by bit operations.
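The three steps of the method (winograd transform, a single elementwise multiplication, winograd inverse transform) can be sketched end to end. The concrete C, G and A matrices are not reproduced in this text, so the sketch below uses the widely published F(2×2, 3×3) Winograd matrices as an assumption; note that their entries are 0, ±0.5 and ±1, i.e. powers of two, as stated above.

```python
# Assumed F(2x2, 3x3) Winograd matrices (standard published values, not taken
# from this patent): a 4x4 neuron tile and a 3x3 kernel give a 2x2 output.
B_T = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
A_T = [[1, 1, 1, 0], [0, 1, -1, -1]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

def winograd_conv2d(d0, w0):
    d = matmul(matmul(B_T, d0), transpose(B_T))    # transformed neuron matrix
    w = matmul(matmul(G, w0), transpose(G))        # transformed weight matrix
    t = [[d[i][j] * w[i][j] for j in range(4)] for i in range(4)]  # multiplication matrix
    return matmul(matmul(A_T, t), transpose(A_T))  # inverse transform -> 2x2 result

def direct_conv2d(d0, w0):
    """Reference: the sliding-window convolution the method replaces."""
    return [[sum(d0[i + a][j + b] * w0[a][b] for a in range(3) for b in range(3))
             for j in range(2)] for i in range(2)]
```

For any 4×4 tile and 3×3 kernel the two functions agree, while the winograd path uses 16 elementwise multiplications in place of the 36 of the sliding window.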
  • FIG. 2 is a schematic structural diagram of a convolution operation device of a neural network according to an embodiment of the present invention. As shown in FIG. 2, the device includes:
  • the data access unit 1 is configured to acquire the neuron matrix and the weight matrix from an external address space and provide them to the processor 5; it can also obtain instructions from the outside and provide them to the memory 2.
  • the memory 2 is configured to read an instruction through the data access unit 1 and cache the read instruction.
  • the controller 3 is configured to read an instruction in the memory 2, decode the read instruction, obtain a micro instruction that controls the corresponding module, and send the micro instruction to the corresponding module.
  • the data buffer unit 4 is configured to store data required for data processing, and cache data during the operation.
  • the processor 5 is configured to perform the corresponding operation under the control of the controller 3; the processor 5 acquires data from the data buffer unit 4 or through the data access unit 1, and the result of the operation is output to the data buffer unit 4 or output through the data access unit 1.
  • FIG. 3 is a schematic structural diagram of a processor according to an embodiment of the present invention.
  • the processor 5 includes an operation control unit 51, a matrix multiplication unit 52, and a thinning processing unit 53; the thinning processing unit 53 includes a mapping unit 531, and the matrix multiplication unit 52 performs the operations shown in the method of FIG. 1, the details of which are not repeated here.
  • the thinning processing unit 53 includes a mapping unit 531, which implements the mapping between the matrix and the sparse sequence: it maps the transformed weight matrix into a sparse sequence consisting of "0" and "1", where "0" corresponds to an element whose value is 0 in the transformed weight matrix and "1" corresponds to an element whose value is not 0; when the matrix operation unit performs the matrix multiplication, the thinning unit extracts, according to the "1"s recorded in the sparse sequence, the elements at the corresponding positions in the transformed neuron matrix to be multiplied by the corresponding elements in the transformed weight matrix.
  • the matrix multiplication is specifically implemented as follows.
  • the positions where the sparse sequence value is 0 contribute nothing to the product and do not participate in the operation; for the positions where the sparse sequence value is 1, the corresponding weight data is read through the mapping unit 531 and multiplied by the corresponding neuron by the processor 5.
  • when the sparse sequence has traversed one row of the weight matrix, the values obtained by the multiplications are accumulated. Since most of the multiply-and-accumulate operations in a matrix multiplication do not affect each other, in the present embodiment a plurality of multiply-and-accumulate operations are performed in parallel.
  • the sparse sequence is generated offline, and the storage space it occupies is very small relative to that of the weight data, so this process does not affect the operation speed or storage space of the neural network.
  • in the convolution operation shown in FIG. 4, the convolution kernel is a 3×3 matrix that slides over the input image; the convolution kernel in the figure corresponds to the weight matrix of the present invention, and the input image corresponds to the neuron matrix of the present invention.
  • FIG. 5 is a schematic diagram showing a process for performing the convolution operation of FIG. 4 according to an embodiment of the present invention, as shown in FIG. 5:
  • step S1 the controller 3 reads an instruction from the memory 2.
  • step S2: the controller 3 decodes the instruction into microinstructions, and the data access unit 1 reads the data required to perform the convolution operation from the external address space according to the microinstructions, including the neuron matrix d₀ and the weight matrix w₀, and then obtains the transformation matrices C and G and the inverse transformation matrix A; in the example of FIG. 4:
  • step S3: the processor 5 reads the neuron matrix d₀ and the weight matrix w₀ from the data buffer unit 4 and performs a winograd transformation on them, namely:
  • step S4 the thinning processing unit 53 obtains the sparse sequence of the transformed weight matrix w by the mapping unit 531, that is, [1110111011101100].
  • the creation of the sparse sequence is performed by the mapping unit.
  • the mapping unit traverses the weight matrix, marking each non-zero value with bit 1 and each zero value with bit 0, finally obtaining a bit sequence as the sparse sequence; the length of the bit sequence is the same as the number of values in the weight matrix.
  • Step S5: the processor 5 selects the corresponding neurons and weights according to the sparse sequence to perform the multiplication operations, completing the matrix multiplication of the input neurons and the weights; the elements of the transformed neuron matrix d at the positions indexed [d₀₃, d₁₃, d₂₃, d₃₂, d₃₃] do not participate in the operation. Finally the operation result is obtained, namely:
  • Step S6: the processor 5 performs a winograd inverse transform on the multiplication matrix and obtains the output as follows.
  • the computer readable storage medium provided by the present invention may be, for example, any medium capable of containing, storing, conveying, propagating or transporting instructions.
  • a readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium.
  • specific examples of the readable storage medium include: a magnetic storage device such as a magnetic tape or a hard disk (HDD); an optical storage device such as a compact disk (CD-ROM); a memory such as a random access memory (RAM) or a flash memory; and/or a wired/wireless communication link.
  • the readable storage medium includes computer executable instructions that, when executed by a processor, cause the processor to perform, for example, the method flow described above in connection with FIGS. 1 and 5, and any variations thereof.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

A neural network convolution computation method and device, used for achieving convolution computation of a weight matrix and neurons in a neural network in a matrix multiplication manner. The method comprises: first performing winograd transform on a neuron matrix and a weight matrix to obtain a transformed neuron matrix and a transformed weight matrix (step 1); then performing a matrix multiplication operation on the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix (step 2); and finally performing winograd inverse transform on the multiplication matrix to obtain the computation result (step 3).

Description

Neural network convolution operation method, device and computer readable storage medium

Technical Field

The present invention relates to the field of artificial neural network technologies, and in particular to a convolution operation method and apparatus for a neural network, and a computer readable storage medium.

Background Art

Multi-layer artificial neural networks are widely used in the fields of pattern recognition, image processing, function approximation and optimization calculation. In recent years, owing to their high recognition accuracy and good parallelism, multi-layer artificial networks have received increasingly broad attention from both academia and industry.

In order to meet ever-higher task requirements, the scale of neural networks has become larger and larger; large convolutional neural networks now contain network structures of hundreds of layers. The problem this brings is that neural networks must perform a much larger amount of computation, especially convolutional neural networks: the large number of convolution operations reduces the operation speed of the neural network and affects its use in practical applications.

Summary of the Invention

(1) Technical Problem to Be Solved

The object of the present invention is to provide a convolution operation method and device for a neural network and a computer readable storage medium, which can realize the convolution operation of a weight matrix and neurons in a neural network by matrix multiplication, thereby reducing the amount of computation required for convolution, increasing the operation speed of the neural network, and greatly improving the efficiency of data processing.

(2) Technical Solution

One aspect of the present invention provides a convolution operation method for a neural network, used to realize the convolution operation of a weight matrix and neurons in a neural network by matrix multiplication, the method including:

S1, performing a winograd transformation on the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix;

S2, performing a matrix multiplication operation on the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix;

S3, performing a winograd inverse transformation on the multiplication matrix to obtain the operation result.

Another aspect of the present invention provides a convolution operation device for a neural network, used to realize the convolution operation of a weight matrix and neurons in a neural network by matrix multiplication, the device including:

a memory for storing instructions;

a controller for decoding the instructions;

a processor configured to execute the decoded instructions to perform:

performing a winograd transformation on the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix;

performing a matrix multiplication operation on the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix;

performing a winograd inverse transformation on the multiplication matrix to obtain the operation result.

Another aspect of the present invention provides a computer readable storage medium storing instructions executable by a processor to cause the processor to perform the method of the present invention.

(3) Beneficial Effects

The present invention can turn a complex convolution operation into a sparse matrix multiplication operation, and the transform and inverse transform processes can be realized by bit operations. In this way the amount of calculation required for convolution can be greatly reduced and the operation speed of the neural network improved, greatly increasing the efficiency of data processing; at the same time, the use of sparse sequences can reduce the storage space required to store network parameters and reduce the bandwidth of memory access.

Brief Description of the Drawings

FIG. 1 schematically shows a flow chart of a convolution operation method of a neural network according to an embodiment of the present invention.

FIG. 2 schematically shows the structure of a convolution operation device of a neural network according to an embodiment of the present invention.

FIG. 3 schematically shows the structure of a processor according to an embodiment of the present invention.

FIG. 4 schematically shows a convolution operation.

FIG. 5 schematically shows the process of performing the convolution operation of FIG. 4 with the apparatus described in an embodiment of the present invention.

Detailed Description

Other aspects, advantages, and salient features of the present invention will become apparent to those skilled in the art from the following detailed description of exemplary embodiments of the invention taken in conjunction with the accompanying drawings.

In the present invention, the terms "include" and "comprise" and their derivatives are intended to be inclusive rather than limiting; the term "or" is inclusive, meaning and/or.

In this specification, the following descriptions of various embodiments used to explain the principles of the present invention should not be construed as limiting the scope of the invention in any way. The following description, made with reference to the accompanying drawings, is intended to assist a comprehensive understanding of exemplary embodiments of the invention as defined by the claims and their equivalents. The description includes numerous specific details to assist understanding, but these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and structures are omitted for clarity and conciseness. Throughout the drawings, the same reference numerals are used for similar functions and operations.

Some block diagrams and/or flowcharts are shown in the drawings. It will be understood that some blocks of the block diagrams and/or flowcharts, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, a special purpose computer, or other programmable data processing apparatus, such that the instructions, when executed by the processor, create means for implementing the functions/operations illustrated in the block diagrams and/or flowcharts.

Thus, the techniques of the present disclosure may be implemented in the form of hardware and/or software (including firmware, microcode, etc.). Additionally, the techniques of the present disclosure may take the form of a computer program product on a computer readable medium storing instructions, usable by an instruction execution system. In the context of the present disclosure, a computer readable medium can be any medium that can contain, store, convey, propagate or transport the instructions. For example, the computer readable medium can include, but is not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, devices, or propagation media. Specific examples of the computer readable medium include: a magnetic storage device such as a magnetic tape or a hard disk (HDD); an optical storage device such as a compact disk (CD-ROM); a memory such as a random access memory (RAM) or a flash memory; and/or a wired/wireless communication link.

FIG. 1 schematically shows a flow chart of the convolution operation method of a neural network according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

Step 1: performing a winograd transformation on the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix.

In this step, the neuron matrix d₀ and the weight matrix w₀ are subjected to a winograd transformation using the following formula to obtain the transformed neuron matrix d and the transformed weight matrix w:

d = Cᵀd₀C,  w = Gw₀Gᵀ,

where C is the transformation matrix of the neuron matrix d₀, Cᵀ is the transposed matrix of C, G is the transformation matrix of the weight matrix w₀, and Gᵀ is the transposed matrix of G.

In addition, the values in the neuron matrix and the weight matrix are binary, and the values of the transformation matrices C and G are powers of two (2ⁿ), for example -1, -0.5, 0, 0.5, 1, and the like. Thus, the embodiment of the present invention implements the winograd transform using bit operations, realizing multiplication and division by 2 as left and right shifts. For example, when a value in the neuron matrix d₀ is multiplied by 0.5, the value is shifted to the right by one bit; when it is multiplied by -0.5, the value is shifted to the right by one bit and the sign bit is inverted. Therefore, in the embodiment of the present invention the winograd transformation is realized by bit operations, which reduces the amount of computation and improves the operation speed.

The transformation matrices C and G of the neuron matrix d₀ and the weight matrix w₀ are obtained using the winograd algorithm.

The winograd algorithm uses block multiplication of matrices to reduce the number of multiplications in a matrix multiplication. There are many different matrix blocking methods; one winograd algorithm is shown below.

To calculate the matrix multiplication C = AB, each matrix is partitioned into blocks:

A = [A₁₁ A₁₂; A₂₁ A₂₂],  B = [B₁₁ B₁₂; B₂₁ B₂₂],  C = [C₁₁ C₁₂; C₂₁ C₂₂]

Let

S₁ = A₂₁ + A₂₂,  S₂ = S₁ - A₁₁,  S₃ = A₁₁ - A₂₁,  S₄ = A₁₂ - S₂

S5=B12-B11,S6=B22-S5,S7=B22-B12,S8=S6-B21 S 5 =B 12 -B 11 ,S 6 =B 22 -S 5 ,S 7 =B 22 -B 12 ,S 8 =S 6 -B 21

M1=S2S6,M2=A11B11,M3=A12B21,M4=S3S7 M 1 =S 2 S 6 , M 2 =A 11 B 11 , M 3 =A 12 B 21 , M 4 =S 3 S 7

M5=S1S5,M6=S4B22,M7=A22S8 M 5 =S 1 S 5 , M 6 =S 4 B 22 , M 7 =A 22 S 8

T1=M1+M2,T2=T1+M4 T 1 = M 1 + M 2 , T 2 = T 1 + M 4

then

C11=M2+M3+M6,C12=T1+M5 C 11 =M 2 +M 3 +M 6 , C 12 =T 1 +M 5

C21=T2-M7,C22=T2+M5 C 21 =T 2 -M 7 , C 22 =T 2 +M 5
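The seven-multiplication scheme above can be checked numerically by letting each block be a scalar (a minimal sketch; the variable values are arbitrary test numbers, not from the patent):

```python
# Verify the seven-multiplication block scheme on scalar "blocks".
A11, A12, A21, A22 = 2, 3, 5, 7
B11, B12, B21, B22 = 11, 13, 17, 19

S1 = A21 + A22; S2 = S1 - A11; S3 = A11 - A21; S4 = A12 - S2
S5 = B12 - B11; S6 = B22 - S5; S7 = B22 - B12; S8 = S6 - B21

M1 = S2 * S6; M2 = A11 * B11; M3 = A12 * B21; M4 = S3 * S7
M5 = S1 * S5; M6 = S4 * B22; M7 = A22 * S8

T1 = M1 + M2; T2 = T1 + M4

C11 = M2 + M3           # = A11*B11 + A12*B21
C12 = T1 + M5 + M6      # = A11*B12 + A12*B22
C21 = T2 - M7           # = A21*B11 + A22*B21
C22 = T2 + M5           # = A21*B12 + A22*B22

# Compare against the ordinary 2x2 product.
assert C11 == A11*B11 + A12*B21
assert C12 == A11*B12 + A12*B22
assert C21 == A21*B11 + A22*B21
assert C22 == A21*B12 + A22*B22
print("block scheme matches the direct product")
```

Only seven multiplications (M1 through M7) are used in place of the eight of the ordinary block product; the additions are cheaper than the multiplication they replace.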

The transformation matrices needed for a convolution are obtained through the winograd algorithm above. For example, for the one-dimensional convolution [d1, d2, d3]*[w1, w2] with a sliding stride of 1, the convolution can be expanded into a matrix product (below, a_i denotes the neuron values d_i and b_i the weights w_i):

[Figure PCTCN2016109862-appb-000002: the convolution written as the matrix product (a1 a2; a2 a3)(b1; b2) = (output1; output2)]

Through the winograd algorithm one obtains

M1 = (-a1 + a2 + a3) b1, M2 = a1 b1, M3 = a2 b2, M4 = 0

M5 = (a2 + a3)(-b1), M6 = 0, M7 = a3 (b1 - b2)

output1 = M2 + M3 + M6, output2 = M1 + M2 + M4 - M7

Removing the zero-valued terms and the unused quantities, this can be rewritten as

m1 = (-a1 + a2 + a3) b1, m2 = a1 b1, m3 = a2 b2, m4 = a3 (b1 - b2)

output1 = m2 + m3, output2 = m1 + m2 - m4

from which the transformation matrices of the convolution are obtained

[Figure PCTCN2016109862-appb-000003: the transformation matrices of the convolution. From the m-equations these are C^T = (-1 1 1; 1 0 0; 0 1 0; 0 0 1), G = (1 0; 1 0; 0 1; 1 -1), A^T = (0 1 1 0; 1 1 0 -1)]
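The one-dimensional case above can be sketched end to end (a minimal illustration of the m-equations, with a = d and b = w; the sample numbers are arbitrary):

```python
def conv1d(d, w):
    """Direct convolution [d1,d2,d3]*[w1,w2] with stride 1, as in the matrix form above."""
    return [d[0]*w[0] + d[1]*w[1], d[1]*w[0] + d[2]*w[1]]

def winograd_f22(d, w):
    """Same result via the m-decomposition derived in the text."""
    a1, a2, a3 = d
    b1, b2 = w
    m1 = (-a1 + a2 + a3) * b1
    m2 = a1 * b1
    m3 = a2 * b2
    m4 = a3 * (b1 - b2)
    return [m2 + m3, m1 + m2 - m4]

d, w = [1, 2, 3], [4, 5]
print(conv1d(d, w), winograd_f22(d, w))  # [14, 23] [14, 23]
```

In one dimension the multiplication count is unchanged; the savings of the winograd approach appear for the larger two-dimensional tiles used later in the document.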

For higher-dimensional matrices, the convolution transformation matrices can be obtained by applying the matrix blocking repeatedly. The winograd algorithm admits different blocking schemes; for a given scheme, the specific values and dimensions of the transformation matrices are determined by the dimensions of the input neurons and of the weight matrix, and by the convolution sliding stride.

As the algorithm above shows, the specific values and dimensions of the transformation matrices are determined by the dimensions of the input neurons and the weight matrix; the concrete influencing factors are the dimension of the input neurons, the dimension of the weight matrix, and the sliding stride of each convolution operation. Once these three factors are fixed, the values and dimensions of each transformation matrix are fixed as well. Since all three factors are set in advance in a neural network architecture, this embodiment sets up the transformation matrices offline.

Step 2: multiply the transformed neuron matrix and the transformed weight matrix element by element to obtain the product matrix t:

t = w ⊙ d,

where ⊙ denotes element-wise (position-by-position) multiplication.

It should be noted that in an ordinary convolution the two matrices involved may differ in size, so the kernel must slide over the input and the multiplication must be repeated many times. In this embodiment, by contrast, the transformed neuron matrix d and the transformed weight matrix w have matching shapes, so a single multiplication pass suffices, which greatly reduces the amount of computation.

Moreover, when two matrices are multiplied, any element known to be 0 in one matrix yields 0 regardless of the corresponding element of the other matrix, so those positions need not participate in the computation at all, saving unnecessary work. The embodiment therefore maps the transformed weight matrix to a sparse sequence of "0"s and "1"s, where "0" corresponds to elements of the transformed weight matrix whose value is 0 and "1" to elements whose value is not 0. When the multiplication is performed, the "1"s recorded in the sparse sequence select the elements of the transformed neuron matrix that are fetched and multiplied with the corresponding elements of the transformed weight matrix.

For example:

[Figure PCTCN2016109862-appb-000004: an example transformed weight matrix w whose zero entries are at positions (0,3), (1,3), (2,3), (3,2) and (3,3)]

Here the sparse sequence corresponding to w is 1110111011101100 (read row by row). During the multiplication, this sequence shows that the elements [d03, d13, d23, d32, d33] of the transformed neuron matrix do not participate in the computation. The sparse sequence thus further reduces the cost of the multiplication step.
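The masked element-wise multiply can be sketched as follows (hypothetical helper names; the matrices are flattened row by row, and the stand-in values are arbitrary):

```python
def sparse_elementwise(mask: str, d_flat, w_flat):
    """Multiply only at positions whose sparse-sequence bit is '1'; zeros are skipped."""
    return [d * w if bit == "1" else 0
            for bit, d, w in zip(mask, d_flat, w_flat)]

mask = "1110111011101100"
d_flat = list(range(1, 17))                    # stand-in transformed neurons d00..d33
w_flat = [1 if b == "1" else 0 for b in mask]  # stand-in weights whose zeros match the mask
t = sparse_elementwise(mask, d_flat, w_flat)
print(t)  # zeros at flat positions 3, 7, 11, 14, 15 -> d03, d13, d23, d32, d33 skipped
```

The multiplications at masked-out positions are never issued, which is exactly the saving the sparse sequence provides.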

Step 3: apply the inverse winograd transform to the product matrix to obtain the result.

In this step, the product matrix t is inverse-transformed as follows to obtain the result output:

output = A^T t A,

where A is the inverse transformation matrix and A^T is the transpose of A.

It should be noted that, like C and G, the inverse transformation matrix A is obtained with the winograd algorithm; the derivation is not repeated here. The entries of A are likewise powers of two (of the form 2^n), so its arithmetic is also carried out with bit operations.

Figure 2 schematically shows the structure of the convolution apparatus of a neural network according to an embodiment of the invention. As shown in Figure 2, the apparatus comprises:

a data access unit 1, which fetches the neuron matrix and the weight matrix from an external address space and supplies them to the processor 5, and which can also fetch instructions from outside and deliver them to the memory 2;

a memory 2, which reads instructions through the data access unit 1 and caches them;

a controller 3, which reads instructions from the memory 2, decodes them into microinstructions that control the corresponding modules, and dispatches those microinstructions to the modules;

a data cache unit 4, which stores the data needed for processing and the intermediate data produced during the computation;

a processor 5, which performs the arithmetic operations under the control of the controller unit; the processor 5 obtains data from the data cache unit 4 or through the data access unit 1, and writes its results to the data cache unit 4 or out through the data access unit 1.

Figure 3 schematically shows the structure of the processor according to an embodiment of the invention. As shown in Figure 3, the processor 5 comprises an operation control unit 51, a matrix multiplication unit 52 and a sparsification unit 53, where the sparsification unit 53 contains a mapping unit 531. The matrix multiplication unit 52 performs the operations of the method shown in Figure 1, which are not repeated here.

The sparsification unit 53 contains the mapping unit 531, which maps the transformed weight matrix to a sparse sequence of "0"s and "1"s: "0" for elements of the transformed weight matrix whose value is 0, "1" for elements whose value is not 0. When the matrix multiplication unit performs the multiplication, the sparsification unit uses the "1"s recorded in the sparse sequence to extract the elements of the transformed neuron matrix that are multiplied with the corresponding elements of the transformed weight matrix.

Concretely, the multiplication proceeds as follows: positions whose sparse-sequence bit is 0 contribute nothing and do not participate; for positions whose bit is 1, the corresponding weight value is read through the mapping unit 531 and the corresponding neuron value is read by the processor 5, and the two are multiplied. When the sparse sequence has covered one row of the weight matrix, the products obtained are accumulated. Since most of the multiply-accumulate operations in the matrix multiplication are independent of one another, in this embodiment many multiply-accumulate operations are executed in parallel.

In practical neural network applications the sparse sequence is produced offline, and the storage it occupies is very small compared with the storage reduction that sparsity brings, so the scheme does not impair the network's operating speed or storage footprint.

Figure 4 schematically illustrates the convolution operation. As shown in Figure 4, the convolution kernel is a 3×3 matrix that slides over the input image; in the figure, the kernel corresponds to the weight matrix of the invention and the input image to the neuron matrix.

For the convolution commonly used in neural networks, sliding one pixel at a time, four convolution steps are needed in total; in each step the kernel is multiplied element by element with the image data it covers and the products are accumulated. Hence different output neurons of the same output feature map require different input neurons, while the weights and connection pattern are shared. For example, in Figure 4 the first convolution result is computed as 1*1+1*0+1*1+0*0+1*1+1*0+0*1+0*0+1*1=4, the second as 1*1+0*1+1*0+1*0+1*1+1*0+0*1+1*0+1*1=3, and so on.
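The sliding computation can be sketched as follows. The 4×4 input and 3×3 kernel below are assumptions chosen so that the first two results match the 4 and 3 computed above (the actual figure is not reproduced here):

```python
def conv2d(image, kernel):
    """Direct 2-D convolution (stride 1): slide the kernel and multiply-accumulate."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):          # slide one pixel at a time
        for j in range(ow):
            out[i][j] = sum(image[i + u][j + v] * kernel[u][v]
                            for u in range(kh) for v in range(kw))
    return out

image = [[1, 1, 1, 0],           # assumed 4x4 neuron matrix
         [0, 1, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[1, 0, 1],             # assumed 3x3 weight matrix
          [0, 1, 0],
          [1, 0, 1]]
print(conv2d(image, kernel))  # [[4, 3], [2, 4]]
```

A 4×4 input with a 3×3 kernel and stride 1 produces a 2×2 output, i.e. exactly the four convolution steps mentioned above.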

Figure 5 schematically shows, in terms of the apparatus described above, how the embodiment carries out the convolution of Figure 4. As shown in Figure 5:

Step S1: the controller 3 reads an instruction from the memory 2.

Step S2: the controller 3 decodes the instruction into microinstructions; following them, the data access unit 1 reads the data needed for the convolution from the external address space, including the neuron matrix d0 and the weight matrix w0, and then obtains the transformation matrices C and G and the inverse transformation matrix A. In the example of Figure 4:

[Figure PCTCN2016109862-appb-000005: the transformation matrices C and G and the inverse transformation matrix A used in this example]

Step S3: the processor 5 reads the neuron matrix d0 and the weight matrix w0 from the data cache unit 4 and applies the winograd transform to both, that is:

[Figure PCTCN2016109862-appb-000006: the transformed matrices d = C^T d0 C and w = G w0 G^T for this example]

Step S4: the sparsification unit 53 obtains, through the mapping unit 531, the sparse sequence of the transformed weight matrix w, namely [1110111011101100]. The sparse sequence is created by the mapping unit, which traverses the weight matrix, emits bit 1 for each nonzero value and bit 0 for each zero, and thus produces a bit sequence, of the same length as the number of elements of the weight matrix, to serve as the sparse sequence.
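The mapping unit's traversal can be sketched as follows (the function name is hypothetical, and the nonzero weight values below are illustrative; only the zero positions are taken from the example):

```python
def make_sparse_sequence(matrix):
    """Emit '1' per nonzero element and '0' per zero, traversing row by row."""
    return "".join("0" if v == 0 else "1" for row in matrix for v in row)

w = [[2, 1, 3, 0],
     [1, 4, 2, 0],
     [3, 2, 1, 0],
     [1, 2, 0, 0]]   # illustrative values; zeros placed as in the example
print(make_sparse_sequence(w))  # 1110111011101100
```

The sequence length equals the number of matrix elements, matching the description above.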

Step S5: guided by the sparse sequence, the processor 5 multiplies the selected neurons with the corresponding weights, completing the element-wise multiplication of the input neurons and the weights. According to the index sequence, the elements [d03, d13, d23, d32, d33] of the transformed neuron matrix d do not participate in the computation. This yields the product:

[Figure PCTCN2016109862-appb-000007: the product matrix t = w ⊙ d for this example]

Step S6: the processor 5 applies the inverse winograd transform to the product of the multiplication, obtaining the output:

[Figure PCTCN2016109862-appb-000008: output = A^T t A, the 2×2 result of the convolution]
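The whole pipeline of steps S3 to S6 can be sketched end to end. The transform matrices below are the standard F(2×2, 3×3) Winograd matrices from the literature, whose entries are 0, ±0.5 and ±1 as the embodiment assumes; whether they are exactly the matrices of the figures is an assumption. The 4×4 input and 3×3 kernel are the same assumed values used for the direct computation earlier:

```python
def matmul(X, Y):
    """Plain dense matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

# Standard F(2x2,3x3) matrices (assumed); entries are 0, +/-0.5, +/-1.
Ct = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G  = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
At = [[1, 1, 1, 0], [0, 1, -1, -1]]

def winograd_conv(d0, w0):
    d = matmul(matmul(Ct, d0), transpose(Ct))   # d = C^T d0 C
    w = matmul(matmul(G, w0), transpose(G))     # w = G w0 G^T
    t = [[d[i][j] * w[i][j] for j in range(4)]  # t = w (element-wise) d
         for i in range(4)]
    return matmul(matmul(At, t), transpose(At)) # output = A^T t A

d0 = [[1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]  # assumed neuron tile
w0 = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]                          # assumed 3x3 kernel
print(winograd_conv(d0, w0))  # [[4.0, 3.0], [2.0, 4.0]]
```

The result agrees with the direct sliding convolution of the same tile, and the 4×4 element-wise product replaces the four 3×3 multiply-accumulate passes of the direct method.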

The computer-readable storage medium provided by the invention may be any medium able to contain, store, convey, propagate or transmit instructions. For example, the readable storage medium may include, without limitation, electric, magnetic, optical, electromagnetic, infrared or semiconductor systems, apparatuses, devices or propagation media. Specific examples include: magnetic storage devices such as magnetic tape or a hard disk (HDD); optical storage devices such as a compact disc (CD-ROM); memory such as random access memory (RAM) or flash memory; and/or wired/wireless communication links.

The readable storage medium contains computer-executable instructions which, when executed by a processor, cause the processor to carry out, for example, the method flows described above in connection with Figures 1 and 5 and any of their variants.

The specific embodiments above describe the purpose, technical solutions and benefits of the invention in further detail. It should be understood that they are merely specific embodiments of the invention and do not limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the invention falls within its scope of protection.

Claims (13)

1. A convolution method for a neural network, for carrying out the convolution of a weight matrix with neurons in a neural network by means of matrix multiplication, the method comprising:
S1, applying a winograd transform to the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix;
S2, performing a matrix multiplication of the transformed neuron matrix and the transformed weight matrix to obtain a product matrix;
S3, applying an inverse winograd transform to the product matrix to obtain the result.

2. The convolution method for a neural network of claim 1, wherein step S1 comprises applying the winograd transform to the neuron matrix d0 and the weight matrix w0 as
d = C^T d0 C, w = G w0 G^T,
to obtain the transformed neuron matrix d and the transformed weight matrix w, where C is the transformation matrix of the neuron matrix d0, C^T the transpose of C, G the transformation matrix of the weight matrix w0, and G^T the transpose of G.

3. The convolution method for a neural network of claim 2, wherein step S3 comprises applying the inverse winograd transform to the product matrix t as
output = A^T t A,
to obtain the result output, where A is the inverse transformation matrix and A^T the transpose of A.
4. The convolution method for a neural network of claim 3, wherein the values in the neuron matrix and the weight matrix are binary, the values in the transformation matrices C and G and in the inverse transformation matrix A are of the form 2^n with n an integer, and the winograd transform and the inverse winograd transform are carried out with bit operations.

5. The convolution method for a neural network of claim 1, wherein the transformed weight matrix is mapped to a sparse sequence of "0"s and "1"s, "0" corresponding to elements of the transformed weight matrix whose value is 0 and "1" to elements whose value is not 0; and wherein, when performing the matrix multiplication, the elements of the transformed neuron matrix at the positions recorded as "1" in the sparse sequence are extracted and multiplied with the corresponding elements of the transformed weight matrix.
6. A convolution apparatus for a neural network, for carrying out the convolution of a weight matrix with neurons in a neural network by means of matrix multiplication, the apparatus comprising:
a memory for storing instructions;
a controller for decoding the instructions;
a processor for executing the decoded instructions so as to:
apply a winograd transform to the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix;
perform a matrix multiplication of the transformed neuron matrix and the transformed weight matrix to obtain a product matrix;
apply an inverse winograd transform to the product matrix to obtain the result.

7. The convolution apparatus for a neural network of claim 6, wherein the processor comprises a matrix operation unit that applies the winograd transform to the neuron matrix d0 and the weight matrix w0 as
d = C^T d0 C, w = G w0 G^T,
to obtain the transformed neuron matrix d and the transformed weight matrix w, where C is the transformation matrix of the neuron matrix d0, C^T the transpose of C, G the transformation matrix of the weight matrix w0, and G^T the transpose of G.
8. The convolution apparatus for a neural network of claim 7, wherein the matrix operation unit further applies the inverse winograd transform to the product matrix t as
output = A^T t A,
to obtain the result output, where A is the inverse transformation matrix and A^T the transpose of A.

9. The convolution apparatus for a neural network of claim 8, wherein the values in the neuron matrix and the weight matrix are binary, the values in the transformation matrices C and G and in the inverse transformation matrix A are of the form 2^n with n an integer, and the matrix operation unit carries out the winograd transform and the inverse winograd transform with bit operations.

10. The convolution apparatus for a neural network of claim 6, wherein the processor further comprises a sparsification unit that maps the transformed weight matrix to a sparse sequence of "0"s and "1"s, "0" corresponding to elements of the transformed weight matrix whose value is 0 and "1" to elements whose value is not 0; and wherein, when the matrix operation unit performs the matrix multiplication, the sparsification unit extracts, according to the "1"s recorded in the sparse sequence, the elements of the transformed neuron matrix at the corresponding positions for multiplication with the corresponding elements of the transformed weight matrix.
11. The convolution apparatus for a neural network of claim 6, further comprising a data access unit that fetches the neuron matrix and the weight matrix from an external address space and supplies them to the processor.

12. The convolution apparatus for a neural network of claim 6, further comprising a data cache unit that caches the data produced by the processor.

13. A computer-readable storage medium storing instructions executable by a processor to cause the processor to perform the method of any one of claims 1 to 5.
PCT/CN2016/109862 2016-12-14 2016-12-14 Neural network convolution computation method and device, and computer-readable storage medium Ceased WO2018107383A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/109862 WO2018107383A1 (en) 2016-12-14 2016-12-14 Neural network convolution computation method and device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/109862 WO2018107383A1 (en) 2016-12-14 2016-12-14 Neural network convolution computation method and device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2018107383A1 true WO2018107383A1 (en) 2018-06-21

Family

ID=62557732

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/109862 Ceased WO2018107383A1 (en) 2016-12-14 2016-12-14 Neural network convolution computation method and device, and computer-readable storage medium

Country Status (1)

Country Link
WO (1) WO2018107383A1 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919307A (en) * 2019-01-28 2019-06-21 广东浪潮大数据研究有限公司 FPGA and depth residual error network implementation approach, system, computer media
CN111047017A (en) * 2019-12-18 2020-04-21 北京安兔兔科技有限公司 Neural network algorithm evaluation method and device and electronic equipment
CN111126081A (en) * 2018-10-31 2020-05-08 永德利硅橡胶科技(深圳)有限公司 Global universal language terminal and method
CN111199275A (en) * 2018-11-20 2020-05-26 上海登临科技有限公司 System-on-Chip for Neural Networks
CN111210010A (en) * 2020-01-15 2020-05-29 上海眼控科技股份有限公司 Data processing method and device, computer equipment and readable storage medium
CN111260020A (en) * 2018-11-30 2020-06-09 深圳市海思半导体有限公司 Method and device for calculating convolutional neural network
CN111291317A (en) * 2020-02-26 2020-06-16 上海海事大学 A Greedy Recursive Method for Binarization of Convolutional Neural Networks with Approximate Matrix
CN111831254A (en) * 2019-04-15 2020-10-27 阿里巴巴集团控股有限公司 Image processing acceleration method, image processing model storage method and corresponding device
CN112686365A (en) * 2019-10-18 2021-04-20 华为技术有限公司 Method and device for operating neural network model and computer equipment
CN112765542A (en) * 2019-11-01 2021-05-07 中科寒武纪科技股份有限公司 Arithmetic device
CN112765539A (en) * 2019-11-01 2021-05-07 中科寒武纪科技股份有限公司 Operation device, method and related product
CN112784207A (en) * 2019-11-01 2021-05-11 中科寒武纪科技股份有限公司 Algorithms and related products
CN114254744A (en) * 2020-09-22 2022-03-29 上海阵量智能科技有限公司 Data processing apparatus and method, electronic apparatus, and storage medium
CN114692811A (en) * 2020-12-28 2022-07-01 安徽寒武纪信息科技有限公司 Device and board card for executing Winograd convolution
WO2022227024A1 (en) * 2021-04-30 2022-11-03 华为技术有限公司 Operational method and apparatus for neural network model and training method and apparatus for neural network model
CN117851744A (en) * 2024-03-07 2024-04-09 北京象帝先计算技术有限公司 Matrix operation circuit, processor, integrated circuit system, electronic component and equipment
US12412079B2 (en) 2020-05-13 2025-09-09 Samsung Electronics Co., Ltd. Z-first reference neural processing unit for mapping winograd convolution and a method thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101893686A (en) * 2010-06-11 2010-11-24 河南电力试验研究院 Device and method for online detection of circuit breaker action characteristics based on photographic digitization
CN104915322A (en) * 2015-06-09 2015-09-16 中国人民解放军国防科学技术大学 Method for accelerating convolution neutral network hardware and AXI bus IP core thereof
CN106127297A (en) * 2016-06-02 2016-11-16 中国科学院自动化研究所 The acceleration of degree of depth convolutional neural networks based on resolution of tensor and compression method
CN106203617A (en) * 2016-06-27 2016-12-07 哈尔滨工业大学深圳研究生院 A kind of acceleration processing unit based on convolutional neural networks and array structure

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101893686A (en) * 2010-06-11 2010-11-24 河南电力试验研究院 Device and method for online detection of circuit breaker action characteristics based on photographic digitization
CN104915322A (en) * 2015-06-09 2015-09-16 中国人民解放军国防科学技术大学 Method for accelerating convolution neutral network hardware and AXI bus IP core thereof
CN106127297A (en) * 2016-06-02 2016-11-16 中国科学院自动化研究所 The acceleration of degree of depth convolutional neural networks based on resolution of tensor and compression method
CN106203617A (en) * 2016-06-27 2016-12-07 哈尔滨工业大学深圳研究生院 A kind of acceleration processing unit based on convolutional neural networks and array structure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TAN, FUPING ET AL.: "A New Scheme to Divide Odd-sized Matrices for the Winograd's Algorithm", COMMUNICATION ON APPLIED MATHEMATICS AND COMPUTATION, vol. 18, no. 1, 30 June 2004 (2004-06-30) *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126081A (en) * 2018-10-31 2020-05-08 永德利硅橡胶科技(深圳)有限公司 Global universal language terminal and method
CN111126081B (en) * 2018-10-31 2023-07-21 深圳永德利科技股份有限公司 Global universal language terminal and method
CN111199275A (en) * 2018-11-20 2020-05-26 上海登临科技有限公司 System-on-Chip for Neural Networks
CN111199275B (en) * 2018-11-20 2023-04-28 上海登临科技有限公司 System-on-Chip for Neural Networks
CN111260020B (en) * 2018-11-30 2024-04-16 深圳市海思半导体有限公司 Convolutional neural network calculation method and device
CN111260020A (en) * 2018-11-30 2020-06-09 深圳市海思半导体有限公司 Method and device for calculating convolutional neural network
CN109919307B (en) * 2019-01-28 2023-04-07 广东浪潮大数据研究有限公司 FPGA (field programmable Gate array) and depth residual error network implementation method, system and computer medium
CN109919307A (en) * 2019-01-28 2019-06-21 广东浪潮大数据研究有限公司 FPGA and depth residual error network implementation approach, system, computer media
CN111831254A (en) * 2019-04-15 2020-10-27 阿里巴巴集团控股有限公司 Image processing acceleration method, image processing model storage method and corresponding device
CN112686365B (en) * 2019-10-18 2024-03-29 华为技术有限公司 Methods, devices and computer equipment for running neural network models
CN112686365A (en) * 2019-10-18 2021-04-20 华为技术有限公司 Method and device for operating neural network model and computer equipment
CN112765539B (en) * 2019-11-01 2024-02-02 中科寒武纪科技股份有限公司 Computing device, computing method and related product
CN112784207A (en) * 2019-11-01 2021-05-11 中科寒武纪科技股份有限公司 Algorithms and related products
CN112765539A (en) * 2019-11-01 2021-05-07 中科寒武纪科技股份有限公司 Operation device, method and related product
CN112765542A (en) * 2019-11-01 2021-05-07 中科寒武纪科技股份有限公司 Arithmetic device
CN112784207B (en) * 2019-11-01 2024-02-02 中科寒武纪科技股份有限公司 Calculation methods and related products
CN111047017A (en) * 2019-12-18 2020-04-21 北京安兔兔科技有限公司 Neural network algorithm evaluation method and device and electronic equipment
CN111047017B (en) * 2019-12-18 2023-06-23 北京安兔兔科技有限公司 Neural network algorithm evaluation method and device and electronic equipment
CN111210010A (en) * 2020-01-15 2020-05-29 上海眼控科技股份有限公司 Data processing method and device, computer equipment and readable storage medium
CN111291317B (en) * 2020-02-26 2023-03-24 上海海事大学 Approximate matrix convolution neural network binary greedy recursion method
CN111291317A (en) * 2020-02-26 2020-06-16 上海海事大学 A Greedy Recursive Method for Binarization of Convolutional Neural Networks with Approximate Matrix
US12412079B2 (en) 2020-05-13 2025-09-09 Samsung Electronics Co., Ltd. Z-first reference neural processing unit for mapping winograd convolution and a method thereof
CN114254744A (en) * 2020-09-22 2022-03-29 上海阵量智能科技有限公司 Data processing apparatus and method, electronic apparatus, and storage medium
CN114692811A (en) * 2020-12-28 2022-07-01 安徽寒武纪信息科技有限公司 Device and board card for executing Winograd convolution
CN116888605A (en) * 2021-04-30 2023-10-13 华为技术有限公司 Operation method, training method and device of neural network model
WO2022227024A1 (en) * 2021-04-30 2022-11-03 华为技术有限公司 Operational method and apparatus for neural network model and training method and apparatus for neural network model
CN117851744A (en) * 2024-03-07 2024-04-09 北京象帝先计算技术有限公司 Matrix operation circuit, processor, integrated circuit system, electronic component and equipment
CN117851744B (en) * 2024-03-07 2025-03-18 北京象帝先计算技术有限公司 Matrix operation circuits, processors, integrated circuit systems, electronic components and equipment

Similar Documents

Publication Publication Date Title
WO2018107383A1 (en) Neural network convolution computation method and device, and computer-readable storage medium
US20240152729A1 (en) Convolutional neural network (cnn) processing method and apparatus performing high-speed and precision convolution operations
CN107622302B (en) Superpixel Methods for Convolutional Neural Networks
EP3612947B1 (en) Processing discontiguous memory as contiguous memory to improve performance of a neural network environment
TWI834729B (en) Neural network processor and convolution operation method thereof
CN107610146B (en) Image scene segmentation method and device, electronic equipment and computer storage medium
US10650230B2 (en) Image data extraction using neural networks
CN112703511B (en) Operation accelerator and data processing method
JP7710507B2 (en) Table folding and acceleration
CN107516290B (en) Image conversion network acquisition method, device, computing device and storage medium
WO2021081854A1 (en) Convolution operation circuit and convolution operation method
CN114764615A (en) Convolution operation implementation method, data processing method and device
CN113761934B (en) Word vector representation method based on self-attention mechanism and self-attention model
WO2021037174A1 (en) Neural network model training method and apparatus
CN111353591A (en) Computing device and related product
WO2019215907A1 (en) Arithmetic processing device
TWI758223B (en) Computing method with dynamic minibatch sizes and computing system and computer-readable storage media for performing the same
WO2021036362A1 (en) Method and apparatus for processing data, and related product
WO2019136751A1 (en) Artificial intelligence parallel processing method and apparatus, computer readable storage medium, and terminal
CN111178513A (en) Convolution implementation method and device of neural network and terminal equipment
US20200118002A1 (en) Down-sampling for convolutional neural networks
CN114819076A (en) Network distillation method and device, computer equipment, and storage medium
WO2023006170A1 (en) Devices and methods for providing computationally efficient neural networks
WO2024114154A1 (en) Noise data determination model training method and apparatus, and noise data determination method and apparatus
CN113554092B (en) R²Net-based underwater fish target detection method, device and storage medium

Legal Events

Date Code Title Description
121  Ep: the epo has been informed by wipo that ep was designated in this application
     Ref document number: 16924104
     Country of ref document: EP
     Kind code of ref document: A1
NENP Non-entry into the national phase
     Ref country code: DE
122  Ep: pct application non-entry in european phase
     Ref document number: 16924104
     Country of ref document: EP
     Kind code of ref document: A1