
WO2021120036A1 - Data processing apparatus and data processing method - Google Patents

Data processing apparatus and data processing method

Info

Publication number
WO2021120036A1
Authority
WO
WIPO (PCT)
Prior art keywords
initial
convolution kernel
layer
parameter
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2019/126179
Other languages
French (fr)
Chinese (zh)
Inventor
董镇江
杨帆
李震桁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/CN2019/126179 priority Critical patent/WO2021120036A1/en
Priority to CN201980102503.0A priority patent/CN114730331B/en
Publication of WO2021120036A1 publication Critical patent/WO2021120036A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Definitions

  • FIG. 3 is a hardware structure of a chip provided by an embodiment of the application, and the chip includes a neural network processor 50.
  • the chip can be set in the client device 240 as shown in FIG. 4 to implement corresponding services.
  • the chip can also be set in the training device 220 shown in FIG. 4 to complete the training work of the training device 220 and output the target model 201.
  • the bus interface unit (BIU) 510 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 509 through the bus.
  • the parameters in the equivalent convolution kernel are obtained based on the quotient of the corresponding parameters of the initial convolution kernel of the Lth layer and the first initial parameter in the initial convolution kernel of the Lth layer; therefore, the parameters in the equivalent convolution kernel are usually K times smaller than those of the initial convolution kernel of the Lth layer.
  • the initial parameters of the L+1th layer can be enlarged by a factor of K, so that the eigenvalues output by the Lth layer are compensated.
  • the initial convolution kernel of the Lth layer is a depth separable convolution kernel
  • the L+1th layer is a normal convolutional layer
  • an implementation of obtaining the equivalent parameters of the L+1th layer according to the initial parameters of the L+1th layer of the neural network.
  • in the next clock cycle, the directly connected buffer can reuse the overlapping data that was read from the lower-level cache in the previous clock cycle, and only the data that needs to be updated is read; this can save transmission resources and improve transmission efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)
  • Image Analysis (AREA)

Abstract

Provided by the present application are a data processing apparatus and a data processing method in the field of artificial intelligence. Provided by the present application is a data processing apparatus, used to perform convolution processing, according to a two-dimensional convolution kernel, on a matrix to be convolved. The two-dimensional convolution kernel comprises a first parameter and M second parameters; the matrix to be convolved comprises a first eigenvalue and M second eigenvalues, the M second parameters corresponding one-to-one to the M second eigenvalues, and the first parameter corresponding to the first eigenvalue. The data processing apparatus comprises: M multipliers and M-1 first adders, which are used to perform a multiply-accumulate operation on the M second parameters and the M second eigenvalues to obtain a multiply-accumulate result; and a second adder, which is used to perform an addition operation on the multiply-accumulate result and the first eigenvalue to obtain a convolution result of the two-dimensional convolution kernel and the matrix to be convolved. The data processing apparatus and data processing method provided in the present application help prevent the wastage of resources and improve resource utilization.

Description

Data processing device and data processing method

Technical Field

This application relates to data computation in the field of artificial intelligence, and more specifically, to a data processing device and a data processing method.

Background Art

Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, and so on.

A neural network (NN), as an important branch of artificial intelligence, is a network structure that imitates the behavioral characteristics of animal neural networks to process information. The structure of a neural network is composed of a large number of nodes (or neurons) connected to one another, and it processes information by learning and training on the input information based on a specific operation model. A neural network includes an input layer, a hidden layer, and an output layer. The input layer is responsible for receiving input signals, the output layer is responsible for outputting the calculation results of the neural network, and the hidden layer is responsible for calculation processes such as learning and training and serves as the memory unit of the network. The memory function of the hidden layer is represented by a weight matrix, and each neuron usually corresponds to a weight coefficient.

A convolutional neural network (CNN) is a multi-layer neural network. Each layer is composed of multiple two-dimensional planes, each plane is composed of multiple independent neurons, and each feature plane may be composed of a number of rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernel.

In a convolutional neural network, when a processor performs a convolution operation, it usually converts the convolution of the feature information in the feature plane with the weights into a matrix multiplication between a signal matrix and a weight matrix. In the actual matrix multiplication, the signal matrix and the weight matrix are divided into blocks to obtain multiple fractal signal matrices and fractal weight matrices, and then matrix multiplication and accumulation operations are performed on the multiple fractal signal matrices and fractal weight matrices.
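
As an illustration of the matrix-multiplication lowering described above, the following sketch flattens each sliding window of a feature plane into one row of a signal matrix and multiplies it by the flattened weight matrix. It is a minimal Python example for clarity only; the block-wise fractal partitioning mentioned in the text is omitted, and the array names are illustrative.

```python
# Illustrative sketch (not the patent's specific scheme) of lowering a 2-D
# convolution to a matrix multiplication: each sliding window of the feature
# plane becomes one row of a signal matrix, which is multiplied by the
# flattened weight matrix.
import numpy as np

feature_plane = np.arange(16, dtype=float).reshape(4, 4)
weights = np.arange(9, dtype=float).reshape(3, 3)

rows = []
for r in range(2):                        # 2*2 output positions for a 3*3 kernel
    for c in range(2):
        rows.append(feature_plane[r:r + 3, c:c + 3].ravel())
signal_matrix = np.stack(rows)            # shape (4, 9)

conv_as_matmul = signal_matrix @ weights.ravel()   # shape (4,)
print(conv_as_matmul.reshape(2, 2))
```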

At present, when the processor convolves an input layer, each processing engine (PE) used for the convolution operation in the processor usually includes 2^n multiply-and-accumulate (MAC) units. Specifically, each PE used by the processor to perform convolution calculations includes 2^n multiplication units and 2^n-1 addition units. However, the size of the convolution kernel used in the convolution operation is generally smaller than 2^n, so that when the PE performs the convolution operation, some multiplication units and addition units run idle, which wastes processor resources.

For example, as shown in Figure 15, when a PE performing the convolution operation includes 16 MACs and the size of the convolution kernel is 3*3, at least multiplication units 9 to 15 and addition units 8, 9, 10, and 12 in the PE will run idle, resulting in a waste of processing resources.
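
The following sketch (illustrative only, not part of the patent) works through the utilization arithmetic behind this example: a PE with 2^n MAC units that applies a kernel smaller than 2^n leaves some of its multipliers unused.

```python
# Illustrative sketch: utilization of a PE with a fixed number of MAC units
# when the convolution kernel is smaller than that number.
def pe_utilization(num_mac_units: int, kernel_height: int, kernel_width: int) -> float:
    """Fraction of multipliers that do useful work for one kernel position."""
    kernel_size = kernel_height * kernel_width   # multiplications actually needed
    used = min(kernel_size, num_mac_units)
    return used / num_mac_units

# A PE with 16 MACs applying a 3*3 kernel only uses 9 of its 16 multipliers.
print(pe_utilization(16, 3, 3))  # 0.5625
```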

Summary of the Invention

The present application provides a data processing device and a data processing method, which help to avoid the waste of resources and improve the utilization rate of resources.

In a first aspect, the present application provides a data processing device configured to perform convolution processing on a matrix to be convolved according to a two-dimensional convolution kernel. The two-dimensional convolution kernel includes one first parameter and M second parameters; the matrix to be convolved includes one first eigenvalue and M second eigenvalues; the M second parameters are in one-to-one correspondence with the M second eigenvalues, and the first parameter corresponds to the first eigenvalue. The data processing device includes: M multipliers and M-1 first adders, configured to perform a multiply-accumulate operation on the M second parameters and the M second eigenvalues to obtain a multiply-accumulate result; and a second adder, configured to perform an addition operation on the multiply-accumulate result and the first eigenvalue to obtain the convolution result of the two-dimensional convolution kernel and the matrix to be convolved. Here, M is a positive integer greater than 1.

When the data processing device performs convolution processing on a matrix to be convolved of size M+1 according to a convolution kernel of size M+1, it uses M multipliers and M-1 adders to calculate the convolution result of the M parameters in the convolution kernel and the M eigenvalues in the matrix to be convolved, uses another adder to calculate the sum of that result and the (M+1)th eigenvalue in the matrix to be convolved, and finally takes this sum as the convolution result of the convolution kernel on the matrix to be convolved. As a result, when the size of the convolution kernel required for the convolution calculation is one more than the number of multiply-accumulate units in the data processing device, it is unnecessary to use multiple traditional data processing devices each containing M+1 multipliers and M adders for the calculation, thereby avoiding the waste of resources and improving resource utilization.
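
The scheme described above can be summarized with a small software sketch. It assumes, as in the text, that the two-dimensional kernel has one parameter equal to 1 (the first parameter), so that M multiplications and M-1 additions produce the multiply-accumulate result and a single extra addition contributes the remaining eigenvalue; the function and variable names are illustrative.

```python
# Illustrative software sketch of the hardware scheme described above.
def convolve_window(second_params, second_values, first_value):
    """M multiplications and M-1 additions, plus one extra addition.

    second_params : the M kernel parameters other than the first parameter
    second_values : the M eigenvalues matched one-to-one to those parameters
    first_value   : the eigenvalue matched to the first parameter (== 1)
    """
    mac_result = sum(p * x for p, x in zip(second_params, second_values))
    return mac_result + first_value   # second adder: no multiplication needed

# Example: one 3*3 window, M = 8.
params = [2.0, -1.0, 0.5, 3.0, 1.5, -0.5, 2.5, 1.0]   # 8 "second" parameters
values = [1.0,  2.0, 4.0, 0.5, 1.0,  2.0, 1.0, 3.0]   # 8 "second" eigenvalues
print(convolve_window(params, values, first_value=7.0))
```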

The matrix to be convolved may include features of an image input to the neural network, or may include speech features contained in speech data input to the neural network, or may include text features contained in text data input to the neural network, and so on.

In some possible implementations, the first parameter is equal to 1. That is, the first parameter corresponding to the first eigenvalue input to the second adder is 1. In this way, the accuracy of the convolution result calculated by the data processing device can be guaranteed.

In some possible implementations, the device further includes a processor configured to determine the two-dimensional convolution kernel according to an initial convolution kernel. The initial convolution kernel includes one first initial parameter and M second initial parameters; the M second parameters are in one-to-one correspondence with the M second initial parameters, and the first parameter corresponds to the first initial parameter. Each second parameter is equal to the quotient of the corresponding second initial parameter and the first initial parameter, and the first initial parameter is not zero.

When the data processing device in this implementation is used to calculate the convolution result of the initial convolution kernel and the matrix to be convolved, the processor may first divide all the parameters in the initial convolution kernel by one of its non-zero parameters (i.e., the first initial parameter), so that one parameter (i.e., the first parameter) in the resulting two-dimensional convolution kernel is 1. In this way, when the above M multipliers and M adders are used to calculate the convolution result of the two-dimensional convolution kernel and the matrix to be convolved, a more accurate value can be obtained.

Optionally, the processor may be further configured to determine the convolution result of the initial convolution kernel and the matrix to be convolved according to the convolution result of the two-dimensional convolution kernel and the matrix to be convolved, where the convolution result of the initial convolution kernel and the matrix to be convolved is equal to the product of the convolution result of the two-dimensional convolution kernel and the matrix to be convolved and the first initial parameter.
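
A minimal numerical sketch of these two implementations is given below: the initial kernel is divided by one of its non-zero entries K, the normalized kernel is convolved with a window, and multiplying the result by K recovers the convolution result of the initial kernel. NumPy and the concrete values are assumptions made only for the example.

```python
# Illustrative sketch: normalize an initial 3*3 kernel by one non-zero entry K,
# convolve with the normalized ("two-dimensional") kernel, then multiply the
# result by K to recover the convolution result of the initial kernel.
import numpy as np

initial_kernel = np.array([[2., 4., 6.],
                           [8., 2., 2.],
                           [4., 6., 8.]])
window = np.arange(9, dtype=float).reshape(3, 3)   # one position of the feature map

K = initial_kernel[1, 1]                 # chosen first initial parameter (non-zero)
normalized_kernel = initial_kernel / K   # one entry becomes exactly 1

result_normalized = float((normalized_kernel * window).sum())
result_initial = K * result_normalized   # equals (initial_kernel * window).sum()
assert np.isclose(result_initial, (initial_kernel * window).sum())
```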

In some possible implementations, when the quotient of at least one of the second initial parameters and the first initial parameter is greater than a first threshold, before the two-dimensional convolution kernel is determined according to the initial convolution kernel, the processor is further configured to reduce the M second parameters or the M second initial parameters by a factor of m, where the quotient of any second initial parameter reduced by a factor of m and the first initial parameter is not greater than the first threshold. Here, m may be a positive integer.

In the data processing device of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is greater than the first threshold, all the second initial parameters may first be reduced by a factor of m, so that the quotient of any reduced second initial parameter and the first initial parameter is not greater than the first threshold, and therefore the second parameter obtained by dividing the reduced second initial parameter by the first initial parameter is not greater than the first threshold. The first threshold may be less than or equal to the maximum expressible value of the processor, or in other words, less than or equal to the maximum expressible value of the above multipliers and adders in the data processing device. In this way, the calculated convolution result will not overflow the maximum expressible value range of the data processing device and cause inaccurate calculation results.

Alternatively, in the data processing device of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is greater than the first threshold, all the second parameters may be reduced by a factor of m, so that the reduced second parameters are not greater than the first threshold. The first threshold may be less than or equal to the maximum expressible value of the processor, or in other words, less than or equal to the maximum expressible value of the above multipliers and adders in the data processing device. In this way, the calculated convolution result will not overflow the maximum expressible value range of the data processing device and cause inaccurate calculation results.

In the case where the processor is further configured to reduce the M second parameters or the M second initial parameters by a factor of m, before the multiply-accumulate result and the first eigenvalue are added, the processor is further configured to reduce the first eigenvalue by a factor of m; correspondingly, the second adder is specifically configured to add the multiply-accumulate result and the reduced first eigenvalue. This is because when the second initial parameters or the second parameters are reduced by a factor of m, the first parameter should also be reduced by a factor of m, so the result of processing the first eigenvalue with the first parameter should likewise be reduced by a factor of m. The data processing device of the present application directly reduces the first eigenvalue by a factor of m to ensure the accuracy of the convolution result.
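
The following sketch illustrates the range guard and the matching adjustment of the first eigenvalue: if any quotient exceeds the first threshold, the parameters are reduced by a factor of m and the first eigenvalue is reduced by the same factor before the final addition. Choosing m as a power of two is an illustrative assumption, not something the text prescribes.

```python
# Illustrative sketch of the scaling described above: the parameters are reduced
# by a factor m until every quotient fits under the first threshold, and the
# first eigenvalue is reduced by the same factor before the final addition.
def scaled_convolution(initial_params, first_init_param, second_values,
                       first_value, first_threshold):
    quotients = [p / first_init_param for p in initial_params]
    m = 1
    while any(abs(q) / m > first_threshold for q in quotients):
        m *= 2                                   # illustrative choice of m
    second_params = [q / m for q in quotients]   # all quotients now within range
    mac_result = sum(p * x for p, x in zip(second_params, second_values))
    return mac_result + first_value / m          # first eigenvalue reduced by m

print(scaled_convolution([200., 120., 80.], 2., [1., 2., 3.], 5., first_threshold=32.))
```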

Optionally, the processor may reduce the first eigenvalue through a shift operation. Specifically, the first eigenvalue may be shifted to the left.

In some possible implementations, when the quotient of at least one of the second initial parameters and the first initial parameter is less than a second threshold, before the two-dimensional convolution kernel is determined according to the initial convolution kernel, the processor is further configured to enlarge the M second parameters or the M second initial parameters by a factor of n, where the quotient of any second initial parameter enlarged by a factor of n and the first initial parameter is not less than the second threshold. Here, n may be a positive integer.

In the data processing device of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is less than the second threshold, all the second initial parameters may first be enlarged by a factor of n, so that the quotient of any enlarged second initial parameter and the first initial parameter is not less than the second threshold, and therefore the second parameter obtained by dividing the enlarged second initial parameter by the first initial parameter is not less than the second threshold. The second threshold may be greater than or equal to the minimum expressible value of the processor, or in other words, greater than or equal to the minimum expressible value of the above multipliers and adders in the data processing device. In this way, the calculated convolution result will not fall below the minimum expressible value range of the data processing device and cause inaccurate calculation results.

Alternatively, in the data processing device of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is less than the second threshold, all the second parameters may be enlarged by a factor of n, so that the enlarged second parameters are not less than the second threshold. The second threshold may be greater than or equal to the minimum expressible value of the processor, or in other words, greater than or equal to the minimum expressible value of the above multipliers and adders in the data processing device. In this way, the calculated convolution result will not fall below the minimum expressible value range of the data processing device and cause inaccurate calculation results.

In some possible implementations, before the addition operation is performed on the multiply-accumulate result and the first eigenvalue, the processor is further configured to enlarge the first eigenvalue by a factor of n; correspondingly, the second adder is specifically configured to add the multiply-accumulate result and the enlarged first eigenvalue.

This is because when the second initial parameters or the second parameters are enlarged by a factor of n, the first parameter should also be enlarged by a factor of n, so the result of processing the first eigenvalue with the first parameter should likewise be enlarged by a factor of n. The data processing device of the present application directly enlarges the first eigenvalue by a factor of n to ensure the accuracy of the convolution result.
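
A symmetric sketch for the small-quotient case is shown below: n is increased until every quotient reaches the second threshold, and the first eigenvalue would then be enlarged by the same factor before the final addition. The power-of-two choice of n is again only an illustrative assumption.

```python
# Symmetric sketch for the small-quotient case: enlarge the parameters by n
# until every quotient is at least the second threshold; the first eigenvalue
# is then enlarged by the same factor before the final addition.
def choose_n(quotients, second_threshold):
    n = 1
    while any(abs(q) * n < second_threshold for q in quotients):
        n *= 2                                   # illustrative choice of n
    return n

quotients = [0.001, 0.003, 0.002]
n = choose_n(quotients, second_threshold=0.01)   # n = 16 here
scaled = [q * n for q in quotients]              # all entries now >= 0.01
print(n, scaled)
```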

Optionally, the processor may enlarge the first eigenvalue through a shift operation. Specifically, the first eigenvalue may be shifted to the right.

In some possible implementations, the processor includes one or a combination of the following: a central processing unit, a graphics processing unit, or a neural network processor.

In some possible implementations, M is equal to 8, and the two-dimensional convolution kernel is a 3*3 matrix. Optionally, the M multipliers and the M-1 first adders may constitute one multiply-accumulator.

In some possible implementations, M is equal to 24, and the two-dimensional convolution kernel is a 5*5 matrix. Optionally, the M multipliers and the M-1 first adders constitute three multiply-accumulators, where each multiply-accumulator includes M/3 multipliers and M/3-1 first adders.
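
The 5*5 case can be pictured with the following sketch: of the 25 kernel parameters, one is the first parameter, and the remaining 24 second parameters are split across three groups of 8 multiply-accumulate lanes whose partial sums are added together with the first eigenvalue. The grouping and names are illustrative.

```python
# Illustrative sketch of the M = 24 case: the 24 second parameters of a 5*5
# kernel (one parameter normalized to 1) are split across three 8-wide
# multiply-accumulator groups, and the partial sums are added together.
def convolve_5x5(second_params, second_values, first_value, group_width=8):
    assert len(second_params) == len(second_values) == 24
    partial_sums = []
    for g in range(0, 24, group_width):           # three groups of 8 MACs
        partial_sums.append(sum(p * x for p, x in
                                zip(second_params[g:g + group_width],
                                    second_values[g:g + group_width])))
    return sum(partial_sums) + first_value        # final addition of the first eigenvalue

params = [0.1 * i for i in range(24)]
values = [1.0] * 24
print(convolve_5x5(params, values, first_value=2.0))
```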

In some possible implementations, the two-dimensional convolution kernel is a two-dimensional matrix component of an N-dimensional convolution kernel, where N is an integer greater than 2.

In a second aspect, the present application provides a data processing method, including: obtaining an equivalent convolution kernel of the Lth layer according to an initial convolution kernel of the Lth layer of a neural network, where a parameter of the equivalent convolution kernel of the Lth layer is obtained based on the quotient of the corresponding parameter of the initial convolution kernel of the Lth layer and a first initial parameter in the initial convolution kernel of the Lth layer, the value of the first initial parameter is K, K is a non-zero number, and the equivalent convolution kernel of the Lth layer is used to perform convolution processing on the feature map of the Lth layer; obtaining the initial convolution kernel of the L+1th layer of the neural network that has a mapping relationship with the initial convolution kernel of the Lth layer; enlarging each parameter in the initial convolution kernel of the L+1th layer by a factor of K; and determining the equivalent convolution kernel of the L+1th layer according to the enlarged initial convolution kernel of the L+1th layer. Here, K may be a positive integer.

In this method, the parameters in the initial convolution kernel of the Lth layer of the neural network are processed so that, when convolution processing is performed according to the resulting equivalent convolution kernel, one multiplication operation can be saved. Thus, when the number of parameters in the initial convolution kernel is one more than the number of multipliers in the device performing the convolution calculation, there is no need to use at least one additional device and leave the other multipliers and adders in that device idle, thereby saving resources and improving resource utilization.

In addition, although the processing of the initial convolution kernel reduces the equivalent convolution kernel by a factor of K, so that the convolution result obtained by using the equivalent convolution kernel is K times smaller than the convolution result obtained by using the initial convolution kernel, this problem can be solved by enlarging the corresponding parameters in the L+1th layer of the neural network.
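
The compensation between the two layers can be checked with a small sketch: dividing the layer-L kernel by K and enlarging the layer-L+1 parameters by K leaves the composed result unchanged. For simplicity the sketch treats both layers as purely linear and ignores any activation between them, which is an assumption made only for the illustration.

```python
# Illustrative sketch of the compensation between layer L and layer L+1:
# dividing the layer-L kernel by K and multiplying the layer-L+1 parameters by K
# leaves the composed (linear) result unchanged.
import numpy as np

np.random.seed(0)
x = np.random.randn(9)       # flattened 3*3 window of the layer-L feature map
w_l = np.random.randn(9)     # layer-L initial kernel (flattened)
w_l1 = np.random.randn(1)    # layer-L+1 initial parameter mapped to this output

K = w_l[4]                   # non-zero first initial parameter of layer L
original = w_l1 * (w_l @ x)                   # initial kernels
equivalent = (K * w_l1) * ((w_l / K) @ x)     # equivalent kernels

assert np.allclose(original, equivalent)
```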

The feature map input to the Lth layer may include features of an image input to the neural network, or may include speech features contained in speech data input to the neural network, or may include text features contained in text data input to the neural network, and so on.

In some possible implementations, the equivalent convolution kernel of the Lth layer includes M second parameters and one first parameter, the M second parameters respectively correspond to M second eigenvalues of the feature map, the first parameter corresponds to a first eigenvalue of the feature map, and the first parameter is 1. The performing convolution processing on the feature map of the Lth layer includes: performing a multiply-accumulate operation on the M second parameters and the M second eigenvalues to obtain a multiply-accumulate result; and performing an addition operation on the multiply-accumulate result and the first eigenvalue.

In other words, the parameters in the equivalent convolution kernel may be obtained by dividing all the parameters in the initial convolution kernel by one non-zero first initial parameter; in this case, the first parameter in the equivalent convolution kernel corresponding to the first initial parameter is 1. In this way, when the equivalent convolution kernel is used to process the feature map, the multiply-accumulate unit is only needed to compute the multiply-accumulation of the M second parameters (other than the first parameter) with the corresponding M eigenvalues; after the multiply-accumulate result of the M second parameters and the M eigenvalues is calculated, this result is added to the first eigenvalue corresponding to the first parameter, thereby obtaining the convolution result of the equivalent convolution kernel and the feature map.

In some possible implementations, that the parameter of the equivalent convolution kernel of the Lth layer is obtained based on the quotient of the corresponding parameter of the initial convolution kernel of the Lth layer and the first initial parameter in the initial convolution kernel of the Lth layer includes: when the quotient of at least one parameter of the initial convolution kernel of the Lth layer and the first initial parameter is greater than a first threshold, reducing the corresponding parameters of the initial convolution kernel of the Lth layer by a factor of m, where the quotient of any parameter of the initial convolution kernel of the Lth layer reduced by a factor of m and the first initial parameter is not greater than the first threshold. Here, m may be a positive integer.

In the data processing device of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is greater than the first threshold, all the second initial parameters may first be reduced by a factor of m, so that the quotient of any reduced second initial parameter and the first initial parameter is not greater than the first threshold, and therefore the second parameter obtained by dividing the reduced second initial parameter by the first initial parameter is not greater than the first threshold. The first threshold may be less than or equal to the maximum expressible value of the processor, or in other words, less than or equal to the maximum expressible value of the above multipliers and adders in the data processing device. In this way, the calculated convolution result will not overflow the maximum expressible value range of the data processing device and cause inaccurate calculation results.

Optionally, the performing an addition operation on the multiply-accumulate result and the first eigenvalue includes: reducing the first eigenvalue by a factor of m; and performing an addition operation on the multiply-accumulate result and the reduced first eigenvalue.

In some possible implementations, that the parameter of the equivalent convolution kernel of the Lth layer is obtained based on the quotient of the corresponding parameter of the initial convolution kernel of the Lth layer and the first initial parameter in the initial convolution kernel of the Lth layer includes: when the quotient of at least one parameter of the initial convolution kernel of the Lth layer and the first initial parameter is less than a second threshold, enlarging the corresponding parameters of the initial convolution kernel of the Lth layer by a factor of n, where the quotient of any parameter of the initial convolution kernel of the Lth layer enlarged by a factor of n and the first initial parameter is not less than the second threshold. Here, n may be a positive integer.

In the data processing device of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is less than the second threshold, all the second initial parameters may first be enlarged by a factor of n, so that the quotient of any enlarged second initial parameter and the first initial parameter is not less than the second threshold, and therefore the second parameter obtained by dividing the enlarged second initial parameter by the first initial parameter is not less than the second threshold. The second threshold may be greater than or equal to the minimum expressible value of the processor, or in other words, greater than or equal to the minimum expressible value of the above multipliers and adders in the data processing device. In this way, the calculated convolution result will not fall below the minimum expressible value range of the data processing device and cause inaccurate calculation results.

Optionally, the performing an addition operation on the multiply-accumulate result and the first eigenvalue includes: enlarging the first eigenvalue by a factor of n; and performing an addition operation on the multiply-accumulate result and the enlarged first eigenvalue.

In a third aspect, the present application provides a data processing device, including: a processing module, configured to obtain an equivalent convolution kernel of the Lth layer according to an initial convolution kernel of the Lth layer of a neural network, where a parameter of the equivalent convolution kernel of the Lth layer is obtained based on the quotient of the corresponding parameter of the initial convolution kernel of the Lth layer and a first initial parameter in the initial convolution kernel of the Lth layer, the value of the first initial parameter is K, K is a non-zero number, and the equivalent convolution kernel of the Lth layer is used to perform convolution processing on the feature map of the Lth layer; an obtaining module, configured to obtain the initial convolution kernel of the L+1th layer of the neural network that has a mapping relationship with the initial convolution kernel of the Lth layer; an enlarging module, configured to enlarge each parameter in the initial convolution kernel of the L+1th layer by a factor of K; and a determining module, configured to determine the equivalent convolution kernel of the L+1th layer according to the enlarged initial convolution kernel of the L+1th layer. Here, K may be a positive integer.

The feature map input to the Lth layer may include features of an image input to the neural network, or may include speech features contained in speech data input to the neural network, or may include text features contained in text data input to the neural network, and so on.

In some possible implementations, the equivalent convolution kernel of the Lth layer includes M second parameters and one first parameter, the M second parameters respectively correspond to M second eigenvalues of the feature map, the first parameter corresponds to a first eigenvalue of the feature map, and the first parameter is 1. The performing convolution processing on the feature map of the Lth layer includes: performing a multiply-accumulate operation on the M second parameters and the M second eigenvalues to obtain a multiply-accumulate result; and performing an addition operation on the multiply-accumulate result and the first eigenvalue.

In some possible implementations, that the parameter of the equivalent convolution kernel of the Lth layer is obtained based on the quotient of the corresponding parameter of the initial convolution kernel of the Lth layer and the first initial parameter in the initial convolution kernel of the Lth layer includes: when the quotient of at least one parameter of the initial convolution kernel of the Lth layer and the first initial parameter is greater than a first threshold, reducing the corresponding parameters of the initial convolution kernel of the Lth layer by a factor of m, where the quotient of any parameter of the initial convolution kernel of the Lth layer reduced by a factor of m and the first initial parameter is not greater than the first threshold. Here, m may be a positive integer.

Optionally, the performing an addition operation on the multiply-accumulate result and the first eigenvalue includes: reducing the first eigenvalue by a factor of m; and performing an addition operation on the multiply-accumulate result and the reduced first eigenvalue.

In some possible implementations, that the parameter of the equivalent convolution kernel of the Lth layer is obtained based on the quotient of the corresponding parameter of the initial convolution kernel of the Lth layer and the first initial parameter in the initial convolution kernel of the Lth layer includes: when the quotient of at least one parameter of the initial convolution kernel of the Lth layer and the first initial parameter is less than a second threshold, enlarging the corresponding parameters of the initial convolution kernel of the Lth layer by a factor of n, where the quotient of any parameter of the initial convolution kernel of the Lth layer enlarged by a factor of n and the first initial parameter is not less than the second threshold. Here, n may be a positive integer.

Optionally, the performing an addition operation on the multiply-accumulate result and the first eigenvalue includes: enlarging the first eigenvalue by a factor of n; and performing an addition operation on the multiply-accumulate result and the enlarged first eigenvalue.

In a fourth aspect, the present application provides a data processing device, including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to perform the method in the second aspect.

In a fifth aspect, the present application provides a computer-readable medium that stores instructions to be executed by a device, and the instructions are used to implement the method in the second aspect.

In a sixth aspect, the present application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method in the second aspect.

In a seventh aspect, the present application provides a chip, the chip including a processor and a data interface, where the processor reads, through the data interface, instructions stored in a memory to perform the method in the second aspect.

Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to perform the method in the second aspect.

In an eighth aspect, the present application provides a computing device, the computing device including a processor and a memory, where the memory stores computer instructions, and the processor executes the computer instructions to implement the method in the second aspect.

In a ninth aspect, the present application provides a data processing device, including a programmable device and a memory, where the memory is used to store a configuration file required for the operation of the programmable device, and the programmable device is used to read the configuration file from the memory and execute it to implement the method in the second aspect.

Optionally, as an implementation, the programmable device includes a field-programmable gate array (FPGA) or a complex programmable logic device (CPLD).

Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of a convolutional neural network provided by this application.

FIG. 2 is a schematic structural diagram of another convolutional neural network provided by this application.

FIG. 3 is a schematic diagram of the hardware structure of a chip provided by this application.

FIG. 4 is a schematic structural diagram of a system architecture provided by this application.

FIG. 5 is a schematic flowchart of a data processing method provided by this application.

FIG. 6 is a schematic diagram of a method for obtaining an equivalent convolution kernel provided by this application.

FIG. 7 is a schematic diagram of a method for obtaining equivalent parameters provided by this application.

FIG. 8 is a schematic diagram of another method for obtaining equivalent parameters provided by this application.

FIG. 9 is a schematic diagram of another method for obtaining equivalent parameters provided by this application.

FIG. 10 is a schematic diagram of another method for obtaining an equivalent convolution kernel provided by this application.

FIG. 11 is a schematic diagram of another method for obtaining an equivalent convolution kernel provided by this application.

FIG. 12 is a schematic structural diagram of a data processing device provided by this application.

FIG. 13 is a schematic structural diagram of another data processing device provided by this application.

FIG. 14 is a schematic structural diagram of another data processing device provided by this application.

FIG. 15 is a schematic structural diagram of another data processing device provided by this application.

FIG. 16 is a schematic diagram of a method for reading data provided by this application.

FIG. 17 is a schematic diagram of a method for reading data provided by this application.

FIG. 18 is a schematic structural diagram of another data processing device provided by this application.

FIG. 19 is a schematic structural diagram of another data processing device provided by this application.

FIG. 20 is a schematic structural diagram of a data processing device provided by this application.

Detailed Description of Embodiments

The technical solutions in the embodiments of the present application are described below. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.

The embodiments of the present application relate to applications of neural networks. To better understand the solutions of the embodiments of the present application, related terms and concepts of neural networks that may be involved in the embodiments are first introduced below.

A convolutional neural network is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers. The feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving an input image or a convolution feature map with a trainable filter. A convolutional layer is a layer of neurons in a convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, a neuron may be connected to only some of the neurons in adjacent layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of a number of rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted is independent of position. The underlying principle is that the statistical information of one part of an image is the same as that of the other parts, which means that image information learned in one part can also be used in another part. Therefore, the same learned image information can be used for all positions in the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the greater the number of convolution kernels, the richer the image information reflected by the convolution operation.
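
As a minimal worked example of weight sharing (illustrative values only), the same 2*2 kernel below is applied at every position of a 3*3 input plane with stride 1:

```python
# Minimal worked example: one 2*2 kernel slides over a 3*3 input with stride 1;
# the same ("shared") weights are applied at every position.
input_plane = [[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

out = [[sum(kernel[i][j] * input_plane[r + i][c + j]
            for i in range(2) for j in range(2))
        for c in range(2)]
       for r in range(2)]
print(out)   # [[-4, -4], [-4, -4]]
```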

A convolution kernel can be initialized in the form of a matrix of random size, and during the training of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, a direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network while reducing the risk of overfitting.

The structure of a CNN is described in detail below with reference to Figure 1. As described in the introduction of basic concepts above, a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning architecture. A deep learning architecture refers to performing multiple levels of learning at different levels of abstraction through machine learning algorithms. As a deep learning architecture, a CNN is a feed-forward artificial neural network, and each neuron in the feed-forward artificial neural network can respond to an image input into it.

The structure of the convolutional neural network in the embodiments of the present application may be as shown in Figure 1. In Figure 1, a convolutional neural network (CNN) 300 may include an input layer 310, a convolutional layer/pooling layer 320 (the pooling layer is optional), and a neural network layer 330.

Taking image processing as an example (the operation is similar when the input data is text or speech), the input layer 310 can obtain an image to be processed and pass the obtained image to the convolutional layer/pooling layer 320 and the subsequent neural network layer 330 for processing, so as to obtain the processing result of the image.
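
A minimal sketch of this structure, written with PyTorch purely for illustration (the patent does not prescribe any framework), stacks convolutional and pooling layers followed by a fully connected neural network layer; all layer sizes are arbitrary example values.

```python
# A minimal sketch of the structure described above: input -> convolutional /
# pooling layers -> neural network (fully connected) layer -> output.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),    # convolutional layer
    nn.MaxPool2d(2),                   # optional pooling layer
    nn.Conv2d(8, 16, kernel_size=3),   # another convolutional layer
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 6 * 6, 10),         # "neural network layer" producing the result
)

image = torch.randn(1, 3, 32, 32)      # a batch with one 3-channel 32*32 image
print(model(image).shape)              # torch.Size([1, 10])
```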

The internal layer structure of the CNN 300 in Figure 1 is described in detail below.

Convolutional layer/pooling layer 320:

Convolutional layer:

As shown in Figure 1, the convolutional layer/pooling layer 320 may include, for example, layers 321 to 326. In one implementation, layer 321 is a convolutional layer, layer 322 is a pooling layer, layer 323 is a convolutional layer, layer 324 is a pooling layer, layer 325 is a convolutional layer, and layer 326 is a pooling layer; in another implementation, layers 321 and 322 are convolutional layers, layer 323 is a pooling layer, layers 324 and 325 are convolutional layers, and layer 326 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.

下面将以卷积层321为例,且以输入数据为图像为例,介绍一层卷积层的内部工作原理。输入数据为语音或文本或其他类型的数据时,卷积层的内部工作原理类似。In the following, taking the convolutional layer 321 as an example, and taking the input data as an image as an example, the internal working principle of a convolutional layer will be introduced. When the input data is voice or text or other types of data, the internal working principle of the convolutional layer is similar.

The convolutional layer 321 may include many convolution operators. A convolution operator is also called a kernel, and its role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, and this weight matrix is usually predefined. When a convolution operation is performed on an image, the weight matrix is usually moved over the input image along the horizontal direction one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to extract specific features from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension. In most cases, however, a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolutional image, where this dimension can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features from the image: for example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), so the convolutional feature maps extracted by them also have the same size, and the extracted convolutional feature maps of the same size are then combined to form the output of the convolution operation.
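
A minimal single-channel sketch of this sliding-window operation, written in Python/NumPy purely for illustration (the function name, array shapes and the single-channel simplification are assumptions of this example, not part of the application):

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Slide one 2-D kernel over a 2-D image and return the feature map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # one multiply-accumulate per kernel element
    return out
```

Each output value is the sum of element-wise products of the kernel with one image patch, which is exactly the multiply-accumulate pattern exploited later in this application.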

The weight values in these weight matrices need to be obtained through a large amount of training in practical applications. The weight matrices formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 300 makes correct predictions.

When the convolutional neural network 300 has multiple convolutional layers, the initial convolutional layers (for example, 321) often extract more general features, which may also be called low-level features. As the depth of the convolutional neural network 300 increases, the features extracted by the later convolutional layers (for example, 326) become more and more complex, such as high-level semantic features; features with higher-level semantics are more suitable for the problem to be solved.

Pooling layer:

Since the number of training parameters often needs to be reduced, a pooling layer often needs to be introduced periodically after a convolutional layer. In the layers 321 to 326 illustrated by 320 in FIG. 1, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. For example, in image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image of a smaller size. The average pooling operator computes the pixel values within a specific range of the image to produce an average value as the result of the average pooling. The maximum pooling operator takes the pixel with the largest value within a specific range as the result of the maximum pooling. In addition, just as the size of the weight matrix in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or maximum value of the corresponding sub-region of the image input to the pooling layer.
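
The pooling operation can be sketched in the same illustrative style; the window size, stride and mode argument below are chosen for the example and are not taken from the application:

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Reduce spatial size by taking the max or mean of each size x size window."""
    h, w = feature_map.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = feature_map[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out
```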

Neural network layer 330:

After processing by the convolutional layer/pooling layer 320, the convolutional neural network 300 is not yet able to output the required output information, because, as described above, the convolutional layer/pooling layer 320 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network 300 needs the neural network layer 330 to generate one output or a group of outputs whose number equals the number of required classes. Therefore, the neural network layer 330 may include multiple hidden layers (331, 332 to 33n as shown in FIG. 1) and an output layer 340. The parameters contained in the multiple hidden layers can be obtained by pre-training based on relevant training data of a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and so on.

After the multiple hidden layers in the neural network layer 330, that is, as the last layer of the entire convolutional neural network 300, there is the output layer 340. The output layer 340 has a loss function similar to the categorical cross-entropy, which is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 300 (propagation in the direction from 310 to 340 in FIG. 1) is completed, back propagation (propagation in the direction from 340 to 310 in FIG. 1) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 300 and the error between the result output by the convolutional neural network 300 through the output layer and the ideal result.

The structure of the neural network in the embodiments of this application may also be as shown in FIG. 2. In FIG. 2, a convolutional neural network (CNN) 400 may include an input layer 410, a convolutional layer/pooling layer 420 (the pooling layer is optional), and a neural network layer 430. Compared with FIG. 1, the multiple convolutional layers/pooling layers (421 to 426) in the convolutional layer/pooling layer 420 in FIG. 2 are parallel, and the features extracted by each of them are all input to the neural network layer 430 for processing. The neural network layer 430 may include multiple hidden layers, namely hidden layer 1 to hidden layer n, which may be denoted 431 to 43n.

It should be noted that the convolutional neural networks shown in FIG. 1 and FIG. 2 are merely examples of two possible convolutional neural networks in the embodiments of this application. In specific applications, the convolutional neural network in the embodiments of this application may also exist in the form of other network models.

The algorithms or operators of the layers in the convolutional neural networks shown in FIG. 1 and FIG. 2 can all be implemented in the chip shown in FIG. 3.

FIG. 3 shows a hardware structure of a chip provided by an embodiment of this application, and the chip includes a neural network processor 50. The chip can be provided in the client device 240 shown in FIG. 4 to implement corresponding services. The chip can also be provided in the training device 220 shown in FIG. 4 to complete the training work of the training device 220 and output the target model 201.

The neural network processor NPU 50 is mounted as a coprocessor on a host central processing unit (host CPU), and the host CPU assigns tasks. The core part of the NPU is the arithmetic circuit 503; the controller 504 controls the arithmetic circuit 503 to fetch data from the memory (the weight memory or the input memory) and perform operations.

In some implementations, the arithmetic circuit 503 includes multiple processing engines (process engines, PEs).

For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 502 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of the matrix A from the input memory 501 and performs a matrix operation with the matrix B, and the partial or final result of the obtained matrix C is stored in the accumulator 508.

The vector calculation unit 507 can further process the output of the arithmetic circuit, for example vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. For example, the vector calculation unit 507 can be used for network calculations of the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.

In some implementations, the vector calculation unit 507 can store the processed output vector in the unified buffer 506. For example, the vector calculation unit 507 can apply a nonlinear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 507 generates normalized values, combined values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 503, for example for use in a subsequent layer of the neural network.

The unified memory 506 is used to store input data and output data.

A direct memory access controller (DMAC) 505 transfers input data in an external memory to the input memory 501 and/or the unified memory 506, stores weight data in the external memory into the weight memory 502, and stores data in the unified memory 506 into the external memory.

A bus interface unit (BIU) 510 is used to implement interaction between the host CPU, the DMAC, and the instruction fetch memory 509 through a bus.

An instruction fetch buffer 509 connected to the controller 504 is used to store instructions used by the controller 504.

The controller 504 is used to invoke the instructions buffered in the instruction fetch memory 509 to control the working process of the operation accelerator.

Generally, the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch memory 509 are all on-chip memories, and the external memory is a memory outside the NPU. The external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.

The operations of the layers in the convolutional neural networks shown in FIG. 1 and FIG. 2 may be performed by the arithmetic circuit 503 or the vector calculation unit 507.

As shown in FIG. 4, an embodiment of this application provides a system architecture 200. In FIG. 4, a data collection device 260 is used to collect training data. Taking a target model 201 used for image processing as an example, the training data may include training images and classification results corresponding to the training images, where the results of the training images may be manually pre-labeled. The target model 201 may also be referred to as a target rule 201.

After the training data is collected, the data collection device 260 stores the training data in a database 230, and the training device 220 obtains the target model 201 by training based on the training data maintained in the database 230.

The following describes how the training device 220 obtains the target model 201 based on the training data. The training device 220 processes an input original image and compares the output image with the original image until the difference between the image output by the training device 220 and the original image is smaller than a certain threshold, thereby completing the training of the target model 201.

The target model 201 in the embodiments of this application may specifically be a neural network. It should be noted that, in actual applications, the training data maintained in the database 230 does not necessarily all come from the collection of the data collection device 260, and may also be received from other devices. It should also be noted that the training device 220 does not necessarily train the target model 201 entirely based on the training data maintained in the database 230; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be regarded as a limitation on the embodiments of this application.

The target model 201 obtained by training with the training device 220 can be applied to different systems or devices, for example to the client device 240 shown in FIG. 4. The client device 240 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server, a cloud, or the like.

The training device 220 can generate a corresponding target model 201 based on different training data for different goals, or different tasks. The corresponding target model 201 can then be used to achieve the above goal or complete the above task, so as to provide the user with the desired result.

The target model 201 obtained by training with the training device 220 may be a CNN, a deep convolutional neural network (DCNN), or the like.

It is worth noting that FIG. 4 is merely a schematic diagram of a system architecture provided by an embodiment of this application. The positional relationship between the devices, components, modules, and the like shown in the figure, the type of the training data, and the type or function of the neural network do not constitute any limitation. For example, in FIG. 4, the client device 240 and the training device 220 may be the same device.

After the training device 220 obtains the trained neural network, suppose a target task (for example, image segmentation or image recognition) is performed directly with the trained neural network, each PE in the arithmetic circuit of the neural network processor can only compute the convolution result of a convolution kernel of size M with a feature map of size M, and the number of parameters in at least one convolution kernel of the neural network is M+1. In that case, at least two PEs are needed to compute the convolution result of a convolution kernel of size M+1 with a feature map of size M+1, which leaves many multipliers and adders in the PEs idle and therefore wastes resources.

For example, when the neural network includes a depthwise separable convolutional layer whose convolution kernel has a size of 3*3*1, and one PE can only compute 8 multiply-accumulate operations, two PEs are needed to compute the convolution result of this kernel. This causes 6 multipliers and 4 adders in one PE to idle, wasting resources.

Aiming at this problem of resource waste, this application proposes a new data processing method and a new data processing apparatus.

FIG. 5 is a schematic flowchart of a data processing method according to an embodiment of this application. The method includes at least S510 to S540 and may be performed by the host CPU in FIG. 3.

S510: Obtain an equivalent convolution kernel of the Lth layer according to the initial convolution kernel of the Lth layer of the neural network, where a parameter of the equivalent convolution kernel of the Lth layer is obtained based on the quotient of the corresponding parameter of the initial convolution kernel of the Lth layer and a first initial parameter in the initial convolution kernel of the Lth layer, the value of the first initial parameter is K, K is a non-zero number, and the equivalent convolution kernel of the Lth layer is used to perform convolution processing on the feature map of the Lth layer.

The neural network in this embodiment may be any neural network that includes convolution processing. For example, the neural network in the embodiments of this application may be the convolutional neural network shown in FIG. 1 or FIG. 2.

The Lth layer in this embodiment may be any layer that includes convolution processing, for example a convolutional layer, and further, a depthwise separable convolutional layer.

The initial convolution kernel in this embodiment may be a convolution kernel obtained by training the neural network, or a convolution kernel obtained after training and optimizing the neural network.

In this embodiment, that the parameters of the equivalent convolution kernel of the Lth layer are obtained based on the quotient of the corresponding parameters of the initial convolution kernel of the Lth layer and the first initial parameter in that kernel can be understood as follows: one non-zero parameter of the initial convolution kernel of the Lth layer is called the first initial parameter; all parameters of the initial convolution kernel are divided by this first initial parameter to obtain the quotient of each parameter and the first initial parameter; and the equivalent convolution kernel used to perform convolution processing on the feature map of the Lth layer is determined according to these quotients. That is, after the equivalent convolution kernel is obtained, when the Lth layer of the neural network is subsequently used to process the feature map, the initial convolution kernel is no longer used; the equivalent convolution kernel is used instead.

The feature map input to the Lth layer may include features of an image input to the neural network, or speech features contained in speech data input to the neural network, or text features contained in text data input to the neural network, and so on.

S520: Obtain the initial convolution kernel of the (L+1)th layer of the neural network that has a mapping relationship with the initial convolution kernel of the Lth layer.

That is, in the first layer after the Lth layer, the initial parameters that have a mapping relationship with the parameters of the aforementioned initial convolution kernel of the Lth layer are obtained.

The initial parameters of the (L+1)th layer may be parameters of the (L+1)th layer obtained by training the neural network, or parameters of the (L+1)th layer obtained after training and optimizing the neural network.

The initial parameters of the (L+1)th layer that have a mapping relationship with the parameters of the aforementioned initial convolution kernel of the Lth layer can be understood as follows: if the aforementioned initial convolution kernel is used in the Lth layer of the neural network to process the feature map and output the feature values obtained by the convolution, then after these feature values are input to the (L+1)th layer, the parameters of the (L+1)th layer that would originally be applied to process these feature values are the initial parameters of the (L+1)th layer that have a mapping relationship with the parameters of the initial convolution kernel of the Lth layer.

The (L+1)th layer may be a depthwise separable convolutional layer, an ordinary convolutional layer, a fully connected layer, or the like, where an ordinary convolutional layer may also be referred to as a conventional convolutional layer or a standard convolutional layer. For different types of (L+1)th layers, the mapping relationship between their initial parameters and the initial parameters of the convolution kernel of the Lth layer is different.

Examples of the mapping relationship between the initial parameters of the (L+1)th layer of the neural network and the initial parameters of the initial convolution kernel of the Lth layer are described later with reference to FIG. 7, FIG. 8, and FIG. 9.

S530: Expand each parameter of the initial convolution kernel of the (L+1)th layer by a factor of K.

Because in S510 the parameters of the equivalent convolution kernel of the Lth layer are obtained based on the quotient of the corresponding parameters of the initial convolution kernel of the Lth layer and the first initial parameter, the parameters of the equivalent convolution kernel are, in general, reduced by a factor of K relative to the initial convolution kernel of the Lth layer. To ensure the accuracy of the result of processing the input data by the neural network, the initial parameters of the (L+1)th layer can be expanded by a factor of K, so that the feature values output by the Lth layer are compensated.

S540: Determine the equivalent convolution kernel of the (L+1)th layer according to the expanded initial convolution kernel of the (L+1)th layer.

In this embodiment, the equivalent convolution kernel of the (L+1)th layer is used to process the feature values output by the Lth layer, or in other words, to process the feature values input to the (L+1)th layer. That is, when the neural network is used for service processing in an application scenario, for example image classification, image segmentation, or image recognition, the initial convolution kernel of the (L+1)th layer is no longer used; the equivalent convolution kernel of the (L+1)th layer is used instead.

In the data processing method of this embodiment, the initial convolution kernel of the Lth layer is processed so that, when the resulting equivalent convolution kernel is used to perform convolution processing on the feature map input to the Lth layer, the convolution operation between the equivalent convolution kernel and the feature map no longer consists entirely of multiply-accumulate operations. Instead, it can be decomposed into multiply-accumulate operations between some of the convolution parameters and the corresponding feature values, plus a non-multiply-accumulate operation between the multiply-accumulate result and the feature value that requires no parameter multiplication. In this way, when the Lth layer performs convolution processing on the feature map, if the number of parameters in the convolution kernel of the Lth layer is one more than the number of multiply-accumulate operations that the arithmetic circuit can compute at a time, an additional arithmetic circuit is no longer needed, and the waste of resources caused by most of the multipliers and adders of that additional circuit idling is avoided.

In addition, when the equivalent convolution kernel is obtained from the initial convolution kernel of the Lth layer, the difference between the equivalent convolution kernel and the initial convolution kernel caused by this processing can be compensated through the difference between the equivalent convolution kernel and the initial convolution kernel of the (L+1)th layer, which ensures the accuracy of the neural network's processing of the input data.
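
Expressed informally, and assuming a positive common factor K, no bias terms, and either no activation or a positively homogeneous activation (such as ReLU) between the two layers, the compensation works because the scale factors cancel:

$$\bigl(K\,W_{L+1}\bigr) * \Bigl(\tfrac{W_L}{K} * x\Bigr) \;=\; W_{L+1} * \bigl(W_L * x\bigr), \qquad K \neq 0,$$

where $*$ denotes the convolution performed by each layer. This identity is only a simplified reading of the scheme; the application itself handles the zero-factor and range-limit cases separately, as described below.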

In this embodiment, as a possible implementation, obtaining the equivalent convolution kernel of the Lth layer according to the initial convolution kernel of the Lth layer may include the following steps: one parameter of the initial convolution kernel of the Lth layer is determined as the first initial parameter, and when the first initial parameter is not zero, all parameters of the initial convolution kernel are divided by the first initial parameter, and the convolution kernel formed by the resulting quotients is the equivalent convolution kernel. If the first initial parameter is 0, the initial convolution kernel can be used directly as the equivalent convolution kernel.

In this implementation, dividing the first initial parameter of the initial convolution kernel of the Lth layer by itself yields 1, that is, one parameter of the equivalent convolution kernel has the value 1; this parameter with value 1 is called the first parameter.

In this embodiment, assuming that the initial convolution kernel of the Lth layer includes one first initial parameter and M second initial parameters, the correspondingly obtained equivalent convolution kernel of the Lth layer includes M second parameters and one first parameter, where the M second parameters may respectively correspond to the M second feature values of the feature map to be processed, and the first parameter may correspond to the first feature value of the feature map to be processed.

In some implementations of this embodiment, obtaining the parameters of the equivalent convolution kernel of the Lth layer based on the quotient of the corresponding parameters of the initial convolution kernel of the Lth layer and the first initial parameter in that kernel may include the following step: when the quotient of at least one parameter of the initial convolution kernel of the Lth layer and the first initial parameter is greater than a first threshold, all parameters of the initial convolution kernel of the Lth layer may first be reduced by a factor of m, and the reduced parameters are then divided by the first initial parameter, where the quotient of any parameter of the initial convolution kernel of the Lth layer reduced by a factor of m and the first initial parameter should not be greater than the first threshold.

The first threshold may be the maximum expressible value of the apparatus used when performing convolution processing on the feature map based on the equivalent convolution kernel. For example, when that apparatus is the neural network processor 50 shown in FIG. 3, the first threshold may be less than or equal to the maximum expressible value of the memories and arithmetic units in it. This prevents data overflow, thereby improving the accuracy of the service processing performed by the neural network.

In some possible implementations, obtaining the parameters of the equivalent convolution kernel of the Lth layer based on the quotient of the parameters of the initial convolution kernel of the Lth layer and the first initial parameter in that kernel may include the following step: when the quotient of at least one parameter of the initial convolution kernel of the Lth layer and the first initial parameter is less than a second threshold, all parameters of the initial convolution kernel of the Lth layer may first be expanded by a factor of n, and the expanded parameters are then divided by the first initial parameter to obtain the equivalent convolution kernel, where the quotient of any parameter of the initial convolution kernel of the Lth layer expanded by a factor of n and the first initial parameter should not be less than the second threshold.

The second threshold may be the minimum expressible value of the apparatus used when performing convolution processing on the feature map based on the equivalent convolution kernel. For example, when that apparatus is the neural network processor 50 shown in FIG. 3, the second threshold may be greater than or equal to the minimum expressible value of the memories and arithmetic units in it. This prevents values from falling outside the expressible range, thereby improving the accuracy of the service processing performed by the neural network.

In the following, taking the case where the Lth layer is a depthwise separable convolutional layer and the initial convolution kernel of the Lth layer is a 3*3 convolution kernel as an example, an implementation of obtaining the equivalent convolution kernel of the Lth layer according to the initial convolution kernel of the Lth layer of the neural network is described.

In this implementation, any one parameter of the 3*3 convolution kernel of the Lth layer is taken as the first initial parameter (which may also be called the common factor). When the first initial parameter is non-zero, all nine parameters of the convolution kernel (that is, the eight second initial parameters and the one first initial parameter) are divided by the absolute value of the first initial parameter; when the first initial parameter is 0, all parameters remain unchanged. This ensures that the resulting equivalent convolution kernel contains eight normal second parameters and one parameter whose value is 1, 0 or −1. In this way, the convolution operation of the Lth layer changes from 9 MAC operations into the addition of 8 MAC results (that is, the 8 second parameters multiplied by the 8 second feature values) and a constant (that is, the first feature value). Based on this implementation, one adder can be added to an existing PE capable of processing 8 MACs, so that a 3*3 convolution kernel can be processed in every clock cycle, allowing the new PE to work at 100% efficiency during convolution processing and substantially improving hardware performance.

As shown in FIG. 6, when the initial convolution kernel includes nine initial parameters w0, w1, w2, w3, w4, w5, w6, w7 and w8, w4 is selected as the first initial parameter. When w4 is greater than 0, all nine initial parameters are divided by w4 to obtain the equivalent convolution kernel; in this case w4 is called the common factor. When w4 is equal to 0, no additional processing is performed on the nine initial parameters, that is, the equivalent convolution kernel is the same as the initial convolution kernel; in this case the common factor is 0. When w4 is less than 0, the nine initial parameters are divided by −w4 to obtain the equivalent convolution kernel; in this case −w4 is called the common factor.
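
A small sketch of this step, assuming the centre element (w4) is chosen as the first initial parameter; the helper name and the use of NumPy are illustrative only:

```python
import numpy as np

def make_equivalent_kernel(initial_kernel, idx=(1, 1)):
    """Divide a kernel by the absolute value of its chosen first parameter.

    Returns (equivalent_kernel, common_factor). If the chosen parameter is 0,
    the kernel is returned unchanged and the common factor is 0.
    """
    w = initial_kernel[idx]           # first initial parameter, e.g. w4 in FIG. 6
    if w == 0:
        return initial_kernel.copy(), 0.0
    factor = abs(w)                   # common factor, kept positive as in FIG. 6
    return initial_kernel / factor, factor
```

After this step the parameter at the chosen position is 1, 0 or −1, so its multiplication can be replaced by an addition (or subtraction) of the corresponding feature value.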

Optionally, when w4 is less than 0, the nine initial parameters may be divided by w4 instead of −w4.

The equivalent convolution kernel obtained from the initial convolution kernel is used to perform convolution processing on the feature values input to the Lth layer, and the convolution result serves as the input of the (L+1)th layer.

The following describes an implementation of obtaining the equivalent parameters of the (L+1)th layer according to the initial parameters of the (L+1)th layer of the neural network when the initial convolution kernel of the Lth layer is the convolution kernel shown in FIG. 6 and the (L+1)th layer is also a depthwise separable convolutional layer.

When the (L+1)th layer is a depthwise separable convolutional layer, if the common factor of the initial convolution kernel of the Lth layer is 0, no additional processing needs to be performed on the initial convolution kernel of the (L+1)th layer, that is, the initial convolution kernel of the (L+1)th layer is the equivalent convolution kernel. Otherwise, the initial convolution kernel of the (L+1)th layer can be multiplied by the common factor of the initial convolution kernel of the Lth layer, and an operation similar to that in FIG. 6 is then performed. Here, the initial convolution kernel of the (L+1)th layer is the convolution kernel of the (L+1)th layer that corresponds to the initial convolution kernel of the Lth layer. An example of the initial convolution kernel and the equivalent convolution kernel of the (L+1)th layer is shown in FIG. 7.

As can be seen from FIG. 7, a second parameter of the equivalent convolution kernel of the (L+1)th layer is equal to the product of the corresponding second initial parameter of the initial convolution kernel of the (L+1)th layer and the first initial parameter of the Lth layer, divided by the first initial parameter of the initial convolution kernel of the (L+1)th layer.
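
A possible sketch of this compensation for a depthwise separable (L+1)th layer, reusing the make_equivalent_kernel helper from the previous example. It follows the verbal description accompanying FIG. 7 (scale by the layer-L common factor, then renormalize in the FIG. 6 manner); the exact divisor used for renormalization is an assumption of this sketch:

```python
def compensate_depthwise(next_kernel, common_factor_L, idx=(1, 1)):
    """Scale the (L+1)th-layer depthwise kernel by the common factor of layer L,
    then renormalize it as in FIG. 6 so its own chosen parameter is again
    1, 0 or -1."""
    if common_factor_L == 0:
        scaled = next_kernel.copy()     # layer L kernel was left unchanged
    else:
        scaled = next_kernel * common_factor_L
    return make_equivalent_kernel(scaled, idx)
```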

The following describes an implementation of obtaining the equivalent parameters of the (L+1)th layer according to the initial parameters of the (L+1)th layer of the neural network when the initial convolution kernel of the Lth layer is a depthwise separable convolution kernel and the (L+1)th layer is an ordinary convolutional layer.

As shown in FIG. 8, the Lth layer has 16 input channels, numbered 0 to 15, and these 16 input channels correspond one-to-one to 16 initial convolution kernels whose common factors are w0 to w15. When the equivalent parameters of the (L+1)th layer are obtained according to the initial parameters of the (L+1)th layer of the neural network, the convolution kernel slice of each input channel of the (L+1)th layer can be multiplied by the corresponding common factor among w0 to w15, so as to obtain the equivalent parameters of the (L+1)th layer.
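
A sketch of this per-input-channel compensation for an ordinary convolutional (L+1)th layer; the (out_channels, in_channels, kh, kw) weight layout is an assumption of this example, not something specified by the application:

```python
import numpy as np

def compensate_standard_conv(next_weights, common_factors):
    """Multiply the kernel slice of each input channel by that channel's common
    factor (w0 .. w15 in FIG. 8); channels whose common factor is 0 are left as is."""
    out = next_weights.copy()
    for c, factor in enumerate(common_factors):
        if factor != 0:
            out[:, c, :, :] *= factor
    return out
```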

The following describes an implementation of obtaining the equivalent parameters of the (L+1)th layer according to the initial parameters of the (L+1)th layer of the neural network when the initial convolution kernel of the Lth layer is a depthwise separable convolution kernel and the (L+1)th layer is a fully connected layer.

As shown in FIG. 9, when the (L+1)th layer is a fully connected layer, the position of each feature value of the feature map input to the (L+1)th layer after the feature map is converted into a one-dimensional vector is first determined. Based on the property that the one-dimensional vector of the feature map and the parameters of the fully connected layer have the same length and correspond position by position, it is identified which initial convolution kernel of the Lth layer each parameter of the (L+1)th layer corresponds to, and each parameter is multiplied by the common factor of the corresponding initial convolution kernel, where the common factor is not zero.
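
A sketch of the fully connected case; the channel_of_position mapping, which records which depthwise channel produced each position of the flattened feature vector, is an assumed bookkeeping structure for this example:

```python
import numpy as np

def compensate_fc(fc_weights, channel_of_position, common_factors):
    """fc_weights has shape (out_features, in_features); position i of the
    flattened feature vector came from depthwise channel channel_of_position[i]."""
    out = fc_weights.copy()
    for i, ch in enumerate(channel_of_position):
        factor = common_factors[ch]
        if factor != 0:
            out[:, i] *= factor
    return out
```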

In the data processing method of this application, all or some of the steps in FIG. 5 may be performed for all the depthwise separable convolutional layers in the neural network, or all or some of the steps in FIG. 5 may be performed for only some of the depthwise separable convolutional layers in the neural network.

The following takes as an example the case where the first threshold is the maximum expressible value of the built-in hardware data format of the apparatus performing the convolution processing, the quotient of one of the second initial parameters of the initial convolution kernel of the Lth layer and the first initial parameter exceeds the maximum expressible range, and the maximum expressible value is 2 to the 16th power, to describe how to process the initial convolution kernel of the Lth layer to obtain an equivalent convolution kernel that does not exceed the expressible range.

As shown on the left of FIG. 10, when the first initial parameter in the initial convolution kernel of the Lth layer is 0.001 and one of the second initial parameters is 64000, directly dividing 64000 by 0.001 would produce a quotient greater than the maximum value. In this case, as shown in the middle of FIG. 10, all the initial parameters can first be divided by 2 to the 10th power (that is, multiplied by 2 to the power of −10) and then divided by 0.001, to obtain the equivalent convolution kernel shown on the right of FIG. 10. Here, 2 to the power of −10 is the power of 2 closest to and smaller than the minimum expressible value.

The following takes as an example the case where the second threshold is the minimum expressible value of the built-in hardware data format of the apparatus performing the convolution processing, the quotient of one of the second initial parameters of the initial convolution kernel of the Lth layer and the first initial parameter falls below the minimum expressible range, and the minimum expressible value is 2 to the power of −16, to describe how to process the initial convolution kernel of the Lth layer to obtain an equivalent convolution kernel that does not exceed the expressible range.

As shown on the left of FIG. 11, when the first initial parameter in the initial convolution kernel of the Lth layer is 64000 and one of the second initial parameters is 0.001, directly dividing 0.001 by 64000 would produce a quotient smaller than the minimum value. In this case, as shown in the middle of FIG. 11, all the initial parameters can first be multiplied by 2 to the 16th power and then divided by 64000, to obtain the equivalent convolution kernel shown on the right of FIG. 11. Here, 2 to the 16th power is the power of 2 closest to and not less than the maximum expressible value.
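
A sketch of the power-of-two pre-scaling used in both FIG. 10 and FIG. 11; choosing the shift (for example −10 or +16 above) so that every quotient stays inside the representable range is assumed to be done offline before the kernel is deployed:

```python
def rescale_then_divide(kernel, first_param, shift):
    """Pre-scale the kernel by 2**shift before dividing by the first parameter.

    shift = -10 reproduces the FIG. 10 example (scale down to avoid overflow);
    shift = +16 reproduces the FIG. 11 example (scale up to avoid underflow)."""
    return (kernel * (2.0 ** shift)) / first_param
```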

The following describes the data processing apparatus provided in this application. The data processing apparatus provided in this application can be used to perform convolution processing on a matrix to be convolved according to a two-dimensional convolution kernel, where the two-dimensional convolution kernel includes one first parameter and M second parameters, the matrix to be convolved includes one first feature value and M second feature values, the M second parameters are in one-to-one correspondence with the M second feature values, and the first parameter corresponds to the first feature value.

FIG. 12 is a schematic structural diagram of the data processing apparatus of this application. The data processing apparatus includes M multipliers, M−1 first adders, and one second adder; the M multipliers are Mul 0 to Mul M−1, the M−1 first adders are ADD 0 to ADD M−2, and the second adder is ADD M−1. M is an even number.

The M multipliers and the M−1 first adders are used to perform a multiply-accumulate operation on the M second parameters and the M second feature values to obtain a multiply-accumulate result. The second adder is used to add the multiply-accumulate result and the first feature value to obtain the convolution result of the two-dimensional convolution kernel and the matrix to be convolved.

When performing convolution processing on a matrix to be convolved of size M+1 according to a convolution kernel of size M+1, the data processing apparatus uses the M multipliers and the M−1 adders to calculate the convolution result of the M parameters of the convolution kernel and the M feature values of the matrix to be convolved, uses the other adder to calculate the sum of that result and the (M+1)th feature value of the matrix to be convolved, and finally takes this sum as the convolution result of the convolution kernel and the matrix to be convolved. As a result, when the size of the convolution kernel to be computed is one larger than the number of multiply-accumulate units in the data processing apparatus, there is no need to use multiple conventional data processing apparatuses containing M+1 multipliers and M adders for the calculation, which avoids wasting resources and improves resource utilization.
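
A functional model of the datapath in FIG. 12, with a sequential Python loop standing in for the parallel multiplier/adder tree of the hardware:

```python
def pe_convolve(second_params, second_values, first_value):
    """M multipliers and M-1 first adders produce the multiply-accumulate result
    of the M weighted taps; the second adder (ADD M-1) then adds the feature
    value whose corresponding parameter is 1."""
    assert len(second_params) == len(second_values)
    mac = 0.0
    for w, x in zip(second_params, second_values):
        mac += w * x
    return mac + first_value
```

If the result for the original (unscaled) kernel is needed, it can be recovered afterwards by multiplying this output by the first initial parameter, as described below.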

For example, the data processing apparatus shown in FIG. 12 can be used to perform convolution processing on the matrix of feature values to be convolved that is input to the Lth layer, based on the equivalent convolution kernel of the Lth layer in S510.

The data processing apparatus shown in FIG. 12 may be a component of the arithmetic circuit 503 in the neural network processor 50 in FIG. 3.

In some possible implementations, the first parameter is equal to 1. That is, the first parameter corresponding to the first feature value input to the second adder is 1. In this way, the accuracy of the convolution result calculated by the data processing apparatus can be guaranteed.

In some possible implementations, the apparatus further includes a processor, and the processor is configured to determine the two-dimensional convolution kernel according to an initial convolution kernel, where the initial convolution kernel includes one first initial parameter and M second initial parameters, the M second parameters are in one-to-one correspondence with the M second initial parameters, the first parameter corresponds to the first initial parameter, each second parameter is equal to the quotient of its corresponding second initial parameter and the first initial parameter, and the first initial parameter is not zero. In other words, the processor can be used to perform S510.

When the data processing apparatus in this implementation is used to calculate the convolution result of the initial convolution kernel and the matrix to be convolved, the processor may first divide all parameters of the initial convolution kernel by one non-zero parameter among them (namely the first initial parameter), so that one parameter of the obtained two-dimensional convolution kernel (namely the first parameter) is 1. In this way, when the above M multipliers and M adders are used to calculate the convolution result of the two-dimensional convolution kernel and the matrix to be convolved, a more accurate value can be obtained.

Optionally, the processor may further be configured to determine the convolution result of the initial convolution kernel and the matrix to be convolved according to the convolution result of the two-dimensional convolution kernel and the matrix to be convolved, where the convolution result of the initial convolution kernel and the matrix to be convolved is equal to the product of the convolution result of the two-dimensional convolution kernel with the matrix to be convolved and the first initial parameter.

In some possible implementations, when the quotient of at least one of the second initial parameters and the first initial parameter is greater than a first threshold, before the two-dimensional convolution kernel is determined according to the initial convolution kernel, the processor is further configured to reduce the M second parameters or the M second initial parameters by a factor of m, where the quotient of any second initial parameter reduced by a factor of m and the first initial parameter is not greater than the first threshold.

In the data processing apparatus of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is greater than the first threshold, all the second initial parameters may first be reduced by a factor of m so that the quotient of any reduced second initial parameter and the first initial parameter is not greater than the first threshold; as a result, the second parameter obtained by dividing the reduced second initial parameter by the first initial parameter is not greater than the first threshold. The first threshold may be less than or equal to the maximum expressible value of the processor, or in other words, less than or equal to the maximum expressible value of the above multipliers and adders in the data processing apparatus. In this way, the calculated convolution result does not overflow the maximum expressible value range of the data processing apparatus, which would make the calculation result inaccurate.

The operation of reducing the first feature value by a factor of m may be performed by the processor, or may be performed by a shifter. For example, when m is 2 to the power of s, the first feature value can be reduced by a factor of m by a shifter capable of shifting right by at least s bits, where s is a non-negative integer. An exemplary structure of an apparatus that performs convolution processing on the feature map based on the equivalent convolution kernel in this scenario is shown in FIG. 13.

When convolution processing is performed by the apparatus described in FIG. 13, the M second parameters of the equivalent convolution kernel and the M second feature values of the feature map can be input to the M multipliers, and the first feature value of the feature map can be input to the shifter for shifting. The multiply-accumulate result obtained by processing the M second parameters and the M second feature values through the M multipliers and the M−1 first adders, together with the feature value obtained by reducing the first feature value by a factor of m through the shifter, is input to ADD M−1, and the sum output by ADD M−1 is the convolution result of the equivalent convolution kernel and the feature map.
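
A functional model of the FIG. 13 variant, in which the first feature value is scaled down by the shifter before the final addition; treating the shift amount s as the exponent of the power-of-two factor m used when the kernel was rescaled is an assumption of this sketch:

```python
def pe_convolve_with_shift(second_params, second_values, first_value, s):
    """Multiply-accumulate the M weighted taps, then add the first feature value
    after the shifter has reduced it by a factor of 2**s (i.e. m)."""
    mac = sum(w * x for w, x in zip(second_params, second_values))
    return mac + first_value / (2 ** s)
```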

This is because, when the second initial parameters or the second parameters are reduced by a factor of m, the first parameter should correspondingly also be reduced by a factor of m, so the result of processing the first feature value with the first parameter should also be reduced by a factor of m. The data processing apparatus of this application directly reduces the first feature value by a factor of m to ensure the accuracy of the convolution result.

For example, when convolution processing is performed on the feature map based on the equivalent convolution kernel shown in the right part of FIG. 10, the first feature value in the feature map can be input to the shifter and shifted right by 10 bits, and the resulting feature value is then added to the multiply-accumulate result to obtain the convolution result.
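
As a purely numerical illustration (the concrete parameter values of FIG. 10 are not reproduced here, and m equal to 2 to the power of 10 is an assumption of this example):

    first_value = 52736        # an arbitrary example feature value
    m = 2 ** 10                # assumed scaling factor for this illustration
    assert (first_value >> 10) == first_value // m   # a 10-bit right shift divides by 1024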

In some possible implementations, when the quotient of at least one of the second initial parameters and the first initial parameter is less than a second threshold, before the two-dimensional convolution kernel is determined according to the initial convolution kernel, the processor is further configured to: expand the M second parameters or the M second initial parameters by a factor of n, where the quotient of any second initial parameter expanded by a factor of n and the first initial parameter is not less than the second threshold.

With the data processing apparatus of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is less than the second threshold, all the second initial parameters can first be expanded by a factor of n, so that the quotient of any expanded second initial parameter and the first initial parameter is not less than the second threshold. As a result, the second parameter obtained by dividing the expanded second initial parameter by the first initial parameter is not less than the second threshold. The second threshold may be greater than or equal to the minimum representable value of the processor, or in other words, greater than or equal to the minimum representable value of the multipliers and adders in the data processing apparatus. This ensures that the computed convolution result does not become inaccurate by falling below the minimum representable value range of the data processing apparatus.
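
For illustration, this expansion step can be sketched in Python as below. The power-of-two choice of n, the magnitude comparison, and the handling of zero-valued parameters are assumptions of the sketch.

    def expand_second_inits(first_init, second_inits, second_threshold):
        n = 1
        nonzero = [w for w in second_inits if w != 0]  # zero parameters are left out of the check
        while any(abs(w * n / first_init) < second_threshold for w in nonzero):
            n *= 2
        return [w * n for w in second_inits], n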

The operation of expanding the first feature value by a factor of n may be performed by the host CPU to which the neural network processor performing the convolution processing is attached, or may be performed by a shifter. For example, when n is 2 to the power of t, the first feature value can be expanded by a factor of n by a shifter capable of shifting left by at least t bits, where t is a non-negative integer. An exemplary structure of an apparatus that performs convolution processing on the feature map based on the equivalent convolution kernel in this scenario is shown in FIG. 13.

When convolution processing is performed by the apparatus shown in FIG. 13, the M second parameters in the equivalent convolution kernel and the M second feature values in the feature map can be input to the M multipliers, and the first feature value in the feature map can be input to the shifter for shifting. The multiply-accumulate result obtained by processing the M second parameters and the M second feature values through the M multipliers and M-1 adders, together with the feature value obtained by left-shifting the first feature value through the shifter to expand it by a factor of n, is input to ADD M-1, and the sum output by ADD M-1 is the convolution result of the equivalent convolution kernel and the feature map.

This is because, when the second initial parameters or the second parameters are expanded by a factor of n, the first parameter should correspondingly also be expanded by a factor of n, so the result of processing the first feature value with the first parameter should also be expanded by a factor of n. The data processing apparatus of this application directly expands the first feature value by a factor of n to ensure the accuracy of the convolution result.

For example, when convolution processing is performed on the feature map based on the equivalent convolution kernel shown in the right part of FIG. 11, the first feature value in the feature map can be input to the shifter and shifted left by 16 bits, and the resulting feature value is then added to the multiply-accumulate result to obtain the convolution result.

In some possible implementations, the processor includes one or a combination of the following: a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor (network process units, NPU).

In some possible implementations, M is equal to 8, and the two-dimensional convolution kernel is a 3*3 matrix. Optionally, the M multipliers and the M-1 first adders may constitute one multiply-accumulate unit.

For example, as shown in FIG. 14, the data processing apparatus may include 8 multipliers and 8 adders, where 8 multipliers and 7 adders are used to multiply-accumulate 8 convolution kernel parameters with 8 feature values, and the remaining adder is used to add the multiply-accumulate result to another feature value, thereby obtaining the convolution result of the two-dimensional convolution kernel and the feature value matrix to be convolved.
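
The following Python sketch models this arrangement and checks it against an ordinary 3*3 convolution. The choice of the centre weight as the first initial parameter K and the example numbers are assumptions made only for illustration; as described for the method, the factor K is restored by scaling the next layer's kernel, so only the within-layer relationship is checked here.

    import numpy as np

    def conv3x3_plain(kernel, window):
        # Reference: ordinary 3*3 convolution (9 multiplications, 8 additions)
        return float(np.sum(kernel * window))

    def conv3x3_equivalent(kernel, window):
        K = kernel[1, 1]                       # assumed first initial parameter (centre weight)
        eq = kernel / K                        # equivalent kernel: first parameter becomes 1
        second = np.delete(eq.flatten(), 4)    # the 8 second parameters
        values = np.delete(window.flatten(), 4)
        mac = float(np.dot(second, values))    # 8 multipliers and 7 adders
        return mac + float(window[1, 1])       # the extra adder adds the first feature value

    kernel = np.array([[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]])
    window = np.arange(9, dtype=float).reshape(3, 3)
    K = kernel[1, 1]
    # Within the layer, the equivalent-kernel result is 1/K times the ordinary result.
    assert abs(conv3x3_equivalent(kernel, window) * K - conv3x3_plain(kernel, window)) < 1e-9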

It can be understood that the data processing apparatus can be obtained by taking a PE that handles 8 multiply-accumulate operations as a basis and adding one adder. In other words, the data processing apparatus of this application can reuse existing hardware logic, and adding an adder is enough to obtain the data processing apparatus of this application.

Alternatively, the 8 multipliers and 7 adders in FIG. 14 can be composed of two PEs, each of which can handle 4 multiply-accumulate operations.

Alternatively, as shown in FIG. 15, one adder can be added on the basis of a PE that handles 16 multiply-accumulate operations to obtain two of the data processing apparatuses shown in FIG. 14. In this case, the output of ADD 6 is no longer fed to ADD 14 but to the newly added ADD 15.

In some possible implementations, M is equal to 24, and the two-dimensional convolution kernel is a 5*5 matrix. Optionally, the M multipliers and the M-1 first adders constitute 3 multiply-accumulate units, where each multiply-accumulate unit includes M/3 multipliers and M/3-1 first adders.

For example, three adders can be added on the basis of 3 PEs, each of which can handle 8 multiply-accumulate operations, to obtain a data processing apparatus of this application that processes a 5*5 feature matrix to be convolved based on a 5*5 two-dimensional convolution kernel. The three newly added adders are used to add the multiply-accumulate results of the 3 PEs and the first feature value in the feature matrix to be convolved.
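
A behavioural sketch of this grouping is shown below for illustration. The ordering of the 24 second parameters and the way the three extra adders are cascaded are assumptions of the sketch.

    def conv5x5_equivalent(second_params, second_values, first_value):
        assert len(second_params) == 24 and len(second_values) == 24
        partial_sums = []
        for g in range(3):                     # three PEs, each with 8 multipliers and 7 adders
            p = second_params[8 * g: 8 * g + 8]
            v = second_values[8 * g: 8 * g + 8]
            partial_sums.append(sum(x * y for x, y in zip(p, v)))
        # Three extra adders combine the three partial sums with the first feature value.
        return ((partial_sums[0] + partial_sums[1]) + partial_sums[2]) + first_value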

In some implementations, the apparatus shown in FIG. 14 may further include a shifter whose output port is connected to one input port of ADD 7; alternatively, the apparatus shown in FIG. 15 may further include two shifters, where the output port of one shifter is connected to an input port of ADD 15 and the output port of the other shifter is connected to an input port of ADD 14. In this case, the first feature value in the feature matrix to be convolved is first input to the shifter for shifting, and the shifted result and the multiply-accumulate result are input to the adder and added to obtain the convolution result. An apparatus of this kind, containing a shifter, is used when convolution processing is performed with an equivalent convolution kernel obtained from an initial convolution kernel that has undergone expansion or reduction. Of course, an apparatus containing a shifter may also be used when convolution processing is performed with an equivalent convolution kernel obtained from an initial convolution kernel that has not been expanded or reduced; in that case the shifter simply does not need to shift the input first feature value.

In some possible implementations of this embodiment, the two-dimensional convolution kernel is a two-dimensional matrix component of an N-dimensional convolution kernel, where N is an integer greater than 2.

The data reading method proposed in this application is described below by way of example with reference to FIG. 16 and FIG. 17, taking as an example a PE that can convolve a feature map containing 9 feature values with a convolution kernel containing 9 parameters. For example, the PE has the structure shown in FIG. 14 or FIG. 15. Further, for example, the PE is obtained by adding an adder, and possibly also a shifter, to a PE capable of handling 8 multiply-accumulate operations.

As shown in FIG. 16, the solid-line box on the left represents the feature map, and a square dashed border within it represents a convolution window. The convolution window moves to the right with a stride of 1, and its size is determined by the size of the convolution kernel represented by the solid-line box on the right. The feature values in the first row of the feature map are denoted A0 to An, those in the second row B0 to Bn, and so on. A solid-line box in the middle represents one PE, and the solid-line box on the right represents the convolution kernel K; one example of this convolution kernel is the aforementioned equivalent convolution kernel, and its size is 3*3. FIG. 16 exemplarily shows 16 PEs, PE0 to PE15; for example, the neural network processor includes these 16 PEs. In FIG. 16, the arrows indicate the data flow.

Each PE can convolve the feature values in one convolution window with one convolution kernel, so within one clock cycle the 16 PEs can convolve the feature values in 16 convolution windows with the same convolution kernel.

As shown in FIG. 17, at the initial clock cycle, the directly connected cache, which is directly connected to the units performing the convolution processing, can read 3*(16+2) data from the lower-level cache, namely A0 to A17, B0 to B17, and C0 to C17. The directly connected cache can then transmit A0 to A2, B0 to B2, and C0 to C2 to PE0, transmit A1 to A3, B1 to B3, and C1 to C3 to PE1, and so on, until A15 to A17, B15 to B17, and C15 to C17 are output to PE15.

At the next clock cycle, the directly connected cache only needs to read (16+2) data from the lower-level cache, namely D0 to D17, and then transmits B0 to B2, C0 to C2, and D0 to D2 to PE0, transmits B1 to B3, C1 to C3, and D1 to D3 to PE1, and so on, until B15 to B17, C15 to C17, and D15 to D17 are transmitted to PE15.

This is because the data that the directly connected cache needs to transmit to PE0 to PE15 in two consecutive clock cycles partially overlap, so in the later clock cycle the directly connected cache can reuse the overlapping data it read from the lower-level cache in the previous clock cycle and only needs to read the portion of data that must be updated. This saves transmission resources and improves transmission efficiency.
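
The read pattern can be mimicked in software as follows. The 16-PE, 3*3-kernel, stride-1 setting follows FIG. 16 and FIG. 17, while the function name and the row bookkeeping are assumptions of this sketch.

    def values_read_from_lower_cache(clock, kernel_size=3, num_pe=16):
        # Each row read covers one column per PE plus (kernel_size - 1) overlap columns.
        row_width = num_pe + (kernel_size - 1)          # 16 + 2 = 18 feature values per row
        if clock == 0:
            return kernel_size * row_width              # initial clock: rows A, B and C
        return row_width                                # later clocks: only the one new row

    assert values_read_from_lower_cache(0) == 3 * (16 + 2)   # 54 values at the initial clock
    assert values_read_from_lower_cache(1) == (16 + 2)       # 18 values at every later clock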

It can be understood that FIG. 17 does not show the transmission of the parameters of the convolution kernel.

If one PE includes two devices with the structure shown in FIG. 14 or FIG. 15 (for example, a PE obtained by adding adders, and possibly also shifters, to a PE capable of handling 16 multiply-accumulate operations), the method by which the directly connected cache reads data is similar to the method shown in FIG. 17. The difference is that, in the initial clock cycle, the directly connected cache needs to additionally read the data of 16 more convolution windows along the horizontal dimension of the feature map, or to additionally read 16 data along the vertical dimension of the feature map from the row below the convolution windows of FIG. 16. That is, in the initial clock cycle the directly connected cache needs to read 4*(16+2) data, and in each subsequent clock cycle it only needs to update 2*(16+2) data.

FIG. 18 is an exemplary structural diagram of a data processing apparatus of this application. The apparatus 1800 includes a processing module 1810, an acquisition module 1820, an expansion module 1830, and a determination module 1840. The apparatus 1800 can implement the method shown in FIG. 5 described above.

For example, the processing module 1810 may be configured to perform S510, the acquisition module 1820 to perform S520, the expansion module 1830 to perform S530, and the determination module 1840 to perform S540.

In some possible implementations, the apparatus 1800 may be the host CPU in FIG. 3; in other possible implementations, the apparatus 1800 may be the training device 220 shown in FIG. 4; in still other possible implementations, the apparatus 1800 may be the client device 240 described in FIG. 4.

This application further provides an apparatus 1900 as shown in FIG. 19. The apparatus 1900 includes a processor 1902, a communication interface 1903, and a memory 1904. One example of the apparatus 1900 is a chip; another example of the apparatus 1900 is a computing device.

The processor 1902, the memory 1904, and the communication interface 1903 may communicate with one another through a bus. The memory 1904 stores executable code, and the processor 1902 reads the executable code in the memory 1904 to perform the corresponding method. The memory 1904 may also include other software modules required by running processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like.

For example, the executable code in the memory 1904 is used to implement the method shown in FIG. 5, and the processor 1902 reads this executable code in the memory 1904 to perform the method shown in FIG. 5.

The processor 1902 may be a central processing unit (CPU), a graphics processing unit (GPU), or the like. The memory 1904 may include volatile memory, for example random access memory (RAM). The memory 1904 may also include non-volatile memory (NVM), for example read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (solid state disk, SSD).

This application further provides a data processing device 2000 as shown in FIG. 20, including a programmable device 2001 and a memory 2002. The memory 2002 is configured to store a configuration file required for the operation of the programmable device 2001, and the programmable device 2001 is configured to read the configuration file from the memory 2002 and execute it to implement the corresponding method.

The programmable device may include a field programmable gate array (FPGA) or a complex programmable logic device (CPLD).

Taking an FPGA as an example, those skilled in the art will understand that its basic working principle is to change the content of the configuration RAM inside the FPGA by loading configuration data (for example, in the form of a configuration file), thereby changing the configuration of the various logic resources inside the FPGA so as to implement different circuit functions. Moreover, configuration data can be loaded multiple times, so the FPGA can perform different functions by loading different configuration data, which provides good flexibility. In practical applications, the functions of the FPGA often need to be updated; in this case, the new configuration data can be loaded into the FPGA configuration memory in advance, and the FPGA then loads the new configuration data to implement the functions defined by it. This process is called an FPGA upgrade. In addition, an FPGA ships with a configuration loading circuit for loading configuration data, and this configuration loading circuit can still guarantee the most basic loading operation even after the user-defined circuit function (that is, the function defined by the configuration data) fails.

A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or by software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of this application.

A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application essentially, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (28)

一种数据处理装置,其特征在于,所述数据处理装置用于根据二维卷积核对待卷积矩阵进行卷积处理,所述二维卷积核包括一个第一参数和M个第二参数,所述待卷积矩阵包括一个第一特征值和M个第二特征值,所述M个第二参数与所述M个第二特征值一一对应,所述第一参数对应所述第一特征值,所述数据处理装置包括:A data processing device, characterized in that the data processing device is used to perform convolution processing on a convolution matrix according to a two-dimensional convolution kernel, and the two-dimensional convolution kernel includes a first parameter and M second parameters , The matrix to be convolved includes a first eigenvalue and M second eigenvalues, the M second parameters are in one-to-one correspondence with the M second eigenvalues, and the first parameter corresponds to the first A characteristic value, the data processing device includes: M个乘法器和M-1个第一加法器,用于对所述M个第二参数和所述M个第二特征值进行乘累加运算,以得到乘累加结果;M multipliers and M-1 first adders are used for multiplying and accumulating the M second parameters and the M second characteristic values to obtain a multiplying and accumulating result; 第二加法器,用于对所述乘累加结果和所述第一特征值进行加运算,以得到所述二维卷积核与所述待卷积矩阵的卷积结果。The second adder is configured to perform an addition operation on the multiplication and accumulation result and the first eigenvalue to obtain a convolution result of the two-dimensional convolution kernel and the matrix to be convolved. 如权利要求1所述的装置,其特征在于,所述第一参数等于1。The device of claim 1, wherein the first parameter is equal to one. 如权利要求1或2所述的装置,其特征在于,所述装置还包括处理器,所述处理器用于:根据初始卷积核确定所述二维卷积核,所述初始卷积核包括一个第一初始参数和M个第二初始参数,所述M个所述第二参数与所述M个第二初始参数一一对应,所述第一参数与所述第一初始参数对应,所述第二参数等于所述第二参数对应的第二初始参数与所述第一初始参数的商,所述第一初始参数不为零。The device according to claim 1 or 2, wherein the device further comprises a processor, the processor is configured to: determine the two-dimensional convolution kernel according to an initial convolution kernel, the initial convolution kernel comprising One first initial parameter and M second initial parameters, the M second parameters correspond to the M second initial parameters one-to-one, and the first parameter corresponds to the first initial parameter, The second parameter is equal to the quotient of the second initial parameter corresponding to the second parameter and the first initial parameter, and the first initial parameter is not zero. 如权利要求3所述的装置,其特征在于,当至少一个所述第二初始参数与所述第一初始参数的商大于第一阈值时,在所述根据初始卷积核确定所述二维卷积核之前,所述处理器还用于:The device according to claim 3, wherein when the quotient of at least one of the second initial parameters and the first initial parameter is greater than a first threshold, the two-dimensional Before the convolution kernel, the processor is also used to: 对所述M个第二参数或所述M个第二初始参数缩小m倍,其中,任一缩小m倍的第二初始参数与所述第一初始参数的商不大于所述第一阈值。The M second parameters or the M second initial parameters are reduced by m times, wherein the quotient of any second initial parameter reduced by m times and the first initial parameter is not greater than the first threshold. 如权利要求4所述的装置,其特征在于,在所述对所述乘累加结果和所述第一特征值进行加运算之前,所述处理器还用于:The device according to claim 4, wherein, before the addition operation is performed on the multiplication and accumulation result and the first eigenvalue, the processor is further configured to: 对所述第一特征值缩小m倍;Reduce the first characteristic value by m times; 对应的,所述第二加法器具体用于对所述乘累加结果和所述缩小后的第一特征值进行加运算。Correspondingly, the second adder is specifically configured to perform an addition operation on the multiplication and accumulation result and the reduced first characteristic value. 
如权利要求3所述的装置,其特征在于,当至少一个所述第二初始参数与所述第一初始参数的商小于第二阈值时,在所述根据初始卷积核确定所述二维卷积核之前,所述处理器还用于:The device according to claim 3, wherein when the quotient of at least one of the second initial parameters and the first initial parameter is less than a second threshold, the two-dimensional Before the convolution kernel, the processor is also used to: 对所述M个第二参数或所述M个第二初始参数扩大n倍,其中,任一扩大n倍的第二初始参数与所述第一初始参数的商不小于所述第二阈值。The M second parameters or the M second initial parameters are expanded by n times, wherein the quotient of any second initial parameter expanded by n times and the first initial parameter is not less than the second threshold. 如权利要求6所述的装置,其特征在于,在所述对所述乘累加结果和所述第一特征值进行加运算之前,所述处理器还用于:7. The device according to claim 6, wherein before the addition operation is performed on the multiplication and accumulation result and the first eigenvalue, the processor is further configured to: 对所述第一特征值扩大n倍;Expanding the first characteristic value by n times; 对应的,所述第二加法器具体用于对所述乘累加结果和所述扩大后的第一特征值进行加运算。Correspondingly, the second adder is specifically configured to perform an addition operation on the multiplication and accumulation result and the expanded first characteristic value. 如权利要求3至7中任一项所述的装置,其特征在于,所述处理器包括以下一项或者多项的组合:中央处理器(CPU)、图形处理器(GPU)或神经网络处理器(NPU)。The device according to any one of claims 3 to 7, wherein the processor comprises one or a combination of the following: central processing unit (CPU), graphics processing unit (GPU), or neural network processing Device (NPU). 如权利要求1至8中任一项所述的装置,其特征在于,M等于8,所述二维卷积核为3*3矩阵。The device according to any one of claims 1 to 8, wherein M is equal to 8, and the two-dimensional convolution kernel is a 3*3 matrix. 如权利要求8所述的装置,其特征在于,所述M个乘法器和所述M-1个第一加法器构成一个乘积累加器。8. The device of claim 8, wherein the M multipliers and the M-1 first adders form a multiplication accumulation adder. 如权利要求1至8中任一项所述的装置,其特征在于,M等于24,所述二维卷积核为5*5矩阵。The device according to any one of claims 1 to 8, wherein M is equal to 24, and the two-dimensional convolution kernel is a 5*5 matrix. 如权利要求11所述的装置,其特征在于,所述M个乘法器和所述M-1个第一加法器构成3个乘积累加器,其中,每个所述乘积累加器包括M/3个乘法器和M/3-1个第一加法器。The apparatus according to claim 11, wherein the M multipliers and the M-1 first adders form three multiplication accumulation adders, wherein each of the multiplication accumulation adders includes M/3 Multipliers and M/3-1 first adders. 如权利要求1至12中任一项所述的装置,其特征在于,所述二维卷积核为一个N维卷积核的一个二维矩阵分量,N为大于2的整数。The device according to any one of claims 1 to 12, wherein the two-dimensional convolution kernel is a two-dimensional matrix component of an N-dimensional convolution kernel, and N is an integer greater than 2. 
一种数据处理方法,其特征在于,包括:A data processing method, characterized in that it comprises: 根据神经网络第L层的初始卷积核得到第L层的等效卷积核,其中,所述第L层的等效卷积核的参数基于所述第L层的初始卷积核的对应参数与所述第L层的初始卷积核中的第一初始参数的商获得,所述第一初始参数的值为K,K为非零数,所述第L层的等效卷积核用于对所述第L层的特征图进行卷积处理;According to the initial convolution kernel of the Lth layer of the neural network, the equivalent convolution kernel of the Lth layer is obtained, wherein the parameters of the equivalent convolution kernel of the Lth layer are based on the correspondence of the initial convolution kernel of the Lth layer The quotient of the parameter and the first initial parameter in the initial convolution kernel of the Lth layer is obtained, the value of the first initial parameter is K, and K is a non-zero number, and the equivalent convolution kernel of the Lth layer For performing convolution processing on the feature map of the L-th layer; 获取所述神经网络第L+1层中与所述第L层的初始卷积核具有映射关系的所述第L+1层的初始卷积核;Acquiring the initial convolution kernel of the L+1th layer in the L+1th layer of the neural network that has a mapping relationship with the initial convolution kernel of the Lth layer; 将所述第L+1层的初始卷积核中的每个参数扩大K倍;Expand each parameter in the initial convolution kernel of the L+1th layer by K times; 根据所述扩大处理后的第L+1层的初始卷积核确定所述第L+1层的等效卷积核。The equivalent convolution kernel of the L+1th layer is determined according to the initial convolution kernel of the L+1th layer after the expansion processing. 根据权利要求14所述的方法,其特征在于,所述第L层的等效卷积核包括M个第二参数和一个第一参数,所述M个第二参数分别对应所述特征图的M个第二特征值,所述第一参数对应所述特征图的第一特征值,所述第一参数为1;The method according to claim 14, wherein the equivalent convolution kernel of the L-th layer includes M second parameters and a first parameter, and the M second parameters respectively correspond to those of the feature map. M second feature values, the first parameter corresponds to the first feature value of the feature map, and the first parameter is 1; 其中,所述对所述第L层的特征图进行卷积处理,包括:Wherein, the performing convolution processing on the feature map of the L-th layer includes: 对所述M个第二参数和所述M个第二特征值进行乘累加运算,以得到乘累加结果;Multiply and accumulate the M second parameters and the M second eigenvalues to obtain a multiply and accumulate result; 对所述乘累加结果和所述第一特征值进行加运算。An addition operation is performed on the multiplication and accumulation result and the first characteristic value. 如权利要求14所述的方法,其特征在于,所述第L层等效卷积核的参数基于所述第L层的初始卷积核的对应参数与所述第L层的初始卷积核中的第一初始参数的商获得,包括:The method according to claim 14, wherein the parameters of the equivalent convolution kernel of the Lth layer are based on the corresponding parameters of the initial convolution kernel of the Lth layer and the initial convolution kernel of the Lth layer. The quotient of the first initial parameter in is obtained, including: 当至少一个所述第L层的初始卷积核的参数与所述第一初始参数的商大于第一阈值时,对所述第L层的初始卷积核的对应参数缩小m倍,其中,任一缩小m倍的所述第L层的初始卷积核的参数与所述第一初始参数的商不大于所述第一阈值。When the quotient of at least one parameter of the initial convolution kernel of the Lth layer and the first initial parameter is greater than a first threshold, the corresponding parameter of the initial convolution kernel of the Lth layer is reduced by m times, wherein, The quotient of any parameter of the initial convolution kernel of the L-th layer reduced by m times and the first initial parameter is not greater than the first threshold. 
如权利要求16所述的方法,其特征在于,所述对所述乘累加结果和所述第一特征值进行加运算,包括:The method according to claim 16, wherein said performing an addition operation on said multiplication and accumulation result and said first eigenvalue comprises: 对所述第一特征值缩小m倍;Reduce the first characteristic value by m times; 对所述乘累加结果和所述缩小后的第一特征值进行加运算。An addition operation is performed on the multiplication and accumulation result and the reduced first characteristic value. 如权利要求14所述的方法,其特征在于,所述第L层等效卷积核的参数基于所述第L层的初始卷积核的对应参数与所述第L层的初始卷积核中的第一初始参数的商获得,包括:The method according to claim 14, wherein the parameters of the equivalent convolution kernel of the Lth layer are based on the corresponding parameters of the initial convolution kernel of the Lth layer and the initial convolution kernel of the Lth layer. The quotient of the first initial parameter in is obtained, including: 当至少一个所述第L层的初始卷积核的参数与所述第一初始参数的商小于第二阈值时,对所述第L层的初始卷积核的对应参数扩大n倍,其中,任一扩大n倍的所述第L层的初始卷积核的参数与所述第一初始参数的商不小于所述第二阈值。When the quotient of at least one parameter of the initial convolution kernel of the L-th layer and the first initial parameter is less than a second threshold, the corresponding parameter of the initial convolution kernel of the L-th layer is expanded by n times, wherein, The quotient of any parameter of the initial convolution kernel of the L-th layer expanded by n times and the first initial parameter is not less than the second threshold. 如权利要求18所述的方法,其特征在于,所述对所述乘累加结果和所述第一特征值进行加运算,包括:The method according to claim 18, wherein said performing an addition operation on said multiplication and accumulation result and said first eigenvalue comprises: 对所述第一特征值扩大n倍;Expanding the first characteristic value by n times; 对所述乘累加结果和所述扩大后的第一特征值进行加运算。An addition operation is performed on the multiplication and accumulation result and the expanded first eigenvalue. 一种数据处理装置,其特征在于,包括:A data processing device, characterized in that it comprises: 处理模块,用于根据神经网络第L层的初始卷积核得到第L层的等效卷积核,其中,所述第L层的等效卷积核的参数基于所述第L层的初始卷积核的对应参数与所述第L层的初始卷积核中的第一初始参数的商获得,所述第一初始参数的值为K,K为非零数,所述第L层的等效卷积核用于对所述第L层的特征图进行卷积处理;The processing module is used to obtain the equivalent convolution kernel of the Lth layer according to the initial convolution kernel of the Lth layer of the neural network, wherein the parameters of the equivalent convolution kernel of the Lth layer are based on the initial convolution kernel of the Lth layer The quotient of the corresponding parameter of the convolution kernel and the first initial parameter in the initial convolution kernel of the Lth layer is obtained, the value of the first initial parameter is K, K is a non-zero number, and the value of the Lth layer The equivalent convolution kernel is used to perform convolution processing on the feature map of the Lth layer; 获取模块,用于获取所述神经网络第L+1层中与所述第L层的初始卷积核具有映射关系的所述第L+1层的初始卷积核;An obtaining module, configured to obtain the initial convolution kernel of the L+1th layer that has a mapping relationship with the initial convolution kernel of the Lth layer in the L+1th layer of the neural network; 扩大模块,用于将所述第L+1层的初始卷积核中的每个参数扩大K倍;An expansion module, configured to expand each parameter in the initial convolution kernel of the L+1th layer by K times; 确定模块,用于根据所述扩大处理后的第L+1层的初始卷积核确定所述第L+1层的等效卷积核。The determining module is configured to determine the equivalent convolution kernel of the L+1th layer according to the initial convolution kernel of the L+1th layer after the expansion processing. 
根据权利要求20所述的装置,其特征在于,所述第L层的等效卷积核包括M个第二参数和一个第一参数,所述M个第二参数分别对应所述特征图的M个第二特征值,所述第一参数对应所述特征图的第一特征值,所述第一参数为1;The device according to claim 20, wherein the equivalent convolution kernel of the Lth layer comprises M second parameters and a first parameter, and the M second parameters respectively correspond to those of the feature map. M second feature values, the first parameter corresponds to the first feature value of the feature map, and the first parameter is 1; 其中,所述对所述第L层的特征图进行卷积处理,包括:Wherein, the performing convolution processing on the feature map of the L-th layer includes: 对所述M个第二参数和所述M个第二特征值进行乘累加运算,以得到乘累加结果;Multiply and accumulate the M second parameters and the M second eigenvalues to obtain a multiply and accumulate result; 对所述乘累加结果和所述第一特征值进行加运算。An addition operation is performed on the multiplication and accumulation result and the first characteristic value. 如权利要求20所述的装置,其特征在于,所述第L层等效卷积核的参数基于所述第L层的初始卷积核的对应参数与所述第L层的初始卷积核中的第一初始参数的商获得,包括:The device according to claim 20, wherein the parameters of the equivalent convolution kernel of the Lth layer are based on the corresponding parameters of the initial convolution kernel of the Lth layer and the initial convolution kernel of the Lth layer. The quotient of the first initial parameter in is obtained, including: 当至少一个所述第L层的初始卷积核的参数与所述第一初始参数的商大于第一阈值时,对所述第L层的初始卷积核的对应参数缩小m倍,其中,任一缩小m倍的所述第L层的初始卷积核的参数与所述第一初始参数的商不大于所述第一阈值。When the quotient of at least one parameter of the initial convolution kernel of the Lth layer and the first initial parameter is greater than a first threshold, the corresponding parameter of the initial convolution kernel of the Lth layer is reduced by m times, wherein, The quotient of any parameter of the initial convolution kernel of the L-th layer reduced by m times and the first initial parameter is not greater than the first threshold. 如权利要求22所述的装置,其特征在于,所述对所述乘累加结果和所述第一特征值进行加运算,包括:The device according to claim 22, wherein said performing an addition operation on said multiplication and accumulation result and said first eigenvalue comprises: 对所述第一特征值缩小m倍;Reduce the first characteristic value by m times; 对所述乘累加结果和所述缩小后的第一特征值进行加运算。An addition operation is performed on the multiplication and accumulation result and the reduced first characteristic value. 如权利要求20所述的装置,其特征在于,所述第L层等效卷积核的参数基于所述第L层的初始卷积核的对应参数与所述第L层的初始卷积核中的第一初始参数的商获得,包括:The device according to claim 20, wherein the parameters of the equivalent convolution kernel of the Lth layer are based on the corresponding parameters of the initial convolution kernel of the Lth layer and the initial convolution kernel of the Lth layer. The quotient of the first initial parameter in is obtained, including: 当至少一个所述第L层的初始卷积核的参数与所述第一初始参数的商小于第二阈值时,对所述第L层的初始卷积核的对应参数扩大n倍,其中,任一扩大n倍的所述第L层的初始卷积核的参数与所述第一初始参数的商不小于所述第二阈值。When the quotient of at least one parameter of the initial convolution kernel of the L-th layer and the first initial parameter is less than a second threshold, the corresponding parameter of the initial convolution kernel of the L-th layer is expanded by n times, wherein, The quotient of any parameter of the initial convolution kernel of the L-th layer expanded by n times and the first initial parameter is not less than the second threshold. 
如权利要求24所述的装置,其特征在于,所述对所述乘累加结果和所述第一特征值进行加运算,包括:The device according to claim 24, wherein said performing an addition operation on said multiplication and accumulation result and said first eigenvalue comprises: 对所述第一特征值扩大n倍;Expanding the first characteristic value by n times; 对所述乘累加结果和所述扩大后的第一特征值进行加运算。An addition operation is performed on the multiplication and accumulation result and the expanded first eigenvalue. 一种计算机可读存储介质,其特征在于,包括指令,当所述指令在处理器上运行时,所述处理器执行如权利要求14至19任一项所述的方法。A computer-readable storage medium, comprising instructions, when the instructions run on a processor, the processor executes the method according to any one of claims 14 to 19. 一种数据处理设备,其特征在于,包括可编程器件和存储器,所述存储器用于存储所述可编程器件运行所需的配置文件,所述可编程器件用于从所述存储器读取所述配置文件,执行如权利要求14至19任一所述的方法。A data processing device, characterized by comprising a programmable device and a memory, the memory is used to store configuration files required for the operation of the programmable device, and the programmable device is used to read the The configuration file executes the method according to any one of claims 14 to 19. 如权利要求27所述的设备,其特征在于,所述可编程器件包括现场可编程逻辑门阵列(FPGA)或复杂可编程逻辑器件(CPLD)。The device of claim 27, wherein the programmable device comprises a field programmable logic gate array (FPGA) or a complex programmable logic device (CPLD).
PCT/CN2019/126179 2019-12-18 2019-12-18 Data processing apparatus and data processing method Ceased WO2021120036A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/126179 WO2021120036A1 (en) 2019-12-18 2019-12-18 Data processing apparatus and data processing method
CN201980102503.0A CN114730331B (en) 2019-12-18 2019-12-18 Data processing device and data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/126179 WO2021120036A1 (en) 2019-12-18 2019-12-18 Data processing apparatus and data processing method

Publications (1)

Publication Number Publication Date
WO2021120036A1 true WO2021120036A1 (en) 2021-06-24

Family

ID=76476963

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/126179 Ceased WO2021120036A1 (en) 2019-12-18 2019-12-18 Data processing apparatus and data processing method

Country Status (2)

Country Link
CN (1) CN114730331B (en)
WO (1) WO2021120036A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113655990A (en) * 2021-09-01 2021-11-16 中科寒武纪科技股份有限公司 Data processing method, device and equipment applied to convolutional neural network

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115278360B (en) * 2022-07-18 2023-11-07 天翼云科技有限公司 A video data processing method and electronic device
CN118012508A (en) * 2024-03-04 2024-05-10 台州市艾赛康电子有限公司 Architecture method for variable instruction of processor

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654729A (en) * 2016-03-28 2016-06-08 南京邮电大学 Short-term traffic flow prediction method based on convolutional neural network
US20180130186A1 (en) * 2015-07-14 2018-05-10 Apical Limited Hybrid machine learning systems
CN109313663A (en) * 2018-01-15 2019-02-05 深圳鲲云信息科技有限公司 Artificial intelligence computing aided processing device, method, storage medium, and terminal
CN109886391A (en) * 2019-01-30 2019-06-14 东南大学 A neural network compression method based on spatial positive and negative diagonal convolution

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108811B (en) * 2017-12-18 2021-07-30 南京地平线机器人技术有限公司 Convolutional Computational Methods and Electronic Devices in Neural Networks
CN110008440B (en) * 2019-04-15 2021-07-27 恒烁半导体(合肥)股份有限公司 Convolution operation based on analog matrix operation unit and application thereof

Also Published As

Publication number Publication date
CN114730331B (en) 2024-11-26
CN114730331A (en) 2022-07-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19956381

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19956381

Country of ref document: EP

Kind code of ref document: A1