WO2022061867A1 - Data processing method and apparatus, and computer-readable storage medium - Google Patents
- Publication number
- WO2022061867A1 (PCT/CN2020/118324)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords: value, input, input value, compressed, weight
- Legal status: Ceased (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
Definitions
- the present application relates to the technical field of data processing, and in particular, to a data processing method, apparatus, and computer-readable storage medium.
- convolutional neural network technology is applied in many aspects of daily life, such as image recognition (e.g. face recognition, content-based image retrieval or expression recognition) and natural language processing (e.g. speech recognition, text classification or information retrieval).
- the operation of a convolutional neural network is a computationally and memory-intensive process.
- the product functions are required to meet real-time requirements.
- the frame rate is required to reach 10 to 20 frames per second, that is, the convolutional neural network is required to be able to process 10 to 20 frames per second.
- one of the objectives of the present application is to provide a data processing method, apparatus and computer-readable storage medium.
- an embodiment of the present application provides a data processing method, applied to a processing apparatus for performing a convolution operation, where the processing apparatus includes a data selector; the method includes:
- the operation result of the convolution operation is obtained based on the multiplication operation result.
- an embodiment of the present application provides a data processing apparatus, the apparatus is used to perform a convolution operation, and includes a data loading module, a data selector, and an adder;
- the data loading module is configured to load the input value and the weight value and input them into the data selector;
- the data selector configured to output the multiplication result in the convolution operation according to the input value and the weight value, and send the multiplication result to the adder;
- the adder is configured to output the operation result of the convolution operation according to the multiplication operation result.
- an embodiment of the present application provides a computer-readable storage medium having computer instructions stored thereon; when the instructions are executed by a processor, any one of the methods described in the first aspect is performed.
- an embodiment of the present application provides a movable platform, including:
- a body;
- a power system arranged on the body, where the power system is used to provide power for the movable platform; and
- a data processing apparatus as described above.
- a data selector is used to replace the multiplier, and the data selector is used to obtain the multiplication result in the convolution operation.
- the complexity of the operation is reduced, thereby improving the operation speed of the convolutional neural network and meeting the real-time requirements of actual products.
- when the convolutional neural network is used to process specific objects (such as images, voice signals, etc.), the processing results can be obtained faster.
- the response speed of a product applying the convolutional neural network is therefore faster, which is beneficial to improving the user experience.
- FIG. 1 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application.
- FIG. 2 is a schematic diagram of obtaining a multiplication result of a convolution operation using a data selector provided by an embodiment of the present application.
- FIG. 3 is a schematic structural diagram of a second data processing apparatus provided by an embodiment of the present application.
- FIG. 4 is a schematic structural diagram of a third data processing apparatus provided by an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of a fourth data processing apparatus provided by an embodiment of the present application.
- FIG. 6 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
- the operation of a convolutional neural network is a computationally intensive and storage-intensive process, which makes its running speed slow; it often cannot meet the real-time requirements of products and affects the user experience.
- the convolutional neural network contains a large number of convolution operations.
- the purpose of the convolution operation is to extract different features of the input data. For example, in the field of image processing, different features in the image can be extracted based on the convolution operation, so that the convolutional neural network gives better processing results based on the extracted features.
- an embodiment of the present application provides a processing apparatus, which is used to perform a convolution operation and uses a data selector instead of a multiplier to obtain the multiplication result in the convolution operation, which reduces the complexity of the operation, is beneficial to improving the operation speed of the neural network, and meets the real-time requirements of actual products.
- FIG. 1 is a structural diagram of a first data processing apparatus provided by an embodiment of the application.
- the processing apparatus is used to perform convolution operations in a neural network, and the apparatus includes a data loading module 11, a data selector 12 and an adder 13.
- the data loading module 11 is used to load the input value and the weight value and input them into the data selector 12 .
- the weight value refers to a convolution kernel used to perform a convolution operation on the input value.
- the number, size, and value of the convolution kernels may be specifically set according to the actual application scenario, which is not limited in this embodiment of the present application.
- the input value refers to the data to be processed, on which the convolution operation is to be performed.
- for example, in the speech processing scenario, the input value refers to the voice signal to be processed; in the image processing scenario, the input value refers to the image data to be processed.
- the input value is an activation value output by a previous layer in the convolutional neural network after operation.
- the data selector 12 is configured to output the multiplication result in the convolution operation according to the input value and the weight value, and send the multiplication result to the adder 13 .
- the data selector 12 is specifically configured to output the multiplication result in the convolution operation according to the input value, the inverse of the input value, and the weight value.
- the data selector 12 may, according to the weight value, choose to output the multiplication result in the convolution operation based on the input value, or based on the inverse of the input value.
- using the data selector instead of a multiplier to obtain the multiplication result in the convolution operation reduces the complexity of the operation, thereby improving the operation speed of the neural network and improving the user experience.
- the adder 13 is configured to output the operation result of the convolution operation according to the multiplication operation result.
- the adder 13 is specifically configured to perform an addition operation on the acquired multiple multiplication results, and to obtain and output the operation result of the convolution operation.
- the data selector 12 is used to replace the multiplier and to obtain the multiplication result in the convolution operation, which reduces the complexity of the operation while obtaining the same multiplication result, thereby improving the operation speed of the neural network.
- the processing results can be obtained faster.
- the response speed of a product applying the convolutional neural network is faster, which is beneficial to improving the user experience.
- the convolutional neural network occupies more bandwidth during the operation.
- the input value and the weight value are quantized, that is, the input value and the weight value are quantized before the convolution operation is performed.
- since the data volume of the quantized input value and the quantized weight value is reduced compared with that before quantization, when the convolution operation is performed with the quantized values, fewer computing resources are consumed, the required bandwidth is reduced, and power consumption is reduced.
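As a rough illustration of the quantization step, the sketch below uniformly quantizes values to a low bit-width. The application does not specify a quantization scheme, only that the bit-widths are below 8, so `quantize` and its symmetric scaling are assumptions for illustration only.

```python
def quantize(values, bits, signed=True):
    # Hypothetical helper: maps floating-point values onto the
    # integer grid of a given bit-width (the application does not
    # prescribe this particular scheme).
    if signed:
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bits - 1
    scale = max(abs(v) for v in values) / max(abs(qmin), qmax)
    return [min(max(round(v / scale), qmin), qmax) for v in values], scale

acts, s = quantize([0.9, -1.7, 0.1, 0.4], bits=2, signed=True)
# acts == [1, -2, 0, 0]: all values land in the signed 2-bit range {-2, -1, 0, 1}
```

At 2 bits per activation instead of 8 (or 32 for floats), both the memory traffic and the per-value arithmetic width shrink accordingly.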
- the 8-bit fixed-point method is usually used to deploy convolutional neural networks in related technologies; however, as the application scenarios of convolutional neural networks multiply, higher requirements for algorithm accuracy are put forward, so larger networks are required.
- the data generated by the calculation of each layer of such a network is still very large even if it is stored in 8 bits, which increases the cost of the storage device.
- the present application quantizes the input value and/or the weight value ("and/or" denotes either or both), and the numbers of quantization bits of the input value and the weight value are both less than 8.
- this can further reduce the data volume of the operation and storage processes, thereby reducing the computing resources consumed during operation and the storage resources consumed during storage, reducing power consumption and improving performance.
- as long as the number of quantization bits of the input value and/or the weight value is less than 8, this embodiment does not impose any restrictions on the specific numbers of quantization bits, which can be set according to the actual application scenario; for example, the number of quantization bits of the input value is 2, and the number of quantization bits of the weight value is 1.
- when the data selector 12 obtains the multiplication result in the convolution operation, the data selector 12 may select and output the input value or the opposite number of the input value according to the input weight value.
- when the weight value input to the data selector 12 is 1, the data selector 12 can choose to output the input value, which is the multiplication result in the convolution operation;
- when the weight value input to the data selector 12 is -1, the data selector 12 can choose to output the opposite number of the input value, which is the multiplication result in the convolution operation.
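The selection rule above can be sketched in a few lines. `select_multiply` is a hypothetical name and the window/weight values are made up, but the logic mirrors the described behavior of data selector 12:

```python
def select_multiply(input_value, weight):
    # Multiplication by a 1-bit weight in {-1, +1} reduced to a 2:1
    # selection: the weight bit picks the input or its opposite number,
    # so no multiplier is required.
    assert weight in (-1, 1)
    return input_value if weight == 1 else -input_value

# A 3x3 convolution window then becomes pure selection and accumulation
# (illustrative values):
window  = [1, -2, 0, 1, 1, 0, -1, -2, 1]
weights = [1, -1, 1, -1, 1, 1, -1, 1, -1]
acc = sum(select_multiply(a, w) for a, w in zip(window, weights))
# acc == 1
```

In hardware the same selection is a narrow multiplexer, which is far cheaper than a multiplier of the same operand width.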
- a gate circuit may be used to select and output the input value or the inverse of the input value according to the input weight value.
- the data selector in this embodiment is not limited to processing a 1-bit weight value and a 2-bit (or higher-bit) activation value (input value); it can also handle an arbitrary-bit weight value and a 1-bit input value, that is, the weight value is any number of bits and the input value is 1 bit.
- in that case, the calculation can be performed by exchanging the positions of the weight value and the input value in FIG. 2.
- the input value and/or the opposite number of the input value can be represented in two's-complement form, in which case the opposite number of the input value is the result of inverting the bits of the input value and adding 1.
- this further simplifies the computation of the opposite number of the input value, reduces the computational cost of the processing device, and thus helps to improve the operation speed of the convolutional neural network.
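A minimal sketch of the invert-and-add-1 rule, assuming a fixed bit-width and two's-complement encoding (`negate_twos_complement` is an illustrative helper, not part of the application):

```python
def negate_twos_complement(value, bits):
    # Two's-complement negation: invert all bits, then add 1,
    # masked to the given width. Illustrative only; the actual
    # hardware width is whatever the design chooses.
    mask = (1 << bits) - 1
    return ((~value) + 1) & mask

# 4-bit example: the opposite of 0b0011 (3) is 0b1101 (-3 in two's complement)
neg = negate_twos_complement(0b0011, 4)
# neg == 0b1101
```

An inverter plus an increment (or a carry-in on the adder) is all the circuit needs, which is why this representation keeps the selector path cheap.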
- the input value is the activation value output after the operation of the previous layer in the convolutional neural network, and activation-value quantization in the related art defaults to activation functions that make the activation value non-negative.
- in fact, activation functions that output signed values, such as Leaky ReLU, or no activation function at all, still exist in some network structures, such as MobileNetV2, MNAS and other structures. Therefore, the embodiments of the present application propose a technical solution compatible with signed values and unsigned values: the input values include signed values and unsigned values, and the convolution operation can be performed on both, thereby making the processing device widely applicable.
- the signed value and the weight value can be input into the data selector 12, and the data selector 12 outputs the multiplication result in the convolution operation according to the weight value and the signed value; likewise, the unsigned value and the weight value can be input into the data selector 12, and the data selector 12 outputs the multiplication result in the convolution operation according to the weight value and the unsigned value.
- however, when the data selector 12 outputs the multiplication result according to the weight value and the unsigned value, overflow is likely to occur, thereby causing an error in the multiplication result.
- the number of quantization bits of the input value is 2
- the quantization value range of the signed value is ⁇ -2,-1,0,1 ⁇
- the quantization value range of the unsigned value is {0, 1, 2, 3}
- the number of quantization bits of the weight value is 1, and the quantization range is ⁇ -1, 1 ⁇ .
- the quantization value of the weight value is -1
- the values of the convolution areas need to be summed, and the sum of the convolution areas of each input layer is accumulated.
- an intermediate variable with more bits is usually used for storage: if it is represented by 8 bits, the representable range is [-128, 127], and if by 16 bits, the range is [-32768, 32767]. If the input is a signed value during the convolution calculation, the 8-bit accumulation variable can tolerate 64 consecutive minimum-value additions and 63 consecutive maximum-value additions without overflow; the 16-bit accumulation variable can tolerate 16384 consecutive minimum-value additions and 16383 consecutive maximum-value additions without overflow.
- if the input is an unsigned value, the 8-bit accumulation variable can tolerate only 42 consecutive minimum-value additions and 42 consecutive maximum-value additions without overflow; the 16-bit accumulation variable can tolerate 10922 consecutive minimum-value additions and 10922 consecutive maximum-value additions without overflow. It can be seen that when an unsigned value is used for the operation, since the operation result requires more bits to represent, overflow is more likely than with a signed value.
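The accumulator capacities quoted above can be checked directly. `max_safe_additions` is a hypothetical helper that divides the accumulator's negative and positive headroom by the worst-case product magnitude (2 for signed 2-bit inputs with 1-bit weights, 3 for unsigned):

```python
def max_safe_additions(acc_bits, max_product_magnitude):
    # For a two's-complement accumulator of acc_bits bits, how many
    # worst-case products fit before overflow, in each direction.
    hi = 2 ** (acc_bits - 1) - 1   # largest positive value, e.g. 127
    lo = 2 ** (acc_bits - 1)       # magnitude of the minimum, e.g. 128
    return lo // max_product_magnitude, hi // max_product_magnitude

# signed 2-bit inputs (|product| <= 2): 64 minimum- / 63 maximum-value additions
assert max_safe_additions(8, 2) == (64, 63)
# unsigned 2-bit inputs (|product| <= 3): only 42 in either direction
assert max_safe_additions(8, 3) == (42, 42)
assert max_safe_additions(16, 2) == (16384, 16383)
assert max_safe_additions(16, 3) == (10922, 10922)
```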
- the binary representation of both signed and unsigned values is the same.
- for example, if the number of quantization bits of the input value is 2, the two's-complement binary forms of the signed values {-2, -1, 0, 1} are {10, 11, 00, 01}, and the binary forms of the unsigned values {0, 1, 2, 3} are {00, 01, 10, 11}; the two sets of bit patterns are identical, so a piece of flag information is used to mark whether an input value is a signed value or an unsigned value.
- the computer performs the convolution operation in binary form. Therefore, in the embodiment of the present application, the input values input to the data selector 12 are all operated on as signed values; in other words, the input values input into the data selector 12 are treated as signed values, thereby avoiding overflow problems.
- after acquiring the multiplication result, the adder 13 outputs the operation result of the convolution operation according to the multiplication result and an offset value; the offset value indicates the conversion relationship between the signed value and the unsigned value.
- FS is a signed value
- FU is an unsigned value
- a is the number of quantization bits of the input value
- there is FU = FS + 2^(a-1).
- the offset value may be determined according to the numbers of quantization bits of the input value and the weight value. Since these are known in advance, the offset value can be obtained offline and used directly in the subsequent convolution operation without being repeatedly calculated, which is beneficial to speeding up the operation of the convolutional neural network and the processing device.
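A one-line sketch of the offset relation FU = FS + 2^(a-1), with `unsigned_from_signed` as an illustrative name; for a = 2 it maps the signed range {-2, -1, 0, 1} onto the unsigned range {0, 1, 2, 3}:

```python
def unsigned_from_signed(fs, a):
    # Offset between the signed and unsigned a-bit quantization ranges:
    # FU = FS + 2**(a - 1). Since a is known in advance, the offset
    # 2**(a - 1) can be computed offline, once.
    return fs + 2 ** (a - 1)

mapped = [unsigned_from_signed(fs, 2) for fs in (-2, -1, 0, 1)]
# mapped == [0, 1, 2, 3]
```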
- it is possible to choose whether to output a signed value or an unsigned value according to actual application requirements. If a signed value is to be output, an activation function that outputs signed values, such as the PReLU function, can be used; if an unsigned value is to be output, an activation function that outputs unsigned values, such as the ReLU function, can be used.
- when the number of quantization bits of the weight value is 1, the value range of the quantization value is {-1, 1}. Since the quantization value of the weight value is a signed value, the offset value determined according to the weight value is also a signed value.
- when the input value is an unsigned value, the operation result of the convolution operation output by the adder 13 according to the multiplication result and the offset value is a signed value.
- an activation function that outputs unsigned values, such as the ReLU function, can then be used, so that the final output result is an unsigned value.
- the number of quantization bits of the input value is 2, the quantization value range of the signed value is {-2, -1, 0, 1}, and the quantization value range of the unsigned value is {0, 1, 2, 3}; the number of quantization bits of the weight value is 1, and the quantization range is {-1, 1}.
- the activation value output by the previous convolutional layer is the input value of the current convolutional layer.
- the processing apparatus further includes a decoder 14 .
- the data loading module 11 is specifically configured to read the compressed input value and the compressed weight value from the memory.
- the data loading module 11 may be a DMA (Direct Memory Access) controller, and the DMA controller is used to read the compressed input values and the compressed weight values from an external memory.
- the decoder 14 is used to decompress the compressed input value and the compressed weight value, respectively, and input the decompressed values into the data selector 12.
- the compressed input value and the compressed weight value are stored in the memory, which is beneficial to reduce the amount of data that needs to be stored and realize the comprehensive use of storage resources.
- the data loading module 11 reads the compressed input values and compressed weight values from the memory and inputs them into the decoder 14, which decompresses the input values and the weight values respectively and inputs the decompressed input values and weight values into the data selector 12.
- the two code streams indicating the compressed input value and the compressed weight value respectively can both be obtained by encoding in a run-length encoding manner.
- the run-length encoding method encodes by detecting repeated sequences of bits or characters and replacing them with counts of their occurrences.
- the code streams corresponding to the compressed input value and the compressed weight value respectively indicate the numbers of consecutive quantization values; the quantization values are the quantized values of the input value and the weight value.
- the decoder 14 is specifically configured to: divide the code stream corresponding to the compressed input value or the compressed weight value according to a preset number of bits to obtain one or more sub-code streams; and obtain the decompressed input value or the decompressed weight value according to the one or more sub-code streams and the quantization values corresponding to the sub-code streams.
- the quantization values corresponding to the sub-code streams are determined according to a preset correspondence relationship, and the correspondence relationship indicates the quantized values corresponding to different orders of the sub-code streams in the code stream.
- the quantization range is ⁇ -1, 1 ⁇ .
- the correspondence relationship can be set so that sub-code streams in odd-numbered order represent the number of consecutive -1s and sub-code streams in even-numbered order represent the number of consecutive 1s; for example, the 1st, 3rd, ..., (2n+1)th sub-code streams represent the number of consecutive -1s, and the 2nd, 4th, ..., 2nth sub-code streams represent the number of consecutive 1s, where n is an integer. Alternatively, the odd-numbered order can represent the number of consecutive 1s and the even-numbered order the number of consecutive -1s.
- the quantization range thereof is {-1, 1}. If the operation is performed in binary, binary 0 can be used to represent the quantization value -1 and binary 1 to represent 1; the quantization range {-1, 1} is then represented in binary as {0, 1}.
- the consecutive number of each quantization value is represented by 4 bits;
- the odd-numbered order represents the consecutive number of 0s, and the even-numbered order represents the consecutive number of 1s. For example, referring to Table 1, the original value is 32 bits, consisting of 8 consecutive 0s, 12 consecutive 1s and 12 consecutive 0s. The first sub-code stream after encoding is "1000", indicating the number of consecutive 0s; the second sub-code stream is "1100", indicating the number of consecutive 1s; the third sub-code stream is "1100", indicating the number of consecutive 0s. Thus the encoded code stream is "100011001100", expressed as 8CC in hexadecimal notation, and occupies 12 bits.
- the data volume is thereby reduced by 62.5%, which is conducive to saving storage resources. When decoding, referring to Table 2, the code 8CC, that is, "100011001100", is divided according to the preset 4 bits into "1000", "1100" and "1100"; then, according to the preset correspondence "the odd-numbered order represents the consecutive number of 0s, and the even-numbered order represents the consecutive number of 1s", these decode to 8 consecutive 0s, 12 consecutive 1s and 12 consecutive 0s.
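The count-only decoding described above can be sketched as follows; `rle_decode_alternating` is a hypothetical helper, and the 4-bit chunk width and zeros-first ordering follow the example rather than any fixed rule in the application:

```python
def rle_decode_alternating(bitstream, chunk_bits=4, first_value=0):
    # Each chunk_bits-wide sub-code stream carries only a run length;
    # the run value alternates between the two quantization values,
    # starting from first_value (odd sub-streams give runs of 0,
    # even ones runs of 1, per the assumed correspondence).
    out = []
    value = first_value
    for i in range(0, len(bitstream), chunk_bits):
        count = int(bitstream[i:i + chunk_bits], 2)
        out.extend([value] * count)
        value ^= 1  # alternate between the two quantization values
    return out

decoded = rle_decode_alternating("100011001100")
# -> 8 zeros, 12 ones, 12 zeros: 32 values recovered from 12 bits
```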
- the code stream corresponding to the compressed input value or the compressed weight value indicates both the quantization value and the number of consecutive quantization values; the quantization value is the quantized value of the input value or the weight value.
- each quantization value and its consecutive number are represented together by a preset number of bits;
- the decoder 14 is specifically configured to: divide the code stream corresponding to the compressed input value or the compressed weight value according to a preset number of bits to obtain one or more sub-code streams; and, for each sub-code stream, obtain the quantization value from specified bits of the sub-code stream and the consecutive number of the quantization value from the other bits.
- the specified bits may be specifically set according to actual application scenarios, which are not limited in this embodiment of the present application.
- the number of quantization bits of the input value is 2 for illustration, and the quantization ranges of the signed value and the unsigned value are {-2, -1, 0, 1} and {0, 1, 2, 3} respectively.
- the input value is the activation value output after the operation of the previous layer in the convolutional neural network, and there are a large number of consecutively stored identical values among the activation values of each layer. Assume that in the code stream corresponding to the input value, each quantization value and its consecutive number are represented by 8 bits: the first 6 bits count the number of consecutive values, and the last two bits store the actual value.
- for the unsigned range, binary 00 represents 0, 01 represents 1, 10 represents 2, and 11 represents 3.
- for the signed range, binary 00 represents 0, 01 represents 1, 10 represents -2, and 11 represents -1.
- taking the unsigned values in Table 3 as an example, the original data has 32 numbers: 6 consecutive 2s, 7 consecutive 1s, 7 consecutive 3s and 12 consecutive 0s, each number represented by 2 bits.
- the encoded code stream is "00011010000111010001111100110000", expressed as 1A1D1F30 in hexadecimal format, and the encoding occupies 32 bits.
- the compression ratio is 50%, which is conducive to saving storage resources. When decoding, referring to Table 4, the code 1A1D1F30, that is, "00011010000111010001111100110000", is divided according to the preset 8 bits into "00011010", "00011101", "00011111" and "00110000"; in each sub-code stream, the first 6 bits count the number of consecutive values and the last two bits store the actual value. After decoding, the result is 6 consecutive 2s, 7 consecutive 1s, 7 consecutive 3s and 12 consecutive 0s.
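The count-plus-value decoding can be sketched in the same way; `rle_decode_counted` is an illustrative helper, and the 6-bit count / 2-bit value split follows the example above. Note that the sub-code stream for 12 consecutive 0s is 00110000 (0x30), so the full code stream consistent with 32 original numbers is 0x1A1D1F30:

```python
def rle_decode_counted(bitstream, chunk_bits=8, value_bits=2):
    # In each 8-bit sub-code stream, the first 6 bits are a run length
    # and the last 2 bits are the quantization value itself.
    # Widths follow the example; other splits are possible.
    out = []
    for i in range(0, len(bitstream), chunk_bits):
        chunk = bitstream[i:i + chunk_bits]
        count = int(chunk[:-value_bits], 2)
        value = int(chunk[-value_bits:], 2)
        out.extend([value] * count)
    return out

stream = format(0x1A1D1F30, "032b")
decoded = rle_decode_counted(stream)
# -> 6 twos, 7 ones, 7 threes, 12 zeros: 32 numbers from 32 encoded bits
```

Unlike the alternating scheme, this one carries the value explicitly, so it works for alphabets larger than two at the cost of the extra value bits per run.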
- the processing apparatus further includes an encoder 15; the adder 13 is configured to transmit the operation result of the convolution operation to the encoder 15, and the encoder 15 encodes the operation result.
- the encoded result is stored in the memory, thereby helping to reduce the amount of data that needs to be stored and realize comprehensive utilization of storage resources.
- the operation result of the convolution operation may be encoded in a run-length encoding manner.
- the processing device further includes a control module 16, which is configured to receive a convolution operation instruction and feed back the convolution operation instruction to the data loading module.
- the control module may be a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- an embodiment of the present application further provides a data processing method.
- the method is applied to a processing apparatus for performing convolution operations, and the processing apparatus includes a data selector; the method includes:
- step S101: the input value and the weight value are loaded.
- step S102: the input value and the weight value are input into the data selector, and the data selector is used to obtain the multiplication result in the convolution operation.
- step S103: the operation result of the convolution operation is obtained based on the multiplication result.
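Steps S101 to S103 can be condensed into a short sketch, assuming 1-bit weights in {-1, +1}; `convolve_with_selector` and its optional offset parameter are illustrative names, not the application's exact interface:

```python
def convolve_with_selector(inputs, weights, offset=0):
    # S101: quantized inputs and weights are loaded (passed in here);
    # S102: each product comes from selection, not multiplication;
    # S103: the adder accumulates and, when unsigned inputs were
    # remapped to signed form, applies the precomputed offset.
    acc = 0
    for a, w in zip(inputs, weights):
        acc += a if w == 1 else -a  # data selector instead of multiplier
    return acc + offset

result = convolve_with_selector([1, -2, 0, 1], [1, -1, 1, -1])
# result == 2
```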
- the multiplication result in the convolution operation is obtained by using the data selector instead of the multiplier, which further reduces the complexity of the operation, thereby improving the operation speed of the neural network and meeting the real-time requirements of actual products.
- the input value and the weight value are quantized results.
- the number of quantization bits of the input value and/or the weight value is less than 8.
- the quantized input value and the quantized weight value are used to perform the operation; compared with before quantization, fewer computing resources are consumed, the required bandwidth is reduced, and power consumption is lowered.
- inputting the input value and the weight value into the data selector and using the data selector to obtain the multiplication result in the convolution operation includes: performing an inversion operation on the input value to obtain the inverse of the input value; and inputting the input value, the inverse of the input value and the weight value into the data selector, and using the data selector to obtain the multiplication result in the convolution operation.
- obtaining the multiplication result in the convolution operation by using the data selector includes: causing the data selector to select and output the input value or the inverse of the input value according to the input weight value.
- the input value and/or the inverse of the input value is represented in two's complement form.
- the data selector includes a gate circuit.
- the input value includes a signed value and an unsigned value; the input value input into the data selector is a signed value.
- the obtaining the operation result of the convolution operation based on the multiplication operation result includes: obtaining the operation result of the convolution operation based on the multiplication operation result and the offset value,
- the offset value indicates a conversion relationship between signed and unsigned values.
- the offset value is determined according to the number of quantization bits of the input value and the weight value.
- the offset value is obtained offline.
- obtaining the input value and the weight value includes: reading the compressed input value and the compressed weight value from a memory, and decompressing them respectively to obtain the input value and the weight value.
- the compressed input value and the compressed weight value are stored in the memory, which is beneficial to reduce the amount of data that needs to be stored and realize the comprehensive use of storage resources.
- the two code streams respectively indicating the compressed input value and the compressed weight value are obtained by encoding in a run-length encoding manner.
- the code streams corresponding to the compressed input value and the compressed weight value respectively indicate the number of consecutive quantization values.
- the quantized value is the quantized value of the input value and the weight value.
- the code stream corresponding to the compressed input value or the compressed weight value indicates both the quantization value and the number of consecutive quantization values; the quantization value is the quantized value of the input value or the weight value.
- when the number of quantization bits of the input value or the weight value is 1, in the code stream corresponding to the input value or the weight value, the number of consecutive quantization values is represented by a preset number of bits.
- the decompressing of the compressed input value and the compressed weight value respectively includes: dividing the code stream corresponding to the compressed input value or the compressed weight value into one or more sub-code streams according to the preset number of bits, and obtaining the decompressed input value or the decompressed weight value from the one or more sub-code streams and the quantized values corresponding to them.
- the quantized value corresponding to each sub-code stream is determined by a preset correspondence, which indicates the quantized value associated with each position of a sub-code stream within the code stream.
- each quantized value, together with its run length, is represented with a preset number of bits.
- the decompressing of the compressed input value and the compressed weight value respectively includes: dividing the code stream corresponding to the compressed input value or the compressed weight value into one or more sub-code streams according to a preset number of bits; and, for each sub-code stream, obtaining the quantized value from its designated bits and its run length from the remaining bits.
- the data processing device can be applied to common platforms such as ARM, DSP, and FPGA, and can be widely applied to UAVs, handheld devices, robots, and IoT devices.
- non-transitory computer-readable storage medium such as a memory including instructions, executable by a processor of an apparatus to perform the above-described method.
- the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
- a non-transitory computer-readable storage medium whose instructions, when executed by a processor of a terminal, enable the terminal to execute the above method.
- the embodiment of the present application provides a movable platform, including:
- a power system arranged on the body and used to provide power for the movable platform;
- the data processing device can be used for tasks such as object detection and classification.
Abstract
Description
The present application relates to the technical field of data processing, and in particular, to a data processing method, apparatus, and computer-readable storage medium.
With the development of technology, convolutional neural network technology is applied to many aspects of life, such as image recognition (for example, face recognition, content-based image retrieval, or expression recognition) and natural language processing (for example, speech recognition, text classification, or information retrieval).
However, running a convolutional neural network is a computation-intensive and memory-intensive process. At the same time, practical products must meet real-time requirements; for example, in image-processing scenarios a frame rate of 10 to 20 frames per second is required, that is, the convolutional neural network must process 10 to 20 frames of images per second. Because of these operating characteristics (intensive computation and intensive storage), the network often runs too slowly to reach the required processing frame rate, fails to meet the real-time requirements of the product, and degrades the user experience.
SUMMARY OF THE INVENTION
In view of this, one objective of the present application is to provide a data processing method, apparatus, and computer-readable storage medium.
In a first aspect, an embodiment of the present application provides a data processing method applied to a processing apparatus that performs a convolution operation, where the processing apparatus includes a data selector; the method includes:
loading an input value and a weight value;
inputting the input value and the weight value into the data selector, and obtaining a multiplication result of the convolution operation by using the data selector; and
obtaining an operation result of the convolution operation based on the multiplication result.
In a second aspect, an embodiment of the present application provides a data processing apparatus for performing a convolution operation, including a data loading module, a data selector, and an adder;
the data loading module is configured to load an input value and a weight value and input them into the data selector;
the data selector is configured to output a multiplication result of the convolution operation according to the input value and the weight value, and to send the multiplication result to the adder; and
the adder is configured to output an operation result of the convolution operation according to the multiplication result.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium on which computer instructions are stored; when executed by a processor, the instructions perform any one of the methods described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a movable platform, including:
a body;
a power system arranged on the body and configured to provide power for the movable platform; and
the data processing apparatus described above.
In the data processing method, apparatus, and computer-readable storage medium provided by the embodiments of the present application, a data selector replaces the multiplier and is used to obtain the multiplication result of the convolution operation. This yields the same multiplication result at lower computational complexity, which increases the operation speed of the convolutional neural network and meets the real-time requirements of practical products. In practice, when the convolutional neural network processes a particular object (such as an image or a speech signal), the processing result is obtained faster; correspondingly, from the user's point of view, products that apply the convolutional neural network respond more quickly, which helps improve the user experience.
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of obtaining the multiplication result of a convolution operation by using a data selector, provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a second data processing apparatus provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a third data processing apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a fourth data processing apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the related art, running a convolutional neural network is a computation-intensive and memory-intensive process, which makes it slow, often unable to meet the real-time requirements of products, and detrimental to the user experience. A convolutional neural network contains a large number of convolution operations, whose purpose is to extract different features of the input data; for example, in the field of image processing, different features of an image can be extracted by convolution, enabling the convolutional neural network to produce better processing results based on the extracted features.
Based on this, an embodiment of the present application provides a processing apparatus for performing convolution operations. A data selector is used instead of a multiplier to obtain the multiplication result of the convolution operation, reducing the complexity of the operation, which helps increase the operation speed of the neural network and meet the real-time requirements of practical products.
Please refer to FIG. 1, a structural diagram of a first data processing apparatus provided by an embodiment of the present application. The processing apparatus is used to perform convolution operations in a neural network and includes a data loading module 11, a data selector 12, and an adder 13.
The data loading module 11 is configured to load the input value and the weight value and input them into the data selector 12.
The weight value refers to the convolution kernel to be convolved with the input value. The number, size, and values of the convolution kernels can be set according to the actual application scenario, and this embodiment of the present application places no restriction on them.
The input value refers to the data to be processed by the convolution operation. For example, in a speech recognition scenario, the input value is the speech signal to be processed; in an image processing scenario, it is the image data to be processed. In one implementation, the input value is the activation value output by the previous layer of the convolutional neural network.
The data selector 12 is configured to output the multiplication result of the convolution operation according to the input value and the weight value, and to send the multiplication result to the adder 13.
Specifically, the data selector 12 outputs the multiplication result of the convolution operation according to the input value, the negation of the input value, and the weight value. In one implementation, the data selector 12 can, according to the weight value, choose to output the multiplication result based on either the input value or the negation of the input value. In this embodiment, the data selector 12 replaces the multiplier for obtaining the multiplication result of the convolution operation, further reducing the complexity of the operation, which helps increase the operation speed of the neural network and improve the user experience.
The adder 13 is configured to output the operation result of the convolution operation according to the multiplication result.
Specifically, the adder 13 adds the multiple multiplication results together to obtain and output the operation result of the convolution operation.
In this embodiment, the data selector 12 replaces the multiplier and is used to obtain the multiplication result of the convolution operation. This yields the same multiplication result at lower computational complexity, which increases the operation speed of the neural network and meets the real-time requirements of practical products. In practice, when the convolutional neural network processes a particular object (such as an image or a speech signal), the processing result is obtained faster; correspondingly, from the user's point of view, products that apply the convolutional neural network respond more quickly, which helps improve the user experience.
In one embodiment, because running a convolutional neural network is a computation-intensive and memory-intensive process, the network occupies considerable bandwidth during operation. To reduce this bandwidth, in this embodiment the input value and the weight value are quantized, that is, the input value and the weight value are the results of quantization. Since the quantized input values and weight values carry less data than before quantization, performing the convolution with them consumes fewer computing resources, requires less bandwidth, and reduces power consumption.
Further, in the related art, convolutional neural networks are usually deployed with 8-bit fixed-point arithmetic. However, as application scenarios multiply and accuracy requirements rise, larger networks are needed, and the data produced by each layer of such a network is still large even when stored in 8 bits, which increases the cost of the storage device. Based on this, the present application quantizes the input value and/or ("and/or" denotes either or both) the weight value with fewer than 8 quantization bits each. Compared with the 8-bit fixed-point approach, this further reduces the amount of data in computation and storage, thereby reducing the computing resources and storage resources required, lowering power consumption, and improving performance.
It can be understood that, as long as the number of quantization bits of the input value and/or the weight value is less than 8, this embodiment places no restriction on the specific numbers of quantization bits, which can be set according to the actual application scenario; for example, the input value may be quantized with 2 bits and the weight value with 1 bit.
In one implementation, when the weight value is quantized with 1 bit, it takes values in {-1, 1}. Referring to FIG. 2, when obtaining the multiplication result of the convolution operation, the data selector 12 can select and output either the input value or the negation of the input value according to the input weight value. In one example, when the weight value input into the data selector 12 is 1, the data selector 12 outputs the input value, which is the multiplication result of the convolution operation; when the weight value is -1, it outputs the negation of the input value, which is the multiplication result of the convolution operation. In this embodiment, the data selector 12 replaces the multiplier; because the selection logic of a data selector is simpler than the arithmetic logic of a multiplier, the computational complexity of using the data selector 12 is lower than that of using a multiplier, which helps increase the operation speed of the convolutional neural network and meet the real-time requirements of practical products. In one example, a gate circuit can be used to select and output the input value or its negation according to the input weight value. The data selector of this embodiment is not limited to 1-bit weights with 2-bit (or higher-bit) activations (input values); it can also handle weight values of arbitrary bit width together with 1-bit input values. In that case, the weight value and the input value in FIG. 2 simply swap places, and the scheme can be implemented in the same way.
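As an illustrative sketch (a behavioral model in Python with hypothetical names, not the patent's hardware implementation), the 1-bit-weight selection logic and the subsequent accumulation can be expressed as follows:

```python
def mux_multiply(x: int, w: int) -> int:
    """Behavioral model of the data selector: for a 1-bit weight
    quantized to {-1, 1}, the 'product' is obtained by selecting
    either the input value or its negation, with no multiplier."""
    if w == 1:
        return x       # weight 1: select the input value itself
    if w == -1:
        return -x      # weight -1: select the negation of the input value
    raise ValueError("weight must be quantized to {-1, 1}")

# A small dot product (one convolution window) then reduces to
# selections feeding an adder:
inputs = [3, -1, 2]
weights = [1, -1, 1]
acc = sum(mux_multiply(x, w) for x, w in zip(inputs, weights))
# acc == 3 + 1 + 2 == 6
```

In hardware the same behavior is a 2:1 multiplexer driven by the weight bit, which is why the selection logic is cheaper than a multiplier.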
Further, to further increase the operation speed of the convolutional neural network, the input value and/or its negation can be represented in two's complement form, so that the negation of the input value is obtained by inverting its bits and adding 1. This simplifies the computation of the negation, reduces the computational cost of the processing apparatus, and thus helps increase the operation speed of the convolutional neural network.
In one embodiment, the input value is the activation value output by the previous layer of the convolutional neural network. Activation quantization schemes in the related art assume by default that the activation function makes the activation values non-negative and consider only unsigned quantization. In fact, activation functions that output signed values, such as Leaky ReLU, or layers with no activation function at all, still appear in some network structures, such as MobileNetV2 and MNAS. Therefore, the embodiments of the present application propose a technical solution compatible with both signed and unsigned values: the input value includes signed values and unsigned values. That is, the convolution operation can be performed on either, which gives the processing apparatus broad applicability.
In one implementation, the signed value and the weight value can be input into the data selector 12, which outputs the multiplication result of the convolution operation according to the weight value and the signed value; likewise, the unsigned value and the weight value can be input into the data selector 12, which outputs the multiplication result of the convolution operation according to the weight value and the unsigned value.
However, when the data selector 12 outputs the multiplication result according to the weight value and an unsigned value, overflow easily occurs, corrupting the multiplication result. In one example, suppose the input value is quantized with 2 bits, so the signed quantization range is {-2, -1, 0, 1} and the unsigned range is {0, 1, 2, 3}, and the weight value is quantized with 1 bit over {-1, 1}. When the weight is 1, the signed products are 1·{-2, -1, 0, 1} = {-2, -1, 0, 1} and the unsigned products are 1·{0, 1, 2, 3} = {0, 1, 2, 3}; when the weight is -1, the signed products are -1·{-2, -1, 0, 1} = {2, 1, 0, -1} and the unsigned products are -1·{0, 1, 2, 3} = {0, -1, -2, -3}. In a convolution, the values over the convolution window are summed, and the sums over the convolution windows of all input layers are accumulated. During accumulation, an intermediate variable with more bits is usually used for storage: an 8-bit variable represents [-128, 127] and a 16-bit variable represents [-32768, 32767]. With signed inputs, an 8-bit accumulator can absorb 64 consecutive additions of the minimum product or 63 consecutive additions of the maximum product without overflow; a 16-bit accumulator can absorb 16384 and 16383 respectively. With unsigned inputs, an 8-bit accumulator can absorb only 42 consecutive additions of either extreme product, and a 16-bit accumulator only 10922. It can be seen that, because products of unsigned values need more bits to represent, unsigned operation overflows more easily than signed operation.
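The tolerance figures above follow directly from the accumulator's two's-complement range divided by the maximum product magnitude. A quick sketch (illustrative, not part of the patent disclosure) that reproduces them:

```python
def accumulator_tolerance(acc_bits: int, max_product: int):
    """Number of consecutive extreme-magnitude products a signed
    two's-complement accumulator of acc_bits bits absorbs before
    overflowing, on the negative and positive sides respectively."""
    lo = -(1 << (acc_bits - 1))          # e.g. -128 for 8 bits
    hi = (1 << (acc_bits - 1)) - 1       # e.g.  127 for 8 bits
    return (-lo) // max_product, hi // max_product

# Signed 2-bit inputs with 1-bit weights: |product| <= 2
# Unsigned 2-bit inputs with 1-bit weights: |product| <= 3
signed_8 = accumulator_tolerance(8, 2)     # (64, 63)
unsigned_8 = accumulator_tolerance(8, 3)   # (42, 42)
signed_16 = accumulator_tolerance(16, 2)   # (16384, 16383)
unsigned_16 = accumulator_tolerance(16, 3) # (10922, 10922)
```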
Further, computers perform the convolution operation in binary, and the binary representations of signed and unsigned values are identical. In one example, with 2 quantization bits, the signed values {-2, -1, 0, 1} use the binary patterns {00, 01, 10, 11}, and the unsigned values {0, 1, 2, 3} likewise use the binary patterns {00, 01, 10, 11}; a piece of flag information is used to indicate whether the input value is a signed value or an unsigned value.
Therefore, to solve the problem that unsigned values overflow more easily, and because computers perform the convolution operation in binary anyway, in the embodiments of the present application all values input into the data selector 12 are treated as signed values; that is, the input value fed into the data selector 12 is a signed value, which avoids the overflow problem.
Further, in order to ensure the accuracy of the output result, if the input value is an unsigned value (which can be determined, for example, by the flag information), then after obtaining the multiplication result, the adder 13 outputs the operation result of the convolution operation according to the multiplication result and an offset value; the offset value indicates the conversion relationship between signed and unsigned values. In one example, if FS is a signed value, FU is an unsigned value, and a is the number of quantization bits of the input value, then FU = FS + 2^(a-1).
The offset value can be determined according to the number of quantization bits of the input value and the weight value. Since both are known in advance, the offset value can be computed offline and used directly in subsequent convolution operations without being recomputed, which helps speed up the convolutional neural network and the processing apparatus.
In one embodiment, whether to output a signed or an unsigned value can be chosen according to the actual application. To output signed values, an activation function that produces signed values, such as PReLU, can be used; to output unsigned values, an activation function that produces unsigned values, such as ReLU, can be used. In one example, if the weight value is quantized with 1 bit, its quantized values lie in {-1, 1}; since these quantized values are signed, the offset value determined from them is also signed. If the input value is unsigned, the operation result that the adder 13 outputs from the multiplication result and the offset value is signed, so an activation function that outputs unsigned values, such as ReLU, can then be applied so that the final output is unsigned.
In one example, suppose the input value is quantized with 2 bits, so the signed quantization range is {-2, -1, 0, 1} and the unsigned range is {0, 1, 2, 3}, and the weight value is quantized with 1 bit over {-1, 1}. In a convolutional neural network, the activation values of the previous convolutional layer are the input of the current layer. Let the convolutional layer input be F of size C×H×W, the weights be K of size N×C×k×k, and the output be O of size N×H×W, where C is the number of input channels, N is the number of output channels, k×k is the size of a single convolution kernel, and H×W is the feature map size. The convolutional layer then computes: O_{m,l,n} = Σ_{i,j,c} K_{i,j,c,n} · F_{m+i-1,l+j-1,c} (1). For signed and unsigned values, let FS be the signed value and FU the unsigned value; with 2 quantization bits, FU = FS + 2 (2). Substituting (2) into (1) gives: O_{m,l,n} = Σ_{i,j,c} K_{i,j,c,n} · FU_{m+i-1,l+j-1,c} = Σ_{i,j,c} K_{i,j,c,n} · (FS_{m+i-1,l+j-1,c} + 2); hence O_{m,l,n} = Σ_{i,j,c} K_{i,j,c,n} · FS_{m+i-1,l+j-1,c} + 2 Σ_{i,j,c} K_{i,j,c,n}. The offset value is therefore 2 Σ_{i,j,c} K_{i,j,c,n}, which can be computed offline, thereby improving the operation speed of the processing apparatus.
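The derivation above can be checked numerically on a single dot product. The sketch below (with illustrative values, not from the patent) verifies that computing over signed values plus the offline offset 2·ΣK reproduces the unsigned result:

```python
a = 2                                   # quantization bits of the input value
unsigned_inputs = [0, 1, 2, 3]          # FU in {0, 1, 2, 3}
weights = [1, -1, 1, -1]                # 1-bit weights in {-1, 1}

# Direct computation over the unsigned values:
direct = sum(k * fu for k, fu in zip(weights, unsigned_inputs))

# Equivalent signed computation: FS = FU - 2**(a-1), plus an offset
# 2**(a-1) * sum(K) that depends only on the weights and can thus be
# computed offline, once per kernel.
signed_inputs = [fu - 2 ** (a - 1) for fu in unsigned_inputs]
offset = 2 ** (a - 1) * sum(weights)
via_offset = sum(k * fs for k, fs in zip(weights, signed_inputs)) + offset
# via_offset == direct
```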
In one embodiment, referring to FIG. 3, in another data processing apparatus provided by an embodiment of the present application, the processing apparatus further includes a decoder 14.
The data loading module 11 is specifically configured to read the compressed input value and the compressed weight value from a memory.
In one implementation, the data loading module 11 may be a DMA (Direct Memory Access) controller, which reads the compressed input value and the compressed weight value from an external memory.
The decoder 14 is configured to decompress the compressed input value and the compressed weight value respectively and input the results into the data selector 12. In this embodiment, the memory stores the compressed input value and the compressed weight value, which helps reduce the amount of data to be stored and makes comprehensive use of storage resources.
The data loading module 11 reads the compressed input value and the compressed weight value from the memory and inputs them into the decoder 14; the decoder 14 decompresses them respectively and inputs the decompressed input value and weight value into the data selector 12.
In one implementation, the two code streams indicating the compressed input value and the compressed weight value respectively can both be obtained by run-length encoding, which detects repeated sequences of bits or characters and replaces them with their counts.
If the number of quantization bits of the input value or the weight value is 1, the code streams corresponding to the compressed input value and the compressed weight value respectively indicate the run lengths of consecutive quantized values; the quantized values are the quantized input values and weight values.
In one embodiment, if the number of quantization bits of the input value or the weight value is 1, then in the corresponding code stream the run length of each quantized value is represented with a preset number of bits. The decoder 14 is specifically configured to: divide the code stream corresponding to the compressed input value or the compressed weight value into one or more sub-code streams according to the preset number of bits; and obtain the decompressed input values or weight values from the one or more sub-code streams and the quantized values corresponding to them.
The quantized value corresponding to each sub-code stream is determined by a preset correspondence, which maps the position of a sub-code stream within the code stream to a quantized value. In one example, if the number of quantization bits is 1, the quantization range is {-1, 1}; the correspondence may specify that sub-code streams in odd positions give the run length of -1 and those in even positions the run length of 1 (that is, the 1st, 3rd, …, (2n+1)-th sub-code streams encode runs of -1 and the 2nd, 4th, …, 2n-th encode runs of 1, with n an integer), or vice versa.
In an exemplary embodiment, with 1 quantization bit the quantization range is {-1, 1}; since computers operate in binary, the quantized value -1 can be represented by binary 0 and the quantized value 1 by binary 1, so the range {-1, 1} is represented in binary as {0, 1}. Suppose each run length is represented with 4 bits, fields in odd positions giving runs of 0 and fields in even positions runs of 1. Referring to Table 1, the original value occupies 32 bits: 8 consecutive 0s, 12 consecutive 1s, and 12 consecutive 0s. After encoding, the first sub-code stream is "1000", the run length of the 0s; the second is "1100", the run length of the 1s; the third is "1100", the run length of the following 0s. The encoded code stream is therefore "100011001100", or 8CC in hexadecimal, occupying 12 bits; compared with the 32-bit original, this is a 62.5% reduction, which helps save storage resources. When decoding, referring to Table 2, the code 8CC, i.e. "100011001100", is split into "1000", "1100", and "1100" according to the preset 4-bit width; applying the preset correspondence (odd positions are runs of 0, even positions runs of 1), this decodes to 8 consecutive 0s, 12 consecutive 1s, and 12 consecutive 0s.
Table 1
Table 2
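A behavioral sketch of this 1-bit decoding scheme (hypothetical Python, mirroring the Table 1/Table 2 example with 4-bit run-length fields whose position determines the value):

```python
def rle_decode_1bit(bitstream: str, run_bits: int = 4) -> list:
    """Decode a run-length stream for 1-bit quantized data.
    Fixed-width run-length fields alternate by position: fields in
    odd positions (1st, 3rd, ...) are runs of 0, and fields in even
    positions (2nd, 4th, ...) are runs of 1."""
    out = []
    fields = [bitstream[i:i + run_bits]
              for i in range(0, len(bitstream), run_bits)]
    for pos, field in enumerate(fields):
        run = int(field, 2)            # run length stored in this field
        out.extend([pos % 2] * run)    # value 0 at odd positions, 1 at even
    return out

# The 12-bit code 0x8CC from the example above:
decoded = rle_decode_1bit(format(0x8CC, "012b"))
# decoded == [0]*8 + [1]*12 + [0]*12  (8 zeros, 12 ones, 12 zeros)
```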
If the number of quantization bits of the input value or the weight value is greater than 1, the code stream corresponding to the compressed input value or the compressed weight value indicates each quantized value and its run length; the quantized value is the quantized input value or weight value.
In one embodiment, if the number of quantization bits of the input value or the weight value is greater than 1, then in the corresponding code stream each quantized value, together with its run length, is represented with a preset number of bits;
the decoder 14 is specifically configured to: divide the code stream corresponding to the compressed input value or the compressed weight value into one or more sub-code streams according to the preset number of bits; and, for each sub-code stream, obtain the quantized value from its designated bits and its run length from the remaining bits. The designated bits can be set according to the actual application scenario, and this embodiment of the present application places no restriction on them.
In an example, the number of quantization bits of the input value is 2; the quantization ranges of the signed value and the unsigned value are {-2, -1, 0, 1} and {0, 1, 2, 3} respectively. The input value is the activation value output by the previous layer of the convolutional neural network, and the activation values of each layer contain long runs of identical, contiguously stored values. Assume that in the code stream corresponding to the input value, each quantized value and its run length are represented by 8 bits: the first 6 bits count the number of consecutive occurrences, and the last 2 bits store the actual value. For an unsigned value with 2 quantization bits, the binary codes 00, 01, 10 and 11 represent 0, 1, 2 and 3; for a signed value, they represent 0, 1, -2 and -1. Referring to Table 3 and taking the unsigned case as an example, the original sequence has 32 numbers: 6 consecutive 2s, 7 consecutive 1s, 7 consecutive 3s and 12 consecutive 0s. At 2 bits per number, the raw sequence requires 64 bits. After encoding, the first sub-stream is "00011010", in which the first 6 bits "000110" give the run length of the last 2 bits "10"; the second sub-stream is "00011101", the third is "00011111", and the fourth is "00110000". The encoded stream is therefore "00011010000111010001111100110000", or 1A1D1F30 in hexadecimal, occupying 32 bits; compared with the 64-bit original, the compression ratio is 50%, which helps save storage resources. When decoding, referring to Table 4, the code 1A1D1F30, i.e. "00011010000111010001111100110000", is divided according to the preset 8-bit width into "00011010", "00011101", "00011111" and "00110000"; in each sub-stream the first 6 bits give the run length and the last 2 bits give the value, so decoding yields 6 consecutive 2s, 7 consecutive 1s, 7 consecutive 3s and 12 consecutive 0s.
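The run-length scheme above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: it assumes the 8-bit sub-stream layout described above (upper 6 bits hold the run length, lower 2 bits the quantized value), and all function names are illustrative.

```python
def rle_encode(values):
    """Pack runs of 2-bit values into 8-bit sub-streams (6-bit count, 2-bit value)."""
    out = []
    i = 0
    while i < len(values):
        run = 1
        # A 6-bit counter caps each run at 63 repetitions.
        while i + run < len(values) and values[i + run] == values[i] and run < 63:
            run += 1
        out.append((run << 2) | (values[i] & 0b11))  # 6-bit count, then 2-bit value
        i += run
    return bytes(out)

def rle_decode(stream):
    """Recover the value sequence from the 8-bit sub-streams."""
    values = []
    for byte in stream:
        count = byte >> 2      # upper 6 bits: run length
        value = byte & 0b11    # lower 2 bits: quantized value
        values.extend([value] * count)
    return values

original = [2] * 6 + [1] * 7 + [3] * 7 + [0] * 12   # 32 values, 64 bits raw
encoded = rle_encode(original)
print(encoded.hex())                                # -> '1a1d1f30' (32 bits)
assert rle_decode(encoded) == original
```

The four encoded bytes occupy 32 bits against the 64-bit raw sequence, matching the 50% compression ratio stated above.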
Table 3

Table 4
In an embodiment, referring to FIG. 4, the processing device further includes an encoder 15. The adder 13 is configured to transmit the operation result of the convolution operation to the encoder 15; after the encoder 15 encodes the operation result, it stores the encoded result in the memory, which helps reduce the amount of data that needs to be stored and makes comprehensive use of storage resources. In an example, run-length encoding may be used to encode the operation result of the convolution operation.
In an embodiment, referring to FIG. 5, the processing device further includes a control module 16. The control module 16 is configured to receive a convolution operation instruction and forward it to the data loading module 11, the decoder 14, the data selector 12 and the adder 13. According to the convolution operation instruction, the data loading module 11 loads the compressed input value and the compressed weight value and feeds them to the decoder 14; the decoder 14 decompresses the compressed input value and the compressed weight value respectively and feeds the results to the data selector 12; based on the convolution operation instruction, the data selector 12 outputs the multiplication result in the convolution operation according to the input value and the weight value, and sends the multiplication result to the adder 13; based on the convolution operation instruction, the adder 13 outputs the operation result of the convolution operation according to the multiplication result.
In an example, the control module may be a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
Correspondingly, referring to FIG. 6, an embodiment of the present application further provides a data processing method. The method is applied to a processing device that performs a convolution operation, the processing device including a data selector. The method includes:

In step S101, the input value and the weight value are loaded.

In step S102, the input value and the weight value are input into the data selector, and the data selector is used to obtain the multiplication result in the convolution operation.

In step S103, the operation result of the convolution operation is obtained based on the multiplication result.

In this embodiment, the multiplication result in the convolution operation is obtained by a data selector instead of a multiplier, which further reduces the complexity of the operation, thereby helping to increase the operation speed of the neural network and meet the real-time requirements of actual products.
In an embodiment, the input value and the weight value are quantized results.

In an embodiment, the number of quantization bits of the input value and/or the weight value is less than 8.

In this embodiment, the convolution operation is performed on the quantized input value and the quantized weight value; compared with the unquantized case, this consumes fewer computing resources, requires less bandwidth, and reduces power consumption.
In an embodiment, inputting the input value and the weight value into the data selector and using the data selector to obtain the multiplication result in the convolution operation includes: negating the input value to obtain the inverse of the input value; and inputting the input value, the inverse of the input value and the weight value into the data selector, and using the data selector to obtain the multiplication result in the convolution operation.

In an embodiment, when the number of quantization bits of the weight value is 1, using the data selector to obtain the multiplication result in the convolution operation includes: causing the data selector to select and output the input value or the inverse of the input value according to the input weight value.

In an embodiment, the input value and/or the inverse of the input value is represented in two's-complement form.
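The selector-based multiply can be sketched as follows. With 1-bit weights interpreted as +1/-1, the "product" is simply the input value or its two's-complement negation, so a 2:1 multiplexer can stand in for a multiplier. The 8-bit operand width and the encoding of the weight bit here are assumptions for illustration, not taken from the patent.

```python
BITS = 8  # assumed operand width for the two's-complement representation

def twos_complement_negate(x, bits=BITS):
    """Two's-complement negation: invert all bits and add one, masked to width."""
    return (~x + 1) & ((1 << bits) - 1)

def to_signed(x, bits=BITS):
    """Interpret a bit pattern as a signed two's-complement integer."""
    return x - (1 << bits) if x >= (1 << (bits - 1)) else x

def mux_multiply(input_bits, weight_bit):
    """Assumed convention: weight_bit == 1 selects +input, 0 selects -input."""
    return input_bits if weight_bit else twos_complement_negate(input_bits)

x = 5
assert to_signed(mux_multiply(x, 1)) == 5    # weight +1: pass input through
assert to_signed(mux_multiply(x, 0)) == -5   # weight -1: select the negation
```

In hardware, both the pass-through and the negated operand can be prepared in advance, so the per-element cost reduces to one select operation rather than a full multiply.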
In an embodiment, the data selector includes a gate circuit.
In an embodiment, the input values include signed values and unsigned values, and the input value fed into the data selector is a signed value.

If the input value is an unsigned value, obtaining the operation result of the convolution operation based on the multiplication result includes: obtaining the operation result of the convolution operation based on the multiplication result and an offset value, the offset value indicating the conversion relationship between signed and unsigned values.
In an embodiment, the offset value is determined according to the number of quantization bits of the input value and the weight value.

In an embodiment, the offset value is obtained offline.
In an embodiment, obtaining the input value and the weight value includes: reading the compressed input value and the compressed weight value from a memory; and decompressing the compressed input value and the compressed weight value respectively to obtain the input value and the weight value.

In this embodiment, the memory stores the compressed input value and the compressed weight value, which helps reduce the amount of data that needs to be stored and makes comprehensive use of storage resources.
In an embodiment, the two code streams respectively indicating the compressed input value and the compressed weight value are both obtained by run-length encoding.

In an embodiment, if the number of quantization bits of the input value or the weight value is 1, the code streams respectively corresponding to the compressed input value and the compressed weight value indicate the number of consecutive occurrences of the quantized value; the quantized value is the value of the input value or the weight value after quantization.

In an embodiment, if the number of quantization bits of the input value or the weight value is greater than 1, the code streams respectively corresponding to the compressed input value or the compressed weight value indicate the quantized value and the number of consecutive occurrences of the quantized value; the quantized value is the value of the input value or the weight value after quantization.

In an embodiment, if the number of quantization bits of the input value or the weight value is 1, in the code streams respectively corresponding to the input value or the weight value, the number of consecutive occurrences of each quantized value is represented by a preset number of bits.

Decompressing the compressed input value and the compressed weight value respectively includes: dividing the code streams respectively corresponding to the compressed input value or the compressed weight value according to the preset number of bits to obtain one or more sub-streams; and obtaining the decompressed input value or the decompressed weight value according to the one or more sub-streams and the quantized values corresponding to the sub-streams.
In an embodiment, the quantized value corresponding to a sub-stream is determined according to a preset correspondence, the correspondence indicating the quantized values corresponding to the different positions of the sub-streams in the code stream.
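For the 1-bit case, a sub-stream carries only a run length, and the position-based correspondence tells the decoder which of the two quantized values each run holds. The sketch below assumes the simplest such correspondence, alternating between the two values starting from a preset first value; the patent leaves the correspondence unspecified, so this rule, the one-byte sub-stream width, and the names are all assumptions.

```python
def decode_1bit(stream, first_value=0):
    """Each byte is one run length; values alternate by sub-stream order."""
    values, current = [], first_value
    for run_length in stream:               # iterating bytes yields ints
        values.extend([current] * run_length)
        current ^= 1    # assumed correspondence: flip between the two values
    return values

# Runs of 3, 2 and 4 decode to three zeros, two ones, then four zeros.
assert decode_1bit(bytes([3, 2, 4])) == [0, 0, 0, 1, 1, 0, 0, 0, 0]
```

Because no value bits are stored, every preset-width sub-stream is devoted entirely to the run length, which is what makes the 1-bit case cheaper than the multi-bit case described next.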
In an embodiment, if the number of quantization bits of the input value or the weight value is greater than 1, in the code streams respectively corresponding to the input value or the weight value, each quantized value and its number of consecutive occurrences are represented by a preset number of bits.

Decompressing the compressed input value and the compressed weight value respectively includes: dividing the code streams respectively corresponding to the compressed input value or the compressed weight value according to the preset number of bits to obtain one or more sub-streams; and, for each sub-stream, obtaining the quantized value from designated bits of the sub-stream and the number of consecutive occurrences of the quantized value from the other bits.
The data processing device can be deployed on common platforms such as ARM, DSP and FPGA, and can be widely used in devices such as unmanned aerial vehicles, handheld devices, robots and IoT devices.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory including instructions executable by a processor of a device to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

A non-transitory computer-readable storage medium is provided such that, when the instructions in the storage medium are executed by the processor of a terminal, the terminal is enabled to execute the above method.
An embodiment of the present application provides a movable platform, including:

a body;

a power system, arranged on the body and configured to provide power for the movable platform; and

a data processing device as described above. The data processing device can be used for tasks such as target detection and classification.
It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply that any such actual relationship or order exists between these entities or operations. The terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device comprising that element.

The methods and devices provided by the embodiments of the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present application, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the present application.
Claims (36)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2020/118324 WO2022061867A1 (en) | 2020-09-28 | 2020-09-28 | Data processing method and apparatus, and computer-readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022061867A1 true WO2022061867A1 (en) | 2022-03-31 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170344876A1 (en) * | 2016-05-31 | 2017-11-30 | Samsung Electronics Co., Ltd. | Efficient sparse parallel winograd-based convolution scheme |
| CN107527090A (en) * | 2017-08-24 | 2017-12-29 | 中国科学院计算技术研究所 | Processor and processing method applied to sparse neural network |
| US20180089562A1 (en) * | 2016-09-28 | 2018-03-29 | SK Hynix Inc. | Operation apparatus and method for convolutional neural network |
| CN110750231A (en) * | 2019-09-27 | 2020-02-04 | 东南大学 | Convolution neural network-oriented biphase coefficient adjustable analog multiplication circuit |
| CN110785778A (en) * | 2018-08-14 | 2020-02-11 | 深圳市大疆创新科技有限公司 | Neural network processing device based on pulse array |
Legal Events
| Code | Title | Description |
|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20954718; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 20954718; Country of ref document: EP; Kind code of ref document: A1 |