
CN111612137A - Optimization method and system of convolutional neural network based on soft threshold ternary parameters - Google Patents


Info

Publication number
CN111612137A
CN111612137A (application CN202010456560.3A)
Authority
CN
China
Prior art keywords
neural network
convolution
convolutional neural
layer
operation layer
Prior art date
Legal status
Pending
Application number
CN202010456560.3A
Other languages
Chinese (zh)
Inventor
程健
许伟翔
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202010456560.3A priority Critical patent/CN111612137A/en
Publication of CN111612137A publication Critical patent/CN111612137A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

The invention belongs to the field of data processing, and in particular relates to a convolutional neural network optimization method and system based on soft-threshold ternary parameters, aiming to solve the problem of optimizing and accelerating convolutional neural networks. The method comprises the following steps: splitting each convolutional layer of the convolutional neural network into two parallel convolutional layers of the same size; binarizing the two convolutional layers under the constraint that their scale coefficients are equal, and then adding them element-wise to obtain ternary parameters. The invention achieves optimized acceleration and compression of deep convolutional neural networks.

Description

Optimization method and system for a convolutional neural network based on soft-threshold ternary parameters

Technical Field

The invention belongs to the field of data processing, and in particular relates to a convolutional neural network optimization method and system based on soft-threshold ternary parameters.

Background

In recent years, deep convolutional neural networks have made major breakthroughs in fields such as computer vision, speech processing, and machine learning, significantly improving the performance of machine algorithms in tasks such as image classification, object detection, and speech recognition, and they have been widely applied in industries such as the Internet and video surveillance.

Training a deep convolutional neural network means learning and adjusting the network parameters on large-scale datasets containing manual annotations. In general, high-capacity, high-complexity deep convolutional networks can learn the data more comprehensively and thus achieve better performance. However, as the number of layers and parameters grows, the computation and storage costs increase sharply, so at present the training and testing of convolutional neural networks can mostly only be performed on high-performance computing clusters.

On the other hand, mobile Internet technology has advanced rapidly in recent years and is increasingly used in everyday life. In mobile-Internet scenarios, the devices used by users, such as mobile phones and tablet computers, have very limited computing and storage capabilities. Although deep convolutional neural networks can be trained on computing clusters, in mobile deployments the inference stage of the network model still has to run on the mobile device itself, which raises two challenges: how to reduce the inference time of a convolutional neural network, and how to compress the storage overhead of the network model.

For the acceleration and compression of convolutional neural networks, several effective ternarization algorithms have been proposed. These algorithms set two fixed thresholds, either by solving an optimization problem or by gradient learning, and achieve ternarization through these fixed thresholds; such methods can be uniformly classified as "hard threshold" ternarization. However, because computing the optimal thresholds is difficult, these hard-threshold algorithms often take a long time. How to ternarize deep convolutional neural networks efficiently therefore remains an open problem.

Summary of the Invention

In order to solve the above problem in the prior art, namely the problem of optimizing and accelerating convolutional neural networks, a first aspect of the present invention provides a convolutional neural network optimization method based on soft-threshold ternary parameters, comprising two parts, constructing an intermediate network and optimizing the network:

Constructing the intermediate network comprises the following steps:

Step S100, splitting each convolution operation layer of the original convolutional neural network into two parallel sub-layers with the same convolution kernel size, forming a differentiated convolution operation layer; the convolution operation layers include convolutional layers and fully connected layers;

Step S200, binarizing the sub-layers in each differentiated convolution operation layer under the constraint of equal scale coefficients, to obtain an intermediate network;

Network optimization comprises the following steps:

training the intermediate network based on training data to obtain an optimized intermediate network;

adding the binarized weights of the two sub-layers in each differentiated convolution operation layer of the optimized intermediate network to obtain a ternary weight, yielding a ternarized convolutional neural network.

In some preferred embodiments, the two sub-layers are arranged in parallel and have the same shape and size as the original convolution kernel.

In some preferred embodiments, the convolution operation layers comprise layers 2 to L-1 of an L-layer convolutional neural network, where L is the total number of layers of the convolutional neural network.

In some preferred embodiments, the output of a differentiated convolution operation layer equals the sum of the layer input convolved separately with the convolution kernels of its two parallel sub-layers.

In some preferred embodiments, the binarization of the sub-layers in each differentiated convolution operation layer in step S200 is performed as follows:

solving an optimization problem that minimizes the quantization error between the full-precision weights W1, W2 and the binarized weights B1, B2, where W1, W2 are the full-precision weights of the two sub-layers of a differentiated convolution operation layer and B1, B2 are the binarized weights of the two sub-layers.

In some preferred embodiments, the optimization problem is solved under the constraint that the scale coefficients α1, α2 of the binarized weights of the two sub-layers in the differentiated convolution operation layer are equal.

In some preferred embodiments, in the ternarized convolutional neural network an activation quantization layer Ternarize() is placed before each convolution operation layer, ternarizing the layer input Xi:

X̂i = Ternarize(Xi), X̂i ∈ {-1, 0, +1}

where X̂i is the output of the ternarization.
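For concreteness, a minimal sketch of such an activation ternarizer is given below; the symmetric 0.5 threshold and the function name are illustrative assumptions, since the exact mapping used by Ternarize() is defined by the formula in the original filing.

```python
import torch

def ternarize_activation(x: torch.Tensor, thresh: float = 0.5) -> torch.Tensor:
    # +1 where x > thresh, -1 where x < -thresh, 0 elsewhere
    # (illustrative threshold; not the patent's exact mapping)
    return torch.sign(x) * (x.abs() > thresh).to(x.dtype)
```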

In another aspect, the present invention provides a convolutional neural network optimization system based on soft-threshold ternary parameters, comprising an intermediate-network construction module and a network optimization module;

the intermediate-network construction module is configured to construct the intermediate network by the following method:

Step S100, splitting each convolution operation layer of the original convolutional neural network into two parallel sub-layers with the same convolution kernel size, forming a differentiated convolution operation layer; the convolution operation layers include convolutional layers and fully connected layers;

Step S200, binarizing the sub-layers in each differentiated convolution operation layer under the constraint of equal scale coefficients, to obtain an intermediate network;

the network optimization module is configured to train the intermediate network based on training data to obtain an optimized intermediate network, and to add the binarized weights of the two sub-layers in each differentiated convolution operation layer of the optimized intermediate network to obtain a ternary weight, yielding a ternarized convolutional neural network.

In a third aspect, the present invention provides a storage device in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above convolutional neural network optimization method based on soft-threshold ternary parameters.

In a fourth aspect, the present invention provides a processing device comprising a processor adapted to execute programs and a storage device adapted to store a plurality of programs, the programs being adapted to be loaded and executed by the processor to implement the above convolutional neural network optimization method based on soft-threshold ternary parameters.

Beneficial effects of the present invention:

The convolutional neural network optimization method based on soft-threshold ternary parameters of the present invention splits each convolution kernel of the convolutional neural network into two parallel convolutional layers of the same size, binarizes the two convolutional layers under the constraint of equal scale coefficients, and then adds them element-wise to obtain ternary parameters; the original floating-point convolutions are replaced by bit operations, so that deep convolutional neural networks can be accelerated and compressed.

Brief Description of the Drawings

Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:

Fig. 1 is a schematic flowchart of a convolutional neural network optimization method based on soft-threshold ternary parameters according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of image classification with a deep convolutional neural network;

Fig. 3 is a schematic diagram of the convolution operation of a deep convolutional neural network during image classification;

Fig. 4 is a schematic diagram of the process of adding two parallel binarized convolution kernels to obtain a ternarized convolution kernel in an embodiment of the present invention.

Detailed Description

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the related invention, not to limit it. It should also be noted that, for convenience of description, only the parts related to the invention are shown in the drawings.

It should be noted that, where no conflict arises, the embodiments of the present application and the features of the embodiments may be combined with one another.

A convolutional neural network optimization method based on soft-threshold ternary parameters according to the present invention, as shown in Fig. 1, comprises two parts, constructing an intermediate network and optimizing the network:

Constructing the intermediate network comprises the following steps:

Step S100, splitting each convolution operation layer of the original convolutional neural network into two parallel sub-layers with the same convolution kernel size, forming a differentiated convolution operation layer; the convolution operation layers include convolutional layers and fully connected layers;

Step S200, binarizing the sub-layers in each differentiated convolution operation layer under the constraint of equal scale coefficients, to obtain an intermediate network;

Network optimization comprises the following steps:

training the intermediate network based on training data to obtain an optimized intermediate network;

adding the binarized weights of the two sub-layers in each differentiated convolution operation layer of the optimized intermediate network to obtain a ternary weight, yielding a ternarized convolutional neural network.

To describe the convolutional neural network optimization method based on soft-threshold ternary parameters of the present invention more clearly, the steps of one embodiment of the method are detailed below with reference to the accompanying drawings, taking a deep convolutional neural network as an example.

Fig. 2 illustrates the process of using a deep convolutional neural network for image classification. The convolutional neural network contains multiple convolutional layers and multiple fully connected layers; the input image is processed by the convolutional and fully connected layers to obtain the classification result.

Fig. 3 illustrates the convolution operation of a convolutional layer in a deep convolutional neural network during image classification. Each convolutional layer has a set of convolution kernels that together constitute the weight tensor of the layer; for example, a kernel may be set to 3×3. The layer convolves its input feature map with the convolution kernels (that is, at every position of the input feature map, each kernel is multiplied element-wise with the corresponding convolution region and the products are summed) to obtain the output feature map of the layer.

A convolutional neural network optimization method based on soft-threshold ternary parameters according to one embodiment of the present invention comprises constructing an intermediate network and optimizing the network.

1. Constructing the intermediate network

Step S100, splitting each convolution operation layer of the original convolutional neural network into two parallel sub-layers with the same convolution kernel size, forming a differentiated convolution operation layer; the convolution operation layers include convolutional layers and fully connected layers.

For a given convolutional neural network, every layer with a convolution operation except the first and the last layer (including convolutional layers and fully connected layers, referred to as convolution operation layers) w is split into two parallel convolutional layers (sub-layers) w1 and w2 whose shapes are exactly the same as the original. That is, if the total number of layers of the convolutional neural network is L, the convolution operation layers comprise layers 2 to L-1.

In this embodiment, the two sub-layers are arranged in parallel and have the same shape and size as the original convolution kernel; the output of the resulting differentiated convolution operation layer equals the sum of the layer input convolved separately with the convolution kernels of the two parallel sub-layers.
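As an illustration of this splitting step, here is a minimal PyTorch-style sketch; the module name SplitConv2d, the halved-kernel initialization, and the omission of bias, groups, and dilation are assumptions made for brevity, not prescribed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitConv2d(nn.Module):
    """Replace one conv layer w by two parallel sub-layers w1, w2 of the
    same kernel shape; the layer output is the sum of the two convolutions."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        # Initialized to half the original kernel so that w1 + w2 initially
        # reproduces it (an illustrative choice).
        self.w1 = nn.Parameter(conv.weight.detach().clone() / 2)
        self.w2 = nn.Parameter(conv.weight.detach().clone() / 2)
        self.stride, self.padding = conv.stride, conv.padding

    def forward(self, x):
        y1 = F.conv2d(x, self.w1, stride=self.stride, padding=self.padding)
        y2 = F.conv2d(x, self.w2, stride=self.stride, padding=self.padding)
        return y1 + y2  # output = sum of the two parallel convolutions
```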

Step S200, binarizing the sub-layers in each differentiated convolution operation layer under the constraint of equal scale coefficients, to obtain the intermediate network.

In this step, ternary weights T are used to approximate the full-precision weights; that is, each convolutional layer of the original network is fitted by a ternary weight T ∈ {-1, 0, +1} obtained by solving an optimization problem.

(1) Binarize the full-precision weights W1 and W2 of the two sub-layers obtained in step S100, i.e., fit the full-precision weights by the product of binary weights B1, B2 ∈ {-1, +1} and scale coefficients α1, α2 ∈ R: W1 ≈ α1B1, W2 ≈ α2B2.

(2) Add a necessary constraint to the weight-fitting problem in (1), namely α1 = α2, denoted uniformly by α. The purpose is to make the sum of the two parallel weights a ternary representation, i.e., α1B1 + α2B2 = α(B1 + B2) takes only three values.

(3) Solve the weight-fitting problem with the constraint of (2) so as to binarize the sub-layers in each differentiated convolution operation layer; specifically, solve the optimization problem of minimizing the quantization error between the full-precision weights W1, W2 and the binarized weights B1, B2.

This can be modeled as the following minimization problem:

(α*, B1*, B2*) = argmin ||W1 - α1B1||^2 + ||W2 - α2B2||^2, subject to α1 = α2 = α.

Expanding the objective gives

||W1||^2 + ||W2||^2 - 2α(W1ᵀB1 + W2ᵀB2) + α^2(B1ᵀB1 + B2ᵀB2),

where ||W1||^2 + ||W2||^2 is a constant term that does not affect the optimization. Since B1, B2 ∈ {-1, +1}^(n×c×h×w), the term B1ᵀB1 + B2ᵀB2 = 2nchw is also a constant. The optimization problem therefore attains its minimum when each binary weight has the same sign as the corresponding full-precision weight:

B1* = sign(W1), B2* = sign(W2),

α* = (Σ_i (||W1^i||_1 + ||W2^i||_1)) / (2Nchw),

where W1 and W2 are the two parallel full-precision convolutional layers described above, W1^i and W2^i are the i-th convolution kernels of the corresponding layers, B1* and B2* are the two parallel binarized convolutional layers, N is the number of convolution kernels in the layer, and α* is the scale coefficient described above.
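The closed-form solution above can be sketched in a few lines; a layer-wise shared scale is assumed, and note that torch.sign maps exact zeros to 0 (negligible for continuous weights).

```python
import torch

def binarize_pair(w1: torch.Tensor, w2: torch.Tensor):
    """Binarize two parallel kernels under the shared-scale constraint
    alpha1 == alpha2 == alpha, following the closed-form solution above."""
    b1, b2 = torch.sign(w1), torch.sign(w2)        # B* = sign(W)
    n = w1.numel()                                  # N*c*h*w elements
    alpha = (w1.abs().sum() + w2.abs().sum()) / (2 * n)
    return alpha, b1, b2

# alpha * (b1 + b2) then takes values in {-2*alpha, 0, +2*alpha},
# i.e. a scaled ternary representation.
```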

In this embodiment, the sign() function in B* = sign(W) is not differentiable at 0, so the binarized weights cannot be updated directly by gradient backpropagation. Instead, the gradient of the corresponding full-precision weight is used as an approximation of the gradient of the binarized weight: with Ŵ = α·sign(W) the approximation of the full-precision weight W, the gradient of the loss function l with respect to the i-th weight Wi is ∂l/∂Wi = (∂l/∂Ŵi)(∂Ŵi/∂Wi). For the non-differentiable sign() function, the approximation ∂sign(Wi)/∂Wi ≈ 1{|Wi| ≤ 1} is used, where 1{|Wi| ≤ 1} equals 1 when |Wi| ≤ 1 and 0 otherwise. Substituting the two parallel binary weights B1, B2 from (1) into the above gives the approximations Ŵ1 = αB1 and Ŵ2 = αB2 of the full-precision weights. Since the scale coefficient α in (3) depends on both W1 and W2, computing the gradient of the i-th convolution kernel W1^i of W1 must also take into account the influence of the other kernels W1^j and W2^j, where W1^j is a different convolution kernel in the same convolutional layer as W1^i, and W2^j is a kernel in the parallel convolutional layer. The gradient of the loss function l with respect to the weight W1^i is then

∂l/∂W1^i = Σ_j [ (∂l/∂Ŵ1^j)(∂Ŵ1^j/∂W1^i) + (∂l/∂Ŵ2^j)(∂Ŵ2^j/∂W1^i) ],

where W1^j and W2^j are the kernels of the two parallel convolutional layers of (1).
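A sketch of this straight-through gradient approximation, assuming a PyTorch autograd implementation; the coupling of the gradient through the shared scale α across kernels is omitted for brevity.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Forward: sign(W). Backward: pass the gradient through where |W| <= 1,
    the approximation of d sign(W)/dW described above."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1).to(grad_out.dtype)

# usage: b1 = BinarizeSTE.apply(w1)
```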

2. Network optimization

Train the intermediate network based on training data to obtain an optimized intermediate network; add the binarized weights of the two sub-layers in each differentiated convolution operation layer of the optimized intermediate network to obtain a ternary weight, yielding a ternarized convolutional neural network.

In the ternarized convolutional neural network of this embodiment, an activation quantization layer Ternarize() is placed before each convolution operation layer, ternarizing the layer input Xi:

X̂i = Ternarize(Xi), X̂i ∈ {-1, 0, +1}

where X̂i is the output of the ternarization.

Ternarizing the activation values allows the convolution between ternary weights and ternary activations to be replaced by low-energy bit operations.

After training, a network model consisting of two parallel binary weights per layer is obtained. Adding the two parallel binary weights B1 and B2 gives the ternary weight T, which is then convolved with the ternarized input X to obtain the output Y:

Y = T ⊛ X = (B1 + B2) ⊛ X

where ⊛ denotes the convolution operation.

Fig. 4 illustrates, using two-dimensional convolutions, the process of adding two binary convolution kernels to obtain a ternary convolution kernel. The upper half of Fig. 4 shows the two parallel binary weights being convolved with the input: the input is convolved with the binary kernels B1 and B2 to obtain the intermediate results "output 1" and "output 2", which are added to give the final output. The lower half shows the summed ternary weight being convolved with the input. The two results are equal. When deploying the network model, only the ternary convolution kernel in the lower half of Fig. 4 needs to be kept.
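Because convolution is linear in its kernel, the equivalence shown in Fig. 4 can be checked numerically; the shapes below are chosen arbitrarily for illustration.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)                     # stand-in for the input
b1 = torch.sign(torch.randn(4, 3, 3, 3))        # parallel binary kernel B1
b2 = torch.sign(torch.randn(4, 3, 3, 3))        # parallel binary kernel B2

y_parallel = F.conv2d(x, b1) + F.conv2d(x, b2)  # upper half of Fig. 4
y_ternary = F.conv2d(x, b1 + b2)                # lower half: T = B1 + B2
assert torch.allclose(y_parallel, y_ternary, atol=1e-5)
```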

For convolutional and fully connected layers, the convolution between the ternarized weights and activations can be replaced by bit operations, which significantly reduces the computational overhead and increases the running speed.

In the example of the present invention, the weights and activations of the deep convolutional neural network are ternarized: both are converted from 32-bit floating-point numbers to three integer values, and the weights are stored in a 2-bit format to compress the network model. At the same time, the convolution, originally composed of floating-point multiply-add operations, is replaced by bit operations, accelerating the forward inference of the network.
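One possible 2-bit storage layout is sketched below; the code assignment {-1, 0, +1} → {0b00, 0b01, 0b10} and the four-values-per-byte packing are illustrative assumptions, as the patent only states that weights are stored in 2 bits.

```python
import numpy as np

def pack_ternary(t: np.ndarray) -> np.ndarray:
    """Pack a ternary array (values in {-1, 0, +1}) four values per byte."""
    codes = (t.astype(np.int8) + 1).astype(np.uint8).ravel()  # -1,0,+1 -> 0,1,2
    codes = np.pad(codes, (0, (-len(codes)) % 4)).reshape(-1, 4)
    return (codes[:, 0] | (codes[:, 1] << 2) |
            (codes[:, 2] << 4) | (codes[:, 3] << 6)).astype(np.uint8)
```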

The method provided by the present invention achieves acceleration and compression of deep convolutional neural networks. One of its advantages is that it provides a soft-threshold computation scheme, unlike previous ternarization methods that manually set hard thresholds. In previous ternarization schemes, a threshold Δ must be set manually: for a given full-precision weight X, X is quantized to 0 when -Δ < X < Δ, to +1 when X > Δ, and to -1 when X < -Δ. The manually set hard threshold Δ adds an extra constraint to the ternary optimization problem and is one of the reasons for the low accuracy of such methods. The present invention is no longer constrained by a hard threshold and automatically determines which elements of the ternary quantization become 0: at positions where the two parallel convolution kernels hold the opposite values +1 and -1, the sum automatically introduces a 0. Compared with previous hard-threshold schemes, the soft-threshold ternarization scheme of the present invention relaxes the constraints and enlarges the search space, so its model accuracy on classification tasks is far higher than that of previous hard-threshold schemes.
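For contrast, a sketch of the conventional hard-threshold scheme described above, with a manually chosen threshold Δ (function name and default value are illustrative):

```python
import torch

def hard_ternarize(x: torch.Tensor, delta: float = 0.05) -> torch.Tensor:
    t = torch.zeros_like(x)   # -delta < X < delta  -> 0
    t[x > delta] = 1.0        #  X >  delta         -> +1
    t[x < -delta] = -1.0      #  X < -delta         -> -1
    return t
```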

A fully connected layer can be regarded as a special convolutional layer, so fully connected layers also have the above properties. For convolutional and fully connected layers, the ternarized weights require far less storage than the original weights before processing by the embodiment of the present invention, and the computational complexity of convolution is greatly reduced; the storage overhead of the network weights and the running time of the convolutional neural network are therefore significantly reduced, improving the running speed.

The method of the present invention is briefly illustrated below, taking the commonly used ResNet18 as an example:

obtain a ResNet18 deep convolutional neural network as applied in image classification;

process the ResNet18 deep convolutional neural network with the method provided by the above embodiments of the present invention to obtain a ResNet18 network with two parallel convolution kernels in each layer;

during training based on gradient backpropagation, perform binary quantization on the convolutional layers of the ResNet18 network that has two parallel convolution kernels in each layer;

after training, add the two parallel binary convolution kernels of the above ResNet18 network to obtain a ternarized ResNet18 deep convolutional neural network.
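Assuming a torchvision ResNet18 and the SplitConv2d sketch shown after step S100, the model-surgery step of this workflow might look as follows (helper name hypothetical; the stem convolution is kept full-precision, and ResNet18's last layer, the fully connected classifier, is likewise left untouched here):

```python
import torch.nn as nn
from torchvision.models import resnet18

def split_convs(model: nn.Module) -> nn.Module:
    convs = [(name, m) for name, m in model.named_modules()
             if isinstance(m, nn.Conv2d)]
    for name, m in convs[1:]:                    # skip the first conv layer
        parent = (model.get_submodule(name.rsplit('.', 1)[0])
                  if '.' in name else model)
        setattr(parent, name.rsplit('.', 1)[-1], SplitConv2d(m))
    return model

model = split_convs(resnet18())
```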

Tests show that the storage space occupied by the ternarized ResNet18 deep convolutional neural network processed by the method of this embodiment is reduced by a factor of at least 16. Its test accuracy on the large-scale image classification task ImageNet reaches 66.21%, the highest accuracy among currently known ternary networks.

A convolutional neural network optimization system based on soft-threshold ternary parameters according to a second embodiment of the present invention comprises an intermediate-network construction module and a network optimization module;

the intermediate-network construction module is configured to construct the intermediate network by the following method:

Step S100, splitting each convolution operation layer of the original convolutional neural network into two parallel sub-layers with the same convolution kernel size, forming a differentiated convolution operation layer; the convolution operation layers include convolutional layers and fully connected layers;

Step S200, binarizing the sub-layers in each differentiated convolution operation layer under the constraint of equal scale coefficients, to obtain an intermediate network;

the network optimization module is configured to train the intermediate network based on training data to obtain an optimized intermediate network, and to add the binarized weights of the two sub-layers in each differentiated convolution operation layer of the optimized intermediate network to obtain a ternary weight, yielding a ternarized convolutional neural network.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the system described above and the related description may refer to the corresponding process in the foregoing method embodiments and are not repeated here.

It should be noted that the convolutional neural network optimization system based on soft-threshold ternary parameters provided by the above embodiment is only illustrated by the division of the above functional modules. In practical applications, the above functions may be allocated to different functional modules as required; that is, the modules or steps in the embodiments of the present invention may be decomposed or combined. For example, the modules of the above embodiment may be combined into one module, or further split into multiple sub-modules, to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the individual modules or steps and are not to be regarded as improper limitations of the present invention.

A storage device according to a third embodiment of the present invention stores a plurality of programs adapted to be loaded and executed by a processor to implement the above convolutional neural network optimization method based on soft-threshold ternary parameters.

A processing device according to a fourth embodiment of the present invention comprises a processor adapted to execute programs and a storage device adapted to store a plurality of programs, the programs being adapted to be loaded and executed by the processor to implement the above convolutional neural network optimization method based on soft-threshold ternary parameters.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the storage device and the processing device described above and the related descriptions may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium. When the computer program is executed by a central processing unit (CPU), the above functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described above may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for performing the operations of the present application may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by combinations of dedicated hardware and computer instructions.

The terms "first", "second", and the like are used to distinguish between similar objects, not to describe or indicate a particular order or sequence.

The term "comprising" or any other similar term is intended to cover a non-exclusive inclusion, such that a process, method, article, or device/apparatus comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or device/apparatus.

The technical solutions of the present invention have thus been described with reference to the preferred embodiments shown in the accompanying drawings; however, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after these changes or substitutions will fall within the protection scope of the present invention.

Claims (10)

1. A convolutional neural network optimization method based on soft-threshold ternary parameters, characterized by comprising two parts, constructing an intermediate network and optimizing the network:
constructing an intermediate network, comprising the steps of:
step S100, splitting each convolution operation layer of the original convolutional neural network into two parallel sub-layers with the same convolution kernel size, forming a differentiated convolution operation layer; the convolution operation layers comprise convolutional layers and fully connected layers;
step S200, binarizing the sub-layers in each differentiated convolution operation layer under the constraint of equal scale coefficients, to obtain an intermediate network;
network optimization, comprising the steps of:
training the intermediate network based on training data to obtain an optimized intermediate network;
and adding the binarized weights of the two sub-layers in each differentiated convolution operation layer of the optimized intermediate network to obtain a ternary weight, so as to obtain a ternarized convolutional neural network.
2. The convolutional neural network optimization method based on soft-threshold ternary parameters of claim 1, wherein the two sub-layers are arranged in parallel and have the same shape and size as the original convolution kernel.
3. The convolutional neural network optimization method based on soft-threshold ternary parameters of claim 1, wherein the convolution operation layers comprise layers 2 to L-1 of an L-layer convolutional neural network, where L is the total number of layers of the convolutional neural network.
4. The convolutional neural network optimization method based on soft-threshold ternary parameters of claim 1, wherein the output of the differentiated convolution operation layer equals the sum of the layer input convolved separately with the convolution kernels of the two parallel sub-layers.
5. The convolutional neural network optimization method based on soft-threshold ternary parameters of claim 1, wherein in step S200 the binarization of the sub-layers in each differentiated convolution operation layer is performed as follows:
solving an optimization problem that minimizes the quantization error between the full-precision weights W1, W2 and the binarized weights B1, B2, wherein W1, W2 are the full-precision weights of the two sub-layers in a differentiated convolution operation layer, and B1, B2 are the binarized weights of the two sub-layers in the differentiated convolution operation layer.
6. The convolutional neural network optimization method based on soft-threshold ternary parameters of claim 5, wherein the constraint for solving the optimization problem is that the scale coefficients α1, α2 of the binarized weights of the two sub-layers in the differentiated convolution operation layer are equal.
7. The convolutional neural network optimization method based on soft-threshold ternary parameters of claim 1, wherein each convolution operation layer of the ternarized convolutional neural network is preceded by an activation quantization layer Ternarize() that ternarizes the layer input Xi:
X̂i = Ternarize(Xi), X̂i ∈ {-1, 0, +1},
wherein X̂i is the output of the ternarization.
8. A convolutional neural network optimization system based on soft-threshold ternary parameters, characterized by comprising an intermediate-network construction module and a network optimization module;
the intermediate-network construction module is configured to construct an intermediate network by:
step S100, splitting each convolution operation layer of the original convolutional neural network into two parallel sub-layers with the same convolution kernel size, forming a differentiated convolution operation layer; the convolution operation layers comprise convolutional layers and fully connected layers;
step S200, binarizing the sub-layers in each differentiated convolution operation layer under the constraint of equal scale coefficients, to obtain an intermediate network;
the network optimization module is configured to train the intermediate network based on training data to obtain an optimized intermediate network, and to add the binarized weights of the two sub-layers in each differentiated convolution operation layer of the optimized intermediate network to obtain a ternary weight, so as to obtain a ternarized convolutional neural network.
9. A storage device having stored therein a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the convolutional neural network optimization method based on soft-threshold ternary parameters of any one of claims 1 to 7.
10. A processing device comprising a processor and a storage device; the processor adapted to execute programs; the storage device adapted to store a plurality of programs; characterized in that the programs are adapted to be loaded and executed by the processor to implement the convolutional neural network optimization method based on soft-threshold ternary parameters of any one of claims 1 to 7.
CN202010456560.3A 2020-05-26 2020-05-26 Optimization method and system of convolutional neural network based on soft threshold ternary parameters Pending CN111612137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010456560.3A CN111612137A (en) 2020-05-26 2020-05-26 Optimization method and system of convolutional neural network based on soft threshold ternary parameters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010456560.3A CN111612137A (en) 2020-05-26 2020-05-26 Optimization method and system of convolutional neural network based on soft threshold ternary parameters

Publications (1)

Publication Number Publication Date
CN111612137A true CN111612137A (en) 2020-09-01

Family

ID=72200917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010456560.3A Pending CN111612137A (en) 2020-05-26 2020-05-26 Optimization method and system of convolutional neural network based on soft threshold ternary parameters

Country Status (1)

Country Link
CN (1) CN111612137A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4459504A4 (en) * 2021-12-29 2025-04-23 Hangzhou Hikvision Digital Technology Co., Ltd. DATA PROCESSING METHOD, NETWORK TRAINING METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM


Similar Documents

Publication Publication Date Title
CN116166271B (en) Code generation method and device, storage medium and electronic equipment
JP6946572B2 (en) Accelerated quantized multiply-accumulate operation
US11604960B2 (en) Differential bit width neural architecture search
US11790212B2 (en) Quantization-aware neural architecture search
US12112538B2 (en) Systems and methods for improved video understanding
US20210089922A1 (en) Joint pruning and quantization scheme for deep neural networks
CN105719001B (en) Large-scale classification in neural networks using hashing
CN112633419B (en) Small sample learning method and device, electronic equipment and storage medium
CN114051615A (en) Dynamic processing element array expansion
CN113705811B (en) Model training method, device, computer program product and equipment
CN111382868A (en) Neural network structure search method and neural network structure search device
US11854536B2 (en) Keyword spotting apparatus, method, and computer-readable recording medium thereof
KR20220042455A (en) Method and apparatus for neural network model compression using micro-structured weight pruning and weight integration
CN110728350B (en) Quantization for machine learning models
CN110472725A (en) A kind of balance binaryzation neural network quantization method and system
CN114819140A (en) Model pruning method and device and computer equipment
CN111340221A (en) Method and device for sampling neural network structure
WO2023231954A1 (en) Data denoising method and related device
CN117273074A (en) Data processing method and device
CN118194954B (en) Training method and device for neural network model, electronic equipment and storage medium
CN112651420B (en) System and method for training image classification model and method for classifying images
CN110489955B (en) Image processing, device, computing device and medium applied to electronic equipment
CN111612137A (en) Optimization method and system of convolutional neural network based on soft threshold ternary parameters
US20230100740A1 (en) Interpretability analysis of image generated by generative adverserial network (gan) model
CN111626373A (en) Multi-scale widening residual error network, small target identification detection network and optimization method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200901)