
CN111027685A - Method for depth separable convolution and batch normalization fusion - Google Patents

Method for depth separable convolution and batch normalization fusion

Info

Publication number
CN111027685A
CN111027685A
Authority
CN
China
Prior art keywords
convolution
batch normalization
parameters
pointwise
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911321112.6A
Other languages
Chinese (zh)
Inventor
范益波 (Fan Yibo)
刘超 (Liu Chao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN201911321112.6A
Publication of CN111027685A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of neural network models, and specifically provides a method for fusing depthwise separable convolution and batch normalization. Starting from a trained neural network model that contains depthwise separable convolution and batch normalization layers, the parameters of the Pointwise convolution and the parameters of batch normalization are exported, and a new set of parameters is recomputed by a specially designed procedure; these new parameters are assigned to the Pointwise convolution, replacing its weights and bias. The batch normalization layer is then removed from the original network structure and its computation is absorbed into the Pointwise convolution, yielding a depthwise separable convolution layer equivalent to the original depthwise separable convolution followed by batch normalization, i.e. the convolution is fused with batch normalization. The invention can effectively reduce the amount of computation.

Description

Method for depth separable convolution and batch normalization fusion
Technical Field
The invention belongs to the technical field of neural network models, and particularly relates to a method for fusing depthwise separable convolution and batch normalization.
Background
Neural network technology, and lightweight neural networks in particular, has been a hot topic of research and application. A depthwise separable convolution splits a standard convolution into two steps. The first step, called Depthwise convolution, uses the idea of grouped convolution: each channel is convolved independently, with no computation across channels, so only single-channel convolution results are computed and the amount of computation is greatly reduced. The second step, called Pointwise convolution, re-fuses the features learned by the Depthwise convolution, remedying the limitation that Depthwise features come from a single channel only. Together the two steps achieve an effect similar to a conventional convolution. The Pointwise step is typically implemented as a convolution with a 1x1 kernel.
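For illustration only (not part of the patent text), a minimal PyTorch sketch of such a block is given below; the kernel size, channel counts and module names are assumptions:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise (grouped 3x3) convolution followed by a Pointwise (1x1) convolution
    and batch normalization; the layout and channel counts are illustrative assumptions."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Depthwise: groups == in_channels, so each channel is convolved independently
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels, bias=False)
        # Pointwise: conventional 1x1 convolution that re-fuses the per-channel features
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=True)
        # Batch normalization directly after the Pointwise convolution (no activation between)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.bn(self.pointwise(self.depthwise(x)))
```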
In the batch normalization layer, the features learned by an intermediate layer of the neural network are re-normalized, so that gradients propagate effectively across many layers and training deep neural networks becomes feasible. The layer has four parameters: two represent the mean and variance of its input and are used to re-normalize the features; the other two are learned by the neural network and are used to reconstruct the features, so that the features learned by the model are not destroyed. Both batch normalization and depthwise separable convolution are commonly used when building practical neural network models, so fusing the two in deployment can effectively reduce the amount of computation.
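For reference (a standard formulation, not reproduced from the patent text), at inference time a batch normalization layer applies the following affine transformation to its input x, where ε is a small constant preventing division by zero:

$$y_{bn} = \gamma \cdot \frac{x - mean}{\sqrt{var + \epsilon}} + \beta$$

Since this is an affine map of x, it can be absorbed into the affine operation (the convolution) that precedes it, which is the observation that the method below exploits.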
Disclosure of Invention
The invention aims to provide a method for fusing depthwise separable convolution and batch normalization, so as to effectively reduce the amount of computation.
The invention provides a method for fusing depthwise separable convolution and batch normalization. From a trained neural network model containing depthwise separable convolution and batch normalization layers, the parameters of the Pointwise convolution and the parameters of batch normalization are exported, and a new set of parameters is recomputed by a specially designed procedure; these new parameters are assigned to the Pointwise convolution, replacing its weights and bias. The batch normalization layer is then deleted from the original network structure and its computation is absorbed into the Pointwise convolution, yielding a depthwise separable convolution layer equivalent to the original depthwise separable convolution followed by batch normalization, i.e. the convolution is fused with batch normalization. The method comprises the following specific steps:
(1) For a trained neural network model containing depthwise separable convolution and batch normalization layers, in which no nonlinear activation function lies between the depthwise separable convolution and the batch normalization layer, first export the weight $w_{pwConv}$ and bias term $b_{pwConv}$ of the Pointwise convolution of the depthwise separable convolution, together with the parameters γ, β, mean and var of the batch normalization layer, where γ and β are the learned parameters of the batch normalization layer and mean and var are the statistics it computes (its running mean and variance), used in the subsequent calculation;
(2) Compute the new Pointwise convolution parameters according to the following formulas:

$$\hat{w}_{pwConv} = \frac{\gamma}{\sqrt{var + \epsilon}} \cdot w_{pwConv} \qquad (1)$$

$$\hat{b}_{pwConv} = \frac{\gamma \cdot (b_{pwConv} - mean)}{\sqrt{var + \epsilon}} + \beta \qquad (2)$$

where ε is a hyperparameter that prevents division by zero, and $\circledast$ denotes the convolution operation;
(3) Replace the weight $w_{pwConv}$ and bias term $b_{pwConv}$ of the original Pointwise convolution with $\hat{w}_{pwConv}$ and $\hat{b}_{pwConv}$, and delete the batch normalization layer from the original network, obtaining a new neural network structure and its corresponding weights; at this point the fusion of depthwise separable convolution and batch normalization is complete. Denoting by $y_{dwConv}$ the output of the Depthwise convolution and by $y_{bn}$ the output of batch normalization, $y_{dwConv}$ and $y_{bn}$ are then directly related by

$$y_{bn} = \hat{w}_{pwConv} \circledast y_{dwConv} + \hat{b}_{pwConv}$$
(4) Once the new network structure is obtained, it can be used in place of the original network structure, thereby reducing the amount of computation.
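For illustration, a minimal PyTorch sketch of steps (1) to (3) applied to a single Pointwise convolution and batch normalization pair follows; the function name and the assumption that the layers are available as `torch.nn.Conv2d` and `torch.nn.BatchNorm2d` objects are mine, not the patent's:

```python
import torch

@torch.no_grad()
def fuse_pointwise_bn(pointwise: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d) -> torch.nn.Conv2d:
    """Fold a BatchNorm2d layer into the preceding 1x1 Pointwise convolution
    (formulas (1) and (2)); returns a new Conv2d replacing conv + BN."""
    w = pointwise.weight.clone()                        # shape: (out_c, in_c, 1, 1)
    b = pointwise.bias.clone() if pointwise.bias is not None \
        else torch.zeros(w.size(0), device=w.device)
    gamma, beta = bn.weight, bn.bias                    # learned BN parameters
    mean, var, eps = bn.running_mean, bn.running_var, bn.eps

    scale = gamma / torch.sqrt(var + eps)               # gamma / sqrt(var + eps), per channel
    fused = torch.nn.Conv2d(pointwise.in_channels, pointwise.out_channels,
                            kernel_size=1, bias=True)
    fused.weight.copy_(w * scale.reshape(-1, 1, 1, 1))  # formula (1)
    fused.bias.copy_(scale * (b - mean) + beta)         # formula (2)
    return fused
```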
Through this design, the batch normalization layer is effectively fused into the depthwise separable convolution, so that the amount of computation performed by the neural network model at the inference stage is reduced.
In the invention, after model training is finished, all trained model parameters are exported; the weight $w_{pwConv}$ and bias term $b_{pwConv}$ of the Pointwise convolution and the parameters γ, β, mean and var of the batch normalization layer are combined through the mathematical derivation above to compute the new parameters $\hat{w}_{pwConv}$ and $\hat{b}_{pwConv}$, which then replace the weight $w_{pwConv}$ and bias term $b_{pwConv}$ of the original Pointwise convolution.
In the present invention, the batch normalization layer in the original network structure is deleted, and the weights and bias of the Pointwise convolution of the original depthwise separable convolution are replaced with the new weights and bias. The method effectively reduces the amount of computation.
Drawings
FIG. 1 is a schematic view of the process of the present invention.
Detailed Description
The invention will be further described with reference to the following schematic drawings.
The starting neural network layer structure, shown in the upper half of Fig. 1, contains a depthwise separable convolution and batch normalization; it appears as three parts in the schematic because the depthwise separable convolution itself consists of two parts, Depthwise and Pointwise. The first part is the Depthwise convolution, a separated convolution: in the figure, kernels of three different colors are convolved with the corresponding channels to illustrate that each channel is convolved separately. The output of this separated convolution is fed to the Pointwise convolution as its input. The Pointwise convolution is a conventional convolution with a 1x1 kernel, drawn in the figure as interleaved 1x1 kernels; it fuses the outputs of the different Depthwise convolutions. After the Pointwise convolution finishes, its output is further processed by a batch normalization layer, which processes the data so that the back-propagated gradient is better preserved during training.
It is worth noting that the method of the present invention requires that there be no nonlinear activation function between the Pointwise convolution and the batch normalization. In practical designs, the activation function is typically placed after the batch normalization layer, which also allows the batch normalization layer to perform well. After model training is complete, the parameters of the Depthwise convolution, the Pointwise convolution and the batch normalization are all determined and saved in the model file.
The parameters are read from the model file, and $\hat{w}_{pwConv}$ and $\hat{b}_{pwConv}$ are calculated according to formulas (1) and (2), with the hyperparameter ε chosen as $10^{-20}$. A neural network model B is then designed, as shown in the lower half of Fig. 1. Its structure is nearly identical to the original model structure A, except that the batch normalization after each depthwise separable convolution is removed from the network structure, while all other network layers are preserved.
For all network layers other than the Pointwise convolution, the weights of the corresponding layers of the originally trained network structure A are assigned to model B. For the Pointwise convolution, the calculated $\hat{w}_{pwConv}$ and $\hat{b}_{pwConv}$ are assigned as its weight and bias, which completes the assignment of all parameters of the newly constructed network. This yields a completely new network model that can replace the original model for inference.
It is easy to see that the originally trained network structure A performs the extra batch normalization computation that the newly designed, simplified model B does not, while the amount of computation is identical everywhere else. The outputs of the new model are almost identical to those of the original model, so the invention saves a portion of the computation of the original model. Finally, A is replaced with the newly designed model B, and inference is performed with B.
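As a quick numerical sanity check (again an illustrative sketch with assumed shapes and tolerance, not part of the patent text), the fused layer can be compared against the original Pointwise convolution followed by batch normalization:

```python
import torch

# Illustrative Pointwise convolution followed by batch normalization (structure A).
pointwise = torch.nn.Conv2d(16, 32, kernel_size=1, bias=True)
bn = torch.nn.BatchNorm2d(32)
bn.eval()  # use the stored running statistics, as at inference time

# Give the BN layer non-trivial parameters so the check is meaningful.
with torch.no_grad():
    bn.weight.uniform_(0.5, 1.5)
    bn.bias.uniform_(-0.1, 0.1)
    bn.running_mean.uniform_(-1.0, 1.0)
    bn.running_var.uniform_(0.5, 2.0)

fused = fuse_pointwise_bn(pointwise, bn)  # helper sketched after step (4) above

x = torch.randn(1, 16, 8, 8)
with torch.no_grad():
    reference = bn(pointwise(x))  # original structure A: Pointwise convolution then BN
    candidate = fused(x)          # simplified structure B: single fused convolution

# The two outputs should agree up to floating-point rounding error.
print(torch.allclose(reference, candidate, atol=1e-5))
```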

Claims (1)

1. A method for fusing depthwise separable convolution and batch normalization, characterized in that the parameters of the Pointwise convolution and the parameters of batch normalization, exported from a trained neural network model containing a depthwise separable convolution and a batch normalization layer, are recalculated into a new set of parameters through a specially designed calculation, and the new parameters are assigned to the Pointwise convolution, replacing its weights and biases; the batch normalization layer is then deleted from the original network structure and its computation is absorbed into the Pointwise convolution, obtaining a depthwise separable convolution layer equivalent to the original depthwise separable convolution followed by batch normalization, thereby fusing the convolution with batch normalization; the method comprises the following specific steps:
(1) for a trained neural network model containing depthwise separable convolution and batch normalization layers, in which there is no nonlinear activation function between the depthwise separable convolution and the batch normalization layer, first exporting the weight $w_{pwConv}$ and bias term $b_{pwConv}$ of the Pointwise convolution of the depthwise separable convolution, together with the parameters γ, β, mean and var of the batch normalization layer, where γ and β are the learned parameters of the batch normalization layer, and mean and var are the statistics computed by the batch normalization layer;
(2) calculating the new Pointwise convolution parameters as follows:

$$\hat{w}_{pwConv} = \frac{\gamma}{\sqrt{var + \epsilon}} \cdot w_{pwConv} \qquad (1)$$

$$\hat{b}_{pwConv} = \frac{\gamma \cdot (b_{pwConv} - mean)}{\sqrt{var + \epsilon}} + \beta \qquad (2)$$

where ε is a hyperparameter that prevents division by zero, and $\circledast$ denotes the convolution operation;
(3) replacing the weight $w_{pwConv}$ and bias term $b_{pwConv}$ of the original Pointwise convolution with $\hat{w}_{pwConv}$ and $\hat{b}_{pwConv}$, and deleting the batch normalization layer from the original network to obtain a new neural network structure and its corresponding weights; at this point the fusion of depthwise separable convolution and batch normalization is complete; denoting by $y_{dwConv}$ the output of the Depthwise convolution and by $y_{bn}$ the batch normalized output, $y_{dwConv}$ and $y_{bn}$ are directly related by

$$y_{bn} = \hat{w}_{pwConv} \circledast y_{dwConv} + \hat{b}_{pwConv}$$
(4) after the new network structure is obtained, it is used in place of the original network structure, thereby reducing the amount of computation.
CN201911321112.6A 2019-12-20 2019-12-20 Method for depth separable convolution and batch normalization fusion Pending CN111027685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911321112.6A CN111027685A (en) 2019-12-20 2019-12-20 Method for depth separable convolution and batch normalization fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911321112.6A CN111027685A (en) 2019-12-20 2019-12-20 Method for depth separable convolution and batch normalization fusion

Publications (1)

Publication Number Publication Date
CN111027685A true CN111027685A (en) 2020-04-17

Family

ID=70211238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911321112.6A Pending CN111027685A (en) 2019-12-20 2019-12-20 Method for depth separable convolution and batch normalization fusion

Country Status (1)

Country Link
CN (1) CN111027685A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344200A (en) * 2021-06-17 2021-09-03 阿波罗智联(北京)科技有限公司 Method for training separable convolutional network, road side equipment and cloud control platform
CN113344200B (en) * 2021-06-17 2024-05-28 阿波罗智联(北京)科技有限公司 Method for training separable convolutional network, road side equipment and cloud control platform

Similar Documents

Publication Publication Date Title
CN111310063B (en) Neural network-based article recommendation method for memory perception gated factorization machine
CN114463209B (en) Image restoration method based on deep multi-feature collaborative learning
CN110706214B (en) Three-dimensional U-Net brain tumor segmentation method fusing condition randomness and residual error
CN110837578B (en) A video clip recommendation method based on graph convolutional network
CN109005398B (en) A Disparity Matching Method of Stereo Image Based on Convolutional Neural Network
CN108805814A (en) Image Super-resolution Reconstruction method based on multiband depth convolutional neural networks
CN109447261B (en) A Method for Network Representation Learning Based on Multi-Order Neighborhood Similarity
CN116523015B (en) Optical neural network training method, device and equipment for process error robustness
CN111027685A (en) Method for depth separable convolution and batch normalization fusion
CN111192154B (en) A Style Transfer-Based Matching Method for Social Network User Nodes
CN118036672A (en) A Neural Network Optimization Method Based on Taylor Expansion Momentum Correction
CN116204628A (en) A Neural Collaborative Filtering Recommendation Method for Logistics Knowledge Enhanced by Knowledge Graph
CN111179188B (en) Image restoration method, model training method thereof and related device
CN109360553A (en) A Novel Delayed Recurrent Neural Network for Speech Recognition
JP2004145410A (en) Circuit design method and circuit design support system
CN114399453B (en) Facial expression synthesis method based on generation countermeasure network
CN112818502A (en) Optical mirror surface shape calculation method
CN116843609A (en) Individual image aesthetic evaluation method based on meta-shift learning
CN116108893A (en) Self-adaptive fine-tuning method, device, equipment and storage medium of convolutional neural network
CN111176892A (en) Countermeasure type searching method based on backup strategy
WO2023109229A1 (en) Production schedule generation method and apparatus, and electronic device and storage medium
CN115238868A (en) Website traffic prediction method and system based on fusion of smoothing and sharpening
CN116230112B (en) Molecular feature extraction method based on sub-structure chart convolution
WO2023279685A1 (en) Method for mining core users and core items in large-scale commodity sales
Ghimpeţeanu et al. Local denoising applied to raw images may outperform non-local patch-based methods applied to the camera output

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20200417)