
WO2023005085A1 - Method and apparatus for pruning neural network, and device and storage medium - Google Patents


Info

Publication number
WO2023005085A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
channel
input channel
module
original input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2021/134336
Other languages
French (fr)
Chinese (zh)
Inventor
尹文枫
董刚
赵雅倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IEIT Systems Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Publication of WO2023005085A1 publication Critical patent/WO2023005085A1/en
Legal status: Ceased


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • The present application relates to the technical field of deep neural network compression and acceleration, and more specifically, to a neural network pruning method, apparatus, device, and storage medium.
  • Deep neural network compression and acceleration technology provides a solution for the long-term real-time application of deep learning in these resource-constrained devices.
  • Deep neural network compression technology reduces the parameter count and computation of the neural network model, thereby reducing the model's storage overhead and improving its inference speed.
  • Convolutional neural networks have evolved diverse multi-branch structures, but existing pruning schemes for multi-branch structures only trim the input and output channels of the middle layer of the bottleneck structure.
  • The input and output channels of the whole module are left uncompressed, and the middle layer of the bottleneck structure of a multi-branch structure inherently has fewer channels than the module's input and output. Compressing only the middle layer of a multi-branch structure therefore limits the achievable compression ratio of the multi-branch structure.
  • The purpose of this application is to provide a neural network pruning method, apparatus, device, and storage medium that improve the compression ratio of the neural network, reduce the computation required for the neural network model to perform its tasks, and speed up the network's task processing.
  • the present application provides a pruning method of a neural network, including:
  • the channel kernel set records the retained input channels
  • determining the channel kernel set of the target network layer using the channel compression ratio of the target network layer and the convolution kernel weights of each original input channel includes:
  • performing R rounds of independent sampling on each original input channel using its sampling probability, and generating the channel kernel set corresponding to the target network layer from the sampling results.
  • determining the importance value of each original input channel using the first number, the second number, and the Frobenius norm of each original input channel's convolution kernel weights includes:
  • determining the importance value of each original input channel according to its weighting coefficient and its initial importance function.
  • reconstructing the convolution kernel of the target network layer includes: creating an optimization function using the channel kernel set,

$$\min_{W} \sum_{k=1}^{K} \left\lVert Y_k - \sum_{i \in \mathcal{C}} W_{ik} * x_i \right\rVert_F^2$$

where Y_k is the output feature map of the original convolution kernel at output channel k, K is the total number of convolution kernel output channels of the target network layer, W_ik is the weight of the convolution kernel at input channel i and output channel k, x_i is the input data of each retained input channel in the channel kernel set \mathcal{C}, \lVert\cdot\rVert_F is the Frobenius norm, and * denotes convolution; and minimizing the optimization function to update the weights of the convolution kernel of the target network layer.
  • The present application further provides a pruning method based on the ResNet downsampling module, including the pruning method of the above scheme, wherein determining the target network layer to be pruned in the neural network includes:
  • using the middle layer and the output layer of the residual branch in the downsampling module in turn as the target network layer, so as to prune the original input channels of the middle layer and the output layer and reconstruct their convolution kernels;
  • the pruning method also includes:
  • The present application further provides a pruning method based on the ResNet residual module, including the pruning method of the above scheme, wherein the ResNet includes N stacked residual modules, and determining the target network layer to be pruned in the neural network includes:
  • using the middle layer and the output layer of each of the N residual modules in turn as the target network layer, so as to prune the original input channels of the middle layer and the output layer and reconstruct their convolution kernels;
  • the pruning method further includes:
  • pruning the original input channels of the input layer of each residual module using the second input channel screening result, and performing convolution kernel reconstruction with the input layer of the residual module as the target network layer; pruning the original output channels of the output layer of the residual module using the second input channel screening result, and performing convolution kernel reconstruction with the output layer of the residual module as the target network layer.
  • The present application further provides a pruning method based on SqueezeNet, including the pruning method of the above scheme; wherein, if the target Fire module of the SqueezeNet is pruned, determining the target network layer to be pruned in the neural network includes:
  • using the Squeeze layer of the Fire module following the target Fire module as the target network layer, so as to prune the original input channels of that Squeeze layer and reconstruct its convolution kernel;
  • the pruning method further includes:
  • pruning the original output channels of the different-sized convolution kernels in the Expand layer of the target Fire module;
  • the present application further provides a neural network pruning device, including:
  • the network layer determination module is used to determine the target network layer to be pruned in the neural network
  • a channel kernel set determination module, used to determine the channel kernel set of the target network layer by using the channel compression ratio of the target network layer and the convolution kernel weights of each original input channel, wherein the channel kernel set records the retained input channels;
  • a channel pruning module configured to prune the original input channel of the target network layer according to the channel core set
  • the convolution kernel reconstruction module is used to reconstruct the convolution kernel of the target network layer.
  • an electronic device including:
  • a processor configured to implement the steps of the above pruning method when executing the computer program.
  • the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above pruning method are implemented.
  • A neural network pruning method includes: determining the target network layer to be pruned in the neural network; determining the channel kernel set of the target network layer using the channel compression ratio of the target network layer and the convolution kernel weights of each original input channel, where the retained input channels are recorded in the channel kernel set; pruning the original input channels of the target network layer according to the channel kernel set; and reconstructing the convolution kernel of the target network layer.
  • When this scheme prunes a neural network, any network layer to be pruned can serve as the target network layer for channel pruning and convolution kernel reconstruction. When compressing a multi-branch structure, the scheme is therefore not limited to the middle layer: it can also compress the input layer, output layer, downsampling layer, and other network layers, which greatly improves the compression ratio of the neural network, reduces the computation the model needs to perform its tasks, and speeds up task processing. Moreover, this scheme is a data-independent pruning method, which helps maintain the robustness of the compressed neural network.
  • This scheme is also an asynchronous channel pruning method that can apply different sparsity granularities to different network layers of the neural network, improving the flexibility of compression. This application also discloses a neural network pruning apparatus, device, and storage medium, which achieve the same technical effects.
  • FIG. 1 is a schematic flowchart of a neural network pruning method disclosed in an embodiment of the present application;
  • FIG. 2 is a schematic structural diagram of a downsampling module disclosed in an embodiment of the present application;
  • FIG. 3 is a schematic diagram of the overall pruning flow of the downsampling module disclosed in an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of a residual module disclosed in an embodiment of the present application;
  • FIG. 5 is a schematic diagram of the overall flow of residual module pruning disclosed in an embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of a Fire module disclosed in an embodiment of the present application;
  • FIG. 7 is a schematic diagram of the overall flow of Fire module pruning disclosed in an embodiment of the present application;
  • FIG. 8 is a schematic structural diagram of a neural network pruning apparatus disclosed in an embodiment of the present application;
  • FIG. 9 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application.
  • This application proposes an asynchronous pruning method for convolutional neural networks based on kernel set (coreset) theory. During the forward inference of the neural network, selection criteria based on kernel set theory are applied layer by layer to prune channels, and the new weights of the compressed convolution kernel are obtained directly by optimizing the feature map reconstruction error; an asynchronous compression process is designed for multi-branch structures such as the ResNet residual module.
  • The kernel set theory described in this embodiment specifically refers to offline and streaming coreset constructions (OSCC), which supports the derivation and proof of the formulas used in the kernel set construction process; this scheme modifies OSCC to realize the construction of the channel kernel set.
  • OSCC: offline and streaming coreset constructions
  • The method described in this application performs filter-level pruning on each convolutional layer of an image recognition network trained on classified images.
  • The method implements filter-level pruning in the form of asynchronous channel pruning rather than pruning filters directly.
  • The convolution kernel is defined as a four-dimensional tensor of size K × C × H × W, with K output channels, C input channels, and spatial size H × W; a K × 1 × H × W slice corresponds to the convolution kernel parameters of a specific input channel, and a 1 × C × H × W slice corresponds to the convolution kernel parameters of a specific output channel. By pruning a convolution kernel's input and output channels in an asynchronous manner, pruning of the 1 × C × H × W filters is realized. When selecting the channels to prune, the method adopts kernel set theory: a kernel set is constructed over the input or output channels through weighted random sampling, realizing data-independent channel pruning and preserving the robustness of the compressed image recognition network. A short illustrative sketch follows.
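As a concrete illustration of the slicing just described, the sketch below shows how the input-channel and output-channel slices of a 4-D kernel tensor are addressed, and how dropping either kind of slice prunes the corresponding channel. The shapes and indices are hypothetical, chosen only for illustration.

```python
import numpy as np

K, C, H, W = 64, 32, 3, 3                  # hypothetical layer dimensions
kernel = np.random.randn(K, C, H, W)       # 4-D convolution kernel tensor

i, k = 5, 12                               # an example input channel and output channel
input_slice = kernel[:, i:i + 1]           # K x 1 x H x W: parameters of input channel i
output_slice = kernel[k:k + 1]             # 1 x C x H x W: parameters of output channel k (a filter)

# Input channel pruning removes kernel[:, i]; output channel (filter-level)
# pruning removes kernel[k]. Doing both asynchronously prunes whole filters.
pruned_inputs = np.delete(kernel, i, axis=1)   # shape (K, C-1, H, W)
pruned_outputs = np.delete(kernel, k, axis=0)  # shape (K-1, C, H, W)
```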
  • FIG. 1 is a schematic flowchart of the neural network pruning method provided in the embodiment of the present application; referring to FIG. 1, the pruning method specifically includes the following steps:
  • The neural network in this embodiment can be a convolutional neural network applied to tasks such as image recognition. Since the parameters of a neural network are highly redundant, pruning the convolutional neural network with this scheme can reduce the computation required for image recognition and speed up recognition.
  • The target network layer in this embodiment can be any network layer to be pruned in the neural network: a layer in a single-branch network structure or a layer in a multi-branch structure, such as an input layer, middle layer, output layer, or downsampling layer; it is not specifically limited here.
  • the channel kernel set records the retained input channels
  • The channel compression ratio in this embodiment is the ratio of the number of input channels before compression to the number of input channels after compression.
  • From it, the number of input channels retained after compression can be determined. For example, if the target network layer has 50 original input channels and the channel compression ratio is 10:5, the target network layer needs to retain 25 input channels, as the sketch below illustrates.
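A minimal sketch of this arithmetic (the function name is ours, not the application's):

```python
def retained_channels(n_original: int, before: int, after: int) -> int:
    """Number of input channels kept under a 'before:after' channel compression ratio."""
    return n_original * after // before

assert retained_channels(50, 10, 5) == 25  # the example from the text
```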
  • The channel kernel set in this embodiment records the input channels that need to be retained.
  • The specific way of determining the channel kernel set is not limited here: the channels may be selected randomly from the original input channels, selected according to the convolution kernel weights of the target network layer, or selected according to the importance value of each original input channel, and so on.
  • The original input channels recorded in the channel kernel set are retained, and the original input channels not recorded in it are removed, thereby pruning the original input channels.
  • After pruning, the convolution kernel of the target network layer needs to be reconstructed to update its weights.
  • The convolution kernel reconstruction process does not require retraining the network and is carried out entirely during the network's forward inference.
  • Data-independent pruning algorithms can provide robust compression for arbitrary test data.
  • DINP: Data-Independent Neural Pruning via Coresets
  • DINP derives neuron importance measurement rules and kernel set sampling probabilities from the upper bound of the activation function value.
  • The computation and allocation of the weighting coefficient of each sampled neuron are independent of the network layer's input data, so the neuron kernel set constructed by DINP is data-independent.
  • In this scheme, channel pruning is likewise performed using convolution kernel weights; the scheme is therefore a data-independent pruning method, which helps maintain the robustness of the compressed neural network.
  • This scheme discloses a structured pruning scheme.
  • Any network layer to be pruned can serve as the target network layer for channel pruning and convolution kernel reconstruction, so the scheme can compress single-branch structures as well as multi-branch structures. When compressing a multi-branch structure, it is not limited to the middle layer of a module: the input layer, output layer, downsampling layer, and other network layers are also compressed, so the input and output channels of the entire module are pruned. This greatly improves the compression ratio of the neural network, reduces the computation required for the model to perform its tasks, and speeds up task processing. Moreover, this solution prunes different network layers in an asynchronous manner, so the disclosed asynchronous channel pruning method can apply different sparsity granularities at different network layers, improving the flexibility of compression: some network layers receive channel-level pruning while others receive filter-level pruning.
  • In one embodiment, the process of determining the channel kernel set of the target network layer using the channel compression ratio of the target network layer and the convolution kernel weights of each original input channel specifically includes the following steps, strung together in the sketch after this list:
  • Step 1: determine the second number of retained input channels of the target network layer according to the first number of original input channels of the target network layer and the channel compression ratio;
  • Step 2: determine the importance value of each original input channel using the first number, the second number, and the Frobenius norm of each original input channel's convolution kernel weights;
  • Step 3: determine the sampling probability of each original input channel according to its importance value;
  • Step 4: perform R rounds of independent sampling on each original input channel using its sampling probability, and generate the channel kernel set corresponding to the target network layer from the sampling results.
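The sketch below strings the four steps together for a kernel of shape (K, C, H, W). It is a simplified sketch, not the application's exact procedure: in particular, the importance values here are plain Frobenius norms, standing in for the weighted importance function whose exact formula the application defines separately.

```python
import numpy as np

def build_channel_kernel_set(kernel: np.ndarray, compression: float,
                             rounds: int, seed: int = 0) -> np.ndarray:
    """Steps 1-4 above, with a simplified importance measure."""
    rng = np.random.default_rng(seed)
    n = kernel.shape[1]                          # Step 1: first number (original input channels)
    a = max(1, int(round(n / compression)))      # Step 1: second number (retained channels)
    per_channel = kernel.transpose(1, 0, 2, 3).reshape(n, -1)
    importance = np.linalg.norm(per_channel, axis=1)   # Step 2 (simplified stand-in)
    prob = importance / importance.sum()         # Step 3: sampling probability per channel
    kept: set = set()
    for _ in range(rounds):                      # Step 4: R rounds of independent sampling
        kept.update(rng.choice(n, size=a, replace=False, p=prob).tolist())
    return np.sort(np.fromiter(kept, dtype=int))
```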
  • The first number of original input channels of the target network layer can also be understood as the number of output channels of the (l-1)-th network layer. In this embodiment, the first number of original input channels of the target network layer is denoted n_{l-1}, and the second number of retained input channels is denoted a_l.
  • The first number and the second number are used to determine the first number of weighting coefficients, and each original input channel's Frobenius norm value is determined from its convolution kernel weights; a corresponding weighting coefficient is then assigned to each original input channel according to its Frobenius norm value, with larger Frobenius norm values receiving larger weighting coefficients. Finally, the importance value of each original input channel is determined from its weighting coefficient and its initial importance function.
  • The n_{l-1} weighting coefficients w_i(x) are constructed for non-uniform sampling of the channels. Because w_i(x) itself has no intrinsic correlation with any particular original input channel, the n_{l-1} weighting coefficients must be matched to the n_{l-1} original input channels.
  • To do this, the Frobenius norm values of the n_{l-1} input-channel slices of the layer-l convolution kernel are calculated; the Frobenius norm values and the n_{l-1} weighting coefficients are each sorted in numerical order, and, following the two orderings, weighting coefficients with larger values are assigned to the original input channels with larger Frobenius norm values, as the sketch below illustrates.
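A short sketch of this rank matching, assuming the coefficients and norms are already computed (array names are ours):

```python
import numpy as np

def assign_coefficients(fro_norms: np.ndarray, coefficients: np.ndarray) -> np.ndarray:
    """Give the largest weighting coefficient to the channel with the largest
    Frobenius norm, the second largest to the second largest, and so on."""
    channel_order = np.argsort(-fro_norms)        # channels, descending norm
    coeff_desc = np.sort(coefficients)[::-1]      # coefficients, descending value
    assigned = np.empty(len(coefficients), dtype=float)
    assigned[channel_order] = coeff_desc
    return assigned                               # assigned[i] belongs to channel i
```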
  • The importance value of each original input channel is then calculated from the importance function s_i(x), where s_i(x) is the importance function of the i-th original input channel, used to determine its importance value; g_i(x) is the initial importance function of the i-th original input channel, defined with respect to a linear point-wise convolution layer; N is the total number of input channels of the linear point-wise convolution layer; w_i(x) is the weighting coefficient of the i-th original input channel; and t is the sum of all channel importance values.
  • The process of reconstructing the convolution kernel of the target network layer updates the kernel weights. It first creates an optimization function using the channel kernel set:

$$\min_{W} \sum_{k=1}^{K} \left\lVert Y_k - \sum_{i \in \mathcal{C}} W_{ik} * x_i \right\rVert_F^2$$

where Y_k is the output feature map of the original convolution kernel at output channel k, K is the total number of convolution kernel output channels of the target network layer, W_ik is the weight of the convolution kernel at input channel i and output channel k, x_i is the input data of each retained input channel in the channel kernel set \mathcal{C}, \lVert\cdot\rVert_F is the Frobenius norm, and * denotes convolution. Minimizing this function yields the updated weights of the target network layer's convolution kernel.
  • Current neural network pruning algorithms design heuristic screening rules to select parameters, channels, filters, and other objects for pruning, and compensate for the accuracy loss caused by compression through retraining or feature map reconstruction.
  • Construction efficiency is also among the evaluation criteria for neural network pruning algorithms; the feature-map-reconstruction approach adopted in this embodiment therefore improves the operating efficiency of the compression process itself.
  • This scheme uses characteristics of the convolution kernel, rather than of the feature maps, to construct the channel sampling probabilities, so it realizes a data-independent pruning method, which helps maintain the robustness of the compressed neural network. Moreover, the scheme reconstructs the convolution kernel by minimizing the reconstruction error of the output feature maps corresponding to the channel kernel set: the reconstruction is carried out during the network's forward inference, without retraining, which saves reconstruction time and improves the operating efficiency of the compression process. A sketch of this least-squares view follows.
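Minimizing the Frobenius-norm objective above reduces, per output channel, to an ordinary linear least-squares problem over unfolded (im2col) input patches, which is why it fits inside a single forward pass. A minimal sketch, assuming the unfolding has already been done; the application does not specify the exact solver:

```python
import numpy as np

def reconstruct_weights(x_unfolded: np.ndarray, y_target: np.ndarray) -> np.ndarray:
    """Least-squares update of the compressed layer's kernel weights.

    x_unfolded: im2col patches of the retained input channels,
                shape (positions, a * H * W), a = size of the channel kernel set.
    y_target:   output feature maps of the original kernel, shape (positions, K).
    Returns new weights of shape (a * H * W, K), one column per output channel.
    """
    # min_W sum_k ||Y_k - X @ W_k||_F^2 splits into K independent least-squares
    # problems; lstsq solves all K columns at once.
    w_new, *_ = np.linalg.lstsq(x_unfolded, y_target, rcond=None)
    return w_new
```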
  • The structured pruning method described in the above embodiment can be used directly to prune each network layer in a single-branch structure. For a multi-branch structure, besides pruning each network layer with the structured pruning method described above, additional processing is required according to the particular features of the multi-branch structure.
  • The pruning process for multi-branch structures is illustrated below, taking the downsampling module and residual module of ResNet and the Fire module of SqueezeNet as examples.
  • For the downsampling module, the solution uses the middle layer and the output layer of the residual branch in the downsampling module in turn as the target network layer and continues to execute S102 to S103, so as to prune the original input channels of the middle layer and the output layer and reconstruct their convolution kernels;
  • the pruning method further includes: randomly sampling the original input channels of the input layer using the channel compression ratio of the input layer of the downsampling module, to obtain the first input channel screening result; using the first input channel screening result to prune the original input channels of the input layer and the downsampling layer of the downsampling module; and performing convolution kernel reconstruction with the input layer and the downsampling layer of the downsampling module each serving as the target network layer.
  • For ease of description, the pruning method described in the above embodiments is referred to as the kernel-set-theory-based pruning method; it is applicable to any network layer in the downsampling module, the residual module, and the Fire module.
  • The process of reconstructing the convolution kernel is the same as the reconstruction process for the target network layer described in the above embodiment, and is not repeated here.
  • FIG. 2 is a schematic structural diagram of the downsampling module provided by the embodiment of the present application.
  • The middle layer, output layer, and input layer in FIG. 2 belong to the residual branch.
  • When compressing the downsampling module, this solution first applies the kernel-set-theory-based pruning method described above to prune the input channels of the middle layer and the output layer of the residual branch and to reconstruct the corresponding convolution kernels; the input channels of the input layer and the downsampling layer are then pruned with a random-sampling-based pruning method and their convolution kernels are reconstructed. The random-sampling-based pruning method refers to performing one round of equal-probability random sampling of the original input channels according to the corresponding channel compression ratio, obtaining the corresponding input channel screening result, and pruning accordingly.
  • In the following, this process is abbreviated as the random-sampling-based pruning method and is not described again in detail; a minimal sketch follows.
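A minimal sketch of this one-round, equal-probability screening (names are ours):

```python
import numpy as np

def random_channel_screening(n_channels: int, compression: float,
                             seed: int = 0) -> np.ndarray:
    """One round of equal-probability sampling of the original input channels."""
    rng = np.random.default_rng(seed)
    keep = max(1, int(round(n_channels / compression)))
    return np.sort(rng.choice(n_channels, size=keep, replace=False))
```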
  • FIG. 3 is a schematic diagram of the overall pruning flow of the downsampling module provided in the embodiment of the present application.
  • The overall flow specifically includes the following steps:
  • using the kernel-set-theory-based pruning method, screen and prune the original input channels of the middle layer of the residual branch, and reconstruct the convolution kernel of the middle layer;
  • using the kernel-set-theory-based pruning method, screen and prune the original input channels of the output layer of the residual branch, and reconstruct the convolution kernel of the output layer;
  • the channel screening result can be applied directly to the downsampling layer and does not need to be recalculated for the downsampling layer of the module.
  • When pruning the residual modules, the middle layer and the output layer of each of the N residual modules are used in turn as the target network layer, and S102 to S103 continue to be executed, so as to prune the original input channels of the middle layer and the output layer and reconstruct their convolution kernels;
  • after the original input channels of the middle layer and the output layer have been pruned and their convolution kernels reconstructed, the method further includes: randomly sampling the original input channels of the input layer of the residual module using the channel compression ratio of that input layer, to obtain the second input channel screening result; then, for each of the N residual modules, pruning the original input channels of its input layer using the second input channel screening result and performing convolution kernel reconstruction with the input layer as the target network layer, and pruning the original output channels of its output layer using the second input channel screening result and performing convolution kernel reconstruction with the output layer as the target network layer.
  • The residual module includes an input layer, a middle layer, an output layer, and a shortcut; FIG. 4 shows the structure of a single residual module.
  • The stacked residual modules are composed of multiple structures as shown in FIG. 4.
  • The input layers of the multiple residual modules can share the channel screening result obtained by the random-sampling-based pruning method, retaining the channels with the same indices.
  • When the stacked residual modules are compressed, the input channels of their middle layers and output layers are first pruned based on kernel set theory and the convolution kernels are reconstructed; the random-sampling-based pruning method is then applied to one residual module to obtain an input channel screening result, which is used in turn to prune and reconstruct the convolution kernels of the input layers and output layers of the multiple stacked residual modules. The output layer of each residual module therefore undergoes two kernel reconstructions, as sketched below.
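The sketch below shows only the order of operations for the stacked modules; the data structures are ours, and the kernel reconstructions (which the text places after each pruning step) are elided as comments.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ResidualModule:
    input_kernel: np.ndarray    # (K_in, C, H, W)
    middle_kernel: np.ndarray   # (K_mid, K_in, H, W)
    output_kernel: np.ndarray   # (C, K_mid, H, W)

def keep_inputs(kernel, kept):   # prune input channels (axis 1)
    return kernel[:, kept]

def keep_outputs(kernel, kept):  # prune output channels / filters (axis 0)
    return kernel[kept]

def compress_stack(modules, kept_mid, kept_io):
    """kept_mid: per-module channel kernel sets from the kernel-set-theory
    method; kept_io: one shared random-sampling screening result reused by
    every module's input and output layer."""
    for m, kept in zip(modules, kept_mid):       # pass 1: kernel-set-theory pruning
        m.middle_kernel = keep_inputs(m.middle_kernel, kept["middle"])
        m.output_kernel = keep_inputs(m.output_kernel, kept["output"])
        # ... reconstruct the middle and output kernels here ...
    for m in modules:                            # pass 2: shared screening result
        m.input_kernel = keep_inputs(m.input_kernel, kept_io)
        m.output_kernel = keep_outputs(m.output_kernel, kept_io)
        # ... reconstruct the input kernel; the output kernel is reconstructed
        # a second time at this point ...
```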
  • FIG. 5 is a schematic diagram of the overall flow of residual module pruning provided by the embodiment of the present application.
  • The overall flow specifically includes the following steps:
  • S301: using the kernel-set-theory-based pruning method, screen and prune the original input channels of the middle layer of the i-th residual module, and reconstruct the convolution kernel of the middle layer;
  • the original input channels of the input layer of the i'-th residual module are screened to obtain the second input channel screening result, with the initial value of i' being 1;
  • S306 prunes the original output channels of the shortcut branch of the output layer; since the output channels of the output layer change, the convolution kernel of the output layer is then reconstructed according to the changed number of channels.
  • For SqueezeNet, when executing S101 of the above embodiment, this solution uses the Squeeze layer of the Fire module following the target Fire module as the target network layer, and continues to execute S102 to S103, so as to prune the original input channels of that Squeeze layer and reconstruct its convolution kernel;
  • the pruning method also includes:
  • the original output channels of the convolution kernels of different sizes in the Expand layer of the target Fire module are pruned;
  • the original input channel of the Expand layer of the target Fire module is randomly sampled to obtain the third input channel screening result;
  • FIG. 6 is a schematic structural diagram of the Fire module provided by the embodiment of the present application.
  • The Fire module includes a Squeeze layer and an Expand layer; the Expand layer has convolution kernels of two different sizes.
  • The channel pruning process for the Fire module is as follows: the kernel-set-theory-based pruning method prunes the original input channels of the Squeeze layer of Fire module i+1 and reconstructs its convolution kernel; the output channels of the Expand layer of Fire module i are pruned accordingly; and the input channels of the Expand layer of Fire module i are then pruned with the random-sampling-based pruning method and its convolution kernels are reconstructed.
  • The input channel screening result of the Squeeze layer of Fire module i+1 is divided into two parts, corresponding to the output channels of the different-sized convolution kernels in the Expand layer of the i-th Fire module.
  • FIG. 7 is a schematic diagram of the overall flow of Fire module pruning provided by the embodiment of the present application.
  • The overall flow specifically includes the following steps:
  • the original input channels of the Expand layer of Fire module i are screened to generate a channel kernel set; according to the channel kernel set, the input channels of the different-sized convolution kernels in the Expand layer of Fire module i are pruned and the convolution kernels of the different sizes are reconstructed. A sketch of how the Squeeze-layer screening result splits across the two Expand branches follows.
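A sketch of how the next Squeeze layer's screening result splits across the two Expand branches. It assumes the Expand outputs are concatenated as [1x1 outputs | 3x3 outputs], SqueezeNet's usual layout, which the application does not restate:

```python
import numpy as np

def split_squeeze_screening(kept: np.ndarray, n_expand_1x1: int):
    """Split the retained input channels of Fire module i+1's Squeeze layer
    into output-channel indices for Fire module i's two Expand branches."""
    kept = np.asarray(kept)
    kept_1x1 = kept[kept < n_expand_1x1]                   # 1x1 branch outputs to keep
    kept_3x3 = kept[kept >= n_expand_1x1] - n_expand_1x1   # 3x3 branch outputs to keep
    return kept_1x1, kept_3x3
```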
  • In summary, the channel pruning method described in this scheme implements filter-level pruning through asynchronous input and output channel pruning, and designs the pruning process around the multi-branch structural characteristics of ResNet and SqueezeNet. It addresses the problem that, when compressing ResNet and SqueezeNet, existing pruning methods only prune the input and output channels of the middle layers of the residual module, the downsampling module, and the Fire module, leaving the input and output channels of the whole module uncompressed. A higher compression ratio can thus be achieved, the computation of the network during forward inference is reduced, and pruning with different sparsity granularity is realized in different layers of the network.
  • The channel selection rule based on kernel set theory designed by the method of the present application is data-independent, which helps maintain the robustness of the compressed image recognition network.
  • this solution can also be deployed in FPGA-based neural network acceleration applications or AI acceleration chip software platforms.
  • This solution can prune the image recognition network with a high compression ratio, reducing the amount of calculation of the image recognition network in real-time image classification applications.
  • This technical solution can also be extended and applied to the compression of the backbone network of the target detection network, such as YOLO or Faster RCNN.
  • The pruning apparatus, device, and medium provided in the embodiments of the present application are introduced below; the apparatus, device, and medium described below and the pruning method described above may be cross-referenced.
  • FIG. 8 is a schematic structural diagram of the neural network pruning apparatus provided in an embodiment of the present application. As shown in FIG. 8, the apparatus includes:
  • the network layer determination module 11 is used to determine the target network layer to be pruned in the neural network
  • the channel kernel set determination module 12 is used to determine the channel kernel set of the target network layer by using the channel compression ratio of the target network layer and the convolution kernel weights of each original input channel, wherein the channel kernel set records the retained input channels;
  • a channel pruning module 13 configured to prune the original input channel of the target network layer according to the channel core set
  • the convolution kernel reconstruction module 14 is configured to reconstruct the convolution kernel of the target network layer.
  • the channel core set determination module includes:
  • a first determining unit configured to determine a second number of reserved input channels of the target network layer according to the first number of original input channels of the target network layer and the channel compression ratio;
  • the second determination unit is used to determine the importance value of each original input channel by using the first quantity, the second quantity and the Frobenius norm of the convolution kernel weight of each original input channel;
  • the third determination unit is used to determine the sampling probability of each original input channel according to the importance value of each original input channel
  • the channel core set generating unit is used to perform R rounds of independent sampling on each original input channel by using the sampling probability of each original input channel, and generate a channel core set corresponding to the target network layer according to the sampling result.
  • the second determination unit includes:
  • a weighting coefficient determining subunit configured to determine the first number of weighting coefficients by using the first number and the second number
  • the weighting coefficient allocation subunit is used to determine each original input channel's Frobenius norm value from its convolution kernel weights and to assign it a corresponding weighting coefficient;
  • the importance value determining subunit is configured to determine the importance value of each original input channel according to the weighting coefficient of each original input channel and the initial importance function of each original input channel.
  • the convolution kernel reconstruction module includes:
  • a function creation unit configured to create an optimization function using the channel kernel set:

$$\min_{W} \sum_{k=1}^{K} \left\lVert Y_k - \sum_{i \in \mathcal{C}} W_{ik} * x_i \right\rVert_F^2$$

where Y_k is the output feature map of the original convolution kernel at output channel k, K is the total number of convolution kernel output channels of the target network layer, and W_ik is the weight of the convolution kernel at input channel i and output channel k, with x_i the input data of each retained input channel in the channel kernel set \mathcal{C};
  • a weight updating unit configured to minimize the optimization function, and update the weight of the convolution kernel of the target network layer.
  • the network layer determination module is specifically used to: use the middle layer and the output layer of the residual branch in the downsampling module in turn as the target network layer, so as to prune the original input channels of the middle layer and the output layer and reconstruct their convolution kernels;
  • the device also includes:
  • the first screening module is configured to use the channel compression ratio of the input layer of the down-sampling module to randomly sample the original input channel of the input layer to obtain the first input channel screening result;
  • the channel pruning module is also used to: use the screening result of the first input channel to prune the input layer of the down sampling module and the original input channel of the down sampling layer;
  • the convolution kernel reconstruction module is also used for: reconstructing the convolution kernels of the input layer and the down-sampling layer.
  • the network layer determination module is specifically used to: use the middle layer and the output layer of each of the N residual modules in turn as the target network layer, so as to prune the original input channels of the middle layer and the output layer and reconstruct their convolution kernels;
  • the device also includes:
  • the second screening module is used to, after the original input channels of the middle layer and the output layer are pruned and their convolution kernels reconstructed, randomly sample the original input channels of the input layer of the residual module using the channel compression ratio of that input layer, to obtain the second input channel screening result;
  • the channel pruning module is also used to: use the second input channel screening result to sequentially prune the original input channels of the input layers of the N residual modules, and to prune the original output channels of the output layers of the N residual modules;
  • the convolution kernel reconstruction module is also used to: reconstruct the convolution kernel of the input layer, and reconstruct the convolution kernel of the output layer.
  • the network layer determination module is specifically used to: use the Squeeze layer of the Fire module following the target Fire module as the target network layer, so as to prune the original input channels of that Squeeze layer and reconstruct its convolution kernel;
  • the device also includes:
  • the third screening module is used to randomly sample the original input channels of the Expand layer of the target Fire module using the channel compression ratio of that Expand layer, to obtain the third input channel screening result;
  • the channel pruning module is also used to: use the third input channel screening result to prune the original input channels of the convolution kernels of different sizes in the Expand layer of the target Fire module;
  • the convolution kernel reconstruction module is also used for: reconstructing convolution kernels of different sizes in the Expand layer of the target Fire module.
  • FIG. 9 is a schematic structural diagram of an electronic device disclosed in the embodiment of the present application. As shown in FIG. 9, the electronic device includes:
  • a memory 21, used to store a computer program;
  • a processor 22, configured to implement the steps of the neural network pruning method described in any of the above method embodiments when executing the computer program.
  • The device may be a PC (personal computer), or a terminal device such as a smartphone, a tablet computer, a palmtop computer, or a portable computer.
  • the device may include a memory 21 , a processor 22 and a bus 23 .
  • the memory 21 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card type memory (for example, SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc.
  • the storage 21 may be an internal storage unit of the device in some embodiments, such as a hard disk of the device.
  • Memory 21 may also be an external storage device of the device in other embodiments, such as a plug-in hard disk equipped on the device, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash memory card (Flash Card) etc.
  • the memory 21 may also include both an internal storage unit of the device and an external storage device.
  • the memory 21 can not only be used to store application software and various data installed in the device, such as program codes for executing the pruning method, but also can be used to temporarily store data that has been output or will be output.
  • In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, and is used to run the program code stored in the memory 21 or to process data, for example the program code for executing the pruning method.
  • The bus 23 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in FIG. 9 , but it does not mean that there is only one bus or one type of bus.
  • The device may also include a network interface 24, which may optionally include a wired interface and/or a wireless interface (such as a WI-FI interface or a Bluetooth interface), typically used to establish communication connections between the device and other electronic devices.
  • the device may further include a user interface 25, which may include a display (Display), an input unit such as a keyboard (Keyboard), and the optional user interface 25 may also include a standard wired interface and a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like.
  • the display may also be properly referred to as a display screen or a display unit, and is used for displaying information processed in the device and for displaying a visualized user interface.
  • FIG. 9 only shows a device with components 21-25. Those skilled in the art can understand that the structure shown in FIG. 9 does not limit the device; it may include fewer or more components than shown, combine certain components, or arrange the components differently.
  • The embodiment of the present application also discloses a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the neural network pruning method described in any of the above method embodiments are implemented.
  • The storage medium may include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for pruning a neural network, and a device and a storage medium. By means of the method, when a neural network is pruned, a network layer to be pruned can be taken as a target network layer for channel pruning and convolution kernel reconstruction. Therefore, by means of the method, when a multi-branch structure is compressed, compression is not just limited to an intermediate layer, network layers such as an input layer, an output layer and a down-sampling layer can also be compressed, thereby greatly increasing the compression ratio of the neural network, reducing the calculation amount required by a neural network model to execute a task, and increasing the task processing speed of the neural network. In addition, the method is a data-independent asynchronous channel pruning method, and facilitates the maintaining of the robustness of a compressed neural network; and by means of the method, the pruning of different sparse granularities on different network layers of the neural network can also be realized, and the flexibility of compression can also be improved.

Description

A neural network pruning method, apparatus, device, and storage medium

This application claims the priority of the Chinese patent application submitted to the China Patent Office on July 29, 2021, with application number 202110866324.3 and the invention title "A neural network pruning method, apparatus, device and storage medium", the entire contents of which are incorporated in this application by reference.

Technical Field

The present application relates to the technical field of deep neural network compression and acceleration, and more specifically, to a neural network pruning method, apparatus, device, and storage medium.

Background

As neural networks have grown in depth and width, they have shown excellent performance in various AI (artificial intelligence) application scenarios, such as machine vision tasks like image recognition and target detection. With the trend of deploying deep-learning-based machine vision software on various embedded or mobile devices, deep neural networks with huge numbers of parameters are difficult to deploy on devices with limited computing and storage resources. Deep neural network compression and acceleration technology provides a way for deep learning to be applied in real time, over the long term, on these resource-constrained devices. Deep neural network compression reduces the parameter count and computation of the neural network model, thereby reducing the model's storage overhead and improving its inference speed.

At present, convolutional neural networks have evolved diverse multi-branch structures, but existing pruning schemes for multi-branch structures only trim the input and output channels of the middle layer of the bottleneck structure, leaving the input and output channels of the whole multi-branch structure uncompressed. Since the middle layer of a bottleneck structure inherently has fewer channels than the module's input and output, compressing only the middle layer limits the achievable compression ratio of the multi-branch structure.

Summary of the Invention

The purpose of this application is to provide a neural network pruning method, apparatus, device, and storage medium that improve the compression ratio of the neural network, reduce the computation required for the neural network model to perform its tasks, and speed up the network's task processing.

To achieve the above purpose, the present application provides a neural network pruning method, including:

determining the target network layer to be pruned in the neural network;

determining the channel kernel set of the target network layer using the channel compression ratio of the target network layer and the convolution kernel weights of each original input channel, wherein the channel kernel set records the retained input channels;

pruning the original input channels of the target network layer according to the channel kernel set, and reconstructing the convolution kernel of the target network layer.

Wherein, determining the channel kernel set of the target network layer using the channel compression ratio of the target network layer and the convolution kernel weights of each original input channel includes:

determining a second number of retained input channels of the target network layer according to a first number of original input channels of the target network layer and the channel compression ratio;

determining the importance value of each original input channel using the first number, the second number, and the Frobenius norm of each original input channel's convolution kernel weights;

determining the sampling probability of each original input channel according to its importance value;

performing R rounds of independent sampling on each original input channel using its sampling probability, and generating the channel kernel set corresponding to the target network layer from the sampling results.

Wherein, determining the importance value of each original input channel using the first number, the second number, and the Frobenius norm of each original input channel's convolution kernel weights includes:

determining the first number of weighting coefficients using the first number and the second number;

determining the Frobenius norm value of each original input channel according to its convolution kernel weights, and assigning a corresponding weighting coefficient to each original input channel according to its Frobenius norm value, wherein original input channels with larger Frobenius norm values are assigned larger weighting coefficients;

determining the importance value of each original input channel according to its weighting coefficient and its initial importance function.

其中,所述重构所述目标网络层的卷积核,包括:Wherein, said reconstructing the convolution kernel of said target network layer includes:

利用所述通道核集创建优化函数;所述优化函数为:Create an optimization function using the channel core set; the optimization function is:

Figure PCTCN2021134336-appb-000001
Figure PCTCN2021134336-appb-000001

其中,Y k是原卷积核在输出通道k的输出特征图,K是所述目标网络层的卷积核输出通道总数,

Figure PCTCN2021134336-appb-000002
代表分别计算卷积核的K个输出通道的特征图重构误差并求和,
Figure PCTCN2021134336-appb-000003
代表Frobenius范数,W ik代表卷积核在输入通道i和输出通道k的权值,
Figure PCTCN2021134336-appb-000004
为在通道核集
Figure PCTCN2021134336-appb-000005
中每个保留输入通道的输入数据x i经卷积核输出通道k的输出特征图之和,*代表卷积操作; Among them, Y k is the output feature map of the original convolution kernel in the output channel k, K is the total number of convolution kernel output channels of the target network layer,
Figure PCTCN2021134336-appb-000002
Represents the feature map reconstruction errors of the K output channels of the convolution kernel respectively calculated and summed,
Figure PCTCN2021134336-appb-000003
Represents the Frobenius norm, Wi ik represents the weight of the convolution kernel in the input channel i and output channel k,
Figure PCTCN2021134336-appb-000004
for the channel kernel set
Figure PCTCN2021134336-appb-000005
The sum of the output feature maps of the input data x i of each reserved input channel through the convolution kernel output channel k, * represents the convolution operation;

minimizing the optimization function to update the weights of the convolution kernel of the target network layer.

To achieve the above object, the present application further provides a pruning method based on a ResNet downsampling module, which includes the pruning method of the above solution. Here, determining the target network layer to be pruned in the neural network includes:

taking the middle layer and the output layer of the residual branch in the downsampling module in turn as the target network layer, so as to prune the original input channels of the middle layer and the output layer and reconstruct the convolution kernels of the middle layer and the output layer.

Correspondingly, the pruning method further includes:

randomly sampling the original input channels of the input layer of the downsampling module by using the channel compression ratio of that input layer, to obtain a first input channel screening result;

pruning the original input channels of the input layer and the downsampling layer of the downsampling module by using the first input channel screening result, and performing convolution kernel reconstruction with the input layer and the downsampling layer of the downsampling module each taken as the target network layer.

To achieve the above object, the present application further provides a pruning method based on ResNet residual modules, which includes the pruning method of the above solution. The ResNet includes N stacked residual modules, and determining the target network layer to be pruned in the neural network includes:

taking the middle layer and the output layer of each of the N residual modules in turn as the target network layer, so as to prune the original input channels of the middle layer and the output layer and reconstruct the convolution kernels of the middle layer and the output layer.

Correspondingly, after pruning the original input channels of the middle layer and the output layer and reconstructing their convolution kernels, the pruning method further includes:

randomly sampling the original input channels of the input layer of a residual module by using the channel compression ratio of that input layer, to obtain a second input channel screening result;

for each of the N residual modules, pruning the original input channels of the input layer of the module by using the second input channel screening result and performing convolution kernel reconstruction with the input layer of the module taken as the target network layer; and pruning the original output channels of the output layer of the module by using the second input channel screening result and performing convolution kernel reconstruction with the output layer of the module taken as the target network layer.

To achieve the above object, the present application further provides a SqueezeNet-based pruning method, which includes the pruning method of the above solution. If a target Fire module of the SqueezeNet is pruned, determining the target network layer to be pruned in the neural network includes:

taking the Squeeze layer of the Fire module following the target Fire module as the target network layer, so as to prune the original input channels of that Squeeze layer and reconstruct its convolution kernel.

Correspondingly, the pruning method further includes:

pruning the original output channels of the convolution kernels of different sizes in the Expand layer of the target Fire module according to the channel coreset of the Squeeze layer of the following Fire module;

randomly sampling the original input channels of the Expand layer of the target Fire module by using the channel compression ratio of that Expand layer, to obtain a third input channel screening result;

pruning the original input channels of the convolution kernels of different sizes in the Expand layer of the target Fire module by using the third input channel screening result, and reconstructing the convolution kernels of different sizes in the Expand layer with the Expand layer of the target Fire module taken as the target network layer.

To achieve the above object, the present application further provides a neural network pruning apparatus, including:

a network layer determination module, configured to determine the target network layer to be pruned in the neural network;

a channel coreset determination module, configured to determine the channel coreset of the target network layer by using the channel compression ratio of the target network layer and the convolution kernel weights of each original input channel, wherein the channel coreset records the reserved input channels;

a channel pruning module, configured to prune the original input channels of the target network layer according to the channel coreset;

a convolution kernel reconstruction module, configured to reconstruct the convolution kernel of the target network layer.

To achieve the above object, the present application further provides an electronic device, including:

a memory, configured to store a computer program;

a processor, configured to implement the steps of the above pruning method when executing the computer program.

To achieve the above object, the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above pruning method.

As can be seen from the above solutions, the neural network pruning method provided by the embodiments of the present application includes: determining a target network layer to be pruned in a neural network; determining a channel coreset of the target network layer by using the channel compression ratio of the target network layer and the convolution kernel weights of each original input channel, the channel coreset recording the reserved input channels; pruning the original input channels of the target network layer according to the channel coreset; and reconstructing the convolution kernel of the target network layer.

It can thus be seen that, when pruning a neural network, this solution can take any network layer to be pruned as the target network layer for channel pruning and convolution kernel reconstruction. When compressing a multi-branch structure, this solution is therefore not limited to the middle layer: it can also compress the input layer, the output layer, the downsampling layer and other network layers, which greatly increases the compression ratio of the neural network, reduces the amount of computation the network model needs to perform its tasks, and speeds up task processing. Moreover, this solution is a data-independent pruning method, which helps preserve the robustness of the compressed neural network; it is also an asynchronous channel pruning method, which can prune different network layers of the neural network at different sparsity granularities and thus makes compression more flexible. The present application further discloses a neural network pruning apparatus, a device and a storage medium, which can likewise achieve the above technical effects.

Description of drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 is a schematic flowchart of a neural network pruning method disclosed in an embodiment of the present application;

Fig. 2 is a schematic structural diagram of a downsampling module disclosed in an embodiment of the present application;

Fig. 3 is a schematic diagram of the overall pruning flow of the downsampling module disclosed in an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a residual module disclosed in an embodiment of the present application;

Fig. 5 is a schematic diagram of the overall pruning flow of the residual module disclosed in an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a Fire module disclosed in an embodiment of the present application;

Fig. 7 is a schematic diagram of the overall pruning flow of the Fire module disclosed in an embodiment of the present application;

Fig. 8 is a schematic structural diagram of a neural network pruning apparatus disclosed in an embodiment of the present application;

Fig. 9 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.

The present application proposes a coreset-theory-based asynchronous pruning method for convolutional neural networks. During the forward inference of the neural network, channels are pruned layer by layer according to a coreset-based screening criterion, and the new weights of the compressed convolution kernels are obtained directly by optimizing a feature-map reconstruction error. An asynchronous compression flow is designed for multi-branch structures such as the ResNet residual module, successively pruning the channels of the middle layers and of the input and output layers of residual modules and other modules. It should be noted that the coreset theory referred to in this embodiment is specifically the offline and streaming coreset constructions (OSCC), which provide the derivations and proofs underlying the coreset construction process; this solution adapts OSCC to construct channel coresets.

The method of the present application performs filter-level pruning on each convolutional layer of an image recognition network trained on classified images. Unlike existing filter-level pruning methods, it achieves filter-level pruning through asynchronous channel pruning rather than by pruning filters directly. In the present application, a convolution kernel is defined as a four-dimensional tensor K×C×H×W with K output channels, C input channels and spatial size H×W, where a K×1×H×W slice holds the kernel parameters of a particular input channel and a 1×C×H×W slice holds the kernel parameters of a particular output channel; when the input and output channels of one convolution kernel are pruned asynchronously, pruning of the 1×1×H×W filters is achieved. When screening the channels to be pruned, the method applies coreset theory and constructs a coreset for the input or output channels through weighted random sampling, achieving data-independent channel pruning and thereby preserving the robustness of the compressed image recognition network.
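By way of a non-limiting illustration (this sketch is not part of the original filing; it merely assumes the PyTorch-style K×C×H×W weight layout described above), the slices involved in input-channel, output-channel and filter-level pruning can be shown as follows:

```python
import torch

# A convolution kernel as a 4-D tensor: K output channels, C input
# channels, spatial size H x W (PyTorch stores Conv2d weights this way).
K, C, H, W = 8, 4, 3, 3
kernel = torch.randn(K, C, H, W)

# Parameters tied to input channel i: a K x 1 x H x W slice.
i = 2
input_slice = kernel[:, i:i + 1]        # shape (K, 1, H, W)

# Parameters tied to output channel k: a 1 x C x H x W slice.
k = 5
output_slice = kernel[k:k + 1]          # shape (1, C, H, W)

# Pruning input channel i and output channel k asynchronously also
# removes the single 1 x 1 x H x W filter at position (k, i).
keep_in = [c for c in range(C) if c != i]
keep_out = [o for o in range(K) if o != k]
pruned = kernel[keep_out][:, keep_in]   # shape (K-1, C-1, H, W)
print(input_slice.shape, output_slice.shape, pruned.shape)
```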

Referring to Fig. 1, which is a schematic flowchart of a neural network pruning method provided by an embodiment of the present application, the pruning method specifically includes the following steps:

S101. Determine a target network layer to be pruned in the neural network.

Specifically, the neural network in this embodiment may be a convolutional neural network applied to tasks such as image recognition. Since the parameters of such a network are highly redundant, pruning the convolutional neural network with this solution can reduce the amount of computation image recognition requires and speed up image recognition. The target network layer in this embodiment may be any network layer to be pruned in the neural network; it may belong to a single-branch structure or to a multi-branch structure, for example an input layer, a middle layer, an output layer or a downsampling layer, which is not specifically limited here.

S102. Determine a channel coreset of the target network layer by using the channel compression ratio of the target network layer and the convolution kernel weights of each original input channel; the channel coreset records the reserved input channels.

It should be noted that the channel compression ratio in this embodiment is the ratio of the number of input channels before compression to the number after compression, from which the number of reserved input channels after compression can be determined. For example, if the target network layer has 50 original input channels and the channel compression ratio is 10:5, the target network layer needs to retain 25 input channels. The channel coreset in this embodiment records the input channels to be retained. The specific way of determining the channel coreset is not limited here: the reserved channels may be selected randomly from the original input channels, selected according to the convolution kernel weights of the target network layer, selected according to the importance value of each original input channel, and so on.

S103. Prune the original input channels of the target network layer according to the channel coreset, and reconstruct the convolution kernel of the target network layer.

在本实施例中,确定目标网络层的通道核集后,该通道核集内记载的原输入通道需要保留,通道核集内未记载的原输入通道需要移除,从而实现对原输入通道的剪枝。并且,输入通道进行剪枝后,还需要对目标网络层的卷积核进行重构,以对卷积核的权值进行更新,该卷积核重构过程不需要重训练网络,仅仅在神经网络的前向推理过程中实现。In this embodiment, after determining the channel core set of the target network layer, the original input channels recorded in the channel core set need to be retained, and the original input channels not recorded in the channel core set need to be removed, so as to achieve the original input channel pruning. Moreover, after the input channel is pruned, the convolution kernel of the target network layer needs to be reconstructed to update the weight of the convolution kernel. The convolution kernel reconstruction process does not require retraining the network, and only implemented during the forward inference of the network.
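As a non-limiting sketch (not part of the original filing), pruning a layer down to a given coreset can be expressed in PyTorch as copying the retained input-channel slices into a narrower layer; the function name and the assumption that the coreset is a sorted list of retained indices are illustrative:

```python
import torch
import torch.nn as nn

def prune_input_channels(conv: nn.Conv2d, coreset: list) -> nn.Conv2d:
    """Keep only the input channels listed in the coreset (assumed to be
    a sorted list of retained channel indices)."""
    new_conv = nn.Conv2d(
        in_channels=len(coreset),
        out_channels=conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight[:, coreset])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv

layer = nn.Conv2d(50, 64, kernel_size=3, padding=1)
pruned = prune_input_channels(layer, coreset=list(range(0, 50, 2)))  # 25 of 50 kept
print(pruned.weight.shape)  # torch.Size([64, 25, 3, 3])
```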

It should be noted that the heuristic screening rules used by current saliency-based pruning algorithms usually rely on the importance of either feature maps or convolution kernel weights, which correspond to the two categories of data-driven and data-independent pruning algorithms respectively. Existing studies have shown that neural network compression affects the generalization of the compressed network model: even if the average accuracy of the compressed model is close to that of the uncompressed complete network, its generalization on certain classes of samples may degrade. Data-independent pruning algorithms can provide compression that is robust for arbitrary test data. For example, the compression method DINP (Data-Independent Neural Pruning via Coresets) uses the offline and streaming coreset constructions (OSCC) to build a coreset for the hidden-layer neurons of a fully connected layer; the neurons inside the coreset are treated as important, retained, and have their weights updated. In addition, DINP derives its neuron importance rule and coreset sampling probabilities from an upper bound on the activation function values; during coreset construction, the computation and assignment of each neuron's sampling coefficient is independent of the layer's input data, so the neuron coreset built by DINP is data-independent. In the present application, channel pruning is performed through the convolution kernel weights rather than through feature maps, so this solution is likewise a data-independent pruning method, which helps preserve the robustness of the compressed neural network.

In summary, this solution discloses a structured pruning scheme. When pruning a neural network, the network layer to be pruned is taken as the target network layer for channel pruning and convolution kernel reconstruction, so both single-branch and multi-branch structures can be compressed. When compressing a multi-branch structure, the solution is not limited to the middle layer of a module; it can also compress the input layer, the output layer, the downsampling layer and other network layers, thus pruning the input and output channels of the entire module, which greatly increases the compression ratio of the neural network, reduces the computation its model needs to perform tasks, and speeds up task processing. Moreover, because different network layers are pruned asynchronously, the asynchronous channel pruning method disclosed here can prune different network layers at different sparsity granularities, making compression more flexible: on top of channel pruning at some network layers, filter-level pruning can be achieved at others.

Based on the above embodiment, in this embodiment the process of determining the channel coreset of the target network layer by using its channel compression ratio and the convolution kernel weights of each original input channel specifically includes:

Step 1: determining a second number of reserved input channels of the target network layer according to a first number of original input channels of the target network layer and the channel compression ratio;

Step 2: determining the importance value of each original input channel by using the first number, the second number, and the Frobenius norm of the convolution kernel weights of each original input channel;

Step 3: determining the sampling probability of each original input channel according to its importance value;

Step 4: performing R rounds of independent sampling on the original input channels by using their sampling probabilities, and generating the channel coreset corresponding to the target network layer according to the sampling result.

In this embodiment, let the target network layer be the l-th network layer; the first number of its original input channels can then also be understood as the number of output channels of the (l-1)-th layer. Here, $n_{l-1}$ denotes the first number of original input channels of the target network layer, and $a_l$ denotes the second number of reserved input channels.

When determining the importance value of each original input channel in Step 2, this embodiment first determines $n_{l-1}$ weighting coefficients by using the first and second numbers, then computes the Frobenius norm value of each original input channel from its convolution kernel weights and assigns each original input channel a corresponding weighting coefficient according to that norm value, with an original input channel of larger Frobenius norm value receiving a larger weighting coefficient; the importance value of each original input channel is then determined from its weighting coefficient and its initial importance function.
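As a non-limiting illustration (not part of the original filing), the rank-based assignment just described can be written as an argsort match; the coefficients themselves are taken as given here, since formula (1) below is only available as an image in the filing:

```python
import numpy as np

def assign_coefficients(kernel, coeffs):
    """Assign precomputed weighting coefficients to input channels so that
    a larger Frobenius norm receives a larger coefficient."""
    # kernel: (K, C, H, W); per-input-channel norm of the K x 1 x H x W slice
    norms = np.sqrt((kernel ** 2).sum(axis=(0, 2, 3)))  # shape (C,)
    assigned = np.empty_like(coeffs)
    # Match the coefficients, sorted ascending, to the channels sorted by norm.
    assigned[np.argsort(norms)] = np.sort(coeffs)
    return assigned  # assigned[i] is the coefficient of input channel i

rng = np.random.default_rng(0)
print(assign_coefficients(rng.standard_normal((8, 4, 3, 3)), rng.random(4)))
```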

In this embodiment, the $n_{l-1}$ weighting coefficients $w_i(x)$ can be calculated by formula (1), which is given in the original filing as an image (PCTCN2021134336-appb-000006) and is not reproduced here.

Here, $w_i(x)$ is the i-th weighting coefficient, constructed for the non-uniform sampling of each channel; at this stage $w_i(x)$ is not yet associated with any particular original input channel, and the coefficients satisfy a condition likewise given in the filing as an image (PCTCN2021134336-appb-000007).
Further, to associate the $n_{l-1}$ weighting coefficients $w_i(x)$ with the $n_{l-1}$ original input channels, the Frobenius norm values of the kernel weights of the $n_{l-1}$ original input channels of the l-th layer's convolution kernel are computed; the Frobenius norm values of the channels are sorted by magnitude, the $n_{l-1}$ weighting coefficients are sorted by magnitude, and, according to the two orderings, larger weighting coefficients are assigned to original input channels with larger Frobenius norm values. Once every original input channel has been assigned its weighting coefficient, the importance value of each original input channel can be calculated according to the following formula:

$$s_i(x)=w_i(x)\cdot g_i(x)\qquad(2)$$

In formula (2), $s_i(x)$ is the importance function of the i-th original input channel, used to determine its importance value, and $g_i(x)$ is its initial importance function. In this embodiment the initial importance function can be customized per network: for ResNet and SqueezeNet one may set $g_i(x)=1$; for the linear point-wise convolution layers of the mobilenet-v2 network, $g_i(x)$ may be set to a function of the current input batch (its exact expression is given in the original filing as an image, PCTCN2021134336-appb-000008), where $i$ denotes the i-th channel, $x_{ij}$ is the i-th channel of a sample $x_j$ in the current input batch $x$, $\|\cdot\|_F$ is the Frobenius norm, and $N$ is the total number of input channels of the linear point-wise convolution layer; $w_i(x)$ is the weighting coefficient of the i-th original input channel. Once the importance value of each original input channel is determined, Step 3 of this embodiment determines the sampling probability $p_i$ of each original input channel according to the following formula (3):

$$p_i=s_i(x)/t\qquad(3)$$

where $t$ is the sum of the importance values of all channels, i.e. $t=\sum_{i=1}^{n_{l-1}}s_i(x)$.

In this embodiment, after the sampling probability of each original input channel is determined, Step 4 performs R rounds of independent sampling according to those probabilities and obtains a sampling result, namely the number of times each original input channel was drawn; the larger a channel's sampling probability, the more often it is drawn. Given the sampling result, the $a_l$ original input channels drawn most often are added to the channel coreset $\mathcal{C}$, and the original input channels not added to $\mathcal{C}$ are pruned.
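Putting Steps 1 to 4 together, the following non-limiting Python sketch builds a channel coreset; it assumes $g_i(x)=1$ (as set above for ResNet and SqueezeNet), treats the compression ratio as the kept fraction, and, since formula (1) is only available as an image, stands in normalized ranks for the weighting coefficients:

```python
import numpy as np

def build_channel_coreset(kernel, keep_fraction, rounds=100, seed=0):
    """Sketch of Steps 1-4: rank-matched coefficients, importance values,
    sampling probabilities, R rounds of sampling, top-a_l selection."""
    rng = np.random.default_rng(seed)
    n_in = kernel.shape[1]                       # first number n_{l-1}
    n_keep = int(round(n_in * keep_fraction))    # second number a_l

    # Frobenius norm of each input channel's K x 1 x H x W slice.
    norms = np.sqrt((kernel ** 2).sum(axis=(0, 2, 3)))

    # Stand-in coefficients: larger norm -> larger coefficient (formula (1)
    # itself is an image in the filing and is not reproduced here).
    coeffs = np.empty(n_in)
    coeffs[np.argsort(norms)] = np.arange(1, n_in + 1) / n_in

    importance = coeffs * 1.0                    # s_i = w_i * g_i with g_i = 1
    probs = importance / importance.sum()        # formula (3)

    # R rounds of independent sampling; count how often each channel is drawn.
    counts = np.bincount(rng.choice(n_in, size=rounds, p=probs), minlength=n_in)

    # The a_l most frequently drawn channels form the coreset.
    return np.sort(np.argsort(counts)[-n_keep:])

kernel = np.random.default_rng(1).standard_normal((64, 50, 3, 3))
print(build_channel_coreset(kernel, keep_fraction=0.5))  # 25 of 50 channels
```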

After channel pruning is performed on the target network layer, its convolution kernel must also be reconstructed. In this embodiment, reconstructing the convolution kernel of the target network layer means updating the kernel weights. The process first creates an optimization function by using the channel coreset; the optimization function is:

$$\min_{W}\;\sum_{k=1}^{K}\left\|\,Y_{k}-\sum_{i\in\mathcal{C}}W_{ik}*x_{i}\,\right\|_{F}^{2}$$

where $Y_k$ is the output feature map of the original convolution kernel at output channel $k$; $K$ is the total number of output channels of the convolution kernel of the target network layer; the summation over $k$ computes the feature-map reconstruction errors of the $K$ output channels of the convolution kernel and sums them; $\|\cdot\|_F$ denotes the Frobenius norm; $W_{ik}$ denotes the weight of the convolution kernel at input channel $i$ and output channel $k$; $\sum_{i\in\mathcal{C}}W_{ik}*x_{i}$ is the sum of the output feature maps obtained at output channel $k$ from the input data $x_i$ of each reserved input channel in the channel coreset $\mathcal{C}$; and $*$ denotes the convolution operation.

然后最小化优化函数,对目标网络层的卷积核的权值进行更新。Then the optimization function is minimized, and the weights of the convolution kernels of the target network layer are updated.
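A non-limiting sketch of this minimization (not part of the original filing) is given below; it uses plain gradient descent over a calibration batch, whereas the filing only requires that the objective be minimized, not any particular optimizer or step count:

```python
import torch
import torch.nn.functional as F

def reconstruct_kernel(weight, x, coreset, steps=200, lr=1e-2):
    """Minimize sum_k || Y_k - sum_{i in coreset} W_ik * x_i ||_F^2
    over the retained-channel weights."""
    # weight: (K, C, H, W) original kernel; x: (B, C, H_in, W_in) inputs.
    pad = weight.shape[-1] // 2
    with torch.no_grad():
        y_target = F.conv2d(x, weight, padding=pad)   # Y_k of the original kernel
    w_new = weight[:, coreset].clone().requires_grad_(True)
    x_kept = x[:, coreset]
    opt = torch.optim.Adam([w_new], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        y = F.conv2d(x_kept, w_new, padding=pad)
        loss = (y - y_target).pow(2).sum()   # Frobenius errors summed over K outputs
        loss.backward()
        opt.step()
    return w_new.detach()

w = torch.randn(16, 8, 3, 3)
x = torch.randn(4, 8, 14, 14)
print(reconstruct_kernel(w, x, coreset=[0, 2, 3, 5, 7]).shape)  # (16, 5, 3, 3)
```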

It should be noted that current neural network pruning algorithms design heuristic screening rules to select parameters, channels, filters and other objects for pruning, and compensate for the accuracy loss caused by compression through training or feature-map reconstruction. Besides the accuracy and compression ratio of the network model, a further evaluation criterion for pruning algorithms is construction efficiency; the feature-map-reconstruction approach adopted in this embodiment therefore also improves the running efficiency of the compression process itself.

In summary, when pruning channels, this solution constructs the sampling probabilities from the characteristics of the convolution kernels rather than of the feature maps, achieving a data-independent pruning scheme that helps preserve the robustness of the compressed neural network. Furthermore, this solution reconstructs the convolution kernel by minimizing the reconstruction error of the output feature maps corresponding to the channel coreset, i.e. kernel reconstruction is carried out during the forward inference of the network without retraining, which saves reconstruction time and improves the running efficiency of the compression process itself.

It should be noted that, when pruning network layers in a single-branch structure, the structured pruning method of the above embodiment can be applied directly to every layer. For a multi-branch structure, in addition to applying that structured pruning method to every network layer, extra handling is required to account for the particularities of the structure. In this embodiment, the pruning flow for multi-branch structures is illustrated with the downsampling module and the residual module of ResNet and the Fire module of SqueezeNet as examples.

1. If the downsampling module of ResNet is pruned, then when executing S101 of the above embodiment, this solution takes the middle layer and the output layer of the residual branch in the downsampling module in turn as the target network layer, and continues with S102 to S103, so as to prune the original input channels of the middle layer and the output layer and reconstruct their convolution kernels.

Correspondingly, owing to the particularity of the downsampling module, the pruning method in this embodiment further includes: randomly sampling the original input channels of the input layer of the downsampling module by using the channel compression ratio of that input layer to obtain a first input channel screening result; pruning the original input channels of the input layer and the downsampling layer of the downsampling module by using the first input channel screening result; and performing convolution kernel reconstruction with the input layer and the downsampling layer each taken as the target network layer.

In this embodiment, for convenience, the pruning method of the above embodiment is called the coreset-theory-based pruning method; the procedure for reconstructing the convolution kernel of any network layer in the downsampling module, the residual module or the Fire module is the same as the reconstruction procedure for the target network layer described above and is not repeated here. Referring to Fig. 2, a schematic structural diagram of the downsampling module provided by an embodiment of the present application, the middle layer, output layer and input layer in Fig. 2 belong to the residual branch. When compressing the downsampling module, this solution first applies the coreset-theory-based pruning method to prune the input channels of the middle layer and the output layer of the residual branch and reconstruct the corresponding convolution kernels, and then prunes the input channels of the input layer of the residual branch and of the downsampling layer with the random-sampling-based pruning method and reconstructs their kernels. The random-sampling-based pruning method specifically means: performing a single equal-probability random sampling of the original input channels according to the corresponding channel compression ratio, obtaining the corresponding input channel screening result, and pruning accordingly; for convenience this procedure is referred to below simply as the random-sampling-based pruning method and is not described again.
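A non-limiting sketch of this random-sampling-based screening (not from the filing; the ratio is taken here as the kept fraction):

```python
import numpy as np

def random_channel_screening(n_channels, keep_fraction, seed=0):
    """One equal-probability draw of the retained channel indices."""
    rng = np.random.default_rng(seed)
    n_keep = int(round(n_channels * keep_fraction))
    return np.sort(rng.choice(n_channels, size=n_keep, replace=False))

print(random_channel_screening(64, 0.5))  # 32 channel indices, sampled once
```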

Referring to Fig. 3, a schematic diagram of the overall downsampling-module pruning flow provided by an embodiment of the present application, the flow specifically includes the following steps:

S201. Screen and prune the original input channels of the middle layer of the residual branch with the coreset-theory-based pruning method, and reconstruct the convolution kernel of the middle layer.

S202. Screen and prune the original input channels of the output layer of the residual branch with the coreset-theory-based pruning method, and reconstruct the convolution kernel of the output layer.

S203. Screen the original input channels of the input layer of the residual branch with the random-sampling-based pruning method to obtain the first input channel screening result.

S204. Prune the original input channels of the input layer of the residual branch and of the downsampling layer according to the first input channel screening result, and reconstruct the convolution kernels of the input layer and the downsampling layer.

It should be noted that, since the input layer of the residual branch and the downsampling layer receive the same input, the channel coreset obtained by screening the channels of the input layer of the residual branch can be applied directly to the downsampling layer of the downsampling module, without being computed again.
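The ordering of S201-S204 and the sharing of the screening result can be sketched as follows (a non-limiting illustration; the four arguments stand for the kernels of the residual branch's middle, output and input layers and of the downsampling layer, and coreset_fn / random_fn stand for the two screening procedures above):

```python
import numpy as np

def prune_downsample_block(mid, out, inp, down, keep, coreset_fn, random_fn):
    """Order of operations for a ResNet downsampling block."""
    # S201/S202: coreset-based input-channel pruning of the residual
    # branch's middle and output layers, in that order.
    mid = mid[:, coreset_fn(mid, keep)]
    out = out[:, coreset_fn(out, keep)]
    # S203: one random screening of the input layer's input channels ...
    kept = random_fn(inp.shape[1], keep)
    # S204: ... shared by the input layer and the downsampling layer,
    # because both see the same input feature map.
    return mid, out, inp[:, kept], down[:, kept]

rng = np.random.default_rng(0)
cs = lambda k, r: np.sort(np.argsort((k ** 2).sum(axis=(0, 2, 3)))[-int(k.shape[1] * r):])
rf = lambda n, r: np.sort(rng.choice(n, size=int(n * r), replace=False))
mid, out, inp, down = (rng.standard_normal(s) for s in
                       [(64, 64, 3, 3), (256, 64, 1, 1), (64, 256, 1, 1), (256, 256, 1, 1)])
for t in prune_downsample_block(mid, out, inp, down, 0.5, cs, rf):
    print(t.shape)
```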

2. If the N stacked residual modules in ResNet are pruned, then when executing S101 of the above embodiment, this solution takes the middle layer and the output layer of each of the N residual modules in turn as the target network layer, and continues with S102 to S103, so as to prune the original input channels of the middle layer and the output layer and reconstruct their convolution kernels.

Correspondingly, owing to the particularity of the residual module, in this embodiment, after the original input channels of the middle layer and the output layer are pruned and their convolution kernels are reconstructed, the method further includes: randomly sampling the original input channels of the input layer of a residual module by using the channel compression ratio of that input layer to obtain a second input channel screening result; and, for each of the N residual modules, pruning the original input channels of the input layer of the module by using the second input channel screening result and performing convolution kernel reconstruction with the input layer taken as the target network layer, then pruning the original output channels of the output layer of the module by using the second input channel screening result and performing convolution kernel reconstruction with the output layer taken as the target network layer.

Referring to Fig. 4, a schematic structural diagram of the residual module provided by an embodiment of the present application, the residual module includes an input layer, a middle layer, an output layer and a shortcut; Fig. 4 shows a single residual module, and stacked residual modules are composed of several structures as in Fig. 4. When compressing stacked residual modules, the input layers of the modules can share the channel screening result obtained by the random-sampling-based pruning method, retaining channels with the same indices. Therefore, when compressing stacked residual modules, this solution first performs coreset-theory-based pruning and convolution kernel reconstruction on the input channels of the middle layers and output layers of the stacked modules in turn, then obtains an input channel screening result for one residual module with the random-sampling-based pruning method and uses it to prune and reconstruct the input layers and output layers of all stacked modules in turn; in other words, the output layer of a residual module undergoes convolution kernel reconstruction twice, in different steps.

Let the total number of residual modules be N, and let i and i' denote the i-th and i'-th residual modules currently being pruned, both initialized to 1. Referring to Fig. 5, a schematic diagram of the overall residual-module pruning flow provided by an embodiment of the present application, the flow specifically includes the following steps:

S301. Screen and prune the original input channels of the middle layer of the i-th residual module with the coreset-theory-based pruning method, and reconstruct the convolution kernel of the middle layer.

S302. Screen and prune the original input channels of the output layer of the i-th residual module with the coreset-theory-based pruning method, and reconstruct the convolution kernel of the output layer.

S303. Judge whether i is less than or equal to N; if so, increment i by 1 and repeat S301 to S302; if not, proceed to S304.

S304. Screen the original input channels of the input layer of the i'-th residual module with the random-sampling-based pruning method to obtain the second input channel screening result; the initial value of i' is 1.

S305. Prune the original input channels of the input layer of the i'-th residual module according to the second input channel screening result, and reconstruct the convolution kernel of the input layer.

S306. Prune the original output channels of the output layer of the i'-th residual module according to the second input channel screening result, and reconstruct the convolution kernel of the output layer.

S307. Judge whether i' is less than or equal to N; if so, increment i' by 1 and repeat S305 to S306; if not, end the flow.

It should be noted that, because the residual module includes a shortcut, S306 prunes the original output channels on the shortcut branch of the output layer; since the output channels of the output layer change, the convolution kernel of the output layer is reconstructed again according to the changed number of channels.
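The two-phase flow of S301-S307 can be sketched as follows (a non-limiting illustration; each module is represented as a dict of 'in', 'mid' and 'out' kernels, and coreset_fn / random_fn again stand for the two screening procedures):

```python
import numpy as np

def prune_stacked_residual_modules(modules, keep, coreset_fn, random_fn):
    """Asynchronous two-phase pruning of N stacked residual modules."""
    # Phase 1 (S301-S303): coreset-based input-channel pruning of each
    # module's middle and output layers, module by module.
    for m in modules:
        m['mid'] = m['mid'][:, coreset_fn(m['mid'], keep)]
        m['out'] = m['out'][:, coreset_fn(m['out'], keep)]
    # Phase 2 (S304-S307): one shared random screening, so every module
    # keeps identically numbered channels along the shortcut path.
    kept = random_fn(modules[0]['in'].shape[1], keep)
    for m in modules:
        m['in'] = m['in'][:, kept]     # S305: input-layer input channels
        m['out'] = m['out'][kept]      # S306: output-layer output channels
    return modules

rng = np.random.default_rng(0)
make = lambda: {'in': rng.standard_normal((64, 256, 1, 1)),
                'mid': rng.standard_normal((64, 64, 3, 3)),
                'out': rng.standard_normal((256, 64, 1, 1))}
cs = lambda k, r: np.sort(np.argsort((k ** 2).sum(axis=(0, 2, 3)))[-int(k.shape[1] * r):])
rf = lambda n, r: np.sort(rng.choice(n, size=int(n * r), replace=False))
mods = prune_stacked_residual_modules([make() for _ in range(3)], 0.5, cs, rf)
print(mods[0]['in'].shape, mods[0]['mid'].shape, mods[0]['out'].shape)
```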

3. If a target Fire module of SqueezeNet is pruned, then when executing S101 of the above embodiment, this solution takes the Squeeze layer of the Fire module following the target Fire module as the target network layer, and continues with S102 to S103, so as to prune the original input channels of that Squeeze layer and reconstruct its convolution kernel.

Correspondingly, owing to the particularity of the Fire module, the pruning method in this embodiment further includes:

pruning the original output channels of the convolution kernels of different sizes in the Expand layer of the target Fire module according to the channel coreset of the Squeeze layer of the following Fire module;

randomly sampling the original input channels of the Expand layer of the target Fire module by using the channel compression ratio of that Expand layer, to obtain a third input channel screening result;

pruning the original input channels of the convolution kernels of different sizes in the Expand layer of the target Fire module by using the third input channel screening result, and reconstructing the convolution kernels of different sizes in the Expand layer with the Expand layer of the target Fire module taken as the target network layer.

Referring to Fig. 6, a schematic structural diagram of the Fire module provided by an embodiment of the present application, the Fire module includes a Squeeze layer and an Expand layer, the Expand layer having convolution kernels of two different sizes. The channel pruning procedure for Fire modules in this solution is: prune the original input channels of the Squeeze layer of Fire module i+1 with the coreset-theory-based pruning method and reconstruct its kernel, then prune the output channels of the Expand layer of Fire module i, and finally prune the input channels of the Expand layer of Fire module i with the random-sampling-based pruning method and reconstruct its kernels. In the input-channel pruning of the Squeeze layer, the input channel screening result of the Squeeze layer of the (i+1)-th Fire module is divided into two parts, corresponding respectively to the output channels of the 3×3 and 1×1 convolutions of the Expand layer of the i-th Fire module; in the input-channel pruning of the Expand layer of the i-th Fire module, the input channel screening result is shared between the 3×3 and 1×1 convolutions.

Referring to Fig. 7, a schematic diagram of the overall Fire-module pruning flow provided by an embodiment of the present application, the flow specifically includes the following steps:

S401. Screen the original input channels of the Squeeze layer of Fire module i+1 with the coreset-theory-based pruning method to generate a channel coreset, and perform channel pruning and convolution kernel reconstruction on the Squeeze layer of Fire module i+1 according to that coreset.

S402. Divide the channel coreset into two parts, corresponding respectively to the original output channels of the convolution kernels of different sizes in the Expand layer of Fire module i, and prune those original output channels.

It should be noted that, although the input channels of the Squeeze layer of Fire module i+1 correspond to the output channels of the Expand layer of Fire module i, the pruning flow still has to prune the respective convolution kernels of these two layers separately, through S401 and S402.

S403. Screen the original input channels of the Expand layer of Fire module i with the random-sampling-based pruning method to generate a channel coreset, prune the input channels of the convolution kernels of different sizes in the Expand layer of Fire module i according to it, and reconstruct the convolution kernels of different sizes.
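A non-limiting sketch of S401-S403 (not from the filing; it assumes the 1×1 Expand outputs come first in the concatenation feeding the next Squeeze layer, and coreset_fn / random_fn stand for the two screening procedures):

```python
import numpy as np

def prune_fire_pair(squeeze_next, expand1, expand3, keep, coreset_fn, random_fn):
    """S401-S403 for Fire module i given the Squeeze layer of module i+1."""
    k1 = expand1.shape[0]                     # number of 1x1 Expand outputs
    coreset = coreset_fn(squeeze_next, keep)
    squeeze_next = squeeze_next[:, coreset]   # S401: prune Squeeze inputs
    # S402: split the coreset into the parts addressing the 1x1 and 3x3
    # Expand output channels, then prune those outputs.
    part1 = coreset[coreset < k1]
    part3 = coreset[coreset >= k1] - k1
    expand1, expand3 = expand1[part1], expand3[part3]
    # S403: one random screening shared by both Expand kernels' inputs.
    kept = random_fn(expand1.shape[1], keep)
    return squeeze_next, expand1[:, kept], expand3[:, kept]

rng = np.random.default_rng(0)
cs = lambda k, r: np.sort(np.argsort((k ** 2).sum(axis=(0, 2, 3)))[-int(k.shape[1] * r):])
rf = lambda n, r: np.sort(rng.choice(n, size=int(n * r), replace=False))
expand1 = rng.standard_normal((64, 16, 1, 1))
expand3 = rng.standard_normal((64, 16, 3, 3))
squeeze_next = rng.standard_normal((32, 128, 1, 1))   # 128 = 64 + 64
s, e1, e3 = prune_fire_pair(squeeze_next, expand1, expand3, 0.5, cs, rf)
print(s.shape, e1.shape, e3.shape)
```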

In summary, the channel pruning method of this solution achieves filter-level pruning through asynchronous input and output channel pruning, and its pruning flows are designed around the multi-branch structures of ResNet and SqueezeNet. It overcomes the limitation of existing pruning methods, which, when compressing ResNet and SqueezeNet, only prune the input and output channels of the middle layers of the residual module, the downsampling module and the Fire module without compressing the input and output channels of the whole module; it can therefore achieve a higher compression ratio, reduce the computation of the network during forward inference, and realize pruning at different sparsity granularities in different layers of the network. In addition, the coreset-based channel screening rule designed in the method of the present application is data-independent, which helps preserve the robustness of the compressed image recognition network.

Furthermore, this solution can also be deployed in FPGA-based neural network acceleration applications or in the software platforms of AI acceleration chips. It can prune image recognition networks at a high compression ratio and reduce their computation in real-time image classification applications, and it can likewise be extended to compressing the backbone networks of object detection networks such as YOLO or Faster RCNN.

The pruning apparatus, device and medium provided by the embodiments of the present application are introduced below; the apparatus, device and medium described below and the pruning method described above may be cross-referenced.

Referring to Fig. 8, a schematic structural diagram of a neural network pruning apparatus provided by an embodiment of the present application, the apparatus includes:

a network layer determination module 11, configured to determine the target network layer to be pruned in the neural network;

a channel coreset determination module 12, configured to determine the channel coreset of the target network layer by using the channel compression ratio of the target network layer and the convolution kernel weights of each original input channel, wherein the channel coreset records the reserved input channels;

a channel pruning module 13, configured to prune the original input channels of the target network layer according to the channel coreset;

a convolution kernel reconstruction module 14, configured to reconstruct the convolution kernel of the target network layer.
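Purely as a non-limiting illustration of how these four modules cooperate (not part of the original filing; all names are hypothetical), the apparatus can be mirrored as a plain Python object holding four callables:

```python
class NeuralNetworkPruner:
    """Sketch of the apparatus of Fig. 8: four cooperating modules."""

    def __init__(self, determine_layer, build_coreset, prune_channels, rebuild_kernel):
        self.determine_layer = determine_layer   # network layer determination module
        self.build_coreset = build_coreset       # channel coreset determination module
        self.prune_channels = prune_channels     # channel pruning module
        self.rebuild_kernel = rebuild_kernel     # convolution kernel reconstruction module

    def run(self, network, compression_ratio):
        layer = self.determine_layer(network)
        coreset = self.build_coreset(layer, compression_ratio)
        layer = self.prune_channels(layer, coreset)
        return self.rebuild_kernel(layer)
```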

The channel coreset determination module includes:

a first determination unit, configured to determine a second number of reserved input channels of the target network layer according to a first number of original input channels of the target network layer and the channel compression ratio;

a second determination unit, configured to determine the importance value of each original input channel by using the first number, the second number, and the Frobenius norm of the convolution kernel weights of each original input channel;

a third determination unit, configured to determine the sampling probability of each original input channel according to its importance value;

a channel coreset generation unit, configured to perform R rounds of independent sampling on the original input channels by using their sampling probabilities, and to generate the channel coreset corresponding to the target network layer according to the sampling result.

The second determination unit includes:

a weighting coefficient determination subunit, configured to determine the first number of weighting coefficients by using the first number and the second number;

a weighting coefficient assignment subunit, configured to determine the Frobenius norm value of each original input channel according to the convolution kernel weights of that channel, and to assign a corresponding weighting coefficient to each original input channel according to its Frobenius norm value, where an original input channel with a larger Frobenius norm value is assigned a larger weighting coefficient; and

an importance value determination subunit, configured to determine the importance value of each original input channel according to the weighting coefficient and the initial importance function of that channel.
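
Since the embodiment names these quantities but not their exact formulas, the following sketch assumes a simple rank-based weighting coefficient and a norm-proportional initial importance function, purely for illustration:

```python
import numpy as np

def channel_importance(weights: np.ndarray, second_number: int) -> np.ndarray:
    """weights: kernel tensor of shape (C_out, C_in, kH, kW).
    Returns one importance value per original input channel, normalized
    so the values can double as sampling probabilities."""
    first_number = weights.shape[1]  # number of original input channels
    # Frobenius norm of each input channel's kernel slice.
    per_channel = weights.transpose(1, 0, 2, 3).reshape(first_number, -1)
    fro = np.linalg.norm(per_channel, axis=1)
    # Assumed weighting: channels ranked by norm; a larger norm gets a larger
    # coefficient, with the spread controlled by the second number.
    ranks = np.argsort(np.argsort(fro))              # 0 = smallest norm
    coeffs = 1.0 + ranks * (second_number / first_number)
    init_importance = fro / fro.sum()                # assumed initial importance
    importance = coeffs * init_importance
    return importance / importance.sum()

probs = channel_importance(np.random.randn(64, 32, 3, 3), second_number=16)
```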

The convolution kernel reconstruction module includes:

a function creation unit, configured to create an optimization function by using the channel coreset, the optimization function being:

$$\sum_{k=1}^{K}\left\lVert Y_k-\sum_{i\in\mathcal{C}}W_{ik}*x_i\right\rVert_F^2$$

where $Y_k$ is the output feature map of the original convolution kernel at output channel $k$; $K$ is the total number of convolution kernel output channels of the target network layer; $\sum_{k=1}^{K}\lVert\cdot\rVert_F^2$ computes and sums the feature map reconstruction errors of the $K$ output channels of the convolution kernel; $\lVert\cdot\rVert_F$ denotes the Frobenius norm; $W_{ik}$ denotes the weight of the convolution kernel at input channel $i$ and output channel $k$; $\sum_{i\in\mathcal{C}}W_{ik}*x_i$ is the sum of the output feature maps obtained by passing the input data $x_i$ of each reserved input channel in the channel coreset $\mathcal{C}$ through output channel $k$ of the convolution kernel; and $*$ denotes the convolution operation; and

a weight update unit, configured to minimize the optimization function so as to update the weights of the convolution kernel of the target network layer.
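
Because each output feature map is linear in the kernel weights, minimizing this optimization function is a linear least-squares problem. A minimal sketch, assuming a 1×1 convolution, a single calibration batch, and torch.linalg.lstsq as the solver (none of which are mandated by the embodiment):

```python
import torch

def reconstruct_1x1_kernel(x_kept: torch.Tensor, y_ref: torch.Tensor) -> torch.Tensor:
    """x_kept: (N, C_kept, H, W) inputs restricted to the coreset channels.
    y_ref:  (N, K, H, W) output feature maps of the original, unpruned layer.
    Returns new 1x1 weights of shape (K, C_kept) minimizing the summed
    Frobenius reconstruction error over all K output channels."""
    n, c, h, w = x_kept.shape
    k = y_ref.shape[1]
    # For a 1x1 convolution, y[k] = sum_i W[k, i] * x[i] at every spatial
    # position, so flattening the positions gives ordinary least squares.
    a = x_kept.permute(0, 2, 3, 1).reshape(-1, c)   # (N*H*W, C_kept)
    b = y_ref.permute(0, 2, 3, 1).reshape(-1, k)    # (N*H*W, K)
    solution = torch.linalg.lstsq(a, b).solution    # (C_kept, K)
    return solution.T                               # (K, C_kept)
```

For larger kernels the same idea applies after an im2col-style unfolding of the input patches.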

If the down-sampling module of a ResNet is pruned, the network layer determination module is specifically configured to take the middle layer and the output layer of the residual branch in the down-sampling module in turn as the target network layer, so as to prune the original input channels of the middle layer and the output layer and to reconstruct the convolution kernels of the middle layer and the output layer.

Correspondingly, the apparatus further includes:

a first screening module, configured to randomly sample the original input channels of the input layer of the down-sampling module by using the channel compression ratio of the input layer, to obtain a first input channel screening result.

The channel pruning module is further configured to prune the original input channels of the input layer and the down-sampling layer of the down-sampling module by using the first input channel screening result.

The convolution kernel reconstruction module is further configured to reconstruct the convolution kernels of the input layer and the down-sampling layer.
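
To make the sharing of the first input channel screening result concrete, here is a hedged sketch (helper name hypothetical; module attributes follow torchvision's Bottleneck naming, which the embodiment does not prescribe) that applies one kept-channel index list to both the residual branch's input layer and the parallel down-sampling convolution:

```python
import torch
import torch.nn as nn

def prune_in_channels(conv: nn.Conv2d, keep):
    """Return a copy of `conv` whose input channels are restricted to `keep`."""
    new = nn.Conv2d(len(keep), conv.out_channels, conv.kernel_size,
                    stride=conv.stride, padding=conv.padding,
                    bias=conv.bias is not None)
    with torch.no_grad():
        new.weight.copy_(conv.weight[:, keep, :, :])
        if conv.bias is not None:
            new.bias.copy_(conv.bias)
    return new

# The shared screening result keeps both branches consuming identical channels
# of the block input, so their outputs can still be added element-wise:
# block.conv1 = prune_in_channels(block.conv1, keep_idx)          # input layer
# block.downsample[0] = prune_in_channels(block.downsample[0], keep_idx)
```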

If the N residual modules stacked in a ResNet are pruned, the network layer determination module is specifically configured to take the middle layer and the output layer of each of the N residual modules in turn as the target network layer, so as to prune the original input channels of the middle layer and the output layer and to reconstruct the convolution kernels of the middle layer and the output layer.

Correspondingly, the apparatus further includes:

a second screening module, configured to, after the original input channels of the middle layer and the output layer have been pruned and the convolution kernels of the middle layer and the output layer have been reconstructed, randomly sample the original input channels of the input layer of the residual module by using the channel compression ratio of that input layer, to obtain a second input channel screening result.

The channel pruning module is further configured to prune, in turn, the original input channels of the input layers of the N residual modules by using the second input channel screening result, and to prune the original output channels of the output layers of the N residual modules.

The convolution kernel reconstruction module is further configured to reconstruct the convolution kernels of the input layers and the convolution kernels of the output layers.
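
For the stacked residual modules, the second screening result must also prune the output layers on their output channels, so that the identity shortcuts and every block agree on the surviving channels. A sketch of the output-side counterpart of the helper above (again with hypothetical names):

```python
import torch
import torch.nn as nn

def prune_out_channels(conv: nn.Conv2d, keep):
    """Return a copy of `conv` whose output channels are restricted to `keep`."""
    new = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                    stride=conv.stride, padding=conv.padding,
                    bias=conv.bias is not None)
    with torch.no_grad():
        new.weight.copy_(conv.weight[keep, :, :, :])
        if conv.bias is not None:
            new.bias.copy_(conv.bias[keep])
    return new

# One shared index list across all N stacked blocks: input layers pruned on
# their input channels (as in the previous sketch), output layers on outputs:
# for block in stacked_blocks:
#     block.conv1 = prune_in_channels(block.conv1, keep_idx)
#     block.conv3 = prune_out_channels(block.conv3, keep_idx)
```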

If a target Fire module of a SqueezeNet is pruned, the network layer determination module is specifically configured to take the Squeeze layer of the Fire module following the target Fire module as the target network layer, so as to prune the original input channels of that Squeeze layer and to reconstruct the convolution kernel of that Squeeze layer.

Correspondingly, the apparatus further includes:

a third screening module, configured to randomly sample the original input channels of the Expand layer of the target Fire module by using the channel compression ratio of the Expand layer, to obtain a third input channel screening result.

The channel pruning module is further configured to prune the original input channels of the convolution kernels of different sizes in the Expand layer of the target Fire module by using the third input channel screening result.

The convolution kernel reconstruction module is further configured to reconstruct the convolution kernels of different sizes in the Expand layer of the target Fire module.
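
Finally, a sketch for the Fire module case, assuming torchvision's attribute names expand1x1 and expand3x3 (an assumption, not part of the embodiment): both Expand branches read the same Squeeze output, so one third screening result prunes the input channels of both kernel sizes together:

```python
import torch
import torch.nn as nn

def prune_expand_inputs(fire: nn.Module, keep):
    """Apply one kept-channel index list to both Expand branches of a Fire
    module, so the 1x1 and 3x3 kernels stay aligned on the same inputs."""
    for name in ("expand1x1", "expand3x3"):  # torchvision naming, assumed
        conv = getattr(fire, name)
        new = nn.Conv2d(len(keep), conv.out_channels, conv.kernel_size,
                        stride=conv.stride, padding=conv.padding,
                        bias=conv.bias is not None)
        with torch.no_grad():
            new.weight.copy_(conv.weight[:, keep, :, :])
            if conv.bias is not None:
                new.bias.copy_(conv.bias)
        setattr(fire, name, new)
```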

Referring to FIG. 9, which is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application, the electronic device includes:

a memory 21, configured to store a computer program; and

a processor 22, configured to implement, when executing the computer program, the steps of the neural network pruning method described in any of the method embodiments above.

In this embodiment, the device may be a PC (Personal Computer), or a terminal device such as a smartphone, a tablet computer, a palmtop computer, or a portable computer.

The device may include a memory 21, a processor 22, and a bus 23.

The memory 21 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (for example, an SD or DX memory), a magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 21 may be an internal storage unit of the device, for example, a hard disk of the device. In other embodiments, the memory 21 may be an external storage device of the device, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the device. Further, the memory 21 may include both an internal storage unit of the device and an external storage device. The memory 21 may be used not only to store application software installed on the device and various types of data, such as the program code for executing the pruning method, but also to temporarily store data that has been output or is to be output.

In some embodiments, the processor 22 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, configured to run the program code stored in the memory 21 or to process data, for example, to execute the program code of the pruning method.

The bus 23 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one bold line is used in FIG. 9, but this does not mean that there is only one bus or only one type of bus.

Further, the device may also include a network interface 24, which may optionally include a wired interface and/or a wireless interface (such as a WI-FI interface or a Bluetooth interface), typically used to establish a communication connection between the device and other electronic devices.

Optionally, the device may also include a user interface 25, which may include a display and an input unit such as a keyboard; optionally, the user interface 25 may also include a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display may also be appropriately referred to as a display screen or a display unit, and is used to display the information processed in the device and to present a visualized user interface.

FIG. 9 shows only a device with components 21 to 25. Those skilled in the art will understand that the structure shown in FIG. 9 does not constitute a limitation on the device, which may include fewer or more components than illustrated, combine certain components, or adopt a different arrangement of components.

An embodiment of the present application further discloses a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the neural network pruning method described in any of the method embodiments above.

The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to in conjunction with each other.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A neural network pruning method, comprising:
determining a target network layer to be pruned in a neural network;
determining a channel coreset of the target network layer by using a channel compression ratio of the target network layer and convolution kernel weights of each original input channel, wherein the channel coreset records reserved input channels; and
pruning the original input channels of the target network layer according to the channel coreset, and reconstructing the convolution kernel of the target network layer.

2. The pruning method according to claim 1, wherein determining the channel coreset of the target network layer by using the channel compression ratio of the target network layer and the convolution kernel weights of each original input channel comprises:
determining a second number of reserved input channels of the target network layer according to a first number of original input channels of the target network layer and the channel compression ratio;
determining an importance value of each original input channel by using the first number, the second number, and the Frobenius norm of the convolution kernel weights of each original input channel;
determining a sampling probability of each original input channel according to the importance value of each original input channel; and
performing R rounds of independent sampling on each original input channel by using the sampling probability of each original input channel, and generating the channel coreset corresponding to the target network layer according to the sampling results.

3. The pruning method according to claim 2, wherein determining the importance value of each original input channel by using the first number, the second number, and the Frobenius norm of the convolution kernel weights of each original input channel comprises:
determining the first number of weighting coefficients by using the first number and the second number;
determining the Frobenius norm value of each original input channel according to the convolution kernel weights of that channel, and assigning a corresponding weighting coefficient to each original input channel according to its Frobenius norm value, wherein an original input channel with a larger Frobenius norm value is assigned a larger weighting coefficient; and
determining the importance value of each original input channel according to the weighting coefficient of each original input channel and the initial importance function of each original input channel.

4. The pruning method according to claim 1, wherein reconstructing the convolution kernel of the target network layer comprises:
creating an optimization function by using the channel coreset, the optimization function being:

$$\sum_{k=1}^{K}\left\lVert Y_k-\sum_{i\in\mathcal{C}}W_{ik}*x_i\right\rVert_F^2$$

where $Y_k$ is the output feature map of the original convolution kernel at output channel $k$; $K$ is the total number of convolution kernel output channels of the target network layer; $\sum_{k=1}^{K}\lVert\cdot\rVert_F^2$ computes and sums the feature map reconstruction errors of the $K$ output channels of the convolution kernel; $\lVert\cdot\rVert_F$ denotes the Frobenius norm; $W_{ik}$ denotes the weight of the convolution kernel at input channel $i$ and output channel $k$; $\sum_{i\in\mathcal{C}}W_{ik}*x_i$ is the sum of the output feature maps obtained by passing the input data $x_i$ of each reserved input channel in the channel coreset $\mathcal{C}$ through output channel $k$ of the convolution kernel; and $*$ denotes the convolution operation; and
minimizing the optimization function to update the weights of the convolution kernel of the target network layer.

5. A pruning method based on a ResNet down-sampling module, comprising the pruning method according to any one of claims 1 to 4, wherein determining the target network layer to be pruned in the neural network comprises:
taking the middle layer and the output layer of the residual branch in the down-sampling module in turn as the target network layer, so as to prune the original input channels of the middle layer and the output layer and to reconstruct the convolution kernels of the middle layer and the output layer;
and wherein the pruning method further comprises:
randomly sampling the original input channels of the input layer of the down-sampling module by using the channel compression ratio of the input layer, to obtain a first input channel screening result; and
pruning the original input channels of the input layer and the down-sampling layer of the down-sampling module by using the first input channel screening result, and performing convolution kernel reconstruction with the input layer and the down-sampling layer of the down-sampling module respectively as the target network layer.

6. A pruning method based on ResNet residual modules, comprising the pruning method according to any one of claims 1 to 4, wherein the ResNet includes N stacked residual modules, and determining the target network layer to be pruned in the neural network comprises:
taking the middle layer and the output layer of each of the N residual modules in turn as the target network layer, so as to prune the original input channels of the middle layer and the output layer and to reconstruct the convolution kernels of the middle layer and the output layer;
and wherein, after the original input channels of the middle layer and the output layer have been pruned and the convolution kernels of the middle layer and the output layer have been reconstructed, the pruning method further comprises:
randomly sampling the original input channels of the input layer of the residual module by using the channel compression ratio of that input layer, to obtain a second input channel screening result; and
for each of the N residual modules, pruning the original input channels of the input layer of the residual module by using the second input channel screening result, and performing convolution kernel reconstruction with the input layer of the residual module as the target network layer; and pruning the original output channels of the output layer of the residual module by using the second input channel screening result, and performing convolution kernel reconstruction with the output layer of the residual module as the target network layer.

7. A SqueezeNet-based pruning method, comprising the pruning method according to any one of claims 1 to 4, wherein, if a target Fire module of the SqueezeNet is pruned, determining the target network layer to be pruned in the neural network comprises:
taking the Squeeze layer of the Fire module following the target Fire module as the target network layer, so as to prune the original input channels of that Squeeze layer and to reconstruct the convolution kernel of that Squeeze layer;
and wherein the pruning method further comprises:
pruning the original output channels of the convolution kernels of different sizes in the Expand layer of the target Fire module according to the channel coreset of the Squeeze layer of the following Fire module;
randomly sampling the original input channels of the Expand layer of the target Fire module by using the channel compression ratio of the Expand layer, to obtain a third input channel screening result; and
pruning the original input channels of the convolution kernels of different sizes in the Expand layer of the target Fire module by using the third input channel screening result, and reconstructing the convolution kernels of different sizes in the Expand layer with the Expand layer of the target Fire module as the target network layer.

8. A neural network pruning apparatus, comprising:
a network layer determination module, configured to determine a target network layer to be pruned in a neural network;
a channel coreset determination module, configured to determine a channel coreset of the target network layer by using a channel compression ratio of the target network layer and convolution kernel weights of each original input channel, wherein the channel coreset records reserved input channels;
a channel pruning module, configured to prune the original input channels of the target network layer according to the channel coreset; and
a convolution kernel reconstruction module, configured to reconstruct the convolution kernel of the target network layer.

9. An electronic device, comprising:
a memory, configured to store a computer program; and
a processor, configured to implement, when executing the computer program, the steps of the neural network pruning method according to any one of claims 1 to 7.

10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the neural network pruning method according to any one of claims 1 to 7.