
WO2024230358A1 - Model processing method, electronic device and medium - Google Patents


Info

Publication number
WO2024230358A1
Authority
WO
WIPO (PCT)
Prior art keywords
weight
network model
pruned
neural network
model
Prior art date
Legal status
Pending
Application number
PCT/CN2024/084957
Other languages
French (fr)
Chinese (zh)
Inventor
Yuchuan Tian (田雨川)
Hanting Chen (陈汉亭)
Tianyu Guo (郭天宇)
Yunhe Wang (王云鹤)
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of WO2024230358A1 publication Critical patent/WO2024230358A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • the present application relates to the field of computer technology, and in particular, to a model processing method, an electronic device, and a medium.
  • Pruning, as a commonly used method for compressing models, is mainly divided into structured pruning and unstructured pruning.
  • Structured pruning uses the filters in the convolutional layers of a convolutional neural network as the pruning unit; it changes the filter groups and the number of feature channels in the network and reduces the width of the convolutional neural network.
  • Unstructured pruning uses a single weight in the convolutional neural network as the pruning unit and performs fine-grained pruning.
  • the model parameters are pruned to the greatest extent possible so that the pruned model can be deployed on small edge devices.
  • the large-scale reduction of parameters will cause the model to become highly sparse, resulting in a reduction in the model's representation capability. For example, some features of the feature map cannot be extracted, which in turn reduces the accuracy of the model.
  • the present application provides a model processing method, electronic device and medium.
  • the present application provides a model processing method, which can be used in electronic devices, the model processing method comprising obtaining a first neural network model, wherein the first neural network model is obtained by pruning a neural network model to be pruned, and pruning comprises modifying a first weight set of the neural network model to be pruned to a second weight set of the first neural network model, wherein a non-zero first weight in the first weight set corresponds to a value of zero in the second weight set.
  • a target weight set that meets the weight condition is selected from the second weight set.
  • Parameter adjustment processing is performed on the target weight set in the second weight set to obtain a third weight set, wherein the value of the first weight in the third weight set is non-zero.
  • the third weight set is used for the first neural network model.
  • the third weight set can be determined from the pruned weights of the converged model, that is, the important weights that can improve the model accuracy; the important weights are then grown and reapplied to the pruned model, and the model is retrained based on the restored important weights and the unpruned weights in the pruned model until the model converges again, and the converged model is obtained.
  • the model memory can be optimized to a certain extent while effectively ensuring the model accuracy.
  • modifying the first weight set of the neural network model to be pruned to the second weight set of the first neural network model may refer to setting the first weight set of the neural network model to be pruned to zero, that is, pruning the first weight set.
  • the target weight set in the second weight set may refer to the important weights screened out from the pruned weights mentioned in the present application.
  • growing the important weights may be restoring the important weights.
  • the weights that are set to zero may be adjusted to a positive number, or the weights that are set to zero may be adjusted to a negative number.
  • a target weight set that meets the weight condition is selected from the second weight set, including: restoring any weight in the second weight set to obtain a candidate neural network model; obtaining the gradient of the loss function corresponding to the first neural network model and the loss function corresponding to the candidate neural network model; and determining the target weight set based on the gradient of the loss function corresponding to the first neural network model and the loss function corresponding to the candidate neural network model after restoring each weight in the second weight set.
  • the weight condition can be that, when the weight alone is applied to the pruned model, the current loss function corresponding to the model has a larger gradient (i.e., a larger decrease) relative to the loss function corresponding to the pruned model before the weight is applied.
  • obtaining the loss function corresponding to the first neural network model includes: determining the first loss function based on the real information corresponding to the training data set and the prediction information output by the first neural network model; determining the second loss function based on the candidate neural network model; and determining the loss function corresponding to the first neural network model based on the first loss function and the second loss function.
  • the first loss function can represent the accuracy of the first neural network model in achieving the preset task.
  • the second loss function can represent the degree of difference between the weight matrix in the first neural network model and the target low-rank approximation matrix corresponding to the weight matrix.
  • depending on the preset task, the first loss function used to represent the accuracy of the first neural network model in implementing the preset task can be different.
  • Commonly used first loss functions can include 0-1 loss function, absolute value loss function, logarithmic loss function, square loss function, exponential loss function, cost loss function, cross entropy loss function, etc.
  • obtaining a first loss function corresponding to the first neural network model includes: inputting a training data set into the first neural network model to obtain prediction information output by the first neural network model; and obtaining the first loss function based on real information and prediction information corresponding to the training data set.
  • obtaining a second loss function corresponding to the first neural network model includes: obtaining a set of weight matrices in the network model to be pruned; determining a target low-rank approximation matrix corresponding to each weight matrix in the weight matrix set; and obtaining the second loss function corresponding to the first neural network model based on each weight matrix and the target low-rank approximation matrix corresponding to each weight matrix.
  • the target low-rank approximation matrix corresponding to each weight matrix in the weight matrix set is determined, including: normalizing each weight matrix to obtain a weight matrix to be decomposed corresponding to each weight matrix; performing low-rank decomposition processing on the weight matrix to be decomposed corresponding to each weight matrix to obtain a target low-rank approximation matrix corresponding to each weight matrix.
  • the target low-rank approximation matrix can be obtained by low-rank decomposition of each weight matrix in the pruned network model, yielding a matrix composed of the product of multiple simple, low-rank sub-weight matrices with small parameter amounts.
  • determining the important weights based on the degree of difference between each weight matrix in the first neural network model and the corresponding target low-rank approximation matrix can improve the rank of the weight matrix, reduce omissions in feature extraction, and thus improve the accuracy of the model.
  • pruning the neural network model to be pruned to obtain a first neural network model includes: pruning the network model to be pruned based on the amplitude of each weight in the network model to be pruned and a preset sparsity rate to obtain the first neural network model.
  • the network model to be pruned is pruned based on the amplitude of each weight in the network model to be pruned and the preset sparsity rate to obtain a first neural network model, including: based on the amplitude of each weight in the network model to be pruned and the preset sparsity rate, determining the weights corresponding to the preset sparsity rate in the network model to be pruned as pruning weights; setting the values corresponding to the pruning weights to zero to obtain the first neural network model.
  • suppose the network model to be pruned has m weights and the preset sparsity rate corresponding to the current processing is α. The m weights can be sorted by amplitude (absolute value), and the mα weights with the smallest amplitudes are determined as the weights in the first weight set.
  • the weights in the first weight set are pruned, that is, all the weights in the first weight set are set to zero, and the first neural network model is obtained from the network model to be pruned.
  • the number of parameters of the network model to be pruned can thus be reduced. For example, reducing the parameter amount from m to m − mα can improve the model sparsity rate, so that the pruned model can be deployed on a small edge device.
  • pruning the neural network model to be pruned to obtain a first neural network model includes: pruning the network model to be pruned based on the amplitude of each weight in the network model to be pruned, a preset sparsity rate, and a preset growth rate to obtain the first neural network model.
  • a neural network model to be pruned is pruned to obtain a first neural network model, including: based on the amplitude of each weight in the network model to be pruned and a preset sparsity rate, determining the weights corresponding to the preset sparsity rate in the network model to be pruned as a first weight subset; based on the amplitude of each weight other than the first weight subset in the network model to be pruned and a preset growth rate, determining the weights corresponding to the preset growth rate in the network model to be pruned other than the first weight subset as a second weight subset; setting the values corresponding to the weights in the first weight subset and the second weight subset to zero to obtain the first neural network model.
  • suppose the network model to be pruned has m weights, the preset sparsity rate corresponding to the next process is α, and the preset growth rate corresponding to the next process is β.
  • the m weights can be sorted by amplitude (absolute value), and the mα weights with the smallest amplitudes are determined as the weights in the first weight subset. Then, the remaining m − mα non-zero weights can be sorted by amplitude (absolute value), and the mβ weights with the smallest amplitudes among them are determined as the weights in the second weight subset. The union of the first weight subset and the second weight subset can then be determined as the first weight set, and the weights in the first weight set can be pruned, that is, all the weights in the first weight set are set to zero, so that on the basis of pruning the first weight subset corresponding to the preset sparsity rate, the second weight subset is additionally pruned, and the first neural network model is obtained from the network model to be pruned.
  • the number of parameters of the network model to be pruned can thus be reduced. For example, reducing the parameter amount from m to m − mα − mβ can further improve the model sparsity rate, so that the pruned model can be deployed on a small edge device.
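  • Illustratively, the two-stage magnitude pruning described above can be sketched in code. The following is a minimal sketch, assuming a single flat weight tensor and assuming, as is conventional for magnitude-based pruning, that the smallest-amplitude weights are removed; the names prune_by_magnitude, alpha and beta are illustrative and not taken from the application.

    import torch

    def prune_by_magnitude(weights: torch.Tensor, alpha: float, beta: float) -> torch.Tensor:
        # Returns a copy of `weights` in which the first weight subset (the
        # m*alpha smallest-amplitude weights) and the second weight subset
        # (the m*beta smallest-amplitude weights among the remaining non-zero
        # weights) have been set to zero.
        w = weights.flatten().clone()
        m = w.numel()
        order = w.abs().argsort()              # ascending by amplitude
        w[order[: int(m * alpha)]] = 0.0       # first weight subset
        nonzero = (w != 0).nonzero(as_tuple=True)[0]
        order_nz = nonzero[w[nonzero].abs().argsort()]
        w[order_nz[: int(m * beta)]] = 0.0     # second weight subset
        return w.view_as(weights)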
  • the present application provides an electronic device, comprising: a memory for storing instructions executed by one or more processors of the electronic device, and a processor, which is one of the one or more processors of the electronic device, for executing the model processing method mentioned in the present application.
  • the present application provides a readable storage medium having instructions stored thereon, which, when executed on an electronic device, enables the electronic device to execute the model processing method mentioned in the present application.
  • FIG1 shows a schematic diagram of an application scenario according to some embodiments of the present application.
  • FIG2 is a schematic diagram of a framework of a pruning method for a model
  • FIG3 is a flow chart of a pruning method that can be applied to model processing
  • FIG4 is a schematic diagram of a model processing method provided in an embodiment of the present application.
  • FIG5 is a flow chart of a method for determining a target weight set provided in an embodiment of the present application.
  • FIG6 is a schematic diagram of a process for obtaining a second loss function of a network model to be pruned provided in an embodiment of the present application
  • FIG7 is a curve diagram comprehensively comparing model computing power consumption and accuracy for the model processing method provided in an embodiment of the present application.
  • FIG. 8 shows a schematic diagram of the hardware structure of an electronic device.
  • the illustrative embodiments of the present application include but are not limited to a model processing method, an electronic device, and a medium.
  • the model processing method can be used in neural network models used in tasks such as image recognition, target detection, reinforcement learning, and semantic analysis.
  • the type of the neural network model is not limited and may be a convolutional neural network, a Transformer (a model based on a multi-head attention mechanism), a recurrent neural network (RNN), a long short-term memory neural network (LSTM), or any other neural network model.
  • the server 10 can prune the neural network model 11 for image recognition and then send it to the small edge device 20, so that the small edge device 20 can perform data processing based on the pruned neural network model.
  • the small edge device can be any implementable electronic device such as a mobile phone, a camera, etc.
  • FIG2 is a schematic diagram of a framework of a pruning method for a model. As shown in FIG2 , the pruning method may include:
  • a feature map is obtained.
  • a sampled image can be input into a convolutional neural network, and feature extraction processing can be performed on the sampled image to obtain a feature map.
  • a convolutional neural network can include a convolutional layer and a fully connected layer.
  • a convolutional layer is selected for pruning.
  • each filter performs filtering on the input first feature map and outputs a second feature map.
  • the first feature map is the feature map output by the previous convolution layer of the current convolution layer.
  • the filter, the first feature map and the second feature map are in one-to-one correspondence.
  • the matrix rank corresponding to the second feature map is determined based on the parameters in the second feature map. It can be understood that when a matrix is an m×n matrix, the maximum rank of the matrix is the smaller of m and n, expressed as min(m, n). When the matrix rank of a feature map is relatively small, the feature map carries less information, that is, more of its information is redundant.
  • the matrix rank corresponding to the second feature map output by each filter is sorted, and the convolutional neural network is pruned based on the sorting result, and the filter corresponding to the second feature map whose matrix rank is less than the preset matrix rank is cut off.
  • all the parameters in the filter can be set to zero.
  • the pruned convolutional neural network is fine-tuned to obtain the final convolutional neural network for prediction.
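  • To make the scheme of FIG2 concrete, the following is a hedged sketch of rank-based structured pruning: filters whose second feature maps have a matrix rank below a preset threshold are cut off by zeroing all their parameters. All names (prune_filters_by_feature_rank, min_rank) are illustrative assumptions, not the patent's code.

    import torch

    def prune_filters_by_feature_rank(conv_weight: torch.Tensor,
                                      feature_maps: torch.Tensor,
                                      min_rank: int) -> torch.Tensor:
        # conv_weight: (out_channels, in_channels, kh, kw) filter parameters.
        # feature_maps: (out_channels, H, W), one second feature map per filter.
        pruned = conv_weight.clone()
        for c in range(feature_maps.shape[0]):
            # Rank of the H x W matrix; it is at most min(H, W).
            rank = torch.linalg.matrix_rank(feature_maps[c]).item()
            if rank < min_rank:
                pruned[c] = 0.0  # cut off the filter: all its parameters set to zero
        return pruned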
  • since this pruning scheme uses the feature map as a whole to determine the matrix rank and prune filters, it can only be applied to structured pruning of the model. It cannot be applied to unstructured pruning of the model, that is, it cannot achieve fine-grained pruning of the model and cannot fully optimize the model's memory footprint.
  • Fig. 3 is a flow chart of a pruning method of a model. As shown in Fig. 3, the pruning method may include:
  • the weight of each filter in each convolution layer of the convolution neural network is obtained.
  • the convolutional neural network is fine-tuned, and then the next convolutional layer is pruned; for example, after the l-th convolutional layer is pruned, the (l+1)-th convolutional layer can be pruned. Since this pruning scheme uses the three-dimensional tensor of the filter weights as a whole to determine the rank and prune filters, it cannot be applied to unstructured pruning of the model, that is, it cannot achieve fine-grained pruning of the model or optimal memory optimization of the model.
  • two unstructured pruning methods are provided, namely, a method based on a preset pruning process and a method based on a preset pruning standard.
  • the method based on the preset pruning process can achieve better pruning effects through the pruning process of preset weight pruning and preset growth weight.
  • the preset pruning process includes gradual pruning, “sparse-to-sparse” pruning that randomly initializes a sparse network, “compression/decompression” pruning that alternately trains the network in dense and sparse states, and so on. For example, gradual pruning performs pruning once every ΔT training iterations and slowly increases the sparsity rate of the model until the target sparsity rate is reached and pruning ends.
  • the preset pruning standard selects redundant weights from the weight matrix for pruning by presetting the weight importance.
  • the method based on preset pruning standards implements pruning through preset pruning standards, wherein the pruning standards generally include those based on the absolute value of the weight, based on the weight gradient, based on the importance score of the weight obtained by self-learning, and the like.
  • both the method based on the preset pruning process and the method based on the preset pruning standard achieve pruning by pruning the model parameters to the greatest extent possible.
  • a large-scale reduction in the number of parameters will cause the model to become highly sparse, resulting in a reduction in the model's representation ability. For example, some features of the feature map cannot be extracted, thereby reducing the accuracy of the model.
  • the present application provides a model processing method. After obtaining a converged model and pruning the weights in the converged model, the important weights in the pruned weights of the converged model that can improve the model accuracy can be determined. Then, the important weights are grown, that is, the important weights are restored and reapplied to the pruned model. The model is retrained based on the restored important weights and the unpruned weights in the pruned model until the model converges again, and the converged model is obtained. In this way, the model memory can be optimized to a certain extent while effectively ensuring the model accuracy.
  • each important weight can be a weight for which, when the weight alone is applied to the pruned model, the current loss function corresponding to the model has a larger gradient (i.e., decrease amplitude) relative to the loss function corresponding to the pruned model before the weight is applied.
  • the larger the gradient, the smaller the loss function value obtained after the weight is applied alone to the pruned model, that is, the higher the degree of model convergence and the higher the accuracy of the model. Thus, selecting a preset number of weights with larger corresponding gradients among the pruned weights as important weights can effectively ensure the accuracy of the model.
  • the important weights can be determined by restoring each of the pruned weights in turn. For example, first randomly select a weight A from the pruned weights, restore the weight A, and perform model training based on the currently restored weight A and the unpruned weights, obtain the current loss function corresponding to the model, and obtain the gradient (i.e., the degree of decrease) of the current loss function and the loss function corresponding to the aforementioned pruned model, and use the gradient as the gradient corresponding to weight A.
  • then, the next weight B is randomly selected for recovery, and model training is performed based on the currently recovered weight B and the unpruned weights to obtain the current loss function corresponding to the model; the gradient of the current loss function relative to the loss function corresponding to the aforementioned pruned model is obtained as the gradient corresponding to weight B. That is, the gradients corresponding to each of the pruned weights are obtained in turn based on the above method, the pruned weights are sorted by gradient size, and a preset number of weights with larger corresponding gradients are selected as important weights.
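  • The weight-by-weight recovery described above can be sketched as follows. This is a minimal sketch that, for brevity, evaluates the loss after restoring each weight instead of performing the brief retraining the text mentions; eval_loss is an assumed helper that runs the model over the training data and returns the current loss, and all other names are illustrative.

    import torch

    def rank_pruned_weights(weight, pruned_idx, original_values, eval_loss, num_keep):
        # Measure the gradient (decrease amplitude) of the loss when each
        # pruned weight is restored alone, relative to the pruned model's loss.
        base_loss = eval_loss()
        decreases = []
        with torch.no_grad():
            flat = weight.view(-1)
            for i in pruned_idx:
                flat[i] = original_values[i]           # restore this weight alone
                decreases.append(base_loss - eval_loss())
                flat[i] = 0.0                          # re-prune before the next trial
        order = torch.tensor(decreases).argsort(descending=True)
        return [pruned_idx[j] for j in order[:num_keep]]  # important weights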
  • a certain number of weights can be pruned additionally on the basis of the number of weight prunings that meet the preset sparsity rate of the model, and the number of important weights that are grown is ensured to be the same as the number of weights pruned additionally, so that the accuracy of the model can be ensured without changing the sparsity rate of the model.
  • the model sparsity rate increases due to the growth of some important weights, and does not reach the preset sparsity rate.
  • the above-mentioned pruning steps and weight growing steps may be re-executed until the model converges again and the sparsity rate of the model reaches the preset sparsity rate. In this way, the model sparsity rate can also be guaranteed.
  • for example, suppose the preset model sparsity rate is 60%. Because 20% of the important weights are grown, the model sparsity rate rises to 80%. At this time, after the model converges, a second pruning can be performed, for example pruning enough weights to reduce the model sparsity rate to 40%; then 20% of important weights are grown again, and when the model converges again, the preset model sparsity rate of 60% is reached. The pruning ratio and the important-weight growth ratio of each round can be determined according to actual needs.
  • FIG4 is a schematic diagram of a model processing method provided in the embodiment of the present application.
  • the model processing method shown in FIG4 can be executed by an electronic device.
  • the model processing method may include:
  • the network model to be pruned may include multiple convolutional layers, and each convolutional layer in the multiple convolutional layers may have layer identification information, such as identity identification information that characterizes the identity of the convolutional layer.
  • the layer identification information of each convolutional layer may be unique, that is, the layer identification information of each convolutional layer is different.
  • the network model to be pruned may include n convolutional layers, and the layer identification information of the convolutional layers may be Conv1, Conv2...Conv n.
  • Each convolutional layer may include multiple convolutional kernels, and each convolutional kernel in the multiple convolutional kernels may have kernel identification information, such as identity identification information that characterizes the identity of the convolutional kernel.
  • the kernel identification information of each convolutional kernel may be unique, that is, the kernel identification information of each convolutional kernel is different.
  • the convolutional layer Conv1 may include m convolutional kernels, and the kernel identification information of the convolutional kernel may be W1, W2...Wm.
  • Each convolutional kernel may include multiple weights (parameters), and the weights in each convolutional kernel may be all the same, all different, or partially the same.
  • the multiple weights in each convolution kernel can be represented in the form of a matrix, that is, the multiple weights in each convolution kernel can be represented by a weight matrix, and each weight in the weight matrix can be used to extract and enhance the features of image data and audio data.
  • the network model to be pruned may include but is not limited to any one of a convolutional neural network, a Transformer (a model based on a multi-head attention mechanism), a recurrent neural network (RNN), and a long short-term memory neural network (LSTM). It is understood that the network model to be pruned may be a neural network model used to implement tasks such as image recognition, target detection, reinforcement learning, and semantic analysis. In some optional implementations, the network model to be pruned may be a visual model for implementing various visual tasks.
  • the training data set may be an image data set or an audio data set.
  • the data in the training data set may have annotation information, and the annotation information may be information that matches the task of the network model to be pruned.
  • the annotation information may be the annotated category information.
  • the annotation information may be the annotated category information and location information.
  • since the network model to be pruned has a large number of parameters and is difficult to deploy on a small edge device, it is necessary to minimize the number of parameters of the network model to be pruned (that is, to set as many weights of the network model to be pruned as possible to zero) to achieve maximum compression of the network model to be pruned, so that the pruned model can be deployed on a small edge device.
  • a first weight set is determined from the network model to be pruned based on the amplitude (absolute value) of each weight in the network model to be pruned, the preset sparsity rate corresponding to the current training process, and the preset growth rate corresponding to the current training process.
  • the weights in the first weight set may be located in the same convolution kernel, that is, belong to the same weight matrix, or may be located in different convolution kernels, that is, belong to different weight matrices.
  • the weights in the first weight set may be pruned, that is, the weights in the first weight set are reset to zero to obtain a second weight set.
  • a first subset of weights may be determined from the network model to be pruned based on the amplitude (absolute value) of each weight in the network model to be pruned and a preset sparsity rate corresponding to the current training process
  • a second subset of weights may be determined from the network model to be pruned based on the amplitude (absolute value) of each weight in the network model to be pruned and a preset growth rate corresponding to the current training process. Then, the union of the first subset of weights and the second subset of weights may be determined as the first set of weights, and the first set of weights may be modified to the second set of weights to obtain a first neural network model.
  • suppose the network model to be pruned has m weights, the preset sparsity rate corresponding to the processing is α, and the preset growth rate corresponding to the processing is β.
  • the m weights can be sorted by amplitude (absolute value), and the mα weights with the smallest amplitudes are determined as the weights in the first weight subset.
  • the remaining m − mα non-zero weights can then be sorted by amplitude (absolute value), and the mβ weights with the smallest amplitudes among them are determined as the weights in the second weight subset.
  • the union of the first weight subset and the second weight subset can be determined as the first weight set, and the weights in the first weight set can be pruned, that is, all the weights in the first weight set are set to zero, so that on the basis of pruning the first weight subset corresponding to the preset sparsity rate, the second weight subset is additionally pruned to obtain the first neural network model after the network model to be pruned is pruned.
  • a target weight set that meets preset conditions can be determined from the second weight set based on the first loss function and the second loss function.
  • the first loss function can be a loss function that represents the accuracy of the first neural network model in achieving a preset task
  • the second loss function can be a loss function that represents the degree of difference between the weight matrix in the first neural network model and the target low-rank approximation matrix corresponding to the weight matrix.
  • the degree of influence of the change of each weight in the second weight set on the change amplitude of the first loss function value and the second loss function value can be determined, and then the weight with a large degree of influence can be determined as the target weight to obtain the target weight set.
  • adjusting the target weights in the target weight set to obtain the third weight set can mean expanding the absolute values of the target weights in the target weight set, that is, adjusting the zeroed weights to positive numbers or to negative numbers, thereby ensuring that important weights that have a significant impact on the accuracy of the pruned network model can be reactivated.
  • the training data set can be input into the second neural network model, and the second neural network model can be trained to obtain the target neural network model.
  • the training data set can be input into the second neural network model, and the training data set can be feature extracted based on the weights in the second neural network model, and prediction information can be output.
  • the weights in the second neural network model include a third weight set. After the second neural network model outputs the prediction information of the training data set, the weights in the third weight set in the second neural network model are adjusted based on the annotation information and prediction information of the training data set until the number of iterations is greater than the preset number, and the target neural network model is obtained.
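  • As an illustration of this retraining step, the following is a minimal sketch in which the second neural network model is trained while the still-pruned weights are kept at zero by re-applying a 0/1 mask after each update. The optimizer, the loss function, and all names are illustrative assumptions, not the patent's code.

    import torch

    def retrain(model, masks, loader, loss_fn, epochs: int = 1, lr: float = 1e-3):
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:                    # training data with annotations
                opt.zero_grad()
                loss = loss_fn(model(x), y)        # error between prediction and labels
                loss.backward()
                opt.step()
                with torch.no_grad():              # keep pruned weights at zero
                    for p, mask in zip(model.parameters(), masks):
                        p.mul_(mask)
        return model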
  • Figure 5 is a flow chart of a method for determining the target weight set provided by the embodiment of the present application. As shown in Figure 5, the steps of determining the target weight set may include:
  • the accuracy of the first neural network model in achieving the preset task can be determined, that is, the accuracy of the prediction information output by the first neural network model for predicting the training data set can be determined, and the accuracy can be represented by the error between the labeling information of the training data set and the prediction information.
  • a first loss function can be used to represent the accuracy of the first neural network model in achieving the preset task. The smaller the value of the first loss function, that is, the smaller the error between the labeling information of the training data set and the prediction information, the higher the accuracy of the first neural network model in achieving the preset task.
  • the first loss function used to represent the accuracy of the first neural network model in implementing the preset task may be different.
  • Commonly used first loss functions may include 0-1 loss function, absolute value loss function, logarithmic loss function, square loss function, exponential loss function, cost loss function, cross entropy loss function, etc.
  • for example, the cross-entropy loss function can be used as the first loss function
  • or the cost loss function can be used as the first loss function.
  • the network model to be pruned is a neural network model used to achieve the preset task, and its number of weights (parameters) is large, with many redundant weights; therefore, the redundant weights in the network model to be pruned can be reduced (for example, the network model to be pruned is pruned, that is, some weights in the network model to be pruned are reset to zero) to reduce the number of parameters of the network model to be pruned and achieve compression of the network model to be pruned.
  • the redundant weights in the network model to be pruned are reduced, so the weight matrix in the network model to be pruned before and after compression changes, and the rank of the weight matrix after compression may be smaller than the rank of the weight matrix before compression (for example, the weights of a row of a weight matrix in the network model to be pruned are all set to zero).
  • when the low-rank weight matrix is used to extract features from the data in the training data set, feature extraction omissions will occur, which leads to errors between the prediction information output for the training data set by the network model to be pruned before compression and after compression.
  • the accuracy of the preset task achieved by the compressed network model to be pruned is lower than the accuracy of the preset task achieved by the network model to be pruned before compression.
  • each weight matrix in the pruned network model can be subjected to low-rank decomposition, and a target low-rank approximation matrix can be obtained by multiplying multiple simple, low-rank sub-weight matrices with small parameter amounts.
  • for example, a weight matrix A_{m×n} can be approximated by a target low-rank approximation matrix obtained by multiplying the sub-weight matrix U_{m×k}, the sub-weight matrix Σ_{k×k} and the sub-weight matrix Vᵀ_{k×n}, where k ≪ n; the number of parameters of the weight matrix A_{m×n} is m×n, while the number of parameters of the target low-rank approximation matrix is k×(m+n+1). For instance, when m = n = 512 and k = 32, the weight matrix has 262,144 parameters while the target low-rank approximation matrix has only 32×(512+512+1) = 32,800.
  • since the number of parameters of the target low-rank approximation matrix is much smaller than that of the corresponding weight matrix, and due to the low rank of the sub-weight matrices in the target low-rank approximation matrix, when the target low-rank approximation matrix corresponding to the weight matrix is used to extract features from the data in the training data set, feature extraction omissions will occur.
  • the difference between the weight matrix and the target low-rank approximation matrix corresponding to the weight matrix can be increased, the rank of the compressed weight matrix can be increased, and the feature extraction omission can be reduced, thereby reducing the error between the prediction information output by predicting the training data set using the network model to be pruned before and after compression, and ensuring the accuracy of the compressed network model to be pruned in achieving the preset task.
  • a second loss function can be used to represent the difference between the weight matrix and the target low-rank approximation matrix corresponding to the weight matrix.
  • the error between the annotation information and the prediction information of the training data set can be adjusted, that is, the value of the first loss function is changed, and the degree of difference between the weight matrix and the target low-rank approximation matrix corresponding to the weight matrix can be adjusted to improve the rank of the weight matrix. Therefore, by pruning the network model to be pruned to obtain the first neural network model, the first loss function value and the second loss function value can be adjusted at the same time, so as to reduce the number of parameters of the network model to be pruned while improving the rank of the weight matrix in the network model to be pruned, reduce the omission of feature extraction, and ensure the accuracy of the network model to be pruned in achieving the preset task.
  • the target weight set can be determined from the second weight set for growth processing, to ensure that the important weights that have a greater impact on the variation ranges of the first loss function value and the second loss function value will not be pruned by mistake, so that in the subsequent process of adjusting the weights of the first neural network model, the important weights can be adjusted to change the variation ranges of the first loss function value and the second loss function value so that the first neural network model converges, thereby reducing the number of parameters of the network model to be pruned while improving the rank of the weight matrix in the network model to be pruned, reducing omissions in feature extraction, and ensuring the accuracy of the network model to be pruned in achieving the preset tasks.
  • the target loss function can be determined according to the sum of the first loss function and the second loss function, and the important weights (i.e., the target weight set) that have a greater impact on the change range of the target loss function value can be determined, that is, the important weights that have a greater impact on the change range of the first loss function value and the change range of the second loss function value can be determined.
  • alternatively, the important weights can be determined directly based on their impact on the change ranges of the first loss function value and the second loss function value.
  • the target weights in the target weight set are adjusted so that the rank of the weight matrix in the network model to be pruned is improved while reducing the number of parameters of the network model to be pruned, reducing the omission of feature extraction, and ensuring the accuracy of the network model to be pruned in achieving the preset task.
  • the target loss function may be in the form shown in Formula 1: L = L_task + λ·L_rank, where L may represent the target loss function, L_task may represent the first loss function, L_rank may represent the second loss function, and λ may represent the introduced linear combination hyperparameter.
  • the target loss function can be a function with multiple weights in the network model to be pruned as independent variables.
  • the derivative corresponding to each weight can be obtained.
  • the derivative can reflect the degree of influence of the change in weight on the change in the value of the target loss function. The larger the derivative of the weight, the greater the influence of the change in weight on the change in the value of the target loss function.
  • the amplitudes (absolute values) of the derivatives corresponding to each weight in the second weight set can be sorted from large to small, and the weights corresponding to the mβ largest derivatives in the sequence can be determined as the target weight set.
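  • A hedged sketch of this selection step follows, assuming the target loss L = L_task + λ·L_rank has already been backpropagated so that each weight tensor carries a gradient, and assuming a 0/1 mask marking pruned positions. The reactivation value is one possible choice consistent with the description above (the restored value only needs to be non-zero, positive or negative); all names are illustrative.

    import torch

    def select_and_grow(weight: torch.Tensor, mask: torch.Tensor, k: int, eps: float = 1e-3):
        # Among pruned positions (mask == 0), pick the k whose target-loss
        # derivatives have the largest amplitude and reactivate them.
        grad = weight.grad.view(-1)
        flat_mask = mask.view(-1)
        pruned_idx = (flat_mask == 0).nonzero(as_tuple=True)[0]
        top = pruned_idx[grad[pruned_idx].abs().argsort(descending=True)[:k]]
        flat_mask[top] = 1.0                       # grow: mark as active again
        with torch.no_grad():
            # Restore with a small non-zero value whose sign follows the
            # descent direction of the target loss (illustrative choice).
            weight.view(-1)[top] = -eps * grad[top].sign()
        return mask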
  • FIG6 is a schematic diagram of a process for obtaining the second loss function of the first neural network model provided by the embodiment of the present application. As shown in FIG6, the steps of obtaining the second loss function of the first neural network model may include:
  • the Frobenius norm of each weight matrix in the network model to be pruned can be determined based on the Frobenius norm calculation formula, and then each weight matrix in the network model to be pruned can be divided by the Frobenius norm of each weight matrix to achieve L2 normalization of each weight matrix in the network model to be pruned, and obtain the weight matrix to be decomposed corresponding to each weight matrix in the network model to be pruned.
  • the Frobenius norm calculation formula may be in the form shown in Formula 2: ‖W‖_F = √(tr(W·Wᵀ)), where ‖W‖_F may represent the Frobenius norm of each weight matrix in the network model to be pruned, W may represent each weight matrix in the network model to be pruned, Wᵀ may represent the transpose of each weight matrix in the network model to be pruned, and tr(·) may represent the trace of a matrix.
  • the weight matrix to be decomposed corresponding to each weight matrix in the network model to be pruned may be in the form shown in Formula 3: W′ = W / ‖W‖_F, where ‖W‖_F may represent the Frobenius norm of each weight matrix in the network model to be pruned, W may represent each weight matrix in the network model to be pruned, and W′ may represent the weight matrix to be decomposed.
  • the weight matrix to be decomposed corresponding to each weight matrix can be subjected to singular value decomposition to obtain the eigenvector set and singular value set corresponding to each weight matrix in the network model to be pruned. Then, the left singular matrix, singular value matrix and right singular matrix can be determined based on the eigenvector set and singular value set corresponding to each weight matrix in the network model to be pruned, and the left singular matrix, singular value matrix and right singular matrix are multiplied as the target low-rank approximation matrix corresponding to the weight matrix.
  • the transpose of the weight matrix to be decomposed can be determined, and multiple eigenvalues and eigenvectors can be determined based on the weight matrix to be decomposed and the transpose of the weight matrix to be decomposed. Then, the left singular matrix and the right singular matrix corresponding to the weight matrix can be determined based on the eigenvectors, and the singular value matrix can be determined based on the eigenvalues.
  • multiple eigenvalues and eigenvectors can be determined based on the product of the weight matrix to be decomposed and the transpose of the weight matrix to be decomposed, and the left singular matrix can be determined based on the determined multiple eigenvectors.
  • multiple eigenvalues and eigenvectors can also be determined based on the product of the transpose of the weight matrix to be decomposed and the weight matrix to be decomposed.
  • the right singular matrix is determined based on the determined multiple eigenvectors.
  • the square root of all eigenvalues can be determined as the singular value corresponding to each weight matrix in the network model to be pruned, and the singular value set corresponding to each weight matrix in the network model to be pruned is obtained. Then, the singular value matrix can be determined based on the singular value set.
  • the candidate singular value set can be determined from the singular value set corresponding to each weight matrix of the network model to be pruned. For example, the largest k singular values can be selected from the singular value set corresponding to each weight matrix of the network model to be pruned according to the matrix low rank approximation principle (Eckart-Young) as the candidate singular value set corresponding to each weight matrix in the network model to be pruned. Then, the singular value matrix corresponding to each weight matrix in the network model to be pruned can be generated based on the candidate singular value set corresponding to each weight matrix in the network model to be pruned.
  • 603: determine a second loss function according to each weight matrix in the network model to be pruned and the target low-rank approximation matrix corresponding to each weight matrix.
  • the second loss function is introduced by taking a weight matrix in the network model to be pruned and the target low-rank approximation matrix corresponding to the weight matrix as an example.
  • Formula 4 can represent the singular value decomposition of the weight matrix to be decomposed: W′ = U·Σ·Vᵀ, where W′ can represent the weight matrix to be decomposed obtained after the weight matrix is normalized, U can represent the left singular matrix, Vᵀ can represent the transpose of the right singular matrix, and Σ can represent the singular value matrix; U and V can represent unitary matrices concatenated from orthogonal basis vectors. Trun(W′) can represent the target low-rank approximation matrix, which can specifically be the product of the left singular matrix, the truncated singular value matrix, and the transpose of the right singular matrix. ‖·‖_F can represent the Frobenius norm, and I can represent the identity matrix.
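  • Combining Formulas 2 to 4, the per-matrix computation can be sketched as follows: normalize the weight matrix by its Frobenius norm, build the target low-rank approximation from the k largest singular values (the Eckart-Young principle), and measure the degree of difference. The function name, the parameter k, and the choice of the Frobenius norm of the difference as the per-matrix loss term are illustrative assumptions consistent with the description above.

    import torch

    def rank_loss_for_matrix(W: torch.Tensor, k: int) -> torch.Tensor:
        # Formula 2: ||W||_F = sqrt(tr(W W^T)).
        fro = torch.sqrt(torch.trace(W @ W.T))
        # Formula 3: the weight matrix to be decomposed.
        W_norm = W / fro
        # Formula 4: singular value decomposition W' = U diag(S) V^T.
        U, S, Vh = torch.linalg.svd(W_norm, full_matrices=False)
        # Eckart-Young: keep only the k largest singular values.
        trun = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]
        # Degree of difference between the matrix and its target low-rank
        # approximation; summed over all weight matrices, this gives the
        # second loss.
        return torch.linalg.norm(W_norm - trun)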
  • the model processing method flow can be implemented and executed in the form of software code.
  • the pseudocode M1 for the operations related to the model processing method flow is described line by line below.
  • the fifth line of code in M1 is used to determine the task loss.
  • the training data set can be input into the dense network W.
  • each processed weight matrix of each convolutional layer forward propagates the training data.
  • the dense network W can output the prediction information of the data in the training data set, and then the task loss can be determined based on the labeling information and prediction information of the data in the training data set.
  • the 8th line in M1 is used to determine the adversarial loss function.
  • the low-rank approximation corresponding to each weight matrix to be processed in the dense network W is determined, and the training data is convolved using each target low-rank approximation matrix.
  • the dense network W can output the prediction information of the data in the training data set, and then the adversarial loss function can be determined based on the labeling information and prediction information of the data in the training data set.
  • Line 9 in M1 is used to prune a certain number of weights based on the preset weights that need to be pruned to improve the sparsity of the model and reduce the memory of the model.
  • Line 10 in M1 is used to prune a certain number of remaining unpruned weights based on the preset weights that need to be pruned.
  • Line 11 in M1 is used to sort the pruned weights according to the absolute value of the gradient and grow the weights with larger gradients, so as to ensure that important weights that were mistakenly pruned can be reactivated.
  • after pruning the weights, the above pseudocode can determine the derivatives of all pruned weights based on the target loss function and grow the weights with larger derivatives, ensuring that important weights are not pruned by mistake and thereby ensuring the accuracy of the model.
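  • The listing of M1 itself is not reproduced in this text. The following is a hedged, Python-style reconstruction of one training iteration implied by the line-by-line description above, reusing the illustrative helpers sketched earlier (prune_by_magnitude, select_and_grow, rank_loss_for_matrix); weight_matrices, lam, masks and the other names are likewise illustrative assumptions, not the patent's code.

    # One training iteration of the sketched flow.
    for x, y in train_loader:
        optimizer.zero_grad()
        # Line 5 of M1: task loss from the dense network's predictions.
        task_loss = loss_fn(model(x), y)
        # Line 8 of M1: adversarial (rank) loss from the target low-rank
        # approximations of the weight matrices to be processed.
        rank_loss = sum(rank_loss_for_matrix(W, k) for W in weight_matrices(model))
        # Target loss, Formula 1: L = L_task + lambda * L_rank.
        (task_loss + lam * rank_loss).backward()
        for p, mask in zip(model.parameters(), masks):
            # Lines 9-10 of M1: prune m*alpha weights, then m*beta more of
            # the remaining unpruned weights.
            p.data = prune_by_magnitude(p.data, alpha, beta)
            mask.copy_((p.data != 0).float())
            # Line 11 of M1: sort the pruned weights by gradient amplitude
            # and grow the largest, reactivating important weights.
            select_and_grow(p, mask, k=int(p.numel() * beta))
        optimizer.step()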
  • the pruning effect is verified based on the small data set CIFAR-10.
  • Table 1 shows the classification accuracy of the pruned model obtained based on different pruning algorithms at different sparsity rates. Based on Table 1, it can be seen that the pruning effect of the model processing method provided in the embodiment of the present application at a high sparsity rate is significantly better than other unstructured pruning methods.
  • the pruning effect is verified based on the large dataset ImageNet.
  • Table 2 shows the classification accuracy of the pruned model based on different pruning algorithms at different sparsity rates. Based on Table 2, it can be seen that the pruning effect of the model processing method provided in the embodiment of the present application at a high sparsity rate is significantly better than other unstructured pruning methods.
  • Figure 7 is a curve diagram comprehensively comparing model computing power consumption and accuracy for the model processing method provided in the embodiment of the present application. Based on Figure 7, it can be seen that the model processing method provided in the embodiment of the present application is significantly better than other unstructured pruning methods when computing power consumption and accuracy are considered together.
  • downstream visual tasks such as target detection and instance segmentation are verified on the COCO val2017 dataset by using the Mask R-CNN framework with ResNet-50 FPN as the backbone network.
  • Table 3 shows the positioning accuracy and segmentation accuracy of the pruned target detection model obtained based on different pruning algorithms at different sparsity rates. Based on Table 3, it can be seen that the pruning effect of the model processing method provided in the embodiment of the present application at high sparsity rates is significantly better than other unstructured pruning methods.
  • unstructured pruning is performed on the DeiT-S model of the transformer structure, and then verified on the ImageNet large-scale image classification dataset.
  • Table 4 shows the classification accuracy of the transformer model obtained based on different pruning algorithms at different sparsity rates. Based on Table 4, it can be seen that the model processing method provided in the embodiment of the present application is also more effective on the transformer architecture.
  • the model processing method provided in the embodiment of the present application can efficiently compress the visual model, reduce the computing power consumption of the visual model, and help deploy large models on small edge devices. For example, the pruned model can be deployed to a cloud service or to a terminal device.
  • the following example illustrates the effect of deploying a pruned model on a small edge device.
  • Challenging visual tasks such as object detection often require the support of large visual models.
  • smartphones are small end-side devices constrained by power and energy consumption.
  • by deploying an object detection model that has completed unstructured pruning on the mobile phone side, it is possible to significantly reduce the energy consumption of the mobile phone without compromising model performance, thereby extending the phone's battery life.
  • cameras are usually small end-side devices. In some remote areas where it is difficult to deploy power supply facilities for surveillance systems, the systems are often powered by solar energy; therefore, there are strict requirements on the energy consumption of the target detection model deployed on the camera end. By deploying a target detection model that has completed unstructured pruning on the camera end, the energy consumption of video surveillance can be significantly reduced without compromising the performance of the model.
  • the electronic devices may include but are not limited to: mobile phones (including foldable-screen mobile phones and bar-type mobile phones), tablet computers, desktop computers, handheld computers, notebook computers, ultra-mobile personal computers (UMPCs), netbooks, portable Android devices (PADs), personal digital assistants (PDAs), handheld devices with wireless communication functions, computing devices, vehicle-mounted devices, and wearable devices.
  • Electronic devices with data transmission synchronization requirements include mobile terminals or fixed terminals, such as virtual reality (VR) terminal devices, augmented reality (AR) terminal devices, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in remote medicine, wireless terminals in smart grid, wireless terminals in transportation safety, wireless terminals in smart city, wireless terminals in smart home, power banks, etc.
  • Fig. 8 shows a schematic diagram of the hardware structure of the electronic device. It is understandable that the electronic device of the present application can be a server, a desktop computer, a handheld computer, a notebook computer (laptop) and other electronic devices. The structure of the electronic device is introduced below by taking the electronic device as an example of a server.
  • FIG8 is a block diagram of a server provided in an embodiment of the present application, and FIG8 schematically shows an example server of multiple embodiments.
  • the server may include one or more processors 804, system control logic 808 connected to at least one of the processors 804, a system memory 812 connected to the system control logic 808, a non-volatile memory (NVM) 816 connected to the system control logic 808, and a network interface 820 connected to the system control logic 808.
  • the processor 804 may include one or more single-core or multi-core processors. In some embodiments, the processor 804 may include any combination of general-purpose processors and special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In an embodiment where the server adopts an eNB (Evolved Node B, enhanced base station) 101 or a RAN (Radio Access Network) controller 102, the processor 804 may be configured to execute various embodiments consistent with the present application.
  • system control logic 808 may include any suitable interface controller to provide any suitable interface to at least one of processors 804 and/or any suitable device or component in communication with system control logic 808 .
  • system control logic 808 may include one or more memory controllers to provide an interface to a system memory 812.
  • the system memory 812 may be used to load and store data and/or instructions.
  • the server's memory 812 may include any suitable volatile memory, such as a suitable dynamic random access memory (DRAM).
  • the non-volatile memory (NVM) 816 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions.
  • the non-volatile memory (NVM) 816 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, such as at least one of a HDD (Hard Disk Drive), a CD (Compact Disc) drive, and a DVD (Digital Versatile Disc) drive.
  • the non-volatile memory (NVM) 816 may include a portion of storage resources on the device where the server is installed, or it may be accessible by the device but not necessarily a portion of the device.
  • the non-volatile memory (NVM) 816 may be accessed over a network via the network interface 820 .
  • the system memory 812 and the non-volatile memory (NVM) 816 may include a temporary copy and a permanent copy of the instructions 824, respectively.
  • the instructions 824 may include: instructions that, when executed by at least one of the processors 804, cause the server to implement the model processing method mentioned in the embodiments of the present application.
  • the instructions 824, or hardware, firmware and/or software components thereof, may additionally/alternatively be placed in the system control logic 808, the network interface 820 and/or the processor 804.
  • the network interface 820 may include a transceiver for providing a radio interface for the server, and then communicating with any other suitable device (such as a front-end module, an antenna, etc.) through one or more networks.
  • the network interface 820 may be integrated with other components of the server.
  • the network interface 820 may be integrated with at least one of the processor 804, the system memory 812, the non-volatile memory (NVM) 816, and a firmware device (not shown) having instructions.
  • when the instructions are executed by at least one of the processors 804, the server implements the model processing method mentioned in the embodiments of the present application.
  • the network interface 820 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface.
  • the network interface 820 may be a network adapter, a wireless network adapter, a telephone modem and/or a wireless modem.
  • At least one of the processors 804 may be packaged together with logic for one or more controllers of the system control logic 808 to form a system in package (SiP). In one embodiment, at least one of the processors 804 may be integrated on the same die with logic for one or more controllers of the system control logic 808 to form a system on chip (SoC).
  • the server may further include: an input/output (I/O) device 832.
  • the I/O device 832 may include a user interface that enables a user to interact with the server, and a peripheral component interface designed so that peripheral components can also interact with the server.
  • the server also includes a sensor for determining at least one of an environmental condition and location information related to the server.
  • the user interface may include, but is not limited to, a display (e.g., an LCD display, a touch screen display, etc.), a speaker, a microphone, one or more cameras (e.g., a still image camera and/or a video camera), a flash (e.g., an LED flash), and a keyboard.
  • the peripheral component interface may include, but is not limited to, a non-volatile memory port, an audio jack, and a power interface.
  • the sensors may include, but are not limited to, gyroscope sensors, accelerometers, proximity sensors, ambient light sensors, and positioning units.
  • the positioning unit may also be part of or interact with the network interface 820 to communicate with components of a positioning network (e.g., global positioning system (GPS) satellites).
  • a logical unit/module can be a physical unit/module, or a part of a physical unit/module, or can be implemented as a combination of multiple physical units/modules.
  • the physical implementation of these logical units/modules is itself not of primary importance.
  • the combination of functions implemented by these logical units/modules is the key to solving the technical problems addressed by the present application.
  • the above-mentioned device embodiments of the present application do not introduce units/modules that are not closely related to solving the technical problems addressed by the present application; this does not mean that no other units/modules exist in the above-mentioned device embodiments.

Abstract

The present application relates to the technical field of computers. Disclosed are a model processing method, an electronic device and a medium. The model processing method comprises: after weights of a trained converged model are pruned to obtain a pruned model, recovering, from among the pruned weights, important weights that affect model accuracy, and applying the recovered important weights to the pruned model so as to retrain the model until a converged model is obtained. Thus, model accuracy can be ensured while the model is compressed.

Description

Model processing method, electronic device and medium

This application claims the priority of the Chinese patent application filed with the China Patent Office on May 8, 2023, with application number 202310514419.8 and application name "A model processing method, electronic device and medium". The entire contents of the above application are incorporated into this application by reference.

Technical Field

The present application relates to the field of computer technology, and in particular to a model processing method, electronic equipment and medium.

Background Art

In recent years, with the development of Convolution Neural Network (CNN), the network depth has become deeper and deeper, and the number of parameters has increased, making it difficult to deploy large convolutional neural networks on small edge devices. Pruning, as a commonly used method for compressing models, is mainly divided into structured pruning and unstructured pruning. Among them, structured pruning uses the filter in the convolutional layer of the convolutional neural network as the pruning unit, changes the filter group and the number of feature channels in the network, and reduces the width of the convolutional neural network. Unstructured pruning uses a single weight in the convolutional neural network as the pruning unit and performs fine-grained pruning.

In traditional unstructured pruning methods, the model parameters are pruned to the greatest extent possible so that the pruned model can be deployed on small edge devices. However, the large-scale reduction of parameters will cause the model to become highly sparse, resulting in a reduction in the model's representation capabilities. For example, some features of the feature graph cannot be extracted, which in turn reduces the accuracy of the model.

Summary of the Invention

In order to solve the problem mentioned above that the existing unstructured pruning method will cause the model to become highly sparse, resulting in a reduction in the model's representation ability, such as causing some features of the feature graph to be unable to be extracted, thereby resulting in a reduction in the accuracy of the model, the present application provides a model processing method, electronic device and medium.

In a first aspect, the present application provides a model processing method, which can be used in an electronic device. The model processing method comprises obtaining a first neural network model, wherein the first neural network model is obtained by pruning a neural network model to be pruned, and pruning comprises modifying a first weight set of the neural network model to be pruned to a second weight set of the first neural network model, wherein a non-zero first weight in the first weight set corresponds to a value of zero in the second weight set. A target weight set that meets the weight condition is selected from the second weight set. Parameter adjustment processing is performed on the target weight set in the second weight set to obtain a third weight set, wherein the value of the first weight in the third weight set is non-zero. The third weight set is used for the first neural network model.

Based on the above scheme, after obtaining the pruned model, the third weight set in the pruned weights of the converged model can be determined, that is, the important weights that can improve the model accuracy; then the important weights are grown and reapplied to the pruned model, and the model is retrained based on the restored important weights and the unpruned weights in the pruned model until the model converges again, obtaining the converged model. In this way, the model memory can be optimized to a certain extent while effectively ensuring the model accuracy.

It can be understood that modifying the first weight set of the neural network model to be pruned to the second weight set of the first neural network model may refer to setting the first weight set of the neural network model to be pruned to zero, that is, achieving pruning of the first weight set.

It can be understood that the target weight set in the second weight set may refer to the important weights screened out from the pruned weights mentioned in the present application.

It can be understood that growing the important weights may be restoring the important weights. For example, the weights that are set to zero may be adjusted to a positive number, or the weights that are set to zero may be adjusted to a negative number.

In some optional instances, a target weight set that meets the weight condition is selected from the second weight set, including: restoring any weight in the second weight set to obtain a candidate neural network model; obtaining the gradient of the loss function corresponding to the first neural network model and the loss function corresponding to the candidate neural network model; and determining the target weight set based on the gradient of the loss function corresponding to the first neural network model and the loss function corresponding to the candidate neural network model after restoring each weight in the second weight set.

It can be understood that the weight condition may be that, when the weight alone is restored to the pruned model, the gradient (i.e., the decrease) between the current loss function of the model and the loss function of the pruned model before the restoration is relatively large.

In some optional instances, obtaining the loss function corresponding to the first neural network model includes: determining a first loss function based on the real information corresponding to the training data set and the predicted information output by the first neural network model; determining a second loss function based on the first neural network model and the candidate neural network model; and determining the loss function corresponding to the first neural network model based on the first loss function and the second loss function.

It can be understood that the first loss function can represent the accuracy of the first neural network model in achieving the preset task, and the second loss function can represent the degree of difference between the weight matrix in the first network model to be pruned and the target low-rank approximation matrix corresponding to the weight matrix.

It can be understood that when the preset network model to be pruned is a neural network model used to implement different tasks, the first loss function used to represent the accuracy of the first neural network model in implementing the preset task can be different. Commonly used first loss functions can include the 0-1 loss function, absolute value loss function, logarithmic loss function, square loss function, exponential loss function, cost loss function, cross entropy loss function, etc.

In some optional instances, obtaining a first loss function corresponding to the first neural network model includes: inputting a training data set into the first neural network model to obtain prediction information output by the first neural network model; and obtaining the first loss function based on real information and prediction information corresponding to the training data set.

In some optional instances, obtaining a second loss function corresponding to the first neural network model includes: obtaining a set of weight matrices in the network model to be pruned; determining a target low-rank approximation matrix corresponding to each weight matrix in the weight matrix set; and obtaining the second loss function corresponding to the first neural network model based on each weight matrix and the target low-rank approximation matrix corresponding to each weight matrix.

In some optional instances, determining the target low-rank approximation matrix corresponding to each weight matrix in the weight matrix set includes: normalizing each weight matrix to obtain a weight matrix to be decomposed corresponding to each weight matrix; and performing low-rank decomposition processing on the weight matrix to be decomposed corresponding to each weight matrix to obtain the target low-rank approximation matrix corresponding to each weight matrix.

It can be understood that the target low-rank approximation matrix can be obtained by performing low-rank decomposition on each weight matrix in the network model to be pruned, yielding a matrix formed by the product of multiple simple, low-rank sub-weight matrices with few parameters.

It can be understood that by determining the target low-rank approximation matrix corresponding to each weight matrix, the degree of difference between each weight matrix and the target low-rank approximation matrix in the first neural network model is determined to determine the important weights, which can improve the rank of the weight matrix, reduce the omission of feature extraction, and thus improve the accuracy of the model.

In some optional instances, pruning the neural network model to be pruned to obtain a first neural network model includes: pruning the network model to be pruned based on the amplitude of each weight in the network model to be pruned and a preset sparsity rate to obtain the first neural network model.

In some optional instances, the network model to be pruned is pruned based on the amplitude of each weight in the network model to be pruned and the preset sparsity rate to obtain a first neural network model, including: based on the amplitude of each weight in the network model to be pruned and the preset sparsity rate, determining the weights corresponding to the preset sparsity rate in the network model to be pruned as pruning weights; setting the values corresponding to the pruning weights to zero to obtain the first neural network model.

For example, suppose the network model to be pruned has m weights and the preset sparsity rate corresponding to the current processing is α. The m weights can be sorted by their magnitudes (absolute values) from large to small, and m×α weights at the small end of the sequence can be determined as the weights in the first weight set. The weights in the first weight set are then pruned, that is, all set to zero, to obtain the first neural network model after pruning the network model to be pruned.

It can be understood that by pruning the network model to be pruned, the number of parameters of the network model to be pruned can be reduced, for example from m to m−m×α, which can improve the model sparsity rate so that the pruned model can be deployed on small edge devices.

In some optional instances, pruning the neural network model to be pruned to obtain a first neural network model includes: pruning the network model to be pruned based on the amplitude of each weight in the network model to be pruned, a preset sparsity rate, and a preset growth rate to obtain the first neural network model.

In some optional instances, pruning the neural network model to be pruned to obtain a first neural network model includes: based on the amplitude of each weight in the network model to be pruned and the preset sparsity rate, determining the weights corresponding to the preset sparsity rate in the network model to be pruned as a first weight subset; based on the amplitude of each weight other than the first weight subset in the network model to be pruned and the preset growth rate, determining the weights corresponding to the preset growth rate in the network model to be pruned other than the first weight subset as a second weight subset; setting the values corresponding to the weights in the first weight subset and the second weight subset to zero to obtain the first neural network model.

For example, suppose the network model to be pruned has m weights, the preset sparsity rate corresponding to the current processing is α, and the preset growth rate corresponding to the current processing is β. The m weights can be sorted by their magnitudes (absolute values) from large to small, and m×α weights in the sequence can be determined as the weights in the first weight subset. Then, based on the magnitudes (absolute values) of the m−m×α weights that have not been set to zero, these weights can be sorted from large to small, and m×β weights in the sequence can also be determined as the weights in the second weight subset. Next, the union of the first weight subset and the second weight subset can be determined as the first weight set, and the weights in the first weight set are pruned, that is, all set to zero, so that on the basis of pruning the first weight subset corresponding to the preset sparsity rate, the second weight subset is additionally pruned, obtaining the first neural network model after pruning the network model to be pruned.

It can be understood that by pruning the network model to be pruned, the number of parameters of the network model to be pruned can be reduced, for example from m to m−m×α−m×β, which can further improve the model sparsity rate so that the pruned model can be deployed on small edge devices.

In a second aspect, the present application provides an electronic device, comprising: a memory for storing instructions executed by one or more processors of the electronic device, and a processor, which is one of the one or more processors of the electronic device, for executing the model processing method mentioned in the present application.

In a third aspect, the present application provides a readable storage medium having instructions stored thereon, which, when executed on an electronic device, enable the electronic device to execute the model processing method mentioned in the present application.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a schematic diagram of an application scenario according to some embodiments of the present application;

FIG. 2 is a schematic diagram of a framework of a pruning method for a model;

FIG. 3 is a flow chart of a pruning method that can be applied to model processing;

FIG. 4 is a schematic diagram of a model processing method provided in an embodiment of the present application;

FIG. 5 is a flow chart of a method for determining a target weight set provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of a process for obtaining a second loss function of a network model to be pruned provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of curves of computing power consumption versus accuracy for the model processing method provided in an embodiment of the present application;

FIG. 8 shows a schematic diagram of the hardware structure of an electronic device.

DETAILED DESCRIPTION

The illustrative embodiments of the present application include but are not limited to a model processing method, an electronic device and a medium.

It can be understood that the model processing method provided in the embodiments of the present application can be used in neural network models used in tasks such as image recognition, target detection, reinforcement learning, and semantic analysis. For example, the types of neural network models include but are not limited to convolutional neural networks, Transformer (a model based on a multi-head attention mechanism), recurrent neural networks (RNN), long short-term memory neural networks (LSTM), and other arbitrary neural network models.

As shown in FIG. 1, in a specific implementation, the server 10 can prune the neural network model 11 for image recognition and then send it to the small edge device 20, so that the small edge device 20 can perform data processing based on the pruned neural network model. The small edge device can be any implementable electronic device such as a mobile phone or a camera.

The pruning methods of the models mentioned in some embodiments are introduced below.

FIG. 2 is a schematic diagram of a framework of a pruning method for a model. As shown in FIG. 2, the pruning method may include:

First, a feature map is obtained. For example, a sampled image can be input into a convolutional neural network, and feature extraction processing can be performed on the sampled image to obtain a feature map. The convolutional neural network can include convolutional layers and fully connected layers.

Then, a convolutional layer is selected and pruned. For example, for the current convolutional layer in the convolutional neural network, each filter performs filtering on the input first feature map and outputs a second feature map, where the first feature map is the feature map output by the convolutional layer preceding the current convolutional layer. The filters, the first feature maps and the second feature maps are in one-to-one correspondence. The matrix rank corresponding to the second feature map is determined based on the parameters in the second feature map. It can be understood that for an m×n matrix, the rank of the matrix is at most the smaller of m and n, expressed as min(m,n). A larger matrix rank indicates that the feature map carries more information, while a smaller rank indicates that it contains more redundant information. The matrix ranks corresponding to the second feature maps output by the filters are sorted, and the convolutional neural network is pruned based on the sorting result: the filters corresponding to second feature maps whose matrix rank is less than a preset matrix rank are pruned; optionally, all the parameters in such a filter can be set to zero. By pruning the filters corresponding to feature maps with a small matrix rank, that is, setting all the parameters in the weight matrices corresponding to the filters that extract redundant information to zero, the extraction of redundant information can be reduced when the pruned model is subsequently used to extract image features.
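To make the feature-map-rank criterion concrete, the following is a minimal PyTorch sketch rather than the reference implementation of this method; the batch of sampled images and the preset rank threshold are assumptions chosen for illustration.

```python
import torch

def average_feature_map_rank(feature_maps: torch.Tensor) -> torch.Tensor:
    # feature_maps: (batch, channels, H, W); each filter outputs one H x W matrix
    # per image, so the matrix rank is averaged over the batch per filter.
    ranks = torch.linalg.matrix_rank(feature_maps.float())  # shape (batch, channels)
    return ranks.float().mean(dim=0)                        # shape (channels,)

def filters_below_preset_rank(feature_maps: torch.Tensor, preset_rank: float) -> torch.Tensor:
    # Indices of the filters whose second feature map has rank below the preset rank.
    return torch.nonzero(average_feature_map_rank(feature_maps) < preset_rank).flatten()

# Example: 8 sampled images, 16 filters, 32x32 second feature maps.
fmaps = torch.randn(8, 16, 32, 32)
print(filters_below_preset_rank(fmaps, preset_rank=30.0))
```

Pruning the selected filters then amounts to zeroing their weight matrices, as described above.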

Next, the pruned convolutional neural network is fine-tuned to obtain the final convolutional neural network for prediction.

Since this pruning scheme determines the matrix rank of the feature map as a whole to prune filters, it can only be applied to structured pruning of the model and cannot be applied to unstructured pruning, that is, it cannot achieve fine-grained pruning of the model and cannot make the memory optimization of the model optimal.

FIG. 3 is a flow chart of a pruning method for a model. As shown in FIG. 3, the pruning method may include:

Obtain the weights of each filter in each convolutional layer of the convolutional neural network; for example, the weights of the filters in the l-th convolutional layer can be obtained as W^l = {W^l_1, ..., W^l_n}. Compute the rank of the three-dimensional tensor of each filter's weights, obtaining the ranks R^l = {R^l_1, ..., R^l_n}. Sort the filter weights according to the ranks of their three-dimensional tensors to obtain a sorting result, for example, the sorted weights W'^l of the l-th layer. Based on the sorting result, prune the filters whose rank is less than a preset rank, obtaining the pruned convolutional neural network W^l = {W^l_m, ..., W^l_n}. After pruning is completed, fine-tune the convolutional neural network and then prune the next convolutional layer; for example, after the l-th convolutional layer is pruned, the (l+1)-th convolutional layer can be pruned. Since this pruning scheme determines the rank over the three-dimensional tensor of the filter weights as a whole to prune filters, it likewise cannot be applied to unstructured pruning of the model, that is, it cannot achieve fine-grained pruning of the model and cannot make the memory optimization of the model optimal.
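The phrase "rank of the three-dimensional tensor" admits more than one reading; the sketch below, offered only as an assumption for illustration, unfolds each filter into an (in_channels, kH·kW) matrix and uses that matrix rank as R^l_i.

```python
import torch

def filter_weight_ranks(weight: torch.Tensor) -> torch.Tensor:
    # weight: (out_channels, in_channels, kH, kW); each filter W^l_i is a 3D tensor.
    # Unfold each filter into an (in_channels, kH*kW) matrix and take its rank.
    mats = weight.reshape(weight.shape[0], weight.shape[1], -1).float()
    return torch.linalg.matrix_rank(mats)  # shape (out_channels,)

weight = torch.randn(32, 16, 3, 3)           # hypothetical l-th layer weights
ranks = filter_weight_ranks(weight)
keep = torch.nonzero(ranks >= 8).flatten()   # prune filters with rank below a preset rank of 8
print(ranks, keep)
```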

In order to solve the problem that pruning filters by determining the rank of the feature map as a whole or the rank of the three-dimensional tensor of the filter weights as a whole cannot be applied to unstructured pruning, and thus cannot be applied to fine-grained unstructured pruning, in some embodiments two unstructured pruning methods are provided: a method based on a preset pruning process and a method based on a preset pruning criterion.

Among them, the method based on a preset pruning process can achieve a good pruning effect through a pruning process that presets which weights are pruned and which weights are grown. Preset pruning processes include gradual pruning, "sparse-to-sparse" pruning that randomly initializes a sparse network, "compression/decompression" pruning that alternately trains the network in dense and sparse states, and so on. For example, gradual pruning performs pruning once every ΔT training iterations, slowly raising the sparsity rate of the model until the target sparsity rate is reached, at which point pruning ends. The method based on a preset pruning criterion selects redundant weights from the weight matrix for pruning according to preset weight importance.
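As an illustration of the gradual-pruning schedule mentioned above, the sketch below raises the sparsity rate toward the target once every ΔT iterations; the linear ramp is an assumption made for brevity (practical schedules are often polynomial).

```python
def sparsity_at_step(step: int, delta_t: int, ramp_steps: int, target: float) -> float:
    # Sparsity is only updated every delta_t iterations and held constant in between,
    # rising linearly until the target sparsity rate is reached.
    completed = (step // delta_t) * delta_t
    return target * min(1.0, completed / ramp_steps)

# e.g. delta_t = 200, ramp over 1000 steps, 60% target sparsity
for s in range(0, 1201, 200):
    print(s, round(sparsity_at_step(s, delta_t=200, ramp_steps=1000, target=0.6), 2))
```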

The method based on a preset pruning criterion implements pruning through preset pruning criteria, where the pruning criteria generally include criteria based on the absolute value of the weight, based on the weight gradient, based on importance scores of weights obtained by self-learning, and the like.

However, both the method based on a preset pruning process and the method based on a preset pruning criterion achieve pruning by pruning the model parameters to the greatest extent possible. A large-scale reduction in the number of parameters will cause the model to become highly sparse, resulting in a reduction in the model's representation ability; for example, some features of the feature map cannot be extracted, thereby reducing the accuracy of the model.

In order to solve the above problems, the present application provides a model processing method. After a converged model is obtained and the weights in the converged model are pruned to obtain a pruned model, the important weights among the pruned weights of the converged model that can improve model accuracy can be determined; then these important weights are grown, that is, restored and reapplied to the pruned model, and the model is retrained based on the restored important weights and the unpruned weights in the pruned model until the model converges again, obtaining the converged model. In this way, the model memory can be optimized to a certain extent while effectively ensuring model accuracy.

Each important weight may be a weight for which, when the weight alone is applied to the pruned model, the gradient (i.e., the decrease) between the current loss function of the model and the loss function of the pruned model before application is relatively large. It can be understood that the larger the gradient, the smaller the loss function value obtained after the weight alone is applied to the pruned model, that is, the higher the degree of convergence of the model and hence the higher the accuracy of the model. For example, selecting a preset number of pruned weights with the largest corresponding gradients as important weights can effectively ensure model accuracy.

The important weights can be determined by restoring each of the pruned weights in turn. For example, first randomly select a weight A from the pruned weights, restore weight A, and perform model training based on the currently restored weight A and the unpruned weights; obtain the current loss function of the model, obtain the gradient (i.e., the decrease) between the current loss function and the loss function of the aforementioned pruned model, and use this gradient as the gradient corresponding to weight A.

Then randomly select the next weight B for restoration, perform model training based on the currently restored weight B and the unpruned weights, obtain the current loss function of the model, obtain the gradient between the current loss function and the loss function of the aforementioned pruned model, and obtain the gradient corresponding to weight B. That is, obtain the gradients corresponding to each of the pruned weights in turn based on the above method, sort the pruned weights by gradient size, and select a preset number of weights with the largest corresponding gradients as important weights.

It can be understood that the larger the gradient, the smaller the loss function value obtained by training the model based on the currently restored weights and the unpruned weights, that is, the higher the degree of convergence of the model and hence the higher the accuracy of the model. In this way, selecting a preset number of weights with the largest corresponding gradients as important weights can effectively ensure model accuracy.
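A literal rendering of this restore-one-weight-at-a-time procedure is sketched below in PyTorch. It assumes the pre-pruning weight values have been saved (so a zeroed weight can be restored to its original value), skips the brief retraining for simplicity, and is deliberately naive: it costs one forward pass per pruned weight.

```python
import torch

@torch.no_grad()
def loss_drop_per_pruned_weight(model, mask, original, loss_fn, inputs, targets):
    # mask[name]: 1 for kept weights, 0 for pruned weights; original[name]: the
    # weight values saved before pruning (an assumption of this sketch).
    base = loss_fn(model(inputs), targets).item()      # loss of the pruned model
    drops = {}
    for name, p in model.named_parameters():
        if name not in mask:
            continue
        flat_p = p.flatten()                           # view into the parameter
        flat_o = original[name].flatten()
        for i in torch.nonzero(mask[name].flatten() == 0).flatten().tolist():
            flat_p[i] = flat_o[i]                      # restore this single weight
            drops[(name, i)] = base - loss_fn(model(inputs), targets).item()
            flat_p[i] = 0.0                            # undo the restoration
    return drops  # larger drop => more important weight
```

The weights with the largest loss drops are then taken as the important weights; in practice, a single backward pass scored over the pruned positions is a far cheaper proxy for this loop.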

In some embodiments, during the above model pruning process, a certain number of extra weights can be pruned beyond the number required by the preset model sparsity rate, with the number of important weights grown kept equal to the number of extra weights pruned. In this way, model accuracy can be ensured without changing the model sparsity rate.

In some embodiments, during the above model pruning process, it is also possible to prune only the number of weights that makes the pruned model meet the preset sparsity rate, without pruning extra weights. In that case, after some important weights are grown, for example when important weights are grown for the first time to obtain a converged model, the model sparsity rate falls because some important weights have been grown, and no longer reaches the preset sparsity rate. The above pruning step and weight growing step can then be performed again until the model converges again and the sparsity rate of the model reaches the preset sparsity rate. In this way, the model sparsity rate can also be guaranteed.

For example, suppose the preset model sparsity rate is 60%. Because 20% of important weights are grown, the model sparsity rate drops to 40%. In this case, after the model converges, a second pruning can be performed, for example pruning enough weights to raise the model sparsity rate to 80%; then 20% of important weights are grown again, and when the model converges again, the preset model sparsity rate of 60% is reached. The pruning ratio and the important-weight growth ratio of each round can be determined according to actual needs.

The model processing method mentioned in the embodiments of the present application is described in detail below. FIG. 4 is a schematic diagram of a model processing method provided in an embodiment of the present application; the model processing method shown in FIG. 4 can be executed by an electronic device. As shown in FIG. 4, the model processing method may include:

401: Obtain the network model to be pruned and the training data set.

In an embodiment of the present application, the network model to be pruned may include multiple convolutional layers, and each convolutional layer in the multiple convolutional layers may have layer identification information, such as identity identification information that characterizes the identity of the convolutional layer. The layer identification information of each convolutional layer may be unique, that is, the layer identification information of each convolutional layer is different. For example, the network model to be pruned may include n convolutional layers, and the layer identification information of the convolutional layers may be Conv1, Conv2...Conv n. Each convolutional layer may include multiple convolutional kernels, and each convolutional kernel in the multiple convolutional kernels may have kernel identification information, such as identity identification information that characterizes the identity of the convolutional kernel. The kernel identification information of each convolutional kernel may be unique, that is, the kernel identification information of each convolutional kernel is different. For example, the convolutional layer Conv1 may include m convolutional kernels, and the kernel identification information of the convolutional kernels may be W1, W2...Wm. Each convolutional kernel may include multiple weights (parameters), and the weights in each convolutional kernel may be all the same, all different, or partially the same. The multiple weights in each convolution kernel can be represented in the form of a matrix, that is, the multiple weights in each convolution kernel can be represented by a weight matrix, and each weight in the weight matrix can be used to extract and enhance the features of image data and audio data.

In some optional instances, the network model to be pruned may include but is not limited to any one of a convolutional neural network, a Transformer (a model based on a multi-head attention mechanism), a recurrent neural network (RNN), and a long short-term memory neural network (LSTM). It is understood that the network model to be pruned may be a neural network model used to implement tasks such as image recognition, target detection, reinforcement learning, and semantic analysis. In some optional implementations, the network model to be pruned may be a visual model for implementing various visual tasks.

In some optional instances, the training data set may be an image data set or an audio data set. The data in the training data set may have annotation information, and the annotation information may be information that matches the task of the network model to be pruned. For example, when the network model to be pruned is a neural network model used to implement the image recognition task, the annotation information may be the annotated category information. For another example, when the network model to be pruned is a neural network model used to implement the target detection task, the annotation information may be the annotated category information and location information.

402: Modify the first weight set of the network model to be pruned into a second weight set to obtain a first neural network model.

It can be understood that when the network model to be pruned has a large number of parameters and is difficult to deploy on a small edge device, it is necessary to minimize the number of parameters of the network model to be pruned (that is, set the largest number of weights in the network model to be pruned to zero) to achieve maximum compression of the network model to be pruned, so that the pruned model can be deployed on a small edge device.

In some optional instances, a first weight set is determined from the network model to be pruned based on the amplitude (absolute value) of each weight in the network model to be pruned, the preset sparsity rate corresponding to the current training process, and the preset growth rate corresponding to the current training process. The weights in the first weight set may be located in the same convolution kernel, that is, belong to the same weight matrix, or may be located in different convolution kernels, that is, belong to different weight matrices. After determining the first weight set, the weights in the first weight set may be pruned, that is, the weights in the first weight set are set to zero, to obtain a second weight set.

In some optional examples, a first subset of weights may be determined from the network model to be pruned based on the amplitude (absolute value) of each weight in the network model to be pruned and a preset sparsity rate corresponding to the current training process, and a second subset of weights may be determined from the network model to be pruned based on the amplitude (absolute value) of each weight in the network model to be pruned and a preset growth rate corresponding to the current training process. Then, the union of the first subset of weights and the second subset of weights may be determined as the first set of weights, and the first set of weights may be modified to the second set of weights, to obtain a first neural network model.

For example, suppose the network model to be pruned has m weights, the preset sparsity rate corresponding to the current processing is α, and the preset growth rate corresponding to the current processing is β. The m weights can be sorted by their magnitudes (absolute values) from large to small, and m×α weights in the sequence can be determined as the weights in the first weight subset. Then, based on the magnitudes (absolute values) of the m−m×α weights that have not been set to zero, these weights can be sorted from large to small, and m×β weights in the sequence can also be determined as the weights in the second weight subset. Next, the union of the first weight subset and the second weight subset can be determined as the first weight set, and the weights in the first weight set are pruned, that is, all set to zero, so that on the basis of pruning the first weight subset corresponding to the preset sparsity rate, the second weight subset is additionally pruned, obtaining the first neural network model after pruning the network model to be pruned.
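A minimal sketch of this pruning step follows, assuming, as magnitude pruning conventionally does, that the m×α and m×β weights removed are those with the smallest magnitudes; the flattened-weights view is a simplification for illustration.

```python
import torch

def prune_with_growth_budget(weights: torch.Tensor, alpha: float, beta: float):
    # weights: all m weights of the model, flattened. Prune m*alpha weights for the
    # preset sparsity rate plus an extra m*beta (the second weight subset), whose
    # budget is later returned by regrowing important weights.
    m = weights.numel()
    n_prune = int(m * alpha) + int(m * beta)
    idx = torch.argsort(weights.abs())[:n_prune]   # first weight set (union of both subsets)
    mask = torch.ones_like(weights)
    mask[idx] = 0.0                                # set the first weight set to zero
    return weights * mask, mask

w = torch.randn(1000)
pruned, mask = prune_with_growth_budget(w, alpha=0.5, beta=0.1)
print(1.0 - mask.mean().item())                    # resulting sparsity: 0.6
```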

403: Determine a target weight set that meets a preset condition from the second weight set.

It can be understood that in the process of pruning the network model to be pruned, it is necessary not only to set the maximum number of weights to zero, but also to increase the rank of the weight matrix and reduce the omission of feature extraction, so that the accuracy of the network model to be pruned in achieving the preset task is guaranteed while the network model to be pruned is compressed. Therefore, it is necessary to determine the target weight set from the second weight set and adjust it in the subsequent weight adjustment process, to ensure that important weights will not be pruned by mistake and that the model's accuracy in achieving the preset task is guaranteed.

In some optional instances, a target weight set that meets preset conditions can be determined from the second weight set based on the first loss function and the second loss function. The first loss function can be a loss function that represents the accuracy of the first network model to be pruned in achieving a preset task, and the second loss function can be a loss function that represents the degree of difference between the weight matrix in the first network model to be pruned and the target low-rank approximation matrix corresponding to the weight matrix. For example, the degree of influence of the change of each weight in the second weight set on the change amplitude of the first loss function value and the second loss function value can be determined, and then the weights with a large degree of influence can be determined as target weights to obtain the target weight set.

The specific manner of determining, based on the first loss function and the second loss function, a target weight set that meets the preset condition from the second weight set is detailed in FIG. 5.

404: Adjust the target weights in the target weight set to a third weight set to obtain a second neural network model.

It can be understood that adjusting the target weights in the target weight set to the third weight set can be to expand the absolute value of the target weights in the target weight set, that is, to adjust the zeroed weights to positive numbers, or to adjust the zeroed weights to negative numbers, thereby ensuring that important weights that have a significant impact on the accuracy of the network model to be pruned can be reactivated.

It can be understood that after obtaining the second neural network model, the training data set can be input into the second neural network model, and the second neural network model can be trained to obtain the target neural network model. For example, the training data set can be input into the second neural network model, feature extraction processing can be performed on the training data set based on the weights in the second neural network model, and prediction information can be output, where the weights in the second neural network model include the third weight set. After the second neural network model outputs the prediction information of the training data set, the weights in the third weight set in the second neural network model are adjusted based on the annotation information and prediction information of the training data set until the number of iterations is greater than a preset number, to obtain the target neural network model.
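The regrow-and-retrain step (404) can be sketched as follows; the mapping from parameter names to regrown indices, the optimizer choice, and the fixed epoch count (in place of an explicit convergence or iteration-count check) are assumptions of this sketch.

```python
import torch

def regrow_and_retrain(model, mask, grow_idx_by_name, train_loader, loss_fn, epochs=1):
    # Reactivate the selected target weights by clearing their mask bits (the third
    # weight set), then fine-tune; pruned weights are kept at zero after each step.
    for name, idx in grow_idx_by_name.items():
        mask[name].flatten()[idx] = 1.0
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            with torch.no_grad():                  # re-apply the mask
                for name, p in model.named_parameters():
                    if name in mask:
                        p.mul_(mask[name])
    return model
```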

Below, the method for determining the target weight set mentioned in the embodiments of the present application is described in detail. FIG. 5 is a flow chart of a method for determining a target weight set provided in an embodiment of the present application. As shown in FIG. 5, the steps of determining the target weight set may include:

501: Obtain a first loss function of the first neural network model.

It can be understood that after obtaining the first neural network model, the accuracy of the first neural network model in achieving the preset task can be determined, that is, the accuracy of the prediction information output by the first neural network model for predicting the training data set can be determined, and this accuracy can be represented by the error between the annotation information of the training data set and the prediction information. Optionally, a first loss function can be used to represent the accuracy of the first neural network model in achieving the preset task; the smaller the first loss function value, that is, the smaller the error between the annotation information of the training data set and the prediction information, the higher the accuracy of the first neural network model in achieving the preset task.

It can be understood that when the preset network model to be pruned is a neural network model used to implement different tasks, the first loss function used to represent the accuracy of the first neural network model in implementing the preset task can be different. Commonly used first loss functions can include the 0-1 loss function, absolute value loss function, logarithmic loss function, square loss function, exponential loss function, cost loss function, cross entropy loss function, etc. For example, when the network model to be pruned is a neural network model used to implement an image recognition task, the cross entropy loss function can be obtained as the first loss function; when the network model to be pruned is a neural network model used to implement a target detection task, the cost loss function can be obtained as the first loss function.

502: Obtain a second loss function of the first neural network model.

It can be understood that the network model to be pruned, as a neural network model used to implement a preset task, has a large number of weights (parameters) and therefore contains many redundant weights. The redundant weights in the network model to be pruned can thus be reduced (for example, by pruning the network model to be pruned, that is, setting some weights in the network model to be pruned to zero) to reduce the number of parameters of the network model to be pruned and achieve compression of the network model to be pruned. When the network model to be pruned is compressed, since the redundant weights are reduced, the weight matrices in the network model to be pruned change between before and after compression, and the rank of a weight matrix after compression may be smaller than its rank before compression (for example, when all the weights of some row of a weight matrix in the network model to be pruned are set to zero). When a low-rank weight matrix is used to extract features from the data in the training data set, feature extraction omissions occur, which in turn causes errors between the prediction information output by the network model to be pruned before and after compression when predicting on the training data set; the accuracy of the compressed network model in achieving the preset task is lower than that of the uncompressed network model.

In some optional instances, each weight matrix in the network model to be pruned can be subjected to low-rank decomposition to obtain a target low-rank approximation matrix formed by the product of multiple simple, low-rank sub-weight matrices with few parameters. For example, an m×n weight matrix A can be decomposed into a target low-rank approximation matrix given by the product of the sub-weight matrices U_{m×k}, Σ_{k×k} and V^T_{k×n}, where k << n; the number of parameters of the weight matrix A_{m×n} is m×n, while the number of parameters of the target low-rank approximation matrix is k×(m+n+1). Although the number of parameters of the target low-rank approximation matrix is much smaller than that of the corresponding weight matrix, the sub-weight matrices in the target low-rank approximation matrix have low rank, so using the target low-rank approximation matrix to extract features from the data in the training data set causes feature extraction omissions. Therefore, when compressing the network model to be pruned, the difference between each weight matrix and its target low-rank approximation matrix can be increased to raise the rank of the compressed weight matrix and reduce feature extraction omissions, thereby reducing the error between the prediction information output before and after compression and ensuring the accuracy of the compressed network model in achieving the preset task. Optionally, a second loss function can be used to represent the difference between a weight matrix and its target low-rank approximation matrix: the smaller the second loss function value, the greater the difference between the weight matrix and its target low-rank approximation matrix, the higher the rank of the weight matrix, the fewer the feature extraction omissions, and hence the smaller the error between the prediction information output before and after compression and the higher the accuracy of the compressed network model in achieving the preset task.
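The sketch below illustrates one way to build the target low-rank approximation matrix via truncated SVD and to turn the distance to it into a second loss term. The exact form of the second loss is not fixed by this passage; the negative Frobenius distance used here is an assumption chosen so that, consistent with the text, a smaller loss value corresponds to a larger difference from the low-rank target.

```python
import torch

def low_rank_target(W: torch.Tensor, k: int) -> torch.Tensor:
    # Rank-k approximation of W: U[:, :k] @ diag(S[:k]) @ Vh[:k, :].
    U, S, Vh = torch.linalg.svd(W.float(), full_matrices=False)
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

def second_loss(weight_matrices, k: int) -> torch.Tensor:
    # Minimizing this pushes each weight matrix AWAY from its low-rank target,
    # i.e. raises its rank (smaller loss value <=> larger difference).
    total = torch.zeros(())
    for W in weight_matrices:
        target = low_rank_target(W, k).detach()
        total = total - torch.linalg.norm(W - target, ord="fro")
    return total
```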

503: Determine a target weight set from the second weight set based on the first loss function and the second loss function.

It can be understood that pruning the network model to be pruned must not only reduce the parameter count but also limit the features missed during extraction. Accordingly, the pruning process must both set the largest possible number of weights to zero and raise the rank of the weight matrices, so that the network model to be pruned is compressed as much as possible while the accuracy with which it performs the preset task is preserved.

In some optional instances, pruning the network model to be pruned to obtain the first neural network model can adjust the error between the annotation information and the prediction information of the training data set, i.e., change the first loss value, and can also adjust the degree of difference between each weight matrix and its target low-rank approximation matrix, raising the rank of the weight matrix. Therefore, by pruning the network model to be pruned to obtain the first neural network model, the first loss value and the second loss value can be adjusted simultaneously, so that the parameter count of the network model to be pruned is reduced while the rank of its weight matrices is raised, fewer features are missed during extraction, and the accuracy of the preset task is preserved.

It can be understood that, after the first weight set of the network model to be pruned is set to zero, the first weight set may contain important weights, i.e., weights whose changes strongly affect the first loss value and the second loss value and hence the accuracy of the prediction information output by the first neural network model. If these important weights remain at zero, then during the subsequent adjustment of the (non-zeroed) weights of the first neural network model, the first and second loss values change only slightly, and it is difficult to make the first neural network model converge by adjusting the non-zeroed weights alone. Therefore, after the first weight set of the network model to be pruned is pruned to obtain the second weight set, a target weight set (the important weights) can be determined from the second weight set and subjected to growth processing, ensuring that weights with a strong influence on the first and second loss values are not pruned by mistake. In the subsequent weight adjustment of the first neural network model, these important weights can then be adjusted to change the first and second loss values so that the first neural network model converges, reducing the parameter count of the network model to be pruned while raising the rank of its weight matrices, reducing the features missed during extraction, and preserving the accuracy of the preset task.

In some optional instances, the target loss function can be determined as the sum of the first loss function and the second loss function, and the important weights that strongly affect the change of the target loss value (i.e., the target weight set), namely the weights that strongly affect the change of both the first loss value and the second loss value, can be identified. When the weights of the network model to be pruned are subsequently adjusted, the target weights in the target weight set are adjusted as well, so that the parameter count of the network model to be pruned is reduced while the rank of its weight matrices is raised, fewer features are missed during extraction, and the accuracy of the preset task is preserved.

In some optional instances, the target loss function may take the form shown in Formula 1:

L = L_task + λ·L_rank    (Formula 1)

In Formula 1, L denotes the target loss function, L_task denotes the first loss function, L_rank denotes the second loss function, and λ denotes an introduced linear-combination hyperparameter.
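A minimal sketch of Formula 1 in Python (the names task_loss, rank_loss and lam are illustrative assumptions, not identifiers from the embodiments):

def target_loss(task_loss: float, rank_loss: float, lam: float) -> float:
    # Formula 1: L = L_task + lambda * L_rank
    return task_loss + lam * rank_loss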

It can be understood that the target loss function can be viewed as a function whose independent variables are the weights of the network model to be pruned. Applying chain-rule differentiation of the target loss function with respect to each weight yields the derivative corresponding to that weight; this derivative reflects how strongly a change in the weight affects the change of the target loss value. The larger the derivative of a weight, the greater the influence of a change in that weight on the change of the target loss value.

In some optional instances, after the derivative corresponding to each weight in the second weight set is determined, the magnitudes (absolute values) of these derivatives can be sorted in descending order, and the weights corresponding to the first m×β derivatives in the sequence can be determined as the target weight set.
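A sketch of this selection step (the helper name and the inputs grads, m and beta are assumptions for illustration; grads holds the target-loss derivative of each weight in the second weight set):

import numpy as np

def select_target_weights(grads: np.ndarray, m: int, beta: float) -> np.ndarray:
    """Return indices of the m*beta weights whose target-loss derivatives
    have the largest absolute values."""
    n_grow = int(m * beta)
    order = np.argsort(-np.abs(grads))   # sort by |derivative|, descending
    return order[:n_grow]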

Below, the method for determining the second loss function mentioned in the embodiments of the present application is described in detail. FIG. 6 is a schematic flowchart of obtaining the second loss function of the first neural network model according to an embodiment of the present application. As shown in FIG. 6, the steps of obtaining the second loss function of the first neural network model may include:

601: Normalize each weight matrix in the network model to be pruned to obtain the weight matrix to be decomposed corresponding to each weight matrix.

It can be understood that, for each weight matrix in the network model to be pruned, the Frobenius norm of the weight matrix can be determined from the Frobenius-norm formula, and each weight matrix can then be divided by its Frobenius norm. This applies L2 normalization to each weight matrix in the network model to be pruned and yields the weight matrix to be decomposed corresponding to each weight matrix.

In some optional instances, the Frobenius norm may be computed as shown in Formula 2:

||W||_F = sqrt( tr(W·Wᵀ) )    (Formula 2)

In Formula 2, ||W||_F denotes the Frobenius norm of a weight matrix in the network model to be pruned, W denotes the weight matrix, Wᵀ denotes the transpose of the weight matrix, and tr(·) denotes the trace.

In some optional instances, the weight matrix to be decomposed corresponding to each weight matrix in the network model to be pruned may take the form shown in Formula 3:

W̃ = W / ||W||_F    (Formula 3)

In Formula 3, ||W||_F denotes the Frobenius norm of the weight matrix, W denotes the weight matrix in the network model to be pruned, and W̃ denotes the weight matrix to be decomposed.
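A sketch of Formulas 2 and 3 in Python (NumPy; the helper name is an assumption):

import numpy as np

def normalize_weight(W: np.ndarray) -> np.ndarray:
    """L2-normalize a weight matrix by its Frobenius norm (Formulas 2 and 3)."""
    fro = np.sqrt(np.trace(W @ W.T))   # Formula 2; same as np.linalg.norm(W, 'fro')
    return W / fro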

602: Perform singular value decomposition on the weight matrix to be decomposed corresponding to each weight matrix in the network model to be pruned, and determine the target low-rank approximation matrix corresponding to each weight matrix.

It can be understood that, in order to raise the rank of the compressed weight matrices, singular value decomposition can be applied to the weight matrix to be decomposed corresponding to each weight matrix, yielding the eigenvector set and the singular value set corresponding to each weight matrix in the network model to be pruned. The left singular matrix, the singular value matrix and the right singular matrix can then be determined from these sets, and the product of the left singular matrix, the singular value matrix and the right singular matrix can be taken as the target low-rank approximation matrix corresponding to the weight matrix.

In some optional instances, for the weight matrix to be decomposed corresponding to each weight matrix in the network model to be pruned, the transpose of the matrix can be determined, and multiple eigenvalues and eigenvectors can be determined from the matrix and its transpose. The left and right singular matrices corresponding to the weight matrix can then be determined from the eigenvectors, and the singular value matrix from the eigenvalues. For example, multiple eigenvalues and eigenvectors can be determined from the product of the weight matrix to be decomposed and its transpose, and the left singular matrix can be determined from these eigenvectors; likewise, multiple eigenvalues and eigenvectors can be determined from the product of the transpose of the weight matrix to be decomposed and the matrix itself, and the right singular matrix can be determined from these eigenvectors. The square roots of all the eigenvalues can be taken as the singular values corresponding to each weight matrix in the network model to be pruned, yielding the singular value set corresponding to each weight matrix, from which the singular value matrix can then be determined.

In some optional instances, after the singular value set corresponding to each weight matrix in the network model to be pruned is obtained, a candidate singular value set can be determined from it. For example, according to the Eckart-Young theorem on low-rank matrix approximation, the k largest singular values can be selected from the singular value set corresponding to each weight matrix as the candidate singular value set corresponding to that weight matrix. The singular value matrix corresponding to each weight matrix in the network model to be pruned can then be generated from the candidate singular value set.
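A sketch of steps 601-602 following the eigendecomposition route described above (a truncated SVD; in practice np.linalg.svd gives the same result directly, and the helper name and rank k are assumptions). Deriving V from U avoids the sign ambiguity of two independent eigendecompositions:

import numpy as np

def truncated_svd_via_eig(W_tilde: np.ndarray, k: int):
    """Build U, Sigma and V^T from the eigendecomposition of W W^T,
    keeping the k largest singular values (Eckart-Young)."""
    evals, U = np.linalg.eigh(W_tilde @ W_tilde.T)   # ascending eigenvalues
    U_k = U[:, ::-1][:, :k]                          # eigenvectors of the k largest
    sigma = np.sqrt(np.clip(evals[::-1][:k], 1e-12, None))  # singular values
    Vt_k = (W_tilde.T @ U_k / sigma).T               # v_i = W^T u_i / sigma_i
    return U_k, np.diag(sigma), Vt_k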

603: Determine the second loss function according to each weight matrix in the network model to be pruned and the target low-rank approximation matrix corresponding to each weight matrix.

Next, the second loss function is introduced by taking one weight matrix in the network model to be pruned and its corresponding target low-rank approximation matrix as an example. Specifically, the second loss function may take the form shown in Formula 4 (since a smaller second loss value corresponds to a greater difference between the weight matrix and its low-rank approximation, the loss is the negative of that difference):

L_rank = −|| W̃ − W̃_Trun ||_F,  where W̃_Trun = U·Σ·Vᵀ, subject to UᵀU = I and VᵀV = I    (Formula 4)

In Formula 4, W̃ denotes the weight matrix to be decomposed obtained by normalizing the weight matrix, U denotes the left singular matrix, Vᵀ denotes the transpose of the right singular matrix, Σ denotes the singular value matrix, U and V denote unitary matrices formed by concatenating orthonormal basis vectors, W̃_Trun denotes the target low-rank approximation matrix, i.e., the product of the left singular matrix, the singular value matrix and the transpose of the right singular matrix, ||·||_F denotes the Frobenius norm, and I denotes the identity matrix.
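A sketch of Formula 4 in Python, reusing the assumed helpers normalize_weight and truncated_svd_via_eig from the sketches above (the sign convention follows the description that a smaller second loss value corresponds to a larger difference):

import numpy as np

def rank_loss(W: np.ndarray, k: int) -> float:
    """Second (rank) loss for one weight matrix, following Formula 4."""
    W_tilde = normalize_weight(W)                      # Formulas 2-3
    U_k, S_k, Vt_k = truncated_svd_via_eig(W_tilde, k)
    W_trun = U_k @ S_k @ Vt_k                          # target low-rank approximation
    return -float(np.linalg.norm(W_tilde - W_trun, 'fro'))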

It can be understood that, in practical applications, the model processing flow can be implemented and executed in the form of software code.

In some optional implementations, the pseudocode of the operations related to the model processing flow (denoted pseudocode M1) is as follows:

Input: an n-layer dense network W; the target sparsity rate fs(iter) at iteration iter; the growth-weight ratio fdecay(iter) at iteration iter; the update frequency Δ of the pruning weights; the total pruning duration T = Tprune + Tfinetune; the learning rate α

Output: a sparse model W⊙M

1  initialize an all-ones matrix;
2  M ← 1;
3  // Phase 1: weight update (pruning and growing);
4  for iter ← 1 to Tprune do
5      feed the training data forward through the network and compute the loss L;
6      if iter % Δ == 0 then
7          // update the weight position mask M;
8          compute the adversarial (rank) loss and the gradient of the combined loss;
9          prune by weight magnitude |W| down to the target sparsity rate fs(iter)% of iteration iter;
10         prune an additional fdecay(iter)% of the remaining weights, again by weight magnitude |W|;
11         grow the same number of weights, fdecay(iter)%, according to the gradient magnitudes produced by the combined loss;
12     else
13         train the sparse network W⊙M by gradient descent;
14     end if
15 end for
16 // Phase 2: fine-tune the network without changing the weight positions;
17 for iter ← Tprune + 1 to T do
18     train the sparse network until convergence without changing the weight positions;
19 end for
20 return the sparse network W⊙M

Next, each line of pseudocode M1 is explained:

First, line 5 of M1 determines the task loss: the training data set is input into the dense network W, where each weight matrix to be processed in each convolutional layer forward-propagates the training data; the dense network W outputs prediction information for the data in the training data set, and the task loss is then determined from the annotation information and the prediction information of the data in the training data set.

Line 8 of M1 determines the adversarial loss function: the low-rank approximation corresponding to each weight matrix to be processed in the dense network W is determined, and the training data are convolved with each target low-rank approximation matrix; the dense network W outputs prediction information for the data in the training data set, and the adversarial loss function is then determined from the annotation information and the prediction information of the data in the training data set.

Line 9 of M1 prunes the preset number of weights that need to be pruned, increasing the sparsity of the model and reducing its memory footprint.

Line 10 of M1 additionally prunes a certain number of the remaining unpruned weights, on top of the preset weights that need to be pruned.

Line 11 of M1 sorts the weights that have been pruned by the absolute value of their gradients and grows the weights with larger gradients, ensuring that important weights that were pruned by mistake can be reactivated.

It can be understood that, after the weights are pruned, the above pseudocode determines the derivatives of all pruned weights from the target loss function and grows the weights with larger derivatives, ensuring that important weights are not pruned by mistake and preserving the accuracy of the model.
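The mask update of lines 9-11 of M1 can be sketched as follows (a toy illustration on a flat weight vector; the schedule values, the gradients and the helper name are assumptions, not the exact training procedure):

import numpy as np

def update_mask(w: np.ndarray, g: np.ndarray, fs: float, fdecay: float) -> np.ndarray:
    """One prune-and-grow mask update (M1 lines 9-11).
    w: weight values; g: gradients of the combined loss;
    fs: target sparsity in [0, 1]; fdecay: extra prune / grow fraction."""
    n = w.size
    mask = np.ones(n, dtype=bool)

    # Line 9: prune to the target sparsity by weight magnitude
    n_prune = int(n * fs)
    mask[np.argsort(np.abs(w))[:n_prune]] = False

    # Line 10: prune an extra fdecay fraction of the remaining weights
    alive = np.flatnonzero(mask)
    n_extra = int(alive.size * fdecay)
    mask[alive[np.argsort(np.abs(w[alive]))[:n_extra]]] = False

    # Line 11: grow the same number of pruned weights with the largest |gradient|
    dead = np.flatnonzero(~mask)
    mask[dead[np.argsort(-np.abs(g[dead]))[:n_extra]]] = True
    return mask

# toy usage
rng = np.random.default_rng(0)
w, g = rng.normal(size=100), rng.normal(size=100)
m = update_mask(w, g, fs=0.9, fdecay=0.1)
print(m.mean())   # fraction of weights kept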

To verify the pruning effect of the model processing method provided in the embodiments of the present application, in some instances the pruning effect is validated on the small data set CIFAR-10. Table 1 shows the classification accuracy of the pruned models obtained with different pruning algorithms at different sparsity rates. As Table 1 shows, at high sparsity rates the pruning effect of the model processing method provided in the embodiments of the present application is significantly better than that of other unstructured pruning methods.

Table 1: classification accuracy on CIFAR-10 at different sparsity rates (table body not reproduced in this text)

In other instances, the pruning effect is validated on the large data set ImageNet. Table 2 shows the classification accuracy of the pruned models obtained with different pruning algorithms at different sparsity rates. As Table 2 shows, at high sparsity rates the pruning effect of the model processing method provided in the embodiments of the present application is significantly better than that of other unstructured pruning methods.

Table 2: classification accuracy on ImageNet at different sparsity rates (table body not reproduced in this text)

FIG. 7 is a curve diagram of the model processing method provided in the embodiments of the present application in terms of overall model compute consumption and accuracy. As FIG. 7 shows, the model processing method provided in the embodiments of the present application is significantly better than other unstructured pruning methods in terms of overall model compute consumption and accuracy.

In other instances, downstream vision tasks such as object detection and instance segmentation are validated on the COCO val2017 data set using the Mask R-CNN framework with a ResNet-50 FPN backbone. Table 3 shows the localization accuracy and segmentation accuracy of the pruned object detection models obtained with different pruning algorithms at different sparsity rates. As Table 3 shows, at high sparsity rates the pruning effect of the model processing method provided in the embodiments of the present application is significantly better than that of other unstructured pruning methods.

Table 3: localization and segmentation accuracy on COCO val2017 at different sparsity rates (table body not reproduced in this text)

In other instances, unstructured pruning is performed on the DeiT-S model with a transformer architecture and then validated on the large-scale ImageNet image classification data set. Table 4 shows the classification accuracy of the transformer model obtained with different pruning algorithms at different sparsity rates. As Table 4 shows, the model processing method provided in the embodiments of the present application is also effective on the transformer architecture.

Table 4: classification accuracy of DeiT-S on ImageNet at different sparsity rates (table body not reproduced in this text)

The model processing method provided in the embodiments of the present application can compress vision models efficiently, reducing their compute requirements and helping large models be deployed on small edge devices. For example, the pruned model can be deployed to a cloud service or to a terminal device. The following examples illustrate the effect of deploying a pruned model on small edge devices.

Challenging vision tasks such as object detection often require the support of large vision models. In some instances, smartphones, constrained by power and energy consumption, are small end-side devices. Deploying an object detection model that has undergone unstructured pruning on the phone side can significantly reduce the phone's energy consumption and extend its operating life without degrading model performance.

In other instances, in video surveillance, cameras are usually small end-side devices. In some remote areas it is difficult to deploy power supply facilities for surveillance systems, which are often solar-powered; the energy consumption of an object detection model deployed on the camera side is therefore subject to strict requirements. Deploying an object detection model that has undergone unstructured pruning on the camera side can significantly reduce the energy consumption of video surveillance without degrading model performance.

The electronic devices mentioned in the present application are introduced below. It can be understood that the electronic devices may include, but are not limited to: mobile phones (including foldable-screen and bar-type phones), tablet computers (portable android device, PAD), desktop computers, handheld computers, notebook (laptop) computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA), handheld devices with wireless communication functions, computing devices, vehicle-mounted devices, wearable devices, virtual reality (VR) terminal devices, augmented reality (AR) terminal devices, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in remote medical, wireless terminals in smart grid, wireless terminals in transportation safety, wireless terminals in smart city, wireless terminals in smart home, and other mobile or fixed terminals, as well as electronic devices with data transmission and synchronization requirements, such as power banks.

FIG. 8 shows a schematic diagram of the hardware structure of an electronic device. It can be understood that the electronic device of the present application may be a server, a desktop computer, a handheld computer, a notebook (laptop) computer or another electronic device; the structure of the electronic device is introduced below taking a server as an example.

FIG. 8 is a block diagram of a server provided in an embodiment of the present application, schematically showing an example server of multiple embodiments. In one embodiment, the server may include one or more processors 804, system control logic 808 connected to at least one of the processors 804, system memory 812 connected to the system control logic 808, non-volatile memory (NVM) 816 connected to the system control logic 808, and a network interface 820 connected to the system control logic 808.

In some embodiments, the processors 804 may include one or more single-core or multi-core processors. In some embodiments, the processors 804 may include any combination of general-purpose processors and special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In embodiments where the server employs an eNB (Evolved Node B) 101 or a RAN (Radio Access Network) controller 102, the processors 804 may be configured to perform in accordance with various embodiments.

In some embodiments, the system control logic 808 may include any suitable interface controller to provide any suitable interface to at least one of the processors 804 and/or to any suitable device or component in communication with the system control logic 808.

In some embodiments, the system control logic 808 may include one or more memory controllers to provide an interface to the system memory 812. The system memory 812 may be used to load and store data and/or instructions. In some embodiments, the memory 812 of the server may include any suitable volatile memory, such as a suitable dynamic random access memory (DRAM).

The non-volatile memory (NVM) 816 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the non-volatile memory (NVM) 816 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, such as at least one of an HDD (Hard Disk Drive), a CD (Compact Disc) drive, and a DVD (Digital Versatile Disc) drive.

The non-volatile memory (NVM) 816 may include part of the storage resources on the apparatus where the server is installed, or it may be accessible by the device without necessarily being part of the device. For example, the non-volatile memory (NVM) 816 may be accessed over a network via the network interface 820.

In particular, the system memory 812 and the non-volatile memory (NVM) 816 may include a temporary copy and a permanent copy of instructions 824, respectively. The instructions 824 may include instructions that, when executed by at least one of the processors 804, cause the server to implement the model processing method mentioned in the embodiments of the present application. In some embodiments, the instructions 824, the hardware, the firmware and/or the software components thereof may additionally or alternatively reside in the system control logic 808, the network interface 820 and/or the processors 804.

The network interface 820 may include a transceiver for providing a radio interface for the server to communicate with any other suitable devices (such as front-end modules, antennas, etc.) over one or more networks. In some embodiments, the network interface 820 may be integrated with other components of the server. For example, the network interface 820 may be integrated with at least one of the processors 804, the system memory 812, the non-volatile memory (NVM) 816, and a firmware device (not shown) having instructions; when at least one of the processors 804 executes the instructions, the server implements the model processing method mentioned in the embodiments of the present application.

The network interface 820 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface. For example, the network interface 820 may be a network adapter, a wireless network adapter, a telephone modem and/or a wireless modem.

In one embodiment, at least one of the processors 804 may be packaged together with logic for one or more controllers of the system control logic 808 to form a system in package (SiP). In one embodiment, at least one of the processors 804 may be integrated on the same die with logic for one or more controllers of the system control logic 808 to form a system on chip (SoC).

The server may further include an input/output (I/O) device 832. The I/O device 832 may include a user interface enabling a user to interact with the server; peripheral component interfaces are designed so that peripheral components can also interact with the server. In some embodiments, the server further includes sensors for determining at least one of environmental conditions and location information related to the server.

In some embodiments, the user interface may include, but is not limited to, a display (e.g., a liquid crystal display, a touch screen display, etc.), a speaker, a microphone, one or more cameras (e.g., a still image camera and/or a video camera), a flashlight (e.g., an LED flash) and a keyboard.

In some embodiments, the peripheral component interfaces may include, but are not limited to, a non-volatile memory port, an audio jack and a power interface.

In some embodiments, the sensors may include, but are not limited to, a gyroscope sensor, an accelerometer, a proximity sensor, an ambient light sensor and a positioning unit. The positioning unit may also be part of, or interact with, the network interface 820 to communicate with components of a positioning network (e.g., Global Positioning System (GPS) satellites).

It should be noted that the units/modules mentioned in the device embodiments of the present application are all logical units/modules. Physically, a logical unit/module may be a physical unit/module, part of a physical unit/module, or a combination of multiple physical units/modules; the physical implementation of these logical units/modules is not what matters most, as the combination of functions they implement is the key to solving the technical problem raised by the present application. In addition, in order to highlight the innovative part of the present application, the above device embodiments do not introduce units/modules that are less closely related to solving the technical problem raised by the present application, which does not mean that no other units/modules exist in the above device embodiments.

It should be noted that, in the examples and description of this patent, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. In the absence of further restrictions, an element defined by the phrase "including a" does not exclude the existence of other identical elements in the process, method, article or device that includes the element.

Although the present application has been illustrated and described with reference to certain preferred embodiments thereof, those of ordinary skill in the art will understand that various changes in form and detail may be made therein without departing from the scope of the present application.

Claims (12)

1. A model processing method, applied to an electronic device, the method comprising: obtaining a first neural network model, wherein the first neural network model is obtained by pruning a neural network model to be pruned, and the pruning comprises modifying a first weight set of the neural network model to be pruned into a second weight set of the first neural network model, wherein a first weight that is non-zero in the first weight set has a corresponding value of zero in the second weight set; selecting, from the second weight set, a target weight set satisfying a weight condition; performing parameter adjustment processing on the target weight set in the second weight set to obtain a third weight set, wherein the value of the first weight in the third weight set is non-zero; and using the third weight set for the first neural network model.

2. The method according to claim 1, wherein the selecting, from the second weight set, a target weight set satisfying a weight condition comprises: performing recovery processing on any weight in the second weight set to obtain a candidate neural network model; obtaining gradients of a loss function corresponding to the first neural network model and of a loss function corresponding to the candidate neural network model; and determining the target weight set based on the gradients of the loss function corresponding to the first neural network model and the loss function corresponding to the candidate neural network model after each weight in the second weight set is recovered.

3. The method according to claim 2, wherein the obtaining the loss function corresponding to the first neural network model comprises: determining a first loss function based on ground-truth information corresponding to a training data set and prediction information output by the first neural network model; determining a second loss function based on the first neural network model and the candidate neural network model; and determining the loss function corresponding to the first neural network model based on the first loss function and the second loss function.

4. The method according to claim 3, wherein the obtaining the first loss function corresponding to the first neural network model comprises: inputting the training data set into the first neural network model and obtaining the prediction information output by the first neural network model; and obtaining the first loss function based on the ground-truth information corresponding to the training data set and the prediction information.

5. The method according to claim 3, wherein the obtaining the second loss function corresponding to the first neural network model comprises: obtaining a weight matrix set in the neural network model to be pruned; determining a target low-rank approximation matrix corresponding to each weight matrix in the weight matrix set; and obtaining the second loss function corresponding to the first neural network model based on each weight matrix and the target low-rank approximation matrix corresponding to each weight matrix.

6. The method according to claim 5, wherein the determining a target low-rank approximation matrix corresponding to each weight matrix in the weight matrix set comprises: performing normalization processing on each weight matrix to obtain a weight matrix to be decomposed corresponding to each weight matrix; and performing low-rank decomposition processing on the weight matrix to be decomposed corresponding to each weight matrix to obtain the target low-rank approximation matrix corresponding to each weight matrix.

7. The method according to any one of claims 1 to 6, wherein the pruning the neural network model to be pruned to obtain the first neural network model comprises: pruning the neural network model to be pruned based on the magnitude of each weight in the neural network model to be pruned and a preset sparsity rate to obtain the first neural network model.

8. The method according to claim 7, wherein the pruning the neural network model to be pruned based on the magnitude of each weight in the neural network model to be pruned and a preset sparsity rate to obtain the first neural network model comprises: determining, based on the magnitude of each weight in the neural network model to be pruned and the preset sparsity rate, a number of weights corresponding to the preset sparsity rate as pruning weights; and setting the values corresponding to the pruning weights to zero to obtain the first neural network model.

9. The method according to any one of claims 1 to 6, wherein the pruning the neural network model to be pruned to obtain the first neural network model comprises: pruning the neural network model to be pruned based on the magnitude of each weight in the neural network model to be pruned, a preset sparsity rate and a preset growth rate to obtain the first neural network model.

10. The method according to claim 9, wherein the pruning the neural network model to be pruned to obtain the first neural network model comprises: determining, based on the magnitude of each weight in the neural network model to be pruned and the preset sparsity rate, a number of weights corresponding to the preset sparsity rate as a first weight subset; determining, based on the magnitude of each weight other than the first weight subset in the neural network model to be pruned and the preset growth rate, a number of weights corresponding to the preset growth rate other than the first weight subset as a second weight subset; and setting the values corresponding to the weights in the first weight subset and the second weight subset to zero to obtain the first neural network model.

11. An electronic device, comprising: a memory, configured to store instructions to be executed by one or more processors of the electronic device; and a processor, being one of the one or more processors of the electronic device, configured to perform the model processing method according to any one of claims 1 to 10.

12. A readable storage medium, having instructions stored thereon which, when executed on an electronic device, cause the electronic device to perform the model processing method according to any one of claims 1 to 10.
PCT/CN2024/084957 2023-05-08 2024-03-29 Model processing method, electronic device and medium Pending WO2024230358A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310514419.8A CN116702858A (en) 2023-05-08 2023-05-08 A model processing method, electronic equipment and medium
CN202310514419.8 2023-05-08

Publications (1)

Publication Number Publication Date
WO2024230358A1

Family

ID=87834804

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/084957 Pending WO2024230358A1 (en) 2023-05-08 2024-03-29 Model processing method, electronic device and medium

Country Status (2)

Country Link
CN (1) CN116702858A (en)
WO (1) WO2024230358A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119227769A (en) * 2024-11-28 2024-12-31 山东海量信息技术研究院 A model pruning method, device and storage medium for heterogeneous computing clusters
CN120386534A (en) * 2025-06-27 2025-07-29 科讯嘉联信息技术有限公司 A lightweight large-model intelligent customer service deployment method for edge computing

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117034090A (en) * 2023-09-06 2023-11-10 北京百度网讯科技有限公司 Model parameter adjustment, model application methods, devices, equipment and media
CN117058525B (en) * 2023-10-08 2024-02-06 之江实验室 Model training method and device, storage medium and electronic equipment
CN119886258B (en) * 2025-01-07 2025-10-03 杭州电子科技大学 An unstructured pruning method based on gradient and weight changes

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726799A (en) * 2018-12-27 2019-05-07 四川大学 A compression method of deep neural network
CN111079899A (en) * 2019-12-05 2020-04-28 中国电子科技集团公司信息科学研究院 Neural network model compression method, system, device and medium
US20200167689A1 (en) * 2018-11-28 2020-05-28 Here Global B.V. Method, apparatus, and system for providing data-driven selection of machine learning training observations
CN112836817A (en) * 2019-11-22 2021-05-25 中国科学技术大学 A Compression Method for Convolutional Neural Network Models
CN115374926A (en) * 2021-05-17 2022-11-22 Oppo广东移动通信有限公司 Neural network pruning method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN116702858A (en) 2023-09-05

Similar Documents

Publication Publication Date Title
WO2024230358A1 (en) Model processing method, electronic device and medium
CN112101190B (en) A remote sensing image classification method, storage medium and computing device
US10282641B2 (en) Technologies for classification using sparse coding in real time
EP4016331A1 (en) Neural network dense layer sparsification and matrix compression
WO2023098544A1 (en) Structured pruning method and apparatus based on local sparsity constraints
CN111062382A (en) Channel pruning method for target detection network
CN108764471A (en) The neural network cross-layer pruning method of feature based redundancy analysis
CN116186612A (en) Sulfur hexafluoride recycling management system
CN113901904A (en) Image processing method, face recognition model training method, device and equipment
CN109165699B (en) Fine-grained Image Classification Methods
CN114677545A (en) Lightweight image classification method based on similarity pruning and efficient module
CN114207605A (en) Text classification method and device, electronic equipment and storage medium
CN111353591A (en) Computing device and related product
US20230325665A1 (en) Sparsity-based reduction of gate switching in deep neural network accelerators
US20240028895A1 (en) Switchable one-sided sparsity acceleration
US20230252299A1 (en) Detecting and mitigating fault in sparsity computation in deep neural network
CN108664993A (en) A kind of convolutional neural networks image classification method of intensive weight connection
JP7546630B2 (en) Neural network optimization method, computer system, and computer-readable storage medium
CN108062559A (en) A kind of image classification method based on multiple receptive field, system and device
CN111539460A (en) Image classification method, device, electronic device and storage medium
US20230394312A1 (en) Pruning activations and weights of neural networks with programmable thresholds
CN111340057A (en) Classification model training method and device
CN110688501B (en) A Hash Retrieval Method Based on Deep Learning Fully Convolutional Networks
CN113222121B (en) A data processing method, device and equipment
US12236342B2 (en) Tensor ring decomposition for neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24802645

Country of ref document: EP

Kind code of ref document: A1