WO2023071766A1 - Model compression method, model compression system, server, and storage medium - Google Patents
Model compression method, model compression system, server, and storage medium
- Publication number
- WO2023071766A1 (PCT/CN2022/124433)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- candidate
- computing power
- compression
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- the present application relates to the field of neural network architecture search, in particular to a model compression method, model compression system, server and storage medium.
- Neural networks have achieved outstanding results in the field of machine learning, such as object classification and target detection, and intelligent algorithms based on neural networks have changed the way people work and live. However, because neural networks are computationally complex and have a large number of model parameters, model computing power compression has become a research hotspot in academia and industry in recent years.
- At present, common model compression methods include parameter quantization, network pruning, and manual design of efficient neural network structures. However, parameter quantization and network pruning are significantly limited in accuracy, while manually designing efficient network structures requires a large investment of human effort and research, along with a great deal of time for manual experiments and parameter tuning. None of the current model compression methods can therefore balance accuracy with saving labor costs.
- the main purpose of the embodiments of the present application is to provide a model compression method, a model compression system, a server, and a storage medium that balance the accuracy of the resulting compressed model with reduced labor costs.
- an embodiment of the present application provides a model compression method, including: receiving a candidate model; performing a neural network architecture search on the candidate model to obtain multiple transformation models; performing computing power compression on the candidate model to obtain a compressed model; and using the transformation models and the compressed model as candidate models and re-performing the neural network architecture search and the computing power compression.
- the embodiment of the present application also provides a model compression system, including: a receiving module, an automatic model search module, and a computing power compression module; the receiving module is configured to receive candidate models; the automatic model search module is configured to perform a neural network architecture search on the candidate model to obtain a plurality of transformation models, which are input into the receiving module as candidate models; the computing power compression module is configured to perform computing power compression on the candidate model to obtain a compressed model, which is input into the receiving module as a candidate model.
- the embodiment of the present application also provides a server, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above-mentioned model compression method.
- the embodiment of the present application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-mentioned model compression method.
- the model compression method of this application can obtain as many neural network models as possible through neural network architecture search, which improves the probability of obtaining a better model, and through computing power compression it can compress the multiple transformation models obtained in the previous polling round.
- It can be seen that this application combines model compression with neural network architecture search: a transformation model obtained by the architecture search serves as a candidate model on which computing power compression is performed in the next polling round, and the compressed model obtained by computing power compression likewise serves as a candidate model on which the architecture search is performed in the next polling round to obtain transformation models. Through continuous polling, once polling has proceeded to a certain degree, a high-precision compressed model can be obtained, which takes the accuracy of the compressed model into account while the entire compression process reduces manual participation and lowers labor costs.
- In addition, compared with related technologies, the present application can obtain the compressed model automatically, which also speeds up model compression.
- One or more embodiments are illustrated by way of example with reference to the corresponding figures in the accompanying drawings, and these illustrations do not constitute a limitation on the embodiments. Elements with the same reference numerals in the drawings represent similar elements, and unless otherwise stated, the figures are not drawn to scale.
- Fig. 1 is a schematic flow chart of a model compression method according to an embodiment of the present application
- FIG. 2 is a schematic flowchart of the sub-steps of step 103 of the model compression method according to an embodiment of the present application;
- FIG. 3 is a schematic flow diagram of a model compression method according to an embodiment of the present application.
- Fig. 4 is a schematic flow chart of a model compression method according to an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of a model compression system according to an embodiment of the present application.
- FIG. 6 is a schematic structural diagram of a model compression system according to an embodiment of the present application.
- Fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application.
- In order to make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art can understand that many technical details are provided in each embodiment so that readers can better understand the present application; even without these technical details and the various changes and modifications based on the following embodiments, the technical solutions claimed in this application can still be realized. The division into the following embodiments is for convenience of description and should not constitute any limitation on the specific implementation of the present application, and the embodiments can be combined with and refer to each other provided there is no contradiction.
- An embodiment of the present application relates to a model compression method. FIG. 1 is a schematic flow chart of the model compression method of this embodiment, which specifically includes the following steps:
- Step 101: receive a candidate model.
- Specifically, the method of this embodiment is executed by a model compression system, which first needs to receive a candidate model in the process of model compression.
- Step 102: perform a neural network architecture search on the candidate model to obtain multiple transformation models.
- Specifically, neural architecture search (NAS) can effectively reduce manual effort, can explore network structures in a much larger space, and has the opportunity to generate high-performance structures beyond human design, so it has been widely used in the neural network field. Therefore, the model compression method of this embodiment combines NAS to address the problem of manpower investment while also obtaining a compressed model with higher accuracy.
- In one embodiment, performing a neural network architecture search on the candidate model to obtain multiple transformation models includes: performing the neural network architecture search based on a preset search space and applying random transformations to the candidate model to obtain multiple transformation models.
- the process of NAS search is to traverse the preset search space based on the preset search strategy, and randomly transform the candidate models in each NAS search to obtain multiple transformed models. This randomness ensures the diversity of generated model structures, which is conducive to searching for high-performance models.
- It should be noted that the search space in this embodiment can be set by the user or defaulted by the system, and the search strategy can likewise be set by the user or defaulted by the system; the NAS search strategy may be based on reinforcement learning, on evolutionary algorithms, on gradient descent, and so on.
- The search strategy involved in this embodiment can be any search strategy, which gives the method strong robustness; it can be set according to the requirements of the user or the system and is not specifically limited in this embodiment.
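As an illustration only, the random-transformation step described above might be sketched as follows in Python; the layer representation, the mutation choices, and the helper name are assumptions made for this example and are not prescribed by this application.

```python
import copy
import random

# A candidate model is represented here, purely for illustration, as a list of
# layer configurations (hypothetical format, not prescribed by this application).
candidate = [
    {"op": "conv", "kernel": 3, "filters": 32},
    {"op": "conv", "kernel": 3, "filters": 64},
    {"op": "pool", "size": 2},
]

def random_transform(model):
    """Apply one random structural change drawn from a preset search space."""
    new_model = copy.deepcopy(model)
    choice = random.choice(["widen", "narrow", "change_kernel"])
    conv_indices = [i for i, layer in enumerate(new_model) if layer["op"] == "conv"]
    i = random.choice(conv_indices)
    if choice == "widen":
        new_model[i]["filters"] *= 2                          # explore a wider layer
    elif choice == "narrow":
        new_model[i]["filters"] = max(1, new_model[i]["filters"] // 2)
    else:
        new_model[i]["kernel"] = random.choice([1, 3, 5])     # try a different kernel size
    return new_model

# One NAS round produces several randomly transformed variants of the candidate.
transformed_models = [random_transform(candidate) for _ in range(4)]
```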
- Step 103: perform computing power compression on the candidate model to obtain a compressed model.
- The order of step 102 and step 103 is not specifically limited in this embodiment: step 102 may be performed before or after step 103, or the two steps may be performed simultaneously.
- In one embodiment, performing computing power compression on the candidate model to obtain the compressed model includes the following sub-steps, as shown in the flowchart of FIG. 2:
- Step 1031: calculate the computing power of each layer in the candidate model.
- Specifically, computing power compression performs targeted compression of the candidate model's computing power; for example, it detects the computing-power-intensive areas of the model and applies targeted compression processing to the layer that consumes the most computing power.
- the computing power consumption of each layer in the candidate model can be calculated according to the input of the candidate model and the parameters of each layer.
- The computing power consumption of each layer is determined mainly by the size of the feature map input to the layer and the number of weight parameters that the layer's operation needs to train, and can be approximated by the following formula:
- FLOPs(l) = H × W × C × Params(l)
- where l denotes an operation layer in the model (such as a convolutional layer), H and W denote the height and width of the feature map input to the layer, C denotes the number of channels of the input feature map, and Params(l) denotes the parameter count of the current layer.
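A minimal sketch of how the per-layer formula above could be evaluated, assuming each layer is described by its input feature-map size, channel count, and parameter count (the dictionary layout is a hypothetical choice made for the example):

```python
def layer_flops(layer):
    """Approximate computing power of one layer: FLOPs(l) = H * W * C * Params(l)."""
    return layer["H"] * layer["W"] * layer["C"] * layer["params"]

# Hypothetical candidate model described layer by layer.
layers = [
    {"name": "conv1", "H": 224, "W": 224, "C": 3,  "params": 3 * 3 * 3 * 32},
    {"name": "conv2", "H": 112, "W": 112, "C": 32, "params": 3 * 3 * 32 * 64},
    {"name": "conv3", "H": 56,  "W": 56,  "C": 64, "params": 3 * 3 * 64 * 128},
]

# Per-layer computing power, used later to locate the most expensive layer.
per_layer = {layer["name"]: layer_flops(layer) for layer in layers}
```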
- Step 1032: compare the computing power of each layer to find the layer with the largest computing power.
- Specifically, by comparing the computing power values obtained in step 1031, the layer with the largest computing power, that is, the "computing-power-intensive area" of the model, can be located; the computing power compression step mainly compresses this computing-power-intensive area.
- Step 1033: perform computing power compression on the layer with the largest computing power to obtain a compressed model.
- In one embodiment, performing computing power compression on the layer with the largest computing power includes: reducing the parameter count of that layer, removing that layer, or reducing the size of its input feature map.
- Specifically, from the computing power formula above it can be seen that the factors affecting computing power are mainly the feature map input to the layer and the parameter count of the current layer; therefore, compressing the computing power of a certain layer in the model can proceed in the following three directions (an illustrative sketch follows the list):
- (1) Reduce the parameter count of the current layer. For a convolution operation, for example, the parameter count is determined by the size and number of its convolution kernels, so the kernel size can be compressed or the number of filters reduced. Reducing the parameter count of the current layer is equivalent to changing the layer's hyperparameters, but only in the direction of a decreasing parameter count.
- (2) Remove the current layer. This is the limiting case of direction (1), i.e., the parameter count of the layer is reduced to 0.
- (3) Reduce the size of the feature map input to the layer. Reducing the input feature map size means changing the operation layers in front of the computing-power-intensive area, because only by changing the preceding layers can the input of the current layer be changed. Specific operations include: adding a pooling layer in front of the computing-power-intensive area to compress the features, or tracing back to a preceding computing-power-consuming layer and changing its hyperparameters, for example reducing the number of output channels or increasing the stride, so as to downsample the feature map.
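Purely as an illustration of how these three directions could be applied to a layer-by-layer description of a model, the following sketch locates the heaviest layer (step 1032) and applies one of the directions; the dictionary layout, the halving factors, and the function name are assumptions made for this example.

```python
def compress_heaviest_layer(layers, direction="shrink_params"):
    """Locate the layer with the largest FLOPs and compress it along one direction."""
    flops = lambda l: l["H"] * l["W"] * l["C"] * l["params"]      # FLOPs(l) = H*W*C*Params(l)
    idx = max(range(len(layers)), key=lambda i: flops(layers[i]))  # step 1032: heaviest layer
    compressed = [dict(l) for l in layers]                         # work on a copy
    if direction == "shrink_params":        # direction (1): reduce the parameter count
        compressed[idx]["params"] = max(1, compressed[idx]["params"] // 2)
    elif direction == "remove_layer":       # direction (2): remove the layer entirely
        compressed.pop(idx)
    elif direction == "shrink_input":       # direction (3): downsample the input feature map
        compressed[idx]["H"] //= 2
        compressed[idx]["W"] //= 2
    return compressed

# Hypothetical per-layer description: input feature-map size, channels, parameter count.
layers = [
    {"H": 224, "W": 224, "C": 3,  "params": 864},
    {"H": 112, "W": 112, "C": 32, "params": 18432},
]
compressed_model = compress_heaviest_layer(layers, direction="shrink_params")
```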
- Step 104: use the transformation models and the compressed model as candidate models for training.
- Specifically, the models generated by the neural network architecture search and by the computing power compression together serve as candidate models and are returned to step 101, where the neural network architecture search and the computing power compression are performed again; that is, steps 101 to 104 constitute one polling round, and after step 104 is completed, step 101 is executed again, forming a closed-loop operation of the system.
- Specifically, in the first polling round, the candidate model received by the model compression system is the benchmark model input by the user.
- It should be noted that every candidate model in this embodiment is assigned an identifier (ID): the benchmark model input by the user and each model generated through neural network architecture search or computing power compression receive a unique ID, which ensures that a model that has already been trained is not trained repeatedly.
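By way of illustration only, the closed loop formed by steps 101 to 104, together with the ID assignment just described, could be organized along the following lines; the helper names nas_search and power_compress, and the fixed number of rounds, are assumptions for this sketch and merely stand in for the search and compression described above.

```python
import itertools

def polling_loop(benchmark_model, nas_search, power_compress, rounds=5):
    """Steps 101-104 as a closed loop: search and compression feed their outputs
    back in as the next round's candidate models, each model getting a unique ID."""
    next_id = itertools.count()
    candidates = {next(next_id): benchmark_model}   # round 1: the user's benchmark model
    for _ in range(rounds):
        new_candidates = {}
        for model in candidates.values():
            for transformed in nas_search(model):                   # step 102: transformation models
                new_candidates[next(next_id)] = transformed
            new_candidates[next(next_id)] = power_compress(model)   # step 103: compressed model
        candidates = new_candidates                                  # step 104: re-enter step 101
    return candidates
```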
- The model compression method of this embodiment can obtain as many neural network models as possible through neural network architecture search, which improves the probability of obtaining a better model, and through computing power compression it can compress the multiple transformation models obtained in the previous polling round.
- It can be seen that this embodiment combines model compression with neural network architecture search: a transformation model obtained by the architecture search serves as a candidate model on which computing power compression is performed in the next polling round, and the compressed model obtained by computing power compression likewise serves as a candidate model on which the architecture search is performed in the next polling round to obtain transformation models.
- Through continuous polling, once polling has proceeded to a certain degree, a high-precision compressed model can be obtained, which takes the accuracy of the compressed model into account while the entire compression process reduces manual participation and lowers labor costs.
- In addition, compared with related technologies, this embodiment can obtain the compressed model automatically, which also speeds up model compression.
- An embodiment of the present application relates to a model compression method. This embodiment is roughly the same as the previous embodiment; the main difference is that, after the candidate model is received, the method further includes: training the candidate model to obtain a trained model; obtaining the accuracy and computing power of the trained model; and obtaining, based on the accuracy and computing power, the evaluation parameters of the candidate model corresponding to the trained model. For ease of description, the parts of this embodiment that are the same as or correspond to the previous embodiment are not described again.
- A schematic flow chart of the model compression method of this embodiment is shown in FIG. 3, which specifically includes the following steps:
- Step 201: receive candidate models.
- Step 202: train the candidate models to obtain trained models.
- Step 203: obtain the accuracy and computing power of each trained model.
- Step 204: obtain, based on the accuracy and computing power, the evaluation parameters of the candidate model corresponding to each trained model.
- Step 205: perform a neural network architecture search on the candidate models to obtain multiple transformation models.
- Step 206: perform computing power compression on the candidate models to obtain compressed models.
- Step 207: use the transformation models and the compressed models as candidate models.
- Step 201 and steps 205 to 207 are the same as steps 101 to 104 in the previous embodiment and are not repeated here.
- It should be noted that steps 201 to 207 constitute one polling round; after completing step 207, the model compression system returns to step 201, forming a closed-loop operation of the system.
- Specifically, after a candidate model is received, it is trained to obtain a trained model, and the performance of the trained model is evaluated, so that the candidate models required by the user can be screened out according to the user's needs.
- Specifically, in the first polling round, the benchmark model input by the user is used as the candidate model, and only this benchmark model undergoes performance evaluation.
- In subsequent polling rounds, the number of candidate models is larger, each candidate model needs to be trained, and there are correspondingly many trained models; during performance evaluation, all trained models must be traversed and the performance of each one evaluated.
- Specifically, the performance evaluation of this embodiment uses two parameters: the accuracy of the candidate model and its computing power. Therefore, after a candidate model is obtained in this embodiment, its accuracy and computing power are acquired, where accuracy refers to the accuracy rate of the candidate model during use, and computing power refers to the amount of computation the candidate model requires during use.
- Specifically, since accuracy and computing power are different kinds of parameters, their values need to be converted to the same scale during evaluation; that is, the evaluation parameters of the candidate model corresponding to the trained model are obtained based on accuracy and computing power, and the candidate model is finally given a comprehensive performance score, i.e., the evaluation parameter. The evaluation parameter can therefore represent the performance of the candidate model after multiple indicators have been weighed.
- In one embodiment, the evaluation parameters include an evaluation score, and obtaining the evaluation parameters of the candidate model corresponding to the trained model based on accuracy and computing power includes: combining the accuracy and computing power according to a preset weight ratio to obtain the evaluation score of the candidate model.
- Specifically, different weights are set for accuracy and computing power in order to calculate the evaluation score of each candidate model, and the weights can be set according to user needs: for example, if the user has higher requirements for accuracy, a higher weight can be set for accuracy, and if the user has higher requirements for computing power, a higher weight can be set for computing power.
- For the evaluation score, it can be arranged that the higher the score, the better the performance of the corresponding candidate model.
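One possible, non-prescriptive reading of such a weighted score is sketched below; the normalization against a baseline computing power and the particular weights are assumptions introduced for the example, since the application does not fix a concrete formula.

```python
def evaluation_score(accuracy, flops, baseline_flops, w_acc=0.7, w_power=0.3):
    """Combine accuracy and computing power into one score; lower FLOPs than the
    baseline raises the score, and a higher score means better overall performance."""
    power_saving = 1.0 - flops / baseline_flops   # > 0 when the model is cheaper than the baseline
    return w_acc * accuracy + w_power * power_saving

# Example: a user who cares more about computing power could swap the weights.
score = evaluation_score(accuracy=0.92, flops=1.8e9, baseline_flops=2.5e9)
```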
- In one embodiment, the evaluation parameters include an evaluation score; after the evaluation parameters of the candidate model corresponding to the trained model are obtained based on accuracy and computing power, the method further includes: retaining the candidate models whose evaluation score is greater than or equal to a preset score, or retaining the N candidate models with the largest evaluation scores.
- Specifically, except in the first polling round, subsequent rounds contain multiple candidate models; if every candidate model entered the subsequent neural network architecture search and computing power compression, the system's computational load would be large and resources would be wasted. Therefore, this embodiment sets a preset rule, namely retaining the candidate models whose evaluation score is greater than or equal to the preset score, or retaining the N candidate models with the largest evaluation scores, so that candidate models satisfying the preset rule are kept and those that do not satisfy it are deleted.
- Specifically, in the first polling round only the benchmark model input by the user is trained; in subsequent rounds the number of candidate models increases and screening is needed. The candidate models can be sorted by evaluation score: a preset score can be set in advance and the candidate models whose evaluation score is greater than or equal to it retained, or a value N can be set in advance and the N candidate models with the largest evaluation scores retained, so that only these models enter the subsequent neural network architecture search and computing power compression while candidate models that do not meet the preset rule are deleted. In this way, models with higher performance are fully utilized, the system's computational load is reduced, and resource waste is avoided.
- It should be noted that, if the user has no special requirements, the default rule is to set a value N and keep the N candidate models with the largest evaluation scores. In subsequent polling rounds, other strategies, such as roulette-wheel selection or tournament selection, may also be used to retain some candidate models for the subsequent neural network architecture search and computing power compression.
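A brief sketch of the two retention rules mentioned above (a score threshold or keeping the top N), using a plain list of (model, score) pairs as a hypothetical stand-in for the system's bookkeeping:

```python
def retain_candidates(scored, preset_score=None, top_n=None):
    """Keep candidates whose score meets the preset threshold, or the N best ones.
    `scored` is a list of (model, score) pairs; by default the top-N rule applies."""
    if preset_score is not None:
        return [(m, s) for m, s in scored if s >= preset_score]
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[: (top_n if top_n is not None else 3)]

survivors = retain_candidates([("A", 0.81), ("B", 0.74), ("C", 0.90)], top_n=2)
```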
- An embodiment of the present application relates to a model compression method. This embodiment is roughly the same as the previous embodiment; the main difference is that, after the evaluation score of the candidate model corresponding to the trained model is obtained based on accuracy and computing power, the method further includes: judging whether the maximum evaluation score obtained in the current polling round is greater than the maximum evaluation score obtained in the previous round. For ease of description, the parts of this embodiment that are the same as or correspond to the previous embodiment are not described again.
- A schematic flow chart of the model compression method of this embodiment is shown in FIG. 4, which specifically includes the following steps:
- Step 301: receive candidate models.
- Step 302: train the candidate models to obtain trained models.
- Step 303: obtain the accuracy and computing power of each trained model.
- Step 304: obtain, based on the accuracy and computing power, the evaluation parameters of the candidate model corresponding to each trained model.
- Step 305: judge whether the maximum evaluation score obtained in the current polling round is greater than the maximum evaluation score obtained in the previous round; if yes, go to steps 307 and 308; if not, go to step 306.
- Step 306: add 1 to the counter value and judge whether the counter value is smaller than a preset threshold; if yes, go to steps 307 and 308; if not, the polling ends.
- Step 307: perform a neural network architecture search on the candidate models to obtain multiple transformation models.
- Step 308: perform computing power compression on the candidate models to obtain compressed models.
- Step 309: use the transformation models and the compressed models as candidate models.
- Steps 301 to 304 and steps 307 to 309 are the same as steps 201 to 207 in the previous embodiment and are not repeated here.
- Specifically, after the performance evaluation in each polling round, the evaluation score of every candidate model is calculated, and the model with the highest evaluation score, i.e., the best performance, emerges. The maximum evaluation score of the current round is then compared with the maximum evaluation score obtained in the previous round. If the current maximum is greater than the previous maximum, the candidate models are still being optimized and another polling round may yield a better model, so the method proceeds to steps 307 and 308.
- If the maximum evaluation score of the current round is less than or equal to that of the previous round, the best-performing model has most likely already been obtained; the method then enters step 306, adds 1 to the counter value, and judges whether the counter value is smaller than the preset threshold. If the counter value is smaller than the preset threshold, the number of polling rounds has not yet reached the set standard and a model with a higher evaluation score may still appear, so steps 307 and 308 are entered again; if the counter value is greater than or equal to the preset threshold, the number of polling rounds has reached the set standard and the polling ends.
- It should be noted that the model compression system is equipped with a counter whose initial value is 0; if the maximum evaluation score of the current round is less than or equal to the maximum evaluation score obtained in the previous round, the counter is incremented by 1.
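The counter-based stopping rule of steps 305 and 306 could be expressed roughly as follows; the one_polling_round helper, assumed here to return the best evaluation score of a round, is a placeholder for the training, evaluation, search, and compression steps described above.

```python
def run_until_converged(one_polling_round, counter_threshold=3):
    """Poll until the best score stops improving for `counter_threshold` rounds
    (steps 305/306), then stop."""
    counter = 0                                  # initial counter value is 0
    best_so_far = float("-inf")
    while True:
        best_this_round = one_polling_round()
        if best_this_round > best_so_far:        # step 305: still improving, keep polling
            best_so_far = best_this_round
        else:
            counter += 1                          # step 306: no improvement this round
            if counter >= counter_threshold:      # polling count has reached the set standard
                break
    return best_so_far
```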
- the user can also actively issue a stop command to instruct the system to end polling.
- For example, if the system has already obtained a model with sufficiently high performance in steps 303 and 304 and the user considers that further polling is unnecessary, the user can actively end the entire polling process.
- In one embodiment, a target candidate model is obtained from the multiple candidate models, where the accuracy of the target candidate model is greater than a preset accuracy and the computing power of the target candidate model is smaller than a preset computing power.
- the system will organize the log information recorded during the search process, including the structural information, accuracy rate, computing power consumption and comprehensive performance information of the candidate models in each round of iterations.
- In this embodiment, candidate models whose accuracy is greater than the preset accuracy and whose computing power is less than the preset computing power are taken as target candidate models, and the target candidate models are organized into files and output to the user.
- After the system finishes outputting the results, it releases the computing resources occupied by the search task and ends the entire process.
- the system is preset with a benchmark model, the preset accuracy is the accuracy of the benchmark model, and the preset computing power is the computing power of the benchmark model.
- It should be noted that other methods can also be used to obtain the target candidate model; for example, after the evaluation score of each candidate model is calculated, the candidate model with the highest evaluation score can be selected as the target candidate model, organized into a file, and output to the user, ensuring that the user uses the optimal candidate model.
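For illustration, selecting target candidate models from the recorded log information against the benchmark model's accuracy and computing power might look like the sketch below; the record layout and field names are hypothetical choices made for the example.

```python
def select_targets(log_records, preset_accuracy, preset_flops):
    """Return candidates that beat the benchmark on both criteria:
    accuracy above the preset accuracy and FLOPs below the preset computing power."""
    return [
        record for record in log_records
        if record["accuracy"] > preset_accuracy and record["flops"] < preset_flops
    ]

targets = select_targets(
    [{"id": 7, "accuracy": 0.93, "flops": 1.6e9},
     {"id": 9, "accuracy": 0.88, "flops": 1.2e9}],
    preset_accuracy=0.90,   # e.g. the benchmark model's accuracy
    preset_flops=2.5e9,     # e.g. the benchmark model's computing power
)
```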
- The division of the above methods into steps is only for clarity of description; during implementation, steps may be combined into one step, or a step may be split into multiple steps, and as long as the same logical relationship is included, such variations fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or process, or introducing insignificant designs, without changing the core design of the algorithm and process, also falls within the protection scope of this patent.
- An embodiment of the present application relates to a model compression system.
- Its specific structure is shown in FIG. 5 and includes: a receiving module 401, an automatic model search module 402, and a computing power compression module 403.
- The receiving module 401 is configured to receive candidate models; the automatic model search module 402 is configured to perform a neural network architecture search on the candidate models to obtain multiple transformation models and input them into the receiving module 401 as candidate models; the computing power compression module 403 is configured to perform computing power compression on the candidate models to obtain compressed models and input the compressed models into the receiving module 401 as candidate models.
- the model compression system further includes: a model training module and a performance evaluation module.
- FIG. 6 is a schematic structural diagram of this embodiment, which includes: a receiving module 401, an automatic model search module 402, a computing power compression module 403, a model training module 404, and a performance evaluation module 405.
- The receiving module 401 is connected to the model training module 404, the model training module 404 is connected to the performance evaluation module 405, and the computing power compression module 403 is connected to the receiving module 401.
- The model training module 404 is configured to receive candidate models and train them to obtain trained models; the performance evaluation module 405 is configured to evaluate the performance of the trained models.
- The model training module 404 mainly has two functions: (1) training the candidate models to obtain trained models; and (2) validating each candidate model to obtain its accuracy and computing power.
- the model training module will input the obtained two model indicators into the performance evaluation module.
- the model training module adopts a distributed parallel training method.
- The performance evaluation module 405 receives the accuracy and computing power of each candidate model, evaluates the comprehensive performance of each candidate model to obtain its evaluation score, retains the candidate models that meet the preset rule, and deletes the candidate models that do not satisfy the preset rule.
- The automatic model search module 402 receives candidate models and performs a neural network architecture search based on them, searching the network architecture according to the search strategy over the search space set by the user or over the default full search space.
- The computing power compression module 403 receives candidate models.
- This module is the core module of the system and is responsible for targeted compression of the computing power of the candidate models.
- It detects the computing-power-intensive areas of a model, and the layers that consume the most computing power are subjected to targeted compression processing.
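A highly simplified skeleton of how these modules could be wired together is sketched below; the class name, method names, and the list-based receiving module are illustrative assumptions only and do not correspond to any concrete implementation described in this application.

```python
class ModelCompressionSystem:
    """Illustrative wiring of the modules of FIG. 6; the concrete modules are passed in as callables."""

    def __init__(self, search_module, compression_module, training_module, evaluation_module):
        self.search = search_module          # automatic model search module 402
        self.compress = compression_module   # computing power compression module 403
        self.train = training_module         # model training module 404
        self.evaluate = evaluation_module    # performance evaluation module 405
        self.candidates = []                 # receiving module 401, modeled as a simple list

    def receive(self, models):
        self.candidates.extend(models)

    def run_one_round(self):
        measured = [self.train(m) for m in self.candidates]   # train and measure accuracy/FLOPs
        kept = self.evaluate(measured)                         # keep models meeting the preset rule
        next_round = []
        for model in kept:
            next_round.extend(self.search(model))              # transformation models
            next_round.append(self.compress(model))            # compressed model
        self.candidates = []
        self.receive(next_round)                               # feed back into the receiving module
```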
- this embodiment is a system embodiment corresponding to the previous embodiment, and this embodiment can be implemented in cooperation with the previous embodiment.
- the relevant technical details mentioned in the previous embodiment are still valid in this embodiment, and will not be repeated here in order to reduce repetition.
- the relevant technical details mentioned in this embodiment can also be applied in the previous embodiment.
- The modules involved in this embodiment are logical modules; in practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units.
- units that are not closely related to solving the technical problem proposed in the present application are not introduced in this embodiment, but this does not mean that there are no other units in this embodiment.
- An embodiment of the present application relates to a server, as shown in FIG. 7, including at least one processor 501 and a memory 502 communicatively connected to the at least one processor 501, wherein the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 so that the at least one processor 501 can execute the model compression method described above.
- the memory 502 and the processor 501 are connected by a bus, and the bus may include any number of interconnected buses and bridges, and the bus connects one or more processors 501 and various circuits of the memory 502 together.
- the bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore will not be further described herein.
- the bus interface provides an interface between the bus and the transceivers.
- a transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium.
- Data processed by the processor 501 is transmitted over the wireless medium through the antenna; the antenna also receives data and passes it to the processor 501.
- The processor 501 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions, while the memory 502 may be used to store data used by the processor 501 when performing operations.
- An embodiment of the present application relates to a computer-readable storage medium storing a computer program.
- the above method embodiments are implemented when the computer program is executed by the processor.
- The program is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods in the various embodiments of the present application.
- The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Supply And Distribution Of Alternating Current (AREA)
Abstract
Description
Priority Information
This application claims priority to the Chinese patent application with application number 202111266014.4, filed on October 28, 2021, the entire contents of which are incorporated herein by reference.
本申请涉及神经网络架构搜索领域,特别涉及一种模型压缩方法、模型压缩系统、服务器及存储介质。The present application relates to the field of neural network architecture search, in particular to a model compression method, model compression system, server and storage medium.
神经网络在机器学习领域已经取得非常出色的成绩,如物体分类、目标检测等,基于神经网络的智能算法改变了人类的生产生活方式。但由于神经网络计算复杂度高、模型参数量大,因此,模型算力压缩技术成为最近几年学术界和工业界研究的热点。Neural networks have achieved outstanding results in the field of machine learning, such as object classification, target detection, etc. Intelligent algorithms based on neural networks have changed human production and lifestyle. However, due to the high computational complexity of neural networks and the large number of model parameters, model computing power compression technology has become a research hotspot in the academic and industrial circles in recent years.
目前,模型压缩的常用方法包括:参数量化、网络剪枝、人工设计高效神经网络结构;然而,参数量化和网络剪枝这两种方法在准确率上有很大的局限性;而人工设计高效神经网络结构的设计离不开大量人力的投入和研究,需要投入大量的时间进行人工实验和调参;可见,目前的模型压缩方法均无法兼顾准确性与节省人力成本。At present, common methods for model compression include: parameter quantization, network pruning, and artificial design of efficient neural network structures; however, the two methods of parameter quantization and network pruning have great limitations in accuracy; The design of the neural network structure is inseparable from a large amount of human investment and research, and requires a lot of time for manual experiments and parameter adjustments; it can be seen that the current model compression methods cannot balance accuracy and save labor costs.
发明内容Contents of the invention
本申请实施例的主要目的在于提出一种模型压缩方法、模型压缩系统、服务器及存储介质,兼顾获取压缩模型的准确率以及降低人工成本。The main purpose of the embodiment of the present application is to provide a model compression method, a model compression system, a server and a storage medium, taking into account the accuracy of obtaining the compressed model and reducing labor costs.
为实现上述目的,本申请实施例提供了一种模型压缩方法,包括:接收候选模型;对所述候选模型进行神经网络架构搜索,得到多个变换模型;对所述候选模型进行算力压缩得到压缩模型;将所述变换模型、所述压缩模型作为所述候选模型,重新进行所述神经网络架构搜索和所述算力压缩。In order to achieve the above purpose, an embodiment of the present application provides a model compression method, including: receiving a candidate model; performing neural network architecture search on the candidate model to obtain multiple transformation models; performing computing power compression on the candidate model to obtain Compression model: using the transformation model and the compression model as the candidate model, re-performing the neural network architecture search and the computing power compression.
本申请实施例还提供了一种模型压缩系统,包括:接收模块、模型自动搜索模块、算力压缩模块;所述接收模块用于接收候选模型;所述模型自动搜索模块用于对所述候选模型进行神经网络架构搜索,得到多个变换模型并作为所述候选模型输入所述接收模块;所述算力压缩模块用于对所述候选模型进行算力压缩得到压缩模型,所述压缩模型作为所述候选模型输入所述接收模块。The embodiment of the present application also provides a model compression system, including: a receiving module, a model automatic search module, and a computing power compression module; the receiving module is used to receive candidate models; the model automatic search module is used to search for the candidate The model performs a neural network architecture search to obtain a plurality of transformation models and input them into the receiving module as the candidate model; the computing power compression module is used to compress the computing power of the candidate model to obtain a compressed model, and the compressed model is used as The candidate models are input into the receiving module.
本申请实施例还提供了一种服务器,包括:至少一个处理器;以及,与所述至少一个处理器通信连接的存储器;其中,所述存储器存储有可被所述至少一个处理器执行的指令,所述指令被所述至少一个处理器执行,以使所述至少一个处理器能够执行如上述的模型压缩方法。The embodiment of the present application also provides a server, including: at least one processor; and a memory connected in communication with the at least one processor; wherein, the memory stores instructions that can be executed by the at least one processor , the instructions are executed by the at least one processor, so that the at least one processor can execute the above-mentioned model compression method.
本申请实施例还提供了一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行时实现上述的模型压缩方法。The embodiment of the present application also provides a computer-readable storage medium storing a computer program, and implementing the above-mentioned model compression method when the computer program is executed by a processor.
本申请的模型压缩方法,可以通过神经网络架构搜索获取尽可能多的神经网络模型,从而可以提高获取到较好模型的概率,且本申请通过算力压缩可以对上一次轮询得到的多个变换模型进行压缩;可见,本申请将模型压缩与神经网络架构搜索相结合,将神经网络架构搜索得到的变换模型作为候选模型在下一次轮询时进行算力压缩得到的压缩模型,并再次作为候选模型,算力压缩得到的压缩模型也可以在下一次轮询时进行神经网络架构搜索得到变换模型;通过不断地轮询,在轮询到一定程度之后,可以获得精确度较高的压缩模型,从而兼顾获取压缩模型的准确率,且整个模型压缩过程减少了人工参与,降低人工成本。另外,本申请相对于相关技术而言,能够自动获得压缩模型,也加快了模型压缩的速度。The model compression method of this application can obtain as many neural network models as possible through the neural network architecture search, thereby improving the probability of obtaining a better model, and this application can compress multiple neural network models obtained in the previous poll through computing power compression. Transform the model for compression; it can be seen that this application combines model compression with neural network architecture search, and uses the transformed model obtained by neural network architecture search as a candidate model to compress the computing power in the next poll, and again as a candidate Model, the compression model obtained by computing power compression can also be searched for the neural network architecture in the next polling to obtain the transformation model; through continuous polling, after polling reaches a certain level, a high-precision compression model can be obtained, thereby Taking into account the accuracy of the compressed model, and the entire model compression process reduces manual participation and reduces labor costs. In addition, compared with related technologies, the present application can automatically obtain the compressed model, and also speeds up the speed of model compression.
一个或多个实施例通过与之对应的附图中的图片进行示例性说明,这些示例性说明并不构成对实施例的限定,附图中具有相同参考数字标号的元件表示为类似的元件,除非有特别申明,附图中的图不构成比例限制。One or more embodiments are exemplified by the pictures in the corresponding drawings, and these exemplifications do not constitute a limitation to the embodiments. Elements with the same reference numerals in the drawings represent similar elements. Unless otherwise stated, the drawings in the drawings are not limited to scale.
图1是根据本申请一实施例的模型压缩方法的流程示意图;Fig. 1 is a schematic flow chart of a model compression method according to an embodiment of the present application;
图2是根据本申请一实施例的模型压缩方法步骤103的子步骤的流程示意图;FIG. 2 is a schematic flowchart of the sub-steps of
图3是根据本申请一实施例的模型压缩方法的流程示意图;3 is a schematic flow diagram of a model compression method according to an embodiment of the present application;
图4是根据本申请一实施例的模型压缩方法的流程示意图;Fig. 4 is a schematic flow chart of a model compression method according to an embodiment of the present application;
图5是根据本申请一实施例的模型压缩系统的结构示意图;5 is a schematic structural diagram of a model compression system according to an embodiment of the present application;
图6是根据本申请一实施例的模型压缩系统的结构示意图;6 is a schematic structural diagram of a model compression system according to an embodiment of the present application;
图7是根据本申请一实施例的服务器的结构示意图。Fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application.
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合附图对本申请的各实施例进行详细的阐述。然而,本领域的普通技术人员可以理解,在本申请各实施例中,为了使读者更好地理解本申请而提出了许多技术细节。但是,即使没有这些技术细节和基于以下各实施例的种种变化和修改,也可以实现本申请所要求保护的技术方案。以下各个实施例的划分是为了描述方便,不应对本申请的具体实现方式构成任何限定,各个实施例在不矛盾的前提下可以相互结合相互引用。In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art can understand that in each embodiment of the application, many technical details are provided for readers to better understand the application. However, even without these technical details and various changes and modifications based on the following embodiments, the technical solutions claimed in this application can also be realized. The division of the following embodiments is for the convenience of description, and should not constitute any limitation to the specific implementation of the present application, and the embodiments can be combined and referred to each other on the premise of no contradiction.
本申请一实施例涉及一种模型压缩方法,如图1所示,为本实施例的模型压缩方法的流程示意图,具体包括以下步骤:An embodiment of the present application relates to a model compression method, as shown in Figure 1, which is a schematic flow chart of the model compression method of this embodiment, specifically including the following steps:
步骤101,接收候选模型。
具体地说,本实施例的执行主体为模型压缩系统,在进行模型压缩的过程中,模型压缩系统需要先接收候选模型。Specifically, the execution subject of this embodiment is the model compression system, and the model compression system needs to receive candidate models first during the process of model compression.
步骤102,对候选模型进行神经网络架构搜索,得到多个变换模型。In
具体地说,神经网络架构搜索(NAS,Neural Architecture Search)能够有效地解放人力投入,且NAS能够对网络结构进行更大空间的探索,有机会生成超出人类设计之外的高性能结构,在神经网络领域得到了广泛的应用。因此,本实施例的模型压缩方法结合NAS解决人力投入问题的同时,也可以得到准确度更高的压缩模型。在一个实施例中,对候选模型进行神经网络架构搜索,得到多个变换模型,包括:基于预设的搜索空间进行神经网络架构搜索, 对候选模型进行随机变换,得到多个变换模型。Specifically, Neural Architecture Search (NAS, Neural Architecture Search) can effectively liberate human input, and NAS can explore the network structure in a larger space, and has the opportunity to generate high-performance structures beyond human design. The network field has been widely used. Therefore, while the model compression method of this embodiment combines NAS to solve the problem of manpower input, it can also obtain a compressed model with higher accuracy. In one embodiment, performing neural network architecture search on candidate models to obtain multiple transformation models includes: performing neural network architecture search based on a preset search space, performing random transformation on candidate models to obtain multiple transformation models.
具体地说,NAS搜索的过程是基于预设的搜索策略遍历预设的搜索空间,并在每次NAS搜索时对候选模型进行随机变换,得到多个变换模型。这种随机性确保了生成模型结构的多样性,有利于搜索到高性能模型。需要说明的是,本实施例的搜索空间可以是用户设定的或者系统默认的,搜索策略也可以是用户设定的或者系统默认的;其中,NAS搜索策略包括有基于强化学习、基于进化算法和基于梯度下降等。Specifically, the process of NAS search is to traverse the preset search space based on the preset search strategy, and randomly transform the candidate models in each NAS search to obtain multiple transformed models. This randomness ensures the diversity of generated model structures, which is conducive to searching for high-performance models. It should be noted that the search space in this embodiment can be set by the user or defaulted by the system, and the search strategy can also be set by the user or defaulted by the system; wherein, the NAS search strategy includes reinforcement learning-based, evolutionary algorithm-based and based on gradient descent etc.
需要说明的是,本实施例涉及的搜索策略可以是任意一种搜索策略,具有较强的鲁棒性,可以根据用户或系统的需求设置为对应的搜索策略,本实施例不作具体的限定。It should be noted that the search strategy involved in this embodiment can be any search strategy, which has strong robustness, and can be set as a corresponding search strategy according to the requirements of the user or the system, which is not specifically limited in this embodiment.
步骤103,对候选模型进行算力压缩得到压缩模型。In
上述步骤102、步骤103的先后顺序本实施例不作具体限定,步骤102可以在步骤103之前,步骤102也可以在步骤103之后,步骤102、步骤103也可以同时进行。The order of the
在一个实施例中,对候选模型进行算力压缩得到压缩模型即步骤104包括以下子步骤,具体流程示意图如图2所示,包括:In one embodiment, performing computing power compression on the candidate model to obtain the compressed model, that is,
步骤1031,计算出候选模型中各个层的算力。
具体地说,算力压缩就是负责对候选模型的算力进行定向压缩,例如,会对模型算力密集区进行检测,并对模型算力消耗最大的层进行定向的压缩处理。Specifically, computing power compression is responsible for directional compression of the computing power of candidate models. For example, it will detect model computing power-intensive areas and perform directional compression processing on the layer that consumes the most computing power.
具体地说,根据候选模型的输入和每一层的参数可以计算出候选模型中各个层算力消耗。每个层的算力消耗主要由输入该层的特征图大小和该层操作需要训练的权重参数量大小决定的,可以近似由如下公式表示:Specifically, the computing power consumption of each layer in the candidate model can be calculated according to the input of the candidate model and the parameters of each layer. The computing power consumption of each layer is mainly determined by the size of the feature map input to the layer and the size of the weight parameters that need to be trained for the operation of the layer, which can be approximately expressed by the following formula:
FLOPs(l)=H×W×C×Params(l);FLOPs(l)=H×W×C×Params(l);
其中,l代表模型中的某个操作层(如卷积层),H和W代表输入该层的特征图尺寸,C代表输入特征图的通道数,Params表示当前层的参数量。Among them, l represents an operation layer in the model (such as a convolutional layer), H and W represent the size of the feature map input to this layer, C represents the number of channels of the input feature map, and Params represents the parameter amount of the current layer.
步骤1032,比较各个层的算力,得到算力最大层。
具体地说,通过比较步骤1041中得到的各个层的算力,能够定位到模型中算力最大的层即“算力密集区”。算力压缩步骤主要针对模型的“算力密集区”进行算力压缩。Specifically, by comparing the computing power of each layer obtained in step 1041, it is possible to locate the layer with the largest computing power in the model, that is, the "computing power-intensive area". The computing power compression step mainly performs computing power compression on the "computing power intensive area" of the model.
步骤1033,对算力最大层进行算力压缩,得到压缩模型。In
在一个实施例中,对算力最大层进行算力压缩,包括:减小算力最大层的参数量,或者,移除算力最大层,或者,减小算力最大层的输入特征图大小。In one embodiment, performing computing power compression on the layer with the largest computing power includes: reducing the parameter amount of the layer with the largest computing power, or removing the layer with the largest computing power, or reducing the size of the input feature map of the layer with the largest computing power .
具体地说,通过上述的算力消耗的公式可知,影响算力的因素主要有输入该层的特征图、当前层的参数量;因此对模型中某个层的算力进行压缩,有以下3个可操作方向:Specifically, from the above calculation power consumption formula, it can be known that the factors that affect the calculation power mainly include the input feature map of the layer and the parameter quantity of the current layer; therefore, the compression of the calculation power of a certain layer in the model has the following three available directions:
(1)减小当前层的参数量。例如对一个卷积操作,其参数量是由其卷积核的大小和数量决定的,因此可以压缩卷积核的大小或减小提出的在启动搜索任务后滤波器的个数。减小当前层的参数量相当于改变当前层的超参数,但改变的方向是沿着参数量下降的方向进行的。(1) Reduce the parameter amount of the current layer. For example, for a convolution operation, its parameter quantity is determined by the size and number of its convolution kernel, so the size of the convolution kernel can be compressed or the number of filters proposed after starting the search task can be reduced. Reducing the parameter amount of the current layer is equivalent to changing the hyperparameters of the current layer, but the direction of change is along the direction of decreasing the parameter amount.
(2)移除当前层,这是第(1)种情况的极限情况,即将该层的参数量减小为0。(2) Remove the current layer, which is the limit case of the first case, that is, reduce the parameter amount of this layer to 0.
(3)减小输入进该层的特征图大小。减小输入特征图的大小意味着需要改变“算力密集区”前面的操作层,因为只有改变前面的操作层才能够有机会改变当前层的输入。具体的操作包括:①在“算力密集区”前面增加一个池化层,对特征进行压缩;②追溯到“算力密集区”前面有算力消耗的层,改变其超参数,如减小输出特征的通道数或增加步长,以起到下采样特征图的作用。(3) Reduce the size of the feature map input into this layer. Reducing the size of the input feature map means that it is necessary to change the operation layer in front of the "computing intensive area", because only by changing the operation layer in front can there be a chance to change the input of the current layer. The specific operations include: ① Add a pooling layer in front of the "computing power-intensive area" to compress the features; The number of channels of the output feature or increase the step size to play the role of downsampling feature map.
步骤104,将变换模型、压缩模型作为候选模型进行训练。In
具体地说,神经网络架构搜索和算力压缩生成的模型共同作为候选模型,会返回到步骤101中,重新进行所述神经网络架构搜索和所述算力压缩;也就是说,上述步骤101至步骤104为一次轮询的过程,在步骤104结束之后,再次执行步骤101,由此形成一个系统的闭环操作。Specifically, the model generated by neural network architecture search and computing power compression is used as a candidate model, and will return to step 101 to re-perform the neural network architecture search and computing power compression; that is, the
具体地说,在第一次轮询的过程中,模型压缩系统接收的候选模型为用户输入的基准模型。Specifically, during the first polling process, the candidate model received by the model compression system is the reference model input by the user.
需要说明的是,本实施例的候选模型均会设置由一个标识(ID,Identity document),用户输入基准模型进入到系统,以及通过神经网络架构搜索和算力压缩生成的模型都会分配一个ID,这个ID均是唯一的,可以确保训练过的模型不会存在重复训练的问题。It should be noted that the candidate models in this embodiment will all be set with an ID (ID, Identity document), the user will input the benchmark model into the system, and the model generated through neural network architecture search and computing power compression will be assigned an ID. This ID is unique, which can ensure that the trained model will not have the problem of repeated training.
本实施例的模型压缩方法,可以通过神经网络架构搜索获取尽可能多的神经网络模型,从而可以提高获取到较好模型的概率,且本申请通过算力压缩可以对上一次轮询得到的多个变换模型进行压缩;可见,本申请将模型压缩与神经网络架构搜索相结合,将神经网络架构搜索得到的变换模型作为候选模型在下一次轮询时进行算力压缩得到的压缩模型,并再次作为候选模型,算力压缩得到的压缩模型也可以在下一次轮询时进行神经网络架构搜索得到变换模型;通过不断地轮询,在轮询到一定程度之后,可以获得精确度较高的压缩模型,从而兼顾获取压缩模型的准确率,且整个模型压缩过程减少了人工参与,降低人工成本。另外,本申请相对于相关技术而言,能够自动获得压缩模型,也加快的模型压缩的速度。The model compression method of this embodiment can obtain as many neural network models as possible through the neural network architecture search, thereby improving the probability of obtaining a better model, and this application can use the computational power compression to obtain as many neural network models as possible in the previous polling. It can be seen that this application combines model compression with neural network architecture search, and uses the transformation model obtained by neural network architecture search as the compression model obtained by computing power compression in the next polling of the candidate model, and again as the compression model The candidate model, the compression model obtained by computing power compression can also be searched for the neural network architecture in the next polling to obtain the transformation model; through continuous polling, after polling reaches a certain level, a high-precision compression model can be obtained. In this way, the accuracy of the compressed model is taken into account, and the entire model compression process reduces manual participation and reduces labor costs. In addition, compared with related technologies, the present application can automatically obtain a compressed model, and also accelerates the speed of model compression.
本申请一实施例涉及一种模型压缩方法,本实施例于上一实施例大致相同,主要区别在于,接收候选模型之后,还包括:对候选模型进行训练得到训练后模型;获取训练后模型的准确度以及算力;基于准确度、算力得到训练后模型对应的候选模型的评估参数。为了便于描述,本实施例与上一实施例相同或相应的部分再次不再赘述。An embodiment of the present application relates to a model compression method. This embodiment is roughly the same as the previous embodiment. The main difference is that after receiving the candidate model, it also includes: training the candidate model to obtain the trained model; obtaining the trained model Accuracy and computing power: Based on the accuracy and computing power, the evaluation parameters of the candidate models corresponding to the trained model are obtained. For ease of description, the parts of this embodiment that are the same as or corresponding to the previous embodiment will not be described again.
本实施例的模型压缩方法的流程示意图如图3所示,具体包括以下步骤:A schematic flow chart of the model compression method of this embodiment is shown in Figure 3, specifically including the following steps:
步骤201,接收候选模型。
步骤202,对候选模型进行训练得到训练后模型。
步骤203,获取训练后模型的准确度以及算力。
步骤204,基于准确度、算力得到训练后模型对应的候选模型的评估参数。In
步骤205,对候选模型进行神经网络架构搜索,得到多个变换模型。
步骤206,对候选模型进行算力压缩得到压缩模型。In
步骤207,将变换模型、压缩模型作为候选模型。In
上述步骤201、步骤205至步骤207与上一实施例的步骤101至步骤104相同,在此不再赘述。The
需要说明的是,上述步骤201至步骤207为一次轮询的过程,模型压缩系统在完成步骤207之后,会再次进入到步骤201,从而形成一个系统的闭环操作。It should be noted that the
具体地说,在接收候选模型之后,对候选模型进行训练得到训练后模型,对训练后模型进行性能评估,从而可以根据用户需要筛选出用户所需的候选模型。具体地说,在第一次轮询的过程中,将用户输入的基准模型作为候选模型,仅有用户输入的基准模型进行了性能评估。在之后的轮询过程中,候选模型的数量较多,需要对每个候选模型均进行模型训练,得 到的训练后模型也较多,在性能评估的过程中,需要遍历所有的训练后模型,并对每个训练后模型进行性能评估。Specifically, after receiving the candidate model, the candidate model is trained to obtain the trained model, and the performance of the trained model is evaluated, so that the candidate model required by the user can be screened out according to the user's needs. Specifically, in the first polling process, the benchmark model input by the user is used as a candidate model, and only the benchmark model input by the user is evaluated for performance. In the subsequent polling process, the number of candidate models is large, and model training needs to be performed on each candidate model, and there are also many trained models. In the process of performance evaluation, it is necessary to traverse all the trained models. And evaluate the performance of each trained model.
具体地说,本实施例的性能评估需要借助候选模型的准确度以及算力这两个参数,因此,本实施例在获取候选模型之后,获取候选模型的准确度以及算力,其中,准确度是指候选模型在使用过程中的准确率,算力是指候选模型在使用过程中运算量的强度。Specifically, the performance evaluation of this embodiment requires the use of two parameters, the accuracy of the candidate model and the computing power. Therefore, after the candidate model is obtained in this embodiment, the accuracy and computing power of the candidate model are obtained. Among them, the accuracy It refers to the accuracy rate of the candidate model during use, and computing power refers to the intensity of the calculation amount of the candidate model during use.
具体地说,由于准确度和算力是不同的参数,所以在进行评估的时候,需要将各个参数的值转化到同一量纲下,即基于准确度、算力得到训练后模型对应的候选模型的评估参数,最终给出训练后模型对应的候选模型的综合性能得分即评估参数,评估参数能够表征候选模型在权衡多个指标后的性能。Specifically, since accuracy and computing power are different parameters, when evaluating, it is necessary to transform the values of each parameter into the same dimension, that is, to obtain the candidate model corresponding to the trained model based on accuracy and computing power Finally, the comprehensive performance score of the candidate model corresponding to the trained model is given, that is, the evaluation parameter. The evaluation parameter can represent the performance of the candidate model after weighing multiple indicators.
在一个实施例中,评估参数包括评估分数;基于准确度、算力得到训练后模型对应的候选模型的评估参数,包括:按照预设的权重比例对准确度、算力进行计算,得到候选模型的评估分数。具体地说,本实施例对于准确度和算力分别设置不同的权重,从而计算得到候选模型的评估分数,可以根据用户的需求设置不同的权重,例如,用户对准确度要求较高,可以对准确度设置较高的权重;或者用户对算力要求较高,可以对算力设置较高的权重。具体地说,对于评估分数,可以设置为评估分数越高,得到的候选模型的性能越好。In one embodiment, the evaluation parameters include evaluation scores; the evaluation parameters of the candidate models corresponding to the trained models are obtained based on the accuracy and computing power, including: calculating the accuracy and computing power according to the preset weight ratio to obtain the candidate models evaluation score. Specifically, in this embodiment, different weights are set for accuracy and computing power, so as to calculate the evaluation scores of candidate models, and different weights can be set according to user needs. Set a higher weight for accuracy; or users have higher requirements for computing power, so you can set a higher weight for computing power. Specifically, for the evaluation score, it can be set that the higher the evaluation score, the better the performance of the obtained candidate model.
在一个实施例中,评估参数包括评估分数;基于准确度、算力得到训练后模型对应的候选模型的评估参数之后,还包括:保留评估分数大于或等于预设分数的候选模型,或者,保留评估分数最大的N个候选模型。具体地说,除了第一次轮询,后续的轮询过程会存在多个候选模型,若每个候选模型均进入后续的神经网络架构搜索和算力压缩,会导致系统的运算量较大,且也会造成资源的浪费,因此,本实施例会设置一个预设规则即保留评估分数大于或等于预设分数的候选模型,或者,保留评估分数最大的N个候选模型,从而保留满足预设规则的候选模型,将不满足预设规则的候选模型删除。In one embodiment, the evaluation parameter includes an evaluation score; after obtaining the evaluation parameters of the candidate model corresponding to the trained model based on accuracy and computing power, it also includes: retaining the candidate model whose evaluation score is greater than or equal to the preset score, or retaining Evaluate the N candidate models with the largest scores. Specifically, in addition to the first polling process, there will be multiple candidate models in the subsequent polling process. If each candidate model enters the subsequent neural network architecture search and computing power compression, the system will have a large amount of computation. And it will also cause a waste of resources. Therefore, this embodiment will set a preset rule, that is, retain candidate models whose evaluation scores are greater than or equal to the preset score, or reserve N candidate models with the largest evaluation scores, so as to retain Candidate models that do not meet the preset rules are deleted.
Specifically, in the first polling round only the benchmark model input by the user is trained; in subsequent polling rounds the number of candidate models increases and screening is required. The candidate models can be sorted by evaluation score: either a preset score is set in advance and the candidate models whose evaluation score is greater than or equal to the preset score are retained, or a value N is set in advance and the N candidate models with the largest evaluation scores are retained, and these retained models enter the subsequent steps of neural network architecture search and computing power compression. The candidate models that do not satisfy the preset rule are deleted, so that the models with higher performance can be fully utilized, thereby reducing the computational load on the system and avoiding resource waste.
It should be noted that, if the user has no special requirements, the default rule is to set a value N and retain the N candidate models with the largest evaluation scores.
Specifically, in subsequent polling rounds other methods can also be used to retain some of the candidate models for the subsequent neural network architecture search and computing power compression, such as roulette wheel selection or tournament selection.
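For illustration, the retention rules mentioned in this and the preceding paragraphs could be sketched as follows; the dictionary-based candidate representation and the "score" key are assumptions made only for this example, and the evaluation scores are assumed to be non-negative for the roulette wheel variant.

```python
import random

def retain_by_threshold(candidates, preset_score):
    """Keep every candidate whose evaluation score reaches the preset score."""
    return [c for c in candidates if c["score"] >= preset_score]

def retain_top_n(candidates, n):
    """Default rule of this embodiment: keep the N highest-scoring candidates."""
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:n]

def retain_roulette(candidates, n):
    """Roulette wheel selection: sample candidates with probability proportional to score."""
    weights = [c["score"] for c in candidates]
    return random.choices(candidates, weights=weights, k=n)

def retain_tournament(candidates, n, k=3):
    """Tournament selection: n times, pick the best of k randomly drawn candidates."""
    return [max(random.sample(candidates, k), key=lambda c: c["score"]) for _ in range(n)]
```

retain_top_n corresponds to the default rule mentioned above, while the roulette wheel and tournament variants are the stochastic alternatives.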
An embodiment of the present application relates to a model compression method. This embodiment is roughly the same as the previous embodiment; the main difference is that, in this embodiment, after the evaluation score of the candidate model corresponding to the trained model is obtained based on the accuracy and the computing power, the method further includes: judging whether the maximum evaluation score obtained in this polling round is greater than the maximum evaluation score obtained in the previous polling round. For ease of description, the parts of this embodiment that are the same as or correspond to the previous embodiment are not described again.
A schematic flowchart of the model compression method of this embodiment is shown in FIG. 4 and specifically includes the following steps:
Step 301: receive a candidate model.
Step 302: train the candidate model to obtain a trained model.
Step 303: acquire the accuracy and computing power of the trained model.
Step 304: obtain the evaluation parameters of the candidate model corresponding to the trained model based on the accuracy and the computing power.
Step 305: judge whether the maximum evaluation score obtained in this polling round is greater than the maximum evaluation score obtained in the previous polling round; if yes, proceed to step 307 and step 308; if not, proceed to step 306.
Step 306: increment the counter value by 1 and judge whether the counter value is less than a preset threshold; if yes, proceed to step 307 and step 308; if not, the polling ends.
Step 307: perform neural network architecture search on the candidate model to obtain multiple transformation models.
Step 308: perform computing power compression on the candidate model to obtain a compression model.
Step 309: use the transformation models and the compression model as candidate models.
The foregoing steps 301 to 304 and steps 307 to 309 are the same as steps 201 to 207 of the previous embodiment and are not described again here.
Specifically, after the performance of the candidate models is evaluated in each polling round, the evaluation score of each candidate model is calculated, and the model with the highest evaluation score, i.e., the best performance, emerges. The maximum evaluation score of this polling round is then compared with the maximum evaluation score obtained in the previous polling round. If the maximum evaluation score of this polling round is greater than the maximum evaluation score obtained in the previous polling round, the candidate models are still being optimized and polling again may yield a model with better performance, so the method continues to step 307 and step 308. If the maximum evaluation score of this polling round is less than or equal to the maximum evaluation score obtained in the previous polling round, the model with the best performance has most likely already been obtained, so the method proceeds to step 306: the counter value is incremented by 1 and it is judged whether the counter value is less than the preset threshold. If the counter value is less than the preset threshold, the number of polling rounds has not yet reached the set criterion and a model with a higher evaluation score may still appear, so the method returns to step 307 and step 308; if the counter value is greater than or equal to the preset threshold, the number of polling rounds has reached the set criterion and the polling ends.
Specifically, the model compression system is provided with a counter. Whenever the maximum evaluation score of a polling round is less than or equal to the maximum evaluation score obtained in the previous polling round, the counter is incremented by 1; the initial value of the counter is 0.
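A minimal sketch of the polling loop with this counter-based stopping rule is given below; the four callables passed in (train_and_evaluate, select, architecture_search, compress) are placeholders standing in for the training/evaluation, retention, neural network architecture search, and computing power compression steps, and none of their names come from the patent.

```python
def search_loop(initial_candidates, train_and_evaluate, select,
                architecture_search, compress, patience=3):
    """Polling loop with the counter-based stopping rule; `patience` plays
    the role of the preset counter threshold."""
    candidates = list(initial_candidates)
    best_score = float("-inf")
    counter = 0
    while True:
        scored = train_and_evaluate(candidates)            # steps 302-304
        round_best = max(c["score"] for c in scored)
        if round_best > best_score:                        # step 305
            best_score = round_best
        else:                                              # step 306
            counter += 1
            if counter >= patience:
                break                                      # polling ends
        kept = select(scored)                              # retention rule
        candidates = architecture_search(kept) + compress(kept)   # steps 307-309
    return scored
```

A user-issued stop instruction, as described in the next paragraph, would simply break out of the same loop.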
Specifically, the user can also actively issue a stop instruction to instruct the system to end the polling. When the system has already obtained a model with sufficiently high performance in step 303 and step 304 and the user considers that further polling is unnecessary, the user can actively end the entire polling process.
In one embodiment, when the polling ends, a target candidate model is obtained from the multiple candidate models; the accuracy of the target candidate model is greater than a preset accuracy and the computing power of the target candidate model is less than a preset computing power. Specifically, after the polling ends, i.e., the search stops, the system collates the log information recorded during the search, including the structure information, accuracy rate, computing power consumption, and comprehensive performance information of the candidate models in each iteration round. In this embodiment, the candidate models whose accuracy rate is greater than the preset accuracy and whose computing power is less than the preset computing power are taken as target candidate models, and the target candidate models are collated into a file and output to the user. After the system finishes outputting the results, it releases the computing resources occupied by the search task and ends the entire process. The system is preset with a benchmark model; the preset accuracy is the accuracy of the benchmark model and the preset computing power is the computing power of the benchmark model.
Specifically, when the polling ends, the user can also obtain the target candidate model in other ways; for example, after the evaluation score of each candidate model is calculated, the candidate model with the highest evaluation score is selected as the target candidate model and collated into a file that is output to the user, which ensures that the user uses the optimal candidate model.
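The two ways of picking the output described above could be combined as in the following sketch; the field names and the fallback to the best-scoring model are assumptions made for illustration, with the preset accuracy and preset computing power taken from the benchmark model as stated above.

```python
def pick_target_models(search_log, baseline_accuracy, baseline_compute):
    """Select target candidate models from the collated search log.

    A candidate qualifies when it is both more accurate and cheaper (in
    computing power) than the benchmark model; if none qualifies, fall back
    to the candidate with the highest evaluation score.
    """
    targets = [c for c in search_log
               if c["accuracy"] > baseline_accuracy and c["compute"] < baseline_compute]
    if not targets:
        targets = [max(search_log, key=lambda c: c["score"])]
    return targets
```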
The division of steps in the above methods is only for clarity of description. During implementation, steps can be combined into one step, or a step can be split into multiple steps; as long as the same logical relationship is included, they all fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or the flow, or introducing insignificant designs, without changing the core design of the algorithm and the flow also falls within the protection scope of this patent.
An embodiment of the present application relates to a model compression system, a schematic structural diagram of which is shown in FIG. 5, including: a receiving module 401, an automatic model search module 402, and a computing power compression module 403.
Specifically, the receiving module 401 is configured to receive candidate models; the automatic model search module 402 is configured to perform neural network architecture search on the candidate models to obtain multiple transformation models, which are input to the receiving module 401 as candidate models; and the computing power compression module 403 is configured to perform computing power compression on the candidate models to obtain compression models, which are input to the receiving module 401 as candidate models.
In one embodiment, the model compression system further includes a model training module and a performance evaluation module.
FIG. 6 is a schematic structural diagram of this embodiment, which includes: a receiving module 401, an automatic model search module 402, a computing power compression module 403, a model training module 404, and a performance evaluation module 405.
Specifically, the receiving module 401 is connected to the model training module 404, the model training module 404 is connected to the performance evaluation module 405, the performance evaluation module 405 is connected to the automatic model search module 402 and the computing power compression module 403 respectively, and the automatic model search module 402 and the computing power compression module 403 are connected to the receiving module 401. The model training module 404 is configured to receive the candidate models and train them to obtain trained models; the performance evaluation module 405 is configured to evaluate the performance of the trained models.
Specifically, the model training module 404 has two main functions: (1) training the candidate models to obtain trained models; (2) verifying each candidate model to obtain the accuracy rate and computing power of each candidate model. The model training module inputs these two model indicators into the performance evaluation module. In order to make efficient use of system resources, the model training module adopts a distributed parallel training method.
Specifically, the performance evaluation module 405 receives the accuracy rate and computing power of each candidate model, evaluates the comprehensive performance of each candidate model to obtain its evaluation score, retains the candidate models that satisfy the preset rule, and deletes the candidate models that do not satisfy the preset rule.
Specifically, the automatic model search module 402 receives the candidate models and performs neural network architecture search based on them; the network architecture search is carried out according to the search strategy within the search space set by the user or the default full search space.
Specifically, the computing power compression module 403 receives the candidate models. This module is the core module of the system and is responsible for the directional compression of the computing power of the candidate models: it detects the computing-power-intensive regions of a model and performs directional compression on the layers that consume the most computing power.
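As a rough illustration of how the modules shown in FIG. 6 could cooperate in a single polling round, consider the following sketch; the class, the method names, and any wiring details beyond what FIG. 6 shows are assumptions, not part of the patent.

```python
class ModelCompressionSystem:
    """Minimal sketch of one polling round through the modules of FIG. 6."""

    def __init__(self, receiver, trainer, evaluator, searcher, compressor):
        self.receiver = receiver      # receiving module 401
        self.trainer = trainer        # model training module 404
        self.evaluator = evaluator    # performance evaluation module 405
        self.searcher = searcher      # automatic model search module 402
        self.compressor = compressor  # computing power compression module 403

    def run_one_round(self):
        candidates = self.receiver.get_candidates()
        trained = self.trainer.train(candidates)        # distributed parallel training
        metrics = self.trainer.validate(trained)        # accuracy and computing power
        kept = self.evaluator.evaluate_and_filter(metrics)
        new_candidates = self.searcher.search(kept) + self.compressor.compress(kept)
        self.receiver.put_candidates(new_candidates)
        return kept
```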
It is not difficult to see that this embodiment is a system embodiment corresponding to the previous embodiment and can be implemented in cooperation with it. The relevant technical details mentioned in the previous embodiment are still valid in this embodiment and, to reduce repetition, are not described again here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the previous embodiment.
It is worth mentioning that all the modules involved in this embodiment are logical modules. In practical applications, a logical unit may be one physical unit, a part of one physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present application, units that are not closely related to solving the technical problem proposed by the present application are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
An embodiment of the present application relates to a server, as shown in FIG. 7, including at least one processor 501 and a memory 502 communicatively connected to the at least one processor 501, wherein the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 to enable the at least one processor 501 to perform the model compression method described above.
The memory 502 and the processor 501 are connected by a bus. The bus may include any number of interconnected buses and bridges, and connects the various circuits of the one or more processors 501 and the memory 502 together. The bus may also connect various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and are therefore not further described herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor 501 is transmitted over a wireless medium through an antenna; further, the antenna also receives data and transfers the data to the processor 501.
The processor 501 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions, while the memory 502 may be used to store data used by the processor 501 when performing operations.
An embodiment of the present application relates to a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the above method embodiments are implemented.
That is, those skilled in the art can understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing related hardware. The program is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods in the various embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that the above embodiments are specific embodiments for implementing the present application, and that in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of the present application.
Claims (13)