
WO2020087281A1 - Hyperparameter optimization method and apparatus - Google Patents

Hyperparameter optimization method and apparatus

Info

Publication number
WO2020087281A1
Authority
WO
WIPO (PCT)
Prior art keywords
hyperparameters
optimization
machine learning
value
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/112712
Other languages
English (en)
Chinese (zh)
Inventor
蒋阳
赵丛
张李亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Priority to PCT/CN2018/112712 priority Critical patent/WO2020087281A1/fr
Priority to CN201880038686.XA priority patent/CN110770764A/zh
Publication of WO2020087281A1 publication Critical patent/WO2020087281A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • This application relates to the field of computer technology, and in particular, to a hyperparameter optimization method and device.
  • The parameters of machine learning algorithms mainly include hyperparameters and ordinary parameters. Ordinary parameters can be learned and estimated from the data; hyperparameters cannot be estimated from the data and can only be specified by human experience and design, i.e., they are parameters that need to be set before the learning process starts. Hyperparameters define higher-level concepts about machine learning models, such as complexity or learning capacity. For example, the hyperparameters may include, but are not limited to: regularization coefficients, learning rate, network structure, and the width and depth of convolution kernels.
  • the adjustment of hyperparameters has a very large impact on the performance of machine learning algorithms.
  • The adjustment of hyperparameters is a black-box operation, which often requires extensive debugging by the algorithm designer, who needs deep accumulated experience in this field. It takes a lot of time and effort, the optimal result is often still not obtained, and the optimization efficiency is low.
  • the desired hyperparameters can be obtained by modeling the unknown function and searching for its global optimal solution.
  • For example, the Bayesian Optimization Algorithm (BOA) can be used for this purpose.
  • However, the number of hyperparameters that need to be optimized may be very large, which makes it difficult to find the global optimal solution of the unknown function in a high-dimensional space; the search often gets stuck in a local optimum and cannot obtain good results.
  • The present application provides a hyperparameter optimization method and device, which can realize a dimensionality-reduced search for hyperparameters and, at the same time, weaken the assumptions that restrict the solution space, so as to obtain better hyperparameter optimization results.
  • In a first aspect, a hyperparameter optimization method is provided, including: dividing the hyperparameters of machine learning that need to be optimized into N groups of hyperparameters, where N is an integer greater than 1; and performing Bayesian optimization on the N groups of hyperparameters separately to obtain the optimized hyperparameters, where, in the process of performing Bayesian optimization on each group of hyperparameters, the values of the remaining groups of hyperparameters are fixed to their latest values.
  • In a second aspect, a hyperparameter optimization device is provided, including: a division unit that divides the hyperparameters of machine learning that need to be optimized into N groups of hyperparameters, where N is an integer greater than 1; and an optimization unit that performs Bayesian optimization on each group of hyperparameters to obtain the optimized hyperparameters, where, in the process of performing Bayesian optimization on each group of hyperparameters, the values of the remaining groups of hyperparameters are fixed to their latest values.
  • Also provided is an apparatus that includes a memory and a processor. The memory is used to store instructions, and the processor is used to execute the instructions stored in the memory; execution of the instructions stored in the memory causes the processor to perform the optimization method provided in the first aspect.
  • Also provided is a chip. The chip includes a processing module and a communication interface; the processing module is used to control the communication interface to communicate with the outside, and the processing module is also used to implement the optimization method provided in the first aspect.
  • Also provided is a computer-readable storage medium on which a computer program is stored; when the program is executed by a computer, it causes the computer to implement the optimization method provided in the first aspect.
  • Also provided is a computer program product containing instructions; when the instructions are executed by a computer, they cause the computer to implement the optimization method provided in the first aspect.
  • The solution provided by this application performs Bayesian optimization group by group on the hyperparameters of machine learning that need to be optimized. On the one hand, this realizes a dimensionality-reduced search for the hyperparameters; on the other hand, it weakens the limitations of the dimensionality-reduction assumption.
  • Figure 1 is a schematic diagram of the basic principle of the Bayesian optimization algorithm.
  • FIG. 2 is a schematic flowchart of a hyperparameter optimization method provided by an embodiment of the present application.
  • FIG. 3 is another schematic flowchart of a hyperparameter optimization method provided by an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of a hyperparameter optimization apparatus provided by an embodiment of the present application.
  • FIG. 5 is another schematic block diagram of a hyperparameter optimization apparatus provided by an embodiment of the present application.
  • The Bayesian Optimization Algorithm (BOA) is an algorithm for finding the global optimal solution of an unknown function. Let s denote the variable to be optimized and let D be the candidate set of s. The goal of Bayesian optimization is to select an s from D such that the value of the unknown function f(s) is the smallest (or largest). The unknown function f(s) can be called the objective function.
  • The first step is to make a prior assumption about the function-space distribution of the objective function f(s); that is, the function-space distribution of f(s) is taken to be a prior distribution. The prior assumption usually uses a Gaussian process prior; for example, the function-space distribution of f(s) is assumed to be a Gaussian distribution.
  • The first step also includes obtaining at least two sampled values, for example s_0 and s_1 selected from the candidate set D by sampling or the like, and obtaining the observed values f(s_0) and f(s_1) corresponding to these sampled values.
  • The first step also includes using the at least two observations to update the mean and variance of the prior distribution to obtain a posterior distribution. The modified Gaussian distribution model is the posterior distribution of f(s).
  • the acquisition function is constructed using the posterior distribution, and the acquisition function is used to calculate the next sample value.
  • Specifically, the second step selects the next sampled value s_i based on the modified Gaussian distribution model. The selection criterion is that, relative to the other candidates in the candidate set D, s_i makes the value of the objective function smaller on average (or larger: f(s) is treated as a loss function here; if f(s) instead represents the accuracy of the model, a larger value is preferred).
  • In the third step, the observation value corresponding to the sampled value obtained in the second step is obtained, and whether that sampled value is the optimal solution is judged according to the observation value; if it is, the Bayesian optimization process ends, and if not, the process goes to the fourth step. The sampled value can be substituted into the objective function f(s) to calculate its observation value.
  • In the fourth step, the observation value obtained in the third step is used to further modify the posterior distribution, and the process returns to the second step. That is, the second, third, and fourth steps are executed repeatedly until convergence (that is, until the optimal solution is obtained in the third step). A minimal code sketch of this loop is given below.
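  • Purely as an illustration of the four-step loop above (not part of the patent itself), the following sketch assumes scikit-learn's GaussianProcessRegressor as the surrogate model and expected improvement as the acquisition function; the objective f, the candidate set, and the sampling budget are placeholder assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, candidates, best_y):
    # Acquisition function: expected improvement over the best loss seen so far
    # (f is treated as a loss here, so smaller observed values are better).
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayesian_optimize(f, candidates, n_init=2, budget=30):
    # Step 1: Gaussian-process prior plus at least two initial observations.
    rng = np.random.default_rng(0)
    idx = rng.choice(len(candidates), size=n_init, replace=False)
    X = candidates[idx]
    y = np.array([f(x) for x in X])
    gp = GaussianProcessRegressor()
    for _ in range(budget):
        gp.fit(X, y)                                         # posterior update
        ei = expected_improvement(gp, candidates, y.min())   # step 2: acquisition
        x_next = candidates[int(np.argmax(ei))]              # next sampled value
        y_next = f(x_next)                                   # step 3: new observation
        X = np.vstack([X, x_next])                           # step 4: refine posterior
        y = np.append(y, y_next)
    return X[int(np.argmin(y))], float(y.min())
```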
  • The Bayesian optimization algorithm can be used to adjust (also called optimize) the hyperparameters of a machine learning model. The hyperparameter adjustment process of machine learning is regarded as solving the extremum problem in the Bayesian optimization algorithm: the hyperparameters to be optimized are regarded as s, the candidate values of these hyperparameters constitute the candidate set D, and the Bayesian optimization process shown in FIG. 1 then searches for the global optimal solution of the objective function, from which the optimized hyperparameters are obtained.
  • the loss function is generally used as the objective function.
  • The loss function is used to estimate the degree of inconsistency between the predicted value and the true value of the machine learning model; it can be a non-negative real-valued function. Assuming that the independent variable of the machine learning model g() is X and the dependent variable is Y, and taking the sample (X_i, Y_i) as an example, the predicted value of the machine learning model is g(X_i) and its true value is Y_i.
  • There are many common loss functions, for example, the log loss function, the square loss function (also called the least-squares loss function), and the exponential loss function.
  • Taking the square loss function as an example, it can be written as L(Y, g(X)) = ∑_{i=1}^{n} (Y_i − g(X_i))^2, where n is the number of samples, g(X_i) represents the predicted value of the machine learning model, Y_i represents the true value of the machine learning model, Y_i − g(X_i) represents the difference between the predicted value and the true value, and L(Y, g(X)) represents the sum of squares of the residuals in the sample space.
  • For example, the square loss function is used as the objective function in the Bayesian optimization algorithm; the purpose of Bayesian optimization is then to minimize the value of the square loss function, so as to obtain the optimized hyperparameters.
  • the hyperparameters to be optimized are usually defined as a multi-dimensional vector S.
  • the process of Bayesian optimization is the process of searching the optimal value of the vector S.
  • However, the number of hyperparameters that need to be optimized may be very large, resulting in a very high-dimensional vector S. It is very difficult to find the global optimal solution of the unknown function in such a high-dimensional space, and the search often gets stuck in a local optimum and cannot obtain good results.
  • Existing solutions deal with high-dimensional hyperparameters by assuming that the solution space of the global optimal solution of the unknown function is a relatively low-dimensional solution space, and then directly performing Bayesian optimization in that hypothetical low-dimensional space. As a result, the strategy used to map the solution space of the global optimum to a relatively low-dimensional space has a great influence on the Bayesian optimization results: if the assumed mapping is unreasonable, the optimization results will be poor, which makes the algorithm insufficiently robust.
  • This application proposes a hyperparameter optimization scheme, which can realize a dimensionality-reduced search for hyperparameters and, at the same time, weaken the assumptions that restrict the solution space, so as to obtain better hyperparameter optimization results.
  • FIG. 2 is a schematic flowchart of a hyperparameter optimization method provided by an embodiment of the present application.
  • the optimization method includes the following steps.
  • the hyperparameters to be optimized include N sets of hyperparameters, and N is an integer greater than 1.
  • the hyperparameters to be optimized for machine learning may be divided into N groups in advance.
  • the hyperparameters that need to be optimized for machine learning may be divided into N groups in real time when optimization is needed.
  • In different optimization processes, the grouping strategy for the hyperparameters that need to be optimized may be different.
  • the number of hyperparameters included in each group of hyperparameters in the N sets of hyperparameters is less than the total number of hyperparameters that need to be optimized in machine learning.
  • S220: Perform Bayesian optimization on the N groups of hyperparameters respectively to obtain the optimized hyperparameters; in the process of performing Bayesian optimization on each group of hyperparameters, the values of the remaining groups of hyperparameters are fixed to their latest values.
  • The Bayesian optimization algorithm shown in FIG. 1 can be used for this purpose. In the process of performing Bayesian optimization on each group, the values of the remaining groups of hyperparameters are fixed to their latest values (for example, if a group of hyperparameters has not yet been assigned a value, its value can be determined by sampling).
  • In each Bayesian optimization process, Bayesian optimization is performed on the solution space corresponding to one group of hyperparameters. Because the dimension of each group of hyperparameters is smaller than the total dimension of the hyperparameters that machine learning needs to optimize, a dimensionality-reduced search for the hyperparameters can be realized, and the search can avoid getting stuck in a local optimum.
  • each set of hyperparameters in the N sets of hyperparameters that need to be optimized by machine learning includes at least one hyperparameter.
  • the number of hyperparameters included in each group in the N groups of hyperparameters may be the same, that is, the dimensions of each group of hyperparameters may be the same.
  • the number of hyperparameters included in different groups among the N sets of hyperparameters may also be different, that is, the dimensions of different groups of hyperparameters may not be completely the same.
  • the N sets of hyperparameters are obtained by randomly grouping the hyperparameters that need to be optimized.
  • the N sets of hyperparameters are obtained by grouping the hyperparameters that need to be optimized through experience.
  • the N sets of hyperparameters are divided according to the type of hyperparameters in machine learning.
  • The hyperparameters can include at least two of the following: kernel size, number of kernels, convolution stride, shortcut connection method, the choice between the addition (add) operation and the concatenation (concat) operation, number of branches, number of layers, number of iterations (epochs), initialization parameters (such as MSRA initialization and Xavier initialization), regularization coefficients, learning rate, neural network structure, and the number of layers of the neural network.
  • The hyperparameter types of different groups among the N groups of hyperparameters may not be completely the same; that is, different groups of hyperparameters may have different hyperparameter types. Grouping the hyperparameters to be optimized according to hyperparameter type and then optimizing each group separately can improve the optimization efficiency of the hyperparameters to a certain extent, as illustrated by the sketch below.
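  • Purely as an illustrative sketch (the hyperparameter names, values, and group boundaries below are assumptions, not details from the patent), a grouping by hyperparameter type might look like this:

```python
# Hypothetical division of 7 hyperparameters into N = 3 groups by type.
hyperparameters_to_optimize = {
    "kernel_size": 3, "num_kernels": 64, "stride": 1,     # structure-related
    "learning_rate": 1e-3, "regularization_coeff": 1e-4,  # training-related
    "num_layers": 18, "num_branches": 2,                  # depth/topology-related
}

groups = [
    ["kernel_size", "num_kernels", "stride"],
    ["learning_rate", "regularization_coeff"],
    ["num_layers", "num_branches"],
]  # each group's dimension (3, 2, 2) is smaller than the full 7-dimensional search
```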
  • In some embodiments, the grouping strategy for the hyperparameters to be optimized is fixed. In other embodiments, the grouping strategies used in different optimization processes may be the same or different; this is not limited in this application and can be determined according to actual needs.
  • An implementation of step S220 is to use at least one round of Bayesian optimization operations to obtain the optimized hyperparameters, where each round of Bayesian optimization operations includes performing Bayesian optimization on the i-th group of the N groups of hyperparameters; during the Bayesian optimization of the i-th group, the values of the remaining groups of hyperparameters are fixed to their latest values, and i traverses 1, 2, ..., N.
  • Bayesian optimization is performed on all N groups of hyperparameters in each round of Bayesian optimization operations; in other words, in the process of obtaining the optimized hyperparameters, every hyperparameter that machine learning needs to optimize is optimized by the Bayesian optimization algorithm. Therefore, on the one hand, a dimensionality-reduced search is performed on the hyperparameters, and on the other hand, the limitation of the dimensionality-reduction assumption is weakened.
  • In some embodiments of step S220, two, three, or more rounds of Bayesian optimization operations are performed to obtain the optimized hyperparameters, where each round of Bayesian optimization operations includes performing Bayesian optimization on the i-th group of hyperparameters; during this process, the values of the remaining groups of hyperparameters are fixed to their latest values, and i traverses 1, 2, ..., N.
  • The method of performing Bayesian optimization on the N groups of hyperparameters in each round of Bayesian optimization operations can be referred to as Bayesian optimization with alternating optimization. The embodiments of the present application introduce the idea of alternating optimization into the Bayesian optimization process, which achieves effective dimensionality reduction of the high-dimensional search space, weakens the assumption limitations in existing techniques, and helps search for the optimal hyperparameters. A minimal sketch of this alternating, grouped procedure is given below.
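  • The sketch below reuses the bayesian_optimize helper sketched after the description of FIG. 1 and assumes a dictionary of hyperparameter values plus per-group candidate sets; these interfaces are illustrative assumptions rather than details given in the patent.

```python
def alternating_bayesian_optimization(objective, groups, candidate_sets,
                                      init_values, n_rounds=3):
    # groups: list of lists of hyperparameter names (the N groups).
    # candidate_sets: list mapping group index -> 2D array of candidate vectors.
    # init_values: dict mapping hyperparameter name -> initial value.
    current = dict(init_values)
    for _ in range(n_rounds):                 # each round sweeps all N groups
        for gi, names in enumerate(groups):   # i traverses 1, 2, ..., N
            def group_objective(x):
                # Only the i-th group takes the trial values x; the remaining
                # groups of hyperparameters are fixed to their latest values.
                trial = dict(current)
                trial.update(dict(zip(names, x)))
                return objective(trial)       # e.g. train a model and return its loss
            best_x, _ = bayesian_optimize(group_objective, candidate_sets[gi])
            current.update(dict(zip(names, best_x)))   # keep the latest values
    return current
```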
  • the entire process of optimizing hyperparameters in the embodiments of the present application is as follows.
  • The hyperparameter adjustment process of machine learning is regarded as optimizing the objective function f(S).
  • S represents the hyperparameters that need to be optimized.
  • S ∈ D, where D represents the sample space of the hyperparameters S that need to be optimized.
  • the objective function f (S) may be a loss function.
  • The process of obtaining a sampled value from D_i and obtaining its observation value may be that the sampled value is substituted into the objective function f(S) to obtain the observation value corresponding to that sampled value.
  • the objective function of Bayesian optimization is a loss function.
  • the objective function of Bayesian optimization may be any of the following: log loss function, square loss function (also called least squares loss function), and exponential loss function.
  • a loss function can be selected as the objective function of Bayesian optimization according to the needs of the actual application.
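  • Purely as an illustration of selecting among such loss functions (the function names below are assumptions, not an API defined by the patent):

```python
import numpy as np

# Illustrative implementations of the loss functions mentioned above, any of
# which could serve as the objective of Bayesian optimization.
def square_loss(y_true, y_pred):
    return np.sum((y_true - y_pred) ** 2)        # sum of squared residuals

def log_loss(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1.0 - eps)          # labels y_true in {0, 1}
    return -np.sum(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def exponential_loss(y_true, score):             # labels y_true in {-1, +1}
    return np.sum(np.exp(-y_true * score))

loss_fn = square_loss  # chosen according to the needs of the actual application
```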
  • In one example, the objective function f(S) of Bayesian optimization is the square loss function f(S) = L(Y, g(X)) = ∑_{i=1}^{n} (Y_i − g(X_i))^2, where:
  • (X, Y) is the sample; g(X) represents the machine learning model; X represents the independent variable of the machine learning model; Y represents the dependent variable of the machine learning model; n represents the number of samples (the samples here refer to (X, Y) samples); g(X_i) represents the predicted value of the machine learning model; Y_i represents the true value of the machine learning model; Y_i − g(X_i) represents the residual between the predicted value and the true value of the machine learning model; and L(Y, g(X)) represents the sum of squared residuals over the sample space.
  • The samples used in the objective function of Bayesian optimization may be training set samples, test set samples, or both training set samples and test set samples.
  • When the sample space is the sample space of the training set, n represents the number of samples in the training set; when the sample space is the sample space of the test set, n represents the number of samples in the test set; and when the sample space is composed of the training set and the test set, n represents the total number of samples in the training set and the test set.
  • each value of the hyperparameter corresponds to a machine learning model.
  • the values of hyperparameters are different, and the corresponding machine learning models are also different. Therefore, in the Bayesian optimization process of hyperparameters, each time the value of the hyperparameter is updated, the machine learning model used in the objective function should also be updated.
  • the machine learning model corresponding to the value of each hyperparameter can be obtained through training.
  • any existing feasible model training method may be used to train the machine learning model corresponding to each hyperparameter value, which is not limited in this application.
  • the observation value in the Bayesian optimization process is determined according to the loss function used by the machine learning model in the training process.
  • the observation value corresponding to a sampled value of the i-th set of hyperparameters is determined by the following formula:
  • T_loss(j) is the loss value of the machine learning model on the training set samples after the j-th round of training; V_loss(j) is the loss value of the machine learning model on the test set samples after the j-th round of training; and w_1 and w_2 are the weights of T_loss(j) and V_loss(j), respectively, where w_1 and w_2 are not simultaneously zero.
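  • The exact aggregation formula is not reproduced in this text. Purely as one hedged possibility consistent with the variables listed above (an assumption, not the patent's formula), the observation value could combine the per-round training and validation losses as follows:

```python
def observation_value(T_loss, V_loss, w1=0.5, w2=0.5):
    # T_loss[j], V_loss[j]: losses on the training / test set after round j+1.
    # Assumed aggregation: weighted per-round losses averaged over all rounds.
    assert not (w1 == 0 and w2 == 0), "w1 and w2 must not both be zero"
    epoch = len(T_loss)
    per_round = [w1 * T_loss[j] + w2 * V_loss[j] for j in range(epoch)]
    return sum(per_round) / epoch
```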
  • In some embodiments, the number of training rounds of the machine learning model is controlled to be less than a preset value. For example, the number of training rounds of the machine learning model is controlled to be less than 20.
  • the convergence time of the machine learning model or the number of training times of the machine learning model directly affects the optimization speed of the hyperparameter.
  • the embodiment of the present application can increase the optimization speed of the hyperparameters by limiting the training times of the machine learning model to less than a preset value.
  • The final performance of the model is related to its performance at the initial stage of training: if the model converges monotonically at the initial stage of training, its final performance will also converge monotonically; if the model no longer converges monotonically (i.e., diverges) at the initial stage of training, its final performance will not converge monotonically either. Therefore, the number of training rounds can be controlled within the preset value.
  • Controlling the number of training rounds of the machine learning model corresponding to each updated value of the i-th group of hyperparameters to be less than a preset value includes: adopting an early-stopping strategy for the machine learning model corresponding to each updated value of the i-th group of hyperparameters, so that its number of training rounds is less than the preset value. For example, when the preset value is 20, the machine learning model corresponding to each hyperparameter value is trained for at most 20 rounds and then stopped; if, before reaching 20 rounds, the machine learning model no longer converges monotonically, training is stopped early, and if the model is still converging monotonically at 20 rounds, training is also stopped, as sketched below.
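  • A minimal sketch of such an early-stopping rule, using the 20-round cap from the example above (the training and evaluation functions are placeholders assumed for illustration):

```python
def train_with_early_stop(model, train_one_round, eval_loss, max_rounds=20):
    # Train at most max_rounds rounds; stop early once the loss no longer
    # decreases, i.e. the model stops converging monotonically.
    prev_loss = float("inf")
    for _ in range(max_rounds):
        train_one_round(model)        # one round of training (placeholder)
        loss = eval_loss(model)       # loss after this round (placeholder)
        if loss >= prev_loss:         # no longer monotonically converging
            break                     # early stop
        prev_loss = loss
    return model
```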
  • the solution of the embodiment of the present application may be applied to the hyper-parameter adjustment process of deep learning.
  • The above mainly describes step S220 by taking the example shown in FIG. 3. The implementation of step S220 includes, but is not limited to, the method shown in FIG. 3; as long as Bayesian optimization is performed on the N groups of hyperparameters during the process of obtaining the optimized hyperparameters, such schemes fall within the protection scope of the present application.
  • Each round of Bayesian optimization operations on the first N1 groups of hyperparameters includes performing Bayesian optimization on the i-th group of the N1 groups of hyperparameters; during the Bayesian optimization of the i-th group, the values of the remaining groups of hyperparameters are fixed to their latest values, and i traverses 1, 2, ..., N1. Each round of Bayesian optimization operations on the last N2 groups of hyperparameters includes performing Bayesian optimization on the i-th group of the N2 groups of hyperparameters; during the Bayesian optimization of the i-th group, the values of the remaining groups of hyperparameters are fixed to their latest values, and i traverses 1, 2, ..., N2.
  • The first group of hyperparameters and the second group of hyperparameters are alternately optimized as follows to obtain the optimized first and second groups of hyperparameters: at least one round of Bayesian optimization operations is performed, where each round includes performing Bayesian optimization on the first group of hyperparameters, during which the values of the remaining group of hyperparameters are fixed to their latest values, and performing Bayesian optimization on the second group of hyperparameters, during which the values of the remaining group of hyperparameters are fixed to their latest values.
  • The third group, the fourth group, and the fifth group of hyperparameters are alternately optimized as follows to obtain the optimized third, fourth, and fifth groups of hyperparameters: at least one round of Bayesian optimization operations is performed, where each round includes performing Bayesian optimization on the third group of hyperparameters, during which the values of the remaining groups of hyperparameters are fixed to their latest values; performing Bayesian optimization on the fourth group of hyperparameters, during which the values of the remaining groups of hyperparameters are fixed to their latest values; and performing Bayesian optimization on the fifth group of hyperparameters, during which the values of the remaining groups of hyperparameters are fixed to their latest values.
  • In this way, on the one hand, a dimensionality-reduced search can be performed on the hyperparameters; on the other hand, the limitation of the dimensionality-reduction assumption can be weakened.
  • The solution provided by the present application may be applied to, but is not limited to, the optimization of hyperparameters in machine learning; it may also be applied to other scenarios in which the global optimal solution of an unknown function needs to be found.
  • FIG. 4 is a schematic block diagram of a hyperparameter optimization apparatus 400 provided by an embodiment of the present application.
  • the device 400 includes the following units.
  • The dividing unit 410 is used to divide the hyperparameters of machine learning that need to be optimized into N groups of hyperparameters, where N is an integer greater than 1.
  • The optimization unit 420 is used to perform Bayesian optimization on the N groups of hyperparameters respectively to obtain the optimized hyperparameters; in the process of performing Bayesian optimization on each group of hyperparameters, the values of the remaining groups of hyperparameters are fixed to their latest values.
  • In each Bayesian optimization process, Bayesian optimization is performed on the solution space corresponding to one group of hyperparameters. Because the dimension of each group of hyperparameters is smaller than the total dimension of the hyperparameters that machine learning needs to optimize, a dimensionality-reduced search for the hyperparameters can be realized, and the search can avoid getting stuck in a local optimum. On the one hand, a dimensionality-reduced search can be performed on the hyperparameters; on the other hand, the limitation of the dimensionality-reduction assumption can be weakened.
  • The optimization unit 420 is configured to obtain the optimized hyperparameters using at least one round of Bayesian optimization operations, where each round of Bayesian optimization operations includes performing Bayesian optimization on the i-th group of hyperparameters; during the Bayesian optimization of the i-th group of hyperparameters, the values of the remaining groups of hyperparameters are fixed to their latest values, and i traverses 1, 2, ..., N.
  • Because the N groups of hyperparameters are optimized separately, the order of optimization may cause the optimization of each hyperparameter group to differ. Performing multiple rounds of Bayesian optimization operations can weaken this difference to a certain extent, thereby further weakening the limitation of the dimensionality-reduction assumption. In this way, on the one hand, a dimensionality-reduced search can be performed on the hyperparameters; on the other hand, the limitation of the dimensionality-reduction assumption can be weakened.
  • the number of hyperparameters included in each group of the N groups of hyperparameters may be the same, that is, the dimension of each group of hyperparameters may be the same.
  • the number of hyperparameters included in different groups among the N sets of hyperparameters may also be different, that is, the dimensions of different groups of hyperparameters may not be completely the same.
  • the N sets of hyperparameters are divided according to the type of hyperparameters in machine learning.
  • The hyperparameters may include at least two of the following: kernel size, number of kernels, convolution stride, shortcut connection method, the choice between the addition (add) operation and the concatenation (concat) operation, number of branches, number of layers, number of iterations (epochs), initialization parameters (such as MSRA initialization and Xavier initialization), regularization coefficients, learning rate, neural network structure, and the number of layers of the neural network.
  • The hyperparameter types of different groups among the N groups of hyperparameters may not be completely the same; that is, different groups of hyperparameters may have different hyperparameter types. Grouping the hyperparameters to be optimized according to hyperparameter type and then optimizing each group separately can improve the optimization efficiency of the hyperparameters to a certain extent.
  • the objective function of Bayesian optimization is a loss function
  • the samples used in the loss function are training set samples and / or test set samples.
  • The observation values used by Bayesian optimization are determined based on the loss values produced during model training by the machine learning model corresponding to each group of hyperparameters.
  • the observation value Loss corresponding to one sample value of each group of hyperparameters is determined by the following formula:
  • epoch is the number of training rounds of the machine learning model corresponding to the current value of each group of hyperparameters; T_loss(j) is the loss value of the machine learning model on the training set samples after the j-th round of training; V_loss(j) is the loss value of the machine learning model on the test set samples after the j-th round of training; and w_1 and w_2 are the weights of T_loss(j) and V_loss(j), respectively, where w_1 and w_2 are not simultaneously zero.
  • The optimization unit 420 is configured to control the number of training rounds of the machine learning model to be less than a preset value during the Bayesian optimization of each group of hyperparameters.
  • The optimization unit 420 is configured to adopt an early-stopping strategy so that the number of training rounds of the machine learning model is less than the preset value.
  • the dividing unit 410 is used to divide the hyperparameters that need to be optimized for machine learning into N sets of hyperparameters according to the application scenario of machine learning, and N is an integer greater than 1.
  • the machine learning model is a deep learning model.
  • an embodiment of the present application further provides a hyperparameter optimization apparatus 500, which includes a processor 510 and a memory 520.
  • the memory 520 is used to store instructions
  • the processor 510 is used to execute instructions stored in the memory 520.
  • Execution of the instructions stored in the memory 520 causes the processor 510 to perform the optimization method in the above method embodiments.
  • Execution of the instructions stored in the memory 520 causes the processor 510 to perform the actions performed by the dividing unit 410 and the optimization unit 420 in the above-described embodiments.
  • the apparatus 500 may further include a communication interface 530 for exchanging signals with external devices.
  • the processor 510 is used to control the interface 530 to receive and / or send signals.
  • Embodiments of the present application also provide a computer storage medium on which a computer program is stored.
  • When executed by a computer, the computer program causes the computer to perform the optimization method in the foregoing method embodiments.
  • Embodiments of the present application also provide a computer program product containing instructions, which when executed by a computer causes the computer to execute the optimization method in the foregoing method embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center via a wired connection (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless connection (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device including a server, a data center, and the like integrated with one or more available media.
  • The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., digital video disc (DVD)), or semiconductor media (e.g., solid state disk (SSD)), etc.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • The division of the units is only a division of logical functions; in actual implementation there may be other ways of division. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Feedback Control In General (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a hyperparameter optimization method and apparatus. The method comprises: dividing hyperparameters that need to be optimized for machine learning into N groups of hyperparameters; and separately performing Bayesian optimization on the N groups of hyperparameters to obtain optimized hyperparameters, wherein, during the Bayesian optimization of each group of hyperparameters, the values of the remaining groups of hyperparameters are fixed to their latest values. Performing Bayesian optimization on groups of hyperparameters that need to be optimized for machine learning can implement a dimensionality-reduced search for the hyperparameters and can also weaken the limitations of a dimensionality-reduction assumption.
PCT/CN2018/112712 2018-10-30 2018-10-30 Procédé et appareil d'optimisation d'hyper-paramètres Ceased WO2020087281A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/112712 WO2020087281A1 (fr) 2018-10-30 2018-10-30 Procédé et appareil d'optimisation d'hyper-paramètres
CN201880038686.XA CN110770764A (zh) 2018-10-30 2018-10-30 超参数的优化方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/112712 WO2020087281A1 (fr) 2018-10-30 2018-10-30 Procédé et appareil d'optimisation d'hyper-paramètres

Publications (1)

Publication Number Publication Date
WO2020087281A1 true WO2020087281A1 (fr) 2020-05-07

Family

ID=69328799

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/112712 Ceased WO2020087281A1 (fr) 2018-10-30 2018-10-30 Procédé et appareil d'optimisation d'hyper-paramètres

Country Status (2)

Country Link
CN (1) CN110770764A (fr)
WO (1) WO2020087281A1 (fr)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598133A (zh) * 2020-12-16 2021-04-02 联合汽车电子有限公司 车辆数据的处理方法、装置、设备和存储介质
CN112990480A (zh) * 2021-03-10 2021-06-18 北京嘀嘀无限科技发展有限公司 构建模型的方法、装置、电子设备和存储介质
CN113052252A (zh) * 2021-03-31 2021-06-29 北京字节跳动网络技术有限公司 超参数确定方法、装置、深度强化学习框架、介质及设备
CN114330198A (zh) * 2021-12-28 2022-04-12 网络通信与安全紫金山实验室 基于相关度分析的快速调参方法、装置、设备和介质
CN114781086A (zh) * 2022-04-21 2022-07-22 西安热工研究院有限公司 一种基于贝叶斯优化XGBoost算法预警风电机组轴承故障的方法
WO2022211179A1 (fr) * 2021-03-30 2022-10-06 주식회사 솔리드웨어 Procédé de recherche de modèle optimal et dispositif associé
CN115796346A (zh) * 2022-11-22 2023-03-14 烟台国工智能科技有限公司 一种收率优化方法、系统及非暂态计算机可读存储介质
CN116401612A (zh) * 2023-03-21 2023-07-07 中国华能集团清洁能源技术研究院有限公司 基于改良dqn算法的光伏逆变器故障诊断方法、系统、装置及介质
CN118982294A (zh) * 2024-10-21 2024-11-19 苏交科集团股份有限公司 桥梁系统状态多层次指标的动态赋权方法、系统及存储介质
CN119067271A (zh) * 2024-11-07 2024-12-03 山东零公里润滑科技有限公司 用于车辆润滑油更换周期预测的数据处理方法及装置
CN120408762A (zh) * 2025-03-17 2025-08-01 广东粤东城际铁路有限公司 一种站台门优化方法、系统及存储介质

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368931B (zh) * 2020-03-09 2023-11-17 第四范式(北京)技术有限公司 确定图像分类模型的学习率的方法
US11823076B2 (en) 2020-07-27 2023-11-21 International Business Machines Corporation Tuning classification hyperparameters
CN112232508A (zh) * 2020-09-18 2021-01-15 苏州浪潮智能科技有限公司 一种模型的训练方法、系统、设备以及介质
CN112883331B (zh) * 2021-02-24 2024-03-01 东南大学 一种基于多输出高斯过程的目标跟踪方法
CN113312855B (zh) * 2021-07-28 2021-12-10 北京大学 基于搜索空间分解的机器学习优化方法、电子设备及介质
CN113806895A (zh) * 2021-08-18 2021-12-17 广西电网有限责任公司河池供电局 基于连续学习的输电线路销钉级缺陷识别模型调优方法
CN114298166B (zh) * 2021-12-10 2024-11-05 南京航空航天大学 一种基于无线通信网络的频谱可用性预测方法和系统
CN118603930B (zh) * 2024-05-31 2025-06-17 中国计量科学研究院 油品种类定性判别的近红外光谱方法、系统、介质及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016042322A (ja) * 2014-08-19 2016-03-31 日本電気株式会社 データ分析装置、分析方法とそのプログラム
CN107018184A (zh) * 2017-03-28 2017-08-04 华中科技大学 分布式深度神经网络集群分组同步优化方法及系统
CN108062587A (zh) * 2017-12-15 2018-05-22 清华大学 一种无监督机器学习的超参数自动优化方法及系统
CN108573281A (zh) * 2018-04-11 2018-09-25 中科弘云科技(北京)有限公司 一种基于贝叶斯优化的深度学习超参数的调优改进方法
WO2018189279A1 (fr) * 2017-04-12 2018-10-18 Deepmind Technologies Limited Optimisation de boîte noire à l'aide de réseaux neuronaux

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102638802B (zh) * 2012-03-26 2014-09-03 哈尔滨工业大学 一种分层协作联合频谱感知算法
US20140156231A1 (en) * 2012-11-30 2014-06-05 Xerox Corporation Probabilistic relational data analysis
CN108470210A (zh) * 2018-04-02 2018-08-31 中科弘云科技(北京)有限公司 一种深度学习中超参数的优化选取方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016042322A (ja) * 2014-08-19 2016-03-31 日本電気株式会社 データ分析装置、分析方法とそのプログラム
CN107018184A (zh) * 2017-03-28 2017-08-04 华中科技大学 分布式深度神经网络集群分组同步优化方法及系统
WO2018189279A1 (fr) * 2017-04-12 2018-10-18 Deepmind Technologies Limited Optimisation de boîte noire à l'aide de réseaux neuronaux
CN108062587A (zh) * 2017-12-15 2018-05-22 清华大学 一种无监督机器学习的超参数自动优化方法及系统
CN108573281A (zh) * 2018-04-11 2018-09-25 中科弘云科技(北京)有限公司 一种基于贝叶斯优化的深度学习超参数的调优改进方法

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598133B (zh) * 2020-12-16 2023-07-28 联合汽车电子有限公司 车辆数据的处理方法、装置、设备和存储介质
CN112598133A (zh) * 2020-12-16 2021-04-02 联合汽车电子有限公司 车辆数据的处理方法、装置、设备和存储介质
CN112990480A (zh) * 2021-03-10 2021-06-18 北京嘀嘀无限科技发展有限公司 构建模型的方法、装置、电子设备和存储介质
WO2022211179A1 (fr) * 2021-03-30 2022-10-06 주식회사 솔리드웨어 Procédé de recherche de modèle optimal et dispositif associé
CN113052252A (zh) * 2021-03-31 2021-06-29 北京字节跳动网络技术有限公司 超参数确定方法、装置、深度强化学习框架、介质及设备
CN113052252B (zh) * 2021-03-31 2024-03-26 北京字节跳动网络技术有限公司 超参数确定方法、装置、深度强化学习框架、介质及设备
CN114330198A (zh) * 2021-12-28 2022-04-12 网络通信与安全紫金山实验室 基于相关度分析的快速调参方法、装置、设备和介质
CN114781086A (zh) * 2022-04-21 2022-07-22 西安热工研究院有限公司 一种基于贝叶斯优化XGBoost算法预警风电机组轴承故障的方法
CN115796346B (zh) * 2022-11-22 2023-07-21 烟台国工智能科技有限公司 一种收率优化方法、系统及非暂态计算机可读存储介质
CN115796346A (zh) * 2022-11-22 2023-03-14 烟台国工智能科技有限公司 一种收率优化方法、系统及非暂态计算机可读存储介质
CN116401612A (zh) * 2023-03-21 2023-07-07 中国华能集团清洁能源技术研究院有限公司 基于改良dqn算法的光伏逆变器故障诊断方法、系统、装置及介质
CN118982294A (zh) * 2024-10-21 2024-11-19 苏交科集团股份有限公司 桥梁系统状态多层次指标的动态赋权方法、系统及存储介质
CN119067271A (zh) * 2024-11-07 2024-12-03 山东零公里润滑科技有限公司 用于车辆润滑油更换周期预测的数据处理方法及装置
CN120408762A (zh) * 2025-03-17 2025-08-01 广东粤东城际铁路有限公司 一种站台门优化方法、系统及存储介质

Also Published As

Publication number Publication date
CN110770764A (zh) 2020-02-07

Similar Documents

Publication Publication Date Title
WO2020087281A1 (fr) Procédé et appareil d'optimisation d'hyper-paramètres
US20250165792A1 (en) Adversarial training of machine learning models
CN110503192B (zh) 资源有效的神经架构
CN113692594A (zh) 通过强化学习的公平性改进
KR20210032521A (ko) 데이터 세트들에 대한 머신 학습 모델들의 적합성 결정
US20190095794A1 (en) Methods and apparatus for training a neural network
JP2022063250A (ja) SuperLoss:堅牢なカリキュラム学習のための一般的な損失
KR101828215B1 (ko) Long Short Term Memory 기반 순환형 상태 전이 모델의 학습 방법 및 장치
CN112183326B (zh) 人脸年龄识别模型训练方法及相关装置
WO2022056841A1 (fr) Recherche d'architecture neuronale par classement d'opérateurs basé sur une similarité
US20220121927A1 (en) Providing neural networks
US12175365B2 (en) Learning apparatus, method, and non-transitory computer readable medium
US20210090552A1 (en) Learning apparatus, speech recognition rank estimating apparatus, methods thereof, and program
CN117216232B (zh) 一种大语言模型超参数优化方法及系统
CA3160910A1 (fr) Systemes et methodes pour un apprentissage actif semi-supervise
CN113469204A (zh) 数据处理方法、装置、设备和计算机存储介质
TWI758223B (zh) 具有動態最小批次尺寸之運算方法,以及用於執行該方法之運算系統及電腦可讀儲存媒體
EP3742354A1 (fr) Appareil de traitement d'informations, procédé de traitement d'informations et programme
US20230186150A1 (en) Hyperparameter selection using budget-aware bayesian optimization
US20210397948A1 (en) Learning method and information processing apparatus
WO2021061798A1 (fr) Procédés et appareil d'entraînement de modèle d'apprentissage machine
CN116227556A (zh) 获取目标网络模型的方法、装置、计算机设备及存储介质
WO2025101527A1 (fr) Techniques d'apprentissage de co-engagement et de relations sémantiques à l'aide de réseaux neuronaux graphiques
US20250013866A1 (en) Efficient vision-language retrieval using structural pruning
US20230267349A1 (en) Smart training and smart deployment of machine learning models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18938904

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18938904

Country of ref document: EP

Kind code of ref document: A1