CN118036672A

CN118036672A - A Neural Network Optimization Method Based on Taylor Expansion Momentum Correction

Info

Publication number: CN118036672A
Application number: CN202410249597.7A
Authority: CN
Inventors: 彭丽辉; 许艺怀; 肖赣; 刘庚泓
Original assignee: Zhejiang Gongshang University
Current assignee: Zhejiang Gongshang University
Priority date: 2024-03-05
Filing date: 2024-03-05
Publication date: 2024-05-14

Abstract

The invention discloses a neural network optimization method based on Taylor expansion momentum correction. Aiming at the defects that the Adam algorithm is high in convergence speed and is inferior to SGDM algorithm in generalization effect, the method combines the second-order Taylor expansion of the activation function with the parameters softplus with the second-order momentum, realizes gradual smooth transition from Adam to SGDM, and specifically comprises the following steps: s1, data preprocessing and batch division; s2, initializing a neural network model and super parameters; s3, calculating the gradient, the first-order momentum and the second-order momentum of the model parameters, correcting the second-order momentum by using a second-order Taylor expansion of the function activated by the band parameters softplus, and updating the model parameters; s4, accumulating the iteration round number until the specified maximum round number is reached. Compared with a plurality of currently popular neural network optimization algorithms, the method has the advantages of faster convergence speed and better generalization.

Description

A Neural Network Optimization Method Based on Taylor Expansion Momentum Correction

技术领域Technical Field

本发明属于深度神经网络技术领域，具体涉及一种基于泰勒展开动量修正的神经网络优化方法，能够成功实现从Adam到SGDM的逐渐平滑过渡。The present invention belongs to the technical field of deep neural networks, and in particular relates to a neural network optimization method based on Taylor expansion momentum correction, which can successfully achieve a gradual and smooth transition from Adam to SGDM.

背景技术Background technique

深度神经网络(DNN)是一类由多层神经元组成的计算模型，具有深层次的结构，通常包含输入层、多个隐藏层和输出层。通过多个隐藏层的堆叠，深度神经网络可以逐层地提取和表达数据的抽象特征。深度神经网络在机器学习和人工智能领域应用广泛，包括图像识别、语音识别、自然语言处理等任务。一些常见的深度神经网络结构包括卷积神经网络(CNN)、环神经网络(RNN)和长短时记忆网络(LSTM)。A deep neural network (DNN) is a type of computational model composed of multiple layers of neurons. It has a deep structure and usually includes an input layer, multiple hidden layers, and an output layer. By stacking multiple hidden layers, a deep neural network can extract and express the abstract features of data layer by layer. Deep neural networks are widely used in the fields of machine learning and artificial intelligence, including tasks such as image recognition, speech recognition, and natural language processing. Some common deep neural network structures include convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory networks (LSTM).

训练深度神经网络的过程通常涉及大量的数据和参数调整，优化算法在其中扮演着至关重要的角色。深度神经网络的优化是指通过调整网络参数和更新算法，使得网络能够更好地拟合训练数据并提高其在未见过的数据上的泛化能力。The process of training a deep neural network usually involves a large amount of data and parameter adjustments, in which optimization algorithms play a vital role. The optimization of a deep neural network refers to adjusting network parameters and updating algorithms so that the network can better fit the training data and improve its generalization ability on unseen data.

梯度下降(GD)是最常用的优化算法之一，但存在一些问题，如局部最小值和学习率的选择。为了解决这些问题，研究人员提出了一系列改进的优化算法，例如随机梯度下降(SGD)、动量法(SGDM)和自适应学习率方法(如Adam、Adagrad等)。其中，Adam算法因其快速、鲁棒和易于实现的特点，在图像处理和自然语言处理等领域取得了显著的成功，并成为深度学习领域中最受欢迎的优化算法之一。Gradient descent (GD) is one of the most commonly used optimization algorithms, but there are some problems, such as local minima and the choice of learning rate. In order to solve these problems, researchers have proposed a series of improved optimization algorithms, such as stochastic gradient descent (SGD), momentum method (SGDM) and adaptive learning rate methods (such as Adam, Adagrad, etc.). Among them, the Adam algorithm has achieved remarkable success in fields such as image processing and natural language processing due to its fast, robust and easy implementation, and has become one of the most popular optimization algorithms in the field of deep learning.

然而，在实际应用中，尽管这些算法在不同情境下都表现出了卓越的性能，但仍存在一些普遍的问题。具体来说，传统的自适应学习方法(如Adam)在优化深度神经网络时，尽管具有较快的收敛速度，但在泛化性能上可能不如带有动量的随机梯度下降法(SGDM)。因此，如何克服传统自适应方法在泛化性能上的劣势，同时保持较快的收敛速度，成为研究学者们亟待解决的问题。However, in practical applications, although these algorithms have shown excellent performance in different situations, there are still some common problems. Specifically, when optimizing deep neural networks, traditional adaptive learning methods (such as Adam) may not be as good as the stochastic gradient descent method with momentum (SGDM) in terms of generalization performance, despite their fast convergence speed. Therefore, how to overcome the disadvantages of traditional adaptive methods in generalization performance while maintaining a fast convergence speed has become a problem that researchers are eager to solve.

发明内容Summary of the invention

针对现有神经网络优化算法无法平衡泛化性和收敛速度的问题，本发明提出一种基于泰勒展开动量修正的神经网络优化方法，该优化方法将二阶动量项和带参softplus激活函数的二阶泰勒展开式相结合，使得算法既有较好的泛化性，也有较快的收敛速度，一定程度上解决了现有优化算法的缺陷。In view of the problem that existing neural network optimization algorithms cannot balance generalization and convergence speed, the present invention proposes a neural network optimization method based on Taylor expansion momentum correction. The optimization method combines the second-order momentum term and the second-order Taylor expansion with parameter softplus activation function, so that the algorithm has both good generalization and fast convergence speed, which solves the defects of existing optimization algorithms to a certain extent.

本发明一种基于泰勒展开动量修正的神经网络优化方法，包括以下步骤：The present invention provides a neural network optimization method based on Taylor expansion momentum correction, comprising the following steps:

S1：对数据集进行预处理后划分为训练集和测试集，确定优化时数据批次的大小；S1: After preprocessing the data set, divide it into training set and test set to determine the size of the data batch during optimization;

S2：对神经网络模型的参数进行随机初始化，对优化过程的超参数和流程控制参数赋值；S2: Randomly initialize the parameters of the neural network model and assign values to the hyperparameters and process control parameters of the optimization process;

S3：将训练集数据按照步骤S1中确定的批次大小逐批输入神经网络模型；每个批次数据输入时，计算神经网络模型中损失函数在当前批次训练数据下对神经网络模型参数的梯度、一阶动量和二阶动量，并对二阶动量进行修正得到修正二阶动量，再基于一阶动量和修正二阶动量对神经网络模型参数进行迭代更新；每更新一次神经网络模型参数，迭代次数累加1，直到当前数据批次为训练集所有数据中最后一批，则将迭代轮数累加1；S3: Input the training set data into the neural network model batch by batch according to the batch size determined in step S1; when each batch of data is input, calculate the gradient, first-order momentum and second-order momentum of the loss function in the neural network model under the current batch of training data, and correct the second-order momentum to obtain the corrected second-order momentum, and then iteratively update the neural network model parameters based on the first-order momentum and the corrected second-order momentum; each time the neural network model parameters are updated, the number of iterations is accumulated by 1 until the current data batch is the last batch of all data in the training set, then the number of iterations is accumulated by 1;

S4：若迭代轮数达到设定的最大迭代轮数epoch，输出最终的神经网络模型参数，否则返回步骤S3。S4: If the number of iterations reaches the set maximum number of iterations epoch, output the final neural network model parameters, otherwise return to step S3.

进一步地，所述步骤S1包括以下子步骤：Furthermore, the step S1 includes the following sub-steps:

S11：对原始数据集的各原始图像依次进行随机裁剪、水平翻转、标准化；S11: randomly crop, horizontally flip, and standardize each original image of the original data set in sequence;

S12：将步骤S11处理后的数据集按照比例划分为训练集和测试集，并将训练集和测试集均转换为数据张量；确定优化时数据批次的大小。S12: Divide the data set processed in step S11 into a training set and a test set in proportion, and convert both the training set and the test set into data tensors; determine the size of the data batch during optimization.

进一步地，所述步骤S2包括以下子步骤：Furthermore, the step S2 includes the following sub-steps:

S21：设置神经网络模型各个层及模块的输入维度、输出维度并进行连接，其中，初始卷积层包括一个卷积核和数量为输出维度的输出通道，且输出通道后加入批量归一化层；初始卷积层之后依次为残差块和全连接层；激活函数采用ReLU。S21: Set the input dimension and output dimension of each layer and module of the neural network model and connect them, where the initial convolution layer includes a convolution kernel and output channels with the number of output dimensions, and a batch normalization layer is added after the output channel; the initial convolution layer is followed by a residual block and a fully connected layer; the activation function uses ReLU.

S22：批量归一化层、残差块内卷积层和全连接层参数初始化使用均匀分布随机数，残差块内卷积层和全连接层参数初始化分布的范围由神经网络模型参数的维度确定。S22: The batch normalization layer, the convolution layer in the residual block, and the fully connected layer parameters are initialized using uniformly distributed random numbers. The range of the initialization distribution of the convolution layer and the fully connected layer parameters in the residual block is determined by the dimension of the neural network model parameters.

S23：对优化过程的超参数和流程控制参数赋值。S23: Assign values to the hyperparameters and process control parameters of the optimization process.

进一步地，所述一阶动量m_t＝β¹ _tm_t-1+(1-β¹ _t)g_t，二阶动量其中，/>g_t为神经网络模型参数的梯度，β¹ _t为一阶衰减系数，β² _t为二阶衰减系数，一阶动量m_t初始值为0，二阶动量v_t初始值为0。Furthermore, the first-order momentum m _t =β ¹ _t m _t-1 +(1-β ¹ _t )g _t , and the second-order momentum Among them,/> g _t is the gradient of the neural network model parameters, β ¹ _t is the first-order attenuation coefficient, β ² _t is the second-order attenuation coefficient, the initial value of the first-order momentum m _t is 0, and the initial value of the second-order momentum v _t is 0.

进一步地，所述对二阶动量进行修正，包括以下子步骤：Furthermore, the correction of the second-order momentum comprises the following sub-steps:

①：将平滑参数代入带参softplus激活函数；①: Substitute the smoothing parameter into the softplus activation function with parameters;

②：对步骤①得到的带参softplus激活函数在零处作二阶泰勒展开，并舍弃余项；②: Perform a second-order Taylor expansion at zero for the softplus activation function with parameters obtained in step ①, and discard the remainder;

③：将二阶动量代入步骤②的二阶泰勒展开式，得到修正二阶动量。③: Substitute the second-order momentum into the second-order Taylor expansion of step ② to obtain the corrected second-order momentum.

更进一步地，所述基于一阶动量和修正二阶动量对神经网络模型参数进行迭代更新，包括以下子步骤：Furthermore, the iterative updating of the neural network model parameters based on the first-order momentum and the modified second-order momentum includes the following sub-steps:

①：更新一阶衰减系数和二阶衰减系数，根据新的一阶衰减系数和二阶衰减系数计算学习率；①: Update the first-order attenuation coefficient and the second-order attenuation coefficient, and calculate the learning rate based on the new first-order attenuation coefficient and the second-order attenuation coefficient;

②：将所述学习率、一阶动量和修正二阶动量的算术平方根的倒数相乘得到神经网络模型参数的更新量；②: Multiply the learning rate, the first-order momentum, and the reciprocal of the arithmetic square root of the modified second-order momentum to obtain the update amount of the neural network model parameters;

③：将神经网络模型参数值与神经网络模型参数的更新量做差得到新的神经网络模型参数值。③: Subtract the neural network model parameter value from the update amount of the neural network model parameter to obtain the new neural network model parameter value.

进一步地，所述步骤S4中最大迭代轮数epoch为遍历训练集所有数据的次数。Furthermore, the maximum number of iterations epoch in step S4 is the number of times all the data in the training set are traversed.

与现有技术相比，本发明的有益效果如下：Compared with the prior art, the present invention has the following beneficial effects:

1.本发明属于实现了从Adam类优化算法和SGDM算法的一种改进，通过带参softplus函数的二阶泰勒展开对二阶动量进行平滑，实现了对自适应学习率的极端值进行约束，从而提高了自适应学习率的稳定性和鲁棒性，这有助于避免过大或过小的学习率对训练的不良影响。1. The present invention realizes an improvement from the Adam type optimization algorithm and the SGDM algorithm, smoothes the second-order momentum through the second-order Taylor expansion with a parameter softplus function, and constrains the extreme values of the adaptive learning rate, thereby improving the stability and robustness of the adaptive learning rate, which helps to avoid the adverse effects of excessive or too small learning rates on training.

2.本发明消除了原实现Adam算法中敏感参数∈的需求：带参softplus函数的二阶泰勒展开式在非负变量上使用时有非零数值下界，避免了对∈的依赖，并且展开式中的参数可以根据需要灵活设置，大大简化了优化过程。2. The present invention eliminates the need for sensitive parameters ∈ in the original implementation of the Adam algorithm: the second-order Taylor expansion with parameter softplus function has a non-zero numerical lower bound when used on non-negative variables, avoiding dependence on ∈, and the parameters in the expansion can be flexibly set as needed, greatly simplifying the optimization process.

3.本发明中二阶泰勒展开式的超参数是一个用于平滑学习率的各向异性的控制器，实现了从Adam到SGDM的逐渐过度，使本发明同时具备Adam算法收敛速度快和SGDM算法泛化性能强的优点：当超参数较小时，本发明的表现与SGDM相似，当超参数足够大时，本发明退化为原始的Adam优化方法。3. The hyperparameter of the second-order Taylor expansion in the present invention is a controller for smoothing the anisotropy of the learning rate, which realizes the gradual transition from Adam to SGDM, so that the present invention has the advantages of fast convergence speed of the Adam algorithm and strong generalization performance of the SGDM algorithm: when the hyperparameter is small, the performance of the present invention is similar to SGDM, and when the hyperparameter is large enough, the present invention degenerates into the original Adam optimization method.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为本发明提供的一种基于泰勒展开动量修正的神经网络优化方法的流程图。FIG1 is a flow chart of a neural network optimization method based on Taylor expansion momentum correction provided by the present invention.

图2为不同优化方法在多层感知机(MLP)上训练MNIST数据集的表现对比图。Figure 2 is a performance comparison of different optimization methods for training the MNIST dataset on a multi-layer perceptron (MLP).

图3为不同优化方法在残差网络ResNet-18上训练MNIST数据集的表现对比图。Figure 3 is a performance comparison of different optimization methods for training the MNIST dataset on the residual network ResNet-18.

图4为不同优化方法在卷积网络LeNet-5上训练CIFAR-10数据集的表现对比图。Figure 4 is a performance comparison of different optimization methods for training the CIFAR-10 dataset on the convolutional network LeNet-5.

图5为不同优化方法在残差网络ResNet-20上训练CIFAR-10数据集的表现对比图。Figure 5 is a performance comparison of different optimization methods for training the CIFAR-10 dataset on the residual network ResNet-20.

具体实施方式Detailed ways

下面结合附图与实施例对本发明的实施方式作进一步详细描述。以下实施例的详细描述和附图用于示例性地说明本发明的原理，但不能用来限制本发明的范围，本领域技术人员无需创造性劳动即可将实施例中的技术特征组合形成新的实施例，且这些新的实施例同样涵盖在本发明的保护范围之内。The following is a further detailed description of the embodiments of the present invention in conjunction with the accompanying drawings and examples. The detailed description of the following embodiments and the accompanying drawings are used to exemplarily illustrate the principles of the present invention, but cannot be used to limit the scope of the present invention. Those skilled in the art can combine the technical features in the embodiments to form new embodiments without creative work, and these new embodiments are also covered within the scope of protection of the present invention.

下面以在ResNet-20上训练CIFAR-10数据集为例并结合附图，对本发明进行详细描述。The present invention is described in detail below by taking the training of CIFAR-10 dataset on ResNet-20 as an example and combining with the accompanying drawings.

本发明提供基于泰勒展开动量修正的神经网络优化方法，其基本思想是：将Adam和SGDM这两种各有优点的优化算法结合起来，将二阶动量项用softplus激活函数的二阶泰勒展开进行修正，实现算法由Adam到SGDM的平滑过渡，通过控制参数使得算法在训练初期表现出Adam的快速收敛特性，训练后期表现出SGDM的泛化性能强的特征，从而提升模型的精度和收敛速度。The present invention provides a neural network optimization method based on Taylor expansion momentum correction. The basic idea is to combine Adam and SGDM, two optimization algorithms with their own advantages, and correct the second-order momentum term with the second-order Taylor expansion of the softplus activation function to achieve a smooth transition from Adam to SGDM. By controlling parameters, the algorithm exhibits the fast convergence characteristics of Adam in the early stage of training and the strong generalization performance characteristics of SGDM in the later stage of training, thereby improving the accuracy and convergence speed of the model.

如图1所示，本发明提供的基于泰勒展开动量修正的神经网络优化方法，具体包含以下步骤：As shown in FIG1 , the neural network optimization method based on Taylor expansion momentum correction provided by the present invention specifically comprises the following steps:

步骤1、对原始数据集进行预处理(步骤1.1至步骤1.3)和分割，以所使用的CIFAR-10数据集为例。Step 1: Preprocess the original data set (step 1.1 to step 1.3) and segment it, taking the CIFAR-10 data set as an example.

步骤1.1、对原始数据集的各原始图像进行随机裁剪，裁剪出大小为32×32像素的区域。如果原始图像的尺寸小于32×32，那么在裁剪之前会在图像周围填充4像素的边界。这种操作有助于增加数据的多样性，模拟在实际应用中可能出现的图像尺寸变化。Step 1.1: Randomly crop each original image in the original dataset to a 32×32 pixel area. If the original image size is smaller than 32×32, a 4-pixel border will be padded around the image before cropping. This operation helps increase the diversity of the data and simulates the image size changes that may occur in actual applications.

步骤1.2、以50％的概率水平翻转裁剪后的图像。这种数据增强方法可以提高模型对左右对称性的鲁棒性。Step 1.2: Flip the cropped image horizontally with a probability of 50%. This data augmentation method can improve the model's robustness to left-right symmetry.

步骤1.3、对步骤1.2处理后的图像进行标准化。使用公式(1)使其像素值的范围缩放到[-1,1]。Step 1.3: Normalize the image processed in step 1.2. Use formula (1) to scale the pixel value range to [-1, 1].

oldValuei表示标准化前的图像中第i个通道的单个像素点的值，NewValuei表示该值标准化后的结果，meani为第i个通道的像素点均值，stdi为第i个通道的像素点标准差，RGB三通道分别记为第1个通道、第2个通道、第3个通道，各通道的meani记录在mean中，各通道的stdi记录在std中，mean＝[mean₁,mean₂,mean₃]和std＝[std₁,std₂,std₃]。需要特别说明的是，本实施例中mean＝[0.4914，0.4822，0.4465]和std＝[0.2023，0.1994，0.2010]。oldValuei represents the value of a single pixel of the i-th channel in the image before normalization, NewValuei represents the result of normalization, meani is the mean value of the pixels of the i-th channel, stdi is the standard deviation of the pixels of the i-th channel, the three RGB channels are respectively recorded as the first channel, the second channel, and the third channel, the meani of each channel is recorded in mean, the stdi of each channel is recorded in std, mean = [mean ₁ , mean ₂ , mean ₃ ] and std = [std ₁ , std ₂ , std ₃ ]. It should be noted that, in this embodiment, mean = [0.4914, 0.4822, 0.4465] and std = [0.2023, 0.1994, 0.2010].

步骤1.4、分割：划分数据集并将数据转换为可计算张量。训练集为50000张图像，随机打乱后拼接为32×32×3×50000的数据张量。测试集为10000张图像，不进行打乱，直接拼接为32×32×3×10000的数据张量。将数据批次大小batch_size设置为128，最后若不足一个完整批次的直接作为一个批次。Step 1.4, segmentation: divide the data set and convert the data into computable tensors. The training set consists of 50,000 images, which are randomly shuffled and spliced into a data tensor of 32×32×3×50,000. The test set consists of 10,000 images, which are not shuffled and directly spliced into a data tensor of 32×32×3×10,000. Set the data batch size batch_size to 128, and finally treat it as a batch if it is less than a complete batch.

步骤2、对ResNet-20模型的参数进行随机初始化，给定训练过程涉及的超参数和控制参数的具体值。Step 2: Randomly initialize the parameters of the ResNet-20 model, giving specific values of the hyperparameters and control parameters involved in the training process.

步骤2.1、将ResNet-20设置为3通道输入，初始卷积层包括一个3×3的卷积核和16个输出通道，且16个输出通道后加入批量归一化层；随后分为3个阶段，每个阶段包含3个残差块，每个残差块内有两个3×3的卷积层，使用全局平均池化层降低特征图的维度，最后的全连接层输出类别概率，输出类别数是10；激活函数采用ReLU。Step 2.1, set ResNet-20 to 3-channel input, the initial convolution layer includes a 3×3 convolution kernel and 16 output channels, and a batch normalization layer is added after the 16 output channels; then it is divided into 3 stages, each stage contains 3 residual blocks, each residual block has two 3×3 convolution layers, and the global average pooling layer is used to reduce the dimension of the feature map. The final fully connected layer outputs the category probability, and the number of output categories is 10; the activation function uses ReLU.

步骤2.2、批量归一化层的参数初始化采用[0,1]均匀分布随机数；残差块内卷积层和全连接层参数初始化使用均匀分布随机数进行初始化，分布的范围取为参数θ维度的算术平方根的倒数，即公式(2)；Step 2.2: The parameters of the batch normalization layer are initialized using uniformly distributed random numbers in the range [0, 1]. The parameters of the convolutional layer and the fully connected layer in the residual block are initialized using uniformly distributed random numbers, and the distribution range is taken as the inverse of the arithmetic square root of the dimension of the parameter θ, that is, formula (2).

其中θ表示模型所有参数，下标表示迭代更新的次数，θ₀即为第0次迭代更新，即初始化；size(θ)表示参数θ的维度，rand(·,·)表示取对应区间的均匀分布随机数。Where θ represents all parameters of the model, the subscript represents the number of iterative updates, θ ₀ is the 0th iterative update, i.e. initialization; size(θ) represents the dimension of parameter θ, and rand(·,·) represents taking a uniformly distributed random number in the corresponding interval.

步骤2.3、对优化过程的超参数和流程控制参数赋值，具体包括：最大迭代轮数epoch＝101，当前迭代轮数epoch_temp＝0，学习率α_t初始值为0.01，一阶衰减系数β¹ _t＝0.9，二阶衰减系数β² _t＝0.999，平滑参数β＝50；迭代更新次数t＝1，一阶动量m_t初始值为0，二阶动量v_t初始值为0。Step 2.3, assign values to the hyperparameters and process control parameters of the optimization process, including: maximum number of iterations epoch = 101, current iteration number epoch_temp = 0, initial value of learning rate α _t = 0.01, first-order attenuation coefficient β ¹ _t = 0.9, second-order attenuation coefficient β ² _t = 0.999, smoothing parameter β = 50; number of iteration updates t = 1, initial value of first-order momentum m _t = 0, initial value of second-order momentum v _t = 0.

步骤3、按照batch_size＝128，将训练集拼接的数据张量分批，并依次送入ResNet-20模型进行训练。具体地，将ResNet-20模型对当前数据批次的预测值和当前数据批次的真实值代入损失函数f(θ)，利用链式法则计算损失函数对ResNet-20模型参数的梯度f(θ_t)为第t次迭代更新的损失函数f(θ)值，/>为梯度算子。Step 3: According to batch_size=128, the concatenated data tensors of the training set are divided into batches and sent to the ResNet-20 model for training. Specifically, the predicted value of the ResNet-20 model for the current data batch and the true value of the current data batch are substituted into the loss function f(θ), and the gradient of the loss function to the ResNet-20 model parameters is calculated using the chain rule. f(θ _t ) is the updated loss function f(θ) value at the tth iteration,/> is the gradient operator.

计算一阶动量：Compute the first-order momentum:

m_t＝β¹ _tm_t-1+(1-β¹ _t)g_t(3)m _t = β ¹ _t m _t-1 + (1-β ¹ _t ) g _t (3)

二阶动量：Second-order momentum:

接着，对二阶动量进行修正，具体为：Next, the second-order momentum is corrected as follows:

将带参softplus激活函数，即公式(5)在x＝0处作二阶泰勒展开，并舍弃余项，得到公式(6)。The softplus activation function with parameters, that is, formula (5), is subjected to a second-order Taylor expansion at x=0, and the remainder is discarded to obtain formula (6).

将公式(6)中x用二阶动量v_t代入，得到修正二阶动量 Substituting the second-order momentum _vt into x in formula (6), we get the modified second-order momentum

最后，更新一阶衰减系数和二阶衰减系数，计算学习率，更新一次模型参数θ，具体为：Finally, update the first-order attenuation coefficient and the second-order attenuation coefficient, calculate the learning rate, and update the model parameter θ once, specifically:

根据公式(7)计算得到新的一阶衰减系数和二阶衰减系数According to formula (7), the new first-order attenuation coefficient and second-order attenuation coefficient are calculated:

根据公式(8)计算新的学习率Calculate the new learning rate according to formula (8)

根据公式(9)更新一次模型参数。Update the model parameters once according to formula (9).

迭代次数t累加1，直到当前数据批次为最后一批，则当前迭代轮数epoch_temp累加1。The number of iterations t is accumulated by 1 until the current data batch is the last batch, then the current iteration round number epoch_temp is accumulated by 1.

步骤4、若epoch_temp<epoch，则返回执行步骤3，否则终止训练并输出最终的模型参数θ。Step 4: If epoch_temp<epoch, return to step 3, otherwise terminate the training and output the final model parameters θ.

为了验证本发明提出的一种基于泰勒展开动量修正的神经网络优化方法(记为MyAdam)的有效性，本发明利用深度学习框架Pytorch在两个经典的图像分类数据集MNIST、CIFAR-10上进行了测试，并记录了训练过程的损失函数值(Loss)、测试集预测准确率(Accuracy)等指标，以用于与SGDM、Adam、Adabound、Sadam等几个经典且效果较好的优化算法作对比。In order to verify the effectiveness of the neural network optimization method based on Taylor expansion momentum correction (denoted as MyAdam) proposed in the present invention, the deep learning framework Pytorch was used to test it on two classic image classification datasets MNIST and CIFAR-10, and the loss function value (Loss) of the training process, the test set prediction accuracy (Accuracy) and other indicators were recorded for comparison with several classic and effective optimization algorithms such as SGDM, Adam, Adabound, and Sadam.

为了体现本发明算法的有效性和高泛化性，实验过程运用了多层神经网络MLP、残差网络ResNet-18、卷积网络LeNet-5、残差网络ResNet-20共4种不同的神经网络模型，训练轮数皆为101，实验结果分别如图2、图3、图4、图5所示。除了多层神经网络MLP外，其余3个网络的结构均是固定结构，下面是多层神经网络MLP的网络和参数设置：In order to demonstrate the effectiveness and high generalization of the algorithm of the present invention, the experiment used four different neural network models, namely, multi-layer neural network MLP, residual network ResNet-18, convolutional network LeNet-5, and residual network ResNet-20. The number of training rounds was 101, and the experimental results are shown in Figures 2, 3, 4, and 5. Except for the multi-layer neural network MLP, the structures of the other three networks are fixed structures. The following is the network and parameter settings of the multi-layer neural network MLP:

网络的输入层维度是28×28(784维)，第一个隐藏层有1000个神经元，激活函数使用ReLU，之后有百分之五十的Dropout(丢弃率为0.5)，第二个隐藏层同样有1000个神经元，激活函数也是ReLU，再次使用百分之五十的Dropout。最后的输出层有10个神经元，其中输出层应用softmax函数进行多类别分类，初始化采用Pytorch的默认初始化方法。The input layer of the network has a dimension of 28×28 (784 dimensions). The first hidden layer has 1000 neurons, and the activation function uses ReLU, followed by 50% Dropout (drop rate is 0.5). The second hidden layer also has 1000 neurons, and the activation function is also ReLU, and 50% Dropout is used again. The final output layer has 10 neurons, where the output layer uses the softmax function for multi-category classification, and the default initialization method of Pytorch is used for initialization.

由图2、图3、图4、图5可知，对于两个分类难度不同的数据集和不同的模型结构，本发明的一种基于泰勒展开动量修正的神经网络优化方法相比于SGDM、Adam、Adabound、Sadam等方法在收敛速度和泛化精度上的表现要更好。还可以看到，本发明所述方法弥补了Adam方法收敛速度虽然比SGDM快，但是泛化性能方面要弱于SGDM的不足。As shown in Figures 2, 3, 4 and 5, for two data sets with different classification difficulties and different model structures, the neural network optimization method based on Taylor expansion momentum correction of the present invention performs better in convergence speed and generalization accuracy than SGDM, Adam, Adabound, Sadam and other methods. It can also be seen that the method of the present invention makes up for the deficiency that although the convergence speed of the Adam method is faster than SGDM, the generalization performance is weaker than SGDM.

综上，以上仅为本发明的较佳实施例，并非用于限定本发明的保护范围。凡在本发明的精神和原则之内，所作的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。In summary, the above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present invention should be included in the protection scope of the present invention.

Claims

1. A neural network optimization method based on Taylor expansion momentum correction is characterized by comprising the following steps of: the method comprises the following steps:

S1: preprocessing a data set, dividing the preprocessed data set into a training set and a testing set, and determining the size of a data batch in optimization;

S2: randomly initializing parameters of the neural network model, and assigning super parameters and flow control parameters of the optimization process;

S3: inputting the training set data into a neural network model batch by batch according to the batch size determined in the step S1; when each batch of data is input, calculating the gradient, the first-order momentum and the second-order momentum of the neural network model parameters by a loss function in the neural network model under the current batch of training data, correcting the second-order momentum to obtain corrected second-order momentum, and carrying out iterative update on the neural network model parameters based on the first-order momentum and the corrected second-order momentum; accumulating the iteration times by 1 when the neural network model parameters are updated once, and accumulating the iteration times by 1 when the current data batch is the last batch in all data of the training set;

S4: and outputting final neural network model parameters if the iteration round number reaches the set maximum iteration round number epoch, otherwise, returning to the step S3.

2. The neural network optimization method based on taylor expansion momentum correction according to claim 1, wherein the method comprises the following steps of: said step S1 comprises the sub-steps of:

S11: sequentially carrying out random cutting, horizontal overturning and standardization on each original image of an original data set;

S12: dividing the data set processed in the step S11 into a training set and a testing set according to a proportion, and converting the training set and the testing set into data tensors; the size of the data batch at the time of optimization is determined.

3. The neural network optimization method based on taylor expansion momentum correction according to claim 1, wherein the method comprises the following steps of: said step S2 comprises the sub-steps of:

S21: setting input dimensions and output dimensions of each layer and module of the neural network model and connecting the layers, wherein an initial convolution layer comprises a convolution kernel and output channels with the number of the output dimensions, and adding a batch normalization layer after the output channels; the initial convolution layer is followed by a residual block and a full connection layer in sequence; the activation function adopts a ReLU;

S22: uniformly distributed random numbers are used for initializing parameters of the batch normalization layer, the residual intra-block convolution layer and the full connection layer, and the range of the parameter initialization distribution of the residual intra-block convolution layer and the full connection layer is determined by the dimension of the neural network model parameters;

s23: and assigning values to the super-parameters and the flow control parameters of the optimization process.

4. The neural network optimization method based on taylor expansion momentum correction according to claim 1, wherein the method comprises the following steps of: the first order momentum mt=beta ¹tmt-₁+(1-β¹ t) gt, the second order momentumWherein,Gt is the gradient of the neural network model parameters, beta ¹ _t is a first-order attenuation coefficient, beta ² _t is a second-order attenuation coefficient, the initial value of the first-order momentum m _t is 0, and the initial value of the second-order momentum v _t is 0.

5. The neural network optimization method based on taylor expansion momentum correction according to claim 1, wherein the method comprises the following steps of: the second order momentum is corrected, which comprises the following substeps:

① : substituting the smoothing parameters into the band parameter softplus to activate the function;

② : performing second-order Taylor expansion on the activation function of the band parameter softplus obtained in the step ① at zero, and discarding the remainder;

③ : substituting the second order momentum into the second order taylor expansion of step ② results in a modified second order momentum.

6. The neural network optimization method based on taylor expansion momentum correction according to claim 4, wherein the method comprises the following steps: the iterative updating of the neural network model parameters based on the first-order momentum and the corrected second-order momentum comprises the following sub-steps:

① : updating the first-order attenuation coefficient and the second-order attenuation coefficient, and calculating the learning rate according to the new first-order attenuation coefficient and the new second-order attenuation coefficient;

② : multiplying the learning rate, the first-order momentum and the inverse of the arithmetic square root of the corrected second-order momentum to obtain the updating quantity of the neural network model parameters;

③ : and obtaining a new neural network model parameter value by making a difference between the neural network model parameter value and the update quantity of the neural network model parameter.

7. The neural network optimization method based on taylor expansion momentum correction according to claim 1, wherein the method comprises the following steps of: and in the step S4, the maximum iteration round number epoch is the number of times of traversing all data of the training set.