WO2019237357A1

WO2019237357A1 - Method and device for determining weight parameters of neural network model

Info

Publication number: WO2019237357A1
Application number: PCT/CN2018/091652
Authority: WO
Inventors: 杨帆; 钟刚
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2018-06-15
Filing date: 2018-06-15
Publication date: 2019-12-19
Anticipated expiration: 2020-12-15
Also published as: CN111937011B; CN111937011A

Abstract

A method for determining weight parameters of a neural network model, comprising: processing sample data on the basis of to-be-determined weight parameters of a neural network model, to obtain an output result; calculating an original error value between the output result and a preset expected result, the original error value being a numerical representation of the difference between the output result and the expected result; correcting the original error value on the basis of a correction value, to obtain a corrected error value; determining, on the basis of the corrected error value and the to-be-determined weight parameters, model weight parameters of the neural network model; wherein the correction value is obtained according to the following formula: R＝(w k-Q(w k))×Q(w k)，R representing the correction value, w k representing the k-th to-be-determined weight parameter of the neural network model,Q(w k) representing a quantization value of the k-th to-be-determined weight parameter, k being a non-negative integer.

Description

Method and equipment for determining weight parameters of neural network model

Technical field

本申请涉及数据处理技术领域，尤其涉及一种神经网络模型权重参数的确定方法及设备。The present application relates to the field of data processing technology, and in particular, to a method and device for determining weight parameters of a neural network model.

Background technique

近年来，神经网络模型在计算机视觉、语音处理等应用中表现出极其优越的性能，受到了人们的广泛关注。神经网络模型的成功伴随的代价是引入大量参数和计算，而神经网络模型相关模型参数的量化技术可以减少相关模型参数精度的冗余性，在降低对模型准确度的不利影响的前提下，实现模型压缩的目的。In recent years, neural network models have shown extremely superior performance in applications such as computer vision and speech processing, and have received widespread attention. The cost of the success of the neural network model is the introduction of a large number of parameters and calculations, and the quantization technology of the relevant model parameters of the neural network model can reduce the redundancy of the accuracy of the relevant model parameters, and achieve the premise of reducing the adverse impact on the accuracy of the model Purpose of model compression.

模型压缩不仅可以降低内存带宽的占用和数据存取的能耗，低精度的运算往往也带来更低的运算能耗。对于一些支持多种精度计算的计算单元，单位时间内完成低精度计算次数要高于可完成高精度计算的次数。Model compression can not only reduce the consumption of memory bandwidth and energy consumption for data access, but low-precision operations often also bring lower operation energy consumption. For some calculation units that support multiple precision calculations, the number of low-precision calculations per unit time is higher than the number of times that high-precision calculations can be completed.

发明内容Summary of the Invention

本申请实施例提供一种神经网络模型权重参数的确定方法及设备，在神经网络模型应用的各种数据处理场景中，比如图像识别、语音识别、图像超分辨率处理等，能够通过对模型训练的输出结果和期望结果间的误差引入适当的修正值的方式，减少量化误差并且避免部分数值较大的权值参数主导神经网络的推理结果而导致过拟合的问题。The embodiments of the present application provide a method and a device for determining a weight parameter of a neural network model. In various data processing scenarios applied to the neural network model, such as image recognition, speech recognition, image super-resolution processing, etc., the model can be trained by The error between the output result and the expected result introduces a proper correction value, which reduces the quantization error and avoids the problem of over-fitting caused by some weight parameters with larger numerical values leading the reasoning results of the neural network.

为达到上述目的，本申请实施例采用如下技术方案：To achieve the above purpose, the embodiments of the present application adopt the following technical solutions:

第一方面，本申请实施例提供一种神经网络模型权重参数的确定方法，包括：基于神经网络模型的待定权重参数，对样本数据进行处理，以获得输出结果；计算所述输出结果和预设的期望结果的原始误差值，所述原始误差值为所述输出结果和所述期望结果的差异的数值化表示；基于修正值，对所述原始误差值进行修正，以获得修正误差值；基于所述修正误差值和所述待定权重参数，确定所述神经网络模型的模型权重参数；其中，所述修正值根据如下公式获得：In a first aspect, an embodiment of the present application provides a method for determining a weight parameter of a neural network model, including: processing sample data based on a neural network model's pending weight parameters to obtain an output result; calculating the output result and a preset The original error value of the expected result, the original error value being a numerical representation of the difference between the output result and the expected result; based on the correction value, correcting the original error value to obtain a corrected error value; based on The correction error value and the pending weight parameter determine a model weight parameter of the neural network model; wherein the correction value is obtained according to the following formula:

R＝(w _k-Q(w _k))×Q(w _k) R = (w _k -Q (w _k )) × Q (w _k )

R表示所述修正值，w _k表示所述神经网络模型的第k个待定权重参数，Q(w _k)表示所述第k个待定权重参数的量化值，k为非负整数。 R represents the correction value, w _k represents the k-th pending weight parameter of the neural network model, Q (w _k ) represents a quantized value of the k-th pending weight parameter, and k is a non-negative integer.

本申请实施例通过对模型训练的输出结果和期望结果间的误差引入适当的修正值的方式，减少量化误差并且避免部分数值较大的权值参数主导神经网络的推理结果而导致过拟合的问题。The embodiment of the present application introduces a proper correction value to the error between the output result of the model training and the expected result, thereby reducing the quantization error and avoiding the overfitting caused by some of the larger weight parameter leading the reasoning results of the neural network. problem.

在第一方面的第一种可行的实施方式中，所述修正误差值根据如下公式获得：In a first feasible implementation manner of the first aspect, the correction error value is obtained according to the following formula:

其中，E1表示所述修正误差值，E0表示所述原始误差值，α为常数，m为用于处理所述样本数据的待定权重参数的总数量，F((w _k-Q(w _k))×Q(w _k))表示以所述修正值为自变量的函数，m为正整数。 Among them, E1 represents the correction error value, E0 represents the original error value, α is a constant, m is the total number of pending weight parameters used to process the sample data, and F ((w _k -Q (w _k ) ) × Q (w _k )) represents a function in which the correction value is an independent variable, and m is a positive integer.

在第一方面的第二种可行的实施方式中，所述以所述修正值为自变量的函数为计算所述修正值的绝对值；对应的，所述修正误差值根据如下公式获得：In a second feasible implementation manner of the first aspect, the absolute value of the correction value is calculated by using the function of the correction value as an independent variable; correspondingly, the correction error value is obtained according to the following formula:

其中，|(w _k-Q(w _k))×Q(w _k)|表示计算(w _k-Q(w _k))×Q(w _k)的绝对值。 Among them, | (w _k -Q (w _k )) × Q (w _k ) | means calculating the absolute value of (w _k -Q (w _k )) × Q (w _k ).

在第一方面的第三种可行的实施方式中，所述神经网络模型包括p个网络层，每个所述网络层包括q个所述待定权重参数，所述第k个待定权重参数为所述神经网络模型中第i个网络层的第j个待定权重参数；对应的，所述修正误差值根据如下公式获得：In a third feasible implementation manner of the first aspect, the neural network model includes p network layers, each of the network layers includes q the pending weight parameters, and the k-th pending weight parameter is The j-th pending weight parameter of the i-th network layer in the neural network model is described; correspondingly, the correction error value is obtained according to the following formula:

其中，p和q为正整数，i和j为非负整数。Among them, p and q are positive integers, and i and j are non-negative integers.

本申请实施例的上述可行的实施方式，示例性的给出了修正值的不同计算方法和形式，来对训练将结果的误差值进行修正，达到了减少量化误差并且避免部分数值较大的权值参数主导神经网络的推理结果而导致过拟合的问题的效果。The foregoing feasible implementation manners of the embodiments of the present application exemplarily give different calculation methods and forms of correction values to correct the error values of the training results, thereby achieving the right of reducing quantization errors and avoiding large values. The value parameter dominates the reasoning results of the neural network and leads to the effect of overfitting problems.

应理解，在一些神经网络模型中，有些网络层中不具有待定权重参数，显然，不具有待定权重参数的网络层不能参与到上述公式中修正误差值的计算中。It should be understood that in some neural network models, some network layers do not have undetermined weight parameters. Obviously, network layers without undetermined weight parameters cannot participate in the calculation of the correction error value in the above formula.

在第一方面的第四种可行的实施方式中，所述基于神经网络模型的待定权重参数，对样本数据进行处理，包括：获得所述待定权重参数；量化所述获得的待定权重参数，以得到量化权重参数，所述量化权重参数为所述待定权重参数的量化值；将所述量化权重参数作为所述神经网络模型的模型权重参数，采用前向传播算法，对所述样本数据进行处理；从所述神经网络模型的输出层获得所述输出结果。In a fourth feasible implementation manner of the first aspect, processing the sample data based on the neural network model's pending weight parameters includes: obtaining the pending weight parameters; quantizing the obtained pending weight parameters, and A quantized weight parameter is obtained, where the quantized weight parameter is a quantized value of the pending weight parameter; the quantized weight parameter is used as a model weight parameter of the neural network model, and a forward propagation algorithm is used to process the sample data Obtaining the output result from an output layer of the neural network model.

应理解，本申请实施例中提到的前向传播算法可能有多种，本发明实施例不做限定。通过前向传播算法，对输入的样本数据，基于神经网络模型进行推演，获得输出结果。It should be understood that there may be multiple forward propagation algorithms mentioned in the embodiments of the present application, which are not limited in the embodiments of the present invention. Through the forward propagation algorithm, the input sample data is deduced based on the neural network model to obtain the output result.

在第一方面的第五种可行的实施方式中，所述神经网络模型的模型权重参数采用迭代训练的方式获得，当所述迭代训练满足结束条件时，所述基于所述修正误差值和所述待定权重参数，确定所述神经网络模型的模型权重参数，包括：将所述量化权重参数作为所述神经网络模型的模型权重参数。In a fifth feasible implementation manner of the first aspect, the model weight parameters of the neural network model are obtained in an iterative training manner, and when the iterative training meets an end condition, the based on the modified error value and the Said to-be-determined weight parameters and determining model weight parameters of the neural network model includes: using the quantized weight parameters as model weight parameters of the neural network model.

本申请实施例通过对权重参数进行量化，实现了对神经网络模型的压缩，降低了内存带宽的占用和能耗，也提高了处理器的运算效率。The embodiment of the present application realizes compression of the neural network model by quantizing the weight parameters, reduces the occupation of memory bandwidth and energy consumption, and improves the computing efficiency of the processor.

在第一方面的第六种可行的实施方式中，当所述迭代训练不满足所述结束条件时，所述基于所述修正误差值和所述待定权重参数，确定所述神经网络模型的模型权重参数，包括：根据所述修正误差值，采用反向传播算法，对所述神经网络模型的网络层，逐层调整所述待定权重参数，直到所述神经网络模型的输入层，以获得所述神经网络模型的调整后权重参数。In a sixth feasible implementation manner of the first aspect, when the iterative training does not satisfy the end condition, the model of the neural network model is determined based on the modified error value and the pending weight parameter. The weight parameter includes: adjusting the pending weight parameter layer by layer to the network layer of the neural network model by using a back-propagation algorithm according to the modified error value until the input layer of the neural network model to obtain the The adjusted weight parameters of the neural network model are described.

在第一方面的第七种可行的实施方式中，所述神经网络模型的待定权重参数根据如下公式调整：In a seventh feasible implementation manner of the first aspect, the pending weight parameters of the neural network model are adjusted according to the following formula:

其中，w0 _k表示第k个所述待定权重参数，w1 _k表示所述第k个调整后权重参数，β为正常数。 Wherein, w0 _k represents the k-th pending weight parameter, w1 _k represents the k-th adjusted weight parameter, and β is a normal number.

应理解，本申请实施例中提到的反向传播算法，也称为后向传播算法可能有多种，本发明实施例不做限定。通过后向传播算法，对权重参数进行训练，更新后的权重参数使神经网络模型得到进一步优化。It should be understood that there may be multiple types of back propagation algorithms, also referred to as back propagation algorithms, mentioned in the embodiments of the present application, which are not limited in the embodiments of the present invention. Through the back propagation algorithm, the weight parameters are trained, and the updated weight parameters make the neural network model further optimized.

在第一方面的第八种可行的实施方式中，对于所述迭代训练中的第N个训练周期，N为大于1的整数，M为小于N的正整数，所述结束条件包括以下条件中的一种或多种的组合：所述第N个训练周期中的原始误差值小于预设的第一阈值；所述第N个训练周期中的修正误差值小于预设的第二阈值；所述第N个训练周期中的原始误差值和所述第N-M个训练周期中的原始误差值的差异小于预设的第三阈值；所述第N个训练周期中的修正误差值和所述第N-M个训练周期中的修正误差值的差异小于预设的第四阈值；所述第N个训练周期中的待定权重参数和所述第N-M个训练周期中的待定权重参数的差异小于预设的第五阈值；和N大于预设的第六阈值。In an eighth feasible implementation manner of the first aspect, for the Nth training cycle in the iterative training, N is an integer greater than 1, M is a positive integer less than N, and the end condition includes the following conditions: A combination of one or more of: the original error value in the Nth training cycle is less than a preset first threshold; the modified error value in the Nth training cycle is less than a preset second threshold; The difference between the original error value in the Nth training cycle and the original error value in the NMth training cycle is less than a preset third threshold; the modified error value in the Nth training cycle and the first The difference between the correction error values in the NM training cycles is less than a preset fourth threshold; the difference between the pending weight parameters in the Nth training cycle and the pending weight parameters in the NM training cycle is less than a preset A fifth threshold; and N is greater than a preset sixth threshold.

在第一方面的第九种可行的实施方式中，当所述第N个训练周期不满足所述结束条件时，存储以下物理量中的一种或多种的组合：所述第N个训练周期中的原始误差值；所述第N个训练周期中的修正误差值；所述第N个训练周期中的待定权重参数；和所述第N个训练周期的周期数N。In a ninth feasible implementation manner of the first aspect, when the Nth training cycle does not satisfy the end condition, one or more combinations of the following physical quantities are stored: the Nth training cycle The original error value in; the modified error value in the N-th training cycle; the pending weight parameter in the N-th training cycle; and the number of cycles N of the N-th training cycle.

在本申请实施例中，通过合理的设置训练的结束条件，提高了训练的效率，达到训练效果和训练所消耗资源的平衡。In the embodiment of the present application, by properly setting the training end conditions, the training efficiency is improved, and the training effect and the balance of the resources consumed by training are achieved.

在第一方面的第十种可行的实施方式中，所述获得所述待定权重参数，包括：在所述迭代训练的第一个训练周期时，将预设的初始权重参数作为所述待定权重参数；在所述迭代训练的非第一个训练周期时，将所述神经网络模型的调整后权重参数作为所述待定权重参数。In a tenth feasible implementation manner of the first aspect, the obtaining the pending weight parameter includes: using a preset initial weight parameter as the pending weight during a first training cycle of the iterative training. Parameters; in a non-first training cycle of the iterative training, using the adjusted weight parameters of the neural network model as the pending weight parameters.

在第一方面的第十一种可行的实施方式中，所述神经网络模型用于图像识别；对应的，所述样本数据包括图像样本；对应的，所述输出结果包括表征为概率形式的所述图像识别的识别结果。In an eleventh feasible implementation manner of the first aspect, the neural network model is used for image recognition; correspondingly, the sample data includes image samples; and correspondingly, the output result includes all characteristics characterized as a probability form. The recognition result of image recognition is described.

在第一方面的第十二种可行的实施方式中，所述神经网络模型用于声音识别；对应的，所述样本数据包括声音样本；对应的，所述输出结果包括表征为概率形式的所述声音识别的识别结果。In a twelfth feasible implementation manner of the first aspect, the neural network model is used for voice recognition; correspondingly, the sample data includes sound samples; and correspondingly, the output result includes all features characterized as a probability form. The recognition result of voice recognition will be described.

在第一方面的第十三种可行的实施方式中，所述神经网络模型用于超分辨率图像的获取；对应的，所述样本数据包括图像样本；对应的，所述输出结果包括超分辨率处理后的图像的像素值。In a thirteenth feasible implementation manner of the first aspect, the neural network model is used for obtaining a super-resolution image; correspondingly, the sample data includes image samples; correspondingly, the output result includes super-resolution The pixel value of the processed image.

本申请实施例的上述可行的实施方式示例性的给出了本申请实施例中的神经网络模型的具体应用场景，通过神经网络模型的应用，能够提高图像识别、声音识别的识别率，提高图像超分辨率处理的图像质量，同时在其他应用领域也能取得显著的有益效果。The above feasible implementation manners of the examples of the present application exemplarily provide specific application scenarios of the neural network model in the examples of the present application. Through the application of the neural network model, the recognition rate of image recognition and sound recognition can be improved, and the image can be improved Super-resolution processed image quality can also achieve significant beneficial effects in other application areas.

第二方面，本申请实施例提供一种神经网络模型权重参数的确定设备，其特征在于，包括：前向传播模块,用于基于神经网络模型的待定权重参数，对样本数据进行处理，以获得输出结果；比较模块,用于计算所述输出结果和预设的期望结果的原始误差值，所述原始误差值为所述输出结果和所述期望结果的差异的数值化表示；修正模块,用于基于修正值，对所述原始误差值进行修正，以获得修正误差值；确定模块,用于基于所述修正误差值和所述待定权重参数，确定所述神经网络模型的模型权重参数；其中，所述修正值根据如下公式获得：In a second aspect, an embodiment of the present application provides a device for determining a weight parameter of a neural network model, which is characterized by including: a forward propagation module for processing sample data based on a pending weight parameter of the neural network model to obtain Output result; a comparison module for calculating an original error value of the output result and a preset expected result, the original error value being a numerical representation of a difference between the output result and the expected result; a correction module, which uses Based on the correction value, the original error value is modified to obtain a correction error value; a determination module is configured to determine a model weight parameter of the neural network model based on the correction error value and the pending weight parameter; , The correction value is obtained according to the following formula:

R＝(w _k-Q(w _k))×Q(w _k) R = (w _k -Q (w _k )) × Q (w _k )

在第二方面的第一种可行的实施方式中，所述修正误差值根据如下公式获得：In a first feasible implementation manner of the second aspect, the correction error value is obtained according to the following formula:

在第二方面的第二种可行的实施方式中，所述以所述修正值为自变量的函数为计算所述修正值的绝对值；对应的，所述修正误差值根据如下公式获得：In a second feasible implementation manner of the second aspect, the absolute value of the correction value is calculated by using the function of the correction value as an independent variable; correspondingly, the correction error value is obtained according to the following formula:

在第二方面的第三种可行的实施方式中，所述神经网络模型包括p个网络层，每个所述网络层包括q个所述待定权重参数，所述第k个待定权重参数为所述神经网络模型中第i个网络层的第j个待定权重参数；对应的，所述修正误差值根据如下公式获得：In a third feasible implementation manner of the second aspect, the neural network model includes p network layers, each of the network layers includes q the pending weight parameters, and the k-th pending weight parameter is The j-th pending weight parameter of the i-th network layer in the neural network model is described; correspondingly, the correction error value is obtained according to the following formula:

在第二方面的第四种可行的实施方式中，所述前向传播模块具体用于：获得所述待定权重参数；量化所述获得的待定权重参数，以得到量化权重参数，所述量化权重参数为所述待定权重参数的量化值；将所述量化权重参数作为所述神经网络模型的模型权重参数，采用前向传播算法，对所述样本数据进行处理；从所述神经网络模型的输出层获得所述输出结果。In a fourth feasible implementation manner of the second aspect, the forward propagation module is specifically configured to: obtain the pending weight parameter; quantize the obtained pending weight parameter to obtain a quantized weight parameter, and the quantized weight The parameter is a quantized value of the pending weight parameter; the quantized weight parameter is used as a model weight parameter of the neural network model, and the sample data is processed using a forward propagation algorithm; the output from the neural network model The layer obtains the output result.

在第二方面的第五种可行的实施方式中，所述神经网络模型的模型权重参数采用迭代训练的方式获得，当所述迭代训练满足结束条件时，所述确定模块具体用于：将所述量化权重参数作为所述神经网络模型的模型权重参数。In a fifth feasible implementation manner of the second aspect, the model weight parameters of the neural network model are obtained by an iterative training method, and when the iterative training meets an end condition, the determining module is specifically configured to: The quantized weight parameter is used as a model weight parameter of the neural network model.

在第二方面的第六种可行的实施方式中，还包括反向传播模块,当所述迭代训练不满足所述结束条件时，所述反向传播模块具体用于：根据所述修正误差值，采用反向传播算法，对所述神经网络模型的网络层，逐层调整所述待定权重参数，直到所述神经网络模型的输入层，以获得所述神经网络模型的调整后权重参数。In a sixth feasible implementation manner of the second aspect, further including a back propagation module, when the iterative training does not satisfy the end condition, the back propagation module is specifically configured to: according to the correction error value Using a back-propagation algorithm to adjust the pending weight parameters layer by layer for the network layer of the neural network model until the input layer of the neural network model to obtain the adjusted weight parameters of the neural network model.

在第二方面的第七种可行的实施方式中，所述神经网络模型的待定权重参数根据如下公式调整：In a seventh feasible implementation manner of the second aspect, the pending weight parameters of the neural network model are adjusted according to the following formula:

在第二方面的第八种可行的实施方式中，对于所述迭代训练中的第N个训练周期，N为大于1的整数，M为小于N的正整数，所述结束条件包括以下条件中的一种或多种的组合：所述第N个训练周期中的原始误差值小于预设的第一阈值；所述第N个训练周期中的修正误差值小于预设的第二阈值；所述第N个训练周期中的原始误差值和所述第N-M个训练周期中的原始误差值的差异小于预设的第三阈值；所述第N个训练周期中的修正误差值和所述第N-M个训练周期中的修正误差值的差异小于预设的第四阈值；所述第N个训练周期中的待定权重参数和所述第N-M个训练周期中的待定权重参数的差异小于预设的第五阈值；和N大于预设的第六阈值。In an eighth feasible implementation manner of the second aspect, for the Nth training cycle in the iterative training, N is an integer greater than 1, M is a positive integer less than N, and the end condition includes the following conditions: A combination of one or more of: the original error value in the Nth training cycle is less than a preset first threshold; the modified error value in the Nth training cycle is less than a preset second threshold; The difference between the original error value in the Nth training cycle and the original error value in the NMth training cycle is less than a preset third threshold; the modified error value in the Nth training cycle and the first The difference between the correction error values in the NM training cycles is less than a preset fourth threshold; the difference between the pending weight parameters in the Nth training cycle and the pending weight parameters in the NM training cycle is less than a preset A fifth threshold; and N is greater than a preset sixth threshold.

在第二方面的第九种可行的实施方式中，当所述第N个训练周期不满足所述结束条件时，存储以下物理量中的一种或多种的组合：所述第N个训练周期中的原始误差值；所述第N个训练周期中的修正误差值；所述第N个训练周期中的待定权重参数；和所述第N个训练周期的周期数N。In a ninth feasible implementation manner of the second aspect, when the Nth training cycle does not satisfy the end condition, one or more combinations of the following physical quantities are stored: the Nth training cycle The original error value in; the modified error value in the N-th training cycle; the pending weight parameter in the N-th training cycle; and the number of cycles N of the N-th training cycle.

在第二方面的第十种可行的实施方式中，所述前向传播模块具体用于：在所述迭代训练的第一个训练周期时，将预设的初始权重参数作为所述待定权重参数；在所述迭代训练的非第一个训练周期时，将所述神经网络模型的调整后权重参数作为所述待定权重参数。In a tenth feasible implementation manner of the second aspect, the forward propagation module is specifically configured to: during a first training cycle of the iterative training, use a preset initial weight parameter as the pending weight parameter In a non-first training cycle of the iterative training, using the adjusted weight parameter of the neural network model as the pending weight parameter.

在第二方面的第十一种可行的实施方式中，所述神经网络模型用于图像识别；对应的，所述样本数据包括图像样本；对应的，所述输出结果包括表征为概率形式的所述图像识别的识别结果。In an eleventh feasible implementation manner of the second aspect, the neural network model is used for image recognition; correspondingly, the sample data includes image samples; and correspondingly, the output result includes all characteristics characterized in a probabilistic form. The recognition result of image recognition is described.

在第二方面的第十二种可行的实施方式中，所述神经网络模型用于声音识别；对应的，所述样本数据包括声音样本；对应的，所述输出结果包括表征为概率形式的所述声音识别的识别结果。In a twelfth feasible implementation manner of the second aspect, the neural network model is used for voice recognition; correspondingly, the sample data includes sound samples; and correspondingly, the output result includes all features characterized as a probability form. The recognition result of voice recognition will be described.

在第二方面的第十三种可行的实施方式中，所述神经网络模型用于超分辨率图像的获取；对应的，所述样本数据包括图像样本；对应的，所述输出结果包括超分辨率处理后的图像的像素值。In a thirteenth feasible implementation manner of the second aspect, the neural network model is used for obtaining a super-resolution image; correspondingly, the sample data includes image samples; correspondingly, the output result includes super-resolution The pixel value of the processed image.

第三方面，本申请实施例提供一种电子设备，包括：一个或多个处理器和一个或多个存储器。一个或多个存储器与一个或多个处理器耦合，一个或多个存储器用于存储计算机程序代码，计算机程序代码包括计算机指令，当一个或多个处理器执行计算机指令时，电子设备执行如第一方面任一项的神经网络模型权重参数的确定方法。In a third aspect, an embodiment of the present application provides an electronic device including: one or more processors and one or more memories. The one or more memories are coupled with one or more processors, and the one or more memories are used to store computer program code. The computer program code includes computer instructions. When the one or more processors execute the computer instructions, the electronic device performs On the one hand, a method for determining weight parameters of a neural network model.

第四方面，本申请实施例提供一种计算机存储介质，包括计算机指令，当计算机指令在电子设备上运行时，使得电子设备执行如第一方面任一项的数据处理方法。In a fourth aspect, an embodiment of the present application provides a computer storage medium including computer instructions, and when the computer instructions are run on an electronic device, the electronic device is caused to execute the data processing method according to any one of the first aspects.

第五方面，本申请实施例提供一种计算机程序产品，当计算机程序产品在计算机上运行时，使得计算机执行如第一方面任一项的神经网络模型权重参数的确定方法。In a fifth aspect, an embodiment of the present application provides a computer program product that, when the computer program product runs on a computer, causes the computer to execute the method for determining a weight parameter of a neural network model according to any one of the first aspect.

第六方面，本申请实施例提供一种芯片，包括处理器和存储器，存储器用于存储计算机程序代码，计算机程序代码包括计算机指令，当处理器执行计算机指令时，电子设备执行如第一方面任一项的神经网络模型权重参数的确定方法。According to a sixth aspect, an embodiment of the present application provides a chip including a processor and a memory. The memory is used to store computer program code. The computer program code includes computer instructions. When the processor executes the computer instructions, the electronic device executes any of the first A method for determining weight parameters of a neural network model.

其中，关于上述第二方面至第六方面的有益效果可以参见上述第一方面中的描述，这里不再赘述。For the beneficial effects of the second aspect to the sixth aspect, reference may be made to the description in the first aspect, and details are not described herein again.

BRIEF DESCRIPTION OF THE DRAWINGS

图1为示例性的一种神经网络结构示意图；FIG. 1 is an exemplary schematic diagram of a neural network structure;

图2为示例性的一种神经元的结构示意图；2 is a schematic structural diagram of an exemplary neuron;

图3为示例性的另一种神经网络结构示意图；3 is an exemplary schematic diagram of another neural network structure;

图4为本申请实施例提供的一种电子设备的结构示意图；4 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

图5为本申请实施例提供的一种神经网络模型权重参数的确定方法的示例性流程图；5 is an exemplary flowchart of a method for determining weight parameters of a neural network model according to an embodiment of the present application;

图6为本申请实施例提供的一种神经网络模型权重参数的确定设备的示例性结构框图；6 is an exemplary structural block diagram of a device for determining a weight parameter of a neural network model according to an embodiment of the present application;

图7为本申请实施例提供的另一种电子设备的结构示意图。FIG. 7 is a schematic structural diagram of another electronic device according to an embodiment of the present application.

detailed description

神经网络(也称深度神经网络)可以用于处理各种数据，例如图像数据、音频数据等。神经网络可以包括一个或多个网络层(也称神经网络层)，网络层可以为卷积层、全连接层、反卷积层或者循环层等。一个典型的神经网络模型如图1所示。Neural networks (also known as deep neural networks) can be used to process various data, such as image data, audio data, and so on. A neural network may include one or more network layers (also called neural network layers). The network layer may be a convolutional layer, a fully connected layer, a deconvolutional layer, or a recurrent layer. A typical neural network model is shown in Figure 1.

为了便于理解本发明实施例，示例性的，给出部分与本申请实施例相关的概念，以供参考。In order to facilitate understanding of the embodiments of the present invention, some concepts related to the embodiments of the present application are given for reference.

(1)神经网络模型和前向传播(Forward propagation)：对于训练样本集(x( ⁱ)，y( ⁱ))，神经网络算法能够提供一种复杂且非线性的假设模型h _W，b(x)，它具有参数W，b，可以以此参数来拟合数据。为了描述神经网络，先从最简单的神经网络模型讲起，神经网络模型在本文中也简称为神经网络，这个神经网络仅由一个“神经元”构成，图2即为这个“神经元”的图示。 (1) Neural network model and forward propagation: For the training sample set (x ( ⁱ ), y ( ⁱ )), the neural network algorithm can provide a complex and nonlinear hypothetical model h _{W, b} ( x), which has parameters W, b, which can be used to fit the data. In order to describe the neural network, let's start with the simplest neural network model. The neural network model is also referred to as the neural network in this article. This neural network consists of only one "neuron". Figure 2 is the "neuron" Icon.

这个“神经元”是一个以x ₁，x ₂，x ₃及截距+1为输入值的运算单元，其输出为

其中函数

被称为“激活函数”。 This "neuron" is an arithmetic unit that takes x ₁ , x ₂ , x ₃ and intercept +1 as input values, and its output is

Where function

Called the "activation function".

所谓神经网络就是将许多个单一“神经元”联结在一起，这样，一个“神经元”的输出就可以是另一个“神经元”的输入。图3就是一个简单的神经网络。The so-called neural network is to connect many single "neurons" together, so that the output of one "neuron" can be the input of another "neuron". Figure 3 is a simple neural network.

使用圆圈来表示神经网络的输入，标上“+1”的圆圈被称为偏置节点，也就是截距项。神经网络最左边的一层叫做输入层，最右的一层叫做输出层(本例中，输出层只有一个节点)。中间所有节点组成的一层叫做隐藏层(在另外的实施例中，隐藏层可以不存在，或者存在多层)。同时可以看到，以上神经网络的例子中有3个输入单元(偏置单元不计在内)，3个隐藏单元及一个输出单元。The circle is used to represent the input of the neural network. The circle marked "+1" is called the bias node, which is the intercept term. The leftmost layer of the neural network is called the input layer, and the rightmost layer is called the output layer (in this example, the output layer has only one node). A layer composed of all nodes in the middle is called a hidden layer (in other embodiments, the hidden layer may not exist, or there may be multiple layers). At the same time, it can be seen that in the above neural network example, there are 3 input units (excluding the bias unit), 3 hidden units and an output unit.

不妨用n _l来表示网络的层数，本例中n _l＝3，我们将第l层记为L _l，于是L ₁是输入层，输出层是

本例神经网络有参数(W，b)＝(W ⁽¹⁾，b ⁽¹⁾，W ⁽²⁾，b ⁽²⁾)，其中

是第l层第j单元与第l+1层第i单元之间的联接参数(其实就是连接线上的权重)，

是第l+1层第i单元的偏置项。没有其他单元连向偏置单元(即偏置单元没有输入)，因为它们总是输出+1。同时，用s _l表示第l层的节点数(偏置单元不计在内)。用

表示第l层第i单元的激活值(输出值)。当l＝1时，

也就是第i个输入值(输入值的第i个特征)。对于给定参数集合W，b，神经网络就可以按照函数h _W，b(x)来计算输出结果。本例神经网络的计算步骤如下： It may be useful to use n _l to represent the number of layers in the network. In this example, n _l = 3, we label the first layer as L _l , so L ₁ is the input layer and the output layer is

The neural network in this example has parameters (W, b) = (W ⁽¹⁾ , b ⁽¹⁾ , W ⁽²⁾ , b ⁽²⁾ ), where

Is the connection parameter (in fact, the weight on the connection line) between the jth unit in the lth layer and the ith unit in the l + 1th layer,

Is the bias term for the i + 1th cell in the l + 1th layer. No other unit is connected to the bias unit (ie the bias unit has no input) because they always output +1. At the same time, by S _l l represents the number of nodes of layer (not counting the bias unit). use

Represents the activation value (output value) of the i-th unit of the l-th layer. When l = 1,

That is, the i-th input value (i-th feature of the input value). For a given parameter set W, b, the neural network can calculate the output according to the function h _{W, b} (x). The calculation steps of the neural network in this example are as follows:

不妨用

表示第l层第i单元输入加权和(包括偏置单元)，比如，

则

May wish to use

Represents the weighted sum (including the bias unit) of the l-th and i-th units

then

上面的计算步骤叫作前向传播。The above calculation step is called forward propagation.

(2)反向传播算法(也称为后向传播算法，Back propagation)和误差反向传播算法：反向传播算法主要由两个环节(激励传播、权重更新)反复循环迭代，直到网络的对输入的响应达到预定的目标范围为止。(2) Backpropagation algorithm (also known as Backpropagation) and error backpropagation algorithm: Backpropagation algorithm is mainly repeated and iterated by two links (incentive propagation and weight update) until the network The input response reaches the predetermined target range.

误差反向传播算法的学习过程由正向传播过程和反向传播过程组成。在正向传播过程中，输入信息通过输入层经隐含层，逐层处理并传向输出层。如果在输出层得不到期望的输出值，则取输出与期望的误差的各种表征形式(比如，平方和)作为目标函数，转入反向传播，逐层求出目标函数对各神经元权值的偏导数，构成目标函数对权值向量的梯量，作为修改权值的依据，网络的学习在权值修改过程中完成。误差达到所期望值时，网络学习结束。The learning process of the error back-propagation algorithm consists of a forward-propagation process and a back-propagation process. In the forward propagation process, input information passes through the input layer through the hidden layer, is processed layer by layer, and is transmitted to the output layer. If the desired output value cannot be obtained at the output layer, take various representation forms (such as the sum of squares) of the output and the expected error as the objective function, transfer to back propagation, and find the objective function for each neuron layer by layer The partial derivatives of the weights constitute the ladder of the objective function to the weight vector. As a basis for modifying the weights, the learning of the network is completed in the process of the weights modification. When the error reaches the expected value, the network learning ends.

在激励传播步骤中，每次迭代中的传播环节包含两步：将训练输入送入网络以获得激励响应(前向传播阶段)；将激励响应同训练输入对应的目标输出求差，从而获得隐藏层和输出层的响应误差(反向传播阶段)。In the incentive propagation step, the propagation link in each iteration includes two steps: sending the training input to the network to obtain the incentive response (forward propagation phase); and differentiating the incentive response with the target output corresponding to the training input to obtain the hidden Layer and output layer response error (back propagation phase).

在权重更新步骤中，对于每个神经元上的权重，按照以下步骤进行更新：将输入激励和响应误差相乘，从而获得权重的梯度；将这个梯度乘上一个比例并取反后加到权重上。这个比例将会影响到训练过程的速度和效果，因此称为“训练因子”。梯度的方向指明了误差扩大的方向，因此在更新权重的时候需要对其取反，从而减小权重引起的误差。In the weight update step, the weight on each neuron is updated according to the following steps: multiply the input excitation and response errors to obtain the gradient of the weights; multiply this gradient by a ratio and invert and add to the weights on. This ratio will affect the speed and effectiveness of the training process, so it is called "training factor". The direction of the gradient indicates the direction of the error expansion, so when updating the weights, it needs to be inverted to reduce the errors caused by the weights.

Matt Mazur在“A Step by Step Backpropagation Example”(“一个后向传播示例的步骤分解”，可从https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation- Example/中下载，全文引用于本文中)中示例性的介绍了一种误差反向传播算法的实施方式，可以应用于本申请实施例中，不做赘述。Matt Mazur's "Step by Step Back Propagation Example" ("Step Breakdown of a Backward Propagation Example" is available from https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation- Example / Download, the full text cited in this article) exemplarily introduces an implementation of an error back propagation algorithm, which can be applied to the embodiments of this application, and will not be described in detail.

下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行描述。其中，在本申请实施例的描述中，除非另有说明，“/”表示或的意思，例如，A/B可以表示A或B；本文中的“和/或”仅仅是一种描述关联对象的关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况。另外，在本申请实施例的描述中，“多个”是指两个或多于两个。The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. Wherein, in the description of the embodiments of the present application, unless otherwise stated, "/" represents or means, for example, A / B may represent A or B; "and / or" herein is only a description of an associated object The association relationship indicates that there can be three kinds of relationships, for example, A and / or B can be expressed as: there are three cases where A exists alone, A and B exist simultaneously, and B exists alone. In addition, in the description of the embodiments of the present application, “multiple” means two or more than two.

本申请实施例涉及的数据处理设备为利用卷积神经网络对图像、语音等数据进行处理的电子设备，例如可以是服务器或终端。示例性的，当电子设备为终端时，电子设备具体可以是台式机、便携式电脑、掌上电脑(personal digital assistant，PDA)、平板电脑、嵌入式设备、手机、智能外设(例如智能手表、手环、眼镜等)、电视机机顶盒、监视摄像头等。本申请实施例不限定电子设备的具体类型。The data processing device involved in this embodiment of the present application is an electronic device that processes data such as images and voice by using a convolutional neural network, and may be, for example, a server or a terminal. Exemplarily, when the electronic device is a terminal, the electronic device may specifically be a desktop computer, a portable computer, a personal digital assistant (PDA), a tablet computer, an embedded device, a mobile phone, or a smart peripheral (such as a smart watch, a handheld Ring, glasses, etc.), TV set-top boxes, surveillance cameras, etc. The embodiment of the present application does not limit the specific type of the electronic device.

示例性的，图4示出了本申请实施例涉及的电子设备400的硬件结构示意图。电子设备400可以包括至少一个处理器401，通信总线402和存储器403。电子设备400还可以包括至少一个通信接口404。For example, FIG. 4 shows a schematic diagram of a hardware structure of an electronic device 400 according to an embodiment of the present application. The electronic device 400 may include at least one processor 401, a communication bus 402, and a memory 403. The electronic device 400 may further include at least one communication interface 404.

处理器401可以是一个通用中央处理器(central processing unit，CPU)，微处理器，特定应用集成电路(application-specific integrated circuit，ASIC)，图形处理器The processor 401 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), and a graphics processor.

(graphics processing unit，GPU)，现场可编程门阵列(field programmable gate array，FPGA)，或一个或多个用于控制本申请方案程序执行的集成电路。(graphics processing unit, GPU), field programmable gate array (FPGA), or one or more integrated circuits for controlling the execution of the program procedures of the present application.

通信总线402可包括一通路，在上述组件之间传送信息。The communication bus 402 may include a path for transmitting information between the aforementioned components.

通信接口404，使用任何收发器一类的装置，用于与其他设备或通信网络通信，如以太网，无线接入网(radio access network，RAN)，无线局域网(wireless local area networks，WLAN)等。The communication interface 404 uses any device such as a transceiver to communicate with other devices or communication networks, such as Ethernet, radio access network (RAN), wireless local area networks (WLAN), etc. .

存储器403可以是只读存储器(read-only memory，ROM)或可存储静态信息和指令的其他类型的静态存储设备，随机存取存储器(random access memory，RAM)或者可存储信息和指令的其他类型的动态存储设备，也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory，EEPROM)、只读光盘(compact disc read-only memory，CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质，但不限于此。存储器可以是独立存在，通过总线与处理器相连接。存储器也可以和处理器集成在一起。The memory 403 may be a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type that can store information and instructions Dynamic storage device, can also be electrically erasable programmable read-only memory (electrically erasable programmable read-only memory (EEPROM)), read-only compact disc (compact disc-read-only memory (CD-ROM) or other optical disc storage, optical disc storage (Including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), disk storage media or other magnetic storage devices, or can be used to carry or store desired program code in the form of instructions or data structures and can be used by a computer Any other media accessed, but not limited to this. The memory may exist independently and be connected to the processor through a bus. The memory can also be integrated with the processor.

其中，存储器403用于存储执行本申请实施例提供的方案的应用程序代码，以及采用本申请实施例提供的方案的神经网络模型结构、权重、处理器401运算时的中间结果，并由处理器401来控制执行。处理器401用于执行存储器203中存储的应用程序代码，从而实现本申请下述实施例提供的数据处理的方法。The memory 403 is configured to store application program code that executes the solution provided by the embodiment of the present application, and a neural network model structure, weights, and intermediate results of the processor 401 operation using the solution provided by the embodiment of the present application. 401 to control execution. The processor 401 is configured to execute application program code stored in the memory 203, so as to implement a data processing method provided in the following embodiments of the present application.

在具体实现中，作为一种实施例，处理器401可以包括一个或多个CPU，例如图4中的CPU0和CPU1。In a specific implementation, as an embodiment, the processor 401 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 4.

在具体实现中，作为一种实施例，电子设备400可以包括多个处理器，例如图4 中的处理器401和处理器407。这些处理器中的每一个可以是一个单核(single-CPU)处理器，也可以是一个多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。In a specific implementation, as an embodiment, the electronic device 400 may include multiple processors, such as the processor 401 and the processor 407 in FIG. 4. Each of these processors can be a single-CPU processor or a multi-CPU processor. A processor herein may refer to one or more devices, circuits, and / or processing cores for processing data (such as computer program instructions).

在具体实现中，作为一种实施例，电子设备400还可以包括输出设备405和输入设备406。输出设备405和处理器401通信，可以以多种方式来显示信息。例如，输出设备405可以是液晶显示器(liquid crystal display，LCD)，发光二级管(light emitting diode，LED)显示设备，阴极射线管(cathode ray tube，CRT)显示设备，或投影仪(projector)等。输入设备406和处理器401通信，可以以多种方式接受用户的输入。例如，输入设备406可以是鼠标、键盘、摄像头、麦克风、触摸屏设备或传感设备等。In specific implementation, as an embodiment, the electronic device 400 may further include an output device 405 and an input device 406. The output device 405 communicates with the processor 401 and can display information in a variety of ways. For example, the output device 405 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. Wait. The input device 406 is in communication with the processor 401 and can accept user input in a variety of ways. For example, the input device 406 may be a mouse, a keyboard, a camera, a microphone, a touch screen device, or a sensing device.

如前文所述，神经网络由网络层构成的，每个网络层对其输入数据进行处理再传递至下一网络层。在网络层中，根据网络层的不同属性(如卷积层、全连接层，等)，使用不同的权重对输入数据进行卷积、乘加等操作。这些操作的方式方法是由网络层的属性决定的，但不同操作使用的权重的数值是训练获得的。通过调整权重值即可获得不同的数据处理结果。As mentioned above, neural networks are composed of network layers, and each network layer processes its input data and passes it to the next network layer. In the network layer, according to different attributes of the network layer (such as a convolution layer, a fully connected layer, etc.), different input weights are used to perform convolution, multiplication and addition operations on the input data. The methods and methods of these operations are determined by the properties of the network layer, but the values of the weights used by different operations are obtained through training. Different data processing results can be obtained by adjusting the weight values.

在模型权重参数的训练过程中，过分的注重于训练数据集的数据拟合，可能导致泛化性不好，其表现为：模型在训练数据集上拟合效果很好，准确度很高，但在训练数据集外的数据集上却不能很好地拟合数据，效果不好，准确度严重降低。In the training process of model weight parameters, too much focus on the data fitting of the training data set may lead to poor generalization. Its performance is: the model fits the training data set very well, and the accuracy is very high. But on the data set outside the training data set, it can not fit the data well, the effect is not good, and the accuracy is severely reduced.

神经网络权重参数精度冗余，采用低精度的数据格式(如INT8、binary等)来记录权重参数，取代高精度的数据格式(如FP32、FP64)可以实现权重参数的压缩。The neural network weight parameter precision is redundant. Low-precision data formats (such as INT8, binary, etc.) are used to record the weight parameters. Instead of high-precision data formats (such as FP32, FP64), weight parameter compression can be achieved.

对权重参数进行量化是实现权重参数压缩的一种可行的实施方式，压缩神经网络参数精度，可以获得存储、能耗、准确度间的平衡，可进行量化的数据包括：权值(weights)、特征张量(activations)、梯度(gradients)以及神经网络模型的其它参数，不做限定。Quantifying weight parameters is a feasible implementation method for weight parameter compression. Compressing the accuracy of neural network parameters can obtain a balance between storage, energy consumption, and accuracy. Data that can be quantified include: weights, Feature tensors (activations), gradients and other parameters of the neural network model are not limited.

在一种可行的实施方式中，首先通过训练获得高精度(无量化的，一般为FP32或FP64精度的)的神经网络的权重系数，之后，将上述采用高精度数据格式进行表达的权重系数采用低精度的数据格式进行表达。这时，由于低精度的数据格式无法表达高精度的数据的细节，会导致数值的差异，该差异在运算中被多次累计，最终导致量化后的模型最终计算结果的准确度相比量化前的原始模型有所下降。为了解决上述问题，一般将量化后的模型参数在保证其低精度的数据格式的前提下再次进行训练，通过再训练调整权重取值，最终实现低精度下达到与高精度相近的模型准确度。In a feasible implementation manner, the weight coefficients of the high-precision (non-quantized, generally FP32 or FP64-accurate) neural network are first obtained through training, and then the weight coefficients expressed in the high-precision data format are used. Low-precision data format for expression. At this time, because the low-precision data format cannot express the details of high-precision data, it will cause a difference in the value. This difference is accumulated multiple times in the operation, and eventually the accuracy of the final calculation result of the quantized model is compared with that of the quantized model The original model has declined. In order to solve the above problems, the quantized model parameters are generally trained again under the premise of ensuring a low-precision data format, and the weight values are adjusted through retraining to finally achieve a model accuracy close to high accuracy at low accuracy.

具体的，在训练过程中将样本数据输入至神经网络，计算出与期望数据的差异，再使用这个差异计算出神经网络中所有权重待调整的梯度(即权重应被调整的趋向)，并调整神经网络中的权重，达到缩小差异，即达到更高准确度的目的。但是，在针对量化的神经网络，由于训练过程中调整权重的梯度值极小，很大几率小于权重精度下可表达的区间最小值，导致梯度无法实际调整权重的取值。例如，在INT4这一数据格式下，将取值范围定义为0至1，可表达的区间最小值为2 ^-4，然而梯度很可能远小于该值，假设为2 ^-6，此时，在INT4下的任意数值A，在与梯度求和后，结果应为A+2 ^-6，但在INT4的数据格式下，由于其最小区间无法表达2 ^-6，实际结果仍为A。因此，不能直接使用量化权重进行训练。 Specifically, the sample data is input to the neural network during the training process, and the difference between the expected data and the expected data is calculated, and then the difference is used to calculate the gradient of the weight of the neural network to be adjusted (that is, the trend of the weights to be adjusted), and adjust The weight in the neural network achieves the purpose of reducing the difference, that is, achieving higher accuracy. However, in the quantized neural network, because the gradient value of the weight adjustment during training is extremely small, the probability is smaller than the minimum interval value that can be expressed under the weight accuracy, which makes the gradient unable to actually adjust the weight value. For example, in the data format of INT4, the value range is defined as 0 to 1, and the minimum value of the expressible interval is 2 ^-4 . However, the gradient is likely to be much smaller than this value, assuming ^2-6 . Any value A under INT4, after summing with the gradient, the result should be A + 2 ^-6 , but in the data format of INT4, because its minimum interval cannot express 2 ^-6 , the actual result is still A. Therefore, you cannot directly use quantized weights for training.

为了解决上述问题，一种常见的训练方法是使用一个高精度的参考权重携带未量化权重信息，量化权重由该参考权重获得，误差计算由量化权重参与进行，误差获得的梯度作用于未量化的权重(也称参考权重)。参考权重累计被调整超过一定量后，即可在量化权重上表现出来。(比如，量化后权重只能表达0-255的整数，参考权重初始是1，在被调整过程中数值一直轻微浮动，直到参考权重大于1.5时，量化权重才变为2，直到参考权重小于0.5时，量化权重才变为0，否则量化权重为1)。In order to solve the above problem, a common training method is to use a high-precision reference weight to carry unquantized weight information. The quantized weight is obtained from the reference weight. The error calculation is performed by the quantized weight. The gradient of the error obtained is applied to the unquantized Weight (also called reference weight). After the reference weights are adjusted beyond a certain amount, they can be expressed in quantized weights. (For example, the weight after quantization can only express an integer from 0 to 255. The reference weight is initially 1. The value has been slightly floating during the adjustment process. Until the reference weight is greater than 1.5, the quantization weight becomes 2 until the reference weight is less than 0.5 Only when the quantization weight becomes 0, otherwise the quantization weight is 1).

上述方法带来的目前技术问题包括：与无量化的神经网络不同，量化的神经网络计算过程中使用量化权重而非参考权重，参考权重与量化权重值的差别在神经网络计算中积累，最终影响计算结果，产生量化误差。用来调整参考权重的梯度是由上述量化权重来计算推导结果并获得的推理误差推导得出的，不够准确，存在梯度误差。以上问题会造成模型训练难达到全局最优解。The current technical problems brought by the above methods include: Unlike neural networks without quantization, quantized neural network calculation uses quantized weights instead of reference weights. Differences between reference weights and quantized weight values accumulate in neural network calculations, and ultimately affect The result of the calculation is a quantization error. The gradient used to adjust the reference weight is derived from the above-mentioned quantized weight to calculate the derivation result and the inference error obtained, which is not accurate enough and there is a gradient error. The above problems will make it difficult for the model training to reach the global optimal solution.

本申请实施例提供了一种神经网络模型权重参数的确定方法，如图5所示，具体包括：An embodiment of the present application provides a method for determining a weight parameter of a neural network model, as shown in FIG. 5, which specifically includes:

S501、基于神经网络模型的待定权重参数，对样本数据进行处理，以获得输出结果。S501. Process the sample data based on the pending weight parameters of the neural network model to obtain an output result.

具体的，在一种可行的实施方式中，本步骤包括：Specifically, in a feasible implementation manner, this step includes:

S5011、获得所述待定权重参数。S5011. Obtain the pending weight parameters.

不妨设，在本申请实施例中神经网络模型的模型权重参数采用迭代训练的方式获得。It may be assumed that, in the embodiment of the present application, the model weight parameters of the neural network model are obtained in an iterative training manner.

当首次进入训练中时，即在所述迭代训练的第一个训练周期时，步骤S5011的执行包括获取默认的初始值，比如0,1等常数赋予待定权重参数，也可以根据经验值获取预先确定的数值赋予待定权重参数，比如存储下来的预先训练过的神经网络模型权重参数。When entering training for the first time, that is, during the first training cycle of the iterative training, the execution of step S5011 includes obtaining a default initial value, such as assigning constant parameters such as 0, 1 to a pending weight parameter. The determined value is given to the weight parameters to be determined, such as the stored weight parameters of the pre-trained neural network model.

当处于迭代训练过程中，即在所述迭代训练的非第一个训练周期时，步骤S5011的执行包括获取上一训练周期中通过反向传播算法更新后的待定权重参数(即调整后权重参数)作为本步骤获取的待定权重参数。具体实现方式将于后续步骤中进行详述，本步骤不再赘述。When in the iterative training process, that is, during the non-first training cycle of the iterative training, the execution of step S5011 includes obtaining the pending weight parameters (that is, the adjusted weight parameters) updated by the back propagation algorithm in the previous training cycle. ) As the pending weight parameter obtained in this step. The specific implementation manner will be detailed in the subsequent steps, and this step will not be repeated here.

S5012、量化所述获得的待定权重参数，以得到量化权重参数，所述量化权重参数为所述待定权重参数的量化值。S5012: Quantify the obtained pending weight parameters to obtain a quantized weight parameter, where the quantized weight parameter is a quantized value of the pending weight parameter.

对待定权重参数进行量化，可以采用不同的预设的量化方案，例如前文中介绍的将权重参数的高精度表达方式转化为低精度表达方式，本步骤中不做限定。To quantify the weight parameter to be determined, different preset quantization schemes can be adopted, for example, the high-precision expression mode of the weight parameter is transformed into a low-precision expression mode introduced in the foregoing, which is not limited in this step.

S5013、将所述量化权重参数作为所述神经网络模型的模型权重参数，采用前向传播算法，对所述样本数据进行处理。S5013. Use the quantized weight parameter as a model weight parameter of the neural network model, and use a forward propagation algorithm to process the sample data.

前文中介绍了前向传播算法，在本步骤中，量化后的待定权重参数被作为神经网络模型的模型参数，以样本数据为输入，以前向传播算法为计算准则，进行计算。应理解，本步骤对具体的前向传播算法不做限定。The forward propagation algorithm was introduced in the foregoing. In this step, the quantified pending weight parameters are used as the model parameters of the neural network model, taking the sample data as input, and the forward propagation algorithm as the calculation criterion for calculation. It should be understood that this step does not limit the specific forward propagation algorithm.

S5014、从所述神经网络模型的输出层获得所述输出结果。S5014. Obtain the output result from an output layer of the neural network model.

从所述神经网络模型的输出层输出针对样本数据的计算结果。示例性的，如果神经网络模型用于图像识别，则所述样本数据包括图像样本，所述输出结果包括表征为概率形式的所述图像识别的识别结果，具体的，可以为判断样本图像为目标图像的概率为90％。如果所述神经网络模型用于声音识别，则所述样本数据包括声音样本，所述输出结果包括表征为概率形式的所述声音识别的识别结果，具体的，可以为判断样本声音为目标声音的概率为20％。。如果神经网络模型用于超分辨率图像的获取，则所述样本数据包括图像样本，所述输出结果包括超分辨率处理后的图像的像素值。A calculation result for the sample data is output from an output layer of the neural network model. Exemplarily, if the neural network model is used for image recognition, the sample data includes image samples, and the output result includes the recognition result of the image recognition, which is characterized in a probabilistic form. Specifically, the sample image may be determined as a target. The probability of the image is 90%. If the neural network model is used for voice recognition, the sample data includes voice samples, and the output result includes the recognition result of the voice recognition characterized as a probabilistic form. Specifically, it may be determined that the sample voice is the target voice. The probability is 20%. . If the neural network model is used to obtain a super-resolution image, the sample data includes image samples, and the output result includes pixel values of the image after super-resolution processing.

应理解，神经网络模型还可以用于涉及人工智能领域的其他应用中，对应的，作为输入数据的样本数据和作为输出数据的输出结果也可以是其他类型的物理量，不做限定。It should be understood that the neural network model can also be used in other applications related to the field of artificial intelligence. Correspondingly, the sample data as input data and the output result as output data can also be other types of physical quantities, without limitation.

S502、计算所述输出结果和预设的期望结果的原始误差值，所述原始误差值为所述输出结果和所述期望结果的差异的数值化表示。S502. Calculate an original error value of the output result and a preset expected result, where the original error value is a numerical representation of a difference between the output result and the expected result.

和步骤S5014对应的，本步骤中计算期望中的输出结果，即期望结果，和实际输出结果的差异，并且该差异以数值化的形式进行表征。示例性的，该差异可以是识别结果的差值，比如识别结果为90％，期望结果为100％，则原始误差值为10％，也可以是样本图像对应的未下采样前的原始图像和样本图像经过超分辨率处理后的图像之间的像素差异，比如可以用二者间的信号峰值信噪比(PSNR)来表示，比如-0.2分贝(dB)，或者二者图像像素间的方差值等，根据神经网络模型的具体应用而定，不做限定。Corresponding to step S5014, the difference between the expected output result, that is, the expected result, and the actual output result is calculated in this step, and the difference is characterized in a numerical form. Exemplarily, the difference may be the difference between the recognition results, for example, the recognition result is 90%, the expected result is 100%, the original error value is 10%, or the original image corresponding to the sample image before the downsampling and the The pixel differences between the sample images after super-resolution processing can be expressed, for example, by the signal peak signal-to-noise ratio (PSNR) between the two, such as -0.2 decibels (dB), or the square between the two image pixels. Differences and the like are determined according to the specific application of the neural network model and are not limited.

S503、基于修正值，对所述原始误差值进行修正，以获得修正误差值。S503. Correct the original error value based on the correction value to obtain a correction error value.

在步骤中，首先要获得修正值，在一种可行的实施方式中，所述修正值根据如下公式获得：In the step, a correction value is first obtained. In a feasible implementation manner, the correction value is obtained according to the following formula:

R＝(w _k-Q(w _k))×Q(w _k) R = (w _k -Q (w _k )) × Q (w _k )

对应的，所述修正误差值根据如下公式获得：Correspondingly, the correction error value is obtained according to the following formula:

在一种可行的实施方式中，所述以所述修正值为自变量的函数为计算所述修正值的绝对值；In a feasible implementation manner, the function of using the correction value as an independent variable is to calculate an absolute value of the correction value;

在另一种可行的实施方式中，所述神经网络模型包括p个网络层，每个所述网络层包括q个所述待定权重参数，所述第k个待定权重参数为所述神经网络模型中第i个网络层的第j个待定权重参数；In another feasible implementation manner, the neural network model includes p network layers, and each of the network layers includes q of the pending weight parameters, and the k-th pending weight parameter is the neural network model. The j-th pending weight parameter of the i-th network layer in the network;

应理解，在一些可行的实施方式中，神经网络模型的某个网络层可能不包含待定权重参数，即此时该网络层对应的q为0，显然该网络层不会用于修正误差值的计算。It should be understood that in some feasible implementations, a certain network layer of the neural network model may not contain the undetermined weight parameters, that is, the corresponding q of the network layer is 0, and obviously the network layer will not be used to correct the error value. Calculation.

在本申请实施例中，通过使用修正值(正则化函数)将权重参数与量化后的权重参数的差值作为惩罚项，在训练过程中引导无量化的权重参数靠近其量化后权值，减少量化误差。同时，将上述作为惩罚项的差值与量化后的权重参数相乘，避免部分数值较大的权值主导神经网络的推理结果导致的过拟合问题。In the embodiment of the present application, the difference between the weight parameter and the quantized weight parameter is used as a penalty term by using the correction value (regularization function) to guide the non-quantized weight parameter close to its quantized weight value during the training process to reduce Quantization error. At the same time, the above-mentioned difference value as a penalty term is multiplied with the quantized weight parameter to avoid overfitting problems caused by the weighted values of some large values leading the reasoning results of the neural network.

接下来，基于所述修正误差值和所述待定权重参数，确定所述神经网络模型的模型权重参数。Next, a model weight parameter of the neural network model is determined based on the modified error value and the pending weight parameter.

S504、判断迭代训练是否满足结束条件。S504. Determine whether the iterative training satisfies an end condition.

不妨设在所述迭代训练中的第N个训练周期中执行步骤S504，N为大于1的整数，M为小于N的正整数，所述结束条件包括以下条件中的一种或多种的组合：It may be set to perform step S504 in the Nth training cycle in the iterative training, where N is an integer greater than 1, M is a positive integer less than N, and the ending condition includes one or more combinations of the following conditions :

所述第N个训练周期中的原始误差值小于预设的第一阈值；The original error value in the Nth training cycle is less than a preset first threshold;

所述第N个训练周期中的修正误差值小于预设的第二阈值；The correction error value in the Nth training cycle is less than a preset second threshold;

所述第N个训练周期中的原始误差值和所述第N-M个训练周期中的原始误差值的差异小于预设的第三阈值；A difference between the original error value in the N-th training cycle and the original error value in the N-M training cycle is less than a preset third threshold;

所述第N个训练周期中的修正误差值和所述第N-M个训练周期中的修正误差值的差异小于预设的第四阈值；A difference between the correction error value in the N-th training cycle and the correction error value in the N-M training cycle is less than a preset fourth threshold;

所述第N个训练周期中的待定权重参数和所述第N-M个训练周期中的待定权重参数的差异小于预设的第五阈值；和A difference between the pending weight parameter in the N-th training cycle and the pending weight parameter in the N-M training cycle is less than a preset fifth threshold; and

N大于预设的第六阈值。N is greater than a preset sixth threshold.

应理解，当M为1时，意味着在相关结束条件中，需要比较最近相邻两个训练周期中对应物理量的差异值。It should be understood that when M is 1, it means that in the relevant end condition, the difference values of the corresponding physical quantities in the two nearest neighboring training periods need to be compared.

还应理解，步骤S504可以在每个训练周期中执行，也可以在每M个训练周期中执行，本申请实施例对步骤S504的执行频率不做限定。It should also be understood that step S504 may be performed in each training cycle, and may also be performed in every M training cycles. The embodiment of this application does not limit the execution frequency of step S504.

对于一个训练周期，不妨理解为一个从计算修正误差值，根据修正误差值调整权重参数，再使用调整后的权重参数获得新的训练结果的过程。For a training cycle, it may be understood as a process of calculating a correction error value, adjusting a weight parameter according to the correction error value, and then using the adjusted weight parameter to obtain a new training result.

与步骤S504对应的，当所述第N个训练周期不满足所述结束条件时，存储以下物理量中的一种或多种的组合：Corresponding to step S504, when the Nth training cycle does not satisfy the end condition, one or more combinations of the following physical quantities are stored:

所述第N个训练周期中的原始误差值；The original error value in the Nth training cycle;

所述第N个训练周期中的修正误差值；A correction error value in the Nth training cycle;

所述第N个训练周期中的待定权重参数；和The pending weight parameters in the Nth training cycle; and

所述第N个训练周期的周期数N。The number N of the Nth training cycle.

存储后的物理量会在后续执行步骤S504时被调用。The stored physical quantity will be called in the subsequent execution of step S504.

S505、当所述迭代训练满足结束条件时，将所述量化权重参数作为所述神经网络模型的模型权重参数。S505. When the iterative training meets an end condition, use the quantized weight parameter as a model weight parameter of the neural network model.

一般的，迭代训练结束意味着通过训练，量化权重参数已经得到了期望程度的优化，即可以被确定为神经网络模型的模型权重参数。Generally, the end of iterative training means that through training, the quantized weight parameters have been optimized to the desired degree, that is, they can be determined as the model weight parameters of the neural network model.

在一些可行的实施方式中，采用训练样本集A对模型进行训练，采用测试样本集B对模型进行测试。在使用A对模型进行N个训练周期的训练后，在测试数据集B上对模型进行测试，得到第一测试结果X；继续采用A对模型训练M个训练周期后，在测试数据集B上对模型进行测试，得到第二测试结果Y，当X与Y的差异小于阈值时，结束训练，否则继续使用A对模型进行训练。In some feasible implementation modes, the training sample set A is used to train the model, and the test sample set B is used to test the model. After using A to train the model for N training cycles, test the model on test data set B to get the first test result X; continue to use A to train the model for M training cycles on test data set B Test the model to get the second test result Y. When the difference between X and Y is less than the threshold, end the training, otherwise continue to use A to train the model.

对应的，在该实施例中，结束条件包括上述第一测试结果X和第二测试结果Y的差异小于阈值。Correspondingly, in this embodiment, the ending condition includes that the difference between the first test result X and the second test result Y is smaller than a threshold.

S506、当所述迭代训练不满足所述结束条件时，根据所述修正误差值，采用反向传播算法，对所述神经网络模型的网络层，逐层调整所述待定权重参数，直到所述神经网络模型的输入层，以获得所述神经网络模型的调整后权重参数。S506. When the iterative training does not satisfy the end condition, according to the modified error value, a back propagation algorithm is used to adjust the pending weight parameters layer by layer for the network layer of the neural network model until the An input layer of a neural network model to obtain adjusted weight parameters of the neural network model.

前文已经介绍了反向传播算法，这里不再赘述。示例性的，所述神经网络模型的待定权重参数根据如下公式调整：The back-propagation algorithm has been introduced in the previous article, so I won't repeat it here. Exemplarily, the pending weight parameters of the neural network model are adjusted according to the following formula:

当从输入层的权重被调整完成后，神经网络模型的所有权重都已被获得，即获得所述神经网络模型的调整后权重参数。After the weights from the input layer are adjusted, all the weights of the neural network model have been obtained, that is, the adjusted weight parameters of the neural network model are obtained.

此后，将调整后权重参数作为待定量化参数，执行步骤S5011，继续进行迭代训练。After that, the adjusted weight parameter is used as the parameter to be quantified, and step S5011 is executed to continue the iterative training.

可以理解的是，电子设备为了执行上述方法，其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到，结合本文中所公开的实施例描述的各示例的算法步骤，本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。It can be understood that, in order to execute the above method, the electronic device includes a hardware structure and / or a software module corresponding to each function. Those skilled in the art should easily realize that, in combination with the algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or a combination of hardware and computer software. Whether a certain function is performed by hardware or computer software-driven hardware depends on the specific application and design constraints of the technical solution. A professional technician can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.

本申请实施例可以根据上述方法示例对电子设备进行功能模块的划分，例如，可以对应各个功能划分各个功能模块，也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现，也可以采用软件功能模块的形式实现。需要说明的是，本申请实施例中对模块的划分是示意性的，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式。The embodiments of the present application may divide the functional modules of the electronic device according to the foregoing method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of the modules in the embodiments of the present application is schematic, and is only a logical function division. In actual implementation, there may be another division manner.

在采用对应各个功能划分各个功能模块的情况下，图6示出了上述实施例中涉及的电子设备的一种可能的组成示意图。In a case where each functional module is divided corresponding to each function, FIG. 6 shows a possible composition diagram of the electronic device involved in the foregoing embodiment.

一种神经网络模型权重参数的确定设备600，包括：A device 600 for determining a weight parameter of a neural network model includes:

前向传播模块601,用于基于神经网络模型的待定权重参数，对样本数据进行处理，以获得输出结果；The forward propagation module 601 is configured to process the sample data based on the neural network model's pending weight parameters to obtain an output result;

比较模块602,用于计算所述输出结果和预设的期望结果的原始误差值，所述原始误差值为所述输出结果和所述期望结果的差异的数值化表示；A comparison module 602, configured to calculate an original error value of the output result and a preset expected result, where the original error value is a numerical representation of a difference between the output result and the expected result;

修正模块603,用于基于修正值，对所述原始误差值进行修正，以获得修正误差值；A correction module 603, configured to correct the original error value based on the correction value to obtain a correction error value;

确定模块604,用于基于所述修正误差值和所述待定权重参数，确定所述神经网络模型的模型权重参数；A determining module 604, configured to determine a model weight parameter of the neural network model based on the modified error value and the pending weight parameter;

其中，所述修正值根据如下公式获得：The correction value is obtained according to the following formula:

R＝(w _k-Q(w _k))×Q(w _k) R = (w _k -Q (w _k )) × Q (w _k )

在一种可行的实施方式中，所述修正误差值根据如下公式获得：In a feasible implementation manner, the correction error value is obtained according to the following formula:

在一种可行的实施方式中，所述以所述修正值为自变量的函数为计算所述修正值的绝对值；对应的，所述修正误差值根据如下公式获得：In a feasible implementation manner, the function that uses the correction value as an independent variable is an absolute value for calculating the correction value; correspondingly, the correction error value is obtained according to the following formula:

在一种可行的实施方式中，所述神经网络模型包括p个网络层，每个所述网络层包括q个所述待定权重参数，所述第k个待定权重参数为所述神经网络模型中第i个网络层的第j个待定权重参数；对应的，所述修正误差值根据如下公式获得：In a feasible implementation manner, the neural network model includes p network layers, and each network layer includes q of the pending weight parameters, and the k-th pending weight parameter is in the neural network model. The j-th pending weight parameter of the i-th network layer; correspondingly, the correction error value is obtained according to the following formula:

在一种可行的实施方式中，所述前向传播模块601具体用于：获得所述待定权重参数；量化所述获得的待定权重参数，以得到量化权重参数，所述量化权重参数为所述待定权重参数的量化值；将所述量化权重参数作为所述神经网络模型的模型权重参数，采用前向传播算法，对所述样本数据进行处理；从所述神经网络模型的输出层获得所述输出结果。In a feasible implementation manner, the forward propagation module 601 is specifically configured to: obtain the pending weight parameter; quantize the obtained pending weight parameter to obtain a quantized weight parameter, where the quantized weight parameter is the A quantized value of a weight parameter to be determined; using the quantized weight parameter as a model weight parameter of the neural network model, using a forward propagation algorithm to process the sample data; obtaining the output layer from the neural network model Output results.

在一种可行的实施方式中，所述神经网络模型的模型权重参数采用迭代训练的方式获得，当所述迭代训练满足结束条件时，所述确定模块604具体用于：将所述量化权重参数作为所述神经网络模型的模型权重参数。In a feasible implementation manner, the model weight parameters of the neural network model are obtained by iterative training. When the iterative training meets an end condition, the determination module 604 is specifically configured to: use the quantized weight parameters As a model weight parameter of the neural network model.

在一种可行的实施方式中，还包括反向传播模块605,当所述迭代训练不满足所述结束条件时，所述反向传播模块605具体用于：根据所述修正误差值，采用反向传播算法，对所述神经网络模型的网络层，逐层调整所述待定权重参数，直到所述神经网络模型的输入层，以获得所述神经网络模型的调整后权重参数。In a feasible implementation manner, it further includes a back propagation module 605. When the iterative training does not satisfy the end condition, the back propagation module 605 is specifically configured to: The forward propagation algorithm adjusts the pending weight parameters layer by layer for the network layer of the neural network model until the input layer of the neural network model to obtain the adjusted weight parameters of the neural network model.

在一种可行的实施方式中，所述神经网络模型的待定权重参数根据如下公式调整：In a feasible implementation manner, the pending weight parameters of the neural network model are adjusted according to the following formula:

在一种可行的实施方式中，对于所述迭代训练中的第N个训练周期，N为大于1的整数，M为小于N的正整数，所述结束条件包括以下条件中的一种或多种的组合：所述第N个训练周期中的原始误差值小于预设的第一阈值；所述第N个训练周期中的修正误差值小于预设的第二阈值；所述第N个训练周期中的原始误差值和所述第N-M个训练周期中的原始误差值的差异小于预设的第三阈值；所述第N个训练周期中的修正误差值和所述第N-M个训练周期中的修正误差值的差异小于预设的第四阈值；所述第N个训练周期中的待定权重参数和所述第N-M个训练周期中的待定权重参数的差异小于预设的第五阈值；和N大于预设的第六阈值。In a feasible implementation manner, for the Nth training cycle in the iterative training, N is an integer greater than 1, M is a positive integer less than N, and the ending condition includes one or more of the following conditions Combinations: the original error value in the Nth training cycle is less than a preset first threshold; the modified error value in the Nth training cycle is less than a preset second threshold; the Nth training The difference between the original error value in the period and the original error value in the NMth training period is less than a preset third threshold; the corrected error value in the Nth training period and the NMth training period The difference between the corrected error value of is smaller than a preset fourth threshold; the difference between the pending weight parameter in the Nth training cycle and the pending weight parameter in the NMth training cycle is less than a preset fifth threshold; and N is greater than a preset sixth threshold.

在一种可行的实施方式中，当所述第N个训练周期不满足所述结束条件时，存储以下物理量中的一种或多种的组合：所述第N个训练周期中的原始误差值；所述第N个训练周期中的修正误差值；所述第N个训练周期中的待定权重参数；和所述第N个训练周期的周期数N。In a feasible implementation manner, when the Nth training cycle does not satisfy the end condition, one or more combinations of the following physical quantities are stored: the original error value in the Nth training cycle The correction error value in the Nth training cycle; the pending weight parameter in the Nth training cycle; and the number of cycles N of the Nth training cycle.

在一种可行的实施方式中，所述前向传播模块601具体用于：在所述迭代训练的第一个训练周期时，将预设的初始权重参数作为所述待定权重参数；在所述迭代训练的非第一个训练周期时，将所述神经网络模型的调整后权重参数作为所述待定权重参数。In a feasible implementation manner, the forward propagation module 601 is specifically configured to: during a first training cycle of the iterative training, use a preset initial weight parameter as the pending weight parameter; in the In the non-first training cycle of iterative training, the adjusted weight parameter of the neural network model is used as the pending weight parameter.

在一种可行的实施方式中，所述神经网络模型用于图像识别；对应的，所述样本数据包括图像样本；对应的，所述输出结果包括表征为概率形式的所述图像识别的识别结果。In a feasible implementation manner, the neural network model is used for image recognition; correspondingly, the sample data includes image samples; correspondingly, the output result includes a recognition result of the image recognition characterized as a probability form. .

在一种可行的实施方式中，所述神经网络模型用于声音识别；对应的，所述样本数据包括声音样本；对应的，所述输出结果包括表征为概率形式的所述声音识别的识别结果。In a feasible implementation manner, the neural network model is used for voice recognition; correspondingly, the sample data includes voice samples; correspondingly, the output result includes the recognition result of the voice recognition characterized as a probability form. .

在一种可行的实施方式中，所述神经网络模型用于超分辨率图像的获取；对应的，所述样本数据包括图像样本；对应的，所述输出结果包括超分辨率处理后的图像的像素值。In a feasible implementation manner, the neural network model is used for obtaining a super-resolution image; correspondingly, the sample data includes image samples; correspondingly, the output result includes a super-resolution processed image. Pixel values.

需要说明的是，上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述，在此不再赘述。It should be noted that all relevant content of each step involved in the foregoing method embodiments can be referred to the functional description of the corresponding functional module, and will not be repeated here.

当然，电子设备包括但不限于上述所列举的单元模块，例如，电子设备还可以包括通信单元，该通信单元可以包括用于向其他设备发送数据或者信号的发送单元，接收其他设备发送数据或者信号的接收单元等。并且，上述功能单元的具体所能够实现的功能也包括但不限于上述实例的方法步骤对应的功能，电子设备的其他单元的详细描述可以参考其所对应方法步骤的详细描述，本申请实施例这里不再赘述。Of course, the electronic device includes but is not limited to the above-listed unit modules. For example, the electronic device may further include a communication unit. The communication unit may include a sending unit for sending data or signals to other devices, and receiving data or signals sent by other devices. Receiving unit, etc. In addition, the functions that can be implemented by the above functional units also include, but are not limited to, the functions corresponding to the method steps of the above examples. For detailed descriptions of other units of the electronic device, refer to the detailed description of the corresponding method steps. No longer.

其中，图7中的处理单元701可以是处理器或控制器，例如可以是中央处理器CPU，通用处理器，数字信号处理器(digital signal processor，DSP)，专用集成电路ASIC，现场可编程门阵列FPGA，图形处理器GPU，或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框，模块和电路。处理器也可以是实现计算功能的组合，例如包含一个或多个微处理器组合，DSP和微处理器的组合等等。存储单元702可以是存储器。通信单元可以是收发器、射频电路或通信接口等。处理单元701执行如图5所示的神经网络模型权重参数的确定方法。The processing unit 701 in FIG. 7 may be a processor or a controller. For example, the processing unit 701 may be a central processing unit CPU, a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit ASIC, or a field programmable gate. Array FPGA, graphics processor GPU, or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It may implement or execute various exemplary logical blocks, modules, and circuits described in connection with the disclosure of this application. A processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and so on. The storage unit 702 may be a memory. The communication unit may be a transceiver, a radio frequency circuit, or a communication interface. The processing unit 701 executes a method for determining a weight parameter of a neural network model as shown in FIG. 5.

本申请实施例还包括一种计算机存储介质，包括计算机指令，当所述计算机指令在电子设备上运行时，使得所述电子设备执行如图5所示的神经网络模型权重参数的确定方法。An embodiment of the present application further includes a computer storage medium including computer instructions, and when the computer instructions are executed on an electronic device, the electronic device is caused to execute a method for determining a weight parameter of a neural network model shown in FIG. 5.

本申请实施例还包括一种计算机程序产品，当所述计算机程序产品在计算机上运行时，使得所述计算机执行如图5所示的神经网络模型权重参数的确定方法。The embodiment of the present application further includes a computer program product, when the computer program product is run on a computer, the computer is caused to execute a method for determining a weight parameter of a neural network model shown in FIG. 5.

通过以上的实施方式的描述，所属领域的技术人员可以清楚地了解到，为描述的方便和简洁，仅以上述各功能模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能模块完成，即将装置的内部结构划分成不同的功能模块，以完成以上描述的全部或者部分功能。Through the description of the above embodiments, those skilled in the art can clearly understand that, for the convenience and brevity of the description, only the division of the above functional modules is used as an example. In practical applications, the above functions can be allocated according to needs It is completed by different functional modules, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.

在本申请所提供的几个实施例中，应该理解到，所揭露的装置和方法，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如，模块或单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个装置，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of modules or units is only a logical function division. In actual implementation, there may be another division manner. For example, multiple units or components may be combined or It can be integrated into another device, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.

作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是一个物理单元或多个物理单元，即可以位于一个地方，或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place, or may be distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.

另外，在本申请各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of software functional unit.

集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个可读取存储介质中。基于这样的理解，本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该软件产品存储在一个存储介质中，包括若干指令用以使得一个设备(可以是单片机，芯片等)或处理器(processor)执行本申请各个实施例方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(read only memory，ROM)、随机存取存储器(random access memory，RAM)、磁碟或者光盘等各种可以存储程序代码的介质。When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application essentially or partly contribute to the existing technology or all or part of the technical solutions may be embodied in the form of a software product, which is stored in a storage medium The instructions include a number of instructions for causing a device (which can be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods in the embodiments of the present application. The foregoing storage media include: U disks, mobile hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks, which can store program codes.

以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到变化或替换，都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以所述权利要求的保护范围为准。The above is only a specific implementation of this application, but the scope of protection of this application is not limited to this. Any person skilled in the art can easily think of changes or replacements within the technical scope disclosed in this application. It should be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

A method for determining weight parameters of a neural network model, which includes:

Process the sample data based on the undetermined weight parameters of the neural network model to obtain output results;

Calculating an original error value of the output result and a preset expected result, where the original error value is a numerical representation of a difference between the output result and the expected result;

Modifying the original error value based on the correction value to obtain a correction error value;

Determining a model weight parameter of the neural network model based on the modified error value and the pending weight parameter;

The correction value is obtained according to the following formula:

R = (w _k -Q (w _k )) × Q (w _k )

R represents the correction value, w _k represents the k-th pending weight parameter of the neural network model, Q (w _k ) represents a quantized value of the k-th pending weight parameter, and k is a non-negative integer.

The method according to claim 1, wherein the correction error value is obtained according to the following formula:

Among them, E1 represents the correction error value, E0 represents the original error value, α is a constant, m is the total number of pending weight parameters used to process the sample data, and F ((w _k -Q (w _k ) ) × Q (w _k )) represents a function in which the correction value is an independent variable, and m is a positive integer.

The method according to claim 2, wherein the function of the correction value as an independent variable is used to calculate an absolute value of the correction value;

Correspondingly, the correction error value is obtained according to the following formula:

Among them, | (w _k -Q (w _k )) × Q (w _k ) | means calculating the absolute value of (w _k -Q (w _k )) × Q (w _k ).

The method according to any one of claims 1 to 3, wherein the neural network model includes p network layers, and each of the network layers includes q of the pending weight parameters, and the kth pending The weight parameter is the j-th pending weight parameter of the i-th network layer in the neural network model;

Among them, p and q are positive integers, and i and j are non-negative integers.

The method according to any one of claims 1 to 4, wherein the processing of sample data based on the neural network model's pending weight parameters comprises:

Obtaining the pending weight parameters;

Quantizing the obtained pending weight parameters to obtain a quantized weight parameter, where the quantized weight parameter is a quantized value of the pending weight parameter;

Using the quantized weight parameter as a model weight parameter of the neural network model, and using a forward propagation algorithm to process the sample data;

The output result is obtained from an output layer of the neural network model.

The method according to claim 5, wherein the model weight parameters of the neural network model are obtained in an iterative training manner, and when the iterative training meets an end condition, the based on the modified error value and the The weight parameter to be determined, and determining a model weight parameter of the neural network model includes:

The quantized weight parameter is used as a model weight parameter of the neural network model.

The method according to claim 6, characterized in that, when the iterative training does not satisfy the end condition, the determining a model weight of the neural network model based on the modified error value and the pending weight parameter Parameters, including:

According to the modified error value, a back-propagation algorithm is used to adjust the pending weight parameter layer by layer for the network layer of the neural network model until the input layer of the neural network model to obtain the Adjusted weight parameters.

The method according to claim 7, wherein the pending weight parameters of the neural network model are adjusted according to the following formula:

Wherein, w0 _k represents the k-th pending weight parameter, w1 _k represents the k-th adjusted weight parameter, and β is a normal number.

The method according to any one of claims 6 to 8, wherein, for the Nth training cycle in the iterative training, N is an integer greater than 1, M is a positive integer less than N, and the end condition Include a combination of one or more of the following conditions:

The original error value in the Nth training cycle is less than a preset first threshold;

The correction error value in the Nth training cycle is less than a preset second threshold;

A difference between the original error value in the N-th training cycle and the original error value in the N-M training cycle is less than a preset third threshold;

A difference between the correction error value in the N-th training cycle and the correction error value in the N-M training cycle is less than a preset fourth threshold;

A difference between the pending weight parameter in the N-th training cycle and the pending weight parameter in the N-M training cycle is less than a preset fifth threshold; and

N is greater than a preset sixth threshold.

The method according to claim 9, characterized in that when the Nth training cycle does not satisfy the end condition, one or more combinations of the following physical quantities are stored:

The original error value in the Nth training cycle;

A correction error value in the Nth training cycle;

The pending weight parameters in the Nth training cycle; and

The number N of the Nth training cycle.

The method according to any one of claims 6 to 10, wherein the obtaining the pending weight parameter comprises:

During the first training cycle of the iterative training, using a preset initial weight parameter as the pending weight parameter;

In a non-first training cycle of the iterative training, the adjusted weight parameter of the neural network model is used as the pending weight parameter.

The method according to any one of claims 1 to 11, wherein the neural network model is used for image recognition;

Correspondingly, the sample data includes image samples;

Correspondingly, the output result includes a recognition result of the image recognition characterized as a probability form.

The method according to any one of claims 1 to 11, wherein the neural network model is used for voice recognition;

Correspondingly, the sample data includes sound samples;

Correspondingly, the output result includes a recognition result of the voice recognition characterized as a probability form.

The method according to any one of claims 1 to 11, wherein the neural network model is used for acquiring a super-resolution image;

Correspondingly, the sample data includes image samples;

Correspondingly, the output result includes pixel values of the image after super-resolution processing.

A device for determining a weight parameter of a neural network model, including:

The forward propagation module is used to process the sample data based on the neural network model's pending weight parameters to obtain the output result;

A comparison module, configured to calculate an original error value of the output result and a preset expected result, where the original error value is a numerical representation of a difference between the output result and the expected result;

A correction module, configured to correct the original error value based on the correction value to obtain a correction error value;

A determining module, configured to determine a model weight parameter of the neural network model based on the correction error value and the pending weight parameter;

The correction value is obtained according to the following formula:

R = (w _k -Q (w _k )) × Q (w _k )

The device according to claim 15, wherein the correction error value is obtained according to the following formula:

The device according to claim 16, wherein the function of using the correction value as an independent variable is an absolute value for calculating the correction value;

The device according to any one of claims 15 to 17, wherein the neural network model includes p network layers, each of the network layers includes q the pending weight parameters, and the kth pending The weight parameter is the j-th pending weight parameter of the i-th network layer in the neural network model;

The device according to any one of claims 15 to 18, wherein the forward propagation module is specifically configured to:

Obtaining the pending weight parameters;

The output result is obtained from an output layer of the neural network model.

The device according to claim 19, wherein the model weight parameters of the neural network model are obtained in an iterative training manner, and when the iterative training meets an end condition, the determination module is specifically configured to:

The device according to claim 20, further comprising a back propagation module, and when the iterative training does not satisfy the end condition, the back propagation module is specifically configured to:

The device according to claim 21, wherein the pending weight parameters of the neural network model are adjusted according to the following formula:

The device according to any one of claims 20 to 22, wherein for the Nth training cycle in the iterative training, N is an integer greater than 1, M is a positive integer less than N, and the end condition Include a combination of one or more of the following conditions:

N is greater than a preset sixth threshold.

The device according to claim 23, wherein when the Nth training cycle does not satisfy the end condition, one or more combinations of the following physical quantities are stored:

The original error value in the Nth training cycle;

A correction error value in the Nth training cycle;

The pending weight parameters in the Nth training cycle; and

The number N of the Nth training cycle.

The device according to any one of claims 20 to 24, wherein the forward propagation module is specifically configured to:

The device according to any one of claims 15 to 25, wherein the neural network model is used for image recognition;

Correspondingly, the sample data includes image samples;

The device according to any one of claims 15 to 25, wherein the neural network model is used for voice recognition;

Correspondingly, the sample data includes sound samples;

The device according to any one of claims 15 to 25, wherein the neural network model is used for acquiring a super-resolution image;

Correspondingly, the sample data includes image samples;

An electronic device, comprising: one or more processors and one or more memories;

The one or more memories are coupled to the one or more processors, and the one or more memories are used to store computer program code, where the computer program code includes computer instructions, and when the one or more processors When the computer instruction is executed, the electronic device executes a method for determining a weight parameter of a neural network model according to any one of claims 1 to 14.

A computer storage medium, comprising computer instructions that, when the computer instructions run on an electronic device, cause the electronic device to execute the weight parameter of the neural network model according to any one of claims 1 to 14. Determine the method.

A computer program product, wherein when the computer program product is run on a computer, the computer is caused to execute the method for determining a weight parameter of a neural network model according to any one of claims 1 to 14.