
CN116611470A - Method for setting quantization bit number of neural network model - Google Patents


Info

Publication number
CN116611470A
Authority
CN
China
Prior art keywords
quantization
model
memory
loss
activation value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310460434.9A
Other languages
Chinese (zh)
Inventor
陈其宾
段强
姜凯
李锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Science Research Institute Co Ltd
Original Assignee
Shandong Inspur Science Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Science Research Institute Co Ltd filed Critical Shandong Inspur Science Research Institute Co Ltd
Priority to CN202310460434.9A priority Critical patent/CN116611470A/en
Priority to PCT/CN2023/100668 priority patent/WO2024221573A1/en
Publication of CN116611470A publication Critical patent/CN116611470A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention relates to the technical field of model quantization, and specifically to a method for setting the quantization bit widths of a neural network model, comprising the following steps: prepare a dataset containing a small amount of data; define a quantization-loss evaluation metric; define the candidate model quantization bit widths; obtain the model-related memory limit; define the target neural network model; compute the quantization loss of each activation at every candidate bit width; and analyze the model inference steps and the activations resident in memory at each moment. The beneficial effect is that the proposed method solves the problem of selecting activation quantization bit widths under a memory constraint. It first analyzes the accuracy loss of activations at different bit widths, then analyzes the set of activations in memory at each moment, and constructs an optimization problem: subject to a limit on the memory occupied by activations at every moment, integer linear programming yields the bit-width assignment with the minimal accuracy loss.

Description

A method for setting the quantization bit widths of a neural network model

Technical Field

The present invention relates to the technical field of model quantization, and in particular to a method for setting the quantization bit widths of a neural network model.

Background Art

With the continuous development of deep learning, neural network models are widely used across many industries and scenarios. However, deep learning models have large parameter counts and heavy computation, placing high demands on hardware resources that often conflict with the limitations of those resources.

In the prior art, the Internet of Things field contains a large number of embedded devices whose hardware resources are so limited that deploying deep learning models on them is difficult. In addition, deep learning models keep growing; even on servers or in the cloud, the pressure of deploying large models is becoming apparent. Among these constraints, memory is the key factor restricting deployment: it determines whether a model can be deployed on a device at all.

During the inference of a deep learning model, it is generally the activations that occupy most of the memory. Model quantization, especially mixed-precision quantization, is an effective way to reduce a model's memory footprint. Compared with conventional quantization, however, mixed-precision quantization often causes a larger accuracy loss, and choosing different quantization bit widths for different weights and activations is the key factor determining that loss. Existing methods either address mixed-precision quantization of model weights without covering the choice of activation bit widths, or select activation bit widths without targeting the model's accuracy loss.

Summary of the Invention

The purpose of the present invention is to provide a method for setting the quantization bit widths of a neural network model, so as to solve the problems raised in the Background Art above.

To achieve the above purpose, the present invention provides the following technical solution: a method for setting the quantization bit widths of a neural network model, the method comprising the following steps:

Prepare a dataset containing a small amount of data;

Define a quantization-loss evaluation metric;

Define the candidate model quantization bit widths;

Obtain the model-related memory limit;

Define the target neural network model;

Compute the quantization loss of each activation at every candidate bit width;

Analyze the model inference steps and the activations in memory at each moment;

Construct the optimization problem and solve it by integer linear programming;

Quantize the model and run inference.

Preferably, when preparing the dataset containing a small amount of data, the dataset serves both as the calibration dataset and as the dataset for computing quantization loss.

Preferably, when defining the quantization-loss evaluation metric, the difference between the unquantized activation data and the quantized activation data is computed as the quantization loss; the difference may be measured by, but is not limited to, mean squared error or cosine similarity.

Preferably, when defining the model quantization bit widths, the bit widths are chosen based on hardware support and model accuracy requirements, for example but not limited to 2, 4, 6 and 8 bits; Bitwidth denotes the set of candidate bit widths,

Bitwidth = {2, 4, 6, 8}.

Preferably, when obtaining the model-related memory limit: during inference, the inference-related memory footprint comprises the inference-framework memory, code and data memory, model-weight memory and activation memory, of which the model-related part consists mainly of the model-weight memory and the activation memory. Let the hardware memory size be m_total and the memory occupied outside the model-related part be m_other; the remaining memory, m_model, is the model-related memory limit.

Preferably, when defining the target neural network model, the model is a mainstream neural network such as, but not limited to, ResNet, MobileNet or SSD.

Preferably, when computing the quantization loss of each activation at every candidate bit width: for each sample in the loss-computation dataset, run inference with the unquantized model and collect all activation data; for every activation and every candidate bit width, compute the quantization loss; after all samples have been processed, aggregate the per-sample losses (by, but not limited to, the mean) into the loss set Loss over all activations and bit widths. The formula is as follows, where loss_ab denotes the quantization loss of activation a at bit width b: Loss = {loss_ab}, a ∈ Activations, b ∈ Bitwidth.

Preferably, when analyzing the model inference steps and the activations in memory at each moment: during inference the model weights stay in memory throughout, whereas activations that are no longer needed are destroyed promptly to release their memory. Let Activations denote all activations of the model and activation_set_t the set of activations in memory at moment t; any activation_set_t is therefore a subset of Activations, and the same activation may appear in the sets of several moments. Stepping through the inference yields the collection of in-memory activation sets at all moments, Activation_sets = {activation_set_t | t ∈ T}, where T is the set of all moments in the inference process.

Preferably, when constructing the optimization problem and solving it by integer linear programming: the objective is to minimize the total quantization loss, and the constraint is that the memory occupied by the activation set at every moment is below the activation memory limit. Since the memory occupied by the model weights, m_weight, can be computed in advance, the activation memory limit m_c is computed as

m_c = m_model − m_weight

Let total_loss denote the total quantization loss, m_t the memory occupied by the activation set at moment t, m_a the memory occupied by activation a, size_a the number of elements of activation a, and b_a the quantization bit width of activation a; loss_{a,b_a} is obtained from the set Loss:

min total_loss, where total_loss = Σ_{a ∈ Activations} loss_{a,b_a}

subject to m_t = Σ_{a ∈ activation_set_t} m_a ≤ m_c for every t ∈ T, with

m_a = size_a · b_a

The optimization problem is an integer linear program whose variables are the quantization bit widths b_a; solving it by integer linear programming yields the quantization bit width of each activation.

Preferably, during model quantization and inference, the model is quantized with the optimized bit widths and inference is then performed.

Compared with the prior art, the beneficial effects of the present invention are:

The proposed method solves the problem of selecting activation quantization bit widths under a memory constraint. It first analyzes the accuracy loss of activations at different bit widths, then analyzes the set of activations in memory at each moment, and constructs an optimization problem: subject to a limit on the memory occupied by activations at every moment, integer linear programming yields the minimal accuracy loss. In this way, the optimal activation quantization bit widths can be chosen under the hardware memory limit so that the accuracy loss is minimized, which has high practical and innovative value.

Brief Description of the Drawings

Fig. 1 is a flow chart of the method of the present invention.

Detailed Description of the Embodiments

To describe the purpose, technical solution and advantages of the present invention clearly and completely, the embodiments of the present invention are described in further detail below with reference to the accompanying drawing. It should be understood that the specific embodiments described here are only some, not all, of the embodiments of the present invention; they serve to explain the embodiments, not to limit them. All other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Referring to Fig. 1, the present invention provides a technical solution: a method for setting the quantization bit widths of a neural network model, the method comprising the following steps.

First, prepare a dataset containing a small amount of data. This dataset serves both as the calibration dataset and as the dataset for computing quantization loss.

Second, define the quantization-loss evaluation metric. The quantization loss is the difference between the unquantized activation data and the quantized activation data; the difference may be measured by, but is not limited to, mean squared error or cosine similarity. Unquantized activation data are obtained by evaluating the corresponding operator (convolution, fully connected layer, etc.) with floating-point weights and input (the previous activation). Quantized activation data are obtained by quantizing the floating-point weights and the input separately, evaluating the operator on the quantized data, and dequantizing the result.
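The quantize–dequantize comparison described above can be sketched in Python. The symmetric per-tensor scheme and the function names below are illustrative assumptions, not necessarily the patent's exact quantizer:

```python
import numpy as np

def fake_quantize(x, bits):
    # symmetric per-tensor quantize-then-dequantize (illustrative scheme)
    qmax = 2.0 ** (bits - 1) - 1
    scale = max(float(np.max(np.abs(x))), 1e-12) / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def quantization_loss(x, bits, metric="mse"):
    # difference between the float activation and its quantized copy
    xq = fake_quantize(x, bits)
    if metric == "mse":
        return float(np.mean((x - xq) ** 2))
    if metric == "cosine":  # 1 - cosine similarity, so lower is better
        num = float(np.dot(x.ravel(), xq.ravel()))
        den = float(np.linalg.norm(x) * np.linalg.norm(xq)) or 1e-12
        return 1.0 - num / den
    raise ValueError(metric)

rng = np.random.default_rng(0)
act = rng.standard_normal((4, 8))
# coarser bit widths should lose more information
assert quantization_loss(act, 8) < quantization_loss(act, 2)
```

In a real pipeline the operator's weights and input would each be quantized before the computation and the output dequantized, as the paragraph above describes; the sketch only quantizes a finished activation tensor to keep the loss metric in focus.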

Third, define the model quantization bit widths. Based on hardware support and model accuracy requirements, the bit widths may be, but are not limited to, 2, 4, 6 and 8 bits; Bitwidth denotes the set of candidate bit widths.

Bitwidth = {2, 4, 6, 8}

Fourth, obtain the model-related memory limit. During inference, the inference-related memory footprint includes the inference-framework memory, code and data memory, model-weight memory and activation memory, of which the model-related part consists mainly of the model-weight memory and the activation memory. Let the hardware memory size be m_total and the memory occupied outside the model-related part be m_other; the remaining memory, m_model, is the model-related memory limit.

m_model = m_total − m_other

Fifth, define the target neural network model. The model is a mainstream neural network such as, but not limited to, ResNet, MobileNet or SSD.

Sixth, compute the quantization loss of each activation at every candidate bit width. For each sample in the loss-computation dataset, run inference with the unquantized model and collect all activation data. For every activation and every candidate bit width, compute the quantization loss. After all samples have been processed, aggregate the per-sample losses (by, but not limited to, the mean) into the loss set Loss of all activations at all bit widths. The formula is as follows, where loss_ab denotes the quantization loss of activation a at bit width b.

Loss = {loss_ab}, a ∈ Activations, b ∈ Bitwidth
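As a sketch of this step, the loop below builds the Loss table for a hypothetical two-layer stand-in model; the model, the activation names `a1`/`a2`, and the quantizer are assumptions for illustration, not the patent's model:

```python
import numpy as np

BITWIDTH = [2, 4, 6, 8]

def fake_quantize(x, bits):
    # symmetric per-tensor quantize-then-dequantize (illustrative)
    qmax = 2.0 ** (bits - 1) - 1
    scale = max(float(np.max(np.abs(x))), 1e-12) / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def float_inference(sample):
    # stand-in for the pre-quantization model: returns named activations
    w1 = np.linspace(-1.0, 1.0, 8 * 16).reshape(8, 16)
    w2 = np.linspace(-1.0, 1.0, 16 * 4).reshape(16, 4)
    a1 = np.tanh(sample @ w1)
    a2 = np.maximum(a1 @ w2, 0.0)
    return {"a1": a1, "a2": a2}

rng = np.random.default_rng(0)
samples = [rng.standard_normal(8) for _ in range(16)]  # the "small dataset"

per_sample = {}  # (activation name, bit width) -> list of per-sample MSEs
for s in samples:
    for name, a in float_inference(s).items():
        for b in BITWIDTH:
            err = float(np.mean((a - fake_quantize(a, b)) ** 2))
            per_sample.setdefault((name, b), []).append(err)

# Loss = {loss_ab}: per-sample losses aggregated here by the mean
Loss = {key: float(np.mean(v)) for key, v in per_sample.items()}
```

The resulting dictionary plays the role of the set Loss in the formula above: one mean loss per (activation, bit width) pair.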

Seventh, analyze the model inference steps and the activations in memory at each moment. During inference the model weights stay in memory throughout, whereas activations that are no longer needed are destroyed promptly to release their memory. Let Activations denote all activations of the model and activation_set_t the set of activations in memory at moment t; any activation_set_t is therefore a subset of Activations, and the same activation may appear in the sets of several moments. Stepping through the inference yields the collection of in-memory activation sets at all moments, Activation_sets = {activation_set_t | t ∈ T}, where T is the set of all moments in the inference process.
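A minimal liveness analysis in the spirit of this step might look as follows; the five activation lifetimes are hypothetical, standing in for what a real inference trace would record:

```python
# Each activation is produced at step `born` and last consumed at step
# `dies`; between those steps it must stay resident in memory.
lifetimes = [
    ("a1", 0, 1),
    ("a2", 1, 3),  # e.g. a skip connection keeps a2 alive across a3
    ("a3", 2, 3),
    ("a4", 3, 4),
    ("a5", 4, 4),
]

T = range(5)  # the moments of the inference process
activation_sets = {
    t: {name for name, born, dies in lifetimes if born <= t <= dies}
    for t in T
}

# every activation_set_t is a subset of Activations, and the same
# activation appears in several sets while it remains live
Activations = {name for name, _, _ in lifetimes}
assert all(s <= Activations for s in activation_sets.values())
assert activation_sets[2] == {"a2", "a3"}
```

These per-moment sets are exactly what the memory constraint in the next step ranges over.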

Eighth, construct the optimization problem and solve it by integer linear programming. The objective is to minimize the total quantization loss; the constraint is that the memory occupied by the activation set at every moment is below the activation memory limit. Since the memory occupied by the model weights, m_weight, can be computed in advance, the activation memory limit m_c is computed as follows.

m_c = m_model − m_weight

Let total_loss denote the total quantization loss, m_t the memory occupied by the activation set at moment t, m_a the memory occupied by activation a, size_a the number of elements of activation a, and b_a the quantization bit width of activation a; loss_{a,b_a} is obtained from the set Loss.

min total_loss, where total_loss = Σ_{a ∈ Activations} loss_{a,b_a}

subject to m_t = Σ_{a ∈ activation_set_t} m_a ≤ m_c for every t ∈ T, with

m_a = size_a · b_a

The above optimization problem is an integer linear program whose variables are the quantization bit widths b_a of the activations; it can be solved by an integer linear programming method to obtain the bit width of each activation.
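The patent solves this with an integer linear programming method; for a toy instance the same objective and per-moment memory constraint can be checked by exhaustive search over bit-width assignments. The sizes, loss table and live sets below are made-up illustration data:

```python
from itertools import product

BITWIDTH = [2, 4, 6, 8]
# hypothetical per-activation element counts and per-bit-width losses
size = {"a1": 1024, "a2": 512}
loss = {("a1", 2): 0.9, ("a1", 4): 0.3, ("a1", 6): 0.1, ("a1", 8): 0.02,
        ("a2", 2): 0.5, ("a2", 4): 0.2, ("a2", 6): 0.05, ("a2", 8): 0.01}
activation_sets = {0: {"a1"}, 1: {"a1", "a2"}, 2: {"a2"}}
m_c = 1024 * 6 + 512 * 8  # activation memory budget, in bits

names = sorted(size)
best = None
for bits in product(BITWIDTH, repeat=len(names)):
    b = dict(zip(names, bits))
    # constraint: m_t = sum of m_a over the live set stays within m_c
    if any(sum(size[a] * b[a] for a in live) > m_c
           for live in activation_sets.values()):
        continue
    total = sum(loss[(a, b[a])] for a in names)  # total_loss
    if best is None or total < best[0]:
        best = (total, b)

total_loss, assignment = best
```

With this budget the binding moment is t = 1, where a1 and a2 are live together, so the search trades a1 down to 6 bits to keep a2 at 8. At realistic scale the exhaustive loop would be replaced by an actual ILP solver (e.g. PuLP or `scipy.optimize.milp`).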

Ninth, quantize the model and run inference. The model is quantized with the optimized bit widths, and inference is then performed.

Although embodiments of the present invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the present invention; the scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A method for setting quantization bit numbers of a neural network model, characterized in that the method comprises the following steps:
preparing a dataset comprising a small amount of data;
defining a quantization loss evaluation index;
defining a model quantization bit number;
acquiring a related memory limit of a model;
defining a target neural network model;
calculating quantization losses of different quantization bits of the activation value;
analyzing the model reasoning step and the activation values in the memory at each moment;
constructing an optimization problem, and solving by adopting an integer linear programming method;
model quantization and reasoning.
2. The method for setting quantization bit numbers of a neural network model according to claim 1, wherein: when preparing a data set containing a small amount of data, the data set is used as a calibration data set and a quantization loss calculation data set.
3. The method for setting quantization bit numbers of a neural network model according to claim 1, wherein: when the quantization loss evaluation index is defined, the difference between the non-quantized activation value data and the quantized activation value data is calculated as the quantization loss; the difference is calculated by, but not limited to, mean square error or cosine similarity.
4. The method for setting quantization bit numbers of a neural network model according to claim 1, wherein: when defining the model quantization bit number, based on the hardware support condition and the model precision requirement, the model quantization bit number is, but is not limited to, 2, 4, 6 or 8 bits, with Bitwidth representing the set of quantization bit numbers,
Bitwidth = {2, 4, 6, 8}.
5. The method for setting quantization bit numbers of a neural network model according to claim 1, wherein: when the model-related memory limit is acquired, in the model reasoning process, reasoning-related memory occupation comprises reasoning-framework memory occupation, code and data memory occupation, model-weight memory occupation and activation-value memory occupation, wherein the model-related memory occupation mainly comprises model-weight memory occupation and activation-value memory occupation; assume that the hardware memory size is m_total and the memory outside the model-related memory occupation is m_other; the remaining memory size is m_model, i.e. the model-related memory limit.
6. The method for setting quantization bit numbers of a neural network model according to claim 1, wherein: when defining the target neural network model, the neural network model is a mainstream neural network model including, but not limited to, ResNet, MobileNet and SSD.
7. The method for setting quantization bit numbers of a neural network model according to claim 1, wherein: when calculating the quantization losses of the different quantization bit numbers of the activation values, for each sample in the loss-calculation data set, reasoning is performed with the pre-quantization model and all activation value data are obtained; for all activation value data, the quantization loss of each activation value at each quantization bit number is calculated; after reasoning over all sample data is completed, the quantization losses of the different quantization bit numbers of each activation value are aggregated, by, but not limited to, the mean, into the quantization loss set Loss of the different quantization bit numbers of each activation value over all sample data; the formula is as follows, where loss_ab represents the quantization loss of activation value a at quantization bit number b: Loss = {loss_ab}, a ∈ Activations, b ∈ Bitwidth.
8. The method for setting quantization bit numbers of a neural network model according to claim 1, wherein: when analyzing the model reasoning steps and the activation values in the memory at all times, in the model reasoning process the model weights are always stored in the memory, and activation values that are no longer used are destroyed in time so that the occupied memory space is released; let Activations represent all the activation values of the model and activation_set_t represent the set of activation values in memory at time t, so that any activation_set_t is a subset of Activations, while the same activation value may appear in the sets of several times; reasoning step by step yields the set of in-memory activation value sets at all times, Activation_sets = {activation_set_t | t ∈ T}, where T represents the set of all moments in the reasoning process.
9. The method for setting quantization bit numbers of a neural network model according to claim 1, characterized in that: when the optimization problem is constructed and solved by an integer linear programming method, the optimization target is the minimum total quantization loss, and the constraint condition is that the memory occupied by the activation value set at each moment is smaller than the activation value memory limit; since the memory occupied by the model weights can be calculated in advance and is assumed to be m_weight, let m_c represent the activation value memory limit, calculated using the following formula:
m_c = m_model − m_weight
let total_loss denote the total quantization loss, m_t the memory size occupied by the activation value set at time t, m_a the memory size occupied by activation value a, size_a the number of elements of activation value a, and b_a the quantization bit number of activation value a; loss_{a,b_a} is obtained from the set Loss,
min total_loss, where total_loss = Σ_{a ∈ Activations} loss_{a,b_a}
subject to m_t = Σ_{a ∈ activation_set_t} m_a ≤ m_c for every t ∈ T, with
m_a = size_a * b_a
the optimization problem is an integer linear programming problem whose optimization variables are the quantization bit numbers b_a of the activation values; solving it by an integer linear programming method yields the quantization bit number of each activation value.
10. The method for setting quantization bit numbers of a neural network model according to claim 1, wherein: during model quantization and reasoning, model quantization is performed by using the quantization bit number obtained by optimization, and model reasoning is further performed.
CN202310460434.9A 2023-04-23 2023-04-23 Method for setting quantization bit number of neural network model Pending CN116611470A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202310460434.9A CN116611470A (en) 2023-04-23 2023-04-23 Method for setting quantization bit number of neural network model
PCT/CN2023/100668 WO2024221573A1 (en) 2023-04-23 2023-06-16 Method for setting quantization bit of neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310460434.9A CN116611470A (en) 2023-04-23 2023-04-23 Method for setting quantization bit number of neural network model

Publications (1)

Publication Number Publication Date
CN116611470A true CN116611470A (en) 2023-08-18

Family

ID=87675535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310460434.9A Pending CN116611470A (en) 2023-04-23 2023-04-23 Method for setting quantization bit number of neural network model

Country Status (2)

Country Link
CN (1) CN116611470A (en)
WO (1) WO2024221573A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118331734A (en) * 2024-04-22 2024-07-12 上海交通大学 Large language model memory scheduling management method, system and storage medium
WO2025222637A1 (en) * 2024-04-23 2025-10-30 山东浪潮科学研究院有限公司 Mixture-of-experts model quantization method and apparatus, and device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110799994A (en) * 2017-08-14 2020-02-14 美的集团股份有限公司 Adaptive Bit Width Reduction for Neural Networks
WO2021012148A1 (en) * 2019-07-22 2021-01-28 深圳市大疆创新科技有限公司 Data processing method and apparatus based on deep neural network, and mobile device
CN113554097A (en) * 2021-07-26 2021-10-26 北京市商汤科技开发有限公司 Model quantization method and device, electronic equipment and storage medium
KR20210144534A (en) * 2020-05-22 2021-11-30 삼성전자주식회사 Neural network based training method, inference method and apparatus
CN114692814A (en) * 2020-12-31 2022-07-01 合肥君正科技有限公司 Quantification method for optimizing neural network model activation
KR20220125112A (en) * 2021-03-04 2022-09-14 삼성전자주식회사 Neural network computation method and apparatus using quantization
CN115983349A (en) * 2023-02-10 2023-04-18 哲库科技(上海)有限公司 Method and device for quantizing convolutional neural network, electronic device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950715A (en) * 2020-08-24 2020-11-17 云知声智能科技股份有限公司 8-bit integer full-quantization inference method and device based on self-adaptive dynamic shift
CN114021691A (en) * 2021-10-13 2022-02-08 山东浪潮科学研究院有限公司 Neural network model quantification method, system, device and computer readable medium
CN114841339A (en) * 2022-04-18 2022-08-02 美的集团(上海)有限公司 Network model quantification method and device, electronic equipment and storage medium
CN115879525A (en) * 2022-12-01 2023-03-31 Oppo(重庆)智能科技有限公司 Neural network model quantification method and device, storage medium and electronic device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110799994A (en) * 2017-08-14 2020-02-14 美的集团股份有限公司 Adaptive Bit Width Reduction for Neural Networks
WO2021012148A1 (en) * 2019-07-22 2021-01-28 深圳市大疆创新科技有限公司 Data processing method and apparatus based on deep neural network, and mobile device
KR20210144534A (en) * 2020-05-22 2021-11-30 삼성전자주식회사 Neural network based training method, inference method and apparatus
CN114692814A (en) * 2020-12-31 2022-07-01 合肥君正科技有限公司 Quantification method for optimizing neural network model activation
KR20220125112A (en) * 2021-03-04 2022-09-14 삼성전자주식회사 Neural network computation method and apparatus using quantization
CN113554097A (en) * 2021-07-26 2021-10-26 北京市商汤科技开发有限公司 Model quantization method and device, electronic equipment and storage medium
CN115983349A (en) * 2023-02-10 2023-04-18 哲库科技(上海)有限公司 Method and device for quantizing convolutional neural network, electronic device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
EFSTATHIA SOUFLERI: "Network Compression via Mixed Precision Quantization Using a Multi-Layer Perceptron for the Bit-Width Allocation", IEEE ACCESS, 29 September 2021 (2021-09-29), pages 135059, XP011881165, DOI: 10.1109/ACCESS.2021.3116418 *
YIN WENFENG; LIANG LINGYAN; PENG HUIMIN; CAO QICHUN; ZHAO JIAN; DONG GANG; ZHAO YAQIAN; ZHAO KUN: "Research Progress on Convolutional Neural Network Compression and Acceleration Techniques", Computer Systems & Applications, no. 09, 15 September 2020 (2020-09-15), pages 20 - 29 *
ZHANG MINGCHENG: "Mathematical Modeling Methods and Applications", 31 March 2020, Shandong People's Publishing House, pages: 46 - 50 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118331734A (en) * 2024-04-22 2024-07-12 上海交通大学 Large language model memory scheduling management method, system and storage medium
WO2025222637A1 (en) * 2024-04-23 2025-10-30 山东浪潮科学研究院有限公司 Mixture-of-experts model quantization method and apparatus, and device and storage medium

Also Published As

Publication number Publication date
WO2024221573A1 (en) 2024-10-31

Similar Documents

Publication Publication Date Title
KR102728799B1 (en) Method and apparatus of artificial neural network quantization
KR102782965B1 (en) Method and apparatus of artificial neural network quantization
CN115392477B (en) A Deep Learning-Based Method and Apparatus for Estimating Cardinality of Skyline Queries
CN116611470A (en) Method for setting quantization bit number of neural network model
Lu et al. One proxy device is enough for hardware-aware neural architecture search
CN114781650B (en) Data processing method, device, equipment and storage medium
CN108805257A (en) A kind of neural network quantization method based on parameter norm
CN110874625B (en) A data processing method and device
CN113537370A (en) Cloud computing-based financial data processing method and system
CN114168318B (en) Storage release model training method, storage release method and device
CN114661665B (en) Execution engine determination method, model training method and device
CN110503182A (en) Network layer operation method and device in deep neural network
CN118297121A (en) A hybrid expert model quantization method, device, equipment and storage medium
CN117033391A (en) Database indexing method, device, server and medium
CN119884672B (en) Watershed water quality prediction method, device, equipment and medium
CN112257958A (en) Power saturation load prediction method and device
CN116187387A (en) Neural network model quantification method, device, computer equipment and storage medium
CN116090571B (en) Quantum linear solution method, device and medium based on generalized minimal residual
CN115862653A (en) Audio denoising method, device, computer equipment and storage medium
CN116611494A (en) Training method and device for electric power defect detection model, computer equipment and medium
CN109766993B (en) A Convolutional Neural Network Compression Method Suitable for Hardware
CN116738127A (en) A method, device, medium and electronic device for solving differential equations
CN120449946B (en) Self-adaptive hybrid precision quantization network generation method based on self-learning
CN116263883B (en) Quantum linear solving method, device and equipment based on polynomial preprocessor
US20240361988A1 (en) Optimizing method and computing system for deep learning network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination