CN116611470A - Method for setting quantization bit number of neural network model
- Publication number
- CN116611470A (application number CN202310460434.9A)
- Authority
- CN
- China
- Prior art keywords
- quantization
- model
- memory
- loss
- activation value
- Legal status
- Pending
Classifications
- G06N3/04—Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
- G06N5/04—Computing arrangements using knowledge-based models; inference or reasoning models
- Y02D10/00—Climate change mitigation technologies in ICT; energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present invention relates to the technical field of model quantization, and specifically to a method for setting the quantization bit widths of a neural network model, comprising the following steps: preparing a dataset containing a small amount of data; defining a quantization-loss evaluation metric; defining the candidate model quantization bit widths; obtaining the model-related memory limit; defining the target neural network model; computing the quantization loss of each activation value at each candidate bit width; and analyzing the model's inference steps and the activation values in memory at each step. The beneficial effect is that the proposed method solves the problem of selecting activation-value quantization bit widths under a memory constraint: it first analyzes the precision loss of the activation values at different bit widths, then analyzes the set of activation values in memory at each step, and constructs an optimization problem that, subject to a limit on the activation memory at every step, is solved by integer linear programming to obtain the minimal precision loss.
Description
Technical Field

The invention relates to the technical field of model quantization, and in particular to a method for setting the quantization bit widths of a neural network model.
Background

With the continuous development of deep learning, neural network models are widely used across many industries and scenarios. However, deep learning models involve large numbers of parameters and heavy computation, placing high demands on hardware resources that often conflict with the limitations of the available hardware.

In the prior art, the Internet of Things involves a large number of embedded devices whose hardware resources are so limited that deploying deep learning models on them is difficult. Moreover, as deep learning models keep growing, the pressure of deploying large models is becoming apparent even on servers or in the cloud. Among these constraints, the memory limit is the key factor: it determines whether a model can be deployed on a device at all.

During inference, however, it is generally the activation values that occupy most of the memory. Model quantization, and multi-precision (mixed-precision) quantization in particular, is an effective way to reduce a model's memory footprint. Compared with traditional quantization, multi-precision quantization often causes a larger accuracy loss, and the choice of quantization bit widths for the different weights and activation values is the key factor determining that loss. Existing methods either address multi-precision quantization of model weights without covering the selection of activation bit widths, or select activation bit widths without taking the model's accuracy loss as an objective.
Summary of the Invention

The purpose of the present invention is to provide a method for setting the quantization bit widths of a neural network model, so as to solve the problems raised in the background above.

To achieve this purpose, the present invention provides the following technical solution: a method for setting the quantization bit widths of a neural network model, comprising the following steps:

preparing a dataset containing a small amount of data;

defining a quantization-loss evaluation metric;

defining the candidate model quantization bit widths;

obtaining the model-related memory limit;

defining the target neural network model;

computing the quantization loss of each activation value at each candidate bit width;

analyzing the inference steps of the model and the activation values in memory at each step;

constructing an optimization problem and solving it by integer linear programming;

quantizing the model and running inference.
Preferably, when preparing the dataset containing a small amount of data, the dataset serves both as the calibration dataset and as the dataset for computing quantization loss.

Preferably, when defining the quantization-loss evaluation metric, the quantization loss is the difference between the non-quantized activation data and the quantized activation data; the difference may be computed by, but is not limited to, mean squared error or cosine similarity.

Preferably, when defining the model quantization bit widths, the bit widths are chosen according to hardware support and model accuracy requirements; they may be, but are not limited to, 2, 4, 6 and 8 bits, with Bitwidth denoting the set of candidate bit widths:

Bitwidth = {2, 4, 6, 8}.

Preferably, when obtaining the model-related memory limit: during inference, the inference-related memory footprint comprises the inference-framework memory, the code and data memory, the model-weight memory and the activation memory, of which the model-related footprint consists mainly of the model-weight memory and the activation memory. Let the hardware memory size be m_total and the memory occupied by everything other than the model be m_other; the remaining memory, m_model = m_total − m_other, is the model-related memory limit.

Preferably, when defining the target neural network model, the model is a mainstream neural network such as, but not limited to, ResNet, MobileNet or SSD.

Preferably, when computing the quantization loss of each activation at each candidate bit width: for every sample in the loss-computation dataset, run inference with the non-quantized model and record all activation data; for every activation, compute the quantization loss at every candidate bit width. After inference over all samples is complete, the per-activation, per-bit-width losses are aggregated over the samples (by, but not limited to, the mean) to obtain the loss set Loss, where loss_ab denotes the quantization loss of activation a at bit width b:

Loss = {loss_ab}, a ∈ Activations, b ∈ Bitwidth.

Preferably, when analyzing the model's inference steps and the activations in memory at each step: during inference, the model weights stay in memory throughout, while activations that are no longer needed are destroyed promptly to release the memory they occupy. Let Activations denote all activations of the model and activation_set_t the set of activations in memory at step t; every activation_set_t is thus a subset of Activations, and elements may recur across the sets at different steps. Stepping through the inference yields the collection of in-memory activation sets at all steps, where T denotes the set of all steps in the inference process:

Activation_sets = {activation_set_t | t ∈ T}.

Preferably, when constructing the optimization problem and solving it by integer linear programming: the objective is to minimize the total quantization loss, and the constraint is that the memory occupied by the activation set at every step is below the activation-memory limit. Since the memory occupied by the model weights can be computed in advance, denote it m_weight and let m_c denote the activation-memory limit, computed as

m_c = m_model − m_weight.

Let total_loss denote the total quantization loss, m_t the memory occupied by the activation set at step t, m_a the memory occupied by activation a, size_a the number of elements of activation a, and b_a the quantization bit width of activation a; the loss values loss_{a,b_a} are obtained from the set Loss:

min total_loss, where total_loss = Σ_{a ∈ Activations} loss_{a,b_a}

m_a = size_a × b_a

m_t = Σ_{a ∈ activation_set_t} m_a ≤ m_c, for all t ∈ T.

The optimization problem is an integer linear program whose decision variables are the quantization bit widths b_a; solving it by an integer linear programming method yields the quantization bit width of each activation.

Preferably, for model quantization and inference, the model is quantized with the bit widths obtained from the optimization and inference is then performed.
Compared with the prior art, the beneficial effects of the present invention are:

The proposed method for setting the quantization bit widths of a neural network model solves the problem of selecting activation bit widths under a memory constraint. It first analyzes the precision loss of each activation at different bit widths, then analyzes the set of activations in memory at each step, and constructs an optimization problem that, subject to a limit on the activation memory at every step, is solved by integer linear programming to obtain the minimal precision loss. In this way, the optimal activation bit widths can be selected under the hardware memory constraint so that the precision loss is minimized, which is of high practical and innovative value.
Brief Description of the Drawings

Fig. 1 is a flow chart of the method of the present invention.
Detailed Description

To make the purpose, technical solution and advantages of the present invention clear and complete, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only some, not all, of the embodiments of the present invention; they serve to explain the embodiments and are not intended to limit them. All other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, the present invention provides a technical solution: a method for setting the quantization bit widths of a neural network model, comprising the following steps.
First, prepare a dataset containing a small amount of data. This dataset serves both as the calibration dataset and as the dataset for computing quantization loss.
Second, define the quantization-loss evaluation metric. The quantization loss is computed as the difference between the non-quantized activation data and the quantized activation data; the difference may be computed by, but is not limited to, mean squared error or cosine similarity. The non-quantized activation data are obtained by computing the corresponding operator (convolution, fully connected layer, etc.) with floating-point weights and inputs (the previous activation data). The quantized activation data are obtained by quantizing the floating-point weights and the input data separately, computing the corresponding operator on the quantized data, and dequantizing the result.
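The loss metric above can be sketched in a few lines. This is a minimal illustration, assuming symmetric uniform (fake) quantization and MSE as the difference measure; the activation data are made up, whereas real activations would come from the model:

```python
def quantize_dequantize(x, bits):
    """Fake-quantize a tensor (list of floats) to `bits` bits and back."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    scale = (max(abs(v) for v in x) / qmax) or 1.0  # avoid a zero scale for all-zero input
    return [round(v / scale) * scale for v in x]

def mse_loss(x, x_q):
    """Mean squared error between float and fake-quantized activations."""
    return sum((a - b) ** 2 for a, b in zip(x, x_q)) / len(x)

acts = [0.1, -0.5, 0.8, 0.3]                        # illustrative activation data
for bits in (2, 4, 8):
    q = quantize_dequantize(acts, bits)
    print(f"{bits}-bit loss: {mse_loss(acts, q):.6f}")
```

As expected, the loss shrinks as the bit width grows, which is what the bit-width selection later trades against memory.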
Third, define the candidate model quantization bit widths. Based on hardware support and model accuracy requirements, the bit widths may be, but are not limited to, 2, 4, 6 and 8 bits, with Bitwidth denoting the set of candidate bit widths:

Bitwidth = {2, 4, 6, 8}.
Fourth, obtain the model-related memory limit. During inference, the inference-related memory footprint includes the inference-framework memory, the code and data memory, the model-weight memory and the activation memory, of which the model-related footprint consists mainly of the model-weight memory and the activation memory. Let the hardware memory size be m_total and the memory occupied by everything other than the model be m_other; the remaining memory, m_model, is the model-related memory limit:

m_model = m_total − m_other.
Fifth, define the target neural network model. The model is a mainstream neural network such as, but not limited to, ResNet, MobileNet or SSD.
Sixth, compute the quantization loss of each activation at each candidate bit width. For every sample in the loss-computation dataset, run inference with the non-quantized model and record all activation data. For every activation, compute the quantization loss at every candidate bit width. After inference over all samples is complete, aggregate the per-activation, per-bit-width losses over the samples (by, but not limited to, the mean) to obtain the loss set Loss over all samples, where loss_ab denotes the quantization loss of activation a at bit width b:

Loss = {loss_ab}, a ∈ Activations, b ∈ Bitwidth.
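Construction of the loss table can be sketched as follows. The `collect_activations` helper and the sample data are hypothetical stand-ins (a real implementation would record activations with forward hooks on the float model); the fake-quantize and MSE helpers follow the metric defined above:

```python
from statistics import mean

BITWIDTH = [2, 4, 6, 8]

def fake_quant(x, bits):
    """Symmetric uniform fake quantization of a float list (sketch)."""
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(v) for v in x) / qmax) or 1.0
    return [round(v / scale) * scale for v in x]

def mse(x, y):
    return mean((a - b) ** 2 for a, b in zip(x, y))

def collect_activations(sample):
    # Hypothetical stand-in for running the float model with hooks that
    # record every intermediate tensor (a toy two-activation "model").
    return {"conv1": [v * 0.9 for v in sample], "fc": [v - 0.1 for v in sample]}

samples = [[0.2, -0.4, 0.7], [0.5, 0.1, -0.3]]      # the small loss-computation dataset
per_sample = []                                      # one {(name, bits): loss} dict per sample
for s in samples:
    acts = collect_activations(s)
    per_sample.append({(name, b): mse(x, fake_quant(x, b))
                       for name, x in acts.items() for b in BITWIDTH})

# Aggregate over samples by the mean: Loss[(a, b)] corresponds to loss_ab.
Loss = {key: mean(d[key] for d in per_sample) for key in per_sample[0]}
print(Loss[("conv1", 2)], Loss[("conv1", 8)])
```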
Seventh, analyze the model's inference steps and the activations in memory at each step. During inference, the model weights stay in memory throughout, while activations that are no longer needed are destroyed promptly to release the memory they occupy. Let Activations denote all activations of the model and activation_set_t the set of activations in memory at step t; every activation_set_t is thus a subset of Activations, and elements may recur across the sets at different steps. Stepping through the inference yields the collection of in-memory activation sets at all steps, where T denotes the set of all steps in the inference process:

Activation_sets = {activation_set_t | t ∈ T}.
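The per-step liveness analysis can be sketched on a toy layer graph. The four-layer graph below is illustrative; a real implementation would walk the model's actual computation graph:

```python
# Each layer is (output_name, input_names); the 4-layer graph is illustrative.
layers = [
    ("a1", ["input"]),
    ("a2", ["a1"]),
    ("a3", ["a1", "a2"]),   # skip connection keeps a1 alive through step 2
    ("out", ["a3"]),
]

# Last step at which each tensor is still consumed as an input.
last_use = {}
for t, (_, inputs) in enumerate(layers):
    for name in inputs:
        last_use[name] = t

activation_sets = []        # activation_set_t for each step t
alive = set()
for t, (out, inputs) in enumerate(layers):
    alive |= set(inputs) | {out}          # inputs and output coexist during step t
    activation_sets.append(frozenset(alive))
    # Free tensors whose last use as an input was this step (sketch: the final
    # output is also dropped once no later layer consumes it).
    alive = {n for n in alive if last_use.get(n, -1) > t}

for t, s in enumerate(activation_sets):
    print(t, sorted(s))
```

Note how the skip connection makes step 2 the memory peak ({a1, a2, a3} live at once); it is exactly these per-step sets that enter the constraints of the optimization below.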
Eighth, construct the optimization problem and solve it by integer linear programming. The objective is to minimize the total quantization loss; the constraint is that the memory occupied by the activation set at every step is below the activation-memory limit. Since the memory occupied by the model weights can be computed in advance, denote it m_weight and let m_c denote the activation-memory limit, computed as:

m_c = m_model − m_weight.

Let total_loss denote the total quantization loss, m_t the memory occupied by the activation set at step t, m_a the memory occupied by activation a, size_a the number of elements of activation a, and b_a the quantization bit width of activation a; the loss values loss_{a,b_a} are obtained from the set Loss:

min total_loss, where total_loss = Σ_{a ∈ Activations} loss_{a,b_a}

m_a = size_a × b_a

m_t = Σ_{a ∈ activation_set_t} m_a ≤ m_c, for all t ∈ T.

The above optimization problem is an integer linear program whose decision variables are the quantization bit widths b_a; solving it by an integer linear programming method yields the quantization bit width of each activation.
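On a toy instance the same objective and constraints can be solved by brute force over all bit assignments, purely for illustration. All numbers below are made up; at realistic scale one would hand the problem to an ILP solver (e.g. PuLP or OR-Tools, typically with a binary selection variable per (activation, bit width) pair):

```python
from itertools import product

# Toy instance; all numbers are illustrative and memory is counted in bits.
BITWIDTH = [2, 4, 8]
sizes = {"a1": 1000, "a2": 2000, "a3": 1000}       # size_a: elements per activation
# Stand-in loss table: loss falls as the bit width grows.
loss = {(a, b): sizes[a] / b for a in sizes for b in BITWIDTH}
activation_sets = [{"a1"}, {"a1", "a2"}, {"a2", "a3"}]   # live sets per step
m_c = 20000                                        # activation-memory budget (bits)

names = sorted(sizes)
best = None
for bits in product(BITWIDTH, repeat=len(names)):  # exhaustive search, toy size only
    assign = dict(zip(names, bits))
    # Constraint: every step's live set must fit in the budget (m_t <= m_c).
    if any(sum(sizes[a] * assign[a] for a in s) > m_c for s in activation_sets):
        continue
    total = sum(loss[(a, assign[a])] for a in names)   # total_loss
    if best is None or total < best[0]:
        best = (total, assign)

print(best)   # minimal total loss and the bit width chosen per activation
```

In this instance the solver trades bit widths between the tensors that are live together: a2 (live in two steps and twice as large) gets 8 bits while a1 and a3 drop to 4 bits, which meets the budget at every step with minimal total loss.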
Ninth, model quantization and inference. Quantize the model with the bit widths obtained from the optimization and then perform inference.
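Applying the optimized bit widths at inference time can be sketched as follows. The two-layer toy "model" and the `best_bits` values are illustrative; in practice the assignment comes from the ILP solution and the quantization is carried out by the deployment framework:

```python
best_bits = {"a1": 8, "a2": 4}   # hypothetical per-activation result of the ILP step

def fake_quant(x, bits):
    """Symmetric uniform fake quantization (sketch)."""
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(v) for v in x) / qmax) or 1.0
    return [round(v / scale) * scale for v in x]

def relu(x):
    return [max(0.0, v) for v in x]

def infer(inp):
    """Toy two-layer model: each activation is stored at its optimized bit width."""
    a1 = fake_quant(relu([v * 2.0 for v in inp]), best_bits["a1"])
    a2 = fake_quant(relu([v - 0.5 for v in a1]), best_bits["a2"])
    return a2

print(infer([0.3, -0.2, 0.6]))
```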
Although embodiments of the present invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principle and spirit of the present invention; the scope of the invention is defined by the appended claims and their equivalents.
Claims (10)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310460434.9A CN116611470A (en) | 2023-04-23 | 2023-04-23 | Method for setting quantization bit number of neural network model |
| PCT/CN2023/100668 WO2024221573A1 (en) | 2023-04-23 | 2023-06-16 | Method for setting quantization bit of neural network model |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116611470A | 2023-08-18 |
Family
ID=87675535
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310460434.9A Pending CN116611470A (en) | 2023-04-23 | 2023-04-23 | Method for setting quantization bit number of neural network model |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN116611470A (en) |
| WO (1) | WO2024221573A1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118331734A (en) * | 2024-04-22 | 2024-07-12 | 上海交通大学 | Large language model memory scheduling management method, system and storage medium |
| WO2025222637A1 (en) * | 2024-04-23 | 2025-10-30 | 山东浪潮科学研究院有限公司 | Mixture-of-experts model quantization method and apparatus, and device and storage medium |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110799994A (en) * | 2017-08-14 | 2020-02-14 | 美的集团股份有限公司 | Adaptive Bit Width Reduction for Neural Networks |
| WO2021012148A1 (en) * | 2019-07-22 | 2021-01-28 | 深圳市大疆创新科技有限公司 | Data processing method and apparatus based on deep neural network, and mobile device |
| CN113554097A (en) * | 2021-07-26 | 2021-10-26 | 北京市商汤科技开发有限公司 | Model quantization method and device, electronic equipment and storage medium |
| KR20210144534A (en) * | 2020-05-22 | 2021-11-30 | 삼성전자주식회사 | Neural network based training method, inference mtethod and apparatus |
| CN114692814A (en) * | 2020-12-31 | 2022-07-01 | 合肥君正科技有限公司 | Quantification method for optimizing neural network model activation |
| KR20220125112A (en) * | 2021-03-04 | 2022-09-14 | 삼성전자주식회사 | Neural network computation method and apparatus using quantization |
| CN115983349A (en) * | 2023-02-10 | 2023-04-18 | 哲库科技(上海)有限公司 | Method and device for quantizing convolutional neural network, electronic device and storage medium |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111950715A (en) * | 2020-08-24 | 2020-11-17 | 云知声智能科技股份有限公司 | 8-bit integer full-quantization inference method and device based on self-adaptive dynamic shift |
| CN114021691A (en) * | 2021-10-13 | 2022-02-08 | 山东浪潮科学研究院有限公司 | Neural network model quantification method, system, device and computer readable medium |
| CN114841339A (en) * | 2022-04-18 | 2022-08-02 | 美的集团(上海)有限公司 | Network model quantification method and device, electronic equipment and storage medium |
| CN115879525A (en) * | 2022-12-01 | 2023-03-31 | Oppo(重庆)智能科技有限公司 | Neural network model quantification method and device, storage medium and electronic device |
- 2023-04-23: CN application CN202310460434.9A filed; patent CN116611470A pending
- 2023-06-16: PCT application PCT/CN2023/100668 filed (WO2024221573A1)
Non-Patent Citations (3)
| Title |
|---|
| EFSTATHIA SOUFLERI: "Network Compression via Mixed Precision Quantization Using a Multi-Layer Perceptron for the Bit-Width Allocation", IEEE ACCESS, 29 September 2021, pages 135059ff, XP011881165, DOI: 10.1109/ACCESS.2021.3116418 |
| 尹文枫 et al.: "Research progress on compression and acceleration techniques for convolutional neural networks", Computer Systems & Applications, no. 09, 15 September 2020, pages 20-29 |
| 张明成: "Mathematical Modeling Methods and Applications", Shandong People's Publishing House, 31 March 2020, pages 46-50 |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2024221573A1 (en) | 2024-10-31 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| KR102728799B1 (en) | Method and apparatus of artificial neural network quantization | |
| KR102782965B1 (en) | Method and apparatus of artificial neural network quantization | |
| CN115392477B (en) | A Deep Learning-Based Method and Apparatus for Estimating Cardinality of Skyline Queries | |
| CN116611470A (en) | Method for setting quantization bit number of neural network model | |
| Lu et al. | One proxy device is enough for hardware-aware neural architecture search | |
| CN114781650B (en) | Data processing method, device, equipment and storage medium | |
| CN108805257A (en) | A kind of neural network quantization method based on parameter norm | |
| CN110874625B (en) | A data processing method and device | |
| CN113537370A (en) | Cloud computing-based financial data processing method and system | |
| CN114168318B (en) | Storage release model training method, storage release method and device | |
| CN114661665B (en) | Execution engine determination method, model training method and device | |
| CN110503182A (en) | Network layer operation method and device in deep neural network | |
| CN118297121A (en) | A hybrid expert model quantization method, device, equipment and storage medium | |
| CN117033391A (en) | Database indexing method, device, server and medium | |
| CN119884672B (en) | Watershed water quality prediction method, device, equipment and medium | |
| CN112257958A (en) | Power saturation load prediction method and device | |
| CN116187387A (en) | Neural network model quantification method, device, computer equipment and storage medium | |
| CN116090571B (en) | Quantum linear solution method, device and medium based on generalized minimal residual | |
| CN115862653A (en) | Audio denoising method, device, computer equipment and storage medium | |
| CN116611494A (en) | Training method and device for electric power defect detection model, computer equipment and medium | |
| CN109766993B (en) | A Convolutional Neural Network Compression Method Suitable for Hardware | |
| CN116738127A (en) | A method, device, medium and electronic device for solving differential equations | |
| CN120449946B (en) | Self-adaptive hybrid precision quantization network generation method based on self-learning | |
| CN116263883B (en) | Quantum linear solving method, device and equipment based on polynomial preprocessor | |
| US20240361988A1 (en) | Optimizing method and computing system for deep learning network |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||