WO2025105520A1 - Lightweight deep learning model quantization method - Google Patents
- Publication number
- WO2025105520A1 (PCT/KR2023/018255)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- deep learning
- learning model
- quantization
- maximum values
- average
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
- G06F17/153—Multidimensional correlation or convolution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Definitions
- The present invention relates to a method for quantizing a deep learning model and, more specifically, to a method for updating the quantization scale of a hardware accelerator-based deep learning model installed in a mobile device.
- The present invention has been made to solve the above problems. Its object is to provide a lightweight deep learning model quantization method, applicable to a hardware accelerator model in a mobile device (a resource-limited environment), that predicts and updates the quantization scale for the next learning step based on the parameter distribution of the previous learning step.
- A method for quantizing a deep learning model includes: a step of storing and accumulating maximum values of deep learning operations during training of a deep learning model; a step of calculating an average of the accumulated maximum values; a step of updating a quantization scale with the calculated average; and a step of quantizing the deep learning model based on the updated quantization scale.
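The claimed steps can be sketched in a few lines of Python. This is an illustrative toy, not the patent's implementation: the class and method names are hypothetical, and 8-bit symmetric quantization is assumed.

```python
class EpochScaleTracker:
    """Sketch of the claimed steps: accumulate per-batch maxima during an
    epoch, average them at epoch end, and use the average as the new scale."""

    def __init__(self, initial_scale=1.0):
        self.scale = initial_scale      # current quantization scale
        self._max_sum = 0.0             # cumulative sum of per-batch maxima
        self._num_batches = 0

    def observe_batch(self, conv_outputs):
        # Step 1: store/accumulate the maximum absolute operation value per batch.
        self._max_sum += max(abs(v) for v in conv_outputs)
        self._num_batches += 1

    def end_epoch(self):
        # Step 2: average of the accumulated maxima (cumulative sum / batches).
        avg_max = self._max_sum / self._num_batches
        # Step 3: update the quantization scale with the calculated average.
        self.scale = avg_max
        self._max_sum, self._num_batches = 0.0, 0
        return self.scale

    def quantize(self, value, bits=8):
        # Step 4: symmetric quantization based on the updated scale.
        qmax = 2 ** (bits - 1) - 1
        q = round(value / self.scale * qmax)
        return max(-qmax, min(qmax, q))
```

For example, two batches whose maximum absolute outputs are 2.0 and 1.0 yield a next-epoch scale of 1.5.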
- In the accumulation step, the maximum values may be stored and accumulated for each batch.
- In the calculation step, the average of the maximum values stored per epoch may be calculated.
- the deep learning model quantization method according to the present invention may further include a step of performing learning of the next epoch on the quantized deep learning model.
- Deep learning operations can be convolution operations.
- The update step may update the quantization scale without knowing the distribution of the output feature map produced by the convolution operation.
- Quantization can be symmetric quantization.
- In the accumulation step, the maximum of the absolute values of the deep learning operation outputs may be stored.
- Deep learning models can be deployed on mobile devices.
- a deep learning operation device characterized by including: an operation unit that stores and accumulates maximum values of deep learning operations during training of a deep learning model, calculates an average of the accumulated maximum values, updates a quantization scale with the calculated average, and quantizes a deep learning model based on the updated quantization scale; and a memory that provides storage space required for the operation unit.
- a method for quantizing a deep learning model characterized by including the steps of: updating a quantization scale with an average of maximum values of deep learning operations during training of a deep learning model; quantizing the deep learning model based on the updated quantization scale; and training the quantized deep learning model.
- a deep learning operation device characterized by including: an operation unit that updates a quantization scale by an average of maximum values of deep learning operations during training of a deep learning model, quantizes the deep learning model based on the updated quantization scale, and trains the quantized deep learning model; and a memory that provides storage space required for the operation unit.
- Figure 1: Example of problems that occur in the hardware structure when a software quantization learning model is applied as is.
- Figures 4-5: Comparison of maximum and average distributions when the current epoch's learning scale is applied based on the previous distribution values.
- Figures 6-9: Example of the scale update process by epoch and by batch during the convolution operation.
- Quantization technology for real-time operation is widely used in camera-based object recognition on mobile devices.
- Lightweighting is a major challenge when implementing learning models in memory-limited environments, such as NPU hardware accelerator designs.
- Figure 1 is a diagram illustrating the general process of quantization learning in software and the problems that arise when the method is applied as is to a hardware accelerator.
- The quantization scale of the input feature map before convolution and that of the output feature map after convolution have different values, because the intermediate operations change the value distribution.
- a real-time value analysis and comparison process is required to quantize the parameters based on the accurate value distribution.
- Fig. 2 is a diagram illustrating a model update method based on a quantization scale calculation method applicable to an embodiment of the present invention.
- a symmetric quantization structure is applied as the quantization structure.
- Checking negative and positive value ranges separately can minimize the value loss that occurs during quantization, but it adds a separate computation for the intermediate reference value, so it is not well suited to hardware design.
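As a minimal sketch of the symmetric scheme the description favors (function names are illustrative): a single scale maps values onto a signed integer grid with the zero point fixed at 0, so no extra reference-value computation is needed.

```python
def symmetric_quantize(x, scale, bits=8):
    """Symmetric quantization: one scale, zero point fixed at 0."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8 bits
    q = round(x / scale * qmax)          # map [-scale, scale] onto [-qmax, qmax]
    return max(-qmax, min(qmax, q))      # clamp values outside the scale range

def symmetric_dequantize(q, scale, bits=8):
    """Inverse mapping back to the float domain."""
    qmax = 2 ** (bits - 1) - 1
    return q * scale / qmax
```

Values whose magnitude exceeds the scale saturate at the grid boundary, which is why choosing the scale from the observed maxima matters.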
- the maximum absolute value of the values output after convolution is stored in a buffer.
- During training, the maximum absolute value of the convolution output is updated and maintained within each batch, and at the end of each batch this maximum is added to a cumulative sum held in a separate buffer.
- When one pass over the dataset completes (one epoch), the cumulative sum is divided by the number of batches to obtain an average value, which is applied as the quantization scale for the next epoch.
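In symbols (notation introduced here for clarity, not taken from the patent): if epoch $e$ contains $B$ batches and $y_{e,b}$ denotes the convolution outputs observed in batch $b$, the scale applied in epoch $e+1$ is

```latex
s_{e+1} \;=\; \frac{1}{B} \sum_{b=1}^{B} \max_{y \in y_{e,b}} \lvert y \rvert
```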
- the same is true for the filter value, which updates the quantization scale on an epoch basis.
- FIG. 3 is a diagram illustrating a quantization scale update method according to one embodiment of the present invention.
- The maximum absolute value of the convolution output is selected and stored for each batch, so the per-batch maxima accumulate. When training for one epoch is completed, the accumulated maxima are divided by the number of batches to obtain the average of the maximum values.
- The quantization scale is then updated with this calculated average, the weights and activations of the deep learning model are quantized based on the updated scale, and training of the next epoch is performed on the quantized model.
- In this way, the quantization scale can be updated without analyzing the distribution of the output feature map produced by the convolution.
- FIG. 4 and FIG. 5 show the difference between the distribution of the actual values and the quantization scale value when the scale is updated according to the method suggested in the embodiment of the present invention. The scale based on the average of the per-batch accumulated maxima (Mean: red, below) fits the parameter distribution of the current epoch even though the feature map distribution is not identified in real time. In contrast, a scale updated with only the simple maximum (Prev. Xpoch Abs Max: blue, above) is based on an excessively large value and can degrade performance during training.
- The quantization scale update via the average of per-batch maxima cannot operate in real time the way a software implementation can, but it allows the average over many image data to be checked while roughly capturing how the distribution shifts with the overall learning trend.
- It also has the advantage of permitting a hardware-accelerator implementation, because no real-time analysis of the parameter distribution is required.
- Figures 6-9 show the quantization scale update method according to an embodiment of the present invention in more detail for the convolution operation process.
- In an environment with limited memory, such as a mobile device, the batch size that can be processed at one time, i.e., the number of image data, is inevitably small.
- FIG. 10 is a diagram illustrating a configuration of a mobile deep learning computing device according to another embodiment of the present invention.
- The mobile deep learning computing device according to the embodiment of the present invention comprises a communication interface (110), a deep learning operator (120), and a memory (130).
- the communication interface (110) communicates with an external host system to receive a dataset and parameters of a pre-trained deep learning model.
- the deep learning operator (120) quantizes and trains the loaded deep learning model according to the method presented in FIG. 3 described above.
- the memory (130) provides the storage space required for the deep learning operator (120) to perform the operation.
- In summary, a quantization training method for a hardware accelerator model in a mobile device, an environment with limited resources, is presented: the quantization scale is predicted and updated using the average of the per-batch maximum absolute values of deep learning operations from the previous epoch.
- the technical idea of the present invention can be applied to a computer-readable recording medium storing a computer program that performs the functions of the device and method according to the present embodiment.
- the technical idea according to various embodiments of the present invention can be implemented in the form of a computer-readable code recorded on a computer-readable recording medium.
- the computer-readable recording medium can be any data storage device that can be read by a computer and store data.
- the computer-readable recording medium can be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical disk, a hard disk drive, etc.
- the computer-readable code or program stored on the computer-readable recording medium can be transmitted through a network connected between computers.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Databases & Information Systems (AREA)
- Algebra (AREA)
- Neurology (AREA)
- Image Analysis (AREA)
Abstract
A lightweight deep learning model quantization method is provided. A deep learning model quantization method according to an embodiment of the present invention comprises: storing and accumulating maximum values of deep learning operations during training of a deep learning model; calculating an average of the accumulated maximum values; updating a quantization scale with the calculated average; and quantizing the deep learning model based on the updated quantization scale. Accordingly, the quantization scale of the next learning step is predicted and updated based on the parameter distribution of the previous learning step, enabling fast, high-performance training during quantization training of a hardware accelerator model on a mobile device, which is a resource-limited environment.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR10-2023-0157219 | 2023-11-14 | ||
| KR1020230157219A KR20250070865A (ko) | 2023-11-14 | 2023-11-14 | Lightweight deep learning model quantization method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025105520A1 (fr) | 2025-05-22 |
Family
ID=95743158
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2023/018255 (WO2025105520A1, pending) | 2023-11-14 | 2023-11-14 | Lightweight deep learning model quantization method |
Country Status (2)
| Country | Link |
|---|---|
| KR (1) | KR20250070865A (fr) |
| WO (1) | WO2025105520A1 (fr) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20200050284A (ko) * | 2018-11-01 | 2020-05-11 | 삼성전자주식회사 | 영상 적응적 양자화 테이블을 이용한 영상의 부호화 장치 및 방법 |
| CN111401518A (zh) * | 2020-03-04 | 2020-07-10 | 杭州嘉楠耘智信息科技有限公司 | 一种神经网络量化方法、装置及计算机可读存储介质 |
| KR20210004306A (ko) * | 2019-07-04 | 2021-01-13 | 삼성전자주식회사 | 뉴럴 네트워크 장치 및 뉴럴 네트워크의 파라미터 양자화 방법 |
| KR20210018352A (ko) * | 2019-06-12 | 2021-02-17 | 상하이 캠브리콘 인포메이션 테크놀로지 컴퍼니 리미티드 | 신경망의 양자화 파라미터 확정방법 및 관련제품 |
| KR20220013946A (ko) * | 2020-05-21 | 2022-02-04 | 상하이 센스타임 인텔리전트 테크놀로지 컴퍼니 리미티드 | 양자화 트레이닝, 이미지 처리 방법 및 장치, 저장 매체 |
- 2023-11-14: WO application PCT/KR2023/018255 filed, published as WO2025105520A1, status pending
- 2023-11-14: KR application KR1020230157219A filed, published as KR20250070865A, status pending
Also Published As
| Publication number | Publication date |
|---|---|
| KR20250070865A (ko) | 2025-05-21 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23958947; Country of ref document: EP; Kind code of ref document: A1 |