
WO2024135861A1 - Deep learning training method applying variable data representation type and mobile device applying same - Google Patents

Deep learning training method applying variable data representation type and mobile device applying same

Info

Publication number
WO2024135861A1
WO2024135861A1 (PCT/KR2022/020665)
Authority
WO
WIPO (PCT)
Prior art keywords
data
deep learning
representation type
layer
learning network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2022/020665
Other languages
French (fr)
Korean (ko)
Inventor
이상설
장성준
박종희
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Korea Electronics Technology Institute
Original Assignee
Korea Electronics Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Korea Electronics Technology Institute filed Critical Korea Electronics Technology Institute
Publication of WO2024135861A1 publication Critical patent/WO2024135861A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0495 Quantised networks; Sparse networks; Compressed networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • The present invention relates to deep learning training, and more specifically, to a method for additionally training and retraining, on a mobile device, a model whose training was completed on a server.
  • To run a model that was trained on a server on a new device, retraining must be performed to regenerate the deep learning parameters using the data used for training and the data used for testing.
  • The present invention was devised to solve the above problems, and its purpose is to provide a deep learning computation apparatus and method applying a variable data representation type, as a way to perform additional training or retraining on mobile devices with limited memory resources.
  • A deep learning network training method includes setting a data representation type for each layer constituting a deep learning network, and training the deep learning network while changing the representation type of the data input to each layer according to the set data representation type.
  • The setting step may set the number of exponent bits and the number of mantissa bits for each layer.
  • The setting step may set the data representation type for each channel of each layer.
  • The deep learning network training method according to the present invention may further include quantizing the training dataset to be used for training the deep learning network.
  • The training step may convert the representation type of data between layers from the representation type of the previous layer to that of the next layer.
  • The setting step may set the data representation type for each layer based on a data representation type received from the outside.
  • The data representation type can be determined based on the importance of each layer.
  • A control unit sets a data representation type for each layer constituting a deep learning network, and a data conversion unit changes the representation type of data input to each layer according to the data representation type set by the control unit.
  • A deep learning network device comprises a deep learning accelerator that trains the deep learning network with data whose representation type is changed by the data conversion unit.
  • By applying a variable data representation type per layer and per channel when performing additional training or retraining, additional training and retraining of a deep learning model can be performed on mobile devices with limited memory resources and power without a significant loss of accuracy.
  • Figure 1 is a diagram showing the inference process (forward path) in training.
  • Figure 2 is a diagram showing the backpropagation process (backward path) in training.
  • Figure 6 shows training performance using the proposed data representation types.
  • Figure 7 is a mobile device according to an embodiment of the present invention.
  • FP16, which uses fewer bits, is commonly used, or the hardware is built with fixed-point arithmetic to reduce the size of the arithmetic units.
  • However, this also slows down training because of bandwidth problems when reading and writing data to external memory, and lowers training accuracy because of the lower precision of the arithmetic units.
  • An embodiment of the present invention presents a deep learning training method applying a variable data representation type.
  • It is a technique that processes data (storage, computation, transmission, etc.) after converting it into a flexible data format according to a specific distribution, so that the various representation types shown in Figure 4 (fixed point, floating point, etc.) can be applied for training data processing and computation.
  • The data representation type can be set differently per layer: important layers and less important layers can be distinguished and given different data representation types.
  • A data representation type is set for each channel.
  • The data representation type can be set differently per channel: important channels and less important channels can be distinguished and given different data representation types.
  • Figure 5 shows the variable data representation types that can be set.
  • PEE (Partial Exponent Expression) is the exponent part.
  • Revised FP represents the mantissa part.
  • The data representation type can be set by flexibly changing the number of bits in both the exponent part and the mantissa part.
  • The data representation type can be set/controlled by receiving a data representation type configured externally, for example by the host.
  • The size of the Revised FP can be changed to various bit widths depending on hardware resources.
  • For the Revised FP, if computation is performed with a conventional fixed-point unit, all bits are treated like a mantissa, and when processed with floating-point computation, it can be expressed by splitting it into a small exponent expression and a mantissa.
  • The deep learning network is trained while the representation type of the data input to each layer is changed according to the configured data representation type; that is, between layers, the representation type of the data is converted from that of the previous layer to that of the next layer.
  • Since the method according to the embodiment of the present invention is mostly applied in low-bit or data pre-processing stages of the hardware, it can operate much like existing data input with the addition of simple hardware logic, so performance close to full-precision training can be obtained.
  • Memory usage can be further reduced by using the minimum number of bits during training, and the intermediate data storage space for the training data can be reused as additional batch data, so the batch size can also be increased.
  • Figure 7 is a block diagram of a mobile device according to an embodiment of the present invention.
  • As shown, a mobile device according to an embodiment of the present invention comprises a memory 110, an MCU 120, a deep learning accelerator 130, a quantization unit 140, and a data conversion unit 150.
  • The MCU 120 receives part of the training data from the server, stores it in the memory 110, and sets the data representation type for each layer/channel of the deep learning network.
  • The quantization unit 140 quantizes the training data stored in the memory 110, and the data conversion unit 150 converts the quantized training data into the representation type set for each layer/channel.
  • The deep learning accelerator 130 additionally trains or retrains the deep learning model trained on the server, using training data that has been quantized by the quantization unit 140 and then converted in representation type by the data conversion unit 150.
  • A variable data representation type is made applicable per layer/channel.
  • A computer-readable recording medium can be any data storage device that can be read by a computer and store data.
  • Computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, optical disks, hard disk drives, and the like.
  • Computer-readable code or programs stored on a computer-readable recording medium may also be transmitted over a network connecting computers.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Machine Translation (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

Provided are a deep learning training method applying a variable data representation type and a mobile device applying the same. The deep learning network training method according to an embodiment of the present invention comprises: configuring a data representation type for each of the layers constituting a deep learning network; and training the deep learning network while changing the representation type of the data input into each layer according to the configured data representation type. Accordingly, by applying a variable data representation type for each layer and each channel when performing additional training or retraining, it is possible to perform additional training and retraining of a deep learning model on a mobile device with limited memory resources and power without a significant reduction in accuracy.

Description

Deep learning training method applying a variable data representation type and mobile device applying the same

The present invention relates to deep learning training, and more particularly to a method for additionally training and retraining, on a mobile device, a model whose training was completed on a server.

To run a model that was trained on a server on a new device, retraining must be performed to regenerate the deep learning parameters using the data used for training and the data used for testing.

That is, development takes the form of retraining the deep learning model for the new device to generate deep learning parameters with minimal performance loss, and then updating the final application with them.

When training is performed on a server, FP64 to FP32 is mainly used, and FP16 is used when training on lower-specification server-grade hardware. This is feasible on servers with abundant power and sufficient hardware resources, but mobile-oriented devices lack the power budget and hardware resources to operate this way.

The present invention was devised to solve the above problems. Its purpose is to provide a deep learning computation apparatus and method applying a variable data representation type, as a way to perform additional training or retraining on mobile devices with limited memory resources.

To achieve the above object, a deep learning network training method according to an embodiment of the present invention includes: setting a data representation type for each layer constituting a deep learning network; and training the deep learning network while changing the representation type of the data input to each layer according to the set data representation type.

The setting step may set the number of exponent bits and the number of mantissa bits for each layer.

The setting step may set the data representation type for each channel of each layer.

The deep learning network training method according to the present invention may further include quantizing the training dataset to be used for training the deep learning network.

The training step may convert the representation type of data passed between layers from the representation type of the previous layer to that of the next layer.

The setting step may set the data representation type for each layer based on a data representation type received from the outside.

The data representation type may be determined based on the importance of each layer.

According to another aspect of the present invention, a deep learning network device is provided, comprising: a control unit that sets a data representation type for each layer constituting a deep learning network; a data conversion unit that changes the representation type of data input to each layer according to the data representation type set by the control unit; and a deep learning accelerator that trains the deep learning network with the data whose representation type is changed by the data conversion unit.

As described above, according to embodiments of the present invention, by applying a variable data representation type per layer and per channel when performing additional training or retraining, additional training and retraining of a deep learning model can be performed on mobile devices with limited memory resources and power without a significant loss of accuracy.

Figure 1 is a diagram showing the inference process (forward path) in training;

Figure 2 is a diagram showing the backpropagation process (backward path) in training;

Figure 3 shows the equations illustrating the weight update concept;

Figure 4 illustrates the data representation type concept;

Figure 5 shows the variable data representation types;

Figure 6 shows training performance using the proposed data representation types;

Figure 7 is a mobile device according to an embodiment of the present invention.

Hereinafter, the present invention will be described in more detail with reference to the drawings.

Currently, most deep learning networks are trained on servers equipped with multiple high-performance GPUs and large amounts of memory, so there are no problems arising from computing resources.

However, when resources and power are limited, as on mobile devices, hardware must be developed in a way that reduces the required amount of computation and the memory usage for intermediate data.

In particular, on training-dedicated ASIC and FPGA platforms for mobile devices, additional training or retraining using large amounts of original training data and computation representation types of FP32 or higher is not possible, so techniques to compensate for this are required.

Since artificial-intelligence training requires an enormous volume of data traffic to and from external memory, many applications require on-device training technology optimized for mobile devices.

Training largely consists of an inference process and a back-propagation process. The forward path (inference) is shown in Figure 1, and the backward path (backpropagation) is shown in Figure 2.

To train a deep learning network, an inference pass is performed based on the current weights, the error value is computed, the gradient is calculated, and the previous weights are updated; this is expressed as the equations in Figure 3. On the server side, the computation of all equations in Figure 3 is performed with a data representation type of FP32 or higher.
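
Although the equations of Figure 3 are not reproduced here, the update they describe follows the usual training loop: a forward pass with the current weights, an error computation, a gradient computation, and a weight update. The following minimal NumPy sketch illustrates that loop for a single linear layer; the learning rate, loss function, and layer shapes are illustrative assumptions, not values taken from this publication.

```python
import numpy as np

def train_step(w, x, target, lr=0.01):
    """One training step: forward pass, error, gradient, weight update.
    Assumes a single linear layer and a mean-squared-error loss (illustrative only)."""
    y = x @ w                      # forward path: inference with the current weights
    err = y - target               # error value
    grad = x.T @ err / len(x)      # gradient of the MSE loss with respect to the weights
    return w - lr * grad           # update the previous weights

# Example usage with random FP32 data, as on the server side
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4)).astype(np.float32)
x = rng.standard_normal((32, 8)).astype(np.float32)
t = rng.standard_normal((32, 4)).astype(np.float32)
w = train_step(w, x, t)
```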

However, on resource-limited mobile devices, hardware memory space is too limited to store the entire dataset at the FP32 level as a server does, and the arithmetic units also allow only limited bit widths and parallelism because of the resulting growth in hardware area.

To address this, hardware is often built with FP16, which uses fewer bits, or with fixed-point arithmetic to reduce the size of the arithmetic units. However, this also has drawbacks: bandwidth problems arise when reading and writing data to external memory, slowing down training, and the lower precision of the arithmetic units reduces training accuracy.

Accordingly, an embodiment of the present invention presents a deep learning training method applying a variable data representation type. It is a technique that processes data (storage, computation, transmission, etc.) after converting it into a flexible data format according to a specific distribution, so that the various representation types shown in Figure 4 (fixed point, floating point, etc.) can be applied for training data processing and computation.

First, before converting the training data of the deep learning network into the desired representation type, the data is quantized to 8 bits or fewer by applying a scale/exponent/bias to convert it into quantized values, which reduces it to about 1/8 of the size of the original data representation type. The quantized data can be restored to values close to the original data representation type using de-quantization.
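
As a rough illustration of the scale-based quantization and de-quantization described above, the sketch below maps an FP32 array to 8-bit signed integers and back. The symmetric per-tensor scheme and the bit width are assumptions made for illustration, not the exact scheme defined by this publication.

```python
import numpy as np

def quantize(x, bits=8):
    """Quantize an FP32 array to signed integers of the given bit width
    using a per-tensor scale (illustrative symmetric scheme)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.abs(x).max())
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Restore values close to the original representation (de-quantization)."""
    return q.astype(np.float32) * scale

x = np.random.randn(1024).astype(np.float32)
q, s = quantize(x)        # 8-bit storage: 1/4 the size of FP32, 1/8 the size of FP64
x_hat = dequantize(q, s)  # approximate reconstruction of the original values
```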

Next, a data representation type is set for each layer constituting the deep learning network. That is, the data representation type can be set differently per layer: important layers and less important layers can be distinguished and given different data representation types.

In addition, within a layer, a data representation type is set for each channel. That is, the data representation type can be set differently per channel: important channels and less important channels can be distinguished and given different data representation types.
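
The per-layer and per-channel settings described in the two paragraphs above can be pictured as a small table that the host or control unit fills in, giving important layers or channels more bits. The structure, names, and bit counts below are hypothetical and only illustrate the idea.

```python
# Hypothetical per-layer / per-channel representation settings:
# each entry is (exponent_bits, mantissa_bits) for the data entering that layer.
layer_formats = {
    "conv1": {"default": (5, 10)},                       # important layer: wider format
    "conv2": {"default": (4, 3), "channel_7": (5, 10)},  # mostly narrow, one important channel
    "fc":    {"default": (3, 4)},                        # less important layer: fewer bits
}

def format_for(layer, channel=None):
    """Look up the representation type for a layer (and, optionally, a channel)."""
    per_layer = layer_formats[layer]
    return per_layer.get(f"channel_{channel}", per_layer["default"])
```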

Figure 5 shows the variable data representation types that can be set. In Figure 5, PEE (Partial Exponent Expression) is the exponent part and Revised FP is the mantissa part; the data representation type can be set by flexibly changing the number of bits in both the exponent part and the mantissa part. The data representation type can be set/controlled by receiving a data representation type configured externally, for example by the host.

Meanwhile, the size of the Revised FP can be changed to various bit widths depending on hardware resources. In the case of the Revised FP, if computation is performed with a conventional fixed-point unit, all bits are treated like a mantissa, and when processed with floating-point computation, it can be expressed by splitting it into a small exponent expression and a mantissa.
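
As a sketch of what a variable exponent/mantissa format might look like, the code below packs a float into a configurable number of exponent bits (the PEE part) and mantissa bits (the Revised FP part) and unpacks it again. The encoding details (bias handling, rounding, the absence of subnormals) are simplifying assumptions and are not the format defined in Figure 5.

```python
import math

def encode(value, exp_bits, man_bits):
    """Encode a float into sign, exponent, and mantissa fields of the given widths
    (simplified: no subnormals, saturating exponent, illustrative only)."""
    if value == 0.0:
        return 0, 0, 0
    sign = 1 if value < 0 else 0
    m, e = math.frexp(abs(value))                  # abs(value) = m * 2**e with 0.5 <= m < 1
    bias = (1 << (exp_bits - 1)) - 1
    e_field = max(0, min((1 << exp_bits) - 1, e - 1 + bias))
    m_field = min((1 << man_bits) - 1, int(round((2 * m - 1) * (1 << man_bits))))
    return sign, e_field, m_field

def decode(sign, e_field, m_field, exp_bits, man_bits):
    """Inverse of encode for the same assumed format."""
    if e_field == 0 and m_field == 0:
        return 0.0
    bias = (1 << (exp_bits - 1)) - 1
    mantissa = 1.0 + m_field / (1 << man_bits)     # restore the implicit leading 1
    value = math.ldexp(mantissa, e_field - bias)
    return -value if sign else value

# Round-tripping 3.2 through a 5-bit-exponent, 10-bit-mantissa format
print(decode(*encode(3.2, 5, 10), 5, 10))          # ~3.199, close to the original
```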

In the training process, the deep learning network is trained while the representation type of the data input to each layer is changed according to the configured data representation type; that is, between layers, the representation type of the data is converted from that of the previous layer to that of the next layer.
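
A forward pass under this scheme can be sketched as a loop that, before each layer, converts the activations into that layer's configured representation type. The sketch reuses the hypothetical `encode`/`decode` helpers above and simulates the narrow format with an encode/decode round trip; the toy layers and bit widths are assumptions for illustration.

```python
def forward(x, layers, formats):
    """Forward pass that converts the activations to each layer's configured
    representation type before that layer consumes them (illustrative sketch)."""
    for name, layer in layers:
        exp_bits, man_bits = formats[name]
        # previous layer's representation type -> this layer's representation type
        x = [decode(*encode(v, exp_bits, man_bits), exp_bits, man_bits) for v in x]
        x = layer(x)
    return x

# Example: two toy "layers" with different representation types
layers = [("dense1", lambda v: [2.0 * t for t in v]),
          ("dense2", lambda v: [t + 1.0 for t in v])]
formats = {"dense1": (5, 10), "dense2": (4, 3)}
out = forward([0.1, -3.2, 7.5], layers, formats)
```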

Meanwhile, during training the weight and gradient distributions are stochastically biased, so the meaningful dynamic range is not wide, and training is possible with only a small exponent representation. This can be confirmed in the result table presented in Figure 6.

Since the method according to the embodiment of the present invention is mostly applied in low-bit or data pre-processing stages of the hardware, it can operate much like existing data input with the addition of simple hardware logic, so performance close to full-precision training can be obtained.

Measuring this with public datasets (CIFAR-10, ImageNet, etc.), the performance obtained when using the data representation types presented in the embodiment of the present invention shows almost no degradation compared with FP32 for most representation types, as shown in Figure 6.

In addition, for layer/channel data representation types that the user does not need, the minimum number of bits can be used during training, further reducing memory usage; the intermediate data storage space for that training data can then be used for additional batch data, making it possible to increase the batch size as well.

Figure 7 is a block diagram of a mobile device according to an embodiment of the present invention. As shown, a mobile device according to an embodiment of the present invention comprises a memory 110, an MCU 120, a deep learning accelerator 130, a quantization unit 140, and a data conversion unit 150.

The MCU 120 receives part of the training data from the server, stores it in the memory 110, and sets the data representation type for each layer/channel of the deep learning network. The quantization unit 140 quantizes the training data stored in the memory 110, and the data conversion unit 150 converts the quantized training data into the representation type set for each layer/channel.
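
The interaction among the components of Figure 7 can be summarized as a small pipeline sketch: the controller keeps the received training data and the per-layer settings, the quantization unit compresses each batch, the data conversion unit re-expresses it per layer, and the accelerator consumes it for retraining. All class, method, and parameter names below are hypothetical stand-ins, not identifiers from this publication.

```python
class MobileTrainer:
    """Hypothetical sketch of the Figure 7 pipeline: memory, control (MCU),
    quantization unit, data conversion unit, and deep learning accelerator."""

    def __init__(self, quantize, convert, accelerator):
        self.memory = []                 # training data received from the server
        self.formats = {}                # per-layer/per-channel representation settings
        self.quantize = quantize         # quantization unit
        self.convert = convert           # data conversion unit
        self.accelerator = accelerator   # deep learning accelerator

    def receive(self, batch, formats):
        self.memory.append(batch)        # store the partial training data
        self.formats = formats           # set the per-layer representation types

    def retrain_step(self):
        for batch in self.memory:
            q = self.quantize(batch)                   # quantize the stored data
            converted = self.convert(q, self.formats)  # convert to each layer's type
            self.accelerator.train(converted)          # additional training / retraining

# Example wiring with stand-in components
class _StubAccelerator:
    def train(self, batch):
        print("training on", batch)

trainer = MobileTrainer(
    quantize=lambda batch: [round(v, 2) for v in batch],
    convert=lambda q, fmts: q,
    accelerator=_StubAccelerator(),
)
trainer.receive([0.123, -4.567, 8.901], formats={"conv1": (5, 10)})
trainer.retrain_step()
```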

The deep learning accelerator 130 additionally trains or retrains the deep learning model trained on the server, using training data that has been quantized by the quantization unit 140 and then converted in representation type by the data conversion unit 150.

So far, the deep learning training method applying a variable data representation type and the mobile device applying it have been described in detail with reference to preferred embodiments.

In an embodiment of the present invention, a variable data representation type can be applied per layer/channel in order to reduce the hardware resources used during training and to lower power consumption when retraining is performed using only specific data rather than the original data used for the previous training.

As a result, when training a deep learning network on a mobile device, the network can be trained at high speed with low energy consumption and only a small loss of accuracy.

Meanwhile, the technical idea of the present invention can of course also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to this embodiment. In addition, the technical ideas according to various embodiments of the present invention may be implemented in the form of computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium can be any data storage device that can be read by a computer and store data. For example, the computer-readable recording medium can be a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, hard disk drive, or the like. In addition, computer-readable code or programs stored on a computer-readable recording medium may be transmitted over a network connecting computers.

In addition, although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described above; various modifications can of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the present invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

Claims (8)

1. A deep learning network training method comprising: setting a data representation type for each of the layers constituting a deep learning network; and training the deep learning network while changing the representation type of data input to each layer according to the set data representation type.

2. The deep learning network training method according to claim 1, wherein the setting step sets the number of exponent bits and the number of mantissa bits for each layer.

3. The deep learning network training method according to claim 1, wherein the setting step sets a data representation type for each channel of each layer.

4. The deep learning network training method according to claim 1, further comprising quantizing a training dataset to be used for training the deep learning network.

5. The deep learning network training method according to claim 1, wherein the training step converts the representation type of data between layers from the representation type of the previous layer to the representation type of the next layer.

6. The deep learning network training method according to claim 1, wherein the setting step sets the data representation type for each layer based on a data representation type received from the outside.

7. The deep learning network training method according to claim 1, wherein the data representation type is determined based on the importance of each layer.

8. A deep learning network device comprising: a control unit that sets a data representation type for each of the layers constituting a deep learning network; a data conversion unit that changes the representation type of data input to each layer according to the data representation type set by the control unit; and a deep learning accelerator that trains the deep learning network with data whose representation type is changed by the data conversion unit.
PCT/KR2022/020665 2022-12-19 2022-12-19 Deep learning training method applying variable data representation type and mobile device applying same Ceased WO2024135861A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020220177737A KR20240096949A (en) 2022-12-19 2022-12-19 Deep learning method using variable data type and mobile device applying the same
KR10-2022-0177737 2022-12-19

Publications (1)

Publication Number Publication Date
WO2024135861A1 true WO2024135861A1 (en) 2024-06-27

Family

ID=91589081

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/020665 Ceased WO2024135861A1 (en) 2022-12-19 2022-12-19 Deep learning training method applying variable data representation type and mobile device applying same

Country Status (2)

Country Link
KR (1) KR20240096949A (en)
WO (1) WO2024135861A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190014900A (en) * 2017-08-04 2019-02-13 삼성전자주식회사 Method and apparatus for quantizing parameter of neural network
KR20190068255A (en) * 2017-12-08 2019-06-18 삼성전자주식회사 Method and apparatus for generating fixed point neural network
JP2019212295A (en) * 2018-06-08 2019-12-12 インテル・コーポレーション Artificial neural network training using flexible floating point tensors
US20200264876A1 (en) * 2019-02-14 2020-08-20 Microsoft Technology Licensing, Llc Adjusting activation compression for neural network training
KR20210121946A (en) * 2020-03-31 2021-10-08 삼성전자주식회사 Method and apparatus for neural network quantization

Also Published As

Publication number Publication date
KR20240096949A (en) 2024-06-27

Similar Documents

Publication Publication Date Title
CN114298287A (en) Prediction method and device based on knowledge distillation, electronic equipment, storage medium
WO2022146080A1 (en) Algorithm and method for dynamically changing quantization precision of deep-learning network
WO2024143909A1 (en) Method for converting image in stages by taking into consideration angle changes
CN116976428A (en) Model training method, device, equipment and storage medium
WO2023033194A1 (en) Knowledge distillation method and system specialized for pruning-based deep neural network lightening
WO2024135861A1 (en) Deep learning training method applying variable data representation type and mobile device applying same
WO2022107910A1 (en) Mobile deep learning hardware device capable of retraining
WO2023014124A1 (en) Method and apparatus for quantizing neural network parameter
CN114444688B (en) Neural network quantization method, device, equipment, storage medium and program product
WO2020091259A1 (en) Improvement of prediction performance using asymmetric tanh activation function
CN115438784A (en) Sufficient training method for hybrid bit width hyper-network
CN117010461A (en) Neural network training method, device, equipment and storage medium
CN113688990B (en) Data-free quantitative training method for power edge calculation classification neural network
WO2025041887A1 (en) Method for iteratively pruning neural network through self-distillation
WO2020149511A1 (en) Electronic device and control method therefor
CN115759209B (en) Quantification method and device of neural network model, electronic equipment and medium
WO2023085457A1 (en) Memory structure and control method for efficient deep learning training
WO2022107951A1 (en) Method for training ultra-lightweight deep learning network
WO2023128024A1 (en) Method and system for quantizing deep-learning network
CN115623575A (en) A Power Allocation Method in CR-NOMA Scenario
WO2023214608A1 (en) Quantum circuit simulation hardware
WO2023080292A1 (en) Apparatus and method for generating adaptive parameter for deep learning acceleration device
WO2023214609A1 (en) Quantum circuit computation method for efficiently computing state vectors
CN115374397A (en) Method for constructing wireless communication precoder based on generalized singular value decomposition
CN115168572A (en) Text processing method, text processing apparatus and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22969295

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22969295

Country of ref document: EP

Kind code of ref document: A1