WO2024158174A1 - Electronic device and method for quantizing an operator associated with model computation - Google Patents
Electronic device and method for quantizing an operator associated with model computation
- Publication number
- WO2024158174A1 (PCT/KR2024/000928)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- processor
- data type
- operator
- weights
- electronic device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Definitions
- an electronic device may include memory and at least one processor.
- the at least one processor may obtain, from the memory, a first weight set of a first data type included in a first operator, which is one of at least one operator included in the model.
- the at least one processor may obtain, from profile information stored in the memory and generated by execution of the model, a set of sub-output data corresponding to a set of input data used for computation of the first operator.
- the at least one processor may obtain a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data.
- the at least one processor may store the second weight set in the memory based on obtaining the second weight set.
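The four processor operations above (read a float weight set for one operator, read the profiled sub-output data, quantize, store) can be sketched in code. This is only an illustrative sketch under assumptions not taken from the patent: NumPy arrays stand in for the memory and accelerator, int8 stands in for the second data type, a simple min/max affine scheme stands in for the quantization method, and all function and variable names are hypothetical.

```python
import numpy as np

def quantize_weights(w_fp32):
    """Affine-quantize a float32 weight set (first data type) to int8
    (an assumed accelerator-supported second data type)."""
    lo, hi = float(w_fp32.min()), float(w_fp32.max())
    scale = max(hi - lo, 1e-8) / 255.0
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(w_fp32 / scale) + zero_point, -128, 127)
    return q.astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Hypothetical first operator: a fully connected layer y = x @ W.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8)).astype(np.float32)   # first weight set (float32)
X = rng.normal(size=(32, 16)).astype(np.float32)  # input data set from profiling
sub_outputs = X @ W                               # recorded sub-output data set

q, s, z = quantize_weights(W)                     # second weight set (int8)
# Use the profiled sub-output data to gauge how far the quantized
# operator drifts before storing the second weight set back to memory.
err = float(np.abs(X @ dequantize(q, s, z) - sub_outputs).max())
```

Here the sub-output data serves only to measure the quantized operator's error against the recorded float outputs; a calibration scheme that derives the scale from those statistics instead would be an equally plausible reading of the claims.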
- a method performed by an electronic device includes obtaining, from a memory, a first set of weights of a first data type included in a first operator, which is one of at least one operator included in the model.
- the method may include an operation of obtaining, from profile information stored in the memory and generated by execution of the model, a set of sub-output data corresponding to a set of input data used for computation of the first operator.
- the method may include obtaining, based on the set of sub-output data, a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set.
- the method may include storing the second set of weights in the memory based on obtaining the second set of weights.
- a computer readable storage medium storing one or more programs
- the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device to obtain, from memory, a first set of weights of a first data type included in a first operator, which is one of at least one operator included in the model.
- the one or more programs may include instructions that cause the electronic device to obtain, from profile information stored in the memory and generated by execution of the model, a set of sub-output data corresponding to a set of input data used for computation of the first operator.
- the one or more programs may include instructions that cause the electronic device to obtain a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data.
- the one or more programs may include instructions that cause the electronic device to store the second set of weights in the memory based on obtaining the second set of weights.
- the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device to obtain, from the memory, a first set of weights of a first data type included in a first operator, which is one of at least one operator included in the model.
- the one or more programs may include instructions that cause the electronic device to obtain, from profile information generated by execution of the model and stored in the memory, a set of sub-output data corresponding to a set of input data used for computation of the first operator.
- the one or more programs may include instructions that cause the electronic device to obtain, based on the set of sub-output data, a second set of weights of a second data type supported by at least one accelerator by performing quantization on the first set of weights.
- the one or more programs may include instructions that cause the electronic device to store the second set of weights in the memory based on obtaining the second set of weights.
- an electronic device may include a memory that stores instructions, and at least one processor.
- the instructions, when executed by the at least one processor, may cause the electronic device to obtain, from the memory, a first set of weights of a first data type included in a first operator that is one of at least one operator included in the model.
- FIG. 1 is a block diagram of an electronic device in a network environment, according to embodiments.
- FIG. 2 is a block diagram illustrating one or more processors included in an electronic device according to an embodiment.
- FIG. 3 is an exemplary diagram illustrating a neural network running on an electronic device according to an embodiment.
- FIG. 4 is an example diagram for explaining quantization of weights according to an embodiment.
- FIG. 5 is a block diagram for explaining the operation of a memory included in an electronic device according to an embodiment.
- FIG. 6 is a block diagram illustrating reliability evaluation of a model executed in an electronic device according to an embodiment.
- FIG. 7 illustrates a flow of operations of an electronic device for storing quantized second weight sets according to an embodiment.
- FIG. 8 illustrates a flow of operations for executing quantization using a server according to an embodiment.
- FIG. 9 illustrates a flow of operations of an electronic device for acquiring a second set of weights through quantization according to an embodiment.
- FIG. 10 illustrates a flow of operations of an electronic device for storing a second set of weights based on reliability evaluation according to an embodiment.
- FIG. 11 illustrates a flow of operations of an electronic device for identifying a quantization method according to reliability, according to an embodiment.
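FIGS. 6, 10, and 11 concern evaluating the reliability of the quantized model, storing the second weight set only when the evaluation passes, and identifying a quantization method according to reliability. A hypothetical sketch of such a gate follows; the metric (signal-to-quantization-noise ratio), the threshold value, and the candidate ordering are illustrative assumptions, not details taken from the patent:

```python
import numpy as np

def snr_db(reference, approx):
    """Signal-to-quantization-noise ratio: one plausible reliability metric
    comparing a quantized operator's outputs against profiled float outputs."""
    noise = float(np.sum((reference - approx) ** 2))
    signal = float(np.sum(reference ** 2))
    return 10.0 * np.log10(signal / max(noise, 1e-12))

def select_weight_set(reference_outputs, candidates, threshold_db=30.0):
    """Try candidate quantization methods in order (e.g., int8 first, then a
    wider type); keep the first whose reliability clears the threshold, and
    fall back to the original float weights otherwise."""
    for name, outputs in candidates:
        if snr_db(reference_outputs, outputs) >= threshold_db:
            return name  # store this quantized second weight set
    return "float32"     # reliability too low: keep the first weight set

# Toy check with synthetic outputs.
ref = np.linspace(-1.0, 1.0, 64)
coarse = ref + 0.3   # low-reliability candidate
fine = ref + 1e-4    # high-reliability candidate
chosen = select_weight_set(ref, [("int8", coarse), ("int16", fine)])
```

A real evaluation would run the model on the profiled input data set rather than synthetic arrays, but the store-or-fall-back structure would be the same.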
- Terms referring to signals (e.g., signal, information, message, signaling), terms referring to operational states (e.g., step, operation, procedure), terms referring to data (e.g., packet, user stream, information, bit, symbol, codeword), terms referring to artificial intelligence (AI) (e.g., neural network, model, weight, profile information, operator, layer), terms referring to network entities, and terms referring to components of a device are exemplified for convenience of explanation. Accordingly, the present disclosure is not limited to the terms described below, and other terms having equivalent technical meaning may be used.
- terms such as '...unit', '...er', '...material', and '...body' used hereinafter may mean at least one shape structure or a unit that processes a function.
- Expressions such as 'greater than or equal to' or 'less than or equal to' may be used to determine whether a specific condition is satisfied or fulfilled, but these are only exemplary descriptions and do not exclude other expressions. A condition written as 'greater than or equal to' may be replaced with 'greater than', a condition written as 'less than or equal to' may be replaced with 'less than', and a condition written as 'greater than or equal to and less than' may be replaced with 'greater than and less than or equal to'.
- 'A' to 'B' means at least one of the elements from A (inclusive) to B (inclusive).
- 'C' and/or 'D' means including at least one of 'C' or 'D', for example ⁇ 'C', 'D', 'C' and 'D' ⁇ .
- FIG. 1 is a block diagram of an electronic device 101 in a network environment 100, according to various embodiments.
- the electronic device 101 may communicate with the electronic device 102 through a first network 198 (e.g., a short-range wireless communication network), or with at least one of the electronic device 104 or the server 108 through a second network 199 (e.g., a long-distance wireless communication network). According to one embodiment, the electronic device 101 may communicate with the electronic device 104 through the server 108.
- the electronic device 101 may include a processor 120, a memory 130, an input module 150, a sound output module 155, a display module 160, an audio module 170, a sensor module 176, an interface 177, a connection terminal 178, a haptic module 179, a camera module 180, a power management module 188, a battery 189, a communication module 190, a subscriber identification module 196, or an antenna module 197.
- at least one of these components (e.g., the connection terminal 178) may be omitted, or one or more other components may be added to the electronic device 101.
- some of these components (e.g., the sensor module 176, the camera module 180, or the antenna module 197) may be integrated into one component (e.g., the display module 160).
- the processor 120 may, for example, execute software (e.g., the program 140) to control at least one other component (e.g., a hardware or software component) of the electronic device 101 connected to the processor 120, and may perform various data processing or operations. According to one embodiment, as at least part of the data processing or computation, the processor 120 may store commands or data received from another component (e.g., the sensor module 176 or the communication module 190) in the volatile memory 132, process the commands or data stored in the volatile memory 132, and store the resulting data in the non-volatile memory 134.
- the processor 120 may include a main processor 121 (e.g., a central processing unit or an application processor) or an auxiliary processor 123 (e.g., a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor) that can operate independently of, or together with, the main processor.
- when the electronic device 101 includes a main processor 121 and an auxiliary processor 123, the auxiliary processor 123 may be set to use lower power than the main processor 121 or to be specialized for a designated function.
- the auxiliary processor 123 may be implemented separately from the main processor 121 or as part of it.
- the auxiliary processor 123 may, for example, control at least some of the functions or states related to at least one of the components of the electronic device 101 (e.g., the display module 160, the sensor module 176, or the communication module 190), on behalf of the main processor 121 while the main processor 121 is in an inactive (e.g., sleep) state, or together with the main processor 121 while the main processor 121 is in an active (e.g., application execution) state. According to one embodiment, the auxiliary processor 123 (e.g., an image signal processor or a communication processor) may be implemented as part of another functionally related component (e.g., the camera module 180 or the communication module 190).
- the auxiliary processor 123 may include a hardware structure specialized for processing artificial intelligence models.
- Artificial intelligence models may be created through machine learning. For example, such learning may be performed in the electronic device 101 itself, on which the artificial intelligence model runs, or may be performed through a separate server (e.g., the server 108).
- Learning algorithms may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but are not limited to the above examples.
- An artificial intelligence model may include multiple artificial neural network layers.
- An artificial neural network may be a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), or a deep Q-network, or a combination of two or more of the above, but is not limited to the examples described above.
- artificial intelligence models may additionally or alternatively include software structures.
- the memory 130 may store various data used by at least one component (e.g., the processor 120 or the sensor module 176) of the electronic device 101. The data may include, for example, input data or output data for software (e.g., the program 140) and instructions related thereto.
- Memory 130 may include volatile memory 132 or non-volatile memory 134.
- the program 140 may be stored as software in the memory 130 and may include, for example, an operating system 142, middleware 144, or application 146.
- the input module 150 may receive commands or data to be used in a component of the electronic device 101 (e.g., the processor 120) from outside the electronic device 101 (e.g., a user).
- the input module 150 may include, for example, a microphone, a mouse, a keyboard, keys (e.g., buttons), or a digital pen (e.g., a stylus pen).
- the sound output module 155 may output sound signals to the outside of the electronic device 101.
- the sound output module 155 may include, for example, a speaker or a receiver. The speaker can be used for general purposes such as multimedia playback or recording playback.
- the receiver can be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part of it.
- the display module 160 can visually provide information to the outside (e.g., a user) of the electronic device 101.
- the display module 160 may include, for example, a display, a hologram device, or a projector, and a control circuit for controlling the device.
- the display module 160 may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the intensity of force generated by the touch.
- the audio module 170 can convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module 170 may acquire sound through the input module 150, or may output sound through the sound output module 155 or an external electronic device (e.g., the electronic device 102, such as a speaker or headphones) directly or wirelessly connected to the electronic device 101.
- the sensor module 176 may detect the operating state (e.g., power or temperature) of the electronic device 101 or the external environmental state (e.g., a user's state) and generate an electrical signal or data value corresponding to the detected state.
- the sensor module 176 may include, for example, a gesture sensor, a gyro sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, or a light sensor.
- the interface 177 may support one or more designated protocols that can be used to connect the electronic device 101 directly or wirelessly with an external electronic device (e.g., the electronic device 102).
- the interface 177 may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.
- the connection terminal 178 may include a connector through which the electronic device 101 can be physically connected to an external electronic device (e.g., the electronic device 102).
- the connection terminal 178 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).
- the haptic module 179 may convert electrical signals into mechanical stimulation (e.g., vibration or movement) or electrical stimulation that the user can perceive through tactile or kinesthetic senses.
- the haptic module 179 may include, for example, a motor, a piezoelectric element, or an electrical stimulation device.
- the camera module 180 can capture still images and moving images.
- the camera module 180 may include one or more lenses, image sensors, image signal processors, or flashes.
- the power management module 188 can manage power supplied to the electronic device 101.
- the power management module 188 may be implemented as at least a part of, for example, a power management integrated circuit (PMIC).
- the battery 189 may supply power to at least one component of the electronic device 101.
- the battery 189 may include, for example, a non-rechargeable primary battery, a rechargeable secondary battery, or a fuel cell.
- the communication module 190 may support establishing a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 101 and an external electronic device (e.g., the electronic device 102, the electronic device 104, or the server 108), and performing communication through the established channel. The communication module 190 may operate independently of the processor 120 (e.g., an application processor) and may include one or more communication processors that support direct (e.g., wired) communication or wireless communication.
- the communication module 190 may include a wireless communication module 192 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 194 (e.g., a local area network (LAN) communication module or a power line communication module).
- the corresponding communication module may communicate with the external electronic device 104 through a first network 198 (e.g., a short-range communication network such as Bluetooth, wireless fidelity (WiFi) direct, or infrared data association (IrDA)) or a second network 199 (e.g., a telecommunication network such as a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or WAN)).
- the wireless communication module 192 may identify or authenticate the electronic device 101 within a communication network such as the first network 198 or the second network 199, using subscriber information (e.g., an International Mobile Subscriber Identity (IMSI)) stored in the subscriber identification module 196.
- the wireless communication module 192 may support a 5G network after a 4G network, and next-generation communication technology, for example, new radio (NR) access technology.
- NR access technology may support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and access by multiple terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)).
- the wireless communication module 192 may support a high frequency band (e.g., the mmWave band), for example, to achieve a high data rate.
- the wireless communication module 192 may use various technologies to secure performance in a high frequency band, for example, beamforming, massive multiple-input and multiple-output (massive MIMO), full-dimensional MIMO (FD-MIMO), array antenna, analog beamforming, or large scale antenna.
- the wireless communication module 192 may support various requirements specified in the electronic device 101, an external electronic device (e.g., electronic device 104), or a network system (e.g., second network 199).
- the wireless communication module 192 may support a peak data rate (e.g., 20 Gbps or more) for realizing eMBB, loss coverage (e.g., 164 dB or less) for realizing mMTC, or U-plane latency (e.g., 0.5 ms or less for each of downlink (DL) and uplink (UL), or a round trip of 1 ms or less) for realizing URLLC.
- the antenna module 197 may transmit signals or power to, or receive them from, the outside (e.g., an external electronic device).
- the antenna module 197 may include an antenna including a radiator made of a conductor or a conductive pattern formed on a substrate (e.g., a PCB).
- the antenna module 197 may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for a communication method used in a communication network such as the first network 198 or the second network 199 may be selected from the plurality of antennas by, for example, the communication module 190. Signals or power may be transmitted or received between the communication module 190 and an external electronic device through the selected at least one antenna.
- other components (e.g., a radio frequency integrated circuit (RFIC)) may be additionally formed as part of the antenna module 197.
- the antenna module 197 may form a mmWave antenna module.
- a mmWave antenna module may include a printed circuit board, an RFIC disposed on or adjacent to a first side (e.g., the bottom side) of the printed circuit board and capable of supporting a designated high-frequency band (e.g., the mmWave band), and a plurality of antennas (e.g., an array antenna) disposed on or adjacent to a second side (e.g., the top side or a side surface) of the printed circuit board and capable of transmitting or receiving signals in the designated high frequency band.
- at least some of the above components may be connected to each other through a communication method between peripheral devices (e.g., a bus, general purpose input and output (GPIO), serial peripheral interface (SPI), or mobile industry processor interface (MIPI)) and may exchange signals (e.g., commands or data) with each other.
- commands or data may be transmitted or received between the electronic device 101 and the external electronic device 104 through the server 108 connected to the second network 199.
- Each of the external electronic devices 102 or 104 may be of the same or different type as the electronic device 101.
- all or part of the operations performed in the electronic device 101 may be executed in one or more of the external electronic devices 102, 104, or 108.
- instead of executing the function or service on its own, the electronic device 101 may request one or more external electronic devices to perform at least part of the function or service.
- One or more external electronic devices that have received the request may execute at least part of the requested function or service, or an additional function or service related to the request, and transmit the result of the execution to the electronic device 101.
- the electronic device 101 may process the result, as is or with additional processing, and provide it as at least part of a response to the request.
- For this purpose, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used, for example.
- the electronic device 101 may provide an ultra-low latency service using, for example, distributed computing or mobile edge computing.
- the external electronic device 104 may include an Internet of Things (IoT) device.
- Server 108 may be an intelligent server using machine learning and/or neural networks.
- the external electronic device 104 or server 108 may be included in the second network 199.
- the electronic device 101 may be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.
- Electronic devices may be of various types.
- Electronic devices may include, for example, portable communication devices (e.g., smartphones), computer devices, portable multimedia devices, portable medical devices, cameras, wearable devices, or home appliances.
- Electronic devices according to embodiments of this document are not limited to the above-described devices.
- Terms such as 'first' and 'second' may be used simply to distinguish one element from another, and do not limit those elements in other respects (e.g., importance or order).
- When one (e.g., a first) component is referred to as being "coupled" or "connected" to another (e.g., a second) component, with or without the terms "functionally" or "communicatively", it means that the component can be connected to the other component directly (e.g., by wire), wirelessly, or through a third component.
- The term 'module' used in various embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be an integrally formed part, or a minimum unit of the part or a portion thereof, that performs one or more functions. For example, according to one embodiment, the module may be implemented in the form of an application-specific integrated circuit (ASIC).
- Various embodiments of the present document may be implemented as software (e.g., the program 140) including one or more instructions stored in a storage medium (e.g., the built-in memory 136 or the external memory 138) that can be read by a machine (e.g., the electronic device 101). For example, a processor (e.g., the processor 120) of the machine may call at least one of the one or more stored instructions from the storage medium and execute it.
- the one or more instructions may include code generated by a compiler or code that can be executed by an interpreter.
- a storage medium that can be read by a device may be provided in the form of a non-transitory storage medium.
- 'non-transitory' only means that the storage medium is a tangible device and does not contain signals (e.g., electromagnetic waves); this term does not distinguish between cases where data is stored semi-permanently in the storage medium and cases where it is stored temporarily.
- A computer program product is a commodity that can be traded between a seller and a buyer.
- the computer program product may be distributed in the form of a machine-readable storage medium (e.g., a compact disc read only memory (CD-ROM)), or distributed (e.g., downloaded or uploaded) online through an application store (e.g., Play Store™), or directly between two user devices (e.g., smartphones).
- at least a portion of the computer program product may be at least temporarily stored, or temporarily created, in a machine-readable storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.
- each of the above-described components (e.g., a module or a program) may include a single entity or plural entities, and some of the plural entities may be separately disposed in other components.
- one or more of the components or operations described above may be omitted, or one or more other components or operations may be added.
- multiple components (e.g., modules or programs) may be integrated into one component. In this case, the integrated component may perform one or more functions of each of the plurality of components identically or similarly to how the corresponding component performed them prior to the integration.
- operations performed by a module, program, or other component may be executed sequentially, in parallel, iteratively, or heuristically; one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.
- FIG. 2 is a block diagram illustrating one or more processors included in the electronic device 101 according to an embodiment.
- the electronic device 101 of FIG. 2 may correspond to the electronic device 101 of FIG. 1 .
- the electronic device 101 may include at least one of a CPU 200, an NPU 210, a GPU 220, or a memory 130.
- the CPU 200, the NPU 210, the GPU 220, and the memory 130 may be electrically and/or operably connected to each other by an electronic component such as a communication bus 230.
- the type and/or number of hardware components included in the electronic device 101 are not limited to those shown in FIG. 2.
- the electronic device 101 may further include the display module 160 and/or the communication module 190 of FIG. 1.
- the CPU 200 of the electronic device 101 may include hardware components for processing data based on one or more instructions.
- Hardware components for processing data may include, for example, an Arithmetic and Logic Unit (ALU), a Floating Point Unit (FPU), and/or a Field Programmable Gate Array (FPGA).
- the FPU may be a module for efficiently processing floating point operations.
- the ALU may be a module for efficiently processing integer operations.
- the CPU 200 may have the structure of a multi-core processor such as dual core, quad core, or hexa core.
- the CPU 200 of FIG. 2 may correspond to an example of the processor 120 and/or the main processor 121 of FIG. 1.
- the GPU 220 of the electronic device 101 may include one or more pipelines that perform a plurality of operations required to execute instructions related to computer graphics.
- the pipeline of the GPU 220 may include a graphics pipeline or rendering pipeline for generating a 3D image and generating a 2D raster image from the generated 3D image.
- the graphics pipeline is included in a file stored in the memory 130 and can be controlled based on code written in a shading language. For example, code written in a shading language may be compiled by the CPU 200 into instructions executable on the GPU 220.
- the NPU 210 of the electronic device 101 may include hardware components to support one or more functions based on a neural network.
- the neural network is a recognition model implemented in software or hardware that imitates the computational ability of a biological system using a large number of artificial neurons (or nodes).
- a neural network may be referred to as a model.
- the electronic device 101 may execute functions similar to human cognitive functions or learning processes based on a neural network.
- one or more functions based on the neural network supported by the NPU 210 may include a function of training a neural network, a function of performing image recognition, voice recognition, and/or handwriting recognition using a trained neural network, a function personalized to the user of the electronic device 101 based on a neural network, and a function of controlling a neural network based on an application using an Application Programming Interface (API).
- the CPU 200, NPU 210, and GPU 220 of FIG. 2 may each be included in the electronic device 101 as different integrated circuits, or may be included in a single integrated circuit (IC) based on a System on Chip (SoC).
- the CPU 200, the NPU 210, the GPU 220, or a combination thereof may be included in a single integrated circuit included in the electronic device 101.
- the type of processing unit included based on the SoC is not limited to the above example; for example, other hardware components (e.g., a communication processor) not shown in FIG. 2 may be included, together with the CPU 200, the NPU 210, and the GPU 220, in a single integrated circuit.
- the memory 130 of the electronic device 101 may include hardware components for storing data and/or instructions input to and/or output from the CPU 200, the NPU 210, and/or the GPU 220.
- the memory 130 may include, for example, volatile memory 132 such as Random-Access Memory (RAM) and/or non-volatile memory 134 such as Read-Only Memory (ROM).
- the volatile memory 132 may include, for example, at least one of Dynamic RAM (DRAM), Static RAM (SRAM), Cache RAM, and Pseudo SRAM (PSRAM).
- the non-volatile memory 134 may include, for example, at least one of Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), flash memory, a hard disk, a compact disk, and an Embedded Multi Media Card (eMMC).
- the memory 130, the volatile memory 132, and the non-volatile memory 134 of FIG. 2 may correspond to the memory 130, the volatile memory 132, and the non-volatile memory 134 of FIG. 1, respectively.
- in the memory 130, a set of parameters for calculating a neural network may be stored.
- Parameters representing the neural network may, for example, represent a plurality of nodes included in the neural network and/or weights assigned to connections between the plurality of nodes.
- the structure of a neural network represented by a set of parameters stored in the memory 130 of the electronic device 101 according to one embodiment will be described later with reference to FIG. 3.
- at least one of the CPU 200, GPU 220, or NPU 210 may perform an operation based on the set of parameters.
- the NPU 210 may include a neural engine, buffer, and/or controller.
- the neural engine, the buffer, and the controller may be electrically and/or operatively connected to each other by an electronic component such as a communication bus.
- the neural engine and/or controller may be implemented in software.
- the neural engine and/or controller may be implemented in hardware.
- the NPU 210 may perform operations required to execute neural-network-related functions.
- the NPU 210 may at least temporarily store one or more numerical values used in the calculation or one or more output numerical values in order to perform the calculation.
- the controller of the NPU 210 may control operations based on the neural engine included in the NPU 210.
- the controller may be a software module.
- the controller may be a hardware module.
- the electronic device 101 may identify a specific numerical value from one or more bits based on a plurality of data types.
- the data type may be a predetermined category for interpretation of one or more bits by the electronic device 101.
- the electronic device 101 may interpret a set of one or more bits based on a data type corresponding to the set and identify data represented by the set. For example, when the electronic device 101 stores one or more bits representing a specific numerical value in the memory 130, the number of bits corresponding to the specific numerical value within the memory 130 may differ depending on the data type.
- the NPU 210 may support binary arithmetic operations for each of a plurality of data types. As the NPU 210 supports binary arithmetic operations of multiple data types, the buffer can be managed more efficiently. Hereinafter, multiple data types may have different precisions.
- the NPU 210 may support performance of arithmetic operations based on a designated data type. For example, if the data type included in the neural network stored in the memory 130 and the data type supported by the NPU 210 are different, the NPU 210 may not support operations of the different data type.
- if the precision (or data type) of the parameters differs from the precision (or data type) supported by a module of the electronic device 101 capable of performing neural network operations, such as the NPU 210, a graphics processing unit (GPU), a tensor processing unit (TPU), and/or a digital signal processor (DSP), the precision of the parameters can be changed using a type conversion algorithm.
- the electronic device 101 may input the parameters to the CPU 200, perform an operation related to the neural network, and change the precision of the parameters using the results of the operation.
- in terms of representing the unit of each representation value within the quantized range, the precision may be referred to as quantization level, resolution, bit depth, digital level, representation density, quantum density, quantum range, indication capability, indication density, indication level, and/or equivalent technical terms.
- Quantization may refer to the operation of converting a real number variable into an integer variable.
- the CPU 200 may convert a variable of a data type representing a floating point number based on 16 bits into a variable of a data type representing an integer based on 8 bits.
- the CPU 200 may convert a variable of a data type representing a floating point number based on 8 bits to a variable of a data type representing an integer based on 4 bits.
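The conversions described above can be sketched as a scale-based mapping from a floating point data type to an integer data type. The Python sketch below is illustrative only: the function names and the symmetric scale rule (largest magnitude mapped onto 127) are assumptions, not taken from this description.

```python
def quantize_to_int8(values, scale=None):
    # First data type: floating point numbers; second data type: 8-bit
    # integers in [-128, 127]. The scale maps one range onto the other.
    if scale is None:
        # Derive the scale so the largest magnitude maps onto 127
        # (an assumed symmetric rule, for illustration only).
        max_abs = max((abs(v) for v in values), default=0.0)
        scale = (max_abs / 127.0) or 1.0
    return [max(-128, min(127, round(v / scale))) for v in values], scale


def dequantize(quantized, scale):
    # Recover approximate floating point values from the integers.
    return [v * scale for v in quantized]
```

For example, `quantize_to_int8([0.5, -1.0, 0.25])` maps the value with the largest magnitude (-1.0) onto -127, and dequantizing restores the original values up to the rounding error of the 8-bit representation.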
- Methods for performing the quantization can be divided into symmetric methods and asymmetric methods.
- based on a symmetric method, the values of weights and/or biases included in the model can be proportionally converted.
- based on an asymmetric method, the values of weights and/or biases included in the model may change non-proportionally. For example, when parameters (e.g., weights and/or biases) of a first data type for expressing floating point numbers are quantized into a second data type for expressing integers based on an asymmetric method, the length of the first interval of the first data type matching a first integer of the second data type and the length of the second interval of the first data type matching a second integer of the second data type may differ from each other.
- the method of performing quantization may be determined depending on the quantization method supported by the processor.
- quantization can be performed in an asymmetric manner.
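The difference between the two methods can be illustrated as follows. The formulas below are a common textbook formulation assumed for illustration, not taken from this description: in the symmetric method the zero point is fixed at integer 0, so values convert proportionally; in the asymmetric method a nonzero zero point shifts the mapping over the full integer range.

```python
def symmetric_params(values, bits=8):
    # Symmetric method: values are converted proportionally, so float
    # 0.0 always maps to integer 0 (zero_point == 0).
    qmax = 2 ** (bits - 1) - 1                                # 127 for 8 bits
    scale = (max(abs(min(values)), abs(max(values))) / qmax) or 1.0
    return scale, 0


def asymmetric_params(values, bits=8):
    # Asymmetric method: the float interval [min, max] is mapped onto
    # the full integer range, so intervals of the first data type
    # matching different integers need not sit symmetrically around 0.
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1      # -128 .. 127
    lo, hi = min(values), max(values)
    scale = ((hi - lo) / (qmax - qmin)) or 1.0
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(v, scale, zero_point, bits=8):
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return max(qmin, min(qmax, round(v / scale) + zero_point))
```

With the value set [0.0, 2.0], the asymmetric method maps 0.0 to -128 and 2.0 to 127, using all 256 integer codes; the symmetric method keeps 0.0 at 0 and spends half the codes on negative values that never occur.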
- the processor may perform quantization through a quantizer.
- the quantizer may include an application and/or an application programming interface (API) (eg, runtime library) provided for model calculation.
- the embodiment is not limited to this, and the quantizer may include an accelerator and/or a combination of applications (or instructions) for controlling the accelerator.
- the quantizer may include a software application for quantizing a set of weights and a bias.
- the accelerator may support operations for operators to execute models.
- the model may include a machine learning model.
- the accelerator may be a hardware component (eg, a processor, a component included in the processor, or a component separate from the processor) and/or software within the processor.
- for the type of the accelerator, the information described with respect to FIG. 2 may be referred to.
- according to the present invention, quantization can be performed in units of individual operators constituting a neural network model. The quantization operation for individual operators is described with reference to FIG. 4.
- the quantized operator can be stored along with the hash value of the operator before quantization.
- the electronic device may perform an operation on the operator before quantization based on the quantized weight and bias.
- the quantized weights and biases may be identified based on the stored hash values.
- the electronic device may perform a reliability evaluation on the quantized neural network model and replace the model before quantization with a neural network model that has passed the reliability evaluation.
- the operation for the reliability evaluation is described in FIG. 6. If the neural network model does not pass the reliability evaluation, the neural network model before quantization may be quantized again in the server.
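The replace-on-pass flow described above can be sketched as follows. All function names and the threshold-based pass rule are illustrative assumptions; the description itself leaves the evaluation criterion to FIG. 6.

```python
def replace_if_reliable(model, quantize_fn, evaluate_fn, threshold):
    # Quantize the model, evaluate the quantized model, and replace the
    # pre-quantization model only when the evaluation passes; otherwise
    # keep the original (which may then be re-quantized, e.g., on a server).
    quantized = quantize_fn(model)
    passed = evaluate_fn(quantized) >= threshold
    return (quantized, True) if passed else (model, False)
```

The second element of the returned pair signals whether the replacement happened, which a caller could use to decide whether to hand the model back to the server for re-quantization.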
- the server may receive sub-output data transmitted from various electronic devices (e.g., terminals) and perform quantization on a neural network model before quantization using the transmitted data.
- the operation for quantization through the server is described in FIG. 8.
- the quantizer may be a software application that quantizes the first weight and bias of the first data type to generate the second weight and bias of the second data type.
- the processor 120 may perform quantization through a quantizer.
- the quantizer may include an application and/or an application programming interface (API) (eg, runtime library) provided for model calculation.
- the embodiment is not limited to this, and the quantizer may include an accelerator and/or a combination of applications (or instructions) for controlling the accelerator.
- an accelerator may be a hardware component or a software application to support operations for an operator.
- the accelerator may support operations for operators to execute models.
- the model may include a machine learning model.
- the accelerator may include hardware components within the processor 120.
- the accelerator may include a cortex matrix engine (CME) within a central processing unit (CPU).
- the accelerator may include a tensor processing unit (TPU) within a graphic processing unit (GPU).
- the accelerator may include software executed by the processor 120.
- the accelerator may be a software application executed by at least one of the CPU, neural processing unit (NPU), or GPU.
- the accelerator may be a processor 120 capable of executing the model.
- the accelerator may include a CPU.
- the accelerator may include an NPU.
- the accelerator may include a GPU.
- the accelerator may include a TPU.
- FIG. 3 is an exemplary diagram illustrating a neural network 300 running on an electronic device according to an embodiment.
- the electronic device of FIG. 3 may correspond to an example of the electronic device 101 of FIG. 1 and/or FIG. 2 .
- the neural network 300 of FIG. 3 may be obtained by an electronic device according to one embodiment, for example, from a set of parameters stored in a memory (e.g., the memory 130 of FIGS. 1 and/or 2).
- the neural network 300 may include a plurality of layers.
- the neural network 300 may include an input layer 310, one or more hidden layers 320, and an output layer 330.
- the input layer 310 may correspond to a vector and/or matrix representing input data of the neural network 300.
- a vector representing the input data may have elements corresponding to the number of nodes included in the input layer 310.
- elements included in the matrix representing the input data may correspond to each of the nodes included in the input layer 310.
- Signals generated by the input data at each node in the input layer 310 may be transmitted from the input layer 310 to the hidden layers 320.
- the output layer 330 may generate output data of the neural network 300 based on one or more signals received from the hidden layers 320.
- the output data may correspond to a vector and/or matrix having elements corresponding to the number of nodes included in the output layer 330.
- first nodes included in a specific layer among the plurality of layers included in the neural network 300 may correspond to a weighted sum of at least one of second nodes of the previous layer of the specific layer within the sequence of the plurality of layers.
- the electronic device 101 may identify a weight to be applied to at least one of the second nodes from a set of parameters stored in the memory. Training the neural network 300 may include changing and/or determining one or more weights related to the weighted sum.
- one or more hidden layers 320 may be located between the input layer 310 and the output layer 330, and may convert input data transmitted through the input layer 310 into a value that is easy to predict.
- the input layer 310, the one or more hidden layers 320, and the output layer 330 may include a plurality of nodes.
- one or more hidden layers 320 may be a convolutional filter or a fully connected layer in a convolutional neural network (CNN), or various types of filters or layers grouped based on special functions or characteristics.
- the one or more hidden layers 320 may be a layer based on a recurrent neural network (RNN) whose output value is re-input to the hidden layer at the current time.
- the neural network 300 may include numerous hidden layers 320 to form a deep neural network. Training a deep neural network is called deep learning.
- nodes included in the hidden layers 320 are referred to as hidden nodes.
- nodes included in the input layer 310 and the one or more hidden layers 320 may be connected to each other through connection lines with connection weights, and nodes included in the hidden layers and the output layer 330 may likewise be connected to each other through connection lines with connection weights.
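The weighted sums over connection lines described above can be sketched in Python as follows. The layer representation (one row of weights per output node, plus a bias per node) is an illustrative assumption; activation functions are omitted for brevity.

```python
def layer_output(inputs, weights, biases):
    # Each output node is the weighted sum of the previous layer's
    # nodes (one row of weights per node) plus that node's bias.
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    # Propagate signals from the input layer through the hidden layers
    # to the output layer, obtaining weighted sums layer by layer.
    for weights, biases in layers:
        inputs = layer_output(inputs, weights, biases)
    return inputs


layers = [
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),  # hidden layer: 2 nodes
    ([[2.0, 0.0]], [1.0]),                    # output layer: 1 node
]
```

Training, in the sense described above, would amount to changing the values inside `layers` so that `forward` maps inputs to the desired outputs.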
- tuning and/or training the neural network 300 may mean changing the connection weights between nodes included in each of the layers of the neural network 300 (e.g., the input layer 310, the one or more hidden layers 320, and the output layer 330). Tuning of the neural network 300 may be performed based on, for example, supervised learning and/or unsupervised learning.
- the electronic device may tune the neural network 300 based on reinforcement learning in unsupervised learning. For example, the electronic device may change policy information used by the neural network 300 to control the agent based on the interaction between the agent and the environment. The electronic device according to one embodiment may cause a change in the policy information by the neural network 300 in order to maximize the agent's goal and/or reward due to the interaction.
- the electronic device, in a state of acquiring the neural network 300, may identify the weights corresponding to the connection lines between the input layer 310, the one or more hidden layers 320, and/or the output layer 330.
- the electronic device may sequentially obtain weighted sums based on the connection lines along the plurality of layers (e.g., the input layer 310, the one or more hidden layers 320, and the output layer 330) of the neural network 300.
- the obtained weighted sum may be stored in the NPU 210 and/or memory 130 of FIG. 2.
- the electronic device may repeatedly update the weighted sum stored in the memory by sequentially obtaining the weighted sum along the plurality of layers.
- Each of the plurality of layers of the neural network 300 may have an independent data type and/or precision. For example, when connection lines between a first layer and a second layer among the plurality of layers have weights based on a first data type to represent a floating point number, the electronic device may From the numerical values corresponding to the nodes of the first layer and the weights, weighted sums based on the first data type can be obtained. In the above example, when the connecting lines between the second layer and the third layer among the plurality of layers have weights based on the second data type to represent an integer number, the electronic device includes the obtained weighted sums and From weights based on the second data type, weighted sums based on the second data type can be obtained.
- the electronic device may, for example, use the NPU 210 of FIG. 2 to obtain weighted sums corresponding to each of the plurality of layers based on the different data types. As the electronic device accesses memory based on weighted sums obtained with different data types, the bandwidth of the memory can be used more efficiently. As the bandwidth of memory is used more efficiently, the electronic device according to one embodiment can more quickly obtain output data from the neural network 300 based on the plurality of layers.
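A minimal sketch of the per-layer precision described above, assuming just two kinds of layer data type ("float" and "int"), may look as follows; the rounding rule used for integer layers is an illustrative assumption, not taken from this description.

```python
def forward_mixed(inputs, layers):
    # Each layer carries its own data type: a "float" layer keeps its
    # weighted sums as floating point values, while an "int" layer
    # rounds its weighted sums to integers (lower precision).
    acc = inputs
    for dtype, weights, biases in layers:
        acc = [sum(w * x for w, x in zip(row, acc)) + b
               for row, b in zip(weights, biases)]
        if dtype == "int":
            acc = [int(round(v)) for v in acc]
    return acc
```

Here the weighted sums of a floating-point layer feed directly into an integer layer, mirroring how the electronic device may obtain weighted sums of one data type from weights and sums of another.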
- An electronic device may store sets of parameters representing each of a plurality of neural networks with different precisions.
- a neural network involving super resolution for upscaling images and/or video may require the precision of a data type to represent a floating point number based on 32 bits.
- a neural network related to super resolution for upscaling images and/or video may use the precision of a data type for representing floating point numbers based on 16 bits (e.g., the half-precision floating point format defined in IEEE 754). For example, a neural network for recognizing a subject included in an image and/or video may require the precision of a data type for representing an integer based on 8 bits and/or 4 bits.
- a neural network for performing handwriting recognition may require precision of a data type to represent an integer based on the first bit and/or the second bit.
- An electronic device may perform an operation to obtain a weighted sum based on different precisions corresponding to each of a plurality of neural networks.
- FIG. 4 is an example diagram for explaining quantization of weights according to an embodiment.
- layers included in the neural network model (e.g., the input layer 310, hidden layers 320, and/or output layer 330 of FIG. 3) may correspond to at least one operator.
- model 401 may include a first operator 405 with a first set of weights of a first data type.
- model 403 can be created by quantizing model 401. For example, model 403 may be obtained from model 401 through quantization.
- the first' operator 407 can be created by quantizing the first operator 405.
- Output data may be generated by operations included in the models 401 and 403 based on the input data.
- Model 403 may include a first' operator 407 with a second set of weights of a second data type.
- an operator may be a unit of computation performed by a neural network and/or model.
- a model may be formed by sequential concatenation of operators distinguished by different parameters (eg, weights and/or biases).
- the model 401 may include at least one operator.
- the first operator 405 may be one of at least one operator included in the model 401.
- the first operator 405 may include a first set of weights and a bias of a first data type.
- the first data type may represent a floating point number.
- the first data type may represent a floating point number using 32 bits.
- the first data type may represent a floating point number using 16 bits.
- the second data type may represent an integer.
- the second data type may represent an integer using 8 bits.
- the second data type may represent an integer using 4 bits.
- an electronic device (e.g., the electronic device 101 of FIG. 1) may include at least one processor 120.
- At least one processor may obtain first sub-output data from input data through an operation of a first operator.
- the first sub-output data may be input as input data to a second operator connected to the first operator.
- the at least one processor 120 may obtain second sub-output data from the first sub-output data through an operation of a second operator.
- the second sub-output data may be input as input data to a third operator connected to the second operator.
- the at least one processor may obtain output data of the model 401 through an operation of an output operator on sub-output data.
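The chain of operators described above, in which each operator's sub-output data becomes the next operator's input data until the output operator produces the model's output, can be sketched as follows. The profile-information layout is an illustrative assumption anticipating the profile information described later.

```python
def run_model(operators, input_data):
    # Apply the operators in sequence: each operator's sub-output data
    # becomes the input data of the next operator; the last operator
    # produces the model's output data. Input, sub-outputs, and output
    # are kept together as profile information.
    profile = {"input": input_data, "sub_outputs": []}
    data = input_data
    for op in operators:
        data = op(data)
        profile["sub_outputs"].append(data)
    profile["output"] = data
    return data, profile
```

The recorded sub-outputs are exactly what a quantizer would later inspect to decide the interval of each operator's activations.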
- the output operator may generate output data of the neural network model.
- the at least one processor 120 may change the first weight set of the first data type to the second weight set of the second data type.
- the at least one processor 120 may change a first weight set, which is a floating point number expressed using 16 bits, to a second weight set, which is an integer expressed using 8 bits.
- the at least one processor 120 may change a first weight set, which is a floating point number expressed using 32 bits, to a second weight set, which is an integer expressed using 4 bits.
- An accelerator may be a hardware component or a software application to support operations for an operator. The accelerator may support operations for operators to execute models.
- the model may include a machine learning model.
- the accelerator can process calculations related to neural networks.
- the accelerator may perform matrix operations requiring floating point numbers and/or integers of a specific data type.
- the matrix operations may include multiplication operations of different matrices.
- the accelerator may include a plurality of FPUs and/or a plurality of arithmetic logic units (ALUs) for multiplication operations of floating point numbers and/or integers of a specific data type.
- the data type of the weight set may vary depending on the operator. Therefore, whether to quantize an individual operator can be determined based on the data type of the individual operator.
- the at least one processor 120 may obtain a set of sub-output data from a set of input data based on an operation of an individual operator.
- the at least one processor 120 may obtain a set of output data from a set of input data by performing an operation on at least one operator.
- the at least one processor 120 may store the set of input data, the set of sub-output data, and the set of output data as profile information.
- the at least one processor 120 may obtain the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data.
- the at least one processor 120 may obtain a set of sub-output data by performing an operation of the first operator 405 on a set of input data that is actually used. Quantization of the first weight of the first operator 405 may be performed based on the interval of the sub-output data.
- the numeric interval that can be expressed by the first data type for expressing a floating point number may be the first interval.
- the numeric interval that can be expressed by the second data type for expressing an integer may be the second interval.
- the first section may range from negative infinity to positive infinity.
- the second interval may be from -128 to 127 in the case of a data type expressed by 8 bits.
- the second interval may be from -32768 to 32767 in the case of a data type expressed by 16 bits.
- the first weight set of the first operator 405 may be expressed as an 8-bit integer instead of a data type for expressing a floating point number.
- the at least one processor 120 may change the data type of the first weight set based on the interval of the sub-output data. Through quantization, the at least one processor 120 may obtain, from the first weight set of the first data type for expressing numbers of the first interval, a second weight set of a second data type for expressing numbers of the second interval, whose length is smaller than the length of the first interval.
- the at least one processor 120 may store the quantized second weight set in memory.
- the at least one processor 120 may obtain output data by performing an operation on the first operator 405 based on the second weight set.
- the quantization may be performed for each operator. Even if the data type of the first operator's weight set differs from the data type supported by the accelerator and quantization is performed for the first operator, quantization may not proceed for the second operator if the data type of the second operator's weight set is the same as the data type supported by the accelerator.
- the at least one processor 120 may store the second weight set in the memory along with a hash value of the first weight set.
- the at least one processor 120 may identify a second weight set based on a hash value of the first weight set when the model 401 is executed after the model 401 is quantized.
- the at least one processor 120 may perform an operation for the first operator based on a second weight set corresponding to a hash value of the first weight set.
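The hash-based identification described above can be sketched as follows. SHA-256 over a JSON serialization of the weight set is an illustrative choice, since the description does not specify a hash function or storage layout.

```python
import hashlib
import json


def weight_hash(weights):
    # Hash of the pre-quantization weight set, used as a lookup key.
    return hashlib.sha256(json.dumps(weights).encode()).hexdigest()


class QuantizedStore:
    def __init__(self):
        self._store = {}

    def put(self, first_weights, second_weights):
        # Store the quantized weights along with the hash of the
        # weight set before quantization.
        self._store[weight_hash(first_weights)] = second_weights

    def get(self, first_weights):
        # When the model is executed after quantization, identify the
        # quantized weights from the hash of the original weight set.
        return self._store.get(weight_hash(first_weights))
```

This way, executing the pre-quantization operator only requires hashing its original weights to retrieve the second weight set; a miss signals that the operator has not yet been quantized.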
- an accelerator may be a hardware component or a software application to support operations for an operator.
- the accelerator may support operations for operators to execute models.
- the model may include a machine learning model.
- for the type of the accelerator, the information described with respect to FIG. 2 may be referred to.
- the accelerator can process calculations related to neural networks.
- the accelerator may perform matrix operations requiring floating point numbers and/or integers of a specific data type.
- the matrix operations may include multiplication operations of different matrices.
- the accelerator may include a plurality of FPUs and/or a plurality of arithmetic logic units (ALUs) for multiplication operations of floating point numbers and/or integers of a specific data type.
- the FPU may be a module for efficiently processing floating point operations.
- the ALU may be a module for efficiently processing integer operations.
- the accelerator may not support operations on high-precision data depending on the model. For example, the accelerator may perform an operation based on a second data type for representing integers rather than a first data type for representing floating point numbers.
- the accelerator may perform an operation based on a second data type for representing an integer through 8 bits rather than a first data type for representing an integer through 64 bits.
- the accelerator can mainly perform low-precision calculations for versatility. Therefore, the at least one processor 120 can increase accelerator utilization by changing the first weight set of the first data type to the second weight set of the second data type.
- the model 403 may include at least one operator.
- the first' operator 407 may be one of at least one operator included in the model 403.
- the first' operator 407 may include a second set of weights and a bias based on a second data type.
- the second data type may represent a floating point number.
- the second data type may represent a floating point number using 16 bits.
- the second data type may represent an integer.
- the second data type may represent an integer using 8 bits.
- the second data type may represent an integer using 4 bits.
- the second data type may represent an integer using 2 bits.
- FIG. 5 is a block diagram for explaining the operation of a memory included in an electronic device according to an embodiment.
- an electronic device may include a memory 501 , a first processor 511 , and a second processor 521 .
- the memory 501 can store profile information 507.
- the profile information 507 may include input data 502, data for the model 503, and output data 505.
- the model 503 may include at least one operator including the kth operator 504.
- the kth operator 504 may include a first set of weights and a bias.
- the kth operator 504 can be quantized through the quantizer 514.
- the kth operator 504 can be changed to the k'th operator 506 through a quantization process.
- the first processor 511 may include a quantizer 514.
- the first processor 511 and/or the second processor 521 may include a floating point unit (FPU) and/or an accelerator.
- An accelerator may be a hardware component or a software application to support operations for an operator.
- the accelerator may support operations for operators to execute models.
- the model may include a machine learning model.
- for the type of the accelerator, the information described with respect to FIG. 2 may be referred to.
- the memory 501, the first processor 511, and the second processor 521 may be electrically and/or operably coupled with each other by an electronic component such as a communication bus 531.
- the processor (e.g., the first processor 511 and/or the second processor 521) may perform operations on operators having data types supported by the FPU through the FPU.
- the processor may perform operations on operators having data types supported by the accelerator through the accelerator.
- the quantizer may be a software application for quantizing a weight set and bias. The first processor 511 may quantize the first weight and bias of the first data type into the second weight and bias of the second data type through the quantizer 514.
- the first processor 511 may be a central processing unit (CPU).
- the first processor 511 may generate the k'th operator 506 by quantizing the first weight and bias of the kth operator 504 through the quantizer 514.
- the FPU can perform an operation on the first weight of the first data type that requires high precision before quantization.
- the FPU may be a module for efficiently processing floating point operations.
- the quantizer 514 may include an application and/or an application programming interface (API) (eg, runtime library) provided for model calculation. The embodiment is not limited to this, and the quantizer 514 may include a combination of an accelerator and/or an application for controlling the accelerator.
- the quantizer 514 may be a software application for quantizing the weight set and bias calculated by the accelerator.
- the processor may quantize the first weight and bias of the first data type into the second weight and bias of the second data type through the quantizer 514.
- the accelerator may support operations for operators to execute models.
- the first processor 511 may identify the second data type supported by the accelerator for computation for an operator.
- the second data type may correspond to low precision.
- the accelerator may support only data types expressed as 8-bit integers and data types expressed as 16-bit integers.
- the first processor 511 may identify the first data type of the first weight set included in the first operator that is different from the second data type.
- the first data type may be a data type expressed as a 32-bit floating point number.
- based on identifying the first data type that is different from the second data type, the first processor 511 may obtain the first weight set of the first data type included in the kth operator 504, which is one of at least one operator included in the model 503.
- the kth operator 504 of the model 503 may have a first weight of a first data type expressed as a 32-bit floating point number, while the accelerator may support a second data type expressed as an 8-bit integer.
- the first processor 511 may obtain the first weight set of the kth operator 504 based on identifying that the first data type and the second data type are different.
- the first processor 511 may quantize the obtained first weight set.
- the first processor 511 may obtain the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data. For example, if the set of sub-output data has values between -120 and 110, expressing the first weight set of the kth operator as the 8-bit integer data type supported by the accelerator, instead of the data type for representing floating point numbers, may be advantageous in terms of computational speed and use of the accelerators included in the electronic device. Therefore, the first processor 511 can change the data type of the first weight set based on the section of the sub-output data.
- through quantization, the first processor 511 may obtain, from the first weight set of the first data type for expressing numbers of the first section, a second weight set of a second data type for expressing numbers of a second section having a length smaller than the length of the first section. The first processor 511 may store the quantized second weight set in the memory 501. When executing the neural network model 503, the first processor 511 may obtain output data by performing an operation for the kth operator 504 based on the second weight set.
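- As an illustrative sketch (not part of the disclosure), the quantization described above, which maps the first weight set onto a narrower integer section sized from the observed sub-output interval, could look as follows; the function name, the use of NumPy, and the asymmetric scale/zero-point scheme are assumptions:

```python
import numpy as np

def quantize_weights(weights, sub_outputs, bits=8):
    """Map float weights onto the signed integer grid whose scale is
    derived from the observed sub-output interval (e.g. [-120, 110])."""
    lo, hi = float(np.min(sub_outputs)), float(np.max(sub_outputs))
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # -128..127 for 8 bits
    scale = (hi - lo) / (qmax - qmin)                     # length of one integer step
    zero_point = int(round(qmin - lo / scale))            # integer that represents 0.0
    q = np.clip(np.round(weights / scale) + zero_point, qmin, qmax)
    return q.astype(np.int8 if bits == 8 else np.int16), scale, zero_point
```

With the interval [-120, 110] from the example above, the 8-bit scale is 230/255 (about 0.9), so each integer step stands for roughly 0.9 units of the original floating point range.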
- the second data type of the second weight set of the k'th operator 506 obtained through the quantization may be operable by the accelerator. This is because the accelerator supports the second data type.
- the first processor 511 may perform the operator's calculation for the second weight set in at least one of the accelerators. When two or more neural network models are executed, the first processor 511 can designate the accelerator on which operations of the operators included in each model will be performed. Therefore, the computation speed and efficiency of a quantized model may be higher than those of a non-quantized model. Since the input data and output data input to the model by the user are used to evaluate the accuracy of the quantized model, the electronic device 101 can obtain a model personalized to the user based on quantization.
- After obtaining the quantized operator through the operation described above in FIG. 5, the electronic device can measure the reliability of a model including the quantized operator. Based on the result of measuring the reliability, the electronic device may determine whether to replace the existing operator included in the model with the quantized operator. In FIG. 6 below, the reliability measurement operation is explained.
- FIG. 6 is a block diagram illustrating reliability evaluation of a model executed in an electronic device according to an embodiment.
- model 611 may include a neural network.
- the model 611 may include at least one operator including a first operator 613.
- the model 621 can be created by quantizing the model 611. For example, the model 621 may be obtained from the model 611 by the quantization operation described in FIG. 5.
- the 'first' operator 623 may be created by quantizing the first operator 613.
- First output data 615 may be generated by an operation included in the model 611 based on the input data 601.
- Second output data 625 may be generated by an operation included in the model 621 based on the input data 601.
- At least one processor (e.g., the processor 120 in FIG. 1) can evaluate the quantized model 621 based on the difference between the first output data 615 and the second output data 625. Since the model 621 is created by quantizing the model 611, the reliability of the model 621 can be evaluated in order to replace the model 611 with the model 621.
- the at least one processor 120 may evaluate reliability by comparing first output data 615 of the model 611 and second output data 625 of the model 621. This may be because when the difference between output data for the same input data is less than the reference value, there is a low possibility that a problem will occur even if the model 621 replaces the model 611.
- the at least one processor 120 may identify, from the profile information generated by execution of the model 611, a set of first output data 615 obtained by performing an operation for the at least one operator. Since the model 611 has been executed more times than necessary for quantization and/or reliability evaluation, the profile information may include more than a specified number of input data, more than a specified number of sub-output data, and more than a specified number of output data. Therefore, the profile information may include a set of input data, a set of sub-output data, and a set of output data for the model 611.
- the at least one processor 120 may perform a reliability evaluation on the model 621 based on obtaining the second weight set. For example, the at least one processor 120 may perform a reliability evaluation on the model 621 based on identifying that the creation of the model 621 is complete. According to one embodiment, the at least one processor 120 may identify the set of the first output data 615 obtained by performing an operation for the at least one operator from the profile information. The at least one processor 120 obtains a second set of output data 625 from the set of input data 601 by performing an operation for the at least one operator based on the second weight set. can do. For example, the at least one processor 120 may obtain a set of second output data 625 through an operation on the input data 601 in the model 621.
- based on a set of difference values between the first output data 615 and the corresponding second output data 625, the at least one processor 120 may replace the first weight set included in the first operator 613 of the model 611 with the second weight set. This is because, when the difference values between the first output data 615 and the second output data 625 are all less than the reference value, there is a low possibility of a problem occurring even if the model 621 replaces the model 611. For example, if the set of difference values is about 0.5%, about 0.7%, about 0.5%, about 0.2%, about 0.9%, and the reference value is about 1%, the model 611 can be replaced with the model 621.
- Otherwise, the model 621 may not replace the model 611. For example, if the set of difference values is about 0.5%, about 0.7%, about 0.5%, about 3.5%, about 0.9%, and the reference value is about 1%, the model 621 cannot replace the model 611.
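- The replacement decision described above can be sketched as a simple threshold check (illustrative only; the function name and the encoding of percentages as fractions are assumptions):

```python
def can_replace(difference_values, reference=0.01):
    """The quantized model may replace the original only when every
    difference value is below the reference value (about 1% here)."""
    return all(d < reference for d in difference_values)

# The two example sets of difference values from the description above:
assert can_replace([0.005, 0.007, 0.005, 0.002, 0.009])        # all < 1%: replace
assert not can_replace([0.005, 0.007, 0.005, 0.035, 0.009])    # 3.5% >= 1%: keep original
```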
- In this case, the at least one processor 120 may perform quantization again, or quantization can be performed through the server.
- when at least one difference value among the set of difference values between the first output data 615 and the second output data 625 is greater than or equal to the reference value, the at least one processor 120 may perform quantization again. Since quantization takes time and resources, the time zone in which quantization of the model 611 is performed can be adjusted. For example, the at least one processor 120 may perform quantization of the model 611 during off-peak hours. As an example, quantization of the model 611 may be performed in the middle of the night. For example, based on the amount of mobile phone usage depending on the time of day, quantization may be performed during times when mobile phone usage is low. In the case of students, the times when mobile phone usage is low may be class times.
- the times when mobile phone usage is low may be business hours.
- the at least one processor 120 may display a notification on the display to guide the user to select a time zone at which quantization will be performed.
- the at least one processor 120 may perform quantization at a time designated by the user.
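- The off-peak scheduling described above can be sketched as selecting the hour with the lowest recorded usage from a per-hour usage count (illustrative; the data layout and function name are assumptions):

```python
def pick_quantization_hour(hourly_usage):
    """Choose the hour of day (0-23) with the lowest recorded usage,
    so quantization runs while the device is likely idle."""
    return min(range(len(hourly_usage)), key=lambda h: hourly_usage[h])

# Hypothetical usage counts per hour; 03:00 is the quietest slot here.
usage = [5, 3, 2, 1, 2, 4, 8, 20, 30, 25, 22, 24,
         26, 25, 23, 22, 24, 28, 32, 30, 26, 18, 10, 6]
assert pick_quantization_hour(usage) == 3
```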
- Quantization can be performed through the server. Quantization performed in the server is described below in FIG. 8.
- the second operator includes weight sets of the second data type supported by the accelerator, so quantization may not be performed.
- a second operator on which quantization is not performed may have the same weight set and bias in the model 621 and the model 611.
- the model 621 is shown as being stored in memory separately from the model 611, but the embodiment of the present disclosure is not limited thereto.
- the at least one processor 120 may not store the model 621 itself, but may store the second weight set and bias of the quantized operator in memory.
- the at least one processor 120 may store the quantized second weight set in the memory together with a hash value of the first weight set.
- the at least one processor 120 may identify a second weight set based on a hash value of the first weight set.
- the at least one processor 120 may perform an operation for the first operator 613 based on a second weight set corresponding to a hash value of the first weight set.
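- A minimal sketch of the hash-keyed storage and lookup described above (illustrative; the use of SHA-256, NumPy, and a dictionary cache are assumptions, not stated in the disclosure):

```python
import hashlib
import numpy as np

def weight_hash(weights):
    """Hash of the original (first) weight set, used as the cache key."""
    return hashlib.sha256(np.ascontiguousarray(weights).tobytes()).hexdigest()

quantized_cache = {}  # hash of first weight set -> quantized second weight set

def store_quantized(first_weights, second_weights):
    quantized_cache[weight_hash(first_weights)] = second_weights

def lookup_quantized(first_weights):
    return quantized_cache.get(weight_hash(first_weights))  # None if absent
```

Keying by a hash of the first weight set lets the processor find the quantized weights without storing the quantized model as a whole.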
- FIG. 7 illustrates a flow of operations of an electronic device for storing quantized second weight sets according to an embodiment.
- At least one processor may obtain a first set of weights.
- a model including a neural network may include a first operator (eg, the first operator 613 in FIG. 6).
- the first weight set may be weights included in the first operator.
- the at least one processor 120 may identify whether a second data type of the accelerator different from the first data type of the first operator 613 is identified, and/or whether the absence of quantization information and/or the absence of a quantization model is identified.
- When the second data type different from the first data type is identified and the quantization information and/or the quantization model is not identified, the at least one processor 120 may perform operation 704. When the quantization information and the quantization model are identified, the at least one processor 120 may perform operation 703.
- the at least one processor 120 may identify a second data type of the accelerator that is different from the first data type of the first operator (eg, the first operator 613 in FIG. 6).
- the first data type may represent a 32-bit floating point number.
- the accelerator can only support a data type expressed as an 8-bit integer and a second data type expressed as a 16-bit integer.
- the first data type may represent a floating point number expressed using 16 bits.
- the second data type can represent an integer using 8 bits.
- Although the precision and/or range of the second data type is lower than that of the first data type, if calculation of the model based on the second data type is possible, the electronic device 101 can perform the quantization function.
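- The decision of whether to perform the quantization function can be sketched as follows (illustrative; the function and parameter names are assumptions):

```python
def should_quantize(operator_dtype, accelerator_dtypes, quantized_weights_exist):
    """Quantization is needed only when the operator's data type is not
    supported by the accelerator and no quantized weights are cached yet."""
    return operator_dtype not in accelerator_dtypes and not quantized_weights_exist

assert should_quantize("float32", {"int8", "int16"}, False)
assert not should_quantize("int8", {"int8", "int16"}, False)     # already supported
assert not should_quantize("float32", {"int8", "int16"}, True)   # reuse cached weights
```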
- the at least one processor 120 may identify the absence of quantization information and/or a quantization model. According to one embodiment, the at least one processor 120 may identify the absence of a quantization model.
- the quantization model may be a quantized model.
- the quantized model may include a first' operator corresponding to the hash value of the first operator. If there is no quantized operator corresponding to the hash value of the first operator, the at least one processor 120 may identify the absence of a quantized model.
- the at least one processor 120 may identify the absence of quantization information.
- the quantization information may be a value included in the quantized model.
- the quantization information may be a second weight set and/or bias within the first' operator. In the absence of the second weight set and/or bias in the first' operator, the at least one processor 120 may identify the absence of quantization information.
- the at least one processor 120 may perform a first operator operation on the second quantized weight set.
- the quantized second weight set may be stored in the memory along with a hash value of the first weight set.
- the at least one processor 120 may identify a second weight set based on a hash value of the first weight set.
- the at least one processor 120 may perform an operation on the first operator 613 based on a second weight set corresponding to a hash value of the first weight set.
- the at least one processor 120 may perform a first operator operation with a first set of weights through a quantizer. This is because sub-output data is required for quantization.
- the first operator operation using the first weight set may take more time than the first operator operation using the quantized second weight set.
- the at least one processor 120 may collect and store profile information through the quantizer.
- the profile information may include a set of input data, a set of sub-output data, and a set of output data for the neural network model.
- the set of input data may be obtained by executing the neural network model.
- the at least one processor 120 may generate and store second weight sets through the quantizer. According to one embodiment, the at least one processor 120 may obtain the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data. For example, the at least one processor 120 may obtain a set of sub-output data by performing an operation for the first operator on a set of input data that is actually used data. Quantization of the first weight of the first operator may be performed based on the section of the sub-output data. As an example, the numeric interval that can be expressed by the first data type for expressing a floating point number may be the first interval. The numeric interval that can be expressed by the second data type for expressing an integer may be the second interval.
- the first section may range from negative infinity to positive infinity.
- the second interval may be from -128 to 127 in the case of a data type expressed by 8 bits. In the case of a data type expressed in 16 bits, the second interval may be from -32768 to 32767.
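- The signed-integer second intervals quoted above follow directly from the bit width; a minimal sketch (the function name is an assumption):

```python
def integer_interval(bits):
    """Representable interval of a signed integer type with the given width:
    [-2^(bits-1), 2^(bits-1) - 1]."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

assert integer_interval(8) == (-128, 127)
assert integer_interval(16) == (-32768, 32767)
```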
- Expressing the first weight set of the first operator as a data type expressed as an 8-bit integer, instead of a data type for expressing a floating point number, can be advantageous in terms of computational speed and use of accelerators. Therefore, the at least one processor 120 may change the data type of the first weight set based on the section of the sub-output data.
- the at least one processor 120 may store the quantized second weight set in memory. When the neural network model is executed, the at least one processor 120 may obtain output data by performing an operation for the first operator based on the second weight set.
- FIG. 8 shows a flow of operations for executing quantization using a server according to an embodiment.
- an electronic device (e.g., the electronic device 101 of FIG. 1) may perform the following operations.
- At least one processor (e.g., the processor 120 of FIG. 1) may perform a first operator operation with the first weight set through the quantizer.
- the operation 801 may be performed similarly to the operation 704 of FIG. 7.
- the at least one processor 120 may perform operation 801 when identifying a second data type of the accelerator that is different from the first data type of the first operator, or when identifying the absence of quantization information and/or the absence of a quantization model.
- when the at least one processor 120 identifies a second data type of the accelerator that is the same as the first data type of the first operator, or identifies the quantization information and the quantization model, the at least one processor 120 may perform an operation for the first operator based on the quantized second weight set. Sub-output data may be required for quantization.
- the at least one processor 120 may collect and store profile information through the quantizer.
- the profile information may include a set of input data, a set of sub-output data, and a set of output data.
- the at least one processor 120 may obtain the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data.
- the at least one processor 120 may identify a reliability of the model that is outside a reference range. To perform operation 803, the at least one processor 120 may evaluate the reliability of the quantized model. The reliability of the quantized model may be identified based on whether at least one difference value among a set of difference values between first output data and second output data is greater than or equal to a reference value. First output data may be obtained by performing an operation for at least one operator based on the first weight set. The second output data may be obtained by performing an operation for at least one operator based on the second weight set.
- when at least one difference value among the set of difference values between the first output data and the second output data is greater than or equal to the reference value, the at least one processor 120 may perform quantization through a server (e.g., the server 108 of FIG. 1).
- the at least one processor 120 may transmit sub-output data information.
- the at least one processor 120 may transmit the sub-output data to the server 108.
- based on identifying that at least one difference value of the set of difference values between the first output data and the corresponding second output data is greater than or equal to the reference value, the at least one processor 120 may transmit the set of sub-output data obtained from the profile information to the server 108 through the communication circuit.
- the sub-output data information may be generated through the first operator operation including the first weight set.
- the server 108 may receive sub-output data information.
- the sub-output data information may be received from a plurality of electronic devices.
- the server 108 may perform quantization using sub-output data information transmitted from a plurality of electronic devices of the same model. Since input data, which is actual data, is not transmitted to the server 108, this may be advantageous in terms of privacy protection compared to the case where input data is transmitted to the server 108. Since the input data and output data input to the model by the user are used to evaluate the accuracy of the quantized model, the electronic device 101 may obtain a model personalized to the user based on quantization.
- the server 108 may aggregate sub-output data information.
- the server 108 may stop receiving sub-output data information.
- the server 108 may generate third weight sets.
- the server 108 may obtain the third weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data. For example, quantization for the first weight of the first operator may be generated based on the section of the sub-output data.
- through quantization, the server 108 may obtain, from the first weight set of the first data type for expressing numbers of the first section, a third weight set of the second data type for expressing numbers of a third section having a length smaller than the length of the first section. For information about quantization, the information described in operation 706 of FIG. 7 may be referred to.
- the server 108 may transmit third weight sets. Based on the sub-output data, the server 108 may transmit third weight sets of the second data type, on which quantization of the first weight set has been performed, to the electronic device 101 through the communication circuit. The server 108 may transmit the third weight sets to a plurality of electronic devices that have requested quantization according to execution of the application.
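- A sketch of the server-side aggregation of sub-output data information from multiple devices (illustrative; representing the sub-output information of each device as a (min, max) range is an assumption, not stated in the disclosure):

```python
def aggregate_ranges(device_ranges):
    """Merge per-device sub-output (min, max) ranges into one global
    interval; raw user input data never leaves the devices."""
    lows, highs = zip(*device_ranges)
    return min(lows), max(highs)

# Three devices report the observed ranges of the same operator's sub-outputs:
assert aggregate_ranges([(-120, 110), (-90, 130), (-100, 90)]) == (-120, 130)
```

The merged interval can then drive the same section-based quantization sketched earlier, yielding the third weight set that is sent back to the devices.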
- the at least one processor 120 may store the received third weight sets.
- the at least one processor 120 may store the received third weight set in the memory.
- the at least one processor 120 may obtain output data by performing an operation for the first operator based on the third weight set in a neural network.
- FIG. 9 illustrates a flow of operations of an electronic device for acquiring a second set of weights through quantization according to an embodiment.
- the at least one processor 120 may obtain a first set of weights. According to one embodiment, the at least one processor 120 may perform operation 901 of FIG. 9 based on whether the data type of the first weight set is different from the data type supported by the accelerator, the absence of quantization information, and/or the absence of a quantization model. According to one embodiment, the at least one processor 120 may identify a second data type of the accelerator that is different from the first data type of the operator (e.g., the first operator 405 in FIG. 4). The at least one processor 120 may identify the absence of quantization information and/or the absence of a quantization model.
- the at least one processor 120 may obtain the first weight set in order to quantize the first weight set.
- the precision required for the first data type may be higher than the precision required for the second data type.
- the first data type may represent a floating point number using 16 bits.
- the second data type can represent an integer using 8 bits.
- the data type of the weight set may be different depending on the individual operator. Therefore, whether to quantize an individual operator can be determined based on the data type of the individual operator.
- the at least one processor 120 may obtain a set of sub-output data from profile information. This is because quantization is performed based on the set of sub-output data.
- the profile information may include the set of input data, the set of sub-output data, and the set of output data.
- the at least one processor 120 may obtain a set of sub-output data from a set of input data based on the operation of the first operator.
- the at least one processor 120 may obtain a set of output data from a set of input data by performing an operation on at least one operator.
- the at least one processor 120 may obtain a second set of weights based on a set of sub-output data.
- the at least one processor 120 may obtain the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data. Quantization of the first weight of the first operator may be performed based on the section of the sub-output data.
- the numeric interval that can be expressed by the first data type for expressing a floating point number may be the first interval.
- the numeric interval that can be expressed by the second data type for expressing an integer may be the second interval.
- through quantization, the at least one processor 120 may obtain, from the first weight set of the first data type for expressing numbers of the first section, a second weight set of a second data type for expressing numbers of a second section having a length smaller than the length of the first section.
- the at least one processor 120 may store a second set of weights based on obtaining the second set of weights.
- the at least one processor 120 may perform calculations based on the quantized second weight set when performing a neural network model.
- the at least one processor 120 may obtain output data by performing an operation for the first operator based on the second weight set.
- the at least one processor 120 may store the second weight set in the memory along with a hash value of the first weight set.
- the at least one processor 120 may identify a second weight set based on a hash value of the first weight set.
- the at least one processor 120 may perform an operation for the first operator based on a second weight set corresponding to a hash value of the first weight set.
- FIG. 10 illustrates a flow of operations of an electronic device for storing a second set of weights based on reliability evaluation according to an embodiment.
- the flow of operation of FIG. 10 may embody operation 904 of FIG. 9 .
- the at least one processor 120 may obtain a set of first output data from profile information. According to one embodiment, the at least one processor 120 may perform a reliability evaluation on a quantized neural network model based on obtaining the second weight set. For example, based on identifying that the generation of the quantized neural network model is complete, a reliability evaluation of the quantized neural network model may be performed. According to one embodiment, the at least one processor 120 may identify a set of first output data obtained by performing an operation for the at least one operator from the profile information.
- the at least one processor 120 may obtain a second output data set based on a second set of weights.
- the at least one processor 120 may obtain a second set of output data from the set of input data by performing an operation for the at least one operator based on the second weight set.
- the at least one processor 120 may obtain a set of second output data through an operation on the input data in the quantized neural network model.
- based on the difference values between the first output data and the second output data, the at least one processor 120 may replace the first weight set with the second weight set.
- based on a set of difference values between the first output data and the corresponding second output data, the at least one processor 120 may replace the first weight set of the first operator included in the neural network model with the second weight set.
- In this case, the quantized neural network model can replace the neural network model.
- Otherwise, the quantized neural network model cannot replace the neural network model before quantization.
- In this case, the at least one processor 120 may perform quantization again or perform quantization through a server.
- FIG. 11 illustrates a flow of operations of an electronic device for identifying a quantization method according to reliability, according to an embodiment.
- the at least one processor 120 may obtain a first set of weights. According to one embodiment, the at least one processor 120 may perform operation 1101 of FIG. 11 based on whether the data type of the first weight set is different from the data type supported by the accelerator, the absence of quantization information, and/or the absence of a quantization model. According to one embodiment, the at least one processor 120 may identify a second data type of the accelerator that is different from the first data type of the operator. The at least one processor 120 may identify the absence of quantization information and/or the absence of a quantization model.
- the at least one processor 120 may obtain a first weight set from profile information in order to quantize the first weight set.
- the at least one processor 120 may obtain a set of sub-output data from profile information. This is because quantization is performed based on the set of sub-output data. According to one embodiment, the at least one processor 120 may obtain a set of sub-output data from a set of input data based on the operation of the first operator. Operation 1102 may be performed similarly to operation 902 of FIG. 9 . Hereinafter, duplicate descriptions will be omitted.
- the at least one processor 120 may obtain a second set of weights based on a set of sub-output data.
- the at least one processor 120 may obtain the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data. Quantization of the first weight of the first operator may be performed based on the section of the sub-output data. Operation 1103 may be performed similarly to operation 903 of FIG. 9. Hereinafter, duplicate descriptions will be omitted.
- the at least one processor 120 may identify whether the reliability is within a reference range. When the reliability is within the reference range, the at least one processor 120 may perform operation 1105. When the reliability is outside the reference range, the at least one processor 120 may perform operation 1106. According to one embodiment, the at least one processor 120 may identify a set of first output data obtained by performing an operation for the at least one operator from profile information generated by executing (or simulating) the neural network model before quantization. According to one embodiment, the at least one processor 120 may perform a reliability evaluation on the quantized neural network model based on obtaining the second weight set. For example, the at least one processor 120 may perform the reliability evaluation based on identifying that quantization of the neural network model is complete.
- the at least one processor 120 may obtain a second set of output data from the set of input data by performing an operation for the at least one operator based on the second weight set.
- based on a set of difference values between the first output data and the corresponding second output data, the at least one processor 120 may obtain reliability greater than or equal to a specified criterion for the quantized neural network. This is because, when the difference values between the first output data and the second output data are all less than the reference value, the possibility of a problem occurring is low.
- when at least one difference value among the set of difference values between the first output data and the second output data is greater than or equal to the reference value, the at least one processor 120 cannot obtain reliability greater than or equal to the specified criterion for the quantized neural network.
- the at least one processor 120 may store a second set of weights.
- when the difference values between the first output data and the corresponding second output data are all less than the reference value, the at least one processor 120 may replace the first weight set with the second weight set. This is because performing calculations for the operator based on the second weight set is advantageous in terms of time resources.
- the at least one processor 120 may transmit a set of sub-output data to a server.
- the first weight set cannot be replaced with the second weight set.
- the at least one processor 120 may perform quantization again or may perform quantization through a server.
- the at least one processor 120 may receive a quantized third weight set from a server. Based on the sub-output data, the server may perform quantization of the first weight set and transmit the resulting third weight set to the electronic device 101 through the communication circuit. The server may transmit the third weight set to a plurality of electronic devices that have requested quantization according to execution of the application. The at least one processor 120 may store the received third weight set in the memory. When the neural network model is executed, the at least one processor 120 may obtain output data by performing an operation for the first operator based on the third weight set. When the operation for the first operator is executed based on the third weight set of the second data type, the operation may be executed on an accelerator optimized for machine-learning operations, such as an NPU. Calculation speed can be improved and current consumption can be reduced.
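The device-side decision between the locally quantized second weight set and the server-provided third weight set can be sketched as below. The stub function stands in for the communication circuit; no real protocol is specified in this text, so every name here is hypothetical.

```python
# Illustrative control flow for the fallback described above: when local
# quantization fails the reliability check, the device delegates quantization
# to a server. request_server_quantization is a stub for the communication
# circuit and returns a fake int8-like third weight set.
def request_server_quantization(first_weights, sub_outputs):
    # Stub: a real device would transmit sub_outputs and receive a third
    # weight set quantized by the server based on that data.
    return [round(w * 127) for w in first_weights]

def choose_weight_set(first_weights, second_weights, reliable, sub_outputs):
    if reliable:
        return second_weights  # local quantization passed: replace first set
    # Reliability outside the reference range: ask the server for a third set.
    return request_server_quantization(first_weights, sub_outputs)

print(choose_weight_set([0.5], [63], reliable=True, sub_outputs=[]))   # [63]
print(choose_weight_set([0.5], [63], reliable=False, sub_outputs=[]))  # [64]
```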
- the present disclosure relates to an electronic device and method for quantizing a weight set included in an operator based on the range of sub-output data.
- the electronic device can improve calculation speed and reduce current consumption by performing quantization for each operator based on the range of sub-output data.
- reliability evaluation is performed, and quantization is performed on the server for models whose reliability is less than a reference value, thereby reducing the resources that various electronic devices spend on performing quantization.
- the electronic device 101 may include a memory 130; 501 and at least one processor 120; 200; 511.
- the at least one processor (120; 200; 511) may obtain, from the memory (130; 501), a first weight set of a first data type included in the first operator (613), which is one of at least one operator included in the model (503; 611).
- the at least one processor (120; 200; 511) may obtain a set of sub-output data corresponding to the set of input data (502; 601) used for computation of the first operator (613), from profile information stored in the memory (130; 501) and generated by execution of the model (503; 611).
- the at least one processor (120; 200; 511) may obtain a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data.
- the at least one processor (120;200;511) may store the second weight set in the memory (130;501) based on obtaining the second weight set.
- the at least one processor (120; 200; 511) may obtain, through the quantization, from the first weight set of the first data type for representing a floating-point number of a first interval, the second weight set of the second data type for representing a number of a second interval having a length smaller than the length of the first interval.
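The interval narrowing described above (a wide floating-point first data type mapped to a much shorter second data type) can be illustrated with symmetric per-tensor int8 quantization. This scheme is assumed here as one standard instance; the text does not mandate this exact mapping.

```python
# Minimal sketch of the interval narrowing described above: weights of a
# first data type (float32, a wide representable interval) are mapped to a
# second data type (int8, a far shorter interval). Symmetric per-tensor
# quantization is an assumption, not taken verbatim from the patent.
import numpy as np

def quantize_symmetric_int8(first_weight_set):
    w = np.asarray(first_weight_set, dtype=np.float32)   # first data type
    max_abs = float(np.max(np.abs(w))) or 1.0            # half-width of the interval in use
    scale = max_abs / 127.0                              # one int8 step in float units
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale                                      # second data type + step size

second_weight_set, scale = quantize_symmetric_int8([-1.0, 0.0, 0.25, 1.0])
print(second_weight_set.tolist())   # [-127, 0, 32, 127]
print(second_weight_set.dtype)      # int8
```

The `scale` value must be kept alongside the int8 weights so the accelerator can interpret them; that pairing is what makes the shorter second interval usable in place of the first.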
- the accelerator may include a circuit for performing an operation based on the second data type, among the first data type and the second data type.
- the accelerator may be configured to perform an operation based on the second data type, among the first data type and the second data type.
- the at least one processor may additionally perform the computation of the first operator (613) by inputting, in response to a request to execute a function associated with the model (503; 611), the second weight set, among the first weight set and the second weight set, to the accelerator.
- the at least one processor (120; 200; 511) may identify, based on obtaining the second weight set, a set of first output data (615) obtained by performing an operation for the at least one operator from the profile information.
- the at least one processor (120; 200; 511) may obtain a set of second output data (625) from the set of input data (502; 601) by performing an operation for the at least one operator based on the second weight set.
- the at least one processor (120; 200; 511) may replace the first weight set included in the first operator (613) of the model (503; 611) with the second weight set, based on a set of difference values between the first output data (615) and the corresponding second output data (625).
- the electronic device may additionally include a communication circuit.
- the at least one processor (120; 200; 511) may additionally transmit the set of sub-output data obtained from the profile information to the server through a communication circuit, based on identifying that at least one difference value among the set of difference values between the first output data (615) and the corresponding second output data (625) is greater than or equal to a reference value.
- the at least one processor (120; 200; 511) may additionally receive, from the server through the communication circuit, a third weight set of the second data type on which quantization of the first weight set was performed based on the sub-output data.
- the at least one processor (120;200;511) may additionally store the received third weight set in the memory (130;501).
- the at least one processor (120; 200; 511) may obtain the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data.
- the at least one processor (120;200;511) may additionally store the second weight set in the memory (130;501) along with the hash value of the first weight set.
- the at least one processor (120; 200; 511) may additionally perform an operation for the first operator (613) based on a second weight set corresponding to a hash value of the first weight set.
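The hash-keyed reuse described above can be sketched as a small cache: the quantized (second) weight set is stored under a hash of the original (first) weight set, so a later execution looks it up instead of re-quantizing. `hashlib` is used here as a stand-in for whatever hash the device actually computes; all names are assumptions.

```python
# Hypothetical sketch of the hash-keyed lookup described above.
import hashlib
import numpy as np

_weight_cache = {}  # hash of first weight set -> second (quantized) weight set

def weight_hash(weight_set):
    data = np.asarray(weight_set, dtype=np.float32).tobytes()
    return hashlib.sha256(data).hexdigest()

def store_quantized(first_weight_set, second_weight_set):
    _weight_cache[weight_hash(first_weight_set)] = second_weight_set

def lookup_quantized(first_weight_set):
    """Cached second weight set, or None when quantization is still needed."""
    return _weight_cache.get(weight_hash(first_weight_set))

store_quantized([0.5, -0.25], np.array([64, -32], dtype=np.int8))
print(lookup_quantized([0.5, -0.25]) is not None)   # True: cache hit
print(lookup_quantized([0.5, -0.26]))               # None: must quantize first
```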
- the set of sub-output data of the first operator (613) may be a set of input data (502; 601) of a second operator, which is one of the at least one operator, connected to the first operator (613).
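The chaining described above means one profiled model run yields, per operator, the input set needed to quantize the next operator: the sub-output of the first operator is exactly the input of the second. A toy sketch, with both operator bodies standing in for the patent's unspecified operators:

```python
# Small sketch of operator chaining and sub-output profiling; the operator
# implementations are illustrative stand-ins.
import numpy as np

def first_operator(x, w):
    return x * w                 # toy weighted operator

def second_operator(x):
    return np.maximum(x, 0.0)    # toy activation consuming the sub-output

profile = {"first_operator_sub_outputs": []}

def run_model(input_set, w=0.5):
    output_set = []
    for x in input_set:
        sub = first_operator(x, w)
        profile["first_operator_sub_outputs"].append(sub)  # = second op's input
        output_set.append(second_operator(sub))
    return output_set

run_model([np.array([2.0, -4.0])])
print(profile["first_operator_sub_outputs"][0].tolist())   # [1.0, -2.0]
```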
- the at least one processor (120; 200; 511) may identify the second data type supported by the accelerator for computation for an operator.
- the at least one processor (120; 200; 511) may identify a first data type, different from the second data type, of the first weight set included in the first operator (613).
- the at least one processor (120; 200; 511) may obtain, from the memory (130; 501), the first weight set of the first data type included in the first operator (613), which is one of at least one operator included in the model (503; 611), based on identifying the first data type that is different from the second data type.
- the at least one processor (120; 200; 511) may identify the second data type supported by the accelerator for computation for an operator.
- the at least one processor (120; 200; 511) may obtain, from the memory (130; 501), the first weight set of the first data type included in the first operator (613), which is one of at least one operator included in the model (503; 611), based on identifying a first data type, different from the second data type, of the first weight set included in the first operator (613), identifying that the data type of a second weight set corresponding to a hash value of the first weight set is the first data type, or identifying that the weight set corresponding to the hash value of the first weight set is only the first weight set.
- the at least one processor (120; 200; 511) may identify the second data type supported by the accelerator.
- the at least one processor (120; 200; 511) may determine whether to perform quantization on the first weight set of the first data type based on identifying a first data type, different from the second data type, of the first weight set included in the first operator (613), and based on whether a weight set of the second data type obtained from the first weight set through quantization is stored in the memory, or whether weights are included in that weight set.
- the method performed by the electronic device 101 may include an operation of obtaining, from the memory (130; 501), a first weight set of a first data type included in the first operator (613), which is one of at least one operator included in the model (503; 611).
- the method may include an operation of obtaining a set of sub-output data corresponding to the set of input data (502; 601) used for computation of the first operator (613), from profile information stored in the memory (130; 501) and generated by execution of the model (503; 611).
- the method may include an operation of obtaining, based on the set of sub-output data, a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set.
- the method may include storing the second set of weights in the memory (130; 501) based on obtaining the second set of weights.
- the operation of obtaining the weight set of the second data type may include an operation of obtaining, through the quantization, from the first weight set of the first data type for representing a floating-point number of a first interval, the second weight set of the second data type for representing a number of a second interval having a length smaller than the length of the first interval.
- the accelerator may be configured to perform an operation based on the second data type, either the first data type or the second data type.
- the method may additionally include an operation of performing the computation of the first operator (613) by inputting, in response to a request to execute a function associated with the model (503; 611), the second weight set, among the first weight set and the second weight set, to the accelerator.
- the method may include an operation of identifying, based on obtaining the second weight set, a set of first output data (615) obtained by performing an operation for the at least one operator from the profile information.
- the method may include an operation of obtaining a set of second output data (625) from the set of input data (502; 601) by performing an operation for the at least one operator based on the second weight set.
- the method may include an operation of replacing the first weight set included in the first operator (613) of the model (503; 611) with the second weight set, based on a set of difference values between the first output data (615) and the corresponding second output data (625).
- the method may additionally include an operation of transmitting the set of sub-output data obtained from the profile information to a server through a communication circuit, based on identifying that at least one difference value of the set of difference values between the first output data (615) and the corresponding second output data (625) is greater than or equal to a reference value. The method may additionally include an operation of receiving, from the server through the communication circuit, a third weight set of the second data type on which quantization of the first weight set has been performed based on the sub-output data. The method may additionally include an operation of storing the received third weight set in the memory (130; 501).
- the operation of obtaining the second weight set of the second data type may include an operation of obtaining the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data.
- the method may additionally include an operation of storing the second weight set in the memory (130; 501) together with a hash value of the first weight set.
- the method may additionally include an operation of performing an operation for the first operator (613) based on a second weight set corresponding to a hash value of the first weight set.
- the set of sub-output data of the first operator (613) may be a set of input data (502; 601) of a second operator, which is one of the at least one operator, connected to the first operator (613).
- the operation of obtaining the first weight set may include an operation of identifying the second data type supported by the accelerator for computation for an operator.
- the operation of obtaining the first weight set may include an operation of identifying a first data type, different from the second data type, of the first weight set included in the first operator (613).
- the operation of obtaining the first weight set may include an operation of obtaining, from the memory (130; 501), the first weight set of the first data type included in the first operator (613), which is one of the at least one operator included in the model (503; 611), based on identifying the first data type that is different from the second data type.
- the operation of obtaining the first weight set may include an operation of identifying the second data type supported by the accelerator for computation for an operator.
- the operation of obtaining the first weight set may include an operation of obtaining, from the memory (130; 501), the first weight set of the first data type included in the first operator (613), which is one of at least one operator included in the model (503; 611), based on identifying a first data type, different from the second data type, of the first weight set included in the first operator (613), identifying that the data type of a second weight set corresponding to a hash value of the first weight set is the first data type, or identifying that the weight set corresponding to the hash value of the first weight set is only the first weight set.
- obtaining the first set of weights may include identifying the second data type supported by the accelerator.
- the operation of obtaining the first weight set may include an operation of identifying a first data type, different from the second data type, of the first weight set included in the first operator (613).
- the operation of obtaining the first weight set may include an operation of determining whether to perform quantization on the first weight set of the first data type based on whether a weight set of the second data type obtained from the first weight set through quantization is stored in the memory, or whether weights are included in that weight set.
- the operation of obtaining the second weight set of the second data type may include an operation of obtaining the second weight set by performing quantization on the first weight set based on the interval in which the values of the set of sub-output data are distributed.
- the operation of obtaining the second weight set of the second data type may include an operation in which the electronic device determines a quantization level for the second data type using the size of the interval in which the values of the set of sub-output data are distributed, and performs quantization on the first weight set according to the quantization level.
- the range of numbers that can be expressed according to the second data type and the quantization level may include the interval of the set of sub-output data.
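The level selection described above can be sketched as follows: the quantization step is chosen from the size of the interval in which the profiled sub-output values are distributed, so that the numbers representable by the second data type cover that interval. The 256-level grid and all names are assumptions for illustration.

```python
# Illustrative sketch of choosing a quantization level from the observed
# sub-output interval, as described above.
import numpy as np

def quantization_level(sub_output_set, levels=256):
    values = np.concatenate([np.asarray(v).ravel() for v in sub_output_set])
    lo, hi = float(values.min()), float(values.max())   # observed interval
    step = (hi - lo) / (levels - 1) or 1.0              # grid spans the interval
    return step, lo

sub_output_set = [np.array([0.0, 1.5]), np.array([0.75, 3.0])]
step, lo = quantization_level(sub_output_set)
# The representable range lo .. lo + (levels-1)*step covers [0.0, 3.0].
print(lo, lo + 255 * step)
```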
- the one or more programs, when executed by a processor of an electronic device, may include instructions that cause the electronic device to obtain, from the memory (130; 501), a first weight set of a first data type included in the first operator (613), which is one of at least one operator included in the model (503; 611).
- the one or more programs may include instructions that cause the electronic device to obtain a set of sub-output data corresponding to the set of input data (502; 601) used for computation of the first operator (613), from profile information stored in the memory (130; 501) and generated by execution of the model (503; 611).
- the one or more programs may include instructions that cause the electronic device to obtain, based on the set of sub-output data, a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set.
- the one or more programs may include instructions that cause the electronic device to store the second set of weights in the memory (130; 501) based on obtaining the second set of weights.
- the one or more programs, when executed by a processor of an electronic device, may include instructions that cause the electronic device, to obtain the weight set of the second data type, to obtain, through the quantization, from the first weight set of the first data type for representing a floating-point number of a first interval, the second weight set of the second data type for representing a number of a second interval having a length smaller than the length of the first interval.
- the one or more programs may include instructions that cause the electronic device to perform, through the accelerator, an operation based on the second data type, among the first data type and the second data type.
- a computer readable storage medium storing one or more programs
- the one or more programs, when executed by a processor of an electronic device, may include instructions that cause the electronic device to perform the computation of the first operator (613) by inputting, in response to a request to execute a function related to the model (503; 611), the second weight set, among the first weight set and the second weight set, to the accelerator.
- a computer readable storage medium storing one or more programs
- the one or more programs, when executed by a processor of an electronic device, may include instructions that cause the electronic device to identify, based on obtaining the second weight set, a set of first output data (615) obtained by performing an operation for the at least one operator from the profile information, and to obtain a set of second output data (625) from the set of input data (502; 601) by performing an operation for the at least one operator based on the second weight set.
- a computer readable storage medium storing one or more programs
- the one or more programs may include instructions that cause the electronic device to transmit, to a server through a communication circuit, the set of sub-output data obtained from the profile information, based on identifying that at least one difference value of the set of difference values between the first output data (615) and the corresponding second output data (625) is greater than or equal to a reference value.
- a computer readable storage medium storing one or more programs
- the one or more programs may include instructions that cause the electronic device to obtain the second weight set of the second data type by performing quantization on the first weight set based on the distribution of the set of sub-output data.
- the one or more programs, when executed by a processor of an electronic device, may include instructions that cause the electronic device to store the second weight set in the memory (130; 501) together with a hash value of the first weight set.
- the one or more programs, when executed by a processor of an electronic device, may include instructions that cause the electronic device to perform an operation for the first operator (613) based on a second weight set corresponding to a hash value of the first weight set.
- a computer readable storage medium storing one or more programs, wherein the set of sub-output data of the first operator (613) may be a set of input data (502; 601) of a second operator, which is one of the at least one operator, connected to the first operator (613).
- the one or more programs, when executed by a processor of an electronic device, may include instructions that cause the electronic device, to obtain the first weight set, to identify the second data type supported by the accelerator, to identify a first data type, different from the second data type, of the first weight set included in the first operator (613), and to determine whether to perform quantization on the first weight set of the first data type based on whether a weight set of the second data type obtained from the first weight set through quantization is stored in the memory, or whether weights are included in the weight set of the second data type.
- the electronic device 101 may include a memory 130; 501 that stores instructions, and at least one processor 120; 200; 511.
- the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device 101 to obtain, from the memory (130; 501), a first weight set of a first data type included in the first operator (613), which is one of at least one operator included in the model (503; 611).
- the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device 101, to obtain the weight set of the second data type, to obtain, through the quantization, from the first weight set of the first data type for representing a floating-point number of a first interval, the second weight set of the second data type for representing a number of a second interval having a length smaller than the length of the first interval.
- the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device 101 to perform the computation of the first operator (613) by inputting the second weight set to the accelerator in response to a request to execute a function related to the model (503; 611).
- the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device 101 to identify, based on obtaining the second weight set, a set of first output data (615) obtained by performing an operation for the at least one operator from the profile information, to obtain a set of second output data (625) from the set of input data (502; 601) by performing an operation for the at least one operator based on the second weight set, and to replace the first weight set included in the first operator (613) of the model (503; 611) with the second weight set, based on a set of difference values between the first output data (615) and the corresponding second output data (625).
- the electronic device 101 may include a communication circuit.
- the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device 101 to transmit the set of sub-output data obtained from the profile information to a server through the communication circuit, based on identifying that at least one difference value among the set of difference values between the first output data (615) and the corresponding second output data (625) is greater than or equal to a reference value, to receive, from the server through the communication circuit, a third weight set of the second data type on which quantization of the first weight set has been performed based on the sub-output data, and to store the received third weight set in the memory (130; 501).
- the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device 101, to obtain the second weight set of the second data type, to obtain the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data.
- the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device 101 to store the second weight set in the memory (130; 501) together with the hash value of the first weight set.
- the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device 101 to perform an operation for the first operator (613) based on a second weight set corresponding to a hash value of the first weight set.
- the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device 101, to obtain the first weight set, to identify the second data type supported by the accelerator, to identify the first data type, different from the second data type, of the first weight set included in the first operator (613), and to determine whether to perform quantization on the first weight set of the first data type based on whether a weight set of the second data type obtained from the first weight set through quantization is stored in the memory (130; 501), or whether a weight is included in the weight set of the second data type.
- the instructions, when executed by the at least one processor, may cause the electronic device, to obtain the second weight set of the second data type, to obtain the second weight set by performing quantization on the first weight set based on the interval in which the values of the set of sub-output data are distributed.
- the instructions, when executed by the at least one processor, may cause the electronic device, to obtain the second weight set of the second data type, to determine a quantization level for the second data type using the size of the interval in which the values of the set of sub-output data are distributed, and to perform quantization on the first weight set according to the quantization level.
- the range of numbers that can be expressed according to the second data type and the quantization level may include the interval of the set of sub-output data.
- a computer-readable storage medium that stores one or more programs (software modules) may be provided.
- One or more programs stored in a computer-readable storage medium are configured for execution by one or more processors in an electronic device.
- One or more programs include instructions that cause the electronic device to execute methods according to embodiments described in the claims or specification of the present disclosure.
- the one or more programs may be included and provided in a computer program product.
- Computer program products are commodities and can be traded between sellers and buyers.
- the computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or may be distributed (e.g., downloaded or uploaded) online through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones).
- At least a portion of the computer program product may be at least temporarily stored or temporarily created in a machine-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.
- These programs may be stored in random access memory, non-volatile memory including flash memory, read only memory (ROM), electrically erasable programmable read only memory (EEPROM), a magnetic disc storage device, a compact disc-ROM (CD-ROM), digital versatile discs (DVDs), or another type of optical storage device or magnetic cassette. Alternatively, they may be stored in a memory consisting of a combination of some or all of these. Additionally, multiple constituent memories may be included.
- Additionally, the program may be stored on an attachable storage device that is accessible through a communication network such as the Internet, an intranet, a local area network (LAN), a wide area network (WAN), or a storage area network (SAN), or a combination thereof. This storage device can be connected to a device performing an embodiment of the present disclosure through an external port. Additionally, a separate storage device on a communication network may be connected to the device performing embodiments of the present disclosure.
- one or more of the components or operations described above may be omitted, or one or more other components or operations may be added.
- multiple components (e.g., modules or programs) may be integrated into a single component.
- the integrated component may perform one or more functions of each component of the plurality of components identically or similarly to those performed by the corresponding component of the plurality of components prior to the integration.
- operations performed by a module, program, or other component may be executed sequentially, in parallel, iteratively, or heuristically, or one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Image Analysis (AREA)
Abstract
According to one embodiment, an electronic device may include a memory and at least one processor. The at least one processor may obtain, from the memory, a first weight set of a first data type included in a first operator, which is one of at least one operator included in the model. The at least one processor may obtain a set of sub-output data corresponding to a set of input data used for computation of the first operator, from profile information stored in the memory and generated by execution of the model. The at least one processor may obtain a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data. The at least one processor may store the second weight set in the memory based on obtaining the second weight set. Various other embodiments are possible.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR10-2023-0009760 | 2023-01-25 | ||
| KR20230009760 | 2023-01-25 | ||
| KR10-2023-0030093 | 2023-03-07 | ||
| KR1020230030093A KR20240117447A (ko) | 2023-01-25 | 2023-03-07 | 모델의 연산과 관련된 오퍼레이터의 양자화를 위한 전자 장치 및 방법 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024158174A1 true WO2024158174A1 (fr) | 2024-08-02 |
Family
ID=91970843
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2024/000928 Ceased WO2024158174A1 (fr) | 2023-01-25 | 2024-01-19 | Dispositif électronique et procédé de quantification d'opérateur associé à un calcul de modèle |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024158174A1 (fr) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110969251A (zh) * | 2019-11-28 | 2020-04-07 | 中国科学院自动化研究所 | 基于无标签数据的神经网络模型量化方法及装置 |
| KR20210018352A (ko) * | 2019-06-12 | 2021-02-17 | 상하이 캠브리콘 인포메이션 테크놀로지 컴퍼니 리미티드 | 신경망의 양자화 파라미터 확정방법 및 관련제품 |
| KR20210156538A (ko) * | 2020-06-18 | 2021-12-27 | 삼성전자주식회사 | 뉴럴 네트워크를 이용한 데이터 처리 방법 및 데이터 처리 장치 |
| KR20220055256A (ko) * | 2020-10-26 | 2022-05-03 | 에스케이텔레콤 주식회사 | 신경망의 가중치 매개변수를 양자화하는 방법 및 상기 방법을 수행하는 객체 인식 장치 |
| US20220237455A1 (en) * | 2021-01-26 | 2022-07-28 | Denso Corporation | Neural-network quantization method and apparatus |
-
2024
- 2024-01-19 WO PCT/KR2024/000928 patent/WO2024158174A1/fr not_active Ceased
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2022019538A1 (fr) | Language model and electronic device comprising same | |
| WO2022080634A1 (fr) | Method for training an artificial neural network and electronic device supporting same | |
| WO2022010157A1 (fr) | Method for providing a screen in an artificial-intelligence virtual secretary service, and user terminal device and server supporting same | |
| WO2022177343A1 (fr) | Electronic device for configuring geofencing and operation method thereof | |
| WO2022154379A1 (fr) | Electronic device and brightness adjustment method | |
| WO2023153752A1 (fr) | Electronic device for allocating a memory resource to a task and operation method of the electronic device | |
| WO2024158174A1 (fr) | Electronic device and method for quantizing an operator related to model computation | |
| WO2024053910A1 (fr) | Apparatus and method for selecting an accelerator suitable for a machine learning model | |
| WO2023048379A1 (fr) | Server and electronic device for processing a user utterance, and operation method thereof | |
| WO2022177162A1 (fr) | Processor for initializing an application model file, and electronic device comprising same | |
| WO2025079929A1 (fr) | Electronic device for processing audio data and control method thereof | |
| WO2025183418A1 (fr) | Method and electronic device for distilling knowledge from a teacher model and transferring the distilled knowledge to a student model | |
| WO2025009874A1 (fr) | Electronic device comprising a neural processing unit and operation method thereof | |
| WO2025249984A1 (fr) | Electronic device, operation method thereof, and storage medium | |
| WO2025178288A1 (fr) | Electronic device, method, and non-transitory computer-readable recording medium for protecting an artificial intelligence model | |
| WO2025095329A1 (fr) | Device, method, and storage medium for managing data for a model in federated learning | |
| WO2024053886A1 (fr) | Electronic device and signal transmission method for feedback | |
| WO2025084597A1 (fr) | Device and method for generating a personalized model, and storage medium | |
| WO2023068489A1 (fr) | Electronic device comprising an NPU supporting different data types, and control method thereof | |
| WO2025159318A1 (fr) | Electronic device, method for delegating training of an artificial intelligence model, and non-transitory computer-readable recording medium | |
| WO2024112157A1 (fr) | Electronic device and method for processing collaborative editing commands | |
| WO2025100700A1 (fr) | Electronic device comprising a camera and operation method therefor | |
| WO2025095481A1 (fr) | Electronic device for performing computation using an artificial intelligence model and operation method of the electronic device | |
| WO2024043696A1 (fr) | Electronic device for performing an operation using an artificial intelligence model and method for operating an electronic device | |
| WO2023106591A1 (fr) | Electronic device and method for providing content based on user emotion | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24747411; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |