
WO2023035053A1 - Method and system for emulating a floating-point unit

Info

Publication number
WO2023035053A1
Authority
WO
WIPO (PCT)
Prior art keywords
floating
point
integers
format
point format
Prior art date
Legal status
Ceased
Application number
PCT/CA2021/051241
Other languages
English (en)
Inventor
Seyed Alireza GHAFFARI
Wei Hsiang Wu
Vahid PARTOVI NIA
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/CA2021/051241
Publication of WO2023035053A1
Anticipated expiration
Legal status: Ceased (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions

Definitions

  • the present disclosure relates to emulating hardware in software, in particular methods and systems for emulating a floating-point unit.
  • CPUs (Central Processing Units)
  • MCUs (Microcontroller Units)
  • CPUs are the backbone of modern data centers, which provide essential services such as the Internet, cloud services, and mobile networks.
  • MCUs are critical for processing data on Internet of Things (IoT) devices and smart sensors.
  • GPUs (Graphical Processing Units)
  • DSPs (Digital Signal Processors)
  • processing units are usually generic and used for various tasks, including machine learning algorithms. Also, the processing units may have high processing power, which may be correlated with energy consumption.
  • Processing units include several arithmetic units responsible for performing computations on operands, such as ALUs and FPUs.
  • some industrial and research tasks, including those that use machine learning algorithms, may not need the full processing power of these processing units.
  • these industrial and research tasks may not require the full computing power of the arithmetic units.
  • customized arithmetic units may be more efficient for performing operations than using the full power of the arithmetic units.
  • a machine learning task is considered as an example. In that case, different machine learning algorithms for different tasks (e.g., face detection, face recognition, image segmentation) may require different customized arithmetic units.
  • the present disclosure describes methods and systems for emulating a floating-point unit that operates on data having a custom floating-point format that departs from the generally available floating-point formats described in the IEEE754 standard.
  • the emulated floating-point unit may be fabricated in a processing unit to operate on data having the custom floating-point format.
  • a floating-point unit is emulated using an emulation engine comprising a software library, a control unit, and at least one computation module.
  • the computation module includes at least one computation unit configured for performing computations using data having the custom floating-point format. Further, the control unit controls the sequence of computations needed to be performed by the computation unit.
  • the software library implements processes of a high-level language that performs various computations, such as training a deep neural network.
  • the software library interfaces with the control unit, which controls the computation units that perform emulated floating-point computations for the tasks.
  • An example embodiment is a computer-implemented method for emulating a floating-point unit.
  • the method may receive one or more floating-point operands having a first floating-point format.
  • the method may convert each of the one or more floating-point operands having the first floating-point format into a first set of integers having the first floating-point format.
  • the method may convert each of the first set of integers into a second set of integers having a second floating-point format that is different from the first floating-point format.
  • the first set of integers and the second set of integers each has a defined bit length depending on the respective floating-point format.
  • the method may perform computations for a task using each of the second set of integers to emulate computations performed by the floating-point unit using the one or more floating-point operands having the second floating-point format.
  • the total bit length defined by the first floating-point format differs from the total bit length defined by the second floating-point format.
  • the task is for training a deep learning model.
  • the method further comprises repeating the converting of each of the one or more floating-point operands into the first set of integers, the converting of each of the first set of integers into the second set of integers, and the performing of computations, for a plurality of deep learning sessions of training. For each session of training, a different second floating-point format may be used. Also, the method may select one of the different second floating-point formats as a final second floating-point format and may emulate a further floating-point unit for processing operands that are formatted according to the final second floating-point format. In another example embodiment, the method further comprises evaluating numerical stability of the training of the deep learning model using the one or more floating-point operands having the different second floating-point formats. In another example embodiment of the method, computations may be performed in parallel.
  • the method further comprises fabricating into a hardware component the emulated floating-point unit for performing operations using the second floating-point format.
  • the method may convert each of the first set of integers into the respective second set of integers using a rounding operation that is one of round truncate, round to odd, round to even, round toward zero, round away from zero, round toward infinity, and stochastic rounding.
  • the first floating-point format may be based on one of the formats described in the IEEE754 standard.
  • each of the second set of integers may comprise a first integer value representing a sign value of the respective floating-point operand, a second integer value representing an exponent value of the respective floating-point operand, and a third integer value representing a fraction value of the respective floating-point operand.
  • Another example embodiment is of a system for emulating a floating-point unit.
  • the system comprises a processor, and a memory storing instructions which, when executed by the processor, cause the system to receive one or more floating-point operands having a first floating-point format. Further, the instructions may cause the system to convert each of the one or more floating-point operands having the first floating-point format into a first set of integers having the first floating-point format. The instructions may also cause the system to convert each of the first set of integers into a second set of integers having a second floating-point format that is different from the first floating-point format. The first set of integers and the second set of integers each has a defined bit length depending on the respective floating-point format. The instructions may also cause the system to perform computations for a task using each of the second set of integers to emulate computations performed by the floating-point unit using the one or more floating-point operands having the second floating-point format.
  • the total bit length defined by the first floating-point format differs from the total bit length defined by the second floating-point format.
  • the task is for training a deep learning model.
  • the system may comprise instructions which, when executed by the processor, cause the system to repeat the converting of each of the one or more floating-point operands into the first set of integers, converting of each of the first set of integers into the second set of integers, and performing computations for a plurality of deep learning sessions of training. For each session of training, a different second floating-point format is used. Also, the instructions may cause the system to select one of the different second floating-point formats as a final second floating-point format, and emulate a further floating-point unit for processing operands that are formatted according to the final second floating-point format.
  • system may comprise instructions which, when executed by the processor, cause the system to evaluate numerical stability of the training of the deep learning model using the one or more floating-point operands having the different second floating-point formats.
  • the emulated floating-point unit for performing operations using the second floating-point format may be fabricated into a hardware component.
  • the system may convert each of the first set of integers into the respective second set of integers using a rounding operation that is one of round truncate, round to odd, round to even, round toward zero, round away from zero, round toward infinity, and stochastic rounding.
  • the first floating-point format may be based on one of the formats described in the IEEE754 standard.
  • each of the second set of integers comprises a first integer value representing a sign value of the respective floating-point operand, a second integer value representing an exponent value of the respective floating-point operand, and a third integer value representing a fraction value of the respective floating-point operand.
  • Another example embodiment is a non-transitory machine readable medium having tangibly stored thereon executable instructions for execution by a processor, wherein the executable instructions, when executed by the processor, cause the processor to perform any one of the method embodiments above.
  • FIG. 1 is an illustrative example of one structure described in the IEEE754 standard commonly used in computing devices, specifically with floating-point processing units, to represent floating-point numbers, in accordance with an example embodiment.
  • FIG. 2 is a block diagram illustrating an example computing device that can be employed to implement the methods and systems disclosed herein in accordance with an example embodiment.
  • FIG. 3 is a block diagram for an emulation engine illustrating its modules in accordance with an example embodiment.
  • FIG. 4 is a schematic of an example emulation engine illustrating operation and data flow in the emulation engine in accordance with an example embodiment.
  • FIG. 5 is an example algorithm illustrating an addition computation between two sets of integer values, unpacked from floating-point values, using a single computation unit.
  • FIG. 6 is a flowchart of an example deep neural network training method using the emulation engine in accordance with an example embodiment.
  • FIG. 7 is a flowchart of an example method for emulating a floating-point unit in accordance with an example embodiment.
  • a floating-point unit which is a hardware component in a processing unit, performs arithmetic computations on floating-point operands.
  • the computations performed by an FPU on floating-point operands are traditionally based on one of the floating-point formats of the IEEE754 standard, described in IEEE Computer Society, "IEEE Standard for Floating-Point Arithmetic," IEEE Std 754-2008 (2008): 1-70.
  • the floating-point formats of the IEEE754 standard usually represent a floating-point number with high-precision using, for example, 32 bits.
  • many tasks, including tasks for machine learning, which perform computations on floating-point operands do not need high precision.
  • a custom FPU, designed to operate on operands having a custom floating-point format that may provide less precision than a floating-point format of the IEEE754 standard, may be necessary.
  • an FPU is a hardware component of a processing unit
  • an emulated FPU is a software version of the FPU
  • the present disclosure describes methods and systems for emulating an FPU in software.
  • the emulated FPU performs computations on data having a custom floating-point format that deviates from the floating-point formats of the IEEE754 standard.
  • the emulated FPU is fabricated into a processing unit as a hardware component.
  • the FPU fabricated from the emulated FPU may replace the FPU that operates on a floating-point format of the IEEE754 standard.
  • PyTorch (described in the arXiv preprint arXiv:1912.01703) is a deep learning framework that supports data represented using values in a 16-bit floating-point format, Bfloat, discussed in Kalamkar, Dhiraj, et al., "A study of BFLOAT16 for deep learning training," arXiv preprint arXiv:1905.12322 (2019).
  • IBM has also proposed using data represented by an 8-bit floating-point format to train neural networks, described in Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, and Kailash Gopalakrishnan, "Training deep neural networks with 8-bit floating-point numbers," pages 7675-7684, 2018, and described in Sun, Xiao, et al., "Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks."
  • HFP8 (Hybrid 8-bit floating point)
  • a low-precision training (i.e., using 8-bit formats instead of the formats described in the IEEE754 standard) of deep learning models was proposed in prior works to address the energy efficiency problem of deep learning models. More specifically, when the floating-point format representing a floating-point operand becomes smaller (e.g., a 32-bit single-precision number may be reduced to a floating-point format of 16 bits or 8 bits), the amount of data read from and written to memory by the processing unit decreases. Thus, reducing the number of operations saves energy and yields a more efficient deep learning model. As a result, custom FPUs can be designed and fabricated to consume less energy when performing computations on data having a custom floating-point format.
  • FIG. 1 is an illustrative example of a floating-point format of the IEEE754 standard commonly used to represent floating-point numbers in computing devices having processing units that include an FPU.
  • This example embodiment illustrates a single-precision floating-point format of the IEEE754 standard. This example is one of the structures described in the IEEE754 standard.
  • the single-precision floating-point format representation includes 1 bit reserved for the sign value of a floating-point number 102, 8 bits reserved for the exponent value of the floating-point number 106, and 23 bits reserved for the fraction value of the floating-point number.
  • the fraction value is also referred to as mantissa value 108.
  • the terms fraction value and mantissa value may be used interchangeably throughout this disclosure.
  • a floating-point number having the floating-point format of FIG. 1 comprises three binary strings: fraction, exponent, and sign. It is apparent to a person skilled in the art that a string of binary numbers can be represented as a decimal number. Therefore, each binary string (104, 106, and 108) can be represented as a decimal number. Hence, a floating-point number can be represented by three integer values.
  • the floating-point number 1.984 can be represented by a set of integers comprising: a sign value representing a positive number (e.g., +1), an exponent value of -3, and a fraction value of 1984.
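  • As an illustration of unpacking a floating-point operand into such a set of integers, the Python sketch below decomposes an IEEE754 single-precision value into its sign, biased exponent, and fraction fields; the function name and bit masks follow FIG. 1 and are illustrative only, not part of the disclosure.

```python
import struct

def unpack_float32(x: float):
    """Unpack an IEEE754 single-precision value into the three integer fields
    of FIG. 1: sign (1 bit), biased exponent (8 bits), fraction (23 bits)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF   # biased exponent (bias 127)
    fraction = bits & 0x7FFFFF       # 23-bit fraction; implicit leading 1 not stored
    return sign, exponent, fraction

# The value 1.984 becomes a set of three integers under this binary representation:
print(unpack_float32(1.984))  # -> (0, 127, 8254390)
```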
  • for any deviation from the described structures (formats) specified in the IEEE754 standard (i.e., the bits reserved for a sign, exponent, and fraction),
  • a custom FPU may be used.
  • This custom FPU is designed to perform arithmetic based on the custom floating-point format.
  • the custom floating-point format may change the precision and range of the floating value since it is represented differently.
  • this disclosure provides a computer-aided design (CAD) tool to emulate an FPU that performs computations on a custom floating-point format before designing such an FPU as hardware.
  • CAD (computer-aided design)
  • FIG. 2 is a block diagram illustrating a computing device 200 in which a FPU may be emulated.
  • the computing device 200 may be an individual physical computer, multiple physical computers such as a server, a virtual machine, or multiple virtual machines. Dashed blocks represent optional components.
  • the computing device 200 is configured for FPU emulation.
  • Other computing devices suitable for implementing examples described in the present disclosure may be used, which may include components different from those discussed below.
  • FIG. 2 shows a single instance of each component, there may be multiple instances of each component in the computing device 200. Also, the computing device 200 could be implemented using parallel and/or distributed architecture.
  • the computing device 200 includes one or more processing units 202, such as a CPU, GPU, an MCU, an ASIC, a field-programmable gate array (FPGA), a dedicated logic circuitry, or combinations thereof.
  • processing units 202 such as a CPU, GPU, an MCU, an ASIC, a field-programmable gate array (FPGA), a dedicated logic circuitry, or combinations thereof.
  • Each of the aforementioned processing units may include various hardware components, whether fabricated on-chip or separate.
  • the CPU may include one or more accumulators, registers, multipliers, decoders, a floating-point unit 218, and an arithmetic and logic unit. While the arithmetic and logic unit performs bitwise operations on integer binary numbers, the floating-point unit 218, described further below, operates on floating-point numbers. It is to be understood that other processing units, such as a GPU, may include similar components.
  • a processing unit may include a floating-point unit (FPU) 218 for performing arithmetic computations.
  • the FPU may be fabricated on the same chip as the computing unit or be a separate unit within the computing device 200.
  • the FPU 218 is usually a hardware component enabling fast computations.
  • the FPU 218 performs primitive computations such as addition, subtraction, multiplication, division, square root, etc. With instructions from the processing unit 202, complex operations may be performed by combining the primitive computations, including training deep learning algorithms.
  • the FPU 218 is usually designed to perform computations based on a specific floating-point format, most commonly a format of the IEEE754 standard described in FIG. 1 .
  • the computing device 200 may also include one or more optional input/output (I/O) interfaces 204, enabling interfacing with one or more optional input devices 212 and/or output devices 214.
  • the computing device 200 may include one or more network interfaces 206 for wired or wireless communication with a network (e.g., an intranet, the Internet, a P2P network, a WAN and/or a LAN).
  • the network interface(s) 206 may include wired links (e.g., Ethernet cable) and/or wireless links (e.g., one or more antennas) for intra-network and/or inter-network communications for receiving parameters or sending results.
  • the computing device 200 includes one or more storage units 208, which may include a mass storage unit such as a solid-state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive.
  • the computing device 200 also includes one or more memories 210, which may have a volatile or non-volatile memory (e.g., a flash memory, a random access memory (RAM), and/or a read-only memory (ROM)).
  • the memory(ies) 210 (as well as storage unit 208) may store instructions for execution by the processing unit(s) 202.
  • the memory(ies) 210 may include software instructions for implementing an operating system (OS) and other applications/functions.
  • OS (operating system)
  • instructions may also be provided by an external memory (e.g., an external drive in communication with the computing device 200) or may be provided by a transitory or non-transitory computer-readable medium.
  • Examples of non-transitory computer-readable media include a RAM, a ROM, an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, a CD-ROM, or other portable memory storage.
  • the computing device 200 also includes a module for emulating an FPU referred to as an emulation engine 216.
  • a “module” can refer to a combination of a hardware processing circuit and machine-readable instructions (software and/or firmware) executable on the hardware processing circuit.
  • a hardware processing circuit can include any or some combination of a microprocessor, a core of a multi-core microprocessor, a microcontroller, a programmable integrated circuit, a programmable gate array, a digital signal processor, an application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), GPU (Graphical Processing Unit), or a system on a chip (SoC) or another hardware processing circuit.
  • ASIC (application-specific integrated circuit)
  • FPGA (field-programmable gate array)
  • SoC (system on a chip)
  • the computing device 200 shows the emulation engine 216 as instructions in memory 210 that, when executed by the processing unit 202, cause the processing unit 202 to perform arithmetic computations otherwise performed by the FPU 218.
  • Other example embodiments may have the emulation engine 216 as a hardware component connected with bus 220 that facilitates communication between various computing device 200 components.
  • the emulation engine 216 may be implemented in components of the computing device 200 or may be offered as a software as a service (SaaS) by a cloud computing provider.
  • SaaS (software as a service)
  • the emulation engine 216 may also be available on servers accessed by the computing device 200 through the network interface 206.
  • Example embodiments describe the emulation engine 216 as being parametrized, customizable, heterogeneous and/or parallelized.
  • the emulation engine 216 is parameterized because it can receive floating-point operands as a set of integers.
  • the emulation engine 216 may be controlled by users; such users can choose different rounding operations and enter custom floating-point formats (explained below).
  • the emulation engine 216 may be customizable such that it can be modified for emulating different custom floating-point formats.
  • the emulation engine 216 can be implemented with CPUs, GPUs, other processing units 202 explained above, or a combination thereof; therefore, it is heterogeneous.
  • computations performed in the emulation engine 216 may be parallelized, allowing for parallel computations.
  • Optional input device(s) 212 (e.g., a keyboard, a mouse, a microphone, a touchscreen, and/or a keypad)
  • optional output device(s) 214 (e.g., a display, a speaker and/or a printer)
  • one or more of the input device(s) 212 and/or the output device(s) 214 may be included as a component of the computing device 200.
  • FIG. 3 is a block diagram of modules of the emulation engine 216.
  • the disclosure departs from traditional computing devices where hardware, specifically FPU 218, performs floating-point computations. Instead, the computations that are supposed to be performed by the FPU 218 are performed by the emulation engine 216.
  • the emulation engine 216 may contain three modules - software library 302, control unit 304, and one or more computation modules (306-1 , 306-2, ..., 306-N), each computation module performing computations on floating-point operands having custom floating-point format.
  • FIG. 3 describes an embodiment of emulation engine 216 emulating operations of multiple FPUs 218.
  • Each FPU 218 can be emulated using a computation module (306-1 , 306-2, ..., or 306-N).
  • example embodiments may use only one computation module for a single FPU 218, referred to simply by 306. Therefore, a computation module 306 could be any of the computation modules 306-1 , 306-2, ... , 306-N.
  • Each computation module 306 comprises a plurality of computation units 308, where each computation unit 308 is responsible for performing arithmetic computations.
  • Each computation unit 308 is configured to perform primitive computations such as addition, subtraction, multiplication, square root, absolute value, etc., and a combination of such primitive computations can compute complex operations.
  • the sequence of the combination performed by the computation units 308 to compute a more complex operation, e.g., an inner product, is controlled by the control unit 304.
  • the control unit 304 is a module that sends instructions to computation modules (306-1 , 306-2, ..., 306-N) to schedule and instruct the computation units 308 to perform various computations of a broad spectrum.
  • the computations may be as simple as computing the inner-product of vectors or much more complicated computations such as deep learning training algorithms. Therefore, the control unit 304 ensures the sequential consistency of the computations. For example, suppose the task is to compute the inner product between two vectors. In that case, the control unit 304 sends instructions to the computation modules (306-1 , 306-2,..., 306-N) to use one or more computation units 308 to perform the inner product computation.
  • the control unit 304 also sends the sequence of the computations, which includes first, multiplication, then addition. The addition and multiplication are computations understood by computation units 308 of each computation module (306-1 , 306-2, ..., or 306-N).
  • While the control unit 304 ensures the sequential consistency of computations, for high-level computations, such as training a deep neural network,
  • a software library 302 is used.
  • the software library 302 administers the control unit 304.
  • the software library 302 may be an application programming interface (API) of a high- level language configured to send instructions and control the control unit 304.
  • the software library 302 may be an API that modifies computationally expensive software packages, such as PyTorch™, TensorFlow™, scikit-learn™, and other machine learning libraries, to use the control unit 304 and the computation modules (306-1, 306-2, ..., and 306-N) to perform the floating-point computations instead of using the FPU 218.
  • a sequence of complex operations is performed for forward propagation and backpropagation (discussed in detail below). These operations are usually implemented in the software library 302, which sends instructions to control unit 304 containing the steps that need to be performed, e.g., a sequence of inner products. Then, the control unit 304 assigns the number of computation units 308 to participate in the operations' computations required by the software library 302. Further, the control unit 304 sends the sequence of primitive computations needed to be performed by each computation unit 308.
  • the emulation engine 216 consists of a hierarchy of controlling modules, starting as high-level operations in the software library 302, which are then interpreted by the control unit 304 into primitive computations to be performed by the computation units 308.
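  • For illustration, a minimal Python sketch of this hierarchy is given below. The names ControlUnit, SimpleComputationUnit, and inner_product are hypothetical stand-ins for the software library 302, control unit 304, and computation units 308; they are not an API defined by the disclosure.

```python
# Hypothetical sketch: the software library expresses a high-level operation
# (an inner product), the control unit schedules the primitive computations,
# and a computation unit carries them out.

class SimpleComputationUnit:
    """Stand-in for a computation unit 308; a real unit would operate on
    (sign, exponent, fraction) integer sets in the custom format."""
    def compute(self, op, a, b):
        return a * b if op == "mul" else a + b

class ControlUnit:
    """Stand-in for the control unit 304: dispatches primitive computations."""
    def __init__(self, computation_unit):
        self.cu = computation_unit  # could be a pool of units for parallel work

    def dispatch(self, op, a, b):
        return self.cu.compute(op, a, b)

def inner_product(control_unit, vec_a, vec_b):
    """High-level operation as the software library 302 might request it:
    first the multiplications, then the additions, in a controlled sequence."""
    products = [control_unit.dispatch("mul", a, b) for a, b in zip(vec_a, vec_b)]
    acc = products[0]
    for p in products[1:]:
        acc = control_unit.dispatch("add", acc, p)
    return acc

control = ControlUnit(SimpleComputationUnit())
print(inner_product(control, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # -> 32.0
```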
  • Example embodiments may include more than one computation module (306-1 , 306-2, ... , 306-N) for a task, and each computation module (306-1 , ... , 306-N) has computation units 308 configured for a custom floating-point format.
  • the task may be performed using all computation modules (306-1 , 306-2, ... , 306-N).
  • the custom floating-point format that achieves a desired performance, according to a performance measure, is selected.
  • the selected custom floating-point format is used in designing and fabricating an FPU 218 that is based on processing values formatted according to the selected custom floating-point format.
  • FIG. 4 is a schematic diagram of an emulation engine 216 illustrating operations and data flow in the emulation engine 216.
  • the emulation engine 216 receives data, which is one or more floating-point operands 402-1 , ... , 402-N, each having a floating-point value (a floating-point number).
  • the received floating-point operands (402-1 , ..., 402-N ) may be formatted according to one example of the IEEE754 standard described in FIG. 1 or any other format.
  • the disclosure refers to the format of the floating-point operands 402-1 , ..., 402-N, as a first floating-point format.
  • This first floating-point format may be a format of the IEEE754 standard such as the one described in FIG. 1.
  • the floating values 402-1 , ... , 402-N are represented as strings of binary bits, similar to FIG.1 .
  • the module convert to integer 404 converts each floating-point operand (402-1, ..., 402-N) to a respective set of integers (406-1, ..., 406-N) having the custom floating-point format in two steps.
  • a person of ordinary skill in the art understands the method of converting a floating-point value to a set of integers. Basically, a floating-point value is represented as a set of integers, as illustrated in FIG. 1.
  • In the first step, convert to integer 404 converts the floating-point operands having the first floating-point format to a set of integers also having the first floating-point format.
  • An example is in FIG. 1.
  • In the second step, convert to integer 404 converts, via rounding (explained below), the set of integers having the first floating-point format to the set of integers (406-1, ..., 406-N) having a custom floating-point format. Therefore, the output of convert to integer 404 illustrates that each floating-point operand value is represented as a respective set of integers (406-1, ..., 406-N).
  • Each set of integers, whether having the first floating-point format or the custom floating-point format, contains three integer values.
  • Each set of integers (406-1 , ... ,406-N) has a sign value, an exponent value, and a fraction (or mantissa) value.
  • Example embodiments describe the custom floating-point format representing the floating-point operands with a different number of mantissa bits 108 than the first floating-point format.
  • Example embodiments describe the custom floating-point format having a different exponent bias. For instance, an exponent bias value is applied when determining the exponent value of the floating-point operands having a format of the IEEE754 standard. For a single-precision number, the stored exponent value is in the range 1-254. Further, the actual exponent corresponds to the stored exponent value minus 127 (the exponent bias value), giving an exponent value in the range -126 to +127.
  • This exponent bias value may be different in a custom floating-point format when representing the set of integers 406-1 , ... ,406-N.
  • the second step of the convert to integer 404 operations includes a rounding module (not shown) that converts the set of integers of the floating-point operands having the first floating-point format into the second set of integers 406-1 , ... ,406-N having a custom floating-point format.
  • Example embodiments describe the custom floating-point format to have fewer bits than the first floating-point format. For example, the number of mantissa bits 108 of the sets of integers 406-1, ..., 406-N is smaller than the number of mantissa bits 108 of the floating-point operands having the first floating-point format.
  • There are several methods for rounding a floating-point number including round truncate, round to odd, round to even, round toward zero, round away from zero, round toward infinity, stochastic rounding, etc.
  • Round truncate returns the fraction value (mantissa value 108) of a floating-point operand truncated to a specific number of decimal places.
  • the method first truncates the fraction value 108 of a floating-point operand to the number of bits used to represent the fraction value in the custom floating-point format. Further, if any of the removed (truncated) bits has a value of binary 1, then the last bit of the fraction value is assigned binary 1.
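  • A minimal Python sketch of this conversion is shown below, assuming a first format with 23 fraction bits and bias 127 and an illustrative custom format with fewer fraction bits and a different bias; the parameter values are assumptions, not formats prescribed by the disclosure.

```python
def to_custom_format(sign, exponent, fraction,
                     src_frac_bits=23, dst_frac_bits=7,
                     src_bias=127, dst_bias=15):
    """Convert an integer set in the first (IEEE754-like) format into a set
    having a narrower custom format, using the truncate-with-sticky-bit
    rounding described above and re-biasing the exponent."""
    shift = src_frac_bits - dst_frac_bits
    dropped = fraction & ((1 << shift) - 1)        # the bits that will be removed
    new_fraction = fraction >> shift               # truncate to the custom width
    if dropped != 0:
        new_fraction |= 1                          # removed 1s force the last bit to 1
    new_exponent = exponent - src_bias + dst_bias  # apply the custom exponent bias
    return sign, new_exponent, new_fraction

print(to_custom_format(0, 127, 8254390))  # -> (0, 15, 125) with these assumed widths
```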
  • the computation module 306 receives the sets of integer values of (406-1 , ... ,406-N) of each floating-point operand and performs computations following instructions from the control unit 304. It is worth mentioning again, the sets of integer values (406-1 , ... , 406-N) have the custom floating-point format.
  • the computation module 306 has computation units 308 configured to perform computations according to the custom floating-point format on the sets of integer values (406-1 , 406-2, ... , 406-N).
  • Some example embodiments describe the emulation engine 216 with a plurality of computation modules, such as computation modules 306-1, 306-2, ..., 306-N of FIG. 3,
  • each computation module configured for operating on a custom floating-point format different from the others.
  • the emulation engine 216 has a convert to integer 404 module responsible for converting the floating-point operands 402-1, ..., 402-N to a respective set of integers 406-1, ..., 406-N for each custom floating-point format.
  • the computation module 306 comprises a plurality of computation units 308 responsible for performing computations on one or more sets of integer values (406-1 , 406-N).
  • Each computation unit 308 comprises a plurality of modules, including a sign engine 408, an exponent engine 410, a fraction engine 412, rounding 414, and alignment 416.
  • each computation unit 308 can perform primitive computations. Primitive computations include addition, subtraction, multiplication, absolute, square root, etc. The combination of such primitive computations can compute complex operations.
  • Each computation unit 308 is configured to perform the primitive computation for the custom floating-point format of the respective computation module 306 using the sign engine 408, exponent engine 410, fraction engine 412, rounding 414 and alignment 416.
  • the sign engine 408 is configured to perform the primitive computations on the sign values of the sets of integer values (406-1 , 406-N).
  • the sign engine defines the behaviour of the computation unit 308 when computing the sign value that results from the computation instructed by the control unit 304.
  • the exponent engine 410 and the fraction engine 412 are configured to perform primitive computations on the exponent value and the fraction value of the sets of integer values (406-1 , ..., 406-N), respectively. Therefore, the exponent engine 410 and the fraction engine 412 define the behaviour of the computation units 308 when computing the exponent value and fraction value that result from the computation instructed by the control unit 304.
  • When performing computations, the module rounding 414 is used. Rounding 414 performs operations similar to the rounding module of the convert to integer 404. Rounding 414 is configured to ensure that the result of the computation instructed by the control unit 304 and performed by the fraction engine 412 is within the designated number of bits of the custom floating-point format.
  • the module alignment 416 is configured to ensure that computations instructed by the control unit 304 and performed by the exponent engine 410, and the fraction engine 412 yield an aligned set of integers. Alignment (normalization) is performed on the result of the computations performed by the computation units 308. Therefore, the alignment 416 may generate a normalized floating-point result 418.
  • a normalized floating-point result 418 is an integer set with a fraction value that starts with binary 1. This normalization is achieved by shifting the fraction value (in binary) to the left until the most significant bit is 1; for every shift to the left, the exponent is reduced by 1.
  • For example, suppose the fraction value is 5 bits with a value of 5, i.e., 00101.
  • The fraction value is shifted to the left twice to become 10100 and, accordingly, the exponent value is adjusted by 2^-2 (i.e., the exponent is reduced by 2).
  • The generated floating-point result 418 (a normalized floating value), after alignment 416, would have a fraction value of 10100 with an exponent reduced by 2 (i.e., scaled by 2^-2).
  • the floating-point result 418 may be a subnormal floating value. Subnormality occurs when the adjustment to the exponent value would fall outside the range of values that can be represented (e.g., an exponent of less than -127). In this situation, the subnormal floating value is carried over for the next computations.
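  • The following Python sketch illustrates this alignment (normalization) step with an assumed 5-bit fraction and an illustrative minimum exponent; it mirrors the 00101 example above and is not the disclosure's implementation.

```python
def normalize(exponent, fraction, frac_bits=5, min_exponent=-126):
    """Shift the fraction left until its most significant bit is 1, reducing the
    exponent by 1 per shift. If the exponent would fall below the representable
    range, stop and keep the subnormal value as is."""
    if fraction == 0:
        return exponent, fraction
    msb = 1 << (frac_bits - 1)
    while not (fraction & msb) and exponent > min_exponent:
        fraction <<= 1
        exponent -= 1
    return exponent, fraction

# The example above: fraction 0b00101 shifts left twice to 0b10100 (20),
# and the exponent is reduced by 2.
print(normalize(0, 0b00101))  # -> (-2, 20)
```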
  • Example embodiments may describe the computation unit 308 to include other modules for controlling behaviour in a floating-point computation error.
  • One example is catastrophic cancellation, a phenomenon that may result from subtracting two rounded numbers and that yields a bad approximation, i.e., one that may combine the approximation errors of both rounded values.
  • the computation units 308 receive instructions from the control unit 304 on what computations to perform.
  • the control unit 304 may be responsible for four tasks: control sequential consistency, control rounding, control custom floating-point format, and control number of computation units 308.
  • control unit 304 sends instructions for organizing the computation sequence that needs to be performed by the computation units 308.
  • Controlling sequential consistency also includes controlling the number of computation units 308 participating in the computations and deciding whether the computations are performed in parallel or serial.
  • the control unit 304 controls the level of parallelism of the computation units 308; control unit 304 sends computation sequence instructions using common means of parallel processing synchronization such as mutexes and semaphores. Therefore, the control unit 304 acts as a scheduler for the computation units 308 by arranging the order of computations, i.e., add, multiply, accumulate, etc., of each participating computation unit 308.
  • control unit 304 also instructs the module convert to integer 404 regarding the custom floating-point format.
  • the instruction may include information about the number of bits for the sign, fraction, exponent values, and value of the exponent bias.
  • Example embodiments describe users of the computing device 200 dynamically changing the custom floating-point format.
  • the rounding operation performed in the convert to integer 404, and in computation units 308 may be controlled by the control unit 304 to perform one of the rounding methods described above.
  • the user may select different rounding operations and observe the effect on the performance of a task.
  • the emulation engine 216 is used to train a deep neural network; the user may observe the effect of a custom floating-point format and rounding method on performance.
  • control unit 304 sends instructions to the computation module 306, deciding the number of computation units 308 participating in performing desired computations.
  • Example embodiments describe a computing device 200 having emulation engine 216 with multiple computation modules 306, as in 306-1, 306-2, ..., 306-N in FIG. 3.
  • control unit 304 may switch between computation modules 306 in sessions of training, each session of training having a different custom floating-point format.
  • each session of training includes training a deep learning model using a different custom floating-point format.
  • Such a feature enables the computing device 200 to observe task performance for other custom floating-point formats. For instance, example embodiments describe implementing a deep neural network using custom floating-point format 1 in computation module 306-1 , another time using custom floating-point format 2 in computation module 306-2, etc.
  • a computing device 200 having emulation engine 216 with multiple computation modules 306-1 , 306-2, ..., 306-N, each with a custom floating-point format may be used as parts of a single task. For instance, if the task is to implement a deep learning neural network, then a custom floating-point format may be used for training, and another custom floating-point format, which is different from the first one, may be used for inference making.
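  • A hypothetical sketch of such a sweep over candidate custom floating-point formats is shown below; train_session and evaluate are placeholder names with dummy bodies so the sketch runs, and the candidate formats are illustrative assumptions rather than formats named in the disclosure.

```python
candidate_formats = [
    {"exp_bits": 5, "frac_bits": 10, "bias": 15},   # half-precision-like format
    {"exp_bits": 8, "frac_bits": 7,  "bias": 127},  # bfloat16-like format
    {"exp_bits": 4, "frac_bits": 3,  "bias": 7},    # an 8-bit format
]

def train_session(fmt, rounding):
    # Placeholder: one session of training with the emulation engine configured
    # for the custom format `fmt` and the chosen rounding operation.
    return {"format": fmt, "rounding": rounding}

def evaluate(model):
    # Placeholder: return a validation metric (dummy value so the sketch runs).
    return 0.0

results = []
for fmt in candidate_formats:
    model = train_session(fmt, rounding="round_to_even")   # one session per format
    results.append((evaluate(model), fmt))

# Select the format with the best metric as the final second floating-point format.
best_metric, final_format = max(results, key=lambda r: r[0])
```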
  • While the control unit 304 schedules the sequence of computations, the control unit 304 is itself controlled by the software library 302, which is another module in the emulation engine 216. Therefore, the control unit 304 receives instructions from a high-level language, the software library 302, and configures the computation units 308 accordingly based on the received instructions.
  • the software library 302 includes a high-level language library that drives the control unit 304 to perform complex arithmetic operations, for example, training a deep learning model using the custom floating-point format without using the FPU 218 of the computing device 200.
  • a high-level library for a deep learning task may include functions that implement a convolutional layer, fully connected layers, gradient computation, backpropagation, etc.
  • The floating-point result 418 is a set of integers.
  • Although FIG. 4 shows the output of the computation module 306 as a single floating-point result 418, it is understood that this depends on the computations performed in the computation units 308, as there could be more than one floating-point result. For example, if the computations add two numbers, the result is the sum, which is a single number; hence, a single floating-point result. However, if the computations generate a matrix, then there are multiple floating-point results 418.
  • the set of integers of the floating-point result 418 includes a sign value, an exponent value, and a fraction value. Also, the set of integers of the floating-point result 418 has the custom floating-point format. Further, the set of integers of the floating-point result 418 may be in the same format as the sets of integers (402-1 , ... , 402-N) received by the computation module 306.
  • the emulation engine 216 may include a module, convert to float 420, responsible for converting the set of integers 418 into a floating-point output having the first floating-point format.
  • Example embodiments describe the convert to float 420 to convert the set of integers 418 to a custom floating-point format.
  • the output of the convert to float 420 is the output of the emulation engine 216.
  • While examples describe the computation module 306 as including a plurality of computation units 308, it may be possible to have a computation module 306 with a single computation unit 308.
  • input to emulation engine 216 is described as floating-point operands 402-1 , ... , 402-N having a format of the IEEE754 standard; however, the IEEE754 standard is an example and not a limiting factor - floating-point operands 402-1 , ... , 402-N in other formats may be received.
  • While the sets of integers 406-1, ..., 406-N, which are the output of the convert to integer 404, and the set of integers of the floating-point result 418 are illustrated as having three integer values, other representations are equally valid.
  • the above-discussed processing unit 202 for FIG. 4 may be a floating-point processing unit 202, representing a floating-point number as illustrated in FIG. 1; however, a fixed-point processing unit 202 represents a floating-point number differently from FIG. 1.
  • the floating-point number is represented as three integer values: sign value, integer part value, and fraction part value, where these values may be determined differently from how they are determined for the sign value, exponent value, and fraction value in a floating-point processing unit.
  • the exponent value does not exist; instead, the position of the decimal point remains fixed, independent of the floating value it is representing.
  • Example embodiments may describe the floating-point number being represented for a fixed-point processing unit.
  • the set of integers representing the floating-point number in a fixed-point processing unit contains two integers: integer part value and fractional part value. The most significant bit of the integer part value is the sign value bit.
  • FIG. 5 is an example algorithm illustrating an addition computation between two sets of integers 406-1 and 406-2 converted from floating-point operands 402-1 and 402-2 using a single computation unit 308.
  • the computation unit 308 is configured to perform steps 504 - 518 to perform an addition computation.
  • step 506 begins.
  • the computation unit 308 right-shifts m_a by a number of bits equal to e_tmp - e_a and right-shifts m_b by a number of bits equal to e_tmp - e_b. For instance, if the fraction value m_a is 90, then the binary representation of 90 is 1011010.
  • step 512 starts.
  • Step 512 also determines a temporary mantissa value (m_tmp).
  • Step 512 is completed, and step 514 starts, where the common exponent value e_tmp is used to align the temporary mantissa m_tmp to determine e_c and a second temporary mantissa value m_tmp2, where e_c is the exponent value of f_c.
  • the alignment at step 514 also referred to as normalization 514, is performed by alignment 416, as explained above.
  • step 516 After completing step 514, step 516 begins.
  • the second temporary mantissa value m_tmp2 undergoes a rounding operation in rounding 414 to determine m_c of f_c.
  • Several rounding operations may be performed as described above.
  • Example embodiments also implement an optional step 518, which starts after completing step 516, for error checking. Error checking may include detecting a subnormal set of integers or the existence of a catastrophic cancellation.
  • the result of the addition is a floating-point result 418.
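  • A simplified Python sketch of this addition on two integer sets is given below; it assumes positive operands whose fractions already carry the leading 1 bit and an illustrative 8-bit fraction width, so the sign engine and several error checks of steps 504-518 are omitted.

```python
def emulated_add(a, b, frac_bits=8, min_exponent=-126):
    """Add two integer sets (sign, exponent, fraction), loosely following FIG. 5:
    align to a common exponent, add the fractions, then normalize."""
    s_a, e_a, m_a = a
    s_b, e_b, m_b = b
    e_tmp = max(e_a, e_b)            # common (temporary) exponent
    m_a >>= (e_tmp - e_a)            # right-shift each fraction into alignment
    m_b >>= (e_tmp - e_b)
    m_tmp = m_a + m_b                # temporary mantissa (positive operands only)
    if m_tmp >> frac_bits:           # the sum overflowed the fraction width
        m_tmp >>= 1
        e_tmp += 1
    msb = 1 << (frac_bits - 1)       # otherwise normalize by shifting left
    while m_tmp and not (m_tmp & msb) and e_tmp > min_exponent:
        m_tmp <<= 1
        e_tmp -= 1
    # A rounding step (rounding 414) would trim m_tmp back to frac_bits here.
    return (0, e_tmp, m_tmp)

# 1.5 * 2^3 (= 12) plus 1.0 * 2^1 (= 2) with 8-bit fractions:
print(emulated_add((0, 3, 0b11000000), (0, 1, 0b10000000)))  # -> (0, 3, 0b11100000)
```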
  • more complex computations that include multiple operations may be stored in an accumulator (not shown).
  • performing a vector multiplication requires multiple multiplications and additions and results in an integer value set of preliminary computations stored in the accumulator (not shown).
  • the final result (e.g., vector multiplication result) determined from further computations is the floating-point result 418.
  • the computation module 306 may emulate the arithmetic operations of MAC (multiply-accumulate) units, which are widely used for matrix multiplication.
  • MAC (multiply-accumulate)
  • example embodiments may describe using the emulation engine 216 with a custom floating-point format instead of a format of the IEEE754 standard.
  • the emulation engine 216 for MAC units may also accept parameters that control the accumulator bitwidth and data-path of the MAC unit. Example embodiments may describe using the emulation engine 216 for emulating computations of the MAC unit, which may be used for deep learning applications.
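  • The sketch below illustrates the accumulator-bitwidth parameter of such a MAC emulation; the operands are plain integers for brevity, whereas a full version would operate on (sign, exponent, fraction) sets through the computation units.

```python
def emulated_mac(pairs, acc_bits=32):
    """Multiply-accumulate with an accumulator whose bit width is a parameter,
    so the effect of a narrower accumulator on the running sum can be studied."""
    acc = 0
    mask = (1 << acc_bits) - 1
    for a, b in pairs:
        acc = (acc + a * b) & mask   # the accumulator width limits the running sum
    return acc

print(emulated_mac([(3, 4), (5, 6), (7, 8)]))  # -> 98
```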
  • FIG. 6 is a flowchart of a training method 600 for using the emulation engine 216 in training a deep neural network.
  • the emulation engine 216 may be applied in the context of deep learning to perform modelling, extraction, preprocessing, training, and the like on training data. For example, training a deep neural network model uses emulation engine 216 instead of the FPU 218 to optimize a deep neural network model.
  • examples disclosed herein relate to a large number of neural network applications. For ease of understanding, the following describes some concepts relevant to neural networks and some relevant terms that may be related to examples disclosed herein.
  • a neural network consists of neurons structured as layers of neurons.
  • a neuron is a module that uses x_s as inputs to the neuron.
  • An output from the module may be: h_{W,b}(x) = a(Σ_s W_s x_s + b)    (1)
  • W_s is a weight of x_s
  • b is an offset (i.e., bias) of the neuron
  • a is an activation function of the neuron and is used to introduce a nonlinear feature to the neural network. It is to be appreciated that most of the values of W_s, x_s, and b are floating values, and the computation of equation (1) may be performed in the FPU 218. However, example embodiments of the present disclosure utilize the emulation engine 216 instead.
  • the output of the activation function may be used as an input to a neuron of a following layer in the neural network.
  • the activation function may be a sigmoid function, for example.
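  • A small Python example of equation (1) with a sigmoid activation is shown below; in the disclosure, the multiplications and additions here would be routed to the emulation engine 216 rather than the FPU 218, and the numeric values are illustrative.

```python
import math

def neuron_output(xs, ws, b):
    """Equation (1): a(sum_s W_s * x_s + b), here with a sigmoid activation."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation a(.)

print(neuron_output([0.5, -1.0], [0.8, 0.3], 0.1))  # sigmoid(0.2) ~= 0.55
```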
  • the neural network is formed by joining a plurality of the foregoing single neurons.
  • a deep neural network is also referred to as a multi-layer neural network and may be understood as a neural network that includes a first layer (generally referred to as an input layer), a plurality of hidden layers, and a final layer (generally referred to as an output layer).
  • the "plurality" herein does not have a special metric.
  • a layer is considered to be a fully connected layer when there is a full connection between two adjacent layers of the neural network. To be specific, for two adjacent layers (e.g., the i-th layer and the (i+1 )-th layer) to be fully connected, each and every neuron in the i-th layer must be connected to each and every neuron in the (i+1 )-th layer.
  • weight W is used as an example.
  • W^3_24 is a linear weight from a fourth neuron at a second layer to a second neuron at a third layer
  • the superscript 3 indicates a layer (i.e., the third layer (or layer-3) in this example) of the weight W, and the subscript indicates the output is at layer-3 index 2 (i.e., the second neuron of the third layer) and the input is at layer-2 index 4 (i.e., the fourth neuron of the second layer).
  • a weight from a k-th neuron at an (L-1)-th layer to a j-th neuron at an L-th layer may be denoted W^L_jk. It should be noted that there is no W parameter at the input layer.
  • More hidden layers in a DNN may enable the DNN to better model a complex situation (e.g., a real-world situation).
  • a DNN with more parameters is more complex, has a larger capacity (which may refer to the ability of a learned model to fit a variety of possible scenarios), and indicates that the DNN can complete a more complex learning objective.
  • Training of the DNN is a process of learning the weight matrix.
  • a purpose of the training is to obtain a trained deep neural network model, which consists of parameters with the values of the learned weights W of all layers of the DNN and biases b.
  • Training is the process of generating a DNN model. All model parameter values are initialized at step 602. The parameters include values W and b.
  • At step 604, the DNN model is to be trained over multiple epochs.
  • the epoch number is initialized to 0.
  • a full corpus of training data is split into multiple batches (as well as a validation dataset).
  • Method 600 then proceeds to step 606, where method 600 compares the epoch number to a target number of epochs. If the target number of epochs is not reached, the method 600 proceeds to step 608, where the DNN model is optimized.
  • the model optimization at 608 includes two primary steps: performing forward propagation at step 610 and backpropagation at step 612.
  • method 600 sends each batch of training data through forward propagation to generate outputs of the DNN model.
  • the outputs of the DNN model, which are the predicted values, are compared to desired target values (e.g., ground-truth values), and an error (loss) is computed.
  • the loss is a way to quantitatively represent how close the predicted values are to the target values.
  • the method 600 then proceeds to step 612, at which the loss is backpropagated to adjust the weights W and biases b of the DNN model before receiving the next batch of training data.
  • a defined loss function is calculated from forward propagation at step 610 of an input batch to an output of the DNN model.
  • Backpropagation at step 612 calculates a gradient of the loss function with respect to the parameters (W and b) of the DNN, and a gradient algorithm (e.g., gradient descent) is used to update the parameters to reduce the loss function.
  • Backpropagation is performed iteratively so that the loss function is converged or minimized.
  • This model optimization at step 608 repeats the forward propagation and backpropagation of batches until all batches of the epoch are processed. All computations performed in the model optimization at step 608 use the emulation engine 216.
  • Method 600 then proceeds to step 614, at which the epoch number is incremented, and another epoch starts with different batches from the same training data.
  • After incrementing the epoch number at step 614, method 600 compares the epoch number to the target number of epochs at step 606. If the target number of epochs is reached, method 600 terminates at step 616, and an optimized DNN model is generated and outputted at step 618.
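  • A skeleton of method 600 in Python is given below; the engine object and its methods are placeholders standing in for the emulation engine 216, not an API defined by the disclosure.

```python
def train(model, batches, target_epochs, engine):
    """Skeleton of training method 600; all floating-point arithmetic of the
    forward and backward passes is delegated to the (emulated) engine."""
    epoch = 0                                   # step 604: initialize epoch counter
    while epoch < target_epochs:                # step 606: compare with target epochs
        for batch_x, batch_y in batches:        # step 608: model optimization
            preds = engine.forward(model, batch_x)   # step 610: forward propagation
            loss = engine.loss(preds, batch_y)       # compute the error (loss)
            grads = engine.backward(model, loss)     # step 612: backpropagation
            model = engine.update(model, grads)      # gradient step on W and b
        epoch += 1                              # step 614: increment epoch number
    return model                                # steps 616-618: output optimized model
```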
  • the weights and biases of the DNN model should converge to an equilibrium state, indicating that the DNN model has been optimally trained relative to the full set of training data.
  • Method 600 is performed by the emulation engine 216.
  • a user may enter parameters such as the custom floating-point format and rounding operation to use.
  • the forward propagation at step 610 and the backpropagation at step 612 are implemented in a high-level language in the software library 302.
  • the software library 302 sends instructions to the control unit 304 containing computations required to be performed to achieve the forward propagation at step 610 and backpropagation at step 612.
  • the control unit 304 sends instructions to the computation module 306 to perform the computations.
  • the activation function may itself involve multiple operations, e.g., implementing a sigmoid function (as illustrated in the sketch after this list); therefore, the control unit 304 sends instructions to the computation units 308 to perform those computations.
  • the emulation engine 216 converts floating-point operands to integer sets; the floating-point operands include parameters of the DNN such as W and b, and the batch of training data provided as input to the emulation engine 216.
  • After performing the computations of forward propagation at step 610 and backpropagation at step 612, the emulation engine 216 converts the integer value of the floating-point result 418 to floating-point outputs 422, which become floating-point operands for the forward propagation step 610 in the next iteration of method 600.
  • If the processing unit 202 processes a library function that is part of the software library 302, the emulation engine 216 is involved, and the computations are performed in software rather than in hardware (FPU 218). On the other hand, if the processing unit 202 processes a library function that is not part of the software library 302, the processing unit 202 uses an alternative method for computing, such as the FPU 218.
  • In the training method 600 illustrated, the forward propagation at step 610 and the backpropagation at step 612 are part of the software library 302.
  • Other example embodiments may include other steps of method 600 as part of the emulation engine 216.
  • Other example embodiments may include fewer steps as part of the software library 302, e.g., just the forward propagation at step 610 or just the backpropagation at step 612.
  • Example embodiments disclose using the emulation engine 216 to maintain and examine the computational stability of deep learning models, such as deep neural network models, for deep learning tasks. Stability is the study of how performance, based on a performance measure, is affected by small changes in parameters. In this case, the small change in parameters may result from converting the parameters (W, b, and x) from a first floating-point format, such as one format of the IEEE754 standard, to a custom floating-point format, as described in detail with reference to FIG. 3 (a small sketch of such a stability comparison appears after this list).
  • Advantages of using an FPU according to the custom floating-point format rather than the formats of the IEEE754 standard may include improvements in speed, accuracy, and energy consumption.
  • the emulation engine 216 is an essential tool for hardware designers to study the behaviour of a custom floating-point format for various applications before fabricating the custom floating-point format into a hardware device such as FPU 218.
  • FIG. 7 is a flowchart of the emulation engine method 700 for a floating-point unit.
  • Method 700 starts at step 702, where one or more floating-point operands having a first floating-point format are received. Method 700 then proceeds to step 704, at which the one or more floating-point operands having the first floating-point format are converted into a first set of integers having the first floating-point format. Method 700 then proceeds to step 706, at which each of the first set of integers is converted into a second set of integers having a second floating-point format that is different from the first floating-point format. The first set of integers and the second set of integers each have a defined bit length depending on the respective floating-point format (a sketch of this conversion is given after this list).
  • the first floating-point format may be a floating-point format according to one or more formats of the IEEE754 standard.
  • the second floating-point format may be a custom floating-point format different from the first floating-point format.
  • Step 706 ends and step 708 begins.
  • At step 708, computations for a task are performed on the second set of integers to emulate computations performed by the floating-point unit using the one or more floating-point operands having the second floating-point format (a sketch of one such integer-only computation also follows this list).
  • the disclosed methods may be carried out by modules, routines, or subroutines of software executed by the computing device 200. Coding of software for carrying out the steps of the methods is well within the scope of a person of ordinary skill in the art having regard to the methods of emulating a floating-point unit using an emulation engine.
  • the emulation engine method 700 may contain additional or fewer steps than shown and described, and the steps may be performed in a different order.
  • Computer-readable instructions, executable by the processor(s) of the computing device 200 may be stored in the memory 210 of the computing device 200 or a computer-readable medium. It is to be emphasized that the steps of the emulation engine method need not be performed in the exact sequence as shown unless otherwise indicated.
  • the emulation engine method 700 of the present disclosure, once implemented, can be performed by the computing device 200 in a fully automatic manner, which is convenient for users as no manual interaction is needed.
  • Although the systems, devices, and processes disclosed for emulating an FPU and shown herein may comprise a specific number of elements/components, the systems, devices, and assemblies could be modified to include additional or fewer of such elements/components.
  • Although any of the elements/components disclosed may be referenced as being singular, the embodiments disclosed herein could be modified to include a plurality of such elements/components.
  • the subject matter described herein is intended to cover and embrace all suitable changes in technology.
  • the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual requirements to achieve the objectives of the solutions of the embodiments.
  • functional units in the example embodiments may be integrated into one computing device 200, or each of the units may exist alone physically, or two or more units may be integrated into one unit.
  • When the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a storage medium and include several instructions for instructing a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods described in the embodiments of this application.
  • the foregoing storage medium includes any medium that can store program code, such as a universal serial bus (USB) flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, among others.
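
The training loop of method 600 can be pictured with a short sketch. The following Python snippet is illustrative only: it assumes a single fully connected layer, uses plain NumPy arithmetic in place of the emulation engine 216, and all hyperparameter values (target number of epochs, batch size, learning rate) are made up for the example; it is not the library's actual code.

    # Minimal sketch of the epoch/batch training loop of method 600 (steps 602-618),
    # assuming a single fully connected layer and plain NumPy arithmetic in place of
    # the emulation engine 216. Hyperparameter values are illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    x_all = rng.standard_normal((64, 4))                    # full corpus of training data
    y_all = x_all @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1   # target (ground-truth) values

    W = rng.standard_normal(4)                               # step 602: initialize parameters W
    b = 0.0                                                  # and b
    target_epochs, batch_size, lr = 20, 16, 0.05             # step 604: epoch number starts at 0

    for epoch in range(target_epochs):                       # step 606: compare epoch to target
        for start in range(0, len(x_all), batch_size):       # step 608: optimize over each batch
            xb = x_all[start:start + batch_size]
            yb = y_all[start:start + batch_size]
            pred = xb @ W + b                                # step 610: forward propagation
            loss = np.mean((pred - yb) ** 2)                 # loss quantifies prediction error
            grad_pred = 2.0 * (pred - yb) / len(xb)          # step 612: backpropagation
            W -= lr * (xb.T @ grad_pred)                     # gradient descent update of W
            b -= lr * np.sum(grad_pred)                      # gradient descent update of b
        # step 614: epoch incremented by the loop; loop exit plays the role of steps 616/618
    print("trained W:", np.round(W, 2), "b:", round(b, 2))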
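
To illustrate why a single activation function can translate into several instructions dispatched by the control unit 304 to the computation units 308, the sketch below decomposes a sigmoid into elementary operations. The function name and the particular decomposition are assumptions for illustration, not the computation units' actual instruction set.

    # A sigmoid expressed as a chain of elementary operations, each of which could be
    # dispatched as a separate instruction to a computation unit.
    import math

    def sigmoid_as_elementary_ops(x: float) -> float:
        t = -x               # operation 1: negation
        e = math.exp(t)      # operation 2: exponential
        s = 1.0 + e          # operation 3: addition
        return 1.0 / s       # operation 4: reciprocal (division)

    print(sigmoid_as_elementary_ops(0.0))   # 0.5
    print(sigmoid_as_elementary_ops(2.0))   # approximately 0.8808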
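
Steps 702 to 706 of method 700 can be sketched as follows: a binary32 (IEEE754) operand is unpacked into a set of integers (sign, biased exponent, 23-bit mantissa), and the mantissa is then truncated to a hypothetical custom format with a 10-bit fraction. The 10-bit fraction, the truncation rounding, and the restriction to normal (non-zero, non-special) values are assumptions made for illustration; in the disclosure the custom format and the rounding operation are user-selectable.

    # Sketch of steps 702-706: first floating-point format (binary32) -> first set of
    # integers -> second set of integers in an assumed custom format (10-bit fraction).
    # Normal values only; zeros, subnormals, infinities and NaNs are not handled here.
    import struct

    def float_to_int_set(x: float):
        bits = struct.unpack("<I", struct.pack("<f", x))[0]     # raw 32-bit pattern
        return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF  # sign, exponent, mantissa

    def to_custom_format(sign, exponent, mantissa, frac_bits=10):
        return sign, exponent, mantissa >> (23 - frac_bits)      # truncation rounding

    def custom_to_float(sign, exponent, mantissa_c, frac_bits=10):
        bits = (sign << 31) | (exponent << 23) | (mantissa_c << (23 - frac_bits))
        return struct.unpack("<f", struct.pack("<I", bits))[0]

    s, e, m = float_to_int_set(3.14159)
    print(s, e, m)                                       # first set of integers
    print(custom_to_float(*to_custom_format(s, e, m)))   # value after custom rounding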
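
Step 708 performs the computations directly on the integer sets. As one illustration, the sketch below multiplies two normal binary32 operands using only integer arithmetic on their (sign, exponent, mantissa) sets and repacks the result for comparison with the hardware FPU. It uses truncation rounding and ignores zero, subnormal, infinity and NaN inputs; it is a simplified stand-in, not the disclosed implementation.

    # Integer-only multiplication of two normal binary32 operands given as
    # (sign, biased exponent, mantissa) integer sets.
    import struct

    def unpack32(x: float):
        b = struct.unpack("<I", struct.pack("<f", x))[0]
        return b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF

    def pack32(sign, exponent, mantissa) -> float:
        bits = (sign << 31) | (exponent << 23) | mantissa
        return struct.unpack("<f", struct.pack("<I", bits))[0]

    def emulated_multiply(a_set, b_set):
        (s1, e1, m1), (s2, e2, m2) = a_set, b_set
        sign = s1 ^ s2
        sig = ((1 << 23) | m1) * ((1 << 23) | m2)   # product of significands with hidden bits
        exponent = e1 + e2 - 127                    # remove one exponent bias
        if sig & (1 << 47):                         # significand product in [2, 4): renormalize
            return sign, exponent + 1, (sig >> 24) & 0x7FFFFF
        return sign, exponent, (sig >> 23) & 0x7FFFFF

    print(pack32(*emulated_multiply(unpack32(1.5), unpack32(2.0))))    # 3.0
    print(pack32(*emulated_multiply(unpack32(0.75), unpack32(-8.0))))  # -6.0

Other operations, such as addition or a user-chosen rounding operation, would be emulated in an analogous way on the integer sets.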
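
The kind of stability examination described above can also be sketched briefly: the same layer computation is run once in binary32 and once after rounding the weights, bias and input to a hypothetical custom format with a 10-bit fraction, and the relative change in the output is measured. The layer sizes, the 10-bit fraction and the truncation rounding are assumptions for illustration only.

    # Stability sketch: compare a layer output in the first (binary32) format with the
    # output obtained after converting parameters and input to an assumed custom format.
    import numpy as np

    def round_to_custom(a: np.ndarray, frac_bits: int = 10) -> np.ndarray:
        mask = np.uint32(~((1 << (23 - frac_bits)) - 1) & 0xFFFFFFFF)
        bits = a.astype(np.float32).ravel().view(np.uint32) & mask   # clear low fraction bits
        return bits.view(np.float32).reshape(a.shape)

    rng = np.random.default_rng(1)
    W = rng.standard_normal((8, 16)).astype(np.float32)
    b = rng.standard_normal(8).astype(np.float32)
    x = rng.standard_normal(16).astype(np.float32)

    y_ref = W @ x + b                                                 # binary32 reference
    y_emu = round_to_custom(W) @ round_to_custom(x) + round_to_custom(b)
    rel_change = np.linalg.norm(y_emu - y_ref) / np.linalg.norm(y_ref)
    print(f"relative change from format conversion: {rel_change:.2e}")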

Abstract

Systems and methods for emulating a floating-point unit are disclosed. The method receives one or more floating-point operands having a first floating-point format. The or each floating-point operand having the first floating-point format is converted into a first set of integers having the first floating-point format. In addition, each integer of the first set of integers is converted into a second set of integers having a second floating-point format that is different from the first floating-point format. The first set of integers and the second set of integers each have a defined bit length depending on the respective floating-point format. Finally, the method performs computations for a task using each integer of the second set of integers to emulate computations performed by the floating-point unit using the one or more floating-point operands having the second floating-point format.
PCT/CA2021/051241 2021-09-08 2021-09-08 Procédé et système d'émulation d'une unité à virgule flottante Ceased WO2023035053A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CA2021/051241 WO2023035053A1 (fr) 2021-09-08 2021-09-08 Procédé et système d'émulation d'une unité à virgule flottante

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CA2021/051241 WO2023035053A1 (fr) 2021-09-08 2021-09-08 Procédé et système d'émulation d'une unité à virgule flottante

Publications (1)

Publication Number Publication Date
WO2023035053A1 true WO2023035053A1 (fr) 2023-03-16

Family

ID=85506028

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2021/051241 Ceased WO2023035053A1 (fr) 2021-09-08 2021-09-08 Procédé et système d'émulation d'une unité à virgule flottante

Country Status (1)

Country Link
WO (1) WO2023035053A1 (fr)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006318382A (ja) * 2005-05-16 2006-11-24 Renesas Technology Corp 演算装置および型変換装置
US10574260B2 (en) * 2016-01-20 2020-02-25 Cambricon Technologies Corporation Limited Techniques for floating-point number conversion
US11043962B2 (en) * 2018-02-26 2021-06-22 Fujitsu Limited Information processing apparatus, information processing method, and recording medium
US20190340499A1 (en) * 2018-05-04 2019-11-07 Microsoft Technology Licensing, Llc Quantization for dnn accelerators
CN110555508A (zh) * 2018-05-31 2019-12-10 北京深鉴智能科技有限公司 人工神经网络调整方法和装置
US20210109709A1 (en) * 2019-02-06 2021-04-15 International Business Machines Corporation Hybrid floating point representation for deep learning acceleration
US20210208881A1 (en) * 2020-01-07 2021-07-08 SK Hynix Inc. Neural network system with multiplication and accumulation(mac) operator

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BENOIT JACOB ET AL: "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", ARXIV, 15 December 2017 (2017-12-15), pages 1 - 14, XP002798211, Retrieved from the Internet <URL:https://arxiv.org/pdf/1712.05877.pdf> [retrieved on 20200310] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240036817A1 (en) * 2022-08-01 2024-02-01 Electronics And Telecommunications Research Institute System-on-a-chip including soft float function circuit
GB2634624A (en) * 2023-09-08 2025-04-16 Advanced Risc Mach Ltd System emulation of a floating-point dot product operation

Similar Documents

Publication Publication Date Title
US20230267319A1 (en) Training neural network accelerators using mixed precision data formats
US10621486B2 (en) Method for optimizing an artificial neural network (ANN)
US20230196085A1 (en) Residual quantization for neural networks
EP3899801B1 Apprentissage mis à l'échelle pour apprentissage de dnn
US12045724B2 (en) Neural network activation compression with outlier block floating-point
US11645493B2 (en) Flow for quantized neural networks
CN109543816B (zh) 一种基于权重捏合的卷积神经网络计算方法和系统
US20180046903A1 (en) Deep processing unit (dpu) for implementing an artificial neural network (ann)
US20190340499A1 (en) Quantization for dnn accelerators
Lin et al. Towards fully 8-bit integer inference for the transformer model
US20170061279A1 (en) Updating an artificial neural network using flexible fixed point representation
JP2019139338A (ja) 情報処理装置、情報処理方法、およびプログラム
Murillo et al. Energy-efficient MAC units for fused posit arithmetic
JP2022042467A (ja) 人工ニューラルネットワークモデル学習方法およびシステム
CN113869517B (zh) 一种基于深度学习模型的推理方法
WO2023035053A1 (fr) Procédé et système d'émulation d'une unité à virgule flottante
US20230376769A1 (en) Method and system for training machine learning models using dynamic fixed-point data representations
CN110770696B (zh) 基于贡献估计的处理核心操作抑制
CN111860838A (zh) 一种神经网络的全连接层计算方法和装置
CN110956252B (zh) 执行多个神经网络的计算的方法和计算装置
CN119376686A (zh) 一种数据处理方法及相关设备
US7945061B1 (en) Scalable architecture for subspace signal tracking
CN114692861A (zh) 计算图更新方法、计算图处理方法以及相关设备
EP4535162A1 (fr) Dispositif et procédé de calcul
JP7506276B2 (ja) 半導体ハードウェアにおいてニューラルネットワークを処理するための実装および方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21956275

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21956275

Country of ref document: EP

Kind code of ref document: A1