
CN119536684A - A floating-point multiplier, calculation method and device based on FPGA - Google Patents


Info

Publication number
CN119536684A
CN119536684A (application CN202510104500.8A)
Authority
CN
China
Prior art keywords
mantissa
floating
point number
input
adder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202510104500.8A
Other languages
Chinese (zh)
Other versions
CN119536684B (en)
Inventor
桑健
魏朝飞
赵鑫鑫
姜凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Science Research Institute Co Ltd
Original Assignee
Shandong Inspur Science Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Science Research Institute Co Ltd filed Critical Shandong Inspur Science Research Institute Co Ltd
Priority to CN202510104500.8A (granted as CN119536684B)
Publication of CN119536684A
Application granted
Publication of CN119536684B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48: Methods or arrangements for performing computations using exclusively denominational number representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/483: Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F 7/487: Multiplying; Dividing
    • G06F 7/52: Multiplying; Dividing
    • G06F 7/523: Multiplying only
    • G06F 15/00: Digital computers in general; Data processing equipment in general
    • G06F 15/76: Architectures of general purpose stored program computers
    • G06F 15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7867: Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Nonlinear Science (AREA)
  • Complex Calculations (AREA)

Abstract


The present application relates to the field of floating-point multiplication, and specifically discloses an FPGA-based floating-point multiplier, calculation method and device. The floating-point multiplier comprises: a sign calculation module, which determines the sign of the target output floating-point number by applying an XOR gate to the sign bits of a first input floating-point number and a second input floating-point number; an exponent addition module, which adds the exponents of the two input floating-point numbers and subtracts the bias value of the corresponding format to obtain the exponent output of the target output floating-point number; a binary multiplier module, which multiplies the mantissas using the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm, selected according to the mantissa bit widths of the two inputs, to obtain the mantissa product of the target output floating-point number; and a result normalization module, which performs a normalization operation based on the mantissa product. The design reduces calculation delay and also lowers the percentage growth in hardware area.

Description

Floating point multiplier based on FPGA, calculation method and equipment
Technical Field
The application relates to the field of floating point multiplication, in particular to a floating point multiplier based on an FPGA, a calculation method and equipment.
Background
In recent years, with the development and iteration of intelligent products, electronic devices are rapidly updated in the directions of small volume, low power consumption and high speed. For electronic products, the speed depends on arithmetic operation, and meanwhile, in the current intelligent age, the application of technologies such as multimedia, artificial intelligence, machine learning, deep learning, internet of things and the like also relates to huge basic arithmetic calculation. Floating point multiplication is a critical operation in high performance computing applications. However, floating point operations are not only complex, but they require more hardware area and power consumption than fixed point multipliers. With the increase in accuracy, the area, delay and power consumption of the floating-point multiplier all increase dramatically. Over the past few decades, efforts have been made to improve the performance of floating point computing.
The traditional floating-point multiplier design method has certain limitations when facing the problems, and often cannot meet the requirements of high speed, low power consumption and high precision. The IEEE 754 standard supports different floating point formats, such as single precision, double precision, etc., but multiplier designs of each format face similar challenges. Therefore, there is a need for a new floating-point multiplier that overcomes the deficiencies of the prior art.
Disclosure of Invention
In order to solve the above problems, the present application provides an FPGA-based floating-point multiplier, calculation method and device. The floating-point multiplier is applied to floating-point multiplication in the IEEE-754 format and comprises a sign calculation module, an exponent addition module, a binary multiplier module and a result normalization module. The sign calculation module is used to determine the sign of the target output floating-point number through an XOR gate applied to the sign bits of the first and second input floating-point numbers. The exponent addition module is used to add the exponents of the first and second input floating-point numbers and subtract the bias value of the corresponding format to obtain the exponent output of the target output floating-point number. The binary multiplier module is used to multiply the first mantissa of the first input floating-point number by the second mantissa of the second input floating-point number, using the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm selected according to the mantissa bit widths, to obtain the mantissa product of the target output floating-point number. The result normalization module is used to perform a normalization operation based on the mantissa product, the normalization operation comprising at least one of a shifting operation and an adjustment of the exponent value.
In one example, multiplying the first mantissa of the first input floating-point number by the second mantissa of the second input floating-point number using the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm, based on the mantissa bit widths of the two inputs, specifically includes: determining that the mantissa bit widths meet a first preset condition; performing a divide-and-conquer multiplication on the first mantissa and the second mantissa using an improved Karatsuba algorithm until either the mantissa product of the target output floating-point number is obtained or an intermediate product result meets a second preset condition; and, when the intermediate product result meets the second preset condition, feeding the intermediate product result as input to an improved Urdhva-Tiryagbhyam algorithm and continuing the multiplication until the mantissa product of the target output floating-point number is obtained.
In one example, splitting and multiplying the first mantissa and the second mantissa using the improved Karatsuba algorithm specifically includes: determining the length ratio between the first mantissa and the second mantissa; determining the numerical distribution of the first mantissa and the second mantissa, where the numerical distribution is the count and positions of zero-valued digits within adjacent preset digit windows; determining the split points of the first mantissa and the second mantissa based on the length ratio and the numerical distribution; and splitting the first mantissa and the second mantissa at those split points.
In one example, performing the divide-and-conquer multiplication on the first mantissa and the second mantissa with the improved Karatsuba algorithm specifically includes providing a parallel computing unit, registers and a control logic module in the FPGA-based floating-point multiplier. The parallel computing unit concurrently executes computing tasks within the same clock cycle, the computing tasks comprising multiplication, addition and result merging; the registers temporarily store intermediate results and transfer data between different computing tasks; and the control logic module manages the start, completion and error handling of each computing task.
In one example, before the intermediate product result is fed as input to the improved Urdhva-Tiryagbhyam algorithm and the multiplication continues, the binary multiplier module is further configured to determine the computation counts corresponding to different preset computing tasks when the Urdhva-Tiryagbhyam algorithm is used, where the preset computing tasks include identical value pairs appearing at different time points or positions and fixed digit combinations appearing in the multiplication tasks, and to store the input term group of each preset computing task, the type of adder used and the hash-mapped computation result in a preset computation table.
In one example, feeding the intermediate product result as input to the improved Urdhva-Tiryagbhyam algorithm and continuing the multiplication specifically includes: determining the local correlation between the intermediate product result and each input term group in the preset computation table; if the local correlation between an input term group and the intermediate product result is higher than a first preset threshold and lower than a second preset threshold, reading the adder model corresponding to that input term group from the preset computation table and using it as the adder for the intermediate product result; and, if the local correlation is higher than the second preset threshold, reading the corresponding computation result from the preset computation table and using it as the result of part of the computation tasks for the first mantissa and the second mantissa.
In one example, the binary multiplier module has a plurality of types of adders disposed therein, wherein the types of adders include at least one of a carry propagate adder, a carry save adder, a carry select adder, and a carry look ahead adder.
In one example, the binary multiplier module is further configured to dynamically select the adder type based on the bit width and data characteristics of an intermediate product result. This selection specifically includes: determining the addition carry probability of the intermediate product result with a vector machine model to obtain a first adder expectation; determining a second adder expectation from the bit width of the intermediate product result; determining a time limit for the addition operation from a user-set delay requirement and deriving a third adder expectation from that limit; determining a fourth adder expectation from the number and type of available hardware resources; and determining the adder type for the intermediate product result from the first, second, third and fourth adder expectations.
The application further provides a calculation method for the FPGA-based floating-point multiplier, comprising: determining the sign of the target output floating-point number through an XOR gate applied to the sign bits of a first input floating-point number and a second input floating-point number; adding the exponents of the first and second input floating-point numbers and subtracting the bias value of the corresponding format to obtain the exponent output of the target output floating-point number; multiplying the first mantissa of the first input floating-point number by the second mantissa of the second input floating-point number using the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm, based on the mantissa bit widths of the two inputs, to obtain the mantissa product of the target output floating-point number; and performing a normalization operation based on the mantissa product, the normalization operation comprising at least one of a shifting operation and an adjustment of the exponent value.
The application also provides a computing device for the FPGA-based floating-point multiplier, comprising at least one processor and a memory communicatively connected to the at least one processor, the memory storing instructions executable by the at least one processor. When executed, the instructions cause the at least one processor to: determine the sign of the target output floating-point number through an XOR gate applied to the sign bits of a first input floating-point number and a second input floating-point number; add the exponents of the first and second input floating-point numbers and subtract the bias value of the corresponding format to obtain the exponent output of the target output floating-point number; multiply the first mantissa of the first input floating-point number by the second mantissa of the second input floating-point number using the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm to obtain the mantissa product of the target output floating-point number; and perform a normalization operation based on the mantissa product, the normalization operation comprising at least one of a shifting operation and an adjustment of the exponent value.
The method provided by the application has the following beneficial effects:
1. By combining the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm, the time of the mantissa multiplication can be significantly reduced, improving the operating speed of the whole floating-point multiplier. In particular, fast floating-point multiplication greatly improves system performance and response speed in high-speed application scenarios such as real-time signal processing and graphics rendering.
2. Compared with the traditional floating point multiplier design method, the method adopts the algorithm combination to reduce the number of multipliers and hardware complexity. Particularly, when high-precision floating point operation is processed, a large number of complex hardware circuits are not needed to realize multiplication operation, so that the chip area requirement is reduced. For hardware platforms such as FPGA, the method can more effectively utilize limited resources and reduce manufacturing cost.
3. Because of the reduction of hardware complexity and the improvement of operation speed, the power consumption of the floating-point multiplier in the running process is correspondingly reduced. In applications sensitive to power consumption, such as mobile devices and embedded systems, the characteristic of low power consumption can prolong the endurance time of the device and improve the practicability of the device.
4. The floating point multiplier design provided by the application is based on an IEEE-754 standard format, and has good compatibility with the existing computer system and software. The system can be conveniently integrated into various computer hardware platforms and application programs, and large-scale modification of the existing system is not needed, so that the application cost and popularization difficulty are reduced.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of a floating-point multiplier based on an FPGA in an embodiment of the application;
FIG. 2 is a schematic diagram of an FPGA-based floating point multiplier in-process calculation in an embodiment of the application;
FIG. 3 is a schematic diagram of a computation process in a binary multiplier module according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a calculation process of a Karatsuba multiplier according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a hardware configuration of a 4x4 Urdhva-Tiryagbhyam multiplier according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for calculating a floating point multiplier based on an FPGA according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
As shown in FIGS. 1 and 2, the present application provides an FPGA-based floating-point multiplier applied to floating-point multiplication in the IEEE-754 format, where a floating-point number is composed of a sign, an exponent, a mantissa and an exponent base. For the multiplication of two floating-point numbers, the final result is obtained by calculating the sign, exponent and mantissa products separately. The floating-point multiplier includes:
The sign calculation module is used to determine the sign of the target output floating-point number through an XOR gate applied to the sign bits of the first and second input floating-point numbers. Specifically, the sign calculation module determines the sign of the product from the most significant bit (MSB) of each input floating-point number: with a simple XOR gate, the product sign is positive if the two sign bits are the same and negative if they differ. This XOR-based sign calculation is simple and efficient, requires no complex circuit structure, determines the product sign quickly, and saves time for the overall floating-point multiplication.
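As an illustrative sketch (not part of the patent text), the XOR-based sign computation can be modeled in Python, assuming IEEE-754 bit patterns passed as unsigned integers; the function name and width parameter are assumptions:

```python
def fp_sign(a_bits: int, b_bits: int, width: int = 32) -> int:
    """Sign of the floating-point product: XOR of the two sign bits (MSBs).
    Returns 1 for a negative product, 0 for a positive one."""
    sign_a = (a_bits >> (width - 1)) & 1
    sign_b = (b_bits >> (width - 1)) & 1
    return sign_a ^ sign_b
```

For example, for -2.0 (0xC0000000) and 1.5 (0x3FC00000) in single precision the sign bits differ, so the product is negative.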
The exponent addition module is used to add the exponents of the first and second input floating-point numbers and subtract the bias value of the corresponding format to obtain the exponent output of the target output floating-point number. Specifically, the module adds the input exponents and subtracts the format bias to obtain the actual product exponent. In the IEEE-754 standard, the single-precision exponent is 8 bits wide with a bias of 127, and the double-precision exponent is 11 bits wide with a bias of 1023. Because mantissa multiplication takes much longer than exponent addition, a simple ripple-carry adder and ripple-borrow subtractor are used for the exponent addition and subtraction; they occupy few resources and reduce hardware complexity and cost.
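A minimal sketch of the exponent path, using the bias values stated above (the function name is illustrative, not from the patent):

```python
def product_exponent(exp_a: int, exp_b: int, bias: int = 127) -> int:
    """Biased exponent of the product: E_a + E_b - bias.
    bias = 127 for single precision (8-bit exponent),
    bias = 1023 for double precision (11-bit exponent)."""
    return exp_a + exp_b - bias
```

For instance, multiplying values with unbiased exponents 3 and 4 in single precision means adding the biased exponents 130 and 131 and subtracting 127, giving 134 (unbiased 7).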
The binary multiplier module is used to multiply the first mantissa of the first input floating-point number by the second mantissa of the second input floating-point number, using the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm selected according to the mantissa bit widths, to obtain the mantissa product of the target output floating-point number. In floating-point multiplication, the most important and complex part is the mantissa multiplication. Multiplication requires more time than addition, and as the number of bits increases it consumes more area and time. To reduce the multiplier area and improve efficiency, the present invention adopts a method combining the Karatsuba algorithm with the Urdhva-Tiryagbhyam algorithm.
The result normalization module is used to perform a normalization operation based on the mantissa product, the normalization operation comprising at least one of shifting the mantissa and adjusting the exponent value. In the IEEE-754 floating-point representation, the mantissa has one hidden bit, normally 1, which receives special handling during storage and operation. When the most significant bit of the multiplication result does not align with the hidden-bit position, the result must be normalized to meet the IEEE-754 format. The result normalization module shifts the mantissa and adjusts the exponent accordingly: since the product of two normalized mantissas lies in [1, 4), a product of 2 or more is shifted right by one place and the exponent is incremented by 1; conversely, if the leading bit is 0, the mantissa is shifted left until the most significant bit is 1 (non-zero means 1 in radix 2), and the exponent is decremented by 1 for each left shift.
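A hedged sketch of the overflow case of normalization, modeling the mantissas (with hidden leading 1) as integers; the bit widths and names are assumptions for single precision:

```python
def normalize(mantissa_product: int, exponent: int, frac_bits: int = 23):
    """Normalize the product of two (frac_bits + 1)-bit mantissas that carry
    the hidden leading 1. The product lies in [1, 4): if it reached [2, 4),
    its top bit sits one place above the hidden-bit position, so shift right
    once and bump the exponent."""
    top = 2 * frac_bits + 1  # bit index of the possible overflow bit
    if (mantissa_product >> top) & 1:
        mantissa_product >>= 1
        exponent += 1
    return mantissa_product, exponent
```

For example, 1.5 x 1.5 = 2.25 overflows the hidden-bit position and is renormalized to 1.125 with the exponent incremented, while 1.0 x 1.0 passes through unchanged.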
By combining the advantages of the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm, the present application not only reduces delay but also lowers the percentage growth in hardware area, offering notable advantages over conventional methods.
Specifically, as shown in FIG. 3, when the binary multiplier module performs floating-point multiplication, the Karatsuba algorithm acts as a divide-and-conquer algorithm suited to multiplying high-bit-width numbers: the larger the input bit width, the more efficient it is. At lower bit widths, however, the Karatsuba algorithm is inefficient, so the present invention uses the Urdhva-Tiryagbhyam algorithm, which suits low bit widths, instead. Specifically, it is first determined that the mantissa bit width satisfies a first preset condition (e.g., the bit widths of the operands are all greater than eight bits), and an improved Karatsuba algorithm performs a divide-and-conquer multiplication on the first mantissa and the second mantissa until the mantissa product of the target output floating-point number is obtained, or until an intermediate product result satisfies a second preset condition (e.g., the bit widths of the operands are all eight bits). When the intermediate product result meets the second preset condition, it is fed as input to an improved Urdhva-Tiryagbhyam algorithm, and the multiplication continues until the mantissa product of the target output floating-point number is obtained.
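The hand-off between the two algorithms can be sketched as follows. The eight-bit threshold follows the example above, but the function names and the use of Python's `*` as a stand-in for the Urdhva-Tiryagbhyam hardware array of FIG. 5 are illustrative assumptions:

```python
URDHVA_THRESHOLD = 8  # bit width at which the algorithms swap (per the example above)

def urdhva_mult(a: int, b: int, n: int) -> int:
    """Stand-in for the low-bit-width Urdhva-Tiryagbhyam multiplier; modeled
    with Python's * here, realized as a vertical-and-crosswise array in hardware."""
    return a * b

def karatsuba(a: int, b: int, n: int) -> int:
    """Karatsuba multiplication of two n-bit numbers, handing operands of
    URDHVA_THRESHOLD bits or fewer to the Urdhva-Tiryagbhyam stage."""
    if n <= URDHVA_THRESHOLD:
        return urdhva_mult(a, b, n)
    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    z0 = karatsuba(a_lo, b_lo, half)                              # low product
    z2 = karatsuba(a_hi, b_hi, n - half)                          # high product
    z1 = karatsuba(a_lo + a_hi, b_lo + b_hi, n - half + 1) - z0 - z2  # cross term
    return (z2 << (2 * half)) + (z1 << half) + z0
```

Three recursive multiplications replace the four of the schoolbook split, which is where the delay and area savings claimed above come from.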
The differences between the improved Karatsuba algorithm of the present application and the existing Karatsuba algorithm are described herein:
The traditional Karatsuba algorithm uses a fixed split: each large integer is divided into two parts, and the result of each sub-problem is computed recursively. This approach works in most cases but may not be optimal in special cases; for example, when one operand is much larger than the other, directly applying Karatsuba may introduce unnecessary complexity.
Therefore, the application introduces a dynamic split strategy that adaptively adjusts the split points according to the characteristics of the input data. For unbalanced operands, the larger number can be split asymmetrically so that each recursive call maximizes the speed advantage of addition over multiplication, reducing unnecessary recursion levels and overall delay. In the asymmetric split, the length ratio between the first mantissa (or the first operand in the current step) and the second mantissa (or the second operand) is determined, along with the numerical distribution of the two mantissas, i.e. the count and positions of zero-valued digits within adjacent preset digit windows. The split points of the first and second mantissas are then determined from the length ratio and the numerical distribution, and the mantissas are split at those points. When determining the split points, the split expectations of candidate points at different positions can be computed with preset weights, and a suitable split point is selected by comparing these expectations. In a hardware implementation, the split point may be determined from the performance indices of the hardware (e.g., adder delay, multiplier delay). If the performance overhead of multiply and add operations on operands of different lengths is known, the split point can be chosen so that the overall computational overhead is minimized.
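The split-point selection above can be sketched as a scored search. The two heuristics (length balance and zero runs below the split) follow the text, but the scoring formula, equal weighting, window size and candidate range are illustrative assumptions, not fixed by the patent:

```python
def choose_split(a: int, b: int, n: int) -> int:
    """Pick a split point for n-bit operands a and b. Candidates are scored
    by (i) how evenly they balance the significant lengths of the high and
    low halves and (ii) how many zero bits sit just below the split (a zero
    run cheapens the cross term)."""
    best_pos, best_score = n // 2, float("-inf")
    for pos in range(n // 4, 3 * n // 4 + 1):
        mask = (1 << pos) - 1
        # balance: prefer halves of similar significant length in both operands
        balance = (-abs((a >> pos).bit_length() - (a & mask).bit_length())
                   - abs((b >> pos).bit_length() - (b & mask).bit_length()))
        # zeros in the 4 bits below the split point, in both operands at once
        zeros = sum(1 for i in range(max(0, pos - 4), pos)
                    if not ((a >> i) & 1) and not ((b >> i) & 1))
        if balance + zeros > best_score:
            best_pos, best_score = pos, balance + zeros
    return best_pos
```

In hardware the per-candidate score would instead be derived from known adder and multiplier delays, as the paragraph above notes.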
The asymmetric split provided by the application reduces the recursion depth and thereby simplifies the computation flow: for unbalanced operands, it splits the larger number into parts closer in size to the smaller number, reducing the number of recursive calls. It also balances the addition load and improves parallelism, since asymmetric splitting distributes the addition tasks generated by each recursion more evenly, avoiding excessive or insufficient addition demand at particular stages. Finally, it adapts to operands of different scales and avoids wasted resources: for operands with a large length difference, a symmetric split would waste part of the computing resources on redundant data, whereas the asymmetric split can be adjusted flexibly to the actual input so that each part is processed most effectively.
Meanwhile, the conventional Karatsuba algorithm executes serially, i.e., the previous step must complete before the next begins. This causes long delays in hardware, especially when processing high-precision values. The application therefore adopts a multi-stage pipeline structure: the task of each stage is decomposed into smaller subtasks that execute in parallel. Specifically, multiple parallel units can be provided in hardware, each responsible for different sub-multiplications and additions. Because multiple multiplications proceed simultaneously, this reduces overall delay and improves throughput. Concretely, a parallel computing unit, registers and a control logic module are provided in the binary multiplier module of the floating-point multiplier: the parallel computing unit concurrently executes computing tasks (multiplication, addition and result merging) within the same clock cycle; the registers temporarily store intermediate results and transfer data between computing tasks; and the control logic module manages the start, completion and error handling of each computing task.
In one embodiment, the existing Urdhva-Tiryagbhyam algorithm is essentially a vertical-and-crosswise multiplication method that quickly generates partial products through a series of rules. However, for common patterns or repeated data combinations, recalculating each time wastes resources.
Thus, in the improved Urdhva-Tiryagbhyam algorithm of the present application, the local correlation of the data can be exploited for pre-computation and caching when partial products are generated. A pre-computation table mechanism is added: pre-computed partial multiplication results are stored in a small lookup table. Whenever data with the same pattern is encountered, the result is read directly from the lookup table, avoiding repeated calculation. The lookup table may be tailored to the expected application scenario so that it covers the most frequently occurring data patterns, and compression techniques can further reduce its size and save memory. Specifically, when the Urdhva-Tiryagbhyam algorithm is used, the number of computations corresponding to each preset computing task is determined, where the preset computing tasks include identical number pairs appearing at different time points or positions and fixed digit combinations appearing in the multiplication tasks; the input item group of each preset computing task, the type of adder used, and the hash-mapped computation result are then stored in a preset computation table.
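A minimal sketch of the vertical-and-crosswise scheme with a result cache might look like the following. The `urdhva_multiply` routine and the dictionary-based table are illustrative software stand-ins for the hardware column logic, lookup table, and hash mapping described above:

```python
def urdhva_multiply(a: int, b: int, width: int) -> int:
    """Urdhva-Tiryagbhyam (vertical-and-crosswise) multiplication of two
    width-bit operands: each output column sums the cross products whose
    bit indices add up to the column index, and carries ripple onward."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = 0, 0
    for col in range(2 * width - 1):
        s = carry
        # Crosswise step: all (i, j) pairs with i + j == col
        for i in range(max(0, col - width + 1), min(col + 1, width)):
            s += a_bits[i] * b_bits[col - i]
        result |= (s & 1) << col   # this column's output bit
        carry = s >> 1             # remainder carries into later columns
    return result | (carry << (2 * width - 1))

# Illustrative stand-in for the preset computation table: identical
# operand pairs seen before are served from the cache, not recomputed.
_table: dict = {}

def urdhva_multiply_cached(a: int, b: int, width: int) -> int:
    key = (a, b, width)
    if key not in _table:
        _table[key] = urdhva_multiply(a, b, width)
    return _table[key]
```

In hardware the cache key would be the hash-mapped input item group rather than a Python tuple, but the control flow (probe the table, fall through to the column computation on a miss) is the same idea.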
Further, when the improved Urdhva-Tiryagbhyam algorithm is used for calculation, the local correlation between the intermediate product result and each input item group in the preset computation table is determined. If the local correlation between an input item group and the intermediate product result is higher than a first preset threshold and lower than a second preset threshold, the adder model corresponding to that input item group is read from the preset computation table and used as the adder for the intermediate product result. If the local correlation is higher than the second preset threshold, the corresponding computation result is read from the preset computation table and used as the computation result of the partial computing task of the first mantissa and the second mantissa. The local correlation is computed by traversing the two input operands to determine whether they contain numerical duplicates, identical bit-level patterns, or locally significant mathematical characteristics. A bit-level pattern here means that certain bit combinations occur frequently in multiplication operations, for example when the low-order or high-order bits are always fixed. Mathematical characteristics here refer to patterns formed by mathematical laws, for example when one input operand is a power or a multiple of the other input operand or of a historical operand.
In one embodiment, in order to accelerate the calculation process of the floating-point multiplier, the multiplier is provided with multiple types of adders, and the adder can be selected based on the characteristics of the input operands when an addition is performed. In one embodiment, the adder types include a carry-propagate adder, a carry-save adder, a carry-select adder, and a carry-lookahead adder.
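As a brief illustration of why a carry-save adder suits accumulation in particular, a 3:2 compressor reduces three addends to a sum word and a carry word using only bitwise operations, so no carry ripples across the word; a single carry-propagating addition is needed only at the very end. This is a generic textbook construction, not circuitry specific to the application:

```python
def carry_save_add(x: int, y: int, z: int):
    """3:2 compressor: reduce three addends to a (sum, carry) pair with
    purely bitwise logic, so no carry propagates within the word."""
    s = x ^ y ^ z                     # per-bit sum without carries
    c = (x & y) | (x & z) | (y & z)   # per-bit majority = carry bits
    return s, c << 1                  # carries feed the next bit position

def csa_accumulate(values) -> int:
    """Accumulate many addends carry-save style; the invariant s + c ==
    running total holds throughout, and one final propagating add
    produces the result."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s + c  # the only carry-propagating addition
```

This is why the text below recommends the carry-save adder when many addition results must be accumulated: the expensive carry propagation is paid once instead of per addition.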
Each adder matches different input-operand characteristics. Specifically, the carry-propagate adder suits small-scale additions with relaxed latency requirements. The carry-save adder suits accumulating the results of multiple additions, reducing carry-propagation delay. The carry-select adder suits large-bit-width data and can quickly generate a final result. The carry-lookahead adder suits delay-sensitive, high-speed operations. When selecting an adder, a set of selection criteria may be defined to determine the adder type best suited to the current situation. By way of example, dynamically selecting the adder type according to the bit width and data characteristics of the intermediate product result specifically includes:
determining the addition carry probability of the intermediate product result based on a vector machine model, so as to determine a first adder expectation for the intermediate product result; determining a second adder expectation based on the bit width of the intermediate product result; determining a time limit for the addition operation based on a user-set delay requirement, and determining a third adder expectation based on that time limit; determining a fourth adder expectation based on the amount and types of available hardware resources; and finally determining the adder type corresponding to the intermediate product result from the first, second, third, and fourth adder expectations. The dynamic selection process can be managed with a decision tree or a finite state machine, where each node or state represents a possible selection condition and each edge or transition represents a change from one condition to another.
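A toy, rule-based version of this selection logic is sketched below. The thresholds and the priority order are assumptions chosen purely for illustration, and fixed rules stand in for the vector-machine carry-probability model:

```python
def select_adder(bit_width: int, carry_prob: float,
                 time_budget_ns: float, cla_units_free: int) -> str:
    """Pick an adder type from the four kinds named in the text.

    All thresholds (2.0 ns, 32 bits, 0.5 carry probability) are
    illustrative assumptions, not values from the application.
    """
    if time_budget_ns < 2.0 and cla_units_free > 0:
        return "carry-lookahead"   # tight deadline: lowest-latency adder
    if bit_width >= 32:
        return "carry-select"      # wide operands: split and select halves
    if carry_prob > 0.5:
        return "carry-save"        # many pending additions: defer carries
    return "carry-propagate"       # small, relaxed case: cheapest adder
```

In the application this decision would be driven by the four "adder expectations" and managed by a decision tree or finite state machine; the cascade of `if` statements above is the degenerate decision-tree case.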
Fig. 6 is a flowchart illustrating a method for calculating a floating point multiplier based on an FPGA according to one or more embodiments of the present disclosure. The process may be performed by computing devices in the respective areas, with some input parameters or intermediate results in the process allowing manual intervention adjustments to help improve accuracy.
The method according to the embodiments of the present application may be executed by a terminal device or a server, which the present application does not particularly limit. For ease of understanding and description, the following embodiments are described in detail with a server as the example.
It should be noted that the server may be a single device, or may be a system composed of a plurality of devices, that is, a distributed server, which is not particularly limited in the present application.
As shown in fig. 6, an embodiment of the present application provides a method for calculating a floating point multiplier based on an FPGA, including:
S601, determining the sign of the target output floating point number through the exclusive OR gate and the sign bits of the first input floating point number and the second input floating point number.
S602, adding the exponents of the first input floating point number and the second input floating point number, and subtracting the deviation value of the corresponding format to obtain the exponent output of the target output floating point number.
S603, based on the mantissa bit widths of the first input floating point number and the second input floating point number, performing a multiplication operation on the first mantissa of the first input floating point number and the second mantissa of the second input floating point number using the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm, to obtain the mantissa product of the target output floating point number.
S604, performing a normalization operation based on the mantissa product, wherein the normalization operation comprises at least one of shifting the mantissa and adjusting the exponent value.
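Steps S601 to S604 can be sketched end to end for IEEE-754 single precision as follows. This bit-level model is a simplified illustration: it uses plain truncation rather than round-to-nearest-even and omits subnormals, infinities, and NaN handling:

```python
import struct

def fp32_multiply(a: float, b: float) -> float:
    """Bit-level IEEE-754 binary32 multiply following S601-S604:
    sign XOR, exponent add minus bias 127, mantissa multiply, normalize.
    Simplified: truncating rounding; no subnormal/inf/NaN handling."""
    ai = struct.unpack(">I", struct.pack(">f", a))[0]
    bi = struct.unpack(">I", struct.pack(">f", b))[0]

    sign = (ai >> 31) ^ (bi >> 31)                          # S601: XOR of sign bits
    exp = ((ai >> 23) & 0xFF) + ((bi >> 23) & 0xFF) - 127   # S602: add, subtract bias
    man_a = (ai & 0x7FFFFF) | 0x800000                      # restore implicit leading 1
    man_b = (bi & 0x7FFFFF) | 0x800000
    prod = man_a * man_b                                    # S603: 48-bit mantissa product

    # S604: normalize -- the product of two [1, 2) mantissas lies in [1, 4)
    if prod & (1 << 47):        # result >= 2.0: shift right, bump exponent
        prod >>= 1
        exp += 1
    mantissa = (prod >> 23) & 0x7FFFFF                      # drop hidden bit, truncate

    out = (sign << 31) | ((exp & 0xFF) << 23) | mantissa
    return struct.unpack(">f", struct.pack(">I", out))[0]
```

In the multiplier described above, the `man_a * man_b` line is exactly the work delegated to the Karatsuba / Urdhva-Tiryagbhyam binary multiplier module, while the other three steps correspond to the sign, exponent, and normalization modules.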
The embodiments of the present application are described in a progressive manner, and the same and similar parts of the embodiments are all referred to each other, and each embodiment is mainly described in the differences from the other embodiments. In particular, for the apparatus and medium embodiments, the description is relatively simple, as it is substantially similar to the method embodiments, with reference to the section of the method embodiments being relevant.
The devices and media provided in the embodiments of the present application are in one-to-one correspondence with the methods, so that the devices and media also have similar beneficial technical effects as the corresponding methods, and since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the devices and media are not repeated here.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random-access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. An FPGA-based floating-point multiplier, applied to floating-point multiplication in the IEEE-754 format, comprising: a sign calculation module, configured to determine the sign of a target output floating-point number through an XOR gate and the sign bits of a first input floating-point number and a second input floating-point number; an exponent addition module, configured to add the exponents of the first input floating-point number and the second input floating-point number and subtract the bias value of the corresponding format, to obtain the exponent output of the target output floating-point number; a binary multiplier module, configured to perform, based on the mantissa bit widths of the first input floating-point number and the second input floating-point number, a multiplication operation on a first mantissa of the first input floating-point number and a second mantissa of the second input floating-point number using the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm, to obtain a mantissa product of the target output floating-point number; and a result normalization module, configured to perform a normalization operation based on the mantissa product, the normalization operation comprising at least one of shifting the mantissa and adjusting the exponent value.

2. The FPGA-based floating-point multiplier according to claim 1, wherein performing the multiplication operation on the first mantissa of the first input floating-point number and the second mantissa of the second input floating-point number using the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm based on the mantissa bit widths of the first input floating-point number and the second input floating-point number specifically comprises: determining that the mantissa bit width satisfies a first preset condition, and performing divide-and-conquer multiplication on the first mantissa and the second mantissa using an improved Karatsuba algorithm, until the mantissa product of the target output floating-point number is obtained or an intermediate product result satisfies a second preset condition; and when the intermediate product result satisfies the second preset condition, taking the intermediate product result as input to an improved Urdhva-Tiryagbhyam algorithm and continuing the multiplication until the mantissa product of the target output floating-point number is obtained.

3. The FPGA-based floating-point multiplier according to claim 2, wherein performing divide-and-conquer multiplication on the first mantissa and the second mantissa using the improved Karatsuba algorithm specifically comprises: determining a length ratio between the first mantissa and the second mantissa; determining a numerical distribution of the first mantissa and the second mantissa, the numerical distribution being the number and positions of zero-valued digits within adjacent preset digits; and determining split points for the first mantissa and the second mantissa based on the length ratio and the numerical distribution, and splitting the first mantissa and the second mantissa according to the split points.

4. The FPGA-based floating-point multiplier according to claim 2, wherein performing divide-and-conquer multiplication on the first mantissa and the second mantissa using the improved Karatsuba algorithm specifically comprises: providing a parallel computing unit, registers, and a control logic module in the FPGA-based floating-point multiplier; wherein the parallel computing unit is configured to concurrently execute computing tasks within the same clock cycle, the computing tasks comprising multiplication, addition, and result merging; the registers are configured to temporarily store intermediate results and pass data between different computing tasks; and the control logic module is configured to manage the starting, completion, and error handling of each computing task.

5. The FPGA-based floating-point multiplier according to claim 2, wherein before taking the intermediate product result as input to the improved Urdhva-Tiryagbhyam algorithm and continuing the multiplication, the binary multiplier module is further configured to: when the Urdhva-Tiryagbhyam algorithm is used, determine the number of computations corresponding to each preset computing task, the preset computing tasks comprising identical number pairs appearing at different time points or positions and fixed digit combinations appearing in the multiplication tasks; and store the input item group of each preset computing task, the type of adder used, and the hash-mapped computation result in a preset computation table.

6. The FPGA-based floating-point multiplier according to claim 5, wherein taking the intermediate product result as input to the improved Urdhva-Tiryagbhyam algorithm and continuing the multiplication specifically comprises: determining the local correlation between the intermediate product result and each input item group in the preset computation table; if the local correlation between an input item group and the intermediate product result is higher than a first preset threshold and lower than a second preset threshold, reading the adder model corresponding to the input item group from the preset computation table and using the adder model as the adder for the intermediate product result; and if the local correlation is higher than the second preset threshold, reading the corresponding computation result from the preset computation table as the computation result of the partial computing task of the first mantissa and the second mantissa.

7. The FPGA-based floating-point multiplier according to claim 1, wherein the binary multiplier module is provided with multiple types of adders, the types of adders comprising at least one of a carry-propagate adder, a carry-save adder, a carry-select adder, and a carry-lookahead adder.

8. The FPGA-based floating-point multiplier according to claim 7, wherein the binary multiplier module is further configured to dynamically select the type of adder according to the bit width and data characteristics of the intermediate product result; and dynamically selecting the type of adder according to the bit width and data characteristics of the intermediate product result specifically comprises: determining the addition carry probability of the intermediate product result based on a vector machine model, to determine a first adder expectation for the intermediate product result; determining a second adder expectation for the intermediate product result based on the bit width of the intermediate product result; determining a time limit for the addition operation based on a delay requirement set by a user, and determining a third adder expectation for the intermediate product result based on the time limit; determining a fourth adder expectation for the intermediate product result based on the amount and types of available hardware resources; and determining the adder type corresponding to the intermediate product result based on the first adder expectation, the second adder expectation, the third adder expectation, and the fourth adder expectation.

9. A calculation method for an FPGA-based floating-point multiplier, comprising: determining the sign of a target output floating-point number through an XOR gate and the sign bits of a first input floating-point number and a second input floating-point number; adding the exponents of the first input floating-point number and the second input floating-point number and subtracting the bias value of the corresponding format, to obtain the exponent output of the target output floating-point number; performing, based on the mantissa bit widths of the first input floating-point number and the second input floating-point number, a multiplication operation on a first mantissa of the first input floating-point number and a second mantissa of the second input floating-point number using the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm, to obtain a mantissa product of the target output floating-point number; and performing a normalization operation based on the mantissa product, the normalization operation comprising at least one of shifting the mantissa and adjusting the exponent value.

10. A computing device for an FPGA-based floating-point multiplier, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to: determine the sign of a target output floating-point number through an XOR gate and the sign bits of a first input floating-point number and a second input floating-point number; add the exponents of the first input floating-point number and the second input floating-point number and subtract the bias value of the corresponding format, to obtain the exponent output of the target output floating-point number; perform, based on the mantissa bit widths of the first input floating-point number and the second input floating-point number, a multiplication operation on a first mantissa of the first input floating-point number and a second mantissa of the second input floating-point number using the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm, to obtain a mantissa product of the target output floating-point number; and perform a normalization operation based on the mantissa product, the normalization operation comprising at least one of shifting the mantissa and adjusting the exponent value.
CN202510104500.8A 2025-01-23 2025-01-23 A floating-point multiplier, calculation method and device based on FPGA Active CN119536684B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510104500.8A CN119536684B (en) 2025-01-23 2025-01-23 A floating-point multiplier, calculation method and device based on FPGA


Publications (2)

Publication Number Publication Date
CN119536684A true CN119536684A (en) 2025-02-28
CN119536684B CN119536684B (en) 2025-06-17

Family

ID=94708927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510104500.8A Active CN119536684B (en) 2025-01-23 2025-01-23 A floating-point multiplier, calculation method and device based on FPGA

Country Status (1)

Country Link
CN (1) CN119536684B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120011729A (en) * 2025-04-16 2025-05-16 合肥芯车无限半导体科技有限公司 Normalization operation circuit and system

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101479698A (en) * 2006-06-27 2009-07-08 英特尔公司 Multiplying two numbers
US20100020965A1 (en) * 2007-12-28 2010-01-28 Shay Gueron Method for speeding up the computations for characteristic 2 elliptic curve cryptographic systems
CN107168678A (en) * 2017-05-09 2017-09-15 清华大学 A kind of improved floating dual MAC and floating point multiplication addition computational methods
CN117891430A (en) * 2024-03-18 2024-04-16 中科亿海微电子科技(苏州)有限公司 Floating point multiplication and addition structure applied to FPGA embedded DSP


Non-Patent Citations (4)

Title
GAJAWADA, S.; DEVI, D.N.; RAO, M.: "MOHSKM: Meta-Heuristic Optimization Driven Hardware-Efficient Heterogeneous-Split Karatsuba Multipliers for Large-Bit Operations", 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 25 September 2024 *
S. ARISH; R.K. SHARMA: "An efficient floating point multiplier design for high speed applications using Karatsuba algorithm and Urdhva-Tiryagbhyam algorithm", arXiv, 1 October 2019, pages 1-6 *
TAI, Qiangqiang: "Research on an FMA Structure for Reducing Iterative Operation Error", China Master's Theses Full-text Database (Information Science and Technology), no. 01, 15 January 2015 *
WEI, Dongmei; YANG, Tao: "Fast FPGA-based implementation of elliptic curve point multiplication over F_2^m", Journal of Computer Applications, vol. 31, no. 02, 1 February 2011 *


Also Published As

Publication number Publication date
CN119536684B (en) 2025-06-17

Similar Documents

Publication Publication Date Title
Pilipović et al. On the design of logarithmic multiplier using radix-4 booth encoding
JP2021525403A (en) Improved low precision binary floating point formatting
US10402167B2 (en) Approximating functions
JP2020514862A (en) Floating-point unit configured to perform a fused multiply-add operation on three 128-bit extended operands, method, program, and system
CN119536684B (en) A floating-point multiplier, calculation method and device based on FPGA
US20230334117A1 (en) Method and system for calculating dot products
US20250224921A1 (en) Apparatus and Method for Processing Floating-Point Numbers
CN116974517A (en) Floating point number processing methods, devices, computer equipment and processors
US7499962B2 (en) Enhanced fused multiply-add operation
US20230221924A1 (en) Apparatus and Method for Processing Floating-Point Numbers
KR20170138143A (en) Method and apparatus for fused multiply-add
CN117648959A (en) Multi-precision operand operation device supporting neural network operation
US8069200B2 (en) Apparatus and method for implementing floating point additive and shift operations
CN100454237C (en) Processor having efficient function estimate instructions
CN102789376B (en) Floating-point number adder circuit and implementation method thereof
CN118519685A (en) Method for realizing rapid solving of sigmoid function based on SIMD instruction
TW202333041A (en) System and method performing floating-point operations
CN115062768A (en) A Softmax hardware implementation method and system for a platform with limited logic resources
Naga Sravanthi et al. Design and performance analysis of rounding approximate multiplier for signal processing applications
Tan et al. Efficient multiple-precision and mixed-precision floating-point fused multiply-accumulate unit for HPC and AI applications
US20250224924A1 (en) Floating-point logarithmic number system scaling system for machine learning
Zadiraka et al. Parallel Methods of Representing Multidigit Numbers in Numeral Systems for Testing Multidigit Arithmetic Operations
CN120276705A (en) Index type low-bit-width calculation acceleration method and system based on lookup table optimization
KR20250044070A (en) Bit pattern operation method and operator using dynamic bit shift
CN118519684A (en) Method for rapidly solving positive arithmetic square root based on SIMD instruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant