[go: up one dir, main page]

WO2025105928A1 - Procédé distribué de multiplication de matrices implanté dans une structure de réseau de système informatique distribué - Google Patents

Procédé distribué de multiplication de matrices implanté dans une structure de réseau de système informatique distribué Download PDF

Info

Publication number
WO2025105928A1
WO2025105928A1 PCT/KR2024/096558 KR2024096558W WO2025105928A1 WO 2025105928 A1 WO2025105928 A1 WO 2025105928A1 KR 2024096558 W KR2024096558 W KR 2024096558W WO 2025105928 A1 WO2025105928 A1 WO 2025105928A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
master node
distributed
sub
input matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/KR2024/096558
Other languages
English (en)
Korean (ko)
Inventor
최완
손경락
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SNU R&DB Foundation
Original Assignee
Seoul National University R&DB Foundation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020240120522A external-priority patent/KR20250071824A/ko
Application filed by Seoul National University R&DB Foundation filed Critical Seoul National University R&DB Foundation
Publication of WO2025105928A1 publication Critical patent/WO2025105928A1/fr
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/11Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network

Definitions

  • the present invention relates to a distributed matrix multiplication operation method and a distributed computing system performing the same, and more specifically, to a method and system for improving the accuracy of a distributed matrix multiplication operation due to the structural characteristics of a network.
  • encoded computing has emerged to solve the stagnation phenomenon of computational accuracy, speed, and load, and is a method that enables restoration of the original matrix multiplication operation result using the encoded matrix multiplication operation result.
  • the edge device due to the computing power or communication constraints of the edge device, there may be a significant delay in the time until the parallel processing operation is transmitted to the main device. Even if the operation results are all received from the remaining edge devices, if the results are not transmitted from one or more edge devices, the result of the entire matrix multiplication operation cannot be restored.
  • the present invention is intended to solve the problems of the above-mentioned prior art, and aims to provide a system and method for performing distributed multiplication operations by utilizing structural characteristics in a wireless network environment based on a group algebra-based coded computing technique.
  • a distributed computing system for performing a distributed matrix multiplication operation method includes a master node for processing a multiplication operation of a first input matrix and a second input matrix, and a plurality of worker nodes configured to process in parallel a portion of the multiplication operation of the first input matrix and the second input matrix distributed by the master node and transfer it to the master node.
  • the distributed matrix multiplication operation method comprises: a step in which the master node generates a first partitioning function based on a number of units according to a partitioning order (pm) of the first input matrix and a second partitioning function based on a number of units according to a partitioning order (pn) of the second input matrix, and broadcasts the same through a network having a multi-access channel structure; a step in which at least pmn worker nodes encode different first sub-matrices and second sub-matrices forming part of the first input matrix and the second input matrix, respectively, based on the first partitioning function and the second partitioning function received through the network, to generate a first encoded matrix and a second encoded matrix; a step in which each worker node performs an encoded matrix multiplication operation using the first encoded matrix and the second encoded matrix generated by each of the worker nodes; And the master node includes a step of collecting the results of the operation from each of the worker nodes and restoring the results of the
  • the speed, accuracy and load of high-dimensional matrix multiplication operations are improved.
  • the complexity of high-dimensional matrix operations is alleviated compared to conventional studies.
  • the present invention can be applied even in a noisy wireless communication environment and calculation errors can be minimized.
  • FIG. 1 is a structural diagram of a distributed computing system according to one embodiment of the present invention.
  • FIG. 2 is a block diagram of a configuration of a master node according to one embodiment of the present invention.
  • Figure 3 is a conceptual diagram for explaining a distributed matrix multiplication operation method that is the basis of the present invention.
  • FIG. 4 is a conceptual diagram explaining a first embodiment of a distributed matrix multiplication operation method of the present invention.
  • FIG. 5 is a conceptual diagram explaining a second embodiment of a distributed matrix multiplication operation method of the present invention.
  • FIG. 6 is a flowchart of a distributed matrix multiplication operation method according to one embodiment of the present invention.
  • FIG. 7a and FIG. 7b are drawings for explaining effects derived from an embodiment of the present invention.
  • the terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms, and the terms are used only for the purpose of distinguishing one component from another.
  • the singular expression may include the plural expression unless the context clearly indicates otherwise.
  • the "user terminal” mentioned below may be implemented as a computer or portable terminal that can access a server or other terminal via a network.
  • the computer may include, for example, a notebook, desktop, or laptop equipped with a web browser.
  • the portable terminal may include, for example, a wireless communication device that ensures portability and mobility, such as a smart phone, a tablet PC, a wearable device, various VR HMD devices, as well as various devices equipped with communication modules such as Bluetooth (BLE, Bluetooth Low Energy), NFC, RFID, ultrasonic, infrared, WiFi, or LiFi.
  • the "network” refers to a connection structure that enables information exchange between each node, such as terminals and servers, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), a wired and wireless data communication network, a telephone network, a wired and wireless television communication network, etc.
  • wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WIMAX (World Interoperability for Microwave Access), Wi-Fi, Bluetooth, infrared, ultrasonic, visible light communication (VLC), and LiFi.
  • FIG. 1 is a structural diagram of a distributed computing system according to one embodiment of the present invention.
  • the distributed computing system (10) includes a master node (100) and a plurality of worker nodes (200) connected to the master node (100) via a network.
  • a master node (100) refers to a main device or server in a distributed computing system (10) and can process a high-intelligence task requested from a user terminal (not shown) and provide the result.
  • the high-intelligence task is implemented as a program or application, for example, big data analysis, tactile communication, augmented reality, mixed reality, virtual reality, metaverse, artificial intelligence model, etc., and is not necessarily limited to the examples, and if it is a task that requires high-dimensional matrix operation, it should be interpreted as being included in the present invention.
  • the master node (100) can process a product operation of the first input matrix and the second input matrix to derive a desired result in a task requested by a user terminal.
  • FIG 2 is a block diagram of the configuration of a master node (100) according to one embodiment of the present invention.
  • the master node (100) may include a processor (110), a memory (120), and a communication module (130).
  • the memory (120) can store commands for the master node (100) to execute a product operation of the first input matrix and the second input matrix.
  • the memory (120) can store an executable program that generates and executes one or more commands that implement the corresponding operation.
  • the memory (120) may include built-in memory and/or external memory, and may include volatile memory such as DRAM, SRAM, or SDRAM, nonvolatile memory such as OTPROM (one time programmable ROM), PROM, EPROM, EEPROM, mask ROM, flash ROM, NAND flash memory, or NOR flash memory, a flash drive such as an SSD, a CF (compact flash) card, an SD card, a Micro-SD card, a Mini-SD card, an Xd card, or a memory stick, or a storage device such as an HDD.
  • the memory (120) may include, but is not limited to, a magnetic storage media or a flash storage media.
  • the processor (110) can execute its own process included in the distributed matrix multiplication operation method according to the following embodiment based on the program and instructions stored in the memory (120).
  • the processor (110) can include all kinds of devices capable of processing operations on data.
  • the processor (110) may refer to a data processing device built into hardware, for example, having a physically structured circuit to perform a function expressed by a code or command included in a program.
  • Examples of such data processing devices built into hardware include, but are not limited to, processing devices such as a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), and a graphics processing unit (GPU).
  • the communication module (130) may provide a communication interface that provides transmission and reception signals as packet data to external devices including user terminals and worker nodes (200) using wireless communication technology.
  • the communication module (130) may be a device including hardware and software necessary to transmit and receive control signals or data signals, etc. through wired or wireless connections with other network devices.
  • a plurality of worker nodes (200) are included in a distributed computing system (10), and below, an environment is exemplified in which a total of W worker nodes (WN1) to W worker nodes (WNW) are configured.
  • the worker node (200) is a type of edge device or server that processes in parallel a portion of the product operation of the first input matrix and the second input matrix distributed by the master node (100) for the distributed matrix product operation of the first input matrix and the second input matrix and transmits the result to the master node (100).
  • the master node (100) may receive a task processing request signal from a user terminal.
  • the master node (100) may request distributed processing for the task received from the user terminal to at least one worker node (200).
  • Each worker node (200) may process different parts of the product operation of the first input matrix and the second input matrix in parallel and transmit the result to the master node (100) via wireless communication, and the master node (100) may collate the results of the distributed processing to generate a task processing result signal and transmit the result to the user terminal.
  • the input matrix and its product operation performed according to the user's request are defined as follows. However, this is an example for explanation, and the scope of the present invention is not limited thereto.
  • the master node (100) multiplies the first input matrix (A T ) and the second input matrix (B). , and to this end, requests partial computations in parallel to at least some of the first worker node (WN1) to the Wth worker node (WNW).
  • the first input matrix (A T ) and the second input matrix (B) can be expressed by dividing them into a first submatrix according to the first partition order (pm) and a second submatrix according to the second partition order (pn), as in Equation 1 below.
  • At least pmn worker nodes (200) out of W are encoded with a first encoding matrix (in which different first sub-matrix and second sub-matrix are encoded) ) and the second encoding matrix ( ) can be used to perform a multiplication operation of encoding matrices for each part.
  • w is in the range of 1 to W
  • Fig. 3 is a conceptual diagram for explaining a distributed matrix multiplication operation method that serves as the basis of the present invention.
  • the master node (100) encodes the first sub-matrix and the second sub-matrix as in Equation 2 below for distribution to the wth worker node (Wnw) to create the first encoded matrix ( ) and the second encoding matrix ( ) is created.
  • the master node (100) sets the preset transmission power P A and P B and the transmission time based on Equation 3 below. According to this, the first encoding matrix ( ) and the second encoding matrix ( ) respectively and quantizes the vector.
  • the master node (100) During this time, X w,A and X w,B are transmitted to the wth worker node (Wnw). Accordingly, the signal received by the worker node (Wnw) is set as shown in Equation 4 below.
  • n w,B represent the channel noise of the network, respectively.
  • n w,B represents a complex Gaussian noise process, whose components are complex Gaussian random variables with mean 0 and variance 1.
  • the wth worker node (Wnw) After receiving the encoded signal as above, the wth worker node (Wnw) outputs Y w,A and Y w,B respectively. and By decoding it, the encoding matrix multiplication operation is performed as in Equation 5 below.
  • the wth worker node (Wnw) Transmission time During transmission power P w depending on is vector quantized. Accordingly, the signal received by the master node (100) from the worker node (Wnw) is given by Equation 6 below.
  • n w,C represents channel noise
  • the master node (100) performs a product operation of Y w, C and the first input matrix (A T ) and the second input matrix (B). It is restored to C. That is, C and can be expressed as having mn submatrices, C ij and can be defined as the (i, j)th partitioned submatrix.
  • the embodiment of the present invention as illustrated in FIG. 3 is a method in which the master node (100) individually communicates with each worker node (200) to transmit an encoding matrix for distribution of operations, and each worker node (200) also communicates in parallel with the master node (100) to transmit the result of the encoding matrix multiplication operation.
  • this is defined as OMA-OMA (orthogonal multiple access-orthogonal multiple access).
  • Fig. 4 is a conceptual diagram for explaining the first embodiment of the distributed matrix multiplication operation method of the present invention.
  • Fig. 5 is a conceptual diagram for explaining the second embodiment of the distributed matrix multiplication operation method of the present invention.
  • the present invention proposes a method that is more improved than OMA due to the structural characteristics of a wireless network environment. That is, the present invention proposes a transmission technique that minimizes the probability of calculation errors by utilizing the characteristics of broadcasting (BC) and multiple access channels (MAC), and this is defined as BnC (Broadcast-and-Compute) in this specification.
  • BC broadcasting
  • MAC multiple access channels
  • the first theory is a theory that explains matrix multiplication in a group theory framework by corresponding each sub-matrix of the main matrix to an element of several types of groups based on group algebra. (H.Cohn and C.Umans, "A group-theoretic approach to fast matrix multiplication,” in Proc. 44th Annu. IEEE Symp. Found. Comput. Sci., Cambridge, MA, USA, 2003, pp. 438-449.)
  • Another second theory is the theory that attempts to restore the linear combination of a given message using the MAC structure in a distributed computing environment.
  • the theory suggests a lattice-based encoder and decoder to restore the message with an improved transmission rate, and this framework is defined as Compute-and-Forward (CF).
  • CF Compute-and-Forward
  • the second theory introduces CF to restore the equation of linear combination at a desired transmission rate, and one embodiment of the present invention derives the following algorithm based on this.
  • Equation 8 the average error probability can be decoded.
  • Equation 10 Equation 10 below.
  • BnC takes the approach of requesting an encoding matrix multiplication operation to each worker node (200) by broadcasting the same information through a wireless network.
  • a method of linearly combining the results of the encoding matrix multiplication operation performed from each worker node (200) based on MAC and transmitting them all at once to the master node (100) can be additionally introduced.
  • the first embodiment of the present invention is a method of applying BnC when the master node (100) requests distribution of matrix multiplication operation to each worker node, and can be defined as BnC-OMA.
  • the first embodiment is a technique in which, when the master node (100) broadcasts the same information based on the number of troops, each worker node (200) utilizes this to encode each sub-matrix differently.
  • the master node (100) generates a first partition function ( A ) based on the partition order (pm) of the first input matrix (A T ) based on the number of troops, and a second partition function (B) based on the partition order (pn) of the second input matrix ( B ), as shown in Equation 12 below.
  • first partition function ( A ) and the second partition function ( B ) correspond to different first and second sub-matrices of the first input matrix (A T ) and the second input matrix (B), and each worker node (200) is responsible for the first encoding matrix ( ) and the second encoding matrix ( ) can be understood as a commonly applied frame for generating a
  • the method of coding by dividing the matrix product based on the military number considers S, T, U ⁇ g that satisfies the triple product rule.
  • , are considered. Accordingly, the military number-based coding for A and B can be defined as in Equation 14 below.
  • Equation 14 From the coefficients of the group elements s -1 u , the result of the operation of the AB matrix product can be obtained. Intuitively, this can be viewed as a generalization of the concept of cyclic convolution of the DFT (discrete Fourier transform) to an arbitrary group.
  • DFT discrete Fourier transform
  • is an arbitrary integer and is a relatively very small number compared to p, m, and n according to the well-known military number theory.
  • the cyclic group g is considered to have three subsets as shown in Equation 15 below, and these satisfy the triple product rule.
  • the master node (100) can generate the first partition function ( A) and the second partition function (B) by mapping the respective submatrices into which the first input matrix (A T ) and the second input matrix ( B ) are partitioned based on the cyclic group g to an arbitrary matrix structure. That is, the master node (100) embeds the partitioned submatrices by utilizing the element g of g , which can be expressed by the above equation 12.
  • a and B are expressed as frames based on embedded military numbers, but since the encoding matrices processed by each worker node (200) are derived from the same military numbers A and B , A and B can be understood as a type of partition function. Accordingly, the master node (100) can distribute the matrix multiplication operation to each worker node (200) by generating a first partition function ( A ) and a second partition function ( B ) and broadcasting them through a network with a MAC structure.
  • OMA encodes each sub-matrix of the first input matrix (A T ) and the second input matrix (B) at the master node (100) level to create the first encoding matrix ( ) and the second encoding matrix ( ) is generated and distributed to each worker node (200) through individual communication, or BnC transmits the first partition function ( A ) and the second partition function ( B ) in batches, and encodes each sub-matrix at the worker node (200) level to generate the first encoding matrix ( ) and the second encoding matrix ( ) has a fundamental difference in generating the first encoding matrix ( ).
  • the difference in transmission method between BnC and OMA is that, at a preset transmission time T local , the master node (100) ) and the second encoding matrix ( ) can be interpreted as a quantization unit (Quantization Step Size) for transmitting the first partition function ( A ) and the second partition function ( B ).
  • the first partition function ( A ) and the second partition function ( B ) are military numbers. and Since it is an element of , the first partition function ( A ) and the second partition function ( B ) are respectively and It can be expressed as a linear combination of elements ( g ) of a group cyclic group (g) with coefficients in the form of a complex matrix.
  • the master node (100) can divide the total transmission time T loca of the first partition function ( A ) and the second partition function ( B ) into pmn sections with equal intervals, and distribute the matrix multiplication operation to each worker node (200) by transmitting different coefficients for each section. That is, the master node (100) can transmit the coefficient of g i , which is the i-th group element of the first partition function ( A ), in the i-th section through the network.
  • the master node (100) is can be quantized into bits.
  • the second partition function ( B ) is also quantized in the same way.
  • the quantization unit for the master node (100) to broadcast the first partition function ( A ) and the second partition function ( B ) can be given by Equation 16 below.
  • the quantization step size for transmitting each element of the cyclic group in BnC can be determined based on the transmission time and channel capacity of the first partition function ( A ) and the second partition function ( B ), and the partitioning order and dimension of the first input matrix (A T ) and the second input matrix (B).
  • At least pmn (W>pmn) worker nodes (200) each generate different first sub-matrices and second sub-matrices forming part of the first input matrix ( AT ) and the second input matrix ( B ) based on the first partitioning function ( A ) and the second partitioning function (B) received through the network, respectively, as the first encoding matrix ( ) and the second encoding matrix ( ) is created.
  • each worker node (200) generates a first encoding matrix ( ) and the second encoding matrix ( ) is derived by the first partition function ( A ) and the second partition function ( B ) according to Equation 12, and Equation 2 in OMA can be equivalently modeled as Equation 17 below.
  • each worker node (200) has a first encoding matrix ( ) and the second encoding matrix ( ) must perform matrix multiplication operation, so at this time, the coefficients of the cyclic group element g corresponding to the first and second submatrices in the first partition function ( A ) and the second partition function ( B ) are complex values ( ) can be converted into.
  • the first encoding matrix ( ) and the second encoding matrix ( ) is generated, each worker node (200) uses it to perform an encoded matrix multiplication operation.
  • the transmission signal in BnC can be equivalently modeled by Equation 20 below, with respect to Equations 4 and 6 described in OMA, by removing the effects of encoders and decoders on the network.
  • each worker node (200) receives the result of the operation , performs quantization, and transmits it to the master node (100).
  • each worker node (200) may quantize the results of the encoding matrix multiplication operations it performs for each sequentially set transmission section and transmit them to the master node (100).
  • the transmission section may mean the total transmission time taken to transmit the results of all encoding matrix multiplication operations divided into pmn equal intervals.
  • the wth worker node (Wnw) is in the wth transmission interval.
  • Wnw the wth worker node
  • the quantization step size is can be expressed as
  • the received signal of the master node (100) can be expressed as .
  • the master node (100) collects the results of the encoding matrix multiplication operation from each worker node (200) and restores the multiplication operation of the first input matrix (A T ) and the second input matrix (B).
  • the master node (100) receives the When you collect, you get what you originally wanted. can be restored in the following way.
  • the result of the encoding matrix multiplication operation ( ) is the index set of the worker node (200) that delivered the If defined as , the result of the encoding matrix multiplication operation ( ) can be expressed by the following equation 21, which connects the ⁇ -th row and the ⁇ -th column.
  • Equation 22 the heat vector
  • P is a value preset to the master node (100), so the master node (100) By restoring, can be derived.
  • BnC is applied in the process of distributing matrix multiplication from the master node (100) to the worker node (200), and the process of transmitting the result of the encoded matrix multiplication operation from the worker node (200) to the master node (100) follows OMA, and is defined as BnC-OMA in this specification.
  • the second embodiment of the present invention is a method in which the master node (100) utilizes Compute-and-Forward (CF) according to the second theory in the process of collating the encoded matrix multiplication operations of each worker node (200). Accordingly, the second embodiment is defined as BnC-CF in this specification, and the process up to the front end where each worker node (200) performs the encoded matrix multiplication operation overlaps with the first embodiment, and will be replaced with the above.
  • CF Compute-and-Forward
  • the result of each encoded matrix multiplication operation performed by each worker node (200) ) is linearly combined during transmission over the network according to the characteristics of the MAC structure. ) can be reached to the master node (100).
  • the difference between OMA and CF is the result of the distributed matrix multiplication operation performed by the wth worker node (Wnw). ) can be viewed as an encoding and decoding process. That is, by utilizing the grid-based CF encoder and decoder introduced in the second theory. A new speed for transmitting to the master node (100) can be derived.
  • Equation 17 which defines the first encoding matrix ( ) and the second encoding matrix ( )
  • Equation 24 The product can be derived from the following equation 24.
  • Equation 26 the master node (100) can restore the (i, j)th submatrix (C ij ) of the matrix (C) through Equation 25.
  • Equation 27 below can be derived, and the master node (100) can decode Equation 26 at a computational speed satisfying it to restore each submatrix of the matrix (C) in a direction that minimizes the upper limit of noise.
  • each encoding matrix multiplication operation ( ) can be viewed as a linear combination of equations (Formula 26) in which lattice-based complex values according to the partitioning order of the first input matrix (A T ) and the second input matrix (B) are applied as coefficients. Accordingly, the master node (100) can decode the equation (Formula 26) at a computational speed that satisfies a preset equation (Formula 27) based on noise on the network, and restore the results of the product operation for each of the first sub-matrix and the second sub-matrix.
  • each worker node (200) multiplies the result of each encoding matrix operation ( ) can be quantized and transmitted.
  • each worker node (200) has a grid equation ( ) can be uniformly quantized into R ij bits, and the corresponding quantization unit (Quantization Step Size) can be expressed by Equation 28 below.
  • the reception signal according to OMA ( ) can be equivalently modeled as in Equation 29 below.
  • N ij is the interval [ ] is a quantization noise with a uniform distribution over the whole area.
  • Fig. 7a and Fig. 7b are drawings for explaining effects derived from the embodiments of the present invention.
  • Figs. 7a and 7b are experimental data showing the average computation error according to the signal to noise ratio (SNR).
  • SNR signal to noise ratio
  • Fig. 7a shows a graph showing the results of an experiment with 12 worker nodes (200)
  • Fig. 7b shows a graph showing the results of an experiment with 90 worker nodes.
  • UN stands for an uncoded scheme, which means that encoding for distributed matrix multiplication operation is not performed.
  • GA means that encoding is performed based on the first theory.
  • the GA which performs the distributed matrix multiplication operation based on the first theory (the number of groups), shows better performance than the UN, which does not.
  • the performance is superior when transmitting using BnC (the first embodiment) than when using the existing OMA.
  • the utilization of CF the second embodiment is more effective than when using the existing OMA, so it can be confirmed that it is effective for introduction to high-intelligence tasks.
  • Fig. 6 is a flowchart of the operation of a distributed matrix multiplication operation method according to one embodiment of the present invention, and any duplicated contents will be replaced with those described above.
  • the master node (100) is the product of the first input matrix (A T ) and the second input matrix (B). , and to this end, requests partial computations in parallel to at least some of the first worker node (WN1) to the Wth worker node (WNW).
  • step S610 the master node (100) generates a first partition function (A) and a second partition function (B) based on the number of military units according to the partition order (pm) of the first input matrix (A T ) and the partition order ( pn ) of the second input matrix ( B ).
  • the master node (100) constructs a cyclic group ( g ) composed of elements (g) corresponding to each sub-matrix into which the first input matrix (A T ) and the second input matrix (B) are divided, and based on this, each sub-matrix is mapped to an arbitrary matrix structure, thereby linearly combining them to generate the first partition function ( A ) and the second partition function ( B ).
  • step S620 the master node (100) broadcasts the first partitioning function ( A ) and the second partitioning function ( B ) through a network with a multi-access channel structure.
  • the master node (100) divides the preset transmission time of the first partition function ( A ) and the second partition function ( B ) into pmn sections with equal intervals, and transmits coefficients set differently corresponding to the first sub-matrix and the second sub-matrix for each of the divided sections.
  • the master node (100) can quantize the first partition function ( A ) and the second partition function ( B ) with a preset quantization unit (Quantization Step Size).
  • the quantization unit ( ) can be set based on the transmission time and channel capacity of the first partition function ( A ) and the second partition function ( B ), and the partitioning order and dimension of the first input matrix (A T ) and the second input matrix (B).
  • step S630 at least pmn worker nodes (200) each generate different first sub-matrices and second sub-matrices forming part of the first input matrix ( AT ) and the second input matrix ( B ) based on the first partitioning function ( A ) and the second partitioning function (B) received through the network, respectively, into the first encoding matrix ( ) and the second encoding matrix ( ) is created.
  • each worker node (200) performs the coefficients of the first sub-matrix and the second sub-matrix, which are each responsible for the distributed multiplication operation in the first partition function ( A ) and the second partition function ( B ), as complex values ( ) can be converted into.
  • the remainder obtained by dividing the degree applied to the complex value by the above pmn is different for each worker node (200).
  • each worker node (200) generates a first encoding matrix ( ) and the second encoding matrix ( ) is used to perform the encoding matrix multiplication operation ( ) is performed.
  • each worker node (200) sequentially performs the result of the encoding matrix multiplication operation for each set transmission section.
  • the transmission section may be the total transmission time taken to transmit the results of all encoding matrix multiplication operations divided into pmn equal intervals.
  • the result of each encoded matrix multiplication operation performed by each worker node (200) ) are linearly combined while being transmitted over the network. ) can be reached to the master node (100).
  • the result of each encoding matrix multiplication operation ( ) is an equation in which the grid-based complex values according to the partitioning order of the first input matrix (A T ) and the second input matrix (B) are applied as coefficients ( ) can be linearly combined.
  • step S650 the master node (100) receives the results of the operation from each worker node (200). ) is combined to restore the result (C) of the product operation for the first input matrix (A T ) and the second input matrix (B).
  • An embodiment of the present invention may also be implemented in the form of a non-transitory storage medium containing computer-executable instructions, such as program modules executed by a computer.
  • the computer-readable medium may be any available medium that can be accessed by a computer, and includes both volatile and nonvolatile media, removable and non-removable media.
  • the computer-readable medium may include all computer storage media.
  • the computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Operations Research (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Un procédé distribué de multiplication de matrices, d'un système informatique distribué comprenant un nœud maître et une pluralité de nœuds exécutants, selon un mode de réalisation de la présente invention, comprend : une étape lors de laquelle le nœud maître génère une première fonction de partition et une seconde fonction de partition sur la base d'une algèbre de groupe, selon un ordre de partition (pm) d'une première matrice d'entrée et un ordre de partition (pn) d'une seconde matrice d'entrée, et diffuse les fonctions de partition par l'intermédiaire d'un réseau doté d'une structure de canal à accès multiple ; une étape lors de laquelle chacun parmi au moins pmn nœuds exécutants code des premières sous-matrices et des secondes sous-matrices différentes qui font partie de la première matrice d'entrée et de la seconde matrice d'entrée, respectivement, en se basant sur la première fonction de partition et la seconde fonction de partition reçues par l'intermédiaire du réseau, pour générer des premières matrices codées et des secondes matrices codées ; une étape lors de laquelle chaque nœud exécutant effectue une multiplication de matrices codées en utilisant la première matrice codée et la seconde matrice codée générées par le nœud exécutant ; et une étape lors de laquelle le nœud maître agrège les résultats de la multiplication provenant de chacun des nœuds exécutants pour reconstituer le résultat de la multiplication de la première matrice d'entrée et de la seconde matrice d'entrée.
PCT/KR2024/096558 2023-11-15 2024-11-14 Procédé distribué de multiplication de matrices implanté dans une structure de réseau de système informatique distribué Pending WO2025105928A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2023-0157780 2023-11-15
KR20230157780 2023-11-15
KR1020240120522A KR20250071824A (ko) 2023-11-15 2024-09-05 분산 컴퓨팅 시스템의 네트워크 구조에 기인한 분산 행렬 곱 연산 방법
KR10-2024-0120522 2024-09-05

Publications (1)

Publication Number Publication Date
WO2025105928A1 true WO2025105928A1 (fr) 2025-05-22

Family

ID=95743042

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2024/096558 Pending WO2025105928A1 (fr) 2023-11-15 2024-11-14 Procédé distribué de multiplication de matrices implanté dans une structure de réseau de système informatique distribué

Country Status (1)

Country Link
WO (1) WO2025105928A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110167425A1 (en) * 2005-12-12 2011-07-07 The Mathworks, Inc. Instrument-based distributed computing systems
WO2015004421A1 (fr) * 2013-07-08 2015-01-15 Qatar Foundation Procédé destiné à effectuer une opération matricielle dans un système de traitement réparti
US20200057652A1 (en) * 2017-08-31 2020-02-20 Cambricon Technologies Corporation Limited Processing device and related products
KR20230051840A (ko) * 2021-10-12 2023-04-19 서울대학교산학협력단 분산 노드 간 능력 제한에 따른 함수 할당 및 부호화 전송을 위한 컴퓨팅 시스템 및 그의 방법

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110167425A1 (en) * 2005-12-12 2011-07-07 The Mathworks, Inc. Instrument-based distributed computing systems
WO2015004421A1 (fr) * 2013-07-08 2015-01-15 Qatar Foundation Procédé destiné à effectuer une opération matricielle dans un système de traitement réparti
US20200057652A1 (en) * 2017-08-31 2020-02-20 Cambricon Technologies Corporation Limited Processing device and related products
KR20230051840A (ko) * 2021-10-12 2023-04-19 서울대학교산학협력단 분산 노드 간 능력 제한에 따른 함수 할당 및 부호화 전송을 위한 컴퓨팅 시스템 및 그의 방법

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HONG, SANGWOO ET AL.: "Squeezed Polynomial Codes: Communication-Efficient Coded Computation in Straggler-Exploiting Distributed Matrix Multiplication", IEEE ACCESS, vol. 8, 16 October 2020 (2020-10-16), pages 190516 - 190528, XP011816962, DOI: 10.1109/ACCESS.2020.3031590 *
SON, KYUNGRAK ET AL.: "Coded Matrix Computation in Wireless Network", IEEE TRANSACTIONS ON WIRELESS COMMUNICATION S, vol. 23, no. 6, 15 November 2023 (2023-11-15), pages 6394 - 6410, XP011972446, DOI: 10.1109/TWC.2023.3331263 *

Similar Documents

Publication Publication Date Title
WO2017171521A1 (fr) Procédé et équipement pour la transmission de signal de synchronisation et de canal de diffusion de liaison latérale physique (psbch) en communication v2x
WO2021025538A1 (fr) Procédé et appareil de configuration des paramètres csi dans des systèmes de communication sans fil
EP4292372A1 (fr) Procédé et appareil pour transmettre un canal de liaison montante dans un système de communication sans fil
WO2011099756A2 (fr) Trame de rétroaction unifiée pour la prise en charge d'une pluralité de modes de rétroaction, et système de communication à entrées multiples et à sorties multiples (mimo) utilisant la trame de rétroaction unifiée
WO2012093904A2 (fr) Procédé et dispositif d'alignement d'interférence dans un réseau de téléphonie mobile
WO2021141432A1 (fr) Procédé et appareil de suivi de paramètres pour l'estimation d'information d'état de canal
WO2015111767A1 (fr) Procédé de conformation mixte de faisceaux basé sur des informations statistiques de canal, et appareils l'effectuant
EP3365982A1 (fr) Livre de codes de pré-codeur pour des systèmes de communication sans fil évolués
WO2022075724A1 (fr) Procédé et dispositif pour fournir un signal de référence de gestion d'interférence à distance, et support de stockage
WO2023128682A1 (fr) Dispositif et procédé pour une estimation de canal d'un système de communication sans fil à l'aide d'une ris
WO2023080653A1 (fr) Procédé et appareil de gestion d'une compression de rétroaction d'informations d'état de canal (csi) dans un réseau sans fil
WO2021010512A1 (fr) Procédé et appareil permettant d'effectuer un codage sur la base d'une matrice de contrôle de parité d'un code de contrôle de parité à faible densité généré à partir d'un protographe dans un système de communication sans fil
WO2025105928A1 (fr) Procédé distribué de multiplication de matrices implanté dans une structure de réseau de système informatique distribué
WO2014065586A1 (fr) Source, relais et destination exécutant une transmission en coopération, et leur procédé de commande
WO2017069580A9 (fr) Livre de codes de pré-codeur pour des systèmes de communication sans fil évolués
WO2021230721A1 (fr) Dispositif et procédé de gestion de faisceau dans un système de communication
WO2022220623A1 (fr) Appareil et procédé pour réaliser une communication sur la base d'une erreur d'alignement temporel (tae) dans un système de communication sans fil
WO2021256882A1 (fr) Procédé et appareil de réalisation d'une régulation de puissance de liaison montante et d'une transmission sur canal de liaison montante
WO2024136262A1 (fr) Procédés et appareil de sélection d'un profil de sécurité dans des systèmes de communication sans fil
WO2023195832A1 (fr) Systèmes et procédés de compression fondée sur une base de coefficients doppler pour un retour de csi
WO2023090502A1 (fr) Procédé et appareil pour calculer un produit matriciel de variance sur la base d'une quantification de trame
WO2016006783A1 (fr) Système de stockage hybride utilisant un protocole p2p et procédé de transmission de données utilisant un tel système
WO2022071642A1 (fr) Procédé et appareil pour la mise en œuvre d'un codage de canal d'ue et d'une station de base dans un système de communication sans fil
WO2018182157A1 (fr) Procédé et dispositif de décomposition qr triée
WO2025263851A1 (fr) Nœud et terminal dans un système de communication sans fil, et procédé mis en œuvre par ceux-ci

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24891874

Country of ref document: EP

Kind code of ref document: A1