
CN120011298A - An Extended Topology System of Multi-processor Interconnection - Google Patents

An Extended Topology System of Multi-processor Interconnection

Info

Publication number
CN120011298A
Authority
CN
China
Prior art keywords
processors
group
processor
interconnection
address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411626735.5A
Other languages
Chinese (zh)
Inventor
李兆石
丛高建
付轩
刘晓青
魏莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Muxi Integrated Circuit Shanghai Co ltd
Original Assignee
Muxi Integrated Circuit Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Muxi Integrated Circuit Shanghai Co ltd
Publication of CN120011298A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356 Indirect interconnection networks
    • G06F15/17368 Indirect interconnection networks non hierarchical topologies
    • G06F15/17381 Two dimensional, e.g. mesh, torus
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17337 Direct connection machines, e.g. completely connected computers, point to point communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Multi Processors (AREA)

Abstract

The invention relates to the field of chip design, and in particular to an extended topology system for multi-processor interconnection. The system comprises N groups of processors, each group comprising M processors, with the processors positioned identically within every group. Each group has two layers of interconnection structure: an intra-group interconnection structure, in which the M processors are fully connected point to point, and an inter-group interconnection structure comprising M ring connection structures, where the processors connected in a given ring occupy the same position in their respective groups. The resulting extended topology system increases the number of interconnected processors while still conforming to the original protocol.

Description

Extended topology system for multiprocessor interconnection
Technical Field
The invention relates to the field of chip design, in particular to an extended topology system for multiprocessor interconnection.
Background
Training and inference of large models require tensor parallelism, pipeline parallelism and data parallelism in the high-bandwidth domain, and only pipeline parallelism and data parallelism in the low-bandwidth domain. The high-bandwidth domain is realized by a GPU manufacturer interconnecting a plurality of GPUs through a high-speed interconnection protocol, while the low-bandwidth domain can be realized with Ethernet. Large models depend heavily on tensor parallelism, which is confined to the high-bandwidth domain, so the number of GPUs that domain supports limits how far tensor parallelism can scale. Therefore, the more GPUs the high-bandwidth domain supports, the better it can support training and inference of large models.
At present, GPUs are interconnected through the OAM protocol. Because the route between any two GPUs is fixed in the OAM protocol and cannot be changed, an OAM-based GPU interconnection topology supports at most 8 GPUs. Each GPU reserves 8 interconnection ports, and any two of the 8 GPUs can be directly interconnected point to point, but the limitation of the OAM protocol prevents the topology from being extended beyond 8 GPUs. An interconnection topology that supports more than 8 GPUs is therefore needed.
Disclosure of Invention
To address the above technical problem, the invention adopts the following technical solution. An extended topology system for multiprocessor interconnection comprises N groups of processors, each group comprising M processors, with the processors positioned identically within every group. Each group has two layers of interconnection structure: an intra-group interconnection structure, in which the M processors are fully connected point to point, and an inter-group interconnection structure comprising M ring connection structures, where the processors connected in a given ring occupy the same position in their respective groups.
The invention has at least the following beneficial effects:
The extended topology system for multiprocessor interconnection provided by the invention comprises N groups of processors, each group comprising M processors positioned identically within every group, and each group has two layers of interconnection structure, namely an intra-group interconnection structure and an inter-group interconnection structure. The resulting extended topology system thereby increases the number of interconnected processors while conforming to the original protocol.
Drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of the topology in which the OAM protocol supports interconnection of at most 8 processors;
FIG. 2 is a schematic diagram of an extended topology system according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of an extended topology system according to a second embodiment of the present invention;
FIG. 4 is a schematic diagram of an extended topology system according to a third embodiment of the present invention;
FIG. 5 is a schematic diagram of an extended topology system according to a fourth embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
Referring to FIG. 1, FIG. 1 is a schematic diagram of the topology in which the OAM protocol supports interconnection of at most 8 processors. In FIG. 1, the 8 processors OAM0-OAM7 can all be directly interconnected, that is, any two processors are interconnected point to point. The interconnection protocol is the OAM protocol. In this protocol, once the addresses of the source processor and the destination processor are determined, the route between them is also uniquely determined, because the route is fixed in hardware and cannot be changed. The route is {source processor address, source processor port index number, destination processor address, destination processor port index number}. Training and inference of large models do not require point-to-point full interconnection between processors; they only need to support collective communication modes such as allreduce and alltoall. In allreduce mode, data is collected from every graphics card and aggregated, and the aggregated result is then distributed to every graphics card. In alltoall mode, each graphics card distributes its data to every graphics card and collects data from every graphics card. Processors may therefore be interconnected point to point, or interconnected through forwarding by other processors. The problem to be solved is thus how to extend the number of interconnected processors while conforming to the original protocol.
Note that the topology in FIG. 1 is hereinafter referred to as the original topology and is not described again below.
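The two collective modes named above can be illustrated with a minimal Python sketch that models each graphics card as a plain list. The function names, the use of summation as the aggregation step, and the chunk layout are assumptions made for illustration; the description does not specify them.

```python
from typing import List


def allreduce(per_card: List[List[float]]) -> List[List[float]]:
    """Aggregate (here: element-wise sum, an assumption) the data of all cards,
    then hand every card a copy of the aggregated result."""
    total = [sum(values) for values in zip(*per_card)]
    return [list(total) for _ in per_card]


def alltoall(per_card: List[List[str]]) -> List[List[str]]:
    """Card `src` sends its chunk number `dst` to card `dst`;
    card `dst` ends up holding chunk `dst` from every card."""
    n = len(per_card)
    return [[per_card[src][dst] for src in range(n)] for dst in range(n)]


if __name__ == "__main__":
    print(allreduce([[1.0, 2.0], [3.0, 4.0]]))
    print(alltoall([["a0", "a1"], ["b0", "b1"]]))
```

Neither function needs a direct link between every pair of cards in principle, which is why forwarding through intermediate processors is acceptable for these patterns.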
The invention provides an extended topology system for multiprocessor interconnection, which comprises N groups of processors. Each group comprises M processors, the processors are positioned identically within every group, and each group has two layers of interconnection structure, namely an intra-group interconnection structure and an inter-group interconnection structure. In the intra-group interconnection structure, the M processors are fully connected point to point. The inter-group interconnection structure comprises M ring connection structures, and the processors connected in a given ring connection structure occupy the same position in their respective groups.
Optionally, the processor is a GPU or a GPGPU. Other processors in the prior art are also within the scope of the present invention.
Optionally, N is greater than 8, and N is a multiple of 2. Preferably, N is a multiple of 8.
Optionally, M is greater than 1. Preferably, M is equal to 4.
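As a rough illustration of the two-layer structure, the following Python sketch builds an adjacency map for N groups of M processors. The `(group, position)` node labels, the function name `build_topology`, and the choice to close each ring in group-index order are assumptions for illustration only; the embodiments use their own specific processor orderings.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Node = Tuple[int, int]  # (group index, position within the group)


def build_topology(n_groups: int, m_per_group: int) -> Dict[Node, Set[Node]]:
    adj: Dict[Node, Set[Node]] = {
        (g, p): set() for g in range(n_groups) for p in range(m_per_group)
    }
    # Intra-group layer: point-to-point full connection among the M processors.
    for g in range(n_groups):
        for a, b in combinations(range(m_per_group), 2):
            adj[(g, a)].add((g, b))
            adj[(g, b)].add((g, a))
    # Inter-group layer: one ring per position, linking that position across groups.
    for p in range(m_per_group):
        for g in range(n_groups):
            nxt = ((g + 1) % n_groups, p)
            if nxt != (g, p):  # a single group has no ring neighbour
                adj[(g, p)].add(nxt)
                adj[nxt].add((g, p))
    return adj


if __name__ == "__main__":
    topo = build_topology(4, 4)
    print(len(topo), {len(peers) for peers in topo.values()})
```

With three or more groups, each processor in this model has M-1 intra-group neighbours plus 2 ring neighbours, so M = 4 gives 5 neighbours per processor regardless of how many groups are added.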
As a preferred embodiment, the intra-group interconnect structure of each group of processors is a full interconnect structure in the original topology.
As a preferred embodiment, the processors of each group are internally fully interconnected by the same M-1 ports.
As a preferred embodiment, a software address remapping table is searched to obtain the remapped address of each processor address, and the route corresponding to the remapped address is then obtained. The software address remapping table contains each processor address and its remapped address. When the processor address is not greater than 7, i.e., any one of S0-S7, the remapped address is the processor address itself. When the processor address is greater than 7, the remapped address is the processor address taken modulo 8, so that it falls within S0-S7. The table is searched separately with the address of the source processor and the address of the destination processor to obtain the remapped address of each. All processor addresses of the N groups of processors are thus remapped into S0-S7 through the software address remapping table, so that the processor addresses used during data transmission conform to the interconnection routes fixed in hardware. The number of interconnected processors is thereby extended while conforming to the original protocol.
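A minimal sketch of such a lookup table in Python follows. The modulus of 8 is an assumption taken from the stated target range S0-S7, and the names `ORIGINAL_SIZE`, `build_remap_table` and `remap_pair` are hypothetical.

```python
from typing import Dict, Tuple

ORIGINAL_SIZE = 8  # assumed modulus: the original topology has addresses S0-S7


def build_remap_table(num_processors: int) -> Dict[int, int]:
    """Map every processor address to its remapped address in S0-S7.
    Addresses 0..7 map to themselves; larger ones wrap modulo 8."""
    return {addr: addr % ORIGINAL_SIZE for addr in range(num_processors)}


def remap_pair(table: Dict[int, int], src: int, dst: int) -> Tuple[int, int]:
    """Look up source and destination separately, as the description states."""
    return table[src], table[dst]


if __name__ == "__main__":
    table = build_remap_table(16)
    print(remap_pair(table, 0x0, 0xF))
```

For 16 processors this gives the same result as subtracting 8 from addresses S8-SF, and the same table construction extends to 32 processors without change.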
Referring to FIG. 2, a first type of interconnection topology is shown. The topology system of this type includes 4 groups of processors, from a zeroth processor S0 to a fifteenth processor SF, 16 processors in total. The first group includes the zeroth processor S0, the third processor S3, the fourth processor S4 and the seventh processor S7; the second group includes the first processor S1, the second processor S2, the fifth processor S5 and the sixth processor S6; the third group includes the tenth processor SA, the ninth processor S9, the fourteenth processor SE and the thirteenth processor SD; and the fourth group includes the fifteenth processor SF, the eleventh processor SB, the twelfth processor SC and the eighth processor S8. Each processor comprises 7 ports. Each port within a processor has a unique index number, and the index numbers of different processors are independent; that is, the ports of every processor are numbered from a first port index number link1 to a seventh port index number link7. The 16 processors in FIG. 2 occupy 5 ports in total in the interconnection topology. The same three ports are used inside each group to realize full interconnection; in FIG. 2, intra-group full interconnection is realized through link1, link2 and link3. The inter-group interconnection forms a ring structure by hopping between the remaining ports of the interconnected processors.
S0, S5, SF and SA occupy the same position in their respective groups and are interconnected as follows: link7 of S0 is connected to link7 of S5, link4 of S5 is connected to link4 of SF, link7 of SF is connected to link7 of SA, and link6 of SA is connected to link6 of S0, closing the ring end to end. Similarly, S4, S1, SB and SE occupy the same position in their respective groups and are interconnected as follows: link7 of S4 is connected to link7 of S1, link4 of S1 is connected to link4 of SB, link7 of SB is connected to link7 of SE, and link6 of SE is connected to link6 of S4, closing the ring end to end. Similarly, S3, S6, SC and S9 occupy the same position in their respective groups and are interconnected as follows: link7 of S3 is connected to link7 of S6, link6 of S6 is connected to link6 of SC, link7 of SC is connected to link7 of S9, and link4 of S9 is connected to link4 of S3, closing the ring end to end. Similarly, S7, S2, S8 and SD occupy the same position in their respective groups and are interconnected as follows: link7 of S7 is connected to link7 of S2, link6 of S2 is connected to link6 of S8, link7 of S8 is connected to link7 of SD, and link4 of SD is connected to link4 of S7, closing the ring end to end.
On the basis of FIG. 2, the 16 processor addresses S0-SF are remapped to the 8 processor addresses S0-S7 by means of the software address remapping table. The interconnection route between any two processors in FIG. 2 is then the same as for the original 8 processors in FIG. 1. That is, the set of interconnected processors is extended without changing the routes fixed in hardware.
Note that in FIG. 2, the port index numbers forming the ring-shaped inter-group interconnection structure are link7, link6 and link4, arranged around the ring in the order link7, link6, link7, link4 or link7, link4, link7, link6. An equivalent implementation may instead use the order link6, link7, link6, link4 or link6, link4, link6, link7. Any one port index number in the ring structure may also be replaced with link5; for example, if link5 is used in place of link7, the port index numbers forming the ring are link5, link6 and link4.
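The wiring of the first embodiment can be sanity-checked with a short Python sketch. Each ring is written as its four processors in ring order plus the link index used on each hop, assuming every hop uses the same index on both ends and that the last processor connects back to the first. The constant and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Ring = Tuple[List[str], List[int]]

# Hop i joins members[i] and members[(i + 1) % 4] using links[i] on both ends.
FIG2_RINGS: List[Ring] = [
    (["S0", "S5", "SF", "SA"], [7, 4, 7, 6]),
    (["S4", "S1", "SB", "SE"], [7, 4, 7, 6]),
    (["S3", "S6", "SC", "S9"], [7, 6, 7, 4]),
    (["S7", "S2", "S8", "SD"], [7, 6, 7, 4]),
]
FIG2_GROUPS: List[Set[str]] = [
    {"S0", "S3", "S4", "S7"},
    {"S1", "S2", "S5", "S6"},
    {"SA", "S9", "SE", "SD"},
    {"SF", "SB", "SC", "S8"},
]
INTRA_GROUP_LINKS = {1, 2, 3}  # link1-link3 carry the intra-group full mesh


def ring_ports(rings: List[Ring]) -> Dict[str, List[int]]:
    """Collect, per processor, the link indices consumed by ring hops."""
    used: Dict[str, List[int]] = defaultdict(list)
    for members, links in rings:
        for i, link in enumerate(links):
            used[members[i]].append(link)
            used[members[(i + 1) % len(members)]].append(link)
    return dict(used)


def wiring_is_consistent(rings: List[Ring], groups: List[Set[str]]) -> bool:
    used = ring_ports(rings)
    for links in used.values():
        # Two distinct ring ports per processor, none reused from the intra-group mesh.
        if len(links) != 2 or len(set(links)) != 2 or set(links) & INTRA_GROUP_LINKS:
            return False
    # Every ring must take exactly one processor from every group.
    for members, _ in rings:
        if any(len(group & set(members)) != 1 for group in groups):
            return False
    return len(used) == sum(len(group) for group in groups)


if __name__ == "__main__":
    print(wiring_is_consistent(FIG2_RINGS, FIG2_GROUPS))
```

A check of this kind flags any port that would be wired to two different peers, which is easy to miss when the connections are listed only in prose.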
Referring to FIG. 3, a second type of interconnection topology is shown, again comprising 4 groups of processors. The 16 processors in FIG. 3 occupy a total of 5 ports in the interconnection topology. The processors within each group are fully interconnected. Inter-group interconnection is achieved by connecting the processors that occupy the same position in each group through their remaining ports to form a ring structure. The 4 groups of processors are S0-S3, S4-S7, S8-SB and SC-SF. Within each group, full interconnection is realized through link3, link4, link5 and link6. The inter-group interconnection forms a ring structure by hopping between the remaining ports of the interconnected processors. S0, S5, SB and SE, which occupy the same position in their respective groups, are interconnected as follows: link7 of S0 is connected to link7 of S5, link6 of S5 is connected to link6 of SB, link7 of SB is connected to link7 of SE, and link5 of SE is connected to link5 of S0, closing the ring end to end. Similarly, S1, S4, SA and SF are interconnected as follows: link7 of S1 is connected to link7 of S4, link4 of S4 is connected to link4 of SA, link7 of SA is connected to link7 of SF, and link5 of SF is connected to link5 of S1, closing the ring end to end.
Similarly, S3, S6, S8 and SD, which occupy the same position in their respective groups, are interconnected as follows: link7 of S3 is connected to link7 of S6, link5 of S6 is connected to link5 of S8, link7 of S8 is connected to link7 of SD, and link6 of SD is connected to link6 of S3, closing the ring end to end. Similarly, S2, S7, S9 and SC are interconnected as follows: link7 of S2 is connected to link7 of S7, link5 of S7 is connected to link5 of S9, link7 of S9 is connected to link7 of SC, and link4 of SC is connected to link4 of S2, closing the ring end to end.
On the basis of FIG. 3, the 16 processor addresses S0-SF are remapped to the 8 processor addresses S0-S7 through the software address remapping table, so that the set of interconnected processors is extended without changing the routes fixed in hardware.
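Because processors in this topology may reach one another through forwarding, it can be useful to see how many hops separate them. The following Python sketch builds the second embodiment as an undirected graph from the groups and rings listed above and runs a breadth-first search. It ignores port indices, and the names `GROUPS`, `RINGS`, `build_adjacency` and `hops_from` are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Set

GROUPS: List[List[str]] = [
    ["S0", "S1", "S2", "S3"],
    ["S4", "S5", "S6", "S7"],
    ["S8", "S9", "SA", "SB"],
    ["SC", "SD", "SE", "SF"],
]
RINGS: List[List[str]] = [
    ["S0", "S5", "SB", "SE"],
    ["S1", "S4", "SA", "SF"],
    ["S3", "S6", "S8", "SD"],
    ["S2", "S7", "S9", "SC"],
]


def build_adjacency() -> Dict[str, Set[str]]:
    adj: Dict[str, Set[str]] = {p: set() for group in GROUPS for p in group}
    for group in GROUPS:  # intra-group full interconnection
        for a, b in combinations(group, 2):
            adj[a].add(b)
            adj[b].add(a)
    for ring in RINGS:  # inter-group rings, closed end to end
        for i, a in enumerate(ring):
            b = ring[(i + 1) % len(ring)]
            adj[a].add(b)
            adj[b].add(a)
    return adj


def hops_from(adj: Dict[str, Set[str]], src: str) -> Dict[str, int]:
    """Breadth-first search: minimum number of links from src to every processor."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


if __name__ == "__main__":
    distances = hops_from(build_adjacency(), "S0")
    for name in sorted(distances):
        print(name, distances[name])
```

In this model every processor has five neighbours (three in its own group and two on its ring), and all 16 processors are reachable from S0.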
Referring to FIG. 4, a third type of interconnection topology is shown, again comprising 4 groups of processors. The interconnection topology of the 16 processors in FIG. 4 occupies a total of 6 ports. The processors within each group are fully interconnected. Inter-group interconnection is achieved by connecting the processors that occupy the same position in each group through their remaining ports to form a ring structure. The 4 groups of processors are {S0, S2, S5, S7}, {S1, S3, S4, S6}, {S8, SA, SD, SF} and {S9, SB, SC, SE}. Within each group, full interconnection is realized through link2, link6, link7 and link4. The inter-group interconnection is formed by interconnecting the remaining ports of the processors into ring structures. S0, S1, SD and SC, which occupy the same position in their respective groups, are interconnected as follows: link4 of S0 is connected to link6 of S1, link1 of S1 is connected to link1 of SD, link5 of SD is connected to link5 of SC, and link1 of SC is connected to link1 of S0, closing the ring end to end. Similarly, S2, S6, SF and SB are interconnected as follows: link1 of S2 is connected to link1 of S6, link4 of S6 is connected to link6 of SF, link1 of SF is connected to link1 of SB, and link5 of SB is connected to link5 of S2, closing the ring end to end.
Similarly, S7, S3, SA and SE, which occupy the same position in their respective groups, are interconnected as follows: link1 of S7 is connected to link1 of S3, link5 of S3 is connected to link5 of SA, link1 of SA is connected to link1 of SE, and link4 of SE is connected to link6 of S7, closing the ring end to end. Similarly, S5, S4, S8 and S9 are interconnected as follows: link5 of S5 is connected to link5 of S4, link1 of S4 is connected to link1 of S8, link4 of S8 is connected to link6 of S9, and link1 of S9 is connected to link1 of S5, closing the ring end to end.
The extended topology system of the 16 processors in FIG. 4 likewise remaps the addresses S0-SF into S0-S7 through address remapping, so that the set of interconnected processors is extended without changing the routes fixed in hardware.
Equivalent topology systems of the 16-processor extended topology systems in FIG. 2, FIG. 3 and FIG. 4 also include the interconnection structures obtained by exchanging the positional relationships among the groups of processors. Any extended topology system that changes the port index numbers while keeping the final result identical to the hardware-fixed routes determined by the OAM protocol falls within the protection scope of the invention.
Referring to FIG. 5, a fourth type of interconnection topology is shown. It further extends the interconnection topology of FIG. 3 to interconnect 32 processors, which is equivalent to interconnecting 4 sets of the original interconnection topology. The 32-processor interconnection comprises 8 groups; the processors inside each group are fully interconnected, and the processors at the same position in each group are interconnected in sequence to form a ring-shaped interconnection structure. The address remapping uses the same software address remapping table as FIG. 3.
As a preferred embodiment, the extended topology systems in FIG. 2 and FIG. 4 can be extended again in the manner of the extended topology system in FIG. 5. The topologies shown in FIG. 2, FIG. 3 and FIG. 4 can be extended multiple times by fully interconnecting the processors within each group and interconnecting the processors at the same position across groups into a ring structure, thereby realizing the interconnection of 8*N processors.
As a preferred embodiment, hardware address remapping may be added in the hardware path between the source processor and the destination processor: either the source side performs the remapping so that the request conforms to the rule checked by the destination processor, or the remapping is performed on the destination side. For example, when S4 expects to communicate over link1 of S0 but is actually connected to link7 of S0, link7 can be remapped to link1 through address remapping.
As a preferred embodiment, when 16 processors are interconnected, each of S0-S7 is remapped to itself, and each of S8-SF is remapped to the current processor address minus 8.
In summary, the extended topology system for multiprocessor interconnection provided by the invention comprises N groups of processors, each group comprising M processors positioned identically within every group, and each group has two layers of interconnection structure, namely an intra-group interconnection structure and an inter-group interconnection structure. The resulting extended topology system thereby increases the number of interconnected processors while conforming to the original protocol.
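The link remapping in the S4/S0 example can be modelled in software as a small lookup, shown below as a Python sketch. The description places this remapping in hardware; the table layout, the name `LINK_REMAP` and the function `expected_link` are purely hypothetical illustrations of the idea.

```python
from typing import Dict, Tuple

# Hypothetical table: (destination processor, link actually wired)
# -> link index that the fixed route expects.
LINK_REMAP: Dict[Tuple[str, str], str] = {
    ("S0", "link7"): "link1",
}


def expected_link(dst: str, actual_link: str) -> str:
    """Return the link index the fixed route expects; unlisted links pass through."""
    return LINK_REMAP.get((dst, actual_link), actual_link)


if __name__ == "__main__":
    print(expected_link("S0", "link7"))
    print(expected_link("S0", "link2"))
```

Only links whose physical wiring differs from the fixed route need an entry; all others are left unchanged.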
While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (8)

1. An extended topology system for interconnecting multiple processors, characterized in that the extended topology system comprises N groups of processors; each group of processors includes M processors; the positions of the processors in each group of processors are distributed in the same way; each group of processors includes two layers of interconnection structures: an intra-group interconnection structure and an inter-group interconnection structure; wherein the M processors in the intra-group interconnection structure are point-to-point fully connected; and the inter-group interconnection structure includes M ring connection structures, and each processor connected in the ring connection structure has the same distribution position in each group of processors.
2. The system according to claim 1, characterized in that the intra-group interconnection structure of each group of processors is a full interconnection structure in the original topology structure.
3. The system according to claim 1, characterized in that each group of processors is fully interconnected internally through the same M-1 ports.
4. The system according to claim 3, characterized in that the interconnection between groups forms a ring structure by jumping between the remaining ports of the interconnected processors.
5. The structure according to claim 1, characterized in that M is equal to 4.
6. The system according to claim 1, characterized in that the processor is a GPU or a GPGPU.
7. The system according to claim 1, characterized in that the system further comprises: searching a software address remapping table to obtain a remapped address of each processor address, and obtaining a route corresponding to the remapped address.
8. The system according to claim 7, characterized in that when the processor address is greater than 7, the remapped address of the processor address is obtained by applying a modulo operation to the processor address.
CN202411626735.5A 2024-01-04 2024-11-14 An Extended Topology System of Multi-processor Interconnection Pending CN120011298A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202410015931 2024-01-04
CN2024100159312 2024-01-04

Publications (1)

Publication Number Publication Date
CN120011298A CN120011298A (en) 2025-05-16

Family

ID=95669012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411626735.5A Pending CN120011298A (en) 2024-01-04 2024-11-14 An Extended Topology System of Multi-processor Interconnection

Country Status (2)

Country Link
CN (1) CN120011298A (en)
WO (1) WO2025113544A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5859983A (en) * 1996-07-01 1999-01-12 Sun Microsystems, Inc Non-hypercube interconnection subsystem having a subset of nodes interconnected using polygonal topology and other nodes connect to the nodes in the subset
US20030225909A1 (en) * 2002-05-28 2003-12-04 Newisys, Inc. Address space management in systems having multiple multi-processor clusters
CN112416850A (en) * 2020-11-20 2021-02-26 新华三云计算技术有限公司 A multi-processor interconnected system and its communication method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10728091B2 (en) * 2018-04-04 2020-07-28 EMC IP Holding Company LLC Topology-aware provisioning of hardware accelerator resources in a distributed environment
US11720521B2 (en) * 2021-03-29 2023-08-08 Alibaba Singapore Holding Private Limited Topologies and algorithms for multi-processing unit interconnected accelerator systems
CN114968902B (en) * 2022-07-28 2022-10-25 沐曦科技(成都)有限公司 Multiprocessor interconnection system
CN115168279A (en) * 2022-07-29 2022-10-11 浪潮商用机器有限公司 Multiprocessor node interconnection system and server

Also Published As

Publication number Publication date
WO2025113544A1 (en) 2025-06-05

Similar Documents

Publication Publication Date Title
US5721819A (en) Programmable, distributed network routing
US7281055B2 (en) Routing mechanisms in systems having multiple multi-processor clusters
US8601423B1 (en) Asymmetric mesh NoC topologies
US7155525B2 (en) Transaction management in systems having multiple multi-processor clusters
JP4734539B2 (en) System and method for searching for the shortest path between nodes in a network
JP4676463B2 (en) Parallel computer system
US9514092B2 (en) Network topology for a scalable multiprocessor system
US7251698B2 (en) Address space management in systems having multiple multi-processor clusters
US20090016355A1 (en) Communication network initialization using graph isomorphism
JPH077382B2 (en) Interconnection network, computer system and method for parallel processing
CN115002584A (en) Reconfigurable computing platform using optical network with one-to-many optical switch
Yeh et al. Macro-star networks: efficient low-degree alternatives to star graphs
BR112021007538A2 (en) reconfigurable computing pods using optical networking
CN115380271A (en) A topology-aware multi-stage approach for cluster communication
CN115336236B (en) Method implemented by first computing node, first computing node and readable medium
US12250145B2 (en) Network-on-chip topology generation
US6584073B1 (en) Network topologies
CN120011298A (en) An Extended Topology System of Multi-processor Interconnection
CN115335804A (en) Avoiding network congestion by halving trunked communication
CN113726879A (en) Hybrid data center network structure VHCN based on VLC link
US20200293478A1 (en) Embedding Rings on a Toroid Computer Network
CN120474971A (en) A data routing system and related method
Ganesan et al. The hyper-deBruijn multiprocessor networks
US20170195211A9 (en) Efficient High-Radix Networks for Large Scale Computer Systems
Farahabady et al. The recursive transpose-connected cycles (RTCC) interconnection network for multiprocessors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination