CN108123878A - A kind of method for routing, device and data transfer equipment - Google Patents
- Publication number
- CN108123878A (application number CN201611086242.2A)
- Authority
- CN
- China
- Prior art keywords
- forwarding device
- data forwarding
- port
- ports
- destination
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/74—Address processing for routing
- H04L45/745—Address table lookup; Address filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/12—Avoiding congestion; Recovering from congestion
- H04L47/122—Avoiding congestion; Recovering from congestion by diverting traffic away from congested entities
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Description
Technical field
The present invention relates to the field of communication technologies, and in particular to a routing method, a routing apparatus, and a data forwarding device.
Background
With the development of network technology, data centers have become the infrastructure for Internet services, distributed parallel computing, and similar workloads. Designing scalable network architectures and efficient routing algorithms for data centers is an active research topic. The industry generally builds data centers on the Clos architecture. With the rapid growth of services such as web applications, big data analytics, and cloud computing, the demand for data center switching capacity keeps increasing. A two-level leaf-spine Clos architecture falls far short of this demand, so Clos architectures with more levels will be needed to build large data centers.
Load-balancing quality in a data center directly affects user experience. The prior art proposes a distributed scheme for scheduling data center traffic, as follows. When a data packet is sent from a source node to a destination node, the identifier of the selected path and the congestion metric of each link on that path are encapsulated in the packet and carried to the destination node. When a packet later needs to be sent from the destination node back to the source node, the path identifier and the congestion state of that path are encapsulated in the packet to inform the source node. The source node thereby learns which of the different paths to the destination node is least congested, and prefers that path when it next has a data packet to send to the destination node.
In this scheme, as the number of Clos levels grows, the number of end-to-end paths grows exponentially, and so does the number of table entries the scheme requires. For example, a three-level Clos architecture needs more than 1 million entries, which existing switches cannot support. Likewise, because the number of end-to-end paths grows exponentially, the time needed to maintain real-time congestion information for all end-to-end paths on a switch also grows exponentially. In a two-level leaf-spine Clos architecture there are 24 end-to-end paths, and the time needed to collect congestion information for all of them is acceptable. In a three-level Clos architecture, however, there are 576 end-to-end paths, and in a four-level Clos architecture there are as many as 13824, so collecting real-time congestion information on every path for traffic scheduling takes a long time. The scheme therefore performs poorly in multi-level Clos architectures; that is, it scales poorly.
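The path counts quoted above (24, 576, 13824) are consistent with each additional Clos level multiplying the number of end-to-end paths by 24. The source states no formula, so the following is only a sketch under that inferred assumption; `FANOUT` and `end_to_end_paths` are hypothetical names used for illustration.

```python
# Assumption: a constant multiplier of 24 per added Clos level, inferred from
# the figures quoted in the text (24, 576, 13824). Not stated by the source.
FANOUT = 24


def end_to_end_paths(clos_levels, fanout=FANOUT):
    """Return the path count if every level beyond the first multiplies it by `fanout`."""
    if 2 > clos_levels:
        raise ValueError("a Clos fabric needs at least 2 levels")
    return fanout ** (clos_levels - 1)


# Print the path count for the three architectures discussed in the text.
for levels in (2, 3, 4):
    print(levels, end_to_end_paths(levels))
```

Under this assumption, a per-path table grows by a factor of 24 with every added level, which is the scaling problem the scheme above runs into.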
Summary of the invention
Embodiments of the present invention provide a routing method, a routing apparatus, and a data forwarding device, to solve the technical problem that routing approaches in the prior art scale poorly.
In a first aspect, an embodiment of the present invention provides a routing method, described from the perspective of each data forwarding device on the transmission path of a data packet. In this method, the data forwarding device obtains a data packet. For example, a source data forwarding device may generate the data packet, whereas a non-source data forwarding device may receive it from an upper-layer data forwarding device. The data forwarding device then determines whether a next-hop egress port for the data packet is matched in a preset routing table, where every egress port in the preset routing table is an uncongested egress port. If no next-hop egress port is matched in the preset routing table, the data forwarding device determines a first port as the next-hop egress port of the data packet toward the destination data forwarding device. The first port is chosen from among the ports of the data forwarding device other than the congested egress ports that the device's congestion information table records for that destination data forwarding device. The data forwarding device then sends the data packet through the first port.
In this embodiment, on the one hand, only congested ports are recorded in the congestion information table, so few entries are required. On the other hand, the data forwarding device uses the congested ports recorded in the congestion information table to select one of the uncongested ports as the next-hop egress port. In other words, the device selects an available next hop for the data packet hop by hop rather than selecting an optimal end-to-end path as in the prior art, so it does not need to wait for usage feedback on all end-to-end paths and incurs no long wait. In summary, the routing method in this embodiment scales well and suits networks with a multi-level Clos architecture.
In a possible design, the data forwarding device further determines, among all of its ports, the ports that are congested toward each destination data forwarding device, and records each destination data forwarding device together with the port numbers of the ports congested toward it in the congestion information table. In this embodiment, each data forwarding device builds and maintains its own congestion information table, which is convenient and easy to implement.
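One way to picture the congestion information table is as a map from a destination device to the set of local port numbers congested toward it. The sketch below is an assumption about layout, not the patent's data structure; the class name `CongestionTable`, its methods, and the device identifiers are hypothetical.

```python
from collections import defaultdict


class CongestionTable:
    """Hypothetical layout: destination device ID -> local ports congested toward it."""

    def __init__(self):
        self._congested = defaultdict(set)

    def mark_congested(self, dest, port):
        # Record that `port` is congested toward `dest`.
        self._congested[dest].add(port)

    def mark_clear(self, dest, port):
        # Remove the record once the port is no longer congested toward `dest`.
        self._congested[dest].discard(port)

    def congested_ports(self, dest):
        # Read-only view; an unknown destination has no congested ports.
        return frozenset(self._congested.get(dest, ()))
```

Because only congested ports are stored, the table stays small while most ports are healthy.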
In a possible design, described from the perspective of a non-source data forwarding device, the method further includes the following. The data forwarding device determines, for each destination data forwarding device, whether all of its ports toward that destination are congested. If all ports from the data forwarding device toward a first destination data forwarding device are congested, the data forwarding device sends first notification information to the upper-layer data forwarding device; the first notification information notifies the upper-layer data forwarding device that the data forwarding device cannot serve as its next hop to the first destination data forwarding device. If not all ports from the data forwarding device toward a second destination data forwarding device are congested, the data forwarding device sends second notification information to the upper-layer data forwarding device; the second notification information notifies the upper-layer data forwarding device that the data forwarding device can serve as its next hop to the second destination data forwarding device. In this way, the data forwarding device informs the upper-layer data forwarding device of its own congestion state, so that the upper-layer data forwarding device can determine which of its ports are congested.
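The decision between the first and second notification can be sketched as a per-destination check for at least one uncongested port. This is a minimal illustration, assuming the device knows which local ports lead toward each destination; the function name and the dictionary shapes are hypothetical, and the message format is not specified by the source.

```python
def next_hop_notifications(ports_toward, congested):
    """Decide, per destination, what to tell the upper-layer device.

    ports_toward: destination -> set of local ports leading toward it (assumed known).
    congested:    destination -> set of those ports currently congested.
    Returns destination -> bool, where False stands for the first notification
    (cannot serve as next hop) and True for the second (can serve as next hop).
    """
    notifications = {}
    for dest, ports in ports_toward.items():
        free_ports = ports - congested.get(dest, set())
        notifications[dest] = bool(free_ports)
    return notifications
```

A destination with no candidate ports at all is treated as unreachable here, which is a design choice of this sketch.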
In a possible design, described from the perspective of a non-destination data forwarding device and of the upper-layer data forwarding device of a non-destination data forwarding device, the data forwarding device determines the ports congested toward each destination data forwarding device as follows. The data forwarding device receives third notification information sent by each lower-layer data forwarding device connected to its ports, where each lower-layer data forwarding device lies on a path to the destination data forwarding devices, and the third notification information indicates whether that lower-layer data forwarding device can serve as the data forwarding device's next hop to each destination data forwarding device. Based on the third notification information, the data forwarding device determines which of its ports are congested toward each destination data forwarding device. In this way, the data forwarding device learns the congestion state from the lower-layer data forwarding devices to the destination data forwarding device, can then determine its own unavailable next-hop egress ports, and can learn the number of congestion points behind each unavailable next-hop egress port.
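On the receiving side, the third notification information can be folded into a per-destination set of congested local ports. The sketch below assumes each notification is a map from destination to a usable flag and that the device knows which neighbor sits behind each port; all names are hypothetical, and counting congestion points is left out.

```python
def congested_ports_from_notifications(neighbor_on_port, notifications):
    """Derive destination -> set of local ports that are congested toward it.

    neighbor_on_port: local port number -> ID of the lower-layer device behind it.
    notifications:    device ID -> {destination: usable flag} (assumed format).
    """
    congested = {}
    for port, neighbor in neighbor_on_port.items():
        for dest, usable in notifications.get(neighbor, {}).items():
            if not usable:
                # The neighbor cannot reach `dest`, so the port toward it is unusable.
                congested.setdefault(dest, set()).add(port)
    return congested
```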
In combination with any of the foregoing possible designs, the method further includes the following. When an uncongested egress port of the data forwarding device becomes a congested egress port, the data forwarding device determines whether the preset routing table contains that congested egress port and, if it does, deletes the entries containing that port from the preset routing table. In this way, the preset routing table is updated according to the congestion state of the ports, so that every egress port in the preset routing table is always an available next-hop egress port.
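The table update described above amounts to removing every entry whose egress port has just become congested. A minimal sketch, assuming the preset routing table is a map from flow hash to egress port; `purge_congested_port` is a hypothetical name.

```python
def purge_congested_port(routing_table, port):
    """Delete all entries that forward through `port`; return the removed flow hashes.

    routing_table: flow hash -> egress port (assumed layout), modified in place.
    """
    stale = [flow for flow, egress in routing_table.items() if egress == port]
    for flow in stale:
        del routing_table[flow]
    return stale
```

Flows whose entries are removed simply miss in the table on their next packet and are assigned a fresh uncongested port.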
In combination with any of the foregoing possible designs, after the first port is determined as the next-hop egress port of the data packet toward the destination data forwarding device, the method further includes the following. The data forwarding device creates a new entry in the preset routing table for the data flow to which the data packet belongs, with the first port as the egress port of the new entry. In this way, subsequent data packets of the same data flow are sent through the first port, which saves forwarding time.
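The effect of installing a per-flow entry can be sketched as a lookup with a fallback that is run only on a miss. The names below are hypothetical, and `pick_first_port` stands in for whatever selection the device performs among its uncongested ports.

```python
def forward_with_learning(routing_table, flow_hash, pick_first_port):
    """Return the egress port for a flow, installing an entry on the first miss.

    routing_table:   flow hash -> egress port (assumed layout), modified in place.
    pick_first_port: callable returning an uncongested port, or None if none exists.
    """
    port = routing_table.get(flow_hash)
    if port is None:
        port = pick_first_port()
        if port is not None:
            # Later packets of the same flow hit this entry directly.
            routing_table[flow_hash] = port
    return port
```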
In a second aspect, an embodiment of the present invention provides a data forwarding device. The data forwarding device includes at least two ports, a transmitter, a receiver, and a processor. The transmitter may perform the sending steps of the routing method of the first aspect. The receiver may perform the receiving or obtaining steps of that method. The processor may perform the establishing, deleting, recording, or determining steps of that method.
In a third aspect, an embodiment of the present invention provides a routing apparatus that includes functional modules for implementing the method of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium that stores program code, where the program code includes instructions for implementing any possible implementation of the method of the first aspect.
Brief description of the drawings
FIG. 1a is a structural diagram of a switching network according to an embodiment of the present invention;
FIG. 1b and FIG. 1c are structural diagrams of a data center according to an embodiment of the present invention;
FIG. 2 is a structural diagram of a data forwarding device according to an embodiment of the present invention;
FIG. 3 is a flowchart of a routing method according to an embodiment of the present invention;
FIG. 4 is a functional block diagram of a routing apparatus according to an embodiment of the present invention.
Detailed description
Embodiments of the present invention provide a routing method, a routing apparatus, and a data forwarding device, to solve the technical problem that routing approaches in the prior art scale poorly.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
The implementation process and purpose of the solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
An embodiment of the present invention provides a routing method that can be applied to a data center with an m-level Clos architecture, where m is an integer greater than or equal to 2. The method can also be applied to the switching network shown in FIG. 1a.
In this method, on top of an existing data center architecture or switching network, the data forwarding device monitors the congestion state of its ports. When the link attached to a port becomes congested, the device removes that port from the set of available next hops toward the destination data forwarding device. If, after the removal, the number of available next hops toward the destination data forwarding device is 0, the device notifies all upper-layer data forwarding devices that the destination data forwarding device cannot be reached through it. Each upper-layer data forwarding device then updates its own available next-hop entries for that destination data forwarding device. During traffic scheduling, the data forwarding device selects an available next hop for each data packet hop by hop, based on the congestion information fed back in real time. The method therefore scales well and suits networks with a multi-level Clos architecture.
Specifically, the switching network shown in FIG. 1a simply illustrates multiple paths from a source switch to a destination switch. The switches on these paths are called intermediate switches, for example intermediate switch 1 to intermediate switch 4. It should be understood that, in practice, the switching network may include more paths, and each path may include more intermediate switches.
Specifically, refer next to FIG. 1b, which is a network structure diagram of a data center with a three-level Clos architecture according to an embodiment of the present invention. In this embodiment, a 3-ary fat-tree topology is used as the example of a three-level Clos architecture. The network is divided into three layers, which are, from top to bottom, the core layer, the aggregation layer, and the edge layer. Switches at the core layer are called core switches, switches at the aggregation layer are called aggregation switches, and switches at the edge layer are called edge switches. Hosts or servers connect directly to edge switches. In practice, the devices in the three layers may be other devices that forward data, such as routers; for ease of description, they are collectively referred to herein as data forwarding devices.
It should be noted that the data forwarding devices at each level are named slightly differently in Clos architectures with different topologies or in different scenarios. For example, in a two-level leaf-spine Clos architecture an edge switch may also be called a leaf switch, and a core switch may also be called a spine switch. As another example, an edge switch may be called an access switch in terms of function, or a top-of-rack (ToR) switch in terms of switch structure. For ease of description, they are collectively referred to herein as data forwarding devices.
Still referring to FIG. 1b, an n-ary fat-tree network includes 2n performance optimization datacenters (Pods), and each data forwarding device has 2n ports. In this embodiment n is 3, so the data center network includes 6 Pods, and each Pod contains 3 edge switches and 3 aggregation switches. The core switches form 3 core groups, core group 0 to core group 2, each with 3 core switches. Each Pod connects n² hosts or servers, that is, 9, and the network supports (2n)³/4 hosts in total, that is, 54. Each switch has 6 ports. The 6 ports of an aggregation switch in a Pod connect to the 3 core switches of one core group and to the 3 edge switches in the same Pod, and each aggregation switch in a Pod connects to a different core group. The 6 ports of each edge switch connect to the 3 aggregation switches in the same Pod and to 3 hosts. For example, as shown in FIG. 1b, aggregation switch 0 in Pod0 connects to the 3 core switches of core group 0 and to the 3 edge switches in Pod0; aggregation switch 1 in Pod0 connects to the 3 core switches of core group 1 and to the 3 edge switches in Pod0; and aggregation switch 2 in Pod0 connects to the 3 core switches of core group 2 and to the 3 edge switches in Pod0. Switches in the other Pods are connected similarly, and the details are not repeated here.
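The sizing rules for the n-ary fat tree can be checked with a few lines of arithmetic. The sketch below uses the formulas given in the text for Pods, ports, hosts per Pod, and total hosts; generalizing the per-Pod switch counts and the core group counts from the n = 3 example to n is an assumption, and the function and key names are hypothetical.

```python
def fat_tree_pod_view(n):
    """Return the sizes of an n-ary fat tree as described for FIG. 1b."""
    return {
        "pods": 2 * n,
        "ports_per_switch": 2 * n,
        # Assumption: the counts below generalize the n = 3 example to any n.
        "edge_switches_per_pod": n,
        "aggregation_switches_per_pod": n,
        "core_groups": n,
        "core_switches_per_group": n,
        "hosts_per_pod": n * n,
        "total_hosts": (2 * n) ** 3 // 4,
    }


sizes = fat_tree_pod_view(3)
print(sizes["pods"], sizes["hosts_per_pod"], sizes["total_hosts"])
```

For n = 3 this reproduces the 6 Pods, 9 hosts per Pod, and 54 hosts stated in the text, and 6 Pods of 9 hosts each also gives 54.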
In FIG. 1b, Pod0 to Pod6 are the Pod numbers; in practice, other identifiers may be used to distinguish Pods. The numbers 0 to 2 at the core layer uniquely identify the core switches within one core group. Similarly, the numbers 0 to 2 at the aggregation layer uniquely identify the aggregation switches within one Pod, and the numbers 0 to 2 at the edge layer uniquely identify the edge switches within one Pod. In practice, other identifiers may also be used to identify the aggregation switches or edge switches within a Pod.
Refer next to FIG. 1c, which presents the three-level Clos architecture of FIG. 1b in another way. In this embodiment n is 3. Each access switch has 2n ports, and there are 2n² access switches, numbered access switch 1 through access switch n+1 and on to access switch 2n². The aggregation switches and core switches form planes; there are n planes in total, plane 1 to plane n. Each plane contains 2n aggregation switches, aggregation switch 1 to aggregation switch 2n, and n core switches, core switch 1 to core switch n. Each aggregation switch has 2n ports: n of them connect to the n core switches of its plane, and the other n connect to n access switches. Each core switch has 2n ports, and each aggregation switch connects to every core switch of its plane. For each access switch, n ports connect to one aggregation switch in each of the n planes, and the other n ports can connect to hosts or servers. The 2n² access switches can therefore connect 2n³ hosts or servers. Mapped onto the structure in FIG. 1b, core switches 0 to 2 of core group 0 and aggregation switch 0 of each Pod form the first plane; core switches 0 to 2 of core group 1 and aggregation switch 1 of each Pod form the second plane; and core switches 0 to 2 of core group 2 and aggregation switch 2 of each Pod form the third plane.
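The plane view gives a second way to count hosts, and it agrees with the Pod view: 2n³ equals (2n)³/4 for every n. The sketch below illustrates that consistency check; the function and key names are hypothetical.

```python
def fat_tree_plane_view(n):
    """Return the sizes of the same fabric as described for FIG. 1c."""
    access_switches = 2 * n * n
    return {
        "planes": n,
        "aggregation_switches_per_plane": 2 * n,
        "core_switches_per_plane": n,
        "access_switches": access_switches,
        # Each access switch offers n host-facing ports.
        "hosts": access_switches * n,
    }


view = fat_tree_plane_view(3)
print(view["access_switches"], view["hosts"])
```

With n = 3 there are 18 access switches and 54 hosts, matching the total derived from the Pod view.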
Some technical terms herein are taken from existing data center network structures to describe the embodiments of the present invention. They may change as networks evolve; for the specific evolution, refer to the descriptions in the corresponding standards.
Refer next to FIG. 2, which is a possible structural diagram of the data forwarding device according to an embodiment of the present invention. The data forwarding device is, for example, one of the aforementioned edge, access, aggregation, or core switches. As shown in FIG. 2, the data forwarding device includes a processor 10, a transmitter 20, a receiver 30, a memory 40, and a port 50. The memory 40, the transmitter 20, the receiver 30, and the processor 10 may be connected through a bus. In practice, they may instead be connected in another structure, such as a star structure; this application imposes no specific limitation.
Optionally, the processor 10 may be a central processing unit or an application-specific integrated circuit (ASIC), may be one or more integrated circuits for controlling program execution, may be a hardware circuit developed using a field-programmable gate array (FPGA), or may be a baseband processor.
Optionally, the processor 10 may include at least one processing core.
Optionally, the memory 40 may include read-only memory (ROM), random access memory (RAM), and disk storage. The memory 40 stores the data that the processor 10 needs at run time. There may be one or more memories 40.
Optionally, there are one or more ports 50, which connect to upper-layer or lower-layer data forwarding devices. If the data forwarding device connects to hosts or servers, as the aforementioned access switch or edge switch does, the port 50 also connects to a host or server.
Optionally, the transmitter 20 and the receiver 30 may be physically independent of each other or integrated together. The transmitter 20 may send data to an adjacent data forwarding device through the port 50, and the receiver 30 may receive data sent by an adjacent data forwarding device through the port 50.
Refer next to FIG. 3, which is a flowchart of the routing method in this embodiment of the present invention. As shown in FIG. 3, the method includes:
Step 101: The data forwarding device obtains a data packet;
Step 102: The data forwarding device determines whether a next-hop egress port for the data packet is matched in the preset routing table, where every egress port in the preset routing table is an uncongested egress port;
Step 103: If no next-hop egress port for the data packet is matched in the preset routing table, the data forwarding device determines a first port as the next-hop egress port of the data packet toward the destination data forwarding device, choosing it from among its ports other than the congested egress ports that its congestion information table records for that destination data forwarding device;
Step 104: The data forwarding device sends the data packet through the first port.
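Steps 102 and 103 can be sketched as a table lookup followed by a choice among the ports that the congestion information table does not list. This section does not say how the first port is picked from the remaining ports, so the hash-modulo choice below is an assumption, as are the function name, the table layouts, and restricting candidates to ports known to lead toward the destination.

```python
def select_egress_port(flow_hash, dest, routing_table, congestion_table, ports_toward):
    """Return the egress port for a packet, or None if every candidate is congested.

    routing_table:    flow hash -> uncongested egress port (preset routing table).
    congestion_table: destination -> set of ports congested toward it.
    ports_toward:     destination -> set of candidate ports (assumed known).
    """
    # Step 102: try to match the flow in the preset routing table.
    port = routing_table.get(flow_hash)
    if port is not None:
        return port
    # Step 103: exclude the ports recorded as congested toward this destination.
    candidates = sorted(ports_toward.get(dest, set()) - congestion_table.get(dest, set()))
    if not candidates:
        return None
    # Assumption: spread flows over the remaining ports by hash modulo.
    return candidates[flow_hash % len(candidates)]
```

Step 104 then transmits the packet on the returned port; how the device behaves when no port is available is outside this sketch.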
需要说明的是,在本发明实施例中,拥塞和不拥塞是相对而言的,拥塞的情况和不拥塞的情况可以由用户根据实际需要进行设置,不同的网络结构拥塞的标准也可以不相同,具体可以根据实际情况进行设置。本文中的关于拥塞和不拥塞的判断阈值或标准仅为举例,并不用于限定本发明。It should be noted that in the embodiment of the present invention, congestion and non-congestion are relative terms, and the situation of congestion and non-congestion can be set by the user according to actual needs, and the congestion standards of different network structures can also be different , which can be set according to the actual situation. The judgment thresholds or standards about congestion and non-congestion herein are just examples, and are not intended to limit the present invention.
在步骤101中,数据转发设备获取数据包可以但不限于有以下两种方式,一种是自己生成数据包,另一种是从上层设备接收的数据包。举例来说,当数据转发设备为源数据转发设备时,数据包可以是数据转发设备生成的。若数据转发设备为非源数据转发设备时,数据转发设备可以从上层数据转发设备接收数据包。In step 101, the data forwarding device may obtain the data packet in but not limited to the following two ways, one is to generate the data packet by itself, and the other is to receive the data packet from the upper layer device. For example, when the data forwarding device is the source data forwarding device, the data packet may be generated by the data forwarding device. If the data forwarding device is a non-source data forwarding device, the data forwarding device may receive data packets from the upper layer data forwarding device.
Whichever way the data packet is obtained, step 102 is performed next: the data forwarding device determines whether a next-hop egress port for the data packet is matched in the preset routing table. Every egress port in the preset routing table is an uncongested egress port, that is, an available egress port.
Optionally, the preset routing table may be updated in real time so that, when step 102 is performed, every egress port in it is an uncongested egress port. This is described in detail later.
Optionally, the preset routing table may be established by a prior-art method, and it may also be updated by a prior-art method. For example, each entry of the preset routing table contains two fields: the first is the hash value of the data packet's five-tuple, and the second is the next-hop egress port number. The five-tuple of a data packet may consist of the source Internet Protocol (IP) address, the source port number, the destination IP address, the destination port number, and the transport layer protocol. Hashing the five-tuple yields its hash value.
Optionally, all data packets of a given data flow share the same five-tuple, so the next-hop egress port in an entry of the preset routing table applies per data flow. Data packets of the same data flow are therefore all matched to the same next-hop egress port.
The above describes scheduling per data flow, so the routing table is a per-flow routing table. In practice, scheduling may instead be performed per flowlet, in which case the preset routing table may be a flowlet routing table. Flowlets are separated by a minimum time interval, and the gap between adjacent data packets within one flowlet is smaller than that minimum time interval. As long as the minimum time interval is greater than the maximum delay difference among the equal-cost paths, different flowlets can be scheduled onto different paths without packet reordering. The flowlet forwarding mechanism is well known to those skilled in the art and is not described further here.
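The flowlet boundary rule described above can be sketched as follows. This is a minimal illustration only: the class name `FlowletTracker`, the per-flow timestamp dictionary, and the use of seconds as the time unit are assumptions made for the example and do not come from the patent.

```python
from typing import Dict


class FlowletTracker:
    """Hypothetical helper: detects when a packet starts a new flowlet."""

    def __init__(self, min_gap: float) -> None:
        # min_gap is the minimum time interval separating two flowlets.
        # It is assumed to exceed the maximum delay difference among
        # the equal-cost paths, as the text requires.
        self.min_gap = min_gap
        self.last_seen: Dict[int, float] = {}

    def starts_new_flowlet(self, flow_key: int, now: float) -> bool:
        """Return True if the packet arriving at `now` opens a new flowlet."""
        previous = self.last_seen.get(flow_key)
        self.last_seen[flow_key] = now
        # Packets inside one flowlet are spaced by less than min_gap.
        return previous is None or (now - previous) >= self.min_gap


tracker = FlowletTracker(min_gap=0.5)
first = tracker.starts_new_flowlet(1, now=0.0)   # first packet of the flow
second = tracker.starts_new_flowlet(1, now=0.1)  # gap 0.1 < 0.5
third = tracker.starts_new_flowlet(1, now=1.0)   # gap 0.9 >= 0.5
```

Only a packet that opens a new flowlet would be eligible for a fresh egress port choice; packets within a flowlet keep the port already selected.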
For example, in step 102 the data forwarding device may compute the hash value of the data packet's five-tuple and use it to look up the preset routing table. If the lookup succeeds, the next-hop egress port associated with the matched hash value is the next-hop egress port of the data packet, and the data forwarding device may send the data packet through that port. If no next-hop egress port for the data packet is matched in the preset routing table, the data forwarding device performs step 103.
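The lookup in step 102 can be sketched as below, assuming a routing table keyed by the five-tuple hash. The patent does not name a hash function; SHA-256 truncated to 32 bits is an arbitrary choice here, and the names `FiveTuple`, `flow_hash`, and `lookup_next_hop` are hypothetical.

```python
import hashlib
from typing import Dict, NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def flow_hash(ft: FiveTuple) -> int:
    """Hash the five-tuple to a 32-bit key (hash choice is an assumption)."""
    raw = "|".join(str(field) for field in ft).encode()
    return int.from_bytes(hashlib.sha256(raw).digest()[:4], "big")


def lookup_next_hop(routing_table: Dict[int, int],
                    ft: FiveTuple) -> Optional[int]:
    """Step 102: return the matched egress port, or None on a miss."""
    return routing_table.get(flow_hash(ft))


flow = FiveTuple("10.0.0.1", 40000, "10.0.1.2", 80, "TCP")
table = {flow_hash(flow): 3}          # entry: five-tuple hash -> egress port
hit = lookup_next_hop(table, flow)    # the flow's entry is found
miss = lookup_next_hop(table, flow._replace(src_port=40001))
```

A `None` result corresponds to the case in which the device proceeds to step 103.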
It should be noted that what is matched in step 102 depends on the match field used in the entries of the preset routing table. For example, if the first field of each entry is the flow identifier of a data flow and the second field is the corresponding next-hop egress port number, then in step 102 the data forwarding device may obtain the flow identifier of the data packet and look it up in the preset routing table to determine whether the table contains a next-hop egress port for the data packet. This is well known to those skilled in the art and is not described further here.
In step 103, the data forwarding device determines a first port as the next-hop egress port of the data packet toward the destination data forwarding device, choosing it from the ports of the data forwarding device other than the congested egress ports toward the destination data forwarding device of the data packet that are recorded in the congestion information table.
To make the routing method of the embodiments easier to understand, the process by which the data forwarding device obtains the congestion information table is described first.
One possible way of obtaining the congestion information table is as follows: the data forwarding device determines, among all of its ports, the ports that are congested toward each destination data forwarding device, and records each destination data forwarding device together with the port numbers of the ports congested toward it in the congestion information table. The congestion information table also includes the congestion point count of each congested port. In other words, the congestion record table maintained by the data forwarding device records the next-hop egress ports that are unavailable from that device toward each destination data forwarding device, and may further record the congestion point count of each unavailable next-hop egress port.
Generally, a next-hop egress port becomes unavailable for one of two reasons: either the path from this data forwarding device to the next-hop data forwarding device is congested, or the path from the next-hop data forwarding device to the destination data forwarding device is congested. The congestion point count therefore reflects how many congested points lie between this data forwarding device and the destination data forwarding device.
Accordingly, determining which ports of the data forwarding device are congested toward each destination data forwarding device involves two aspects: determining whether the path from this data forwarding device to the next-hop data forwarding device is congested, and determining whether the path from the next-hop data forwarding device to the destination data forwarding device is congested.
Whether the path from this data forwarding device to the next-hop data forwarding device is congested may be determined by checking whether the number of data flows scheduled onto the port connected to the next-hop data forwarding device exceeds a threshold: if it does, the port is congested; otherwise the port is not congested. Alternatively, it may be determined by checking whether the packet sending rate of the port connected to the next-hop data forwarding device exceeds a threshold: if it does, the port is congested; otherwise the port is not congested. This scheme applies to every data forwarding device from the source data forwarding device up to the data forwarding device one layer above the destination data forwarding device.
For example, in the network structure shown in Figure 1c, access switch 1 has 2n ports, of which n connect to hosts or servers and the other n connect to uplink aggregation switches, for example to aggregation switch 1 on each of the n planes. Whichever other access switch is the destination data forwarding device, for example access switch 2n², and with access switch 1 being, for example, the source switch, aggregation switch 1 on each of the n planes is a reachable next-hop switch, so the n ports connected to these aggregation switches may be called uplink ports. Access switch 1 can then determine whether its path to aggregation switch 1 on each of the n planes is congested. For example, if 20 data flows are currently scheduled to aggregation switch 1 on plane 1, exceeding the threshold of 10, that path is congested and unavailable, so the port connected to aggregation switch 1 on plane 1 can be recorded in the congestion information table. If 5 data flows are currently scheduled to aggregation switch 1 on plane n, which does not exceed the threshold of 10, that path is not congested and the port can be used as an available next-hop egress port.
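The flow-count criterion in this example can be sketched as follows. The function name `congested_ports` and the dictionary layout are assumptions for illustration; the threshold of 10 and the flow counts of 20 and 5 are the example values from the text.

```python
from typing import Dict, Set


def congested_ports(flows_per_port: Dict[int, int], threshold: int) -> Set[int]:
    """Return the ports whose scheduled flow count exceeds the threshold."""
    return {port for port, flows in flows_per_port.items() if flows > threshold}


# Port 1 leads to aggregation switch 1 on plane 1 (20 flows scheduled);
# port 3 leads to aggregation switch 1 on plane n (5 flows scheduled).
result = congested_ports({1: 20, 3: 5}, threshold=10)
print(result)  # prints {1}
```

The same shape would apply to the packet-sending-rate criterion, with rates in place of flow counts.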
Optionally, Table 1 shows one possible format of the congestion record table.
Table 1
In Table 1, the congestion record table may include two fields, a "destination data forwarding device" field and an "unavailable next-hop egress port" field. The "destination data forwarding device" field may hold either an identifier of the destination data forwarding device or its address, for example an IP address. In practice, any content that uniquely identifies the destination data forwarding device may be used; the embodiments of the present invention impose no specific limitation. The "unavailable next-hop egress port" field may hold the number of the congested port (the part before the colon) and the congestion point count of that port (the part after the colon). The congestion record table may of course include other fields in practice; the embodiments of the present invention impose no specific limitation.
Continuing the previous example, suppose that on access switch 1 the port connected to aggregation switch 1 on plane 1 is port 1, the port connected to aggregation switch 1 on plane n is port 3, and the destination data forwarding device is access switch 2. Following the format of Table 1, the congestion record table stored on access switch 1 may then be as shown in Table 2.
Table 2
As Table 2 shows, because port 1 is congested, port 1 is recorded in the "unavailable next-hop egress port" field, and its congestion point count is incremented by 1 and recorded in the same field. Port 3 is not congested, so it is not recorded in Table 2.
Optionally, when this data forwarding device is an upper-layer data forwarding device, that is, a device that is neither the destination data forwarding device nor the upper-layer data forwarding device of the destination data forwarding device, determining whether the path from the next-hop data forwarding device to the destination data forwarding device is congested may specifically include the following. The data forwarding device receives third notification information sent by each lower-layer data forwarding device connected to its ports, where each lower-layer data forwarding device is a data forwarding device on a path to the respective destination data forwarding devices, and the third notification information indicates whether that lower-layer data forwarding device can serve as the next hop of the data forwarding device toward each destination data forwarding device. The data forwarding device then determines, according to the third notification information, which of its ports are congested toward each destination data forwarding device.
It should be noted that the terms upper-layer data forwarding device and lower-layer data forwarding device used herein are defined by the direction in which the data packet flows: the upper-layer data forwarding device is where the data packet comes from, and the lower-layer data forwarding device is where the data packet goes. Taking the network structure in Figure 1c as an example, if access switch 1 is to send a data packet to aggregation switch 1 on plane n, then aggregation switch 1 on plane n is the lower-layer switch of access switch 1, and access switch 1 is the upper-layer switch of aggregation switch 1 on plane n.
Optionally, the lower-layer data forwarding device may encapsulate the third notification information and send it to the upper-layer data forwarding device together with a data packet, for example using Virtual eXtensible Local Area Network (VXLAN) encapsulation.
Optionally, the lower-layer data forwarding device may construct a new message to carry the third notification information. One possible format of the new message is shown in Table 3.
Table 3
As shown in Table 3, the new message format, that is, the format of the third notification information, may include two fields, a "flag" field and a "destination data forwarding device" field. Other fields may also be included in practice; the embodiments of the present invention impose no specific limitation.
Optionally, the "flag" field may take the values 0 and 1, for example with 1 indicating that the sender cannot be used as the next hop and 0 indicating that it can. In practice the "flag" field may be filled with other values, for example "true" and "false" to indicate that the sender can and cannot be used as the next hop, respectively.
Optionally, the "destination data forwarding device" field may hold either an identifier of the destination data forwarding device or its address, for example an IP address. In practice, any content that uniquely identifies the destination data forwarding device may be used; the embodiments of the present invention impose no specific limitation.
Whichever sending method is used, when the data forwarding device receives the third notification information from a lower-layer data forwarding device, it can determine from the "flag" field whether that lower-layer data forwarding device can serve as the next hop toward the destination data forwarding device identified in the "destination data forwarding device" field, and can thereby determine which of its ports are congested toward each destination data forwarding device.
Optionally, the "destination data forwarding device" field may identify either a single destination data forwarding device or a group of destination data forwarding devices.
For example, referring again to the network structure shown in Figure 1c, suppose that aggregation switch 1 on plane n determines, on the basis of the two aspects described above, that its paths to all core switches on plane n are congested, so that it cannot reach access switch n+1 through access switch 2n². Aggregation switch 1 on plane n can then send third notification information to access switch 1, in which the "flag" field is set to 1 and the "destination data forwarding device" field carries the identifiers or addresses of access switch n+1 through access switch 2n². On receiving this third notification information, access switch 1 can update Table 2 to obtain Table 4.
Table 4
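Handling of a received notification can be sketched as below. The two message fields follow the format described in the text (flag 1 means the sender cannot serve as next hop); everything else is an assumption for illustration, including the names `Notification` and `apply_notification`, the destination labels, and the simplification of tracking only port sets without congestion point counts.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Notification:
    flag: int                 # 1 = sender cannot be next hop, 0 = it can
    destinations: List[str]   # one destination or a group of destinations


def apply_notification(unusable: Dict[str, Set[int]],
                       port: int,
                       msg: Notification) -> None:
    """Update destination -> unusable ports for the port the message arrived on."""
    for dest in msg.destinations:
        if msg.flag == 1:
            unusable.setdefault(dest, set()).add(port)
        else:
            ports = unusable.get(dest)
            if ports is not None:
                ports.discard(port)
                if not ports:
                    del unusable[dest]


# Access switch 1 already records port 1 as congested toward access switch 2.
table = {"access-2": {1}}
# A notification arrives on port 3 covering a group of destinations.
apply_notification(table, 3, Notification(1, ["access-5", "access-6"]))
```

After the call, port 3 is recorded as unusable toward both destinations in the group, while the existing entry for port 1 is unchanged.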
Optionally, when a port changes from the congested state to the uncongested state, its congestion point count in the congestion record table may be decremented by 1, and when the congestion point count reaches 0 the port may be deleted from the congestion record table.
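A congestion record table with per-port congestion point counts can be sketched as follows. The class and method names are hypothetical, and the nested-dictionary layout (destination, then port, then count) is one possible reading of the "port:count" format described for Table 1.

```python
from collections import defaultdict
from typing import Dict, Set


class CongestionTable:
    """Hypothetical table: destination -> {egress port -> congestion points}."""

    def __init__(self) -> None:
        self.entries: Dict[str, Dict[int, int]] = defaultdict(dict)

    def add_point(self, dest: str, port: int) -> None:
        """Record one more congested point behind `port` toward `dest`."""
        self.entries[dest][port] = self.entries[dest].get(port, 0) + 1

    def remove_point(self, dest: str, port: int) -> None:
        """Decrement the count; drop the port when the count reaches 0."""
        ports = self.entries.get(dest, {})
        if port not in ports:
            return
        ports[port] -= 1
        if ports[port] <= 0:
            del ports[port]
        if not ports:
            self.entries.pop(dest, None)

    def congested(self, dest: str) -> Set[int]:
        """Ports currently unusable toward `dest`."""
        return set(self.entries.get(dest, {}))


ct = CongestionTable()
ct.add_point("access-2", 1)      # local link on port 1 is congested
ct.add_point("access-2", 1)      # the next hop also reports congestion
ct.remove_point("access-2", 1)   # one of the two points clears
still_blocked = ct.congested("access-2")
```

Keeping a count rather than a flag means a port is released only after every congested point behind it has cleared.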
Optionally, if the data forwarding device is not the source data forwarding device, that is, it is a lower-layer data forwarding device, then the data forwarding device also determines whether all of its ports toward each destination data forwarding device are congested. If all of its ports toward a first destination data forwarding device among the destination data forwarding devices are congested, the data forwarding device sends first notification information to the upper-layer data forwarding device, notifying it that this device cannot serve as the next hop of the upper-layer data forwarding device toward the first destination data forwarding device. If not all of its ports toward a second destination data forwarding device among the destination data forwarding devices are congested, the data forwarding device sends second notification information to the upper-layer data forwarding device, notifying it that this device can serve as the next hop of the upper-layer data forwarding device toward the second destination data forwarding device.
The first and second notification information are similar to the third notification information. The difference is that the third notification information covers both cases, able and unable to serve as the next hop, whereas the first notification information covers only the case of being unable to serve as the next hop and the second notification information covers only the case of being able to. All three may nevertheless use the same sending method and format, for example being sent on their own using the message structure shown in Table 3.
For example, suppose the data forwarding device in step 103 is aggregation switch 1 on plane n. If this aggregation switch determines, using the two aspects described above, that all of its ports toward access switch 2 are congested, it can send first notification information to access switch 1. If it determines that not all of its ports toward access switch 3 are congested, that is, access switch 3 is still reachable, it can send second notification information to access switch 1.
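The decision between first and second notification information can be sketched as below, assuming the device knows its set of candidate ports and the ports it has recorded as congested per destination. The function name `build_flags` and the destination labels are hypothetical; the flag values follow the convention given in the text (1 for unable, 0 for able).

```python
from typing import Dict, Iterable, Set


def build_flags(candidate_ports: Set[int],
                congested: Dict[str, Set[int]],
                destinations: Iterable[str]) -> Dict[str, int]:
    """Return destination -> flag to advertise to the upper-layer device.

    Flag 1 (first notification): every candidate port is congested.
    Flag 0 (second notification): at least one candidate port is usable.
    """
    flags: Dict[str, int] = {}
    for dest in destinations:
        blocked = congested.get(dest, set())
        flags[dest] = 1 if candidate_ports <= blocked else 0
    return flags


# Aggregation switch with ports 1 and 2: both congested toward access switch 2,
# only port 1 congested toward access switch 3.
flags = build_flags({1, 2},
                    {"access-2": {1, 2}, "access-3": {1}},
                    ["access-2", "access-3"])
print(flags)  # prints {'access-2': 1, 'access-3': 0}
```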
The description above assumes that each data forwarding device records and maintains its own congestion record table. In practice, a controller in the network structure may instead exchange congestion information with the data forwarding devices and establish and maintain the congestion record tables, in which case, in step 103, the data forwarding device may obtain its congestion record table from the controller.
It should also be noted that, although the description above uses a congestion information table as an example, in practice a non-congestion information table may be created and maintained instead, recording the ports that are not congested. The congestion information table and the non-congestion information table are two opposite expressions of the same thing and achieve the same effect. The difference is that the number of congested ports is usually smaller than the number of uncongested ports, so a congestion information table reduces the amount of recorded data and saves storage space and maintenance cost.
Optionally, to ensure that every egress port in the preset routing table is an available next-hop egress port, and because the congestion record table changes over time, the method in the embodiments of the present invention further includes the following. When an uncongested egress port of the data forwarding device becomes a congested egress port, the data forwarding device determines whether the preset routing table contains that congested egress port; if it does, the data forwarding device deletes the entries containing that congested egress port from the preset routing table. In this way, the egress ports in the preset routing table are always available next-hop egress ports.
For example, in going from Table 2 to Table 4 above, port 3 changes from an uncongested egress port to a congested egress port. Access switch 1 therefore checks whether its local routing table contains port 3 and, if so, deletes the entries containing port 3 from the routing table. This avoids the situation in which port 3 is matched in step 102 although port 3 is congested, which would delay the data packet or cause its transmission to fail. If a data packet of a data flow that previously used port 3 needs to be sent, the updated routing table is used for matching in step 102. Because the entry has been deleted, the match fails, a new egress port is selected for the data packet in step 103, and the data packet is sent in step 104.
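Removing stale routing entries when a port becomes congested can be sketched as follows. The function name `purge_port` and the table layout (flow key mapped to egress port) are assumptions for illustration.

```python
from typing import Dict


def purge_port(routing_table: Dict[int, int], congested_port: int) -> int:
    """Delete every entry whose egress port is `congested_port`.

    Returns the number of entries removed.
    """
    stale = [key for key, port in routing_table.items()
             if port == congested_port]
    for key in stale:
        del routing_table[key]
    return len(stale)


routing = {0xA: 3, 0xB: 1, 0xC: 3}   # flow key -> egress port
removed = purge_port(routing, 3)     # port 3 has just become congested
```

Flows whose entries were removed will miss in step 102 on their next packet and be assigned a new port in step 103.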
Having introduced the congestion record table, the description of step 103 now continues.
Because no next-hop egress port for the data packet is matched in the preset routing table, the data forwarding device may first look up, in the congestion information table, the congested egress ports toward the destination data forwarding device of the data packet, and then determine a first port, from the ports other than those congested egress ports, as the next-hop egress port of the data packet toward the destination data forwarding device.
Optionally, the first port may be selected at random from the ports other than the congested egress ports, or it may be selected according to a rule, for example by an Equal Cost MultiPath (ECMP) mechanism or by a Valiant Load Balancing (VLB) mechanism.
For example, suppose the destination data forwarding device of the data packet is access switch 2. From the congestion information table shown in Table 2, access switch 1 determines that port 1 toward access switch 2, that is, the port connected to aggregation switch 1 on plane 1, is congested. It can therefore select one port at random from the remaining n-1 ports, for example port 3, the port connected to aggregation switch 1 on plane n, as the first port. In other words, port 3 is selected as the next-hop egress port of the data packet toward access switch 2.
After the next-hop egress port is determined, step 104 may be performed: the data forwarding device sends the data packet through the first port. For example, access switch 1 sends the data packet out through port 3.
On the transmission path of the data packet, every data forwarding device other than the destination data forwarding device, from the source data forwarding device to the data forwarding device one layer above the destination data forwarding device, may perform steps 101 to 104.
Optionally, after step 104, the method further includes creating, in the preset routing table, a new entry for the data flow to which the data packet belongs, the egress port of the new entry being the first port. For example, the first field of the new entry is the flow identifier and the egress port in the second field is port 3. Subsequent packets of the data flow can then be forwarded through the first port, saving forwarding time.
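Steps 102 to 104, together with the optional installation of a new entry, can be sketched in one function as below. Random selection is used for step 103 (the text also allows ECMP or VLB). The function name `choose_egress_port`, the table layouts, and the behaviour of raising an error when no port is usable are assumptions; the patent does not specify that last case.

```python
import random
from typing import Dict, List, Set


def choose_egress_port(flow_key: int,
                       dest: str,
                       routing_table: Dict[int, int],
                       congestion_table: Dict[str, Set[int]],
                       uplinks: List[int],
                       rng: random.Random) -> int:
    """Return the port on which the packet should be sent (step 104)."""
    port = routing_table.get(flow_key)                # step 102: table lookup
    if port is not None:
        return port
    blocked = congestion_table.get(dest, set())       # step 103: exclude congested
    usable = [p for p in uplinks if p not in blocked]
    if not usable:
        # Assumption: the source does not say what happens in this case.
        raise RuntimeError("no uncongested egress port toward " + dest)
    port = rng.choice(usable)
    routing_table[flow_key] = port                    # optional: install new entry
    return port


routes: Dict[int, int] = {}
congestion = {"access-2": {1}}                        # port 1 congested, as in Table 2
rng = random.Random(0)
first_choice = choose_egress_port(7, "access-2", routes, congestion, [1, 2, 3], rng)
second_choice = choose_egress_port(7, "access-2", routes, congestion, [1, 2, 3], rng)
```

The second call hits the entry installed by the first, so later packets of the same flow keep the same port without repeating the selection.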
Optionally, when a data packet arrives at a data forwarding device that has only a single path to the destination data forwarding device, the data packet can be forwarded directly along that single path, and the egress port of the single path can be recorded in the preset routing table. This saves forwarding time.
The complete process of sending a data packet from the source data forwarding device to the destination data forwarding device is described next. Referring again to Figure 1c, suppose access switch 1 is the source data forwarding device and access switch 2n² is the destination data forwarding device. As the source data forwarding device, access switch 1 may first look up its stored routing table to determine whether a next-hop egress port for the data packet is matched. If, for example, no match is found, it consults the congestion information table shown in Table 2 to identify the congested ports, finds that there is currently no congested port toward access switch 2n², and then selects one of the ports that can reach access switch 2n² as the next-hop egress port, for example port 3, the port connected to aggregation switch 1 on plane n. It then sends the data packet from port 3 to aggregation switch 1 on plane n (in Figure 1c, the solid arrows show the transmission path of the data packet). Optionally, access switch 1 may create, in the preset routing table, a new entry for the data flow to which the data packet belongs, with port 3 as the next-hop egress port of the new entry.
When the data packet reaches aggregation switch 1 on plane n, that aggregation switch likewise performs a lookup in its stored routing table. Suppose a next-hop egress port is matched, for example the port connected to core switch n on plane n. Aggregation switch 1 then sends the data packet to core switch n.
When the data packet reaches core switch n, note that in the network structure of Figure 1c the aggregation switch on each plane has only one path to a given access switch, so there is only one path from a core switch to a given access switch, and the core switch can forward the packet directly according to the next-hop egress port recorded in the preset routing table. In this embodiment, core switch n therefore forwards the data packet according to the next-hop egress port in its locally stored routing table, and the data packet is sent to aggregation switch 2n on plane n.
When the data packet reaches aggregation switch 2n on plane n, the layer directly below aggregation switch 2n is the destination data forwarding device, namely access switch 2n2. Aggregation switch 2n therefore also performs single-path forwarding: it forwards the data packet through the next-hop egress port in its locally stored routing table, and the data packet is sent to access switch 2n2. At this point the data packet has been sent from the source data forwarding device (access switch 1) to the destination data forwarding device (access switch 2n2).
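The per-hop decision in the walkthrough above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `Switch`, its attribute names, the flow identifier, and the use of a random choice among uncongested ports are all assumptions made for the example (the description only says that one usable port is selected). The case in which every port is congested is outside this sketch and simply returns `None`.

```python
import random


class Switch:
    """Hypothetical model of one data forwarding device (names are illustrative)."""

    def __init__(self, reachable_ports):
        # destination -> local ports that can reach it (assumed to be known from the topology)
        self.reachable_ports = reachable_ports
        # flow id -> egress port; stands in for the "preset routing table"
        self.flow_table = {}
        # destination -> set of congested egress ports; stands in for the
        # "congestion information table"
        self.congested = {}

    def next_hop_port(self, flow_id, destination):
        # Step 1: use the routing-table entry if the flow already has one.
        port = self.flow_table.get(flow_id)
        if port is not None:
            return port

        # Step 2: otherwise consider every port that reaches the destination,
        # except those recorded as congested for that destination.
        blocked = self.congested.get(destination, set())
        candidates = [p for p in self.reachable_ports[destination] if p not in blocked]
        if not candidates:
            return None  # no usable next hop; handling is not covered by this sketch

        # Step 3: pick one usable port (random choice is an assumption of this
        # sketch) and, optionally, record it as a new entry for the flow.
        port = random.choice(candidates)
        self.flow_table[flow_id] = port
        return port
```

For instance, with ports 1, 2 and 3 reaching a destination and ports 1 and 2 marked congested, `next_hop_port` returns port 3 and later packets of the same flow reuse that entry.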
The implementation process for the switching network shown in FIG. 1a is similar to the example described above and is therefore not described in detail here.
As can be seen from the above description: first, in the embodiments of the present invention, a data forwarding device feeds back only whether the link from itself to the destination data forwarding device is available, unlike the prior-art routing scheme, which must feed back the utilization of every path between the source data forwarding device and the destination data forwarding device. Second, in the embodiments of the present invention, the feedback travels not from the destination data forwarding device to the source data forwarding device but from the congested data forwarding device to the upper-layer data forwarding device. Third, when a data forwarding device schedules traffic, it only needs to select an available next-hop egress port, that is, routes are selected hop by hop, whereas the prior art selects an optimal path and therefore has to choose a complete end-to-end path at once. The solution in the embodiments of the present invention is therefore highly scalable and easily extends to multi-tier clos architectures. Further, because only congested ports are recorded, few table entries are occupied, an order of magnitude fewer than in the prior-art routing scheme, and this advantage grows as the number of clos tiers increases. For details, see Table 5, which compares the number of table entries used by the solution in the embodiments of the present invention with the number used by the routing scheme described in the background.
Table 5
Further, however many clos tiers there are, the congestion information is real-time, because when a data forwarding device detects that one of its ports is congested, it only needs to remove that port from the available next hops for the destination data forwarding device. In addition, a data forwarding device usually notifies the upper-layer data forwarding device only when it has no available next hop to a destination data forwarding device, or when it changes from having no available next hop to having one, so feeding back congestion information also takes relatively little time in the method of the embodiments of the present invention.
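The feedback rule in the preceding paragraph, where the upper-layer device is notified only when availability toward a destination changes, might be sketched as below. The class `CongestionReporter`, the `notify_upper` callback and its `(destination, usable)` signature are hypothetical names introduced for illustration; the description does not specify a message format.

```python
class CongestionReporter:
    """Hypothetical per-device tracker that reports only availability transitions."""

    def __init__(self, reachable_ports, notify_upper):
        # destination -> set of local ports that can reach it
        self.reachable_ports = {d: set(p) for d, p in reachable_ports.items()}
        # destination -> set of ports currently congested toward it
        self.congested = {d: set() for d in reachable_ports}
        # callback(destination, usable) standing in for the notification message
        self.notify_upper = notify_upper

    def _usable(self, destination):
        # The device remains a valid next hop while at least one port is uncongested.
        return self.congested[destination] != self.reachable_ports[destination]

    def set_port_state(self, destination, port, is_congested):
        if port not in self.reachable_ports[destination]:
            return  # the port does not lead to this destination; nothing to record
        was_usable = self._usable(destination)
        if is_congested:
            self.congested[destination].add(port)
        else:
            self.congested[destination].discard(port)
        now_usable = self._usable(destination)
        # Notify only on a transition: all ports congested, or usable again.
        if was_usable != now_usable:
            self.notify_upper(destination, now_usable)
```

In this sketch a single congested port changes only the local table; a notification is produced when the last usable port becomes congested and again when any port recovers.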
Based on the same inventive concept, an embodiment of the present invention further provides a data forwarding device (as shown in FIG. 2), which is used to implement the foregoing method.
Specifically, the processor 10 is configured to obtain a data packet and determine whether a next-hop egress port of the data packet is matched in a preset routing table, where the egress ports in the preset routing table are all uncongested egress ports and are all ports among the ports 50. If no next-hop egress port of the data packet is matched in the preset routing table, the processor 10 is further configured to determine a first port as the next-hop egress port of the data packet to the destination data forwarding device, the first port being chosen from the ports of the data forwarding device other than the congested egress ports that the congestion information table records for the destination data forwarding device of the data packet. The egress ports in the congestion information table and the other ports are all ports among the ports 50. The transmitter 20 is configured to send the data packet through the first port.
Optionally, the processor 10 is further configured to: determine, among the at least two ports, the congested ports to each destination data forwarding device; and record each destination data forwarding device and the port numbers of the congested ports corresponding to that destination data forwarding device in the congestion information table.
Optionally, if the data forwarding device is not a source data forwarding device, the processor 10 is further configured to determine whether all ports from the data forwarding device to each destination data forwarding device are in a congested state.
The transmitter 20 is further configured to: if all ports from the data forwarding device to a first destination data forwarding device among the destination data forwarding devices are in a congested state, send first notification information to the upper-layer data forwarding device, where the first notification information notifies the upper-layer data forwarding device that the data forwarding device cannot serve as the next hop for the upper-layer data forwarding device to reach the first destination data forwarding device; and if not all ports from the data forwarding device to a second destination data forwarding device among the destination data forwarding devices are in a congested state, send second notification information to the upper-layer data forwarding device, where the second notification information notifies the upper-layer data forwarding device that the data forwarding device can serve as the next hop for the upper-layer data forwarding device to reach the second destination data forwarding device.
Optionally, the receiver 30 is configured to receive third notification information sent by each lower-layer data forwarding device connected to a port of the data forwarding device, where each lower-layer data forwarding device is a data forwarding device on a path to the destination data forwarding devices, and the third notification information indicates whether that lower-layer data forwarding device can serve as the next hop for the data forwarding device to reach each destination data forwarding device. The processor 10 is further configured to determine, according to the third notification information, the congested ports to each destination data forwarding device among the at least two ports.
Optionally, the processor 10 is further configured to: when an uncongested egress port of the data forwarding device changes to a congested egress port, determine whether the preset routing table contains the congested egress port; and if the preset routing table contains the congested egress port, delete the entries containing the congested egress port from the preset routing table.
Optionally, the processor 10 is further configured to: after determining the first port as the next-hop egress port of the data packet to the destination data forwarding device, create in the preset routing table a new entry for the data flow to which the data packet belongs, where the egress port of the new entry is the first port.
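The routing-table maintenance described in the two optional features above keeps the preset routing table free of congested egress ports. A minimal sketch of the deletion step follows, assuming the routing table is a plain mapping from a flow identifier to an egress port; the function name `purge_congested_port` and that representation are assumptions of this example.

```python
def purge_congested_port(flow_table, port):
    """Delete every flow entry whose egress port has just become congested.

    flow_table: dict mapping a flow id to its egress port (illustrative stand-in
    for the preset routing table). Returns the flow ids that were removed.
    """
    # Collect the affected flows first so the dict is not mutated while iterating.
    stale = [flow for flow, egress in flow_table.items() if egress == port]
    for flow in stale:
        del flow_table[flow]
    return stale
```

After the deletion, the next packet of an affected flow finds no match in the routing table and goes through port selection again, which can then avoid the congested port.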
Based on the same inventive concept, an embodiment of the present invention further provides a routing apparatus, which is used to implement the routing method shown in FIG. 3. Referring to FIG. 4, the routing apparatus includes: a processing unit 201, configured to obtain a data packet; determine whether a next-hop egress port of the data packet is matched in a preset routing table, where the egress ports in the preset routing table are all uncongested egress ports; and, if no next-hop egress port of the data packet is matched in the preset routing table, determine a first port as the next-hop egress port of the data packet to the destination data forwarding device, the first port being chosen from the ports of the data forwarding device other than the congested egress ports that the congestion information table records for the destination data forwarding device of the data packet; and a sending unit 202, configured to send the data packet through the first port.
Optionally, the processing unit 201 is further configured to: determine, among all ports of the data forwarding device, the congested ports to each destination data forwarding device; and record each destination data forwarding device and the port numbers of the congested ports corresponding to that destination data forwarding device in the congestion information table.
Optionally, if the data forwarding device is not a source data forwarding device, the processing unit 201 is further configured to determine whether all ports from the data forwarding device to each destination data forwarding device are in a congested state. The sending unit 202 is further configured to: if all ports from the data forwarding device to a first destination data forwarding device among the destination data forwarding devices are in a congested state, send first notification information to the upper-layer data forwarding device, where the first notification information notifies the upper-layer data forwarding device that the data forwarding device cannot serve as the next hop for the upper-layer data forwarding device to reach the first destination data forwarding device; and if not all ports from the data forwarding device to a second destination data forwarding device among the destination data forwarding devices are in a congested state, send second notification information to the upper-layer data forwarding device, where the second notification information notifies the upper-layer data forwarding device that the data forwarding device can serve as the next hop for the upper-layer data forwarding device to reach the second destination data forwarding device.
Optionally, the routing apparatus further includes a receiving unit 203, configured to receive third notification information sent by each lower-layer data forwarding device connected to a port of the data forwarding device, where each lower-layer data forwarding device is a data forwarding device on a path to the destination data forwarding devices, and the third notification information indicates whether that lower-layer data forwarding device can serve as the next hop for the data forwarding device to reach each destination data forwarding device. The processing unit 201 is further configured to determine, according to the third notification information, the congested ports to each destination data forwarding device among all ports of the data forwarding device.
Optionally, the processing unit 201 is further configured to: when an uncongested egress port of the data forwarding device changes to a congested egress port, determine whether the preset routing table contains the congested egress port; and if the preset routing table contains the congested egress port, delete the entries containing the congested egress port from the preset routing table.
Optionally, the processing unit 201 is further configured to: after determining the first port as the next-hop egress port of the data packet to the destination data forwarding device, create in the preset routing table a new entry for the data flow to which the data packet belongs, where the egress port of the new entry is the first port.
The variations and specific examples of the routing method in the foregoing embodiment of FIG. 3 also apply to the routing apparatus and the data forwarding device of this embodiment. From the foregoing detailed description of the routing method, a person skilled in the art can clearly understand how the routing apparatus and the data forwarding device of this embodiment are implemented, so for brevity the details are not repeated here.
A person skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present invention. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, a person skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these changes and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.
Claims (18)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201611086242.2A CN108123878B (en) | 2016-11-30 | 2016-11-30 | A routing method, device and data forwarding device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN108123878A | 2018-06-05 |
| CN108123878B | 2020-12-15 |
Family
ID=62226313
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201611086242.2A Expired - Fee Related CN108123878B (en) | 2016-11-30 | 2016-11-30 | A routing method, device and data forwarding device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN108123878B (en) |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101286930A (en) * | 2008-05-30 | 2008-10-15 | 华南理工大学 | A Congestion Adaptive Routing Method for Multi-Hop Wireless Ad Hoc Networks |
| US20140219090A1 (en) * | 2013-02-04 | 2014-08-07 | Telefonaktiebolaget L M Ericsson (Publ) | Network congestion remediation utilizing loop free alternate load sharing |
| US20160154756A1 (en) * | 2014-03-31 | 2016-06-02 | Avago Technologies General Ip (Singapore) Pte. Ltd | Unordered multi-path routing in a pcie express fabric environment |
| WO2015161409A1 (en) * | 2014-04-21 | 2015-10-29 | 华为技术有限公司 | Load balance implementation method, device and system |
Non-Patent Citations (1)
| Title |
|---|
| YUAN YANG等: "A Hop-by-Hop Routing Mechanism for Green Internet", 《IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS》 * |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109246231A (en) * | 2018-09-29 | 2019-01-18 | 北京深度奇点科技有限公司 | A kind of method for intelligently routing and smart routing devices |
| CN109246231B (en) * | 2018-09-29 | 2022-11-18 | 北京深度奇点科技有限公司 | Intelligent routing method and intelligent routing equipment |
| CN111147386A (en) * | 2018-11-02 | 2020-05-12 | 伊姆西Ip控股有限责任公司 | Method, electronic device and computer program product for handling data transmission congestion |
| CN109802879A (en) * | 2019-01-31 | 2019-05-24 | 新华三技术有限公司 | A kind of flow routing method and device |
| CN109802879B (en) * | 2019-01-31 | 2021-05-28 | 新华三技术有限公司 | Data stream routing method and device |
| CN110460537A (en) * | 2019-06-28 | 2019-11-15 | 天津大学 | Traffic Scheduling Method Based on Packet Collection in Data Center Asymmetric Topology |
| CN110460537B (en) * | 2019-06-28 | 2023-01-24 | 天津大学 | Packet set-based data center asymmetric topology flow scheduling method |
| CN114007152A (en) * | 2022-01-05 | 2022-02-01 | 北京国科天迅科技有限公司 | Port convergence processing method and device for optical fiber switch |
| CN114007152B (en) * | 2022-01-05 | 2022-06-07 | 北京国科天迅科技有限公司 | Port convergence processing method and device for optical fiber switch |
| CN117082014A (en) * | 2023-10-17 | 2023-11-17 | 苏州元脑智能科技有限公司 | CLOS network, construction method, transmission method, system, device and medium |
| CN117082014B (en) * | 2023-10-17 | 2024-01-23 | 苏州元脑智能科技有限公司 | CLOS network, construction method, transmission method, system, device and medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN108123878B (en) | 2020-12-15 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20201215 |