CN106657355B - Cluster management method and device - Google Patents

Info

Publication number
CN106657355B
Authority
CN
China
Prior art keywords
cluster
slave device
slave
master
port
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201611245816.6A
Other languages
Chinese (zh)
Other versions
CN106657355A (en
Inventor
李虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huawei Digital Technologies Co Ltd
Original Assignee
Beijing Huawei Digital Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huawei Digital Technologies Co Ltd
Priority to CN201611245816.6A
Publication of CN106657355A
Application granted
Publication of CN106657355B
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1044 Group management mechanisms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0659 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1044 Group management mechanisms
    • H04L67/1048 Departure or maintenance mechanisms

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Small-Scale Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

A cluster management method and device. The method is applied to a cluster that includes a master device and a slave device, and comprises: after the protocol channel between the master device and the slave device fails, the slave device switches to the master role; the slave device sends a cluster detection packet to the master device over the data channel; after receiving the cluster response packet sent by the master device, the slave device exits the cluster according to the cluster response packet; and the slave device shuts down its ports. This solution avoids repeated BGP flapping on the master device and the slave device.

Figure 201611245816

Description

Cluster management method and device

Technical Field

The present invention relates to the technical field of virtual clusters, and in particular to a cluster management method and device.

Background

Virtual cluster technology reduces operating costs and increases the number of ports available on a single node, and it is increasingly used in router products, for example in enterprise network scenarios (such as radio and television broadcasters, China Merchants Bank, Thailand's PEA, and the UK). With virtual cluster technology, when the protocol channel between the master device and a slave device in a cluster fails, a dual-master condition arises.

Once two masters exist, both establish Border Gateway Protocol (BGP) neighbor relationships with the cluster's upstream and downstream devices. BGP then flaps repeatedly, which causes route flapping across the network and prevents the node's traffic from being forwarded normally.

Summary of the Invention

The present invention provides a cluster management method and device that can solve the prior-art problem of repeated BGP flapping on the master device and the slave device.

A first aspect provides a cluster management method applicable to a cluster. A cluster is a group of mutually independent computers, interconnected over a network, that a protocol joins into a single network and that is managed as a single system. Devices in the cluster can share resources, costs, channel equipment, and services. The cluster includes one master device and at least one slave device. The master device controls the cluster to which it belongs, and a slave device collects data. Both the master device and the slave devices can forward traffic to and from the upstream and downstream devices they are connected to. While the protocol channel is working, the master device holds the master role and each slave device holds the slave role. The method includes:

After the protocol channel between the master device and the slave device fails, the slave device switches to the master role and sends a cluster detection packet to the master device over the data channel. On receiving the cluster response packet sent by the master device, the slave device can determine that its cluster has been dissolved, and it exits the cluster according to the cluster response packet.

After exiting the cluster, the slave device shuts down its ports. The ports that are shut down include ge0/0/0, loopback interfaces, virtual LAN interfaces (vlanif), service ports, and so on.

Compared with existing mechanisms, in the present invention the slave device that has switched to the master role after a protocol channel failure sends a cluster detection packet to the master device over the data channel and, after receiving the cluster response packet from the master device, forcibly shuts down its own ports, which isolates the original slave device. The original master device is left untouched: its protocol sessions with its upstream and downstream devices can still be established, so it keeps forwarding the existing service data. Traffic forwarding on the original master device is therefore unaffected, and repeated BGP flapping on the master device and the slave device is avoided.
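
The slave-side sequence described above (role switch, detection packet over the data channel, exit on response, port shutdown) can be sketched as a small state model. This is a minimal illustration under stated assumptions, not the patented implementation: the class `SlaveDevice`, its method names, the port names other than ge0/0/0, and the string `"cluster-detect"` are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    SLAVE = "slave"
    MASTER = "master"


def _default_ports():
    # Port names are illustrative; only ge0/0/0 is named in the text.
    return {"ge0/0/0": "up", "loopback0": "up", "vlanif10": "up"}


@dataclass
class SlaveDevice:
    """Hypothetical model of the slave side of the method."""
    role: Role = Role.SLAVE
    in_cluster: bool = True
    ports: dict = field(default_factory=_default_ports)
    data_channel_out: list = field(default_factory=list)

    def on_protocol_channel_failure(self) -> None:
        # The slave switches to the master role.
        self.role = Role.MASTER
        # It then sends a cluster detection packet over the data channel.
        self.data_channel_out.append("cluster-detect")

    def on_cluster_response(self) -> None:
        # On the master's response, the slave exits the cluster.
        self.in_cluster = False
        # It shuts down all of its ports, which isolates it.
        for name in self.ports:
            self.ports[name] = "down"


dev = SlaveDevice()
dev.on_protocol_channel_failure()
dev.on_cluster_response()
print(dev.role, dev.in_cluster, dev.ports)
```

The original master device does not appear in the model because the method leaves it untouched apart from answering the detection packet.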

In some possible designs, the cluster detection packet includes a first cluster detection field, and the cluster response packet includes a second cluster detection field.

The first cluster detection field includes a first indicator bit, a second indicator bit, and a third indicator bit. The first indicator bit indicates the device role that the slave device sending the cluster detection packet held before the protocol channel failed; the second indicator bit indicates the device role that this slave device holds after the failure; and the third indicator bit indicates that the slave device requests to shut down its ports.

The second cluster detection field likewise includes a first indicator bit, a second indicator bit, and a third indicator bit. The first indicator bit indicates the device role that the master device sending the cluster response packet held before the protocol channel failed; the second indicator bit indicates the device role that this master device holds after the failure; and the third indicator bit indicates approval for the slave device to shut down its ports.

In some possible designs, when the cluster includes two or more slave devices and the protocol channel between the master device and at least one slave device fails, the method further includes:

Among the two or more slave devices, each slave device whose protocol channel to the master device has failed receives the cluster detection packet sent by the master device and returns a cluster response packet to the master device, so that the master device shuts down its ports.

For example, after the protocol channel between the master device and at least one slave device fails, if the current number of master devices is smaller than the number of slave devices in the cluster system, the original master device may send the cluster detection packet to each slave device that has switched to the master role, and the original master device goes down. Each slave device that has switched to the master role receives the cluster detection packet and returns a cluster response packet to the master device, so that the original master device considers the cluster dissolved and shuts down all of its ports.

If the number of master devices equals the number of slave devices in the cluster system and the protocol channel between the master device and the slave device fails, the slave device normally sends the cluster detection packet first, the original master device returns the cluster response packet, and the slave device shuts down its own ports on receiving that response. In other embodiments, the master device may instead send the cluster detection packet to the slave device and shut down its own ports after receiving the cluster response packet returned by the slave device. The present invention does not limit which approach is used, provided that repeated BGP flapping is avoided and traffic between nodes is forwarded normally.

In some possible designs, after the protocol channel between the slave device and the master device recovers, the slave device whose current role is master reverts to the slave role;

The slave device then brings its ports back up. The down state of each port can be cleared automatically by the slave device or manually through the command line.

In some possible designs, after the slave device shuts down its ports, the method further includes:

The slave device sets the port state of each port it has shut down to down, so that after the protocol channel between the slave device and the master device recovers, the slave device whose current role is master reverts to the slave role and then brings its ports back up. In the present invention, the down state of each port can be cleared automatically by the slave device or manually through the command line.
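
The recovery path can be sketched in the same spirit: the isolated device records its ports as down, reverts to the slave role when the protocol channel recovers, and clears the down state either automatically or port by port. This is only an illustration; `IsolatedSlave`, `undo_shutdown`, the `auto_restore` flag, and the port names other than ge0/0/0 are hypothetical and do not correspond to any real command-line syntax.

```python
from dataclasses import dataclass, field


@dataclass
class IsolatedSlave:
    """Hypothetical slave that took the master role and shut its ports."""
    role: str = "master"
    port_state: dict = field(
        default_factory=lambda: {"ge0/0/0": "down", "vlanif10": "down"}
    )

    def undo_shutdown(self, name: str) -> None:
        # Manual path: an operator clears the down state of one port.
        self.port_state[name] = "up"

    def on_protocol_channel_recovered(self, auto_restore: bool = True) -> None:
        # The device first reverts to the slave role.
        self.role = "slave"
        # Automatic path: clear the down state of every recorded port.
        if auto_restore:
            for name in list(self.port_state):
                self.undo_shutdown(name)


auto = IsolatedSlave()
auto.on_protocol_channel_recovered()

manual = IsolatedSlave()
manual.on_protocol_channel_recovered(auto_restore=False)
manual.undo_shutdown("ge0/0/0")
print(auto.port_state, manual.port_state)
```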

A second aspect of the present invention provides a cluster device that implements the functions of the cluster management method provided in the first aspect. The functions may be implemented in hardware, or in hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions above, and each module may be software and/or hardware.

In one possible design, the cluster device is applied to a cluster that includes a master device and a slave device, and the cluster device includes:

a processing module, configured to switch the slave device to the master role after the protocol channel between the master device and the slave device fails;

a transceiver module, configured to send a cluster detection packet to the master device over the data channel;

the processing module is further configured to, after the cluster response packet sent by the master device is received through the transceiver module, exit the cluster according to the cluster response packet and shut down the ports of the slave device.

Optionally, the cluster detection packet includes a first cluster detection field, and the cluster response packet includes a second cluster detection field.

In some possible designs, the first cluster detection field includes a first indicator bit, a second indicator bit, and a third indicator bit. The first indicator bit indicates the device role that the slave device sending the cluster detection packet held before the protocol channel failed; the second indicator bit indicates the device role that this slave device holds after the failure; and the third indicator bit indicates that the slave device requests to shut down its ports.

In some possible designs, the second cluster detection field includes a first indicator bit, a second indicator bit, and a third indicator bit. The first indicator bit indicates the device role that the master device sending the cluster response packet held before the protocol channel failed; the second indicator bit indicates the device role that this master device holds after the failure; and the third indicator bit indicates approval for the slave device to shut down its ports.

In some possible designs, when the cluster includes two or more slave devices, the transceiver module is further configured to:

receive the cluster detection packet sent by the master device and return a cluster response packet to the master device, so that the master device shuts down its ports.

In some possible designs, the processing module is further configured to:

after the protocol channel between the slave device and the master device recovers, revert the slave device from its current master role to the slave role; and

bring the ports of the slave device back up.

In some possible designs, after the slave device shuts down its ports, the processing module is further configured to:

set the port state of each shut-down port of the slave device to down.

In one possible design, the cluster device is applied to a cluster that includes a master device and a slave device, and the cluster device includes:

at least one processor, a memory, and a transceiver;

wherein the memory stores program code, and the processor invokes the program code in the memory to perform the following operations:

after the protocol channel between the master device and the slave device fails, switching the slave device to the master role;

sending, through the transceiver, a cluster detection packet to the master device over the data channel;

after the cluster response packet sent by the master device is received through the transceiver, exiting the cluster according to the cluster response packet, and shutting down the ports of the slave device.

Compared with the prior art, in the solution provided by the present invention the slave device that has switched to the master role after a protocol channel failure sends a cluster detection packet to the master device over the data channel and, after receiving the cluster response packet from the master device, forcibly shuts down its own ports, which isolates the original slave device. The original master device is left untouched: its protocol sessions with its upstream and downstream devices can still be established, so it keeps forwarding the existing service data. Traffic forwarding on the original master device is therefore unaffected, and repeated BGP flapping on the master device and the slave device is avoided. The node thus keeps one usable device, and the impact on the current network is minimal.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of a network topology of a cluster in this embodiment;

FIG. 2 is a schematic flowchart of a cluster management method in this embodiment;

FIG. 3 is another schematic flowchart of the cluster management method in this embodiment;

FIG. 4 is a schematic structural diagram of a cluster device in this embodiment;

FIG. 5 is a schematic structural diagram of a physical apparatus that executes the above cluster management method in this embodiment.

Detailed Description

The terms "first", "second", and the like in the description, the claims, and the drawings of the present invention distinguish similar objects and do not necessarily describe a particular order or sequence. Data used in this way is interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than the one illustrated or described. The terms "include" and "have", and any variations of them, are intended to cover non-exclusive inclusion: a process, method, system, product, or device that comprises a series of steps or modules is not necessarily limited to the steps or modules expressly listed, and may include other steps or modules that are not expressly listed or that are inherent to the process, method, product, or device. The division into modules in this document is only a logical division, and an actual implementation may divide them differently. For example, several modules may be combined or integrated into another system, or some features may be omitted or not executed. The couplings, direct couplings, or communication connections shown or discussed may be made through interfaces, and indirect couplings or communication connections between modules may be electrical or of another similar form; none of this is limited here.
In addition, modules or submodules described as separate components may or may not be physically separate, may or may not be physical modules, or may be distributed across multiple circuit modules. Some or all of them may be selected as needed to achieve the purpose of the embodiments of the present invention.

Embodiments of the present invention provide a cluster management method and device, mainly for the field of virtual cluster technology, for example in a cluster system. They avoid repeated BGP flapping on the master device and the slave device and keep the node's traffic forwarding normally. A detailed description follows.

A cluster in the present invention is a group of mutually independent computers, interconnected over a network, that a protocol joins into a single network and that is managed as a single system. Devices in the cluster can share resources, costs, channel equipment, and services. The cluster may be a virtual cluster, that is, a single logical core router composed of multiple single-chassis devices. A cluster may include one master device and at least one slave device, each of which may be installed in a chassis. The master device controls the cluster to which it belongs, and a slave device collects data. Both the master device and the slave devices can forward traffic to and from the upstream and downstream devices they are connected to. FIG. 1 is a schematic diagram of a network topology of the cluster: the master device (device A) and the slave device (device B) are deployed in the same network through a protocol, and their physical locations are not restricted. In FIG. 1, device C and device D are the upstream and downstream devices of the cluster, that is, its BGP neighbors.

A protocol channel and a data channel exist between the master device and the slave device. The protocol channel is the communication link between cluster chassis that is connected through the main control board interfaces, and it carries protocol packets within the cluster. The data channel is the link between cluster chassis that is connected through the service control board interfaces, and it carries data packets between the chassis of the virtual cluster.

After a protocol channel failure, a slave device may switch to the master role, so that two or more master devices appear in one cluster; at the same time, the device whose protocol channel has failed exits the cluster it belonged to before the failure. With two or more master devices, BGP on the master devices' BGP neighbors flaps repeatedly, traffic is interrupted again and again, and normal forwarding cannot be guaranteed. To solve this technical problem, the embodiments of the present invention mainly provide the following technical solution:

After the slave device switches to the master role, it sends a cluster detection packet to the original master device over the data channel and, after receiving the cluster response packet sent by the original master device, forcibly shuts down its own ports, which isolates the slave device.

This technical solution avoids the repeated BGP flapping that two or more master devices would otherwise cause, and it keeps the traffic of each node forwarding normally.

Referring to FIG. 2, a cluster management method provided by the present invention is described below by way of example. The method is applied to a cluster that includes a master device and a slave device, with a protocol channel and a data channel between them. A cluster may contain one or more slave devices, and while the protocol channel is working a cluster has exactly one master device. The method includes:

101. After the protocol channel between the master device and the slave device fails, the slave device switches to the master role.

102. The slave device sends a cluster detection packet to the master device over the data channel.

The cluster detection packet includes a first cluster detection field. The first cluster detection field includes a first indicator bit, a second indicator bit, and a third indicator bit. The first indicator bit indicates the device role that the slave device sending the cluster detection packet held before the protocol channel failed; the second indicator bit indicates the device role that this slave device holds after the failure; and the third indicator bit indicates that the slave device requests to shut down its ports.

In some application scenarios, the cluster detection packet may be carried as an extension of the Consolidated Link Layer Management (CLM) / Fiber Link Layer Management (FLM) link-layer protocol by adding the first cluster detection field. The field occupies 2 bytes and is defined as follows. Bits 1-2 are the first indicator bits and give the cluster device's role before the protocol channel failure: 01 means the slave role and 10 means the master role. Bits 3-4 are the second indicator bits and give the cluster device's current role: 01 means the slave role and 10 means the master role. Bits 5-6 indicate whether down is requested: 01 means down is requested. For example, in a cluster detection packet the first cluster detection field may read 01100100 00000000.
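
The field layout above can be checked with a short packing routine. This is a sketch under assumptions: it takes "bit 1" to be the most significant bit of the first byte (which matches the sample value 01100100 00000000) and leaves the remaining ten bits at zero. The function and constant names are hypothetical and are not part of CLM/FLM.

```python
ROLE_SLAVE = 0b01
ROLE_MASTER = 0b10
DOWN_REQUEST = 0b01


def encode_detection_field(previous_role: int, current_role: int, down: int) -> bytes:
    """Pack the three 2-bit indicators into the 2-byte field, bit 1 first."""
    for value in (previous_role, current_role, down):
        if not 0 <= value <= 0b11:
            raise ValueError("each indicator is 2 bits wide")
    # Bits 1-2, 3-4, and 5-6 are the top six bits of the 16-bit field;
    # the remaining ten bits stay zero, as in the sample value.
    word = (previous_role << 14) | (current_role << 12) | (down << 10)
    return word.to_bytes(2, "big")


# A device that was a slave, is now a master, and requests down.
field_bytes = encode_detection_field(ROLE_SLAVE, ROLE_MASTER, DOWN_REQUEST)
print(" ".join(f"{b:08b}" for b in field_bytes))  # 01100100 00000000
```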

103. After receiving the cluster detection packet, the master device sends a cluster response packet to the slave device.

The cluster response packet includes a second cluster detection field. The second cluster detection field includes a first indicator bit, a second indicator bit, and a third indicator bit. The first indicator bit indicates the device role, before the protocol channel failure, of the master device that sends the cluster response packet; the second indicator bit indicates the device role of that master device after the protocol channel failure; and the third indicator bit indicates agreement that the slave device may close its ports.

Correspondingly, the cluster response packet may also be implemented as an extension of the CLM/FLM link-layer protocol, with a second cluster detection field added. The second cluster detection field occupies 2 bytes and is defined as follows. Bits 1-2 are the first indicator bits, representing the cluster device's role before the protocol channel failure: 01 denotes the slave role and 10 denotes the master role. Bits 3-4 are the second indicator bits, representing the cluster device's current role, with the same encoding (01 for slave, 10 for master). Bits 5-6 indicate whether the shutdown (down) is approved: 10 means down is agreed to. For example, in a cluster response packet the second cluster detection field may appear as 10101100 00000000.
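A minimal sketch of how a master device might answer a shutdown request, following the bit definitions stated in the text (bits 5-6 set to 10 for agreement). It assumes bit 1 is the most significant bit of the first byte and that unused bits are zero. The helper names and the decision to ignore any other request are illustrative assumptions, not part of the patent.

```python
from typing import Optional, Tuple

# Hypothetical names; the 2-bit codes are taken from the description.
ROLE_SLAVE, ROLE_MASTER = 0b01, 0b10
DOWN_REQUEST, DOWN_AGREE = 0b01, 0b10


def parse_field(field: bytes) -> Tuple[int, int, int]:
    """Split the first byte into the three 2-bit indicators (MSB first)."""
    if len(field) != 2:
        raise ValueError("cluster detection field must be 2 bytes")
    first_byte = field[0]
    return (first_byte >> 6) & 0b11, (first_byte >> 4) & 0b11, (first_byte >> 2) & 0b11


def build_response_field(request: bytes) -> Optional[bytes]:
    """Master side: agree to a shutdown request, or return None to ignore it."""
    role_before, role_now, down_flag = parse_field(request)
    # Assumption: only a former slave that now claims the master role and
    # requests shutdown is answered in this sketch.
    if (role_before, role_now, down_flag) != (ROLE_SLAVE, ROLE_MASTER, DOWN_REQUEST):
        return None
    # The responder was master before the failure and still is.
    first_byte = (ROLE_MASTER << 6) | (ROLE_MASTER << 4) | (DOWN_AGREE << 2)
    return bytes([first_byte, 0x00])


request = bytes([0b01100100, 0x00])  # detection field sent by the former slave
response = build_response_field(request)
```

Returning `None` for anything other than a well-formed request keeps the master passive, in line with the description's point that no action is taken on the original master device.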

The communication format of the cluster detection packet and the cluster response packet may be one of the formats in Table 1 below, or a variation derived from Table 1; this is not limited in the present invention.

Figure GDA0002552561880000081

Table 1

For the specific meaning of each element in Table 1, refer to Table 2 below.

Figure GDA0002552561880000091

Table 2

In Table 2, CMD stands for cluster-chassis multi-user detection, a protocol that can be used to detect cluster splitting. CMD is one specific implementation of the cluster detection packet or the cluster response packet; other names may also be used, which is not limited in the present invention.

104. After receiving the cluster response packet sent by the master device, the slave device exits the cluster according to the cluster response packet.

105. The slave device closes its ports.

The closed ports include ge0/0/0, the loopback interface, virtual local area network interfaces (vlanif), service ports, and the like. After closing its ports, the slave device may also configure the port state of each closed port as down. Then, once the protocol channel between the slave device and the master device recovers, the slave device whose current role is master reverts to the slave role and reopens its ports. In the present invention, the down state of each port may be cleared automatically by the slave device or manually through the command line.
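Steps 104 and 105 and the recovery behavior can be sketched as a small state holder. This is an assumption-laden illustration: the class, method names, and string states are hypothetical, and the port names are simply the examples listed in the text.

```python
from dataclasses import dataclass, field


def _default_ports() -> dict:
    # Example port names from the description; all start in the "up" state.
    return {"ge0/0/0": "up", "loopback": "up", "vlanif": "up", "service": "up"}


@dataclass
class SlaveDevice:
    """Hypothetical model of a slave that took the master role after a failure."""
    role: str = "master"
    in_cluster: bool = True
    ports: dict = field(default_factory=_default_ports)

    def on_cluster_response(self, agreed: bool) -> None:
        """Steps 104-105: exit the cluster and mark every port as down."""
        if not agreed:
            return
        self.in_cluster = False
        for name in self.ports:
            self.ports[name] = "down"

    def on_protocol_channel_recovered(self) -> None:
        """Revert to the slave role and reopen the ports."""
        self.role = "slave"
        for name in self.ports:
            self.ports[name] = "up"


device = SlaveDevice()
device.on_cluster_response(agreed=True)
device.on_protocol_channel_recovered()
```

The sketch clears the down state automatically on recovery; the text also allows clearing it manually through the command line, which is not modeled here.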

Compared with existing mechanisms, in this embodiment of the present invention, after the protocol channel fails, the slave device that has switched to the master role sends a cluster detection packet to the master device over the data channel and, after receiving the cluster response packet from the master device, forcibly closes its own ports, thereby isolating the original slave device. The original master device is left untouched: protocol sessions between it and its upstream and downstream devices can still be established, so it continues forwarding the original service data. Traffic forwarding on the original master device is therefore unaffected, and repeated BGP flapping on the master and slave devices is avoided.

In other words, when two or more master devices are detected, the original slave device can be isolated by forcibly closing its ports. In steps 104 and 105, no action is taken on the original master device, which continues to forward the original service data, so its traffic forwarding is unaffected. This guarantees that one device at the node remains available while keeping the impact on the current network minimal. In addition, by linking with BFD, the network impact of having two or more master devices can be further reduced to the millisecond level.

Optionally, in some embodiments of the invention, the cluster includes two or more slave devices and the protocol channel between the master device and at least one slave device fails. Because the original slave devices outnumber the original master device in this case, the following rule may also be set:

The master device may send a cluster detection packet to each slave device in the cluster (including slave devices that have switched to the master role and/or slave devices that remain in the slave role). Each slave device receives the cluster detection packet sent by the master device and returns a cluster response packet to the master device, so that the original master device considers the cluster dissolved and closes its own ports (in the same or a similar way as the slave device closes its ports in the preceding description).

It should be noted that when the cluster contains the same number of master devices and slave devices and the protocol channel between them fails, the slave device generally takes priority in sending the cluster detection packet to the master device, and then closes its own ports after receiving the cluster response packet returned by the master device. In other implementations, the master device may instead send the cluster detection packet to the slave device and close its own ports after receiving the cluster response packet returned by the slave device. The present invention does not limit this choice, as long as repeated BGP flapping is avoided and traffic between nodes continues to be forwarded normally.
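One way to read the two rules above is as a simple selection of which side initiates the detection exchange and subsequently closes its ports. The function below is only a sketch of that reading; its name and return values are hypothetical, and the text explicitly allows other choices when the counts are equal.

```python
def choose_initiator(num_masters: int, num_slaves: int) -> str:
    """Return which side sends the cluster detection packet and then
    closes its own ports after receiving the cluster response packet.

    Assumed reading of the description:
    - slaves outnumber masters -> the master initiates and is isolated;
    - equal counts             -> the slave initiates by default.
    """
    if num_masters < 1 or num_slaves < 1:
        raise ValueError("a cluster needs at least one master and one slave")
    return "master" if num_slaves > num_masters else "slave"


print(choose_initiator(1, 1))  # prints "slave"
print(choose_initiator(1, 2))  # prints "master"
```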

For ease of understanding, a specific application scenario is described below to illustrate the cluster management method of the present invention. As shown in FIG. 3, the cluster includes one master chassis (Master) and one slave chassis (Slave), between which a protocol channel (connected via ports #17 and #18) and a data channel (connected, for example, via ports #11 and #12) are established. After the protocol channel fails, the Slave sends a cluster detection packet to the Master over the data channel, and the Master returns a cluster response packet to the Slave. On receiving the cluster response packet, the Slave considers the cluster dissolved, closes its ports, and exits the cluster.
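The sequence in this scenario can be traced with a short simulation. It is a sketch under stated assumptions: the two devices are modeled as plain dictionaries, the channels are implied by the order of operations rather than by real network I/O, and all key names and log strings are hypothetical.

```python
def run_scenario():
    """Trace the FIG. 3 sequence and return (log, master, slave)."""
    log = []
    master = {"role": "master", "ports": "up"}
    slave = {"role": "slave", "ports": "up", "in_cluster": True}

    # Protocol channel (#17 <-> #18) fails: the Slave takes the master role.
    slave["role"] = "master"
    log.append("protocol channel failed; Slave switches to master role")

    # Slave -> Master over the data channel (#11 <-> #12).
    detection = {"role_before": "slave", "role_now": "master", "apply_down": True}
    log.append("Slave sends cluster detection packet over data channel")

    # Master -> Slave: the Master agrees to the shutdown request.
    response = {
        "role_before": master["role"],
        "role_now": master["role"],
        "agree_down": detection["apply_down"],
    }
    log.append("Master returns cluster response packet")

    # The Slave treats the cluster as dissolved, closes its ports, and exits.
    if response["agree_down"]:
        slave["ports"] = "down"
        slave["in_cluster"] = False
        log.append("Slave closes its ports and exits the cluster")

    return log, master, slave


log, master, slave = run_scenario()
for entry in log:
    print(entry)
```

Note that the Master's dictionary is never modified, mirroring the point that the original master device keeps forwarding traffic unchanged.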

The cluster management method of the present invention has been described above. The cluster device that performs this method is described below.

1. Referring to FIG. 4, the cluster device 40 is described. The cluster device is applied to a cluster that includes a master device and a slave device, and the cluster device 40 includes:

a processing module 401, configured to switch the slave device to the master role after the protocol channel between the master device and the slave device fails;

a transceiver module 402, configured to send a cluster detection packet to the master device over the data channel.

The processing module 401 is further configured to, after the cluster response packet sent by the master device is received through the transceiver module 402, exit the cluster according to the cluster response packet and close the ports of the slave device.

In this embodiment of the present invention, after the protocol channel fails, the processing module 401 switches the slave device to the master role and the transceiver module 402 sends a cluster detection packet to the master device over the data channel. After the cluster response packet from the master device is received, the processing module 401 forcibly closes the ports of the slave device, thereby isolating the original slave device. No action is taken on the original master device, which continues to forward the original service data, so its traffic forwarding is unaffected and repeated BGP flapping on the master and slave devices is avoided. This guarantees that one device at the node remains available while keeping the impact on the current network minimal.

Optionally, the cluster detection packet includes a first cluster detection field, and the cluster response packet includes a second cluster detection field.

Optionally, in some embodiments of the invention, the first cluster detection field includes a first indicator bit, a second indicator bit, and a third indicator bit. The first indicator bit indicates the device role, before the protocol channel failure, of the slave device that sends the cluster detection packet; the second indicator bit indicates the device role of that slave device after the protocol channel failure; and the third indicator bit indicates that the slave device requests to close its own ports.

Optionally, in some embodiments of the invention, the second cluster detection field includes a first indicator bit, a second indicator bit, and a third indicator bit. The first indicator bit indicates the device role, before the protocol channel failure, of the master device that sends the cluster response packet; the second indicator bit indicates the device role of that master device after the protocol channel failure; and the third indicator bit indicates agreement that the slave device may close its ports.

Optionally, in some embodiments of the invention, when the cluster includes two or more slave devices, the transceiver module 402 is further configured to:

receive a cluster detection packet sent by the master device and return a cluster response packet to the master device, so that the master device closes its own ports.

Optionally, in some embodiments of the invention, the processing module 401 is further configured to:

after the protocol channel between the slave device and the master device recovers from the failure, restore the slave device from its current master role to the slave role; and

open the ports of the slave device.

Optionally, in some embodiments of the invention, after the slave device closes its ports, the processing module 401 is further configured to:

configure the port state of each closed port of the slave device as down.

It should be noted that, in the embodiment corresponding to FIG. 4, the physical device corresponding to the transceiver module may be a transceiver, and the physical device corresponding to the processing module may be a processor. The apparatus shown in FIG. 4 may have the structure shown in FIG. 5. In that case, the processor and transceiver in FIG. 5 implement functions identical or similar to those of the processing module and transceiver module provided in the foregoing apparatus embodiment, and the memory in FIG. 5 stores the program code that the processor invokes when executing the cluster management method.

In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.

Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the systems, apparatuses, and modules described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules is only a division by logical function; in an actual implementation there may be other divisions, such as combining multiple modules or components, integrating them into another system, or omitting or not executing some features. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or modules, and may be electrical, mechanical, or of other forms.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.

If the integrated module is implemented as a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The technical solutions provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the descriptions of the above embodiments are intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (6)

1. A cluster management method, applied to a cluster comprising a master device and a slave device, the method comprising:
when a protocol channel between the master device and the slave device fails, the slave device switches to a master device role;
the slave device sends a cluster detection packet to the master device through a data channel;
after receiving a cluster response packet sent by the master device, the slave device exits the cluster according to the cluster response packet;
the slave device closes a port of the slave device;
the cluster detection packet comprises a first cluster detection field, and the cluster response packet comprises a second cluster detection field;
the first cluster detection field includes a first indication bit, a second indication bit and a third indication bit, the first indication bit in the first cluster detection field is used to indicate a device role of the slave device sending the cluster detection packet before the protocol channel failure, the second indication bit in the first cluster detection field indicates a device role of the slave device sending the cluster detection packet after the protocol channel failure, and the third indication bit in the first cluster detection field indicates that the slave device applies for closing a port of the slave device;
the second cluster detection field includes a first indication bit, a second indication bit and a third indication bit, the first indication bit in the second cluster detection field is used to indicate a device role of the master device sending the cluster response packet before the protocol channel failure, the second indication bit in the second cluster detection field indicates a device role of the master device sending the cluster response packet after the protocol channel failure, and the third indication bit in the second cluster detection field indicates agreement that the slave device close a port of the slave device;
when two or more slave devices are included in the cluster, the method further comprises:
among the two or more slave devices, a slave device whose protocol channel with the master device has failed receives a cluster detection packet sent by the master device and returns a cluster response packet to the master device, so that the master device closes a port of the master device.
2. The method of claim 1, further comprising:
after the protocol channel between the slave device and the master device recovers from the failure, the slave device whose current device role is the master device role reverts to the slave device role;
the slave device opens a port of the slave device.
3. The method of claim 1, wherein after the slave device closes the port of the slave device, the method further comprises:
the slave device configures a port state of each closed port of the slave device to a down state.
4. A cluster device, applied to a cluster comprising a master device and a slave device, the cluster device comprising:
a processing module, configured to switch the slave device to a master device role after a protocol channel between the master device and the slave device fails;
a transceiver module, configured to send a cluster detection packet to the master device through a data channel;
the processing module is further configured to, after a cluster response packet sent by the master device is received through the transceiver module, exit the cluster according to the cluster response packet, and close a port of the slave device;
the cluster detection packet comprises a first cluster detection field, and the cluster response packet comprises a second cluster detection field;
the first cluster detection field includes a first indication bit, a second indication bit and a third indication bit, the first indication bit in the first cluster detection field is used to indicate a device role of the slave device sending the cluster detection packet before the protocol channel failure, the second indication bit in the first cluster detection field indicates a device role of the slave device sending the cluster detection packet after the protocol channel failure, and the third indication bit in the first cluster detection field indicates that the slave device applies for closing a port of the slave device;
the second cluster detection field includes a first indication bit, a second indication bit and a third indication bit, the first indication bit in the second cluster detection field is used to indicate a device role of the master device sending the cluster response packet before the protocol channel failure, the second indication bit in the second cluster detection field indicates a device role of the master device sending the cluster response packet after the protocol channel failure, and the third indication bit in the second cluster detection field indicates agreement that the slave device close a port of the slave device;
when two or more slave devices are included in the cluster, the transceiver module is further configured to:
receive a cluster detection packet sent by the master device, and return a cluster response packet to the master device, so that the master device closes a port of the master device.
5. The cluster device of claim 4, wherein the processing module is further configured to:
after the protocol channel between the slave device and the master device recovers from the failure, restore the slave device from its current master device role to the slave device role;
open a port of the slave device.
6. The cluster device of claim 4, wherein the processing module, after the slave device closes the port of the slave device, is further configured to:
configuring a port state of each closed port of the slave device to a down state.
CN201611245816.6A 2016-12-29 2016-12-29 Cluster management method and device Expired - Fee Related CN106657355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611245816.6A CN106657355B (en) 2016-12-29 2016-12-29 Cluster management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611245816.6A CN106657355B (en) 2016-12-29 2016-12-29 Cluster management method and device

Publications (2)

Publication Number Publication Date
CN106657355A CN106657355A (en) 2017-05-10
CN106657355B true CN106657355B (en) 2020-10-16

Family

ID=58835941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611245816.6A Expired - Fee Related CN106657355B (en) 2016-12-29 2016-12-29 Cluster management method and device

Country Status (1)

Country Link
CN (1) CN106657355B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107579918B (en) * 2017-08-15 2020-05-12 新华三技术有限公司 Method and device for maintaining neighbor relation
CN112737944B (en) * 2020-12-25 2022-07-08 浪潮思科网络科技有限公司 Bfd-based peer-link state detection method, device and medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101309185A (en) * 2008-07-16 2008-11-19 杭州华三通信技术有限公司 Processing method of multi-active devices and stack member devices in stacking system

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6670882B1 (en) * 1999-07-28 2003-12-30 Cisco Technology, Inc. Multi drop stack bus detector method and apparatus
CN101610182B (en) * 2009-06-26 2011-09-07 杭州华三通信技术有限公司 Multi-primary apparatus conflict detection method in stack and stack member apparatus
CN101714932B (en) * 2009-12-03 2012-01-04 杭州华三通信技术有限公司 MAD testing method and device for IRF stacker
US8634662B2 (en) * 2010-08-25 2014-01-21 Apple Inc. Detecting recurring events in consumer image collections
CN102457402B (en) * 2010-10-14 2014-07-16 杭州华三通信技术有限公司 Method for detecting multiple active equipment conflict and apparatus thereof
CN102209008A (en) * 2011-05-18 2011-10-05 杭州华三通信技术有限公司 Multi-activation detection method and device used for intelligent elastic framework
CN102355366B (en) * 2011-08-24 2014-12-10 杭州华三通信技术有限公司 Member-stacking device and method for managing member-stacking device at split stacking moment
CN102315975B (en) * 2011-10-17 2014-03-19 杭州华三通信技术有限公司 Fault processing method based on intelligent resilient framework (IRF) system and equipment thereof
CN102724069B (en) * 2012-06-14 2015-04-22 福建星网锐捷网络有限公司 Collision detection method, device and network device of dual-master device in thermal staking system
CN103001831B (en) * 2012-12-19 2016-03-09 迈普通信技术股份有限公司 A kind of system and method testing many activation detection perform
CN103166811B (en) * 2013-03-06 2016-12-28 杭州华三通信技术有限公司 A kind of MAD detection method and equipment
CN103560955B (en) * 2013-10-24 2016-09-28 华为技术有限公司 Redundance unit changing method and device
CN103825766B (en) * 2014-02-28 2017-04-12 杭州华三通信技术有限公司 Device and method for detecting BFD links

Also Published As

Publication number Publication date
CN106657355A (en) 2017-05-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20201016

Termination date: 20211229