
CN120729696A - Network fault perception method, device, network equipment and program product - Google Patents

Network fault perception method, device, network equipment and program product

Info

Publication number
CN120729696A
Authority
CN
China
Prior art keywords
address
timestamp
fault
flow
flow table
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202511052242.XA
Other languages
Chinese (zh)
Inventor
张超迪
唐勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Maipu Communication Technology Co Ltd
Original Assignee
Maipu Communication Technology Co Ltd
Application filed by Maipu Communication Technology Co Ltd
Priority to CN202511052242.XA
Publication of CN120729696A
Current legal status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates to the field of data communications and provides a network fault perception method, apparatus, network equipment, and program product. The method comprises: polling flow table entries, each comprising a first address, a second address, a first timestamp, and a second timestamp, where the first timestamp is the most recent collection time of traffic destined to the first address, the second timestamp is the most recent collection time of traffic destined to the second address, and the first and second addresses are respectively the source and destination addresses of the same connection-oriented data flow; and, if the interval between the first timestamp and the second timestamp exceeds a preset silent fault timeout, determining that a silent fault has occurred in the traffic between the first address and the second address. The present invention thereby enables automatic perception of silent faults.

Description

Network fault sensing method, device, network equipment and program product
Technical Field
The present invention relates to network monitoring and high-reliability technologies in the field of data communications, and in particular to a network fault sensing method, device, network equipment, and program product.
Background
As networks grow in scale and complexity, and as requirements on network quality of service and reliability rise, traditional network monitoring and fault management methods increasingly reveal their shortcomings. In particular, intelligent computing center networks used in artificial intelligence involve distributed computation, long-running jobs, and real-time responses, and are therefore extremely sensitive to network faults. Users place great emphasis on business continuity and quality of service, and enterprises and service providers increasingly rely on networks to support critical business processes and deliver online services. Any network disruption or performance degradation can cause serious economic loss and customer dissatisfaction. Rapid and accurate identification and repair of network faults has therefore become key to maintaining service continuity and improving quality of service.
Traditional network fault convergence technology mainly relies on changes in port physical state and similar signals to perceive network faults. However, when a port's physical state is normal but traffic cannot be forwarded, for example because of configuration errors, abnormal forwarding table entries, or faulty forwarding components, no fault is triggered; such faults are called silent faults. Current network quality detection methods such as RTR (Ready To Receive) or BFD (Bidirectional Forwarding Detection) cannot detect them either, because the probe session packets themselves are not dropped. The industry therefore mainly relies on manual intervention to repair such faults after their causes have been identified, a process that can take several hours and seriously affects services.
Disclosure of Invention
The invention aims to provide a network fault sensing method, a network fault sensing device, network equipment and a program product, which can automatically identify a silent fault in a network.
Embodiments of the invention may be implemented as follows:
In a first aspect, the present invention provides a network failure awareness method, the method comprising:
polling flow table entries, wherein each flow table entry comprises a first address, a second address, a first timestamp, and a second timestamp, the first timestamp is the most recent collection time of traffic whose destination address is the first address, the second timestamp is the most recent collection time of traffic whose destination address is the second address, and the first address and the second address are respectively the source address and the destination address of the same connection-oriented data flow;
and if the interval between the first timestamp and the second timestamp is longer than a preset silent fault timeout, determining that a silent fault has occurred in the traffic between the first address and the second address.
In an alternative embodiment, after the step of determining that the traffic between the first address and the second address has a silence failure, the method further comprises:
taking the larger timestamp of the first timestamp and the second timestamp as a target timestamp;
and determining, as the fault traffic direction, the traffic direction toward the destination address that corresponds to the target timestamp in the flow table entry.
In an optional embodiment, the flow table entry further includes a plurality of egresses corresponding to each destination address, and after the traffic direction toward the destination address corresponding to the target timestamp is determined as the fault traffic direction, the method further comprises:
determining the path corresponding to the egress currently in use among the plurality of egresses as a fault path;
selecting the path corresponding to a currently unused egress among the plurality of egresses as a target switching path, and switching the fault path to the target switching path.
In an alternative embodiment, before the step of polling the flow table entry, the method further comprises:
enabling the IPFIX hardware function in both the ingress and egress directions of the switching chip;
setting the collection period of the IPFIX hardware function according to the silent fault sensing time, wherein the collection period is shorter than the silent fault sensing time;
and collecting connection-oriented data flows, and creating and updating the corresponding flow table entries in real time.
In a second aspect, the present invention provides a network failure sensing apparatus, the apparatus comprising:
a polling module, configured to poll flow table entries, wherein each flow table entry comprises a first address, a second address, a first timestamp, and a second timestamp, the first timestamp is the most recent collection time of traffic whose destination address is the first address, the second timestamp is the most recent collection time of traffic whose destination address is the second address, and the first address and the second address are respectively the source address and the destination address of the same connection-oriented data flow;
and a determining module, configured to determine that a silent fault has occurred in the traffic between the first address and the second address if the interval between the first timestamp and the second timestamp is longer than a preset silent fault timeout.
In an alternative embodiment, the determining module is further configured to:
taking the larger timestamp of the first timestamp and the second timestamp as a target timestamp;
and determining, as the fault traffic direction, the traffic direction toward the destination address that corresponds to the target timestamp in the flow table entry.
In an optional embodiment, the flow table entry further includes a plurality of egresses corresponding to each destination address, and the determining module is further configured for:
determining the path corresponding to the egress currently in use among the plurality of egresses as a fault path;
selecting the path corresponding to a currently unused egress among the plurality of egresses as a target switching path, and switching the fault path to the target switching path.
In an alternative embodiment, the apparatus further comprises an acquisition module for:
enabling the IPFIX hardware function in both the ingress and egress directions of the switching chip;
setting the collection period of the IPFIX hardware function according to the silent fault sensing time, wherein the collection period is shorter than the silent fault sensing time;
and collecting connection-oriented data flows, and creating and updating the corresponding flow table entries in real time.
In a third aspect, the present invention provides a network device comprising a processor and a memory, the memory being for storing a computer program, the processor being for implementing the network failure awareness method of any of the preceding embodiments when executing the computer program.
In a fourth aspect, the present invention provides a program product which, when executed by a processor, implements a network failure awareness method according to any of the preceding embodiments.
Compared with the prior art, the invention has the following beneficial effects:
For the same connection-oriented data flow between a first address and a second address, the invention uses a flow table entry to record a first timestamp, the most recent collection time of traffic destined to the first address, and a second timestamp, the most recent collection time of traffic destined to the second address. By checking whether the interval between the two timestamps exceeds a preset silent fault timeout, it determines whether a silent fault has occurred in the traffic between the two addresses, thereby achieving automatic perception of silent faults.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an exemplary diagram of an application scenario provided in this embodiment.
Fig. 2 is a block diagram of a network device according to the present embodiment.
Fig. 3 is an exemplary diagram of functions of layers in a forwarding plane in the network failure awareness process provided in the present embodiment.
Fig. 4 is a flowchart illustrating a network failure sensing method according to the present embodiment.
Fig. 5 is a block diagram of a network fault sensing apparatus according to this embodiment.
Reference numerals: 10 - network device; 11 - processor; 12 - memory; 13 - bus; 100 - network fault sensing apparatus; 110 - polling module; 120 - determining module; 130 - acquisition module.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the present invention, it should be noted that terms such as "upper", "lower", "inner", and "outer", if used, indicate orientations or positional relationships based on those shown in the drawings, or on those in which the inventive product is conventionally placed in use. They are used merely for convenience and simplicity of description, do not indicate or imply that the referenced apparatus or element must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, if any, are used merely for distinguishing between descriptions and not for indicating or implying a relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other provided they do not conflict.
Referring to fig. 1, fig. 1 is an exemplary diagram of an application scenario provided in this embodiment, in fig. 1, a server accesses a network through a Leaf device, and ECMP (Equal-Cost Multi-Path) is formed between the Leaf device and a Spine device, so as to implement traffic load sharing. ECMP is a network routing strategy that allows traffic to travel to the same destination through multiple paths of the same cost. Leaf devices, also known as Leaf node switches, are located at the edge layer of the topology, directly connected to servers, storage devices, or terminals. The Spine device, also called a Spine node switch, is located at the core layer of the topology and connects all Leaf devices to form a fully interconnected structure.
Specifically, in FIG. 1, Server1 connects directly to the Leaf1 device and Server2 connects directly to the Leaf2 device, while both Leaf1 and Leaf2 connect directly to both Spine1 and Spine2. The ECMP group on the Leaf1 device therefore contains two paths, Leaf1->Spine1->Leaf2 and Leaf1->Spine2->Leaf2, and the ECMP group on the Leaf2 device contains two paths, Leaf2->Spine1->Leaf1 and Leaf2->Spine2->Leaf1.
Once a silent fault occurs in the network, the interruption lasts a long time and greatly affects upper-layer services. For online transaction applications, for example, continuous packet loss causes transactions to fail and significantly degrades application performance. Taking the Leaf1 device as an example: in the normal state, it computes a hash value from packet characteristics such as the 5-tuple and forwards the service flow along the path Leaf1->Spine1->Leaf2. When a silent fault, such as an abnormal forwarding table entry or a faulty forwarding component, occurs on the Leaf1->Spine1->Leaf2 path (the faulty forwarding path in FIG. 1), the Server1-Server2 service flow is forwarded abnormally and should be switched to the switched forwarding path shown in FIG. 1. However, conventional fault convergence technology cannot quickly identify the silent fault and therefore cannot switch paths in time.
It should be further noted that FIG. 1 is only a simple example of a network implementation. In practice, a network may contain many more Leaf and Spine devices, so a silent fault that is not recognized and handled in time can have even more serious consequences.
In view of this, this embodiment provides an implementation that can quickly sense silent faults in the application scenario of FIG. 1, so that the Leaf1 device promptly senses a silent fault on the current path and quickly switches the forwarding path of the service flow to Leaf1->Spine2->Leaf2, ensuring fast service recovery. This is described in detail below.
First, this embodiment provides a block diagram of a network device 10. The network device 10 may be a Leaf device or a Spine device in FIG. 1, which is not limited in this embodiment. Referring to FIG. 2, which is a block diagram of the network device according to this embodiment, the network device 10 implements the network fault sensing method of this embodiment and includes a processor 11, a memory 12, and a bus 13, the processor 11 and the memory 12 being connected through the bus 13.
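The per-flow hashing described above can be sketched as follows. This is a minimal illustration, assuming CRC32 over the 5-tuple as the hash; the names `FiveTuple`, `ecmp_select`, and `LEAF1_ECMP_PATHS`, as well as any addresses and ports used with them, are hypothetical, and real switching chips use their own configurable hash functions and field selections.

```python
import zlib
from typing import List, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int  # IANA protocol number, e.g. 6 for TCP
    src_port: int
    dst_port: int


# Equal-cost paths on Leaf1 toward Leaf2, as in FIG. 1 (labels are illustrative).
LEAF1_ECMP_PATHS = ["Leaf1->Spine1->Leaf2", "Leaf1->Spine2->Leaf2"]


def ecmp_select(flow: FiveTuple, paths: List[str]) -> str:
    """Map a flow to one of several equal-cost paths by hashing its 5-tuple.

    CRC32 is a stand-in (assumption): switching chips use their own,
    usually configurable, hash functions and field sets.
    """
    key = "|".join(str(field) for field in flow)
    return paths[zlib.crc32(key.encode()) % len(paths)]
```

Because every packet of a session hashes to the same member path, a silent fault on that member keeps dropping the whole session until the egress is changed, which is why the fault has to be sensed and the path switched explicitly.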
The processor 11 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the network fault sensing method of this embodiment may be completed by integrated hardware logic circuits in the processor 11 or by instructions in the form of software. The processor 11 may be a general-purpose processor, such as a CPU (Central Processing Unit) or an NP (Network Processor), or it may be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The memory 12 stores a program that implements the network fault sensing method. The program may be a software function module stored in the memory 12 in the form of software or firmware, or embedded in the OS (Operating System) of the network device 10.
After receiving the execution instruction, the processor 11 executes a program to implement the network failure sensing method of the present embodiment.
After analyzing why the prior art cannot quickly identify silent faults, the inventors found that a silent fault is a service-session-level fault: when it occurs, RTR and BFD packets are not dropped and are still forwarded normally, so the fault goes undetected. The inventors therefore took a different approach that avoids such missed detections: identifying silent faults on the forwarding plane. The forwarding plane, also called the data plane, is responsible for fast data forwarding; it moves packets from ingress to egress according to predefined rules (such as routing tables and forwarding tables). Forwarding-plane processing is typically performed by a switch or router with a hardware forwarding engine, such as a Leaf or Spine device in FIG. 1.
Before the network fault sensing method of this embodiment is introduced, the core functions that each layer of the forwarding plane performs during network fault sensing are described. Referring to FIG. 3, which illustrates these functions, the forwarding plane includes a control layer, a data layer, and a forwarding layer. The control layer generates forwarding rules (such as flow tables, routing tables, and policy rules) and issues them to the data layer or the forwarding layer; the data layer stores and manages forwarding table entries (such as MAC tables and flow tables); and the forwarding layer performs the actual packet forwarding at high speed according to the entries provided by the data layer. To sense silent faults promptly, this embodiment collects traffic in real time at the forwarding layer and creates or updates flow table entries, which record the collected traffic information, at the data layer. The control layer analyzes the traffic information in the flow table entries, makes a fault decision based on the analysis to determine whether a silent fault has occurred, and, once one has occurred, can further switch away from the fault path.
Based on the functions provided by the layers of the forwarding plane of fig. 3, the network failure sensing method provided in this embodiment will be described in detail below. Referring to fig. 4, fig. 4 is a flowchart illustrating a network failure sensing method according to the present embodiment, and the method includes the following steps:
In step S101, flow table entries are polled. Each flow table entry includes a first address, a second address, a first timestamp, and a second timestamp; the first timestamp is the most recent collection time of traffic whose destination address is the first address, the second timestamp is the most recent collection time of traffic whose destination address is the second address, and the first address and the second address are respectively the source address and destination address of the same connection-oriented data flow.
In this embodiment, the traffic between the first address and the second address refers to the complete session path formed by the source end and the destination end in bidirectional communication. This traffic is a connection-oriented data flow, which has a definite bidirectional interaction pattern and a packet acknowledgement mechanism: when a data packet is lost because of a network fault, the destination end does not reply with an acknowledgement, and the source end triggers retransmission. For example, TCP establishes connections with a three-way handshake and uses an acknowledgement mechanism; referring to FIG. 1, the TCP data flow between Server1 and Server2 is a connection-oriented data flow.
In this embodiment, a flow table entry tracks the data transmission state of both directions of the same data flow. This keeps flow records unique and comparable, helps avoid duplication or confusion in subsequent fault determination, and allows traffic information for both directions of a flow to be tracked together. Specifically, when a data flow is detected to be connection-oriented, the system merges the data of both directions into a single flow record and uses the addresses at the two ends of the flow as the first address and the second address, respectively. The information recorded in a flow table entry includes, but is not limited to, the first address, the second address, the most recent collection time of traffic destined to the first address, and the most recent collection time of traffic destined to the second address.
In step S102, if the interval between the first timestamp and the second timestamp is longer than the preset silent fault timeout, it is determined that a silent fault has occurred in the traffic between the first address and the second address.
In this embodiment, a silent fault refers to a situation in which the physical layer of the network device is normal but the actual forwarding path is abnormal, so traffic cannot be delivered normally and no explicit alarm is triggered. The silent fault timeout can be configured according to the fault sensing precision the service requires, and can be set at the millisecond level or finer to achieve highly sensitive fault identification. If, for a data flow, the interval between the most recent collection time of traffic destined to one end address and that of traffic destined to the other end address exceeds the preset silent fault timeout, traffic between the two ends is not being transmitted normally, and a silent fault has occurred between them.
With the method of this embodiment, whether a silent fault has occurred in the traffic between the first address and the second address is determined by checking whether the interval between the first and second timestamps collected for the connection-oriented data flow between these addresses exceeds the preset silent fault timeout, thereby achieving automatic perception of silent faults.
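To see why this acknowledgement behavior makes the two collected timestamps diverge, the toy simulation below tracks what a switch next to host A (Leaf1 in FIG. 1) would collect. It is a sketch under simplifying assumptions, not a TCP model: the function `simulate_silent_drop`, the fixed 200 ms retransmission interval, and the 1 ms reply delay are all hypothetical, and real TCP backs off exponentially.

```python
from typing import Dict, Optional


def simulate_silent_drop(drop_after_ms: int, duration_ms: int,
                         rto_ms: int = 200) -> Dict[str, Optional[int]]:
    """Toy model of a connection-oriented session seen at the switch next to host A.

    Every rto_ms, A sends (or retransmits) a segment to B; the switch collects it
    before forwarding. While the path beyond the switch works, B's reply passes
    back through the switch 1 ms later and is collected too. From drop_after_ms
    on, A's segments are silently dropped beyond the switch, so B has nothing to
    acknowledge, but A keeps retransmitting. Exponential backoff is ignored.

    Returns the last collection time (ms) per destination address.
    """
    last_seen: Dict[str, Optional[int]] = {"A": None, "B": None}
    for t in range(0, duration_ms + 1, rto_ms):
        last_seen["B"] = t              # A -> B segment reaches the collection point
        if t < drop_after_ms:
            last_seen["A"] = t + 1      # B -> A reply is collected on its way back
    return last_seen
```

For example, `simulate_silent_drop(drop_after_ms=1000, duration_ms=3000)` leaves traffic destined to B last seen at 3000 ms but traffic destined to A last seen at 801 ms, while a run with no drop keeps the two about 1 ms apart. After a silent drop, the gap keeps growing for as long as A retransmits.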
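One way to merge both directions into a single record is to key entries by the unordered address pair. The sketch below assumes string addresses compared lexicographically (any fixed total order would do) and uses the hypothetical names `flow_key` and `FlowEntry`; it is not the entry layout of any particular switching chip.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def flow_key(src: str, dst: str) -> Tuple[str, str]:
    """Order-independent key: both directions of a flow map to the same entry."""
    return (src, dst) if src <= dst else (dst, src)


@dataclass
class FlowEntry:
    first_addr: str
    second_addr: str
    first_ts_ms: Optional[int] = None   # latest collection time of traffic destined to first_addr
    second_ts_ms: Optional[int] = None  # latest collection time of traffic destined to second_addr

    @classmethod
    def for_packet(cls, src: str, dst: str) -> "FlowEntry":
        """Create an empty entry for the pair, whichever direction was seen first."""
        first, second = flow_key(src, dst)
        return cls(first, second)
```

With this key, a packet from 10.0.0.2 to 10.0.0.1 and one from 10.0.0.1 to 10.0.0.2 land in the same entry, so the two timestamps are always compared within one record.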
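Steps S101 and S102 amount to one pass over the flow table per polling interval. The sketch below is a minimal illustration; `FlowEntry` and `poll_silent_faults` are hypothetical names, and skipping entries in which only one direction has been seen so far is an assumption the text does not address.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class FlowEntry:
    first_addr: str
    second_addr: str
    first_ts_ms: Optional[int]   # latest collection time of traffic destined to first_addr
    second_ts_ms: Optional[int]  # latest collection time of traffic destined to second_addr


def poll_silent_faults(entries: Iterable[FlowEntry],
                       timeout_ms: int) -> List[Tuple[str, str]]:
    """One polling pass: report address pairs whose timestamps differ by more than timeout_ms."""
    faults = []
    for e in entries:
        if e.first_ts_ms is None or e.second_ts_ms is None:
            continue  # assumption: a pair with an unseen direction is not judged
        if abs(e.first_ts_ms - e.second_ts_ms) > timeout_ms:
            faults.append((e.first_addr, e.second_addr))
    return faults
```

A session that ends normally stops updating both timestamps at about the same time, so its gap stays small and it is not reported; only a one-sided stall trips the check.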
In an optional implementation, after it is determined that a silent fault has occurred in the traffic between the first address and the second address, this embodiment further determines the fault traffic direction, in order to identify precisely which direction the silent fault affects:
taking the larger of the first timestamp and the second timestamp as a target timestamp;
and determining, as the fault traffic direction, the traffic direction toward the destination address that corresponds to the target timestamp in the flow table entry.
In this embodiment, a silent fault usually manifests as data in one direction failing to be transmitted normally. When the first and second timestamps differ significantly, the larger timestamp shows that the sender is still actively transmitting (including retransmitting) in the corresponding direction, while the smaller timestamp shows that traffic in the other direction, such as the peer's acknowledgements, has stopped arriving. The traffic direction toward the destination address corresponding to the target timestamp is therefore determined as the fault traffic direction; that is, the path on which the communication peer sends data to that destination address may be abnormal. For example, let the first and second addresses be address 1 and address 2 with timestamps timestamp 1 and timestamp 2, respectively, where the interval between them exceeds the preset silent fault timeout and timestamp 1 is smaller than timestamp 2. Timestamp 2 is then the target timestamp: traffic destined to address 2 is still being collected because address 1 keeps sending and retransmitting, whereas traffic destined to address 1 has stopped because address 2 is not receiving data and so does not reply. This indicates packet loss in the traffic from address 1 to address 2, and the fault point lies on the path over which address 1 sends to address 2.
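The direction rule in the two steps above (take the larger timestamp as the target timestamp; the fault direction is the one toward its destination address) reduces to a single comparison. A minimal sketch with a hypothetical `fault_direction` helper, assuming it is only called for entries already flagged by the timeout check:

```python
from typing import Tuple


def fault_direction(first_addr: str, second_addr: str,
                    first_ts_ms: int, second_ts_ms: int) -> Tuple[str, str]:
    """Return (source, destination) of the suspected faulty traffic direction.

    The destination is the address whose timestamp is larger (the target
    timestamp); its peer is the source. Intended to be called only after a
    silent fault has been detected, so the timestamps differ by more than
    the timeout and cannot be equal.
    """
    if first_ts_ms > second_ts_ms:
        return second_addr, first_addr
    return first_addr, second_addr
```

For instance, `fault_direction("addr1", "addr2", 801, 3000)` returns `("addr1", "addr2")`: traffic destined to addr2 is still being seen, so the direction from addr1 to addr2 is treated as faulty.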
In an optional implementation, besides determining the silent fault and its fault traffic direction, this embodiment further provides automatic switching away from the path on which the silent fault occurred, to improve the reliability of network transmission. Here the flow table entry further records a plurality of egresses for each destination address:
determining the path corresponding to the egress currently in use among the plurality of egresses as a fault path;
and selecting the path corresponding to a currently unused egress among the plurality of egresses as a target switching path, and switching the fault path to the target switching path.
In this embodiment, the plurality of egresses of a destination address may be the physical or logical interfaces that the network device uses when forwarding data; each indicates a specific transmission channel from the local device to a next-hop node. In the application scenario of FIG. 1, when the Leaf1 device detects that a silent fault has occurred in the forward data flow between Server1 and Server2 because the Spine1 node is forwarding abnormally, the system determines the path Leaf1->Spine1->Leaf2, which originally forwarded through Spine1, as the fault path, and selects one of the other available paths, for example Leaf1->Spine2->Leaf2, as the target switching path. The fault path is then switched to the target switching path, which carries subsequent traffic between Server1 and Server2.
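The switch-over can be sketched as below. `EgressGroup` and `switch_from_fault` are hypothetical, and remembering failed egresses so that the device does not switch back onto them is an assumption added for illustration; on a real device the change would be programmed into the switching chip's ECMP group rather than kept in a Python object.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class EgressGroup:
    """Hypothetical per-destination record of candidate egresses (ECMP members)."""
    egresses: List[str]
    active: str
    failed: Set[str] = field(default_factory=set)


def switch_from_fault(group: EgressGroup) -> Optional[str]:
    """Mark the egress in use as the fault path and move to an unused, non-failed egress.

    Returns the new egress, or None if no alternative remains (group.active
    is then left unchanged).
    """
    group.failed.add(group.active)
    for candidate in group.egresses:
        if candidate != group.active and candidate not in group.failed:
            group.active = candidate
            return candidate
    return None
```

Applied to the FIG. 1 scenario, a group holding the two Leaf1 paths with Leaf1->Spine1->Leaf2 active moves to Leaf1->Spine2->Leaf2; a second fault on that path finds no alternative and returns `None`.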
In an optional implementation, flow table entries must be properly managed and updated so that they reflect the latest traffic state and silent faults can be determined accurately from them. This embodiment further provides an implementation for managing flow table entries:
(1) Collect and monitor traffic;
(2) If the monitored traffic includes a connection-oriented target flow and no corresponding flow table entry exists, create a flow table entry and update it according to the target flow;
(3) If the monitored traffic includes a connection-oriented target flow and a corresponding flow table entry exists, update the flow table entry according to the target flow.
In this embodiment, only connection-oriented traffic (such as data traffic carried over TCP) is analyzed and processed, which greatly reduces interference from irrelevant traffic. When the monitored traffic includes a connection-oriented target flow, a flow table entry is first created if none exists and then updated according to the target flow; if an entry already exists, its information is updated to reflect the latest traffic state.
In a specific implementation, depending on the network scale, the network may carry many bidirectional flows. Whether a given bidirectional flow is being added to the flow table for the first time can be determined from its first and second addresses. If so, the first address, the second address, and the collection time of the traffic are added as a new flow table entry; otherwise, only the timestamps of the corresponding entry need to be updated. For example, suppose the network carries traffic between address 1 and address 2 and traffic between address 3 and address 4. When traffic between address 1 and address 2 is collected for the first time, address 1, address 2, and the corresponding first and second timestamps are added as a flow table entry. When such traffic is collected subsequently, the existing entry is located by address 1 and address 2, and only the corresponding first and second timestamps are updated according to the collection time.
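The create-or-update logic can be sketched with a dictionary keyed by the unordered address pair. This is a minimal sketch; `record_sample` and `FlowEntry` are hypothetical names, and each sample is assumed to have already passed the connection-oriented check described above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class FlowEntry:
    first_addr: str
    second_addr: str
    first_ts_ms: Optional[int] = None   # traffic destined to first_addr
    second_ts_ms: Optional[int] = None  # traffic destined to second_addr


def record_sample(table: Dict[Tuple[str, str], FlowEntry],
                  src: str, dst: str, ts_ms: int) -> FlowEntry:
    """Create the entry on first sight of the address pair, then refresh the
    timestamp of the direction whose destination is dst."""
    key = (src, dst) if src <= dst else (dst, src)
    entry = table.get(key)
    if entry is None:                      # first time this pair is collected
        entry = FlowEntry(*key)
        table[key] = entry
    if dst == entry.first_addr:
        entry.first_ts_ms = ts_ms
    else:
        entry.second_ts_ms = ts_ms
    return entry
```

Mirroring the example in the text, samples for address 1 and address 2 in either direction all update one entry, while traffic between address 3 and address 4 gets its own entry.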
In an alternative implementation, this embodiment makes full use of existing hardware. It collects the monitored traffic with the IPFIX (IP Flow Information Export) function provided by the switching chip of the network device:
enabling the IPFIX hardware function in both the ingress and egress directions of the switching chip;
setting the acquisition period of the IPFIX hardware function according to the silent fault perception time, the acquisition period being shorter than the silent fault perception time;
and collecting connection-oriented data flows, creating the corresponding flow table entries, and updating them in real time.
In this embodiment, IPFIX is a flow-based packet statistics technique used to compute and export flow information. The silent fault perception time is the longest time the user will tolerate before a silent fault is detected. The IPFIX acquisition period is set below this value so that traffic changes are captured as soon as possible after a fault occurs, which meets the requirement for a rapid fault response. For example, if the expected silent fault perception time is 100 milliseconds, the IPFIX acquisition period may be set to 80 milliseconds or less to make fault identification more sensitive.
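The only hard constraint stated here is that the acquisition period must be shorter than the perception time. The sketch below enforces that constraint. The `margin` factor is a hypothetical operator-chosen parameter, not something the source defines. A value of 0.8 reproduces the 100 ms to 80 ms example.

```python
def acquisition_period_ms(perception_ms: float, margin: float = 0.8) -> float:
    """Return an IPFIX acquisition period strictly shorter than the perception time.

    `margin` is a hypothetical safety factor (assumption, not from the source).
    """
    if perception_ms <= 0:
        raise ValueError("perception time must be positive")
    if not 0 < margin < 1:
        raise ValueError("margin must be in (0, 1) so the period stays below the perception time")
    return perception_ms * margin


period = acquisition_period_ms(100)
print(period)  # 80.0
```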
In this embodiment, the IPFIX function of the switching chip is enabled in both the ingress and egress directions so that traffic in both directions is collected. After receiving the flow information reported by the switching chip, the network device checks whether each flow is a connection-oriented data flow. This ensures that only connection-oriented data flows are recorded.
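The filtering step can be sketched as follows. The sketch assumes that each reported record has already been decoded into a Python dict keyed by IPFIX information element names such as `protocolIdentifier`. That record shape and the decoding step are assumptions, since the source does not describe them. Following the TCP example given earlier, only TCP (IANA protocol number 6) is treated as connection-oriented.

```python
TCP = 6  # IANA-assigned IP protocol number for TCP


def connection_oriented(records: list[dict]) -> list[dict]:
    """Keep only records of connection-oriented (here: TCP) traffic.

    Records are assumed to be dicts keyed by IPFIX information element names.
    """
    return [r for r in records if r.get("protocolIdentifier") == TCP]


reported = [
    {"sourceIPv4Address": "10.0.0.1", "destinationIPv4Address": "10.0.0.2", "protocolIdentifier": 6},
    {"sourceIPv4Address": "10.0.0.1", "destinationIPv4Address": "10.0.0.9", "protocolIdentifier": 17},  # UDP
]
print(len(connection_oriented(reported)))  # 1
```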
In an alternative embodiment, the target traffic may either update flow information already recorded in the flow table entry or add new flow information to it. This embodiment provides one processing method for each of the two cases:
if the flow table entry does not include the first timestamp and/or the second timestamp, adding the first address and the first timestamp and/or the second address and the second timestamp to the flow table entry according to the target traffic;
In this embodiment, a flow table entry that lacks the first timestamp and/or the second timestamp means that the target traffic has been collected for the first time and has not yet been added to the flow table. The first address, the first timestamp, the second address and the second timestamp therefore need to be added as follows:
taking the collection timestamp of the traffic in the target traffic whose destination address is the first address as the first timestamp, and adding the first address and the first timestamp to the flow table entry;
and taking the collection timestamp of the traffic in the target traffic whose destination address is the second address as the second timestamp, and adding the second address and the second timestamp to the flow table entry.
In this embodiment, one implementation compares the source address and destination address of the target traffic by value. The smaller address is defined as the first address and the larger address as the second address. As a result, traffic in both directions is correctly identified and recorded in the same flow table entry, whether it is sent from the first address to the second address or from the second address to the first address.
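A minimal sketch of this value-based ordering, using Python's standard `ipaddress` module. It assumes that both addresses belong to the same IP version, because `ipaddress` refuses to compare an IPv4 address with an IPv6 address. The function name is illustrative.

```python
import ipaddress


def canonical_pair(src: str, dst: str) -> tuple[str, str]:
    """Order two addresses by numeric value: smaller -> first address, larger -> second.

    Assumes both addresses are the same IP version.
    """
    a, b = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    first, second = (a, b) if a < b else (b, a)
    return str(first), str(second)


# Both directions of the same flow map to the same entry.
print(canonical_pair("10.0.0.2", "10.0.0.1"))   # ('10.0.0.1', '10.0.0.2')
print(canonical_pair("10.0.0.1", "10.0.0.2"))   # ('10.0.0.1', '10.0.0.2')
# Numeric, not string, comparison: 10.0.0.9 < 10.0.0.10.
print(canonical_pair("10.0.0.10", "10.0.0.9"))  # ('10.0.0.9', '10.0.0.10')
```

Comparing parsed addresses rather than raw strings avoids ordering mistakes such as "10.0.0.10" sorting before "10.0.0.9".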
and if the flow table entry includes the first timestamp and/or the second timestamp, updating the first timestamp and/or the second timestamp according to the target traffic.
In this embodiment, if the flow table entry includes the first timestamp, the target traffic is not being collected for the first time. No new entry needs to be added in this case. Instead, the first timestamp is updated to the collection time of the target traffic, and the second timestamp is handled in the same way.

For example, suppose the flow table is empty and the target traffic collected at t1 is bidirectional traffic between address 1 and address 2. The first address is then address 1, the second address is address 2, and both the first and second timestamps are t1, so address 1, address 2 and both timestamps are added to the flow table entry. If the target traffic collected at t2 is traffic sent from address 1 to address 2, the second timestamp is updated to t2 and the first timestamp is left unchanged.
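The per-direction rule can be sketched as follows. Each collected record updates only the timestamp of the address it was destined to, and the worked example from the text is replayed at the end. The `FlowEntry` type, the `observe` helper and the concrete addresses standing in for "address 1" and "address 2" are illustrative assumptions.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowEntry:
    first_addr: str
    second_addr: str
    first_ts: Optional[float] = None   # latest collection time of traffic destined to first_addr
    second_ts: Optional[float] = None  # latest collection time of traffic destined to second_addr


def observe(table: dict, src: str, dst: str, ts: float) -> FlowEntry:
    """Hypothetical helper: record one direction of a connection-oriented flow collected at ts."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    key = (str(min(s, d)), str(max(s, d)))  # smaller address is the first address
    if key not in table:
        table[key] = FlowEntry(*key)          # first collection: create the entry
    entry = table[key]
    # Update only the timestamp of the direction whose destination was observed.
    if str(d) == entry.first_addr:
        entry.first_ts = ts
    else:
        entry.second_ts = ts
    return entry


# Worked example: address 1 = 10.0.0.1, address 2 = 10.0.0.2 (assumed values).
table: dict = {}
t1, t2 = 1.0, 2.0
observe(table, "10.0.0.1", "10.0.0.2", t1)          # bidirectional traffic at t1 ...
observe(table, "10.0.0.2", "10.0.0.1", t1)          # ... sets both timestamps to t1
entry = observe(table, "10.0.0.1", "10.0.0.2", t2)  # traffic from address 1 to address 2 at t2
print(entry.first_ts, entry.second_ts)  # 1.0 2.0
```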
An implementation of the network fault perception device 100 is given below for carrying out the above embodiments and the corresponding steps of each possible implementation. Fig. 5 is a block diagram of the network fault perception device according to this embodiment. The basic principle and technical effects of the network fault perception device 100 are the same as those of the corresponding method embodiments, so for brevity they are not repeated here.
The network fault perception device 100 includes a polling module 110, a determining module 120 and an acquisition module 130.
A polling module 110, configured to poll flow table entries. Each flow table entry includes a first address, a second address, a first timestamp and a second timestamp. The first timestamp is the latest collection time of traffic whose destination address is the first address, and the second timestamp is the latest collection time of traffic whose destination address is the second address. The first address and the second address are respectively the source address and destination address of the same connection-oriented data flow;
and a determining module 120, configured to determine that a silent fault has occurred in the traffic between the first address and the second address if the interval between the first timestamp and the second timestamp is longer than the preset silent fault timeout.
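A minimal sketch of one polling pass. It assumes timestamps in seconds and a flat list of entries. The names are illustrative, not the patent's interfaces.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    first_addr: str
    second_addr: str
    first_ts: float   # seconds; latest collection time of traffic destined to first_addr
    second_ts: float  # seconds; latest collection time of traffic destined to second_addr


def poll_silent_faults(entries: list[FlowEntry], timeout: float) -> list[FlowEntry]:
    """One polling pass: flag entries whose two timestamps differ by more than timeout."""
    return [e for e in entries if abs(e.first_ts - e.second_ts) > timeout]


entries = [
    FlowEntry("10.0.0.1", "10.0.0.2", first_ts=10.00, second_ts=10.05),  # both directions alive
    FlowEntry("10.0.0.3", "10.0.0.4", first_ts=9.50, second_ts=10.00),   # one direction silent
]
faulty = poll_silent_faults(entries, timeout=0.2)
print([(e.first_addr, e.second_addr) for e in faulty])  # [('10.0.0.3', '10.0.0.4')]
```

The check compares the two directions with each other, not with the current time. As a result, in this sketch a flow that goes idle in both directions at the same moment is not flagged.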
In an alternative embodiment, the determining module 120 is further configured to:
use the larger of the first timestamp and the second timestamp as the target timestamp;
and determine the traffic direction toward the destination address corresponding to the target timestamp in the flow table entry as the fault traffic direction.
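The direction rule can be expressed as a small function. It is only called after a silent fault has been detected, so the two timestamps differ by more than the timeout and a tie does not arise. The function name and return shape are illustrative assumptions.

```python
def fault_direction(first_addr: str, first_ts: float,
                    second_addr: str, second_ts: float) -> tuple[str, str]:
    """Return (source, destination) of the fault traffic direction.

    The destination is the address whose timestamp (the target timestamp) is larger.
    """
    if first_ts > second_ts:
        return second_addr, first_addr  # traffic toward first_addr is the fault direction
    return first_addr, second_addr      # traffic toward second_addr is the fault direction


print(fault_direction("10.0.0.3", 9.5, "10.0.0.4", 10.0))  # ('10.0.0.3', '10.0.0.4')
```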
In an alternative embodiment, the flow table entry further includes multiple egresses corresponding to each destination address, and the determining module 120 is further configured to:
determine the path corresponding to the egress currently in use among the multiple egresses as the faulty path;
and select the path corresponding to one of the currently unused egresses as the target switching path, and switch the faulty path to the target switching path.
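A minimal sketch of the switching step for one destination address. The `EgressGroup` type and the egress names are hypothetical. The source does not fix a selection policy, so picking the first unused egress is an assumption.

```python
from dataclasses import dataclass


@dataclass
class EgressGroup:
    egresses: list[str]  # candidate egresses for one destination address (hypothetical names)
    active: int = 0      # index of the egress currently in use


def switch_on_fault(group: EgressGroup) -> tuple[str, str]:
    """Mark the in-use egress's path as faulty and move to an unused egress.

    Choosing the first unused egress is an assumed policy.
    """
    unused = [i for i in range(len(group.egresses)) if i != group.active]
    if not unused:
        raise RuntimeError("no unused egress available to switch to")
    faulty = group.egresses[group.active]
    group.active = unused[0]
    return faulty, group.egresses[group.active]


group = EgressGroup(["port 1", "port 2", "port 3"])
print(switch_on_fault(group))  # ('port 1', 'port 2')
```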
In an alternative embodiment, the acquisition module 130 is configured to:
enable the IPFIX hardware function in both the ingress and egress directions of the switching chip;
set the acquisition period of the IPFIX hardware function according to the silent fault perception time, the acquisition period being shorter than the silent fault perception time;
and collect connection-oriented data flows, create the corresponding flow table entries, and update them in real time.
This embodiment further provides a program product which, when executed by a processor, implements the network fault perception method described in the foregoing embodiments.
In summary, embodiments of the present invention provide a network fault perception method, device, network equipment and program product. The method comprises polling flow table entries, each of which includes a first address, a second address, a first timestamp and a second timestamp. The first timestamp is the latest collection time of traffic whose destination address is the first address, and the second timestamp is the latest collection time of traffic whose destination address is the second address. The first address and the second address are respectively the source address and destination address of the same connection-oriented data flow. If the interval between the first timestamp and the second timestamp is longer than the preset silent fault timeout, a silent fault is determined to have occurred in the traffic between the first address and the second address.

Compared with the prior art, the embodiments have the following advantages:
(1) A silent fault between the first address and the second address is perceived automatically by checking whether the interval between the collection timestamps of the two directions of their connection-oriented data flow exceeds the preset silent fault timeout.
(2) The fault traffic direction is determined quickly by comparing the first timestamp with the second timestamp.
(3) The faulty path is identified and switched according to the egresses recorded in the flow table entry for the first address and the second address, so that both local-end and peer-end faults can be perceived.
(4) Traffic is collected through IPFIX. This supports fault perception at the service-session level, allows silent faults to be perceived quickly in the forwarding plane, and improves failover convergence performance. Paths are switched per specific service, without switching the egress of the corresponding route. Fault perception is built on the existing architecture using the recorded flow table entries, which avoids additional cost and resource overhead.
The above description merely illustrates embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A network fault perception method, characterized in that the method comprises:
polling flow table entries, each flow table entry including a first address, a second address, a first timestamp and a second timestamp, wherein the first timestamp is the latest collection time of traffic whose destination address is the first address, the second timestamp is the latest collection time of traffic whose destination address is the second address, and the first address and the second address are respectively the source address and destination address of the same connection-oriented data flow;
if the interval between the first timestamp and the second timestamp is longer than a preset silent fault timeout, determining that a silent fault has occurred in the traffic between the first address and the second address.
2. The method according to claim 1, characterized in that, after the step of determining that a silent fault has occurred in the traffic between the first address and the second address, the method further comprises:
using the larger of the first timestamp and the second timestamp as a target timestamp;
determining the traffic direction toward the destination address corresponding to the target timestamp in the flow table entry as the fault traffic direction.
3. The method according to claim 2, characterized in that the flow table entry further includes multiple egresses corresponding to each destination address, and after the traffic direction toward the destination address corresponding to the target timestamp in the flow table entry is determined as the fault traffic direction, the method further comprises:
determining the path corresponding to the egress currently in use among the multiple egresses as a faulty path;
selecting a path corresponding to one of the currently unused egresses among the multiple egresses as a target switching path, and switching the faulty path to the target switching path.
4. The method according to claim 1, characterized in that, before the step of polling flow table entries, the method further comprises:
enabling the IPFIX hardware function in the ingress and egress directions of a switching chip;
setting an acquisition period of the IPFIX hardware function according to a silent fault perception time, the acquisition period being shorter than the silent fault perception time;
collecting the connection-oriented data flow, creating a corresponding flow table entry, and updating it in real time.
5. A network fault perception device, characterized in that the device comprises:
a polling module, configured to poll flow table entries, each flow table entry including a first address, a second address, a first timestamp and a second timestamp, wherein the first timestamp is the latest collection time of traffic whose destination address is the first address, the second timestamp is the latest collection time of traffic whose destination address is the second address, and the first address and the second address are respectively the source address and destination address of the same connection-oriented data flow;
a determining module, configured to determine that a silent fault has occurred in the traffic between the first address and the second address if the interval between the first timestamp and the second timestamp is longer than a preset silent fault timeout.
6. The device according to claim 5, characterized in that the determining module is further configured to:
use the larger of the first timestamp and the second timestamp as a target timestamp;
determine the traffic direction toward the destination address corresponding to the target timestamp in the flow table entry as the fault traffic direction.
7. The device according to claim 6, characterized in that the flow table entry further includes multiple egresses corresponding to each destination address, and the determining module is further configured to:
determine the path corresponding to the egress currently in use among the multiple egresses as a faulty path;
select a path corresponding to one of the currently unused egresses among the multiple egresses as a target switching path, and switch the faulty path to the target switching path.
8. The device according to claim 6, characterized in that the device further comprises an acquisition module, the acquisition module being configured to:
enable the IPFIX hardware function in the ingress and egress directions of a switching chip;
set an acquisition period of the IPFIX hardware function according to a silent fault perception time, the acquisition period being shorter than the silent fault perception time;
collect the connection-oriented data flow, create a corresponding flow table entry, and update it in real time.
9. A network device, characterized by comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to implement the network fault perception method according to any one of claims 1 to 4 when executing the computer program.
10. A program product, characterized in that, when executed by a processor, the program product implements the network fault perception method according to any one of claims 1 to 4.
CN202511052242.XA 2025-07-29 2025-07-29 Network fault perception method, device, network equipment and program product Pending CN120729696A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511052242.XA CN120729696A (en) 2025-07-29 2025-07-29 Network fault perception method, device, network equipment and program product

Publications (1)

Publication Number Publication Date
CN120729696A (en) 2025-09-30

Family

ID=97162057

Country Status (1)

Country Link
CN (1) CN120729696A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination