CN120729696A

CN120729696A - Network fault perception method, device, network equipment and program product

Info

Publication number: CN120729696A
Application number: CN202511052242.XA
Authority: CN
Inventors: 张超迪; 唐勇
Original assignee: Maipu Communication Technology Co Ltd
Current assignee: Maipu Communication Technology Co Ltd
Priority date: 2025-07-29
Filing date: 2025-07-29
Publication date: 2025-09-30

Abstract

The present invention relates to the field of data communications and provides a network fault perception method, apparatus, network equipment, and program product. The method comprises: polling flow table entries, each flow table entry comprising a first address, a second address, and a first timestamp and a second timestamp, the first timestamp being the latest current collection time of traffic with the first address as the destination address, the second timestamp being the latest current collection time of traffic with the second address as the destination address, the first address and the second address being the source address and destination address of the same connection-oriented data flow, respectively; if the interval between the first timestamp and the second timestamp is longer than a preset silent fault timeout, then determining that a silent fault has occurred in the traffic between the first address and the second address. The present invention can achieve automatic silent fault perception.

Description

Network fault sensing method, device, network equipment and program product

Technical Field

The present invention relates to the field of network monitoring and high reliability technologies in the field of data communications, and in particular, to a network fault sensing method, device, network equipment, and program product.

Background

As networks grow in scale and complexity, and as requirements for network quality of service and reliability rise, the shortcomings of traditional network monitoring and fault management methods become increasingly apparent. In particular, intelligent computing center networks used in the artificial intelligence field are characterized by distributed computation, long-running jobs, and real-time response, and are extremely sensitive to network faults. Users place great emphasis on business continuity and quality of service, and enterprises and service providers increasingly rely on networks to support critical business processes and provide online services. Any network disruption or performance degradation can lead to serious economic loss and customer dissatisfaction. Therefore, rapidly and accurately identifying and repairing faults in a network is key to maintaining service continuity and improving quality of service.

Traditional network fault convergence technology mainly relies on changes in port physical state and similar signals to perceive network faults. However, when the physical state of a port is normal but forwarding fails, for example because of configuration errors, abnormal forwarding tables, or abnormal forwarding devices, no fault indication is triggered. Such a fault is called a silent fault. Conventional network quality detection, such as RTR (Ready To Receive) or BFD (Bidirectional Forwarding Detection), cannot detect such faults, because the detection session messages are not discarded. At present, the industry mainly relies on manual intervention to repair the fault after its cause has been identified, a process that can take several hours and seriously affects services.

Disclosure of Invention

The invention aims to provide a network fault sensing method, a network fault sensing device, network equipment and a program product, which can automatically identify a silent fault in a network.

Embodiments of the invention may be implemented as follows:

In a first aspect, the present invention provides a network failure awareness method, the method comprising:

polling flow table entries, wherein each flow table entry comprises a first address, a second address, a first timestamp and a second timestamp, the first timestamp is the latest collection time of traffic with the first address as its destination address, the second timestamp is the latest collection time of traffic with the second address as its destination address, and the first address and the second address are respectively the source address and the destination address of the same connection-oriented data flow;

and if the interval between the first timestamp and the second timestamp is longer than a preset silent fault timeout duration, determining that a silent fault has occurred in the traffic between the first address and the second address.

In an alternative embodiment, after the step of determining that a silent fault has occurred in the traffic between the first address and the second address, the method further comprises:

taking the larger timestamp of the first timestamp and the second timestamp as a target timestamp;

and determining the traffic direction of the destination address corresponding to the target timestamp in the flow table entry as the fault traffic direction.

In an optional embodiment, the flow table entry further includes a plurality of egresses corresponding to each destination address, and after determining the traffic direction of the destination address corresponding to the target timestamp in the flow table entry as the fault traffic direction, the method further includes:

determining the path corresponding to the egress currently in use among the plurality of egresses as a fault path;

and selecting the path corresponding to a currently unused egress among the plurality of egresses as a target switching path, and switching the fault path to the target switching path.

In an alternative embodiment, before the step of polling the flow table entry, the method further comprises:

enabling the IPFIX hardware function in both the ingress and egress directions of the switching chip;

setting a collection period of the IPFIX hardware function according to a silent fault perception time, wherein the collection period is shorter than the silent fault perception time;

and collecting connection-oriented data flows, creating the corresponding flow table entries, and updating them in real time.

In a second aspect, the present invention provides a network failure sensing apparatus, the apparatus comprising:

a polling module, configured to poll flow table entries, wherein each flow table entry comprises a first address, a second address, a first timestamp and a second timestamp, the first timestamp is the latest collection time of traffic with the first address as its destination address, the second timestamp is the latest collection time of traffic with the second address as its destination address, and the first address and the second address are respectively the source address and the destination address of the same connection-oriented data flow;

and a judging module, configured to determine that a silent fault has occurred in the traffic between the first address and the second address if the interval between the first timestamp and the second timestamp is longer than a preset silent fault timeout duration.

In an alternative embodiment, the determining module is further configured to:

In an optional embodiment, the flow table entry further includes a plurality of outlets corresponding to each destination address, and the determining module is further configured to:

In an alternative embodiment, the apparatus further comprises an acquisition module for:

enabling the IPFIX hardware function in both the ingress and egress directions of the switching chip;

In a third aspect, the present invention provides a network device comprising a processor and a memory, the memory being for storing a computer program, the processor being for implementing the network failure awareness method of any of the preceding embodiments when executing the computer program.

In a fourth aspect, the present invention provides a program product which, when executed by a processor, implements a network failure awareness method according to any of the preceding embodiments.

Compared with the prior art, the invention has the following beneficial effects:

For the same connection-oriented data flow between a first address and a second address, the invention uses a flow table entry to record a first timestamp, which is the latest collection time of traffic with the first address as its destination address, and a second timestamp, which is the latest collection time of traffic with the second address as its destination address. By checking whether the interval between the first timestamp and the second timestamp is longer than the preset silent fault timeout duration, the invention can determine whether a silent fault has occurred in the traffic between the first address and the second address, thereby achieving automatic perception of silent faults.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is an exemplary diagram of an application scenario provided in this embodiment.

Fig. 2 is a block diagram of a network device according to the present embodiment.

Fig. 3 is an exemplary diagram of functions of layers in a forwarding plane in the network failure awareness process provided in the present embodiment.

Fig. 4 is a flowchart illustrating a network failure sensing method according to the present embodiment.

Fig. 5 is a block diagram of a network fault sensing apparatus according to this embodiment.

Reference numerals: 10 - network device; 11 - processor; 12 - memory; 13 - bus; 100 - network fault sensing apparatus; 110 - polling module; 120 - judging module; 130 - collection module.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

In the description of the present invention, it should be noted that, if the terms "upper", "lower", "inner", "outer", and the like indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, or the azimuth or the positional relationship in which the inventive product is conventionally put in use, it is merely for convenience of describing the present invention and simplifying the description, and it is not indicated or implied that the apparatus or element referred to must have a specific azimuth, be configured and operated in a specific azimuth, and thus it should not be construed as limiting the present invention.

Furthermore, the terms "first," "second," and the like, if any, are used merely for distinguishing between descriptions and not for indicating or implying a relative importance.

It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.

Referring to fig. 1, fig. 1 is an exemplary diagram of an application scenario provided in this embodiment. In fig. 1, a server accesses the network through a Leaf device, and ECMP (Equal-Cost Multi-Path) is formed between the Leaf device and the Spine devices to implement traffic load sharing. ECMP is a network routing strategy that allows traffic to reach the same destination over multiple paths of equal cost. Leaf devices, also known as Leaf node switches, are located at the edge layer of the topology and connect directly to servers, storage devices, or terminals. Spine devices, also called Spine node switches, are located at the core layer of the topology and connect all Leaf devices to form a fully interconnected structure.

Specifically, in FIG. 1, Server1 communicates directly with the Leaf1 device, and Server2 communicates directly with the Leaf2 device. Both the Leaf1 device and the Leaf2 device communicate directly with both the Spine1 device and the Spine2 device. The ECMP in the Leaf1 device therefore contains two paths, Leaf1 -> Spine1 -> Leaf2 and Leaf1 -> Spine2 -> Leaf2, and the ECMP in the Leaf2 device contains two paths, Leaf2 -> Spine1 -> Leaf1 and Leaf2 -> Spine2 -> Leaf1.

Once a silent fault occurs in the network, the interruption lasts a long time and upper-layer services are severely affected. For example, for online transaction applications, continuous packet loss causes transactions to fail and application performance to drop significantly. Taking the Leaf1 device as an example, in the normal state it calculates a hash value from characteristics such as the packet five-tuple and forwards the service flow along the Leaf1 -> Spine1 -> Leaf2 path. When a silent fault such as an abnormal forwarding table entry or a faulty forwarding device occurs on the Leaf1 -> Spine1 -> Leaf2 path (shown as the faulty forwarding path in fig. 1), the service flow between Server1 and Server2 is forwarded abnormally and should be switched to the path shown as the switched forwarding path in fig. 1. However, conventional fault convergence technology cannot quickly identify the silent fault and therefore cannot switch paths in time.
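The hash-based path selection described above can be illustrated with a minimal sketch. The function name `pick_path`, the use of CRC32, and the example addresses and ports are assumptions made for illustration only; real switching chips use their own hash functions and fields.

```python
import zlib
from typing import Sequence, Tuple

# (source IP, destination IP, protocol number, source port, destination port)
FiveTuple = Tuple[str, str, int, int, int]


def pick_path(flow: FiveTuple, paths: Sequence[str]) -> str:
    """Map a five-tuple to one of the equal-cost paths via a hash (illustrative)."""
    key = "|".join(str(field) for field in flow).encode("utf-8")
    # CRC32 stands in for the chip's hash; the modulo selects an ECMP member.
    return paths[zlib.crc32(key) % len(paths)]


leaf1_paths = ["Leaf1 -> Spine1 -> Leaf2", "Leaf1 -> Spine2 -> Leaf2"]
flow = ("10.0.0.1", "10.0.0.2", 6, 40000, 443)  # hypothetical TCP flow
chosen = pick_path(flow, leaf1_paths)
```

Because the hash is deterministic, every packet of a given flow follows the same path, which is why a silent fault on that path keeps affecting the flow until the path is switched.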

It should be further noted that fig. 1 is only a simple example of a network. In practical applications, the network may contain more Leaf devices and Spine devices, so failing to recognize and handle a silent fault in time may lead to more serious consequences.

In view of this, the present embodiment provides an implementation that can quickly perceive a silent fault in the application scenario of fig. 1. The Leaf1 device can perceive in time a silent fault occurring on the current path, so that the forwarding path of the service flow can be quickly switched to Leaf1 -> Spine2 -> Leaf2 and the service quickly recovered. This is described in detail below.

First, this embodiment provides a block diagram of the network device 10. The network device 10 may be a Leaf device in fig. 1 or a Spine device in fig. 1, which is not limited in this embodiment. Referring to fig. 2, fig. 2 is a block diagram of the network device according to the present embodiment. The network device 10 implements the network fault sensing method of the present embodiment and includes a processor 11, a memory 12 and a bus 13, with the processor 11 and the memory 12 connected through the bus 13.

The processor 11 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the network failure sensing method of the present embodiment may be completed by an integrated logic circuit of hardware in the processor 11 or by instructions in the form of software. The processor 11 may be a general-purpose processor, including a CPU (Central Processing Unit) or an NP (Network Processor), or may be a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The memory 12 is used to store a program for implementing the network failure sensing method, and the program may be a software function module stored in the memory 12 in the form of software or firmware (firmware) or solidified in an OS (Operating System) of the network device 10.

After receiving the execution instruction, the processor 11 executes a program to implement the network failure sensing method of the present embodiment.

After analyzing why the prior art cannot quickly identify silent faults, the inventors found that a silent fault is a service-session-level fault: when it occurs, RTR messages and BFD messages are not discarded and are still forwarded normally, so the silent fault goes undetected. To avoid such missed detection, the inventors took a new approach and identify silent faults on the forwarding plane. The forwarding plane, also called the data plane, is responsible for fast data forwarding and transmits data packets from ingress to egress according to predefined rules (such as routing tables and forwarding tables). The device performing forwarding plane processing is typically a switch or router with a hardware forwarding engine, such as a Leaf device or a Spine device in fig. 1.

Before introducing the network failure sensing method provided in this embodiment, the core functions that each layer of the forwarding plane needs to implement during network failure sensing are described first. Referring to fig. 3, fig. 3 is an exemplary diagram of the functions of the layers in the forwarding plane during the network failure sensing process provided in this embodiment. In fig. 3, the forwarding plane includes a control layer, a data layer and a forwarding layer. The control layer is responsible for generating forwarding rules (such as flow tables, routing tables and policy rules) and issuing them to the data layer or the forwarding layer. The data layer is responsible for storing and managing forwarding table entries (such as the MAC table and the flow table). The forwarding layer is responsible for the actual packet forwarding, processing packets at high speed according to the table entries provided by the data layer. To perceive silent faults in time, this embodiment collects traffic in real time at the forwarding layer and creates or updates flow table entries, which record the collected traffic information, at the data layer. The control layer analyzes the traffic information in the flow table entries, makes a fault decision according to the analysis result to determine whether a silent fault has occurred, and can further switch the fault path after a silent fault has occurred.

Based on the functions provided by the layers of the forwarding plane of fig. 3, the network failure sensing method provided in this embodiment will be described in detail below. Referring to fig. 4, fig. 4 is a flowchart illustrating a network failure sensing method according to the present embodiment, and the method includes the following steps:

In step S101, the flow entries are polled, each flow entry includes a first address, a second address, a first timestamp and a second timestamp, the first timestamp is a current latest collection time of the traffic with the first address as a destination address, the second timestamp is a current latest collection time of the traffic with the second address as a destination address, and the first address and the second address are a source address and a destination address of a same connection-oriented data flow respectively.

In this embodiment, the traffic between the first address and the second address refers to the complete session path formed by the source end and the destination end in bidirectional communication, and this traffic belongs to a connection-oriented data flow. A connection-oriented data flow has a definite bidirectional interaction characteristic and a message acknowledgment mechanism: when a data message is lost due to a network fault, the destination end does not reply with an acknowledgment message, and the source end triggers retransmission. For example, the TCP protocol includes connection establishment by three-way handshake and an acknowledgment mechanism; with reference to fig. 1, the TCP data flow between Server1 and Server2 is a connection-oriented data flow.

In this embodiment, a flow table entry tracks the data transmission state in both directions of the same data flow. This ensures the uniqueness and comparability of flow records, helps avoid duplication or confusion in the subsequent fault determination, and allows the traffic information in both directions of the same data flow to be tracked. Specifically, when a data flow is identified as a connection-oriented data flow, the system merges the data of both directions of that flow into one flow record and uses the addresses at the two ends of the flow as the first address and the second address respectively. The information recorded in the flow table entry includes, but is not limited to, the first address, the second address, the latest collection time of traffic with the first address as its destination address, and the latest collection time of traffic with the second address as its destination address.
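As a rough illustration of the record described above, a flow table entry could be modelled as follows. The class name `FlowEntry`, the field names, and the example addresses and times are hypothetical; the text only specifies which pieces of information an entry holds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowEntry:
    """One record covering both directions of a connection-oriented flow."""
    first_addr: str                    # address at one end of the flow
    second_addr: str                   # address at the other end of the flow
    # Latest collection time of traffic destined for first_addr (None until seen).
    first_ts: Optional[float] = None
    # Latest collection time of traffic destined for second_addr (None until seen).
    second_ts: Optional[float] = None


# Hypothetical Server1 <-> Server2 flow; timestamps are in seconds.
entry = FlowEntry("10.0.0.1", "10.0.0.2", first_ts=12.000, second_ts=12.004)
```

Keeping both timestamps in a single record is what makes the later comparison of the two directions a simple per-entry check.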

Step S102: if the interval between the first timestamp and the second timestamp is longer than the preset silent fault timeout duration, determine that a silent fault has occurred in the traffic between the first address and the second address.

In this embodiment, a silent fault refers to the phenomenon in which the physical layer of the network device is normal but the actual forwarding path is abnormal, so that traffic cannot be delivered normally and no explicit alarm is triggered. The silent fault timeout duration can be configured according to the service's requirements on fault perception precision, and can be set at the millisecond level or finer to achieve highly sensitive fault identification. If the interval between the latest collection time of traffic destined for one end of a data flow and the latest collection time of traffic destined for the other end is longer than the preset silent fault timeout duration, traffic transmission between the two ends is abnormal and a silent fault has occurred between them.

According to the method provided by this embodiment, whether a silent fault has occurred in the traffic between the first address and the second address is determined by checking whether the interval between the first timestamp and the second timestamp collected for the connection-oriented data flow between the two addresses is longer than the preset silent fault timeout duration, thereby achieving automatic perception of silent faults.
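Steps S101 and S102 can be sketched as a polling pass over the flow table. This is a minimal sketch, assuming timestamps in milliseconds and hypothetical names (`FlowEntry`, `has_silent_fault`, `poll`); it is not the implementation of any particular device.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FlowEntry:
    first_addr: str
    second_addr: str
    first_ts: float   # latest collection time of traffic destined for first_addr (ms)
    second_ts: float  # latest collection time of traffic destined for second_addr (ms)


def has_silent_fault(entry: FlowEntry, timeout_ms: float) -> bool:
    """S102: the two directions have drifted apart by more than the timeout."""
    return abs(entry.first_ts - entry.second_ts) > timeout_ms


def poll(entries: Iterable[FlowEntry], timeout_ms: float) -> List[FlowEntry]:
    """S101: walk every flow table entry and return those judged faulty."""
    return [e for e in entries if has_silent_fault(e, timeout_ms)]


table = [
    FlowEntry("10.0.0.1", "10.0.0.2", first_ts=1000.0, second_ts=1250.0),  # 250 ms gap
    FlowEntry("10.0.0.3", "10.0.0.4", first_ts=1240.0, second_ts=1250.0),  # 10 ms gap
]
faulty = poll(table, timeout_ms=100.0)
```

With a 100 ms timeout, only the first entry is reported, since its two directions were last seen 250 ms apart.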

In an optional implementation, after determining that a silent fault has occurred in the traffic between the first address and the second address, in order to further pinpoint the direction affected by the silent fault, this embodiment further provides an implementation for determining the fault traffic direction:

determining the traffic direction of the destination address corresponding to the target timestamp in the flow table entry as the fault traffic direction.

In this embodiment, a silent fault usually manifests as the data flow in one direction failing to be transmitted normally. When the first timestamp and the second timestamp differ significantly, the larger timestamp indicates that valid data transmission is still taking place in its direction, while the smaller timestamp indicates that data reception in the other direction has been interrupted. Therefore, the traffic direction whose destination is the corresponding destination address is determined as the fault traffic direction, which means that the path carrying the data flow from the communication peer to that destination address may be abnormal. For example, let the first address and the second address be address 1 and address 2, and the first timestamp and the second timestamp be timestamp 1 and timestamp 2, where the interval between timestamp 1 and timestamp 2 is longer than the preset silent fault timeout duration and timestamp 1 is smaller than timestamp 2. The direction of timestamp 1 (that is, the direction with address 1 as the destination address) has then not received the expected messages, which indicates that the traffic from address 2 to address 1 is suffering packet loss and that the fault point lies on the path from address 2 to address 1.

In an alternative embodiment, in addition to determining the silent fault and its fault traffic direction, in order to improve the reliability of network transmission, this embodiment further provides an implementation for automatically switching the path on which the silent fault occurred:

determining the path corresponding to the egress currently in use among the plurality of egresses as the fault path;

and selecting the path corresponding to a currently unused egress among the plurality of egresses as the target switching path, and switching the fault path to the target switching path.

In this embodiment, the plurality of egresses of a destination address may be the physical or logical interfaces that the network device relies on when forwarding data, each indicating a specific transmission channel for traffic from the local device to the next-hop node. In the application scenario shown in fig. 1, when the Leaf1 device detects that a silent fault has occurred in the forward data flow between Server1 and Server2 because of a forwarding abnormality at the Spine1 node, the system determines the path Leaf1 -> Spine1 -> Leaf2, which originally forwarded through Spine1, as the fault path and selects one of the other available paths, for example Leaf1 -> Spine2 -> Leaf2, as the target switching path. The fault path is then switched to the target switching path, which carries the subsequent traffic between Server1 and Server2.
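The path switching described above might look like the following sketch. The `EgressGroup` structure, the function name, and the policy of taking the first unused egress are assumptions for illustration; the text only requires that the target path come from an egress not currently in use.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EgressGroup:
    """Egresses recorded for one destination address (hypothetical structure)."""
    egresses: List[str]  # all candidate egresses toward the destination
    in_use: str          # egress currently carrying the flow


def switch_on_silent_fault(group: EgressGroup) -> Tuple[str, Optional[str]]:
    """Return (fault_path, target_path); target_path is None if no alternative exists."""
    fault_path = group.in_use
    unused = [e for e in group.egresses if e != fault_path]
    if not unused:
        return fault_path, None
    target_path = unused[0]     # assumed policy: first unused egress
    group.in_use = target_path  # subsequent traffic uses the new path
    return fault_path, target_path


group = EgressGroup(
    egresses=["Leaf1 -> Spine1 -> Leaf2", "Leaf1 -> Spine2 -> Leaf2"],
    in_use="Leaf1 -> Spine1 -> Leaf2",
)
fault, target = switch_on_silent_fault(group)
```

In the fig. 1 scenario this marks the Spine1 path as the fault path and moves the flow to the Spine2 path; a real device would also need a policy for choosing among several unused egresses.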

In an optional implementation, so that the flow table entries reflect the latest traffic state and silent faults can be determined accurately from them, the flow table entries need to be properly managed and updated. This embodiment further provides an implementation for managing flow table entries:

(1) Collecting monitored traffic;

(2) If the monitored traffic includes connection-oriented target traffic and no flow table entry exists, creating a flow table entry and updating it according to the target traffic;

(3) If the monitored traffic includes connection-oriented target traffic and a flow table entry exists, updating the flow table entry according to the target traffic.

In this embodiment, only connection-oriented traffic (such as data traffic carried by the TCP protocol) is analyzed and processed, which greatly reduces interference from invalid traffic. Provided that the monitored traffic includes connection-oriented target traffic, if no flow table entry exists, a flow table entry is first created and then updated according to the target traffic; if a flow table entry exists, the information in the existing entry is updated to reflect the latest traffic state.

In a specific implementation, depending on the network scale, the network may carry the traffic of multiple bidirectional flows. Whether the traffic of a given bidirectional flow is being added to the flow table for the first time can be determined from the first address and the second address. If so, the corresponding first address, second address and the collection times of the traffic are added to the flow table; otherwise, only the first timestamp and the second timestamp of that traffic in the flow table need to be updated. For example, suppose the network carries traffic between address 1 and address 2 and traffic between address 3 and address 4. When traffic between address 1 and address 2 is collected for the first time, address 1, address 2 and the corresponding first timestamp and second timestamp are added to the flow table. When traffic between address 1 and address 2 is collected subsequently, address 1 and address 2 show that this traffic is already recorded, so only the first timestamp and second timestamp corresponding to address 1 and address 2 need to be updated according to the collection time.
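The create-or-update behaviour described above can be sketched with a table keyed by the unordered address pair. The names `FlowTable` and `record_traffic`, and the choice to store one timestamp per destination address in a dictionary, are illustrative assumptions rather than the structure used by any real device.

```python
from typing import Dict, FrozenSet

# Key: the unordered pair of end addresses.
# Value: destination address -> latest collection time of traffic sent to it.
FlowTable = Dict[FrozenSet[str], Dict[str, float]]


def record_traffic(table: FlowTable, src: str, dst: str, collected_at: float) -> bool:
    """Create the entry on first sight, otherwise update it.

    Returns True when a new entry was created for this address pair.
    """
    key = frozenset((src, dst))
    created = key not in table
    entry = table.setdefault(key, {})
    entry[dst] = collected_at  # timestamp of the direction whose destination is dst
    return created


table: FlowTable = {}
first_seen = record_traffic(table, "address1", "address2", 10.0)  # creates the entry
seen_again = record_traffic(table, "address2", "address1", 11.0)  # updates the same entry
other_flow = record_traffic(table, "address3", "address4", 12.0)  # separate entry
```

Both directions of the address 1 / address 2 flow land in the same entry, while the address 3 / address 4 flow gets its own.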

In an alternative implementation, to make full use of functions provided by existing hardware, this embodiment provides an implementation that collects the monitored traffic using the IPFIX (IP Flow Information Export) function provided by the switching chip of the network device:

enabling the IPFIX hardware function in both the ingress and egress directions of the switching chip;

and collecting connection-oriented data flows, creating the corresponding flow table entries, and updating them in real time.

In this embodiment, IPFIX is a technique that provides flow-based packet statistics and is used to collect and export flow information. The silent fault perception time is the maximum time within which the user expects a silent fault to be detected. The IPFIX collection period is set shorter than the perception time so that the relevant traffic changes are captured as soon as possible after a fault occurs, meeting the requirement for a rapid response to faults. For example, if the expected silent fault perception time is 100 milliseconds, the IPFIX collection period may be set to 80 milliseconds or less to increase the sensitivity of fault identification.
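The constraint between the two timing parameters can be expressed as a small configuration check. The function name and the millisecond units are assumptions for illustration; how the period is actually programmed into a switching chip is vendor-specific and not shown here.

```python
def validate_collection_period(period_ms: float, perception_ms: float) -> float:
    """Accept a collection period only if it is shorter than the perception time."""
    if period_ms <= 0:
        raise ValueError("collection period must be positive")
    if period_ms >= perception_ms:
        raise ValueError(
            "collection period must be shorter than the silent fault perception time"
        )
    return period_ms


# The example from the text: 100 ms perception time, 80 ms collection period.
period = validate_collection_period(80, 100)
```

A period equal to or longer than the perception time is rejected, since traffic changes could then be collected too late to meet the expected detection time.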

In this embodiment, the IPFIX function is enabled in both the ingress and egress directions of the switching chip of the network device to ensure that bidirectional traffic is collected. After receiving the flow information reported by the switching chip, the network device determines whether the traffic is a connection-oriented data flow, ensuring that only connection-oriented data flows are collected.
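The filtering step can be sketched as below, assuming each reported record is a dictionary with a `protocol` field holding the IP protocol number. The record layout and function name are hypothetical; only TCP is treated as connection-oriented here because it is the example the text gives.

```python
from typing import Dict, Iterable, List, Union

Record = Dict[str, Union[str, int]]

TCP = 6   # IANA-assigned IP protocol number for TCP
UDP = 17  # IANA-assigned IP protocol number for UDP (connectionless)

# Assumption: only TCP counts as connection-oriented in this sketch.
CONNECTION_ORIENTED = {TCP}


def connection_oriented(records: Iterable[Record]) -> List[Record]:
    """Keep only reported flow records whose protocol is connection-oriented."""
    return [r for r in records if r["protocol"] in CONNECTION_ORIENTED]


reported = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "protocol": TCP},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "protocol": UDP},
]
kept = connection_oriented(reported)
```

Only the TCP record survives the filter, so connectionless traffic never creates or updates a flow table entry.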

In an alternative embodiment, depending on the target traffic, either the flow information already recorded in a flow table entry is updated or new flow information is added to the flow table. This embodiment provides a processing method for each of the two cases:

if the flow table entry does not include the first timestamp and/or the second timestamp, adding the first address and the first timestamp and/or the second address and the second timestamp to the flow table entry according to the target traffic;

In this embodiment, a flow table entry that does not include the first timestamp and/or the second timestamp means that the target traffic is being collected for the first time and its information has not yet been added to the flow table. The first address, the first timestamp, the second address, and the second timestamp therefore need to be added to the flow table entry, specifically as follows:

taking the collection timestamp of the traffic, within the target traffic, that has the first address as its destination address as the first timestamp, and adding the first address and the first timestamp to the flow table entry;

taking the collection timestamp of the traffic, within the target traffic, that has the second address as its destination address as the second timestamp, and adding the second address and the second timestamp to the flow table entry.

In this embodiment, as one implementation, the values of the source address and the destination address of the target traffic may be compared, with the smaller address value defined as the first address and the larger one as the second address. In this way, regardless of whether the target traffic is sent from the first address to the second address or from the second address to the first address, the traffic in both directions is correctly identified and recorded in the same flow table entry.

if the flow table entry includes the first timestamp and/or the second timestamp, updating the first timestamp and/or the second timestamp according to the target traffic.

In this embodiment, if the flow table entry already includes the first timestamp, the target traffic is not being collected for the first time. In this case no new entry needs to be added; instead, the first timestamp is updated to the collection time of the target traffic. The second timestamp is handled in the same way. For example, suppose the flow table is currently empty and the target traffic collected at time t1 is bidirectional traffic between address 1 and address 2. The first address and the second address are then address 1 and address 2, respectively, the first timestamp and the second timestamp are both t1, and address 1, address 2, the first timestamp, and the second timestamp are added to a flow table entry. If the target traffic collected at time t2 is traffic sent from address 1 to address 2, its destination is the second address, so the second timestamp is updated to t2 and the first timestamp is left unchanged.
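The two cases, together with the smaller-address-first convention, can be walked through in a short sketch that replays the t1/t2 example. Everything named here is an assumption made for illustration: the `FlowEntry` class, the helper names, the use of Python's `ipaddress` module for the comparison, and the concrete addresses 10.0.0.1 and 10.0.0.2 standing in for address 1 and address 2.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class FlowEntry:
    first_addr: str
    second_addr: str
    first_ts: Optional[float] = None   # latest collection time with first_addr as destination
    second_ts: Optional[float] = None  # latest collection time with second_addr as destination


def ordered_pair(a: str, b: str) -> Tuple[str, str]:
    """The smaller address value becomes the first address (same address family assumed)."""
    return (a, b) if ipaddress.ip_address(a) <= ipaddress.ip_address(b) else (b, a)


def record_sample(table: Dict[Tuple[str, str], FlowEntry],
                  src: str, dst: str, collected_at: float) -> FlowEntry:
    first, second = ordered_pair(src, dst)
    entry = table.get((first, second))
    if entry is None:
        # Case 1: the pair is not in the table yet, so a new entry is created.
        entry = FlowEntry(first, second)
        table[(first, second)] = entry
    # Both cases: refresh only the timestamp that matches the destination.
    if dst == first:
        entry.first_ts = collected_at
    else:
        entry.second_ts = collected_at
    return entry


table: Dict[Tuple[str, str], FlowEntry] = {}
t1, t2 = 1.0, 1.5
record_sample(table, "10.0.0.1", "10.0.0.2", t1)          # address 1 -> address 2 at t1
record_sample(table, "10.0.0.2", "10.0.0.1", t1)          # address 2 -> address 1 at t1
entry = record_sample(table, "10.0.0.1", "10.0.0.2", t2)  # address 1 -> address 2 at t2
```

After the three samples the single entry holds a first timestamp of t1 and a second timestamp of t2, matching the example in the text.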

To carry out the above embodiments and the corresponding steps in each possible implementation, an implementation of the network fault sensing device 100 is given below. Referring to fig. 5, fig. 5 is a block diagram of the network fault sensing device provided in this embodiment. It should be noted that the basic principle and technical effects of the network fault sensing device 100 are the same as those of the corresponding method embodiments described above, so they are not repeated here.

The network fault sensing device 100 includes a polling module 110, a determining module 120, and an acquisition module 130.

a polling module 110, configured to poll flow table entries, where each flow table entry includes a first address, a second address, a first timestamp, and a second timestamp, the first timestamp is the current latest collection time of traffic with the first address as the destination address, the second timestamp is the current latest collection time of traffic with the second address as the destination address, and the first address and the second address are respectively the source address and the destination address of the same connection-oriented data flow;

a determining module 120, configured to determine that a silent fault has occurred in the traffic between the first address and the second address if the interval between the first timestamp and the second timestamp is longer than the preset silent fault timeout.
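The work of the polling and determining modules can be sketched as a single pass over the flow table. This is an illustrative sketch under stated assumptions: the function name, the tuple-based table layout, and the choice to skip entries that have so far been seen in only one direction are not taken from the embodiment.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (first_address, second_address) -> (first_timestamp, second_timestamp)
Timestamps = Tuple[Optional[float], Optional[float]]


def poll_for_silent_faults(table: Dict[Tuple[str, str], Timestamps],
                           timeout: float) -> List[Tuple[str, str]]:
    """Return the address pairs whose two timestamps are more than `timeout` apart."""
    faulty = []
    for pair, (first_ts, second_ts) in table.items():
        if first_ts is None or second_ts is None:
            continue  # assumption: entries seen in one direction only are not judged
        if abs(first_ts - second_ts) > timeout:
            faulty.append(pair)
    return faulty


table = {
    ("address1", "address2"): (10.00, 10.25),  # directions 0.25 s apart
    ("address3", "address4"): (10.20, 10.25),  # directions 0.05 s apart
}
faults = poll_for_silent_faults(table, timeout=0.1)
```

With a timeout of 0.1 seconds only the first pair is reported, because its two directions were last seen 0.25 seconds apart.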

In an alternative embodiment, the determining module 120 is further configured to:

In an alternative embodiment, the flow table entry further includes a plurality of outlets corresponding to each destination address, and the determining module 120 is further configured to:

In an alternative embodiment, the acquisition module 130 is configured to:

enable the IPFIX hardware function in both the ingress and egress directions of the switch chip;

set the collection period of the IPFIX hardware function according to the silent fault perception time, where the collection period is smaller than the silent fault perception time; and

collect the connection-oriented data flows, create the corresponding flow table entries, and update them in real time.

This embodiment further provides a program product which, when executed by a processor, implements the network fault sensing method described in the foregoing embodiments.

In summary, the embodiments of the present invention provide a network fault sensing method, device, network equipment, and program product. The method comprises: polling flow table entries, each flow table entry comprising a first address, a second address, a first timestamp, and a second timestamp, where the first timestamp is the current latest collection time of traffic with the first address as the destination address, the second timestamp is the current latest collection time of traffic with the second address as the destination address, and the first address and the second address are respectively the source address and the destination address of the same connection-oriented data flow; and, if the interval between the first timestamp and the second timestamp is longer than a preset silent fault timeout, determining that a silent fault has occurred in the traffic between the first address and the second address. Compared with the prior art, the embodiments have the following advantages: (1) a silent fault is perceived automatically by checking whether the interval between the first timestamp and the second timestamp collected for the connection-oriented data flow between the first address and the second address exceeds the preset silent fault timeout; (2) the fault traffic direction is determined quickly by comparing the first timestamp with the second timestamp; (3) the faulty path is determined, and switched, according to the egresses of the first address and the second address recorded in the flow table entry. In addition, both local-end faults and peer-end faults can be perceived; traffic is collected through IPFIX, which supports fault perception at the service session level; silent faults are perceived quickly in the forwarding plane, which improves failover convergence performance; path switching is performed per service, so the egress of the corresponding route does not need to be switched; and fault perception is carried out on the existing architecture using the recorded flow table entries, which avoids additional overhead and resource consumption.

The above describes only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A network fault perception method, characterized in that the method comprises:

Polling flow table entries, each flow table entry includes a first address, a second address, and a first timestamp and a second timestamp, the first timestamp is the current latest collection time of the traffic with the first address as the destination address, the second timestamp is the current latest collection time of the traffic with the second address as the destination address, the first address and the second address are respectively the source address and destination address of the same connection-oriented data flow;

If the interval between the first timestamp and the second timestamp is longer than a preset silent fault timeout, it is determined that a silent fault occurs in the traffic between the first address and the second address.

2. The method according to claim 1, characterized in that after the step of determining that a silent failure occurs in the traffic between the first address and the second address, the method further comprises:

The larger of the first timestamp and the second timestamp is used as the target timestamp;

The flow direction of the destination address corresponding to the flow table entry where the target timestamp is located is determined as the fault flow direction.

3. The method according to claim 2, wherein the flow table entry further includes multiple egresses corresponding to each destination address, and after determining the traffic direction of the destination address corresponding to the flow table containing the target timestamp as the fault traffic direction, the method further includes:

Determining a path corresponding to an exit currently in use among the multiple exits as a faulty path;

A corresponding path is selected from the multiple egresses that are currently unused as a target switching path, and the faulty path is switched to the target switching path.

4. The method according to claim 1, characterized in that before the step of polling flow table entries, the method further comprises:

Enable IPFIX hardware function in the inbound and outbound directions of the switch chip;

Setting a collection period for the IPFIX hardware function according to the silent fault perception time, wherein the collection period is less than the silent fault perception time;

The connection-oriented data stream is collected and a corresponding flow table entry is created and updated in real time.

5. A network fault sensing device, characterized in that the device comprises:

A polling module, polling flow table entries, each flow table entry including a first address, a second address, and a first timestamp and a second timestamp, wherein the first timestamp is the latest current collection time of the flow with the first address as the destination address, and the second timestamp is the latest current collection time of the flow with the second address as the destination address, and the first address and the second address are respectively the source address and destination address of the same connection-oriented data flow;

The determination module is configured to determine that a silent fault occurs in the traffic between the first address and the second address if the interval between the first timestamp and the second timestamp is greater than a preset silent fault timeout.

6. The device according to claim 5, wherein the determination module is further configured to:

7. The device according to claim 6, wherein the flow table entry further includes multiple egresses corresponding to each destination address, and the determination module is further configured to:

8. The device according to claim 6, further comprising a collection module, wherein the collection module is configured to:

9. A network device, characterized by comprising a processor and a memory, wherein the memory is used to store a computer program, and the processor is used to implement the network fault perception method according to any one of claims 1 to 4 when executing the computer program.

10. A program product, characterized in that when the program product is executed by a processor, it implements the network fault perception method according to any one of claims 1 to 4.