WO2024227123A1 - Load-aware packet size distribution measurement in a network device - Google Patents
Load-aware packet size distribution measurement in a network device
- Publication number
- WO2024227123A1 (PCT/US2024/026724)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network device
- packets
- threshold
- distribution information
- load
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/36—Flow control; Congestion control by determining packet size, e.g. maximum transfer unit [MTU]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0876—Network utilisation, e.g. volume of load or congestion level
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/16—Threshold monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/65—Re-configuration of fast packet switches
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0876—Network utilisation, e.g. volume of load or congestion level
- H04L43/0894—Packet rate
Definitions
- the present disclosure relates generally to communication networks, and more particularly to power saving techniques for use within a network device.
- a computer network is a set of computing components interconnected by communication links.
- Each computing component may be a separate computing device, such as, without limitation, a hub, a network switch, a bridge, a router, a server, a gateway, or a personal computer, or a component thereof.
- Each computing component, or “network device,” is considered to be a node within the network.
- a communication link is a mechanism of connecting at least two nodes such that each node may transmit data to and receive data from the other node. Such data may be transmitted in the form of signals over transmission media such as, without limitation, electrical cables, optical cables, or wireless media.
- the structure and transmission of data between nodes is governed by a number of different protocols. There may be multiple layers of protocols, typically beginning with a lowest layer, such as a “physical” layer that governs the transmission and reception of raw bit streams as signals over a transmission medium. Each layer defines a data unit (the protocol data unit, or “PDU”), with multiple data units at one layer combining to form a single data unit in another.
- Additional examples of layers may include, for instance, a data link layer in which bits defined by a physical layer are combined to form a frame or cell, a network layer in which frames or cells defined by the data link layer are combined to form a packet, and a transport layer in which packets defined by the network layer are combined to form a Transmission Control Protocol (TCP) segment or a User Datagram Protocol (UDP) datagram.
- a given node in a network may not necessarily have a link to each other node in the network, particularly in more complex networks.
- each node may only have a limited number of physical ports into which cables may be plugged to create links.
- Certain “terminal” nodes, often servers or end-user devices, may only have one or a handful of ports.
- Other nodes, such as switches, hubs, or routers, may have many more ports and typically are used to relay information between the terminal nodes.
- the arrangement of nodes and links in a network is said to be the topology of the network, and is typically visualized as a network graph or tree.
- a given node in the network may communicate with another node in the network by sending data units along one or more different “paths” through the network that lead to the other node, each path including any number of intermediate nodes.
- the transmission of data across a computing network typically involves sending units of data, such as packets, cells, or frames, along paths through intermediary networking devices, such as switches or routers, that direct or redirect each data unit towards a corresponding destination.
- an intermediary networking device may perform any of a variety of actions, or processing steps, with the data unit.
- the exact set of actions taken will depend on a variety of characteristics of the data unit, such as metadata found in the header of the data unit, and in many cases the context or state of the network device.
- address information specified by or otherwise associated with the data unit such as a source address, destination address, a virtual local area network (VLAN) identifier, path information, etc., is typically used to determine how to handle a data unit (i.e., what actions to take with respect to the data unit).
- an IP data packet may include a destination IP address field within the header of the IP data packet, based upon which a network router may determine one or more other networking devices, among a number of possible other networking devices, to which the IP data packet is to be forwarded.
- Control packets, such as packets for setting up a connection, tearing down a connection, acknowledgment packets, etc., tend to be relatively small, such as less than 100 bytes, whereas data packets tend to be significantly larger than control packets, often exceeding 1000 bytes.
- Data packets with video data are typically more than 1000 bytes, whereas data packets with audio data are typically smaller, e.g., several hundred bytes.
- When a connection is being established, the percentage of packets with relatively small packet sizes tends to be high, whereas when the connection is up and running the percentage of packets with relatively small packet sizes tends to be low.
- a network device analyzes headers of data units (e.g., packets) to determine how to handle the data units. For example, a network device having multiple ports coupled to multiple network links, such as a network switch, a bridge, a router, a gateway, etc., will analyze a header of a received data unit to determine one or more ports via which the data unit is to be transmitted. For a given data rate, the processing load of a network device is higher for small data units as compared to large data units because a rate at which the network device receives packet headers is higher with small data units as compared to large data units (for a given data rate). In other words, given a same data rate, the processing load of a network device varies depending on the relative amounts of packets with small sizes and packets with large sizes.
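- As a rough numeric illustration of this point, the sketch below computes the packet (header) rate implied by a fixed line rate for small and large packets; the 100 Gb/s line rate and the 20-byte Ethernet wire overhead (preamble plus inter-frame gap) are assumptions for the example, not values from the disclosure.
```python
# At a fixed line rate, the packet rate, and hence the header-processing load,
# grows sharply as packets shrink.

LINE_RATE_BPS = 100e9          # 100 Gb/s, assumed for illustration
WIRE_OVERHEAD_BYTES = 20       # Ethernet preamble + inter-frame gap, assumed

def packets_per_second(packet_size_bytes):
    bits_per_packet = (packet_size_bytes + WIRE_OVERHEAD_BYTES) * 8
    return LINE_RATE_BPS / bits_per_packet

print(f"{packets_per_second(64):,.0f} pps at 64-byte packets")     # ~148,809,524
print(f"{packets_per_second(1500):,.0f} pps at 1500-byte packets") # ~8,223,684
```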
- a method for controlling operation of a network device includes: determining, at the network device, a load metric corresponding to a processing load of the network device; in response to determining that the load metric meets a first threshold, beginning, at the network device, measuring distribution information regarding a distribution of sizes of packets processed by the network device; ending, at the network device, measuring the distribution information regarding the distribution of sizes of packets processed by the network device; and using, at the network device, the distribution information to control the network device.
- a network device comprises: a plurality of network interfaces; a packet processor configured to process data units received via the plurality of network interfaces to determine network interfaces, among the plurality of network interfaces, that are to transmit the data units; first circuitry that is configured to determine a load metric corresponding to a processing load of the network device; second circuitry that is configured to: in response to determining that the load metric meets a first threshold, begin measuring distribution information regarding a distribution of sizes of packets processed by the network device, and end measuring the distribution information regarding the distribution of sizes of packets processed by the network device; and a controller configured to use the distribution information to control the network device.
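- The following is a minimal software sketch of the control flow summarized above, assuming hypothetical helper names (`read_load_metric`, `apply_policy`), illustrative thresholds, and a two-threshold start/stop rule; it is not the patented implementation, which is described below as hardware circuitry.
```python
# Illustrative sketch: begin measuring packet size distribution (PSD) when a load
# metric meets a first threshold, stop when it drops below a second threshold,
# then use the measured distribution to control the device.

def control_loop(read_load_metric, packet_size_source, apply_policy,
                 start_threshold=0.8, stop_threshold=0.6):
    histogram = {}            # packet-size bin -> count
    measuring = False
    for sizes_this_interval in packet_size_source:   # packet sizes seen per interval
        load = read_load_metric()
        if not measuring and load >= start_threshold:
            measuring, histogram = True, {}           # begin measurement on high load
        elif measuring and load < stop_threshold:
            measuring = False                         # end measurement
            apply_policy(histogram)                   # use the distribution information
        if measuring:
            for size in sizes_this_interval:
                bin_key = min(size // 256, 8)         # coarse 256-byte bins, capped
                histogram[bin_key] = histogram.get(bin_key, 0) + 1
```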
- FIG. 1 is a simplified diagram of an example networking system in which load-aware packet size distribution (PSD) information generation techniques described herein are practiced, according to an embodiment.
- FIG. 2A is a simplified diagram of an example network device in which PSD information generation techniques are utilized, according to an embodiment.
- FIG. 2B is another simplified diagram of the example network device of Fig. 2A, according to an embodiment.
- FIG. 3 is a simplified block diagram of an example PSD information generation circuitry, according to an embodiment.
- Fig. 4 is a graph showing an illustrative example of PSD information measured at different processing load levels of a network device, according to an embodiment.
- Fig. 5 is a simplified example state diagram for circuitry that controls generation of PSD information in a network device, according to an embodiment.
- Fig. 6 is another simplified example state diagram for circuitry that controls generation of PSD information in a network device, according to another embodiment.
- FIG. 7 is a simplified flow diagram of an example method for controlling a network device based on PSD measurements, according to an embodiment.
Detailed Description
- Network device power consumption is typically most critical when the network device (or a portion thereof) is fully loaded. During time periods when the network device is experiencing high loading, packet processors and data paths of the network device are heavily stressed. Often, the percentage of small-sized packets is larger during time periods of low loading and smaller during time periods of high loading. However, current network devices do not differentiate the collection of packet size distribution information between periods of high loading and periods of low loading. As a result, the packet size distribution information generated by current network devices is typically measured across periods of both high and low loading and therefore often does not accurately portray packet size distribution at times of high loading. Thus, the packet size distribution information generated by current network devices has low utility for reducing power consumption of a network device during periods of high loading.
- Packet size distribution information is useful for controlling the network device to more optimally adjust operation of the network device, at least in some embodiments.
- the network device may identify packet flows contributing high amounts of packets with small packet sizes, and may adjust the processing of those flows to redistribute processing and/or memory resources amongst packet flows, and/or to reduce power consumption of the network device.
- Networking system 100 comprises a plurality of interconnected nodes 110a-110n (collectively, nodes 110), each implemented by a different computing device.
- a node 110 may be a single networking computing device, such as a router or switch, in which some or all of the processing components described herein are implemented in application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other integrated circuit(s).
- a node 110 may include one or more memories storing machine-readable instructions for implementing various components described herein, one or more hardware processors configured to execute the instructions stored in the one or more memories, and various data repositories in the one or more memories for storing data structures utilized and manipulated by the various components.
- Each node 110 is connected to one or more other nodes 110 in network 100 by one or more communication links, depicted as lines between nodes 110.
- the communication links may be any suitable wired cabling or wireless links. Note that system 100 illustrates only one of many possible arrangements of nodes within a network. Other networks may include fewer or additional nodes 110 having any number of links between them.
- While each node 110 may or may not have a variety of other functions, in an embodiment, each node 110 is configured to send, receive, and/or relay data to one or more other nodes 110 via communication links.
- data is communicated as a series of discrete units or structures of data represented by signals transmitted over the communication links.
- Different nodes 110 within a network 100 may send, receive, and/or relay data units at different communication levels, or layers.
- a first node 110 may send a data unit at the network layer (e.g., a TCP segment) to a second node 110 over a path that includes an intermediate node 110.
- the data unit may be broken into smaller data units (“subunits”) at various sublevels before it is transmitted from the first node 110.
- the data unit may be broken into packets, then cells, and eventually sent out as a collection of signal-encoded bits to the intermediate device.
- the intermediate node 110 may rebuild the entire original data unit before routing the information to the second node 110, or the intermediate node 110 may simply rebuild the subunits (e.g., packets or frames) and route those subunits to the second node 110 without ever composing the entire original data unit.
- When a node 110 receives a data unit, it typically examines addressing information within the data unit (and/or other information within the data unit) to determine how to process the data unit.
- the addressing information may include, for instance, a media access control (MAC) address, an IP address, a VLAN identifier, information within a multi-protocol label switching (MPLS) label, or any other suitable information. If the addressing information indicates that the receiving node 110 is not the destination for the data unit, the node may look up forwarding information within a forwarding database of the receiving node 110 and forward the data unit to one or more other nodes 110 connected to the receiving node 110 based on the forwarding information.
- the forwarding information may indicate, for instance, an outgoing port over which to send the data unit, a header to attach to the data unit, a new destination address to overwrite in the data unit, etc.
- the forwarding information may include information indicating a suitable approach for selecting one of those paths, or a path deemed to be the best path may already be defined.
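- As a toy illustration of the forwarding lookup described above, the sketch below maps a destination address to an egress port and header rewrites via a longest-prefix match; the table contents, field names, and the use of Python's standard `ipaddress` module are assumptions for the example only.
```python
import ipaddress

# Hypothetical forwarding database: destination prefix -> (egress port, header rewrites).
FORWARDING_DB = {
    ipaddress.ip_network("10.0.0.0/24"): ("port_3", {"next_hop_mac": "aa:bb:cc:dd:ee:01"}),
    ipaddress.ip_network("10.0.1.0/24"): ("port_7", {"next_hop_mac": "aa:bb:cc:dd:ee:02"}),
}

def forward(dst_ip):
    """Return (egress_port, header_rewrites) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [(net, action) for net, action in FORWARDING_DB.items() if addr in net]
    if not matches:
        return None                                        # no route: drop or trap to CPU
    best = max(matches, key=lambda m: m[0].prefixlen)      # longest-prefix match wins
    return best[1]

print(forward("10.0.1.42"))   # ('port_7', {'next_hop_mac': 'aa:bb:cc:dd:ee:02'})
```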
- Addressing information, flags, labels, and other metadata used for determining how to handle a data unit are typically embedded within a portion of the data unit known as the header.
- One or more headers are typically at the beginning of the data unit, and are followed by the payload of the data unit.
- a first data unit having a first header corresponding to a first communication protocol may be encapsulated in a second data unit at least by appending a second header to the first data unit, the second header corresponding to a second communication protocol.
- the second communication protocol is below the first communication protocol in a protocol stack, in some embodiments.
- a header has a structure defined by a communication protocol and comprises fields of different types, such as a destination address field, a source address field, a destination port field, a source port field, and so forth, according to some embodiments.
- In some communication protocols, the number and the arrangement of fields are fixed.
- Other protocols allow for variable numbers of fields and/or variable length fields with some or all of the fields being preceded by type information that indicates to a node the meaning of the field and/or length information that indicates a length of the field.
- a communication protocol defines a header having multiple different formats and one or more values of one or more respective fields in the header indicate to a node the format of the header. For example, a header includes a type field, a version field, etc., that indicates to which one of multiple formats that header conforms.
- Different communication protocols typically define respective headers having respective formats.
- Data units are sometimes referred to herein as “packets,” which is a term often used to refer to data units defined by the IP.
- the approaches, techniques, and mechanisms described herein, however, are applicable to data units defined by suitable communication protocols other than the IP.
- The term “packet,” as used herein, should be understood to refer to any type of data structure communicated across a network, including packets as well as segments, cells, data frames, datagrams, and so forth.
- Any node in the depicted network 100 may communicate with any other node in the network 100 by sending packets through a series of nodes 110 and links, referred to as a path.
- For example, one path from Node B (110b) to Node H (110h) is from Node B to Node D to Node G to Node H.
- a node 110 does not actually need to specify a full path for a packet that it sends. Rather, the node 110 may simply be configured to calculate the best path for the packet out of the device (e.g., determining via which one or more egress ports the packet should be transmitted).
- When the node 110 receives a packet that is not addressed directly to the node 110, based on header information associated with the packet, such as path and/or destination information, the node 110 relays the packet along to either the destination node 110, or a “next hop” node 110 that the node 110 calculates is in a better position to relay the packet to the destination node 110, according to some embodiments.
- the actual path of a packet is a product of each node 110 along the path making routing decisions about how best to move the packet along to the destination node 110 identified by the packet, according to some embodiments.
- the nodes may, on occasion, discard, fail to send, or fail to receive data units, thus resulting in the data units failing to reach their intended destination.
- The act of discarding a data unit, or failing to deliver a data unit, is typically referred to as “dropping” the data unit. Instances of dropping a data unit, referred to herein as “drops” or “packet loss,” may occur for a variety of reasons, such as resource limitations, errors, or deliberate policies.
- One or more of the nodes 110 utilize load-aware packet size distribution (PSD) measurement techniques, examples of which are described below.
- Fig. 1 depicts node 110d and node 110g as having load-aware PSD measurement modules that utilize PSD measurement techniques, such as described below, that involve initiating PSD measurements in response to a processing load of the node 110 meeting a condition.
- Fig. 2A is a simplified diagram of an example network device 200 in which load- aware PSD measurement techniques are utilized, according to an embodiment.
- the network device 200 is a computing device comprising any combination of i) hardware and/or ii) one or more processors executing machine-readable instructions, configured to implement the various logical components described herein.
- the node 110d and node 110g of Fig. 1 have a structure the same as or similar to the network device 200.
- the network device 200 may be one of a number of components within a node 110.
- network device 200 may be implemented on one or more integrated circuits, or “chips,” configured to perform switching and/or routing functions within a node 110, such as a network switch, a router, etc.
- the node 110 may further comprise one or more other components, such as one or more central processor units, storage units, memories, physical interfaces, LED displays, or other components external to the one or more chips, some or all of which may communicate with the one or more chips.
- the node 110 comprises multiple network devices 200.
- the network device 200 is utilized in a suitable networking system different than the example networking system 100 of Fig. 1.
- the network device 200 includes a plurality of packet processing modules 204, with each packet processing module being associated with a respective plurality of ingress network interfaces 208 (sometimes referred to herein as “ingress ports” for purposes of brevity) and a respective plurality of egress network interfaces 212 (sometimes referred to herein as “egress ports” for purposes of brevity).
- the ingress ports 208 are ports by which packets are received via communication links in a communication network
- the egress ports 212 are ports by which at least some of the packets are transmitted via the communication links after having been processed by the network device 200.
- the data units may be packets, cells, frames, or other suitable structures.
- the individual atomic data units upon which the depicted components operate are cells or frames. That is, data units are received, acted upon, and transmitted at the cell or frame level, in some such embodiments.
- These cells or frames are logically linked together as the packets to which they respectively belong for purposes of determining how to handle the cells or frames, in some embodiments.
- the cells or frames are not actually assembled into packets within device 200, particularly if the cells or frames are being forwarded to another destination through device 200, in some embodiments.
- Ingress ports 208 and egress ports 212 are depicted as separate ports for illustrative purposes, but typically correspond to the same physical network interfaces of the network device 200. That is, a single network interface acts as both an ingress port 208 and an egress port 212, in some embodiments. Nonetheless, for various functional purposes, certain logic of the network device 200 may view a single physical network interface as logically being a separate ingress port 208 and egress port 212.
- certain logic of the network device 200 may subdivide a single physical network interface into multiple ingress ports 208 or egress ports 212 (e.g., “virtual ports”), or aggregate multiple physical network interfaces into a single ingress port 208 or egress port 212 (e.g., a trunk, a link aggregate group (LAG), an equal cost multipath (ECMP) group, etc.).
- ingress ports 208 and egress ports 212 are considered distinct logical constructs that are mapped to physical network interfaces rather than simply as distinct physical constructs.
- At least some ports 208/212 are coupled to one or more transceivers (not shown in Fig. 2A), such as Serializer/Deserializer (“SerDes”) blocks.
- ingress ports 208 provide serial inputs of received data units into a SerDes block, which then outputs the data units in parallel into a packet processing module 204.
- a packet processing module 204 provides data units in parallel into another SerDes block, which outputs the data units serially to egress ports 212.
- Each packet processing module 204 comprises an ingress portion 204-xa and an egress portion 204-xb.
- the ingress portion 204-xa generally performs ingress processing operations for packets such as one of, or any suitable combination of two or more of: packet classification, tunnel termination, Layer-2 (L2) forwarding lookups, Layer-3 (L3) forwarding lookups, etc.
- the egress portion 204-xb generally performs egress processing operations for packets such as one of, or any suitable combination of two or more of: packet duplication (e.g., for multicast packets), header alteration, rate limiting, traffic shaping, egress policing, flow control, maintaining statistics regarding packets, etc.
- Each ingress portion 204-xa is communicatively coupled to multiple egress portions 204-xb via an interconnect 216.
- each egress portion 204-xb is communicatively coupled to multiple ingress portions 204-xa via the interconnect 216.
- the interconnect 216 comprises one or more switching fabrics, one or more crossbars, etc., according to various embodiments.
- an ingress portion 204-xa receives a packet via an associated ingress port 208 and performs ingress processing operations for the packet, including determining one or more egress ports 212 via which the packet is to be transmitted (sometimes referred to herein as “target ports”). The ingress portion 204-xa then transfers the packet, via the interconnect 216, to one or more egress portion 204-xb corresponding to the determined one or more target ports 212. Each egress portion 204-xb that receives the packet performs egress processing operations for the packet and then transfers the packet to one or more determined target ports 212 associated with the egress portion 204-xb for transmission from the network device 200.
- the ingress portion 204-xa determines a virtual target port and one or more egress portions 204-xb corresponding to the virtual target port map the virtual target port to one or more physical egress ports 212. In some embodiments, the ingress portion 204-xa determines a group of target ports 212 (e.g., a trunk, a LAG, an ECMP group, etc.) and one or more egress portions 204-xb corresponding to the group of target ports select one or more particular target egress ports 212 within the group of target ports.
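- A minimal sketch of the group-to-member resolution described above is shown below; the group contents and the choice of a CRC hash over flow fields to pin a flow to one member port are assumptions for the example, not the patented mechanism.
```python
import zlib

# Hypothetical port groups: a LAG/ECMP group name maps to its physical member ports.
PORT_GROUPS = {"lag_1": ["port_4", "port_5", "port_6"]}

def resolve_target_port(target, flow_key):
    """Map a group target (trunk/LAG/ECMP) to one physical egress port; pass others through."""
    members = PORT_GROUPS.get(target)
    if not members:
        return target                                        # already a physical/virtual port
    index = zlib.crc32(flow_key.encode()) % len(members)     # keep a flow on one member port
    return members[index]

print(resolve_target_port("lag_1", "10.0.0.1->10.0.1.2:443"))
print(resolve_target_port("port_9", "any"))                  # port_9
```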
- As used herein, the term “target port” refers to a physical port, a virtual port, a group of target ports, etc., unless otherwise stated or apparent.
- Each packet processing module 204 is implemented using any suitable combination of fixed circuitry and/or a processor executing machine-readable instructions, such as specific logic components implemented by one or more FPGAs, ASICs, or one or more processors executing machine-readable instructions, according to various embodiments.
- At least respective portions of multiple packet processing modules 204 are implemented on a single IC (or “chip”). In some embodiments, respective portions of multiple packet processing modules 204 are implemented on different respective chips.
- components of each ingress portion 204-xa are arranged in a pipeline such that outputs of one or more components are provided as inputs to one or more other components.
- In some embodiments in which the components are arranged in a pipeline, one or more components of the ingress portion 204-xa are skipped or bypassed for certain packets.
- the components are arranged in a suitable manner that is not a pipeline.
- The exact set and/or sequence of components that process a given packet may vary, in some embodiments, depending on the attributes of the packet and/or the state of the network device 200.
- components of each egress portion 204-xb are arranged in a pipeline such that outputs of one or more components are provided as inputs to one or more other components.
- In some embodiments in which the components are arranged in a pipeline, one or more components of the egress portion 204-xb are skipped or bypassed for certain packets.
- In other embodiments, the components are arranged in a suitable manner that is not a pipeline. The exact set and/or sequence of components that process a given packet may vary, in some embodiments, depending on the attributes of the packet and/or the state of the network device 200.
- Each ingress portion 204-xa includes circuitry 220 (sometimes referred to herein as “ingress arbitration circuitry”) that is configured to reduce traffic loss during periods of bursty traffic and/or other congestion.
- the ingress arbitration circuitry 220 is configured to function in a manner that facilitates economization of the sizes, numbers, and/or qualities of downstream components within the packet processing module 204 by more intelligently controlling the release of data units to these components.
- the ingress arbitration circuitry 220 is further configured to support features such as lossless protocols and cut-through switching while still permitting high rate bursts from ports 208.
- the ingress arbitration circuitry 220 is coupled to an ingress buffer memory 224 that is configured to temporarily store packets that are received via the ports 208 while components of the packet processing module 204 process the packets.
- Each data unit received by the ingress portion 204-xa is stored in one or more entries within one or more buffers, which entries are marked as utilized to prevent newly received data units from overwriting data units that are already buffered in the buffer memory 224.
- the one or more entries in which a data unit is buffered in the ingress buffer memory 224 are then marked as available for storing newly received data units, in some embodiments.
- Each buffer may be a portion of any suitable type of memory, including volatile memory and/or non-volatile memory.
- the ingress buffer memory 224 comprises a single-ported memory that supports only a single input/output (I/O) operation per clock cycle (i.e., either a single read operation or a single write operation). Single-ported memories are utilized for higher operating frequency, though in other embodiments multi-ported memories are used instead.
- the ingress buffer memory 224 comprises multiple physical memories that are capable of being accessed concurrently in a same clock cycle, though full realization of this capability is not necessary.
- each buffer is a distinct memory bank, or set of memory banks.
- different buffers are different regions within a single memory bank.
- each buffer comprises many addressable “slots” or “entries” (e.g., rows, columns, etc.) in which data units, or portions thereof, may be stored.
- buffers in the ingress buffer memory 224 comprises a variety of buffers or sets of buffers, each utilized for varying purposes and/or components within the ingress portion 204-xa.
- the ingress portion 204-xa comprises a buffer manager (not shown) that is configured to manage use of the ingress buffers 224.
- the buffer manager performs, for example, one of or any suitable combination of the following: allocates and deallocates specific segments of memory for buffers, creates and deletes buffers within that memory, identifies available buffer entries in which to store a data unit, maintains a mapping of buffer entries to data units stored in those buffer entries (e.g., by a packet sequence number assigned to each packet when the first data unit in that packet was received), marks a buffer entry as available when a data unit stored in that buffer is dropped, sent, or released from the buffer, determines when a data unit is to be dropped because it cannot be stored in a buffer, performs garbage collection on buffer entries for data units (or portions thereof) that are no longer needed, etc., in various embodiments.
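- The sketch below models the buffer-manager bookkeeping described above in software: allocating entries for arriving data units, mapping entries to packet sequence numbers, and releasing entries when a data unit is sent or dropped. The class and method names are hypothetical, and a real buffer manager is implemented in hardware.
```python
class BufferManager:
    """Toy model of buffer-entry allocation and release."""

    def __init__(self, num_entries):
        self.free_entries = list(range(num_entries))   # entries available for new data units
        self.entry_to_packet = {}                      # entry index -> packet sequence number

    def allocate(self, packet_seq_num):
        """Mark an entry as utilized for a data unit; return None if the buffer is full."""
        if not self.free_entries:
            return None                                # caller may decide to drop the data unit
        entry = self.free_entries.pop()
        self.entry_to_packet[entry] = packet_seq_num
        return entry

    def release(self, entry):
        """Mark an entry as available once its data unit is dropped, sent, or released."""
        self.entry_to_packet.pop(entry, None)
        self.free_entries.append(entry)
```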
- the buffer manager includes buffer assignment logic (not shown) that is configured to identify which buffer, among multiple buffers in the ingress buffer memory 224, should be utilized to store a given data unit, or portion thereof, according to an embodiment.
- each packet is stored in a single entry within its assigned buffer.
- a packet is received as, or divided into, constituent data units such as fixed-size cells or frames, and the constituent data units are stored separately (e.g., not in the same location, or even the same buffer).
- the ingress arbitration circuitry 220 is also configured to maintain ingress queues 228, according to some embodiments, which are used to manage the order in which data units are processed from the buffers in the ingress buffer memory 224.
- Each data unit, or the buffer location(s) in which the data unit is stored, is said to belong to one or more constructs referred to as queues.
- a queue is a set of memory locations (e.g., in the ingress buffer memory 224) arranged in some order by metadata describing the queue.
- the memory locations may (and often are) non-contiguous relative to their addressing scheme and/or physical or logical arrangement.
- the sequence of constituent data units as arranged in a queue generally corresponds to an order in which the data units or data unit portions in the queue will be released and processed.
- Such queues are known as first-in-first-out (“FIFO”) queues, though in other embodiments other types of queues may be utilized.
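- A compact software model of the queueing behavior described above is shown below: a queue is an ordered list of buffer-entry references (not necessarily contiguous in memory) released in FIFO order. The class name and the use of Python's `collections.deque` are assumptions for the illustration.
```python
from collections import deque

class IngressQueueModel:
    """Toy FIFO queue of buffer-entry indices."""

    def __init__(self):
        self.entries = deque()             # buffer-entry indices, oldest at the left

    def enqueue(self, buffer_entry):
        self.entries.append(buffer_entry)

    def dequeue(self):
        """Release the entry that has been queued the longest (FIFO order)."""
        return self.entries.popleft() if self.entries else None

q = IngressQueueModel()
q.enqueue(42)
q.enqueue(7)
print(q.dequeue(), q.dequeue())            # 42 7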
- the ingress portion 204-xa also includes an ingress packet processor 232 that is configured to perform ingress processing operations for packets such as one of, or any suitable combination of two or more of packet classification, tunnel termination, L2 forwarding lookups, L3 forwarding lookups, etc., according to various embodiments.
- the ingress packet processor 232 includes an L2 forwarding database and/or an L3 forwarding database, and the ingress packet processor 232 performs L2 forwarding lookups and/or L3 forwarding lookups to determine target ports for packets. In some embodiments, the ingress packet processor 232 uses header information in packets to perform L2 forwarding lookups and/or L3 forwarding lookups.
- the ingress arbitration circuitry 220 is configured to release a certain number of data units (or portions of data units) from ingress queues 228 for processing (e.g., by the ingress packet processor 232) or for transfer (e.g., via the interconnect 216) each clock cycle or other defined period of time.
- the next data unit (or portion of a data unit) to release may be identified using one or more ingress queues 228.
- Respective ingress ports 208 (or respective groups of ingress ports 208) are assigned to respective ingress queues 228, and the ingress arbitration circuitry 220 selects queues 228 from which to release one or more data units (or portions of data units) according to a selection scheme, such as a round-robin scheme or another suitable selection scheme, in some embodiments.
- the ingress arbitration circuitry 220 selects a data unit (or a portion of a data unit) from a head of a FIFO ingress queue 228, which corresponds to a data unit (or portion of a data unit) that has been in the FIFO ingress queue 228 for a longest time, in some embodiments.
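- A small sketch of the arbitration step described above follows: each cycle, queues are visited round-robin and up to a fixed number of data units is released from the head of the selected queue. The function name, the per-cycle release count, and the Python generator formulation are assumptions for the example.
```python
from itertools import cycle

def arbitrate(queues, releases_per_cycle=1):
    """Yield (queue_name, data_unit) pairs, visiting queues round-robin, FIFO within a queue."""
    order = cycle(list(queues))
    while any(queues.values()):
        name = next(order)
        for _ in range(releases_per_cycle):
            if queues[name]:
                yield name, queues[name].pop(0)   # release from the head of the queue

queues = {"q0": ["a", "b"], "q1": ["c"]}
print(list(arbitrate(queues)))   # [('q0', 'a'), ('q1', 'c'), ('q0', 'b')]
```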
- Transferring a data unit from an ingress portion 204-xa to an egress portion 204-xb comprises releasing (or dequeuing) the data unit and transferring the data unit to the egress portion 204-xb via the interconnect 216, according to an embodiment.
- the egress portion 204-xb comprises circuitry 248 (sometimes referred to herein as “traffic manager circuitry 248”) that is configured to control the flow of data units from the ingress portions 204-xa to one or more other components of the egress portion 204-xb.
- the egress portion 204-xb is coupled to an egress buffer memory 252 that is configured to store egress buffers.
- a buffer manager (not shown) within the traffic manager circuitry 248 temporarily stores data units received from one or more ingress portions 204-xa in egress buffers as they await processing by one or more other components of the egress portion 204-xb.
- the buffer manager of the traffic manager circuitry 248 is configured to operate in a manner similar to the buffer manager of the ingress arbitration circuitry 220 discussed above.
- the egress buffer memory 252 (and buffers of the egress buffer memory 252) is structured the same as or similar to the ingress buffer memory 224 (and buffers of the ingress buffer memory 224) discussed above.
- each data unit received by the egress portion 204-xb is stored in one or more entries within one or more buffers, which entries are marked as utilized to prevent newly received data units from overwriting data units that are already buffered in the egress buffer memory 252.
- the one or more entries in which the data unit is buffered in the egress buffer memory 252 are then marked as available for storing newly received data units, in some embodiments.
- buffers in the egress buffer memory 252 comprises a variety of buffers or sets of buffers, each utilized for varying purposes and/or components within the egress portion 204-xb.
- the buffer manager (not shown) is configured to manage use of the egress buffers 252.
- the buffer manager performs, for example, one of or any suitable combination of the following: allocates and deallocates specific segments of memory for buffers, creates and deletes buffers within that memory, identifies available buffer entries in which to store a data unit, maintains a mapping of buffer entries to data units stored in those buffer entries (e.g., by a packet sequence number assigned to each packet when the first data unit in that packet was received), marks a buffer entry as available when a data unit stored in that buffer is dropped, sent, or released from the buffer, determines when a data unit is to be dropped because it cannot be stored in a buffer, performs garbage collection on buffer entries for data units (or portions thereof) that are no longer needed, etc., in various embodiments.
- the traffic manager circuitry 248 is also configured to maintain egress queues 256, according to some embodiments, that are used to manage the order in which data units are processed from the egress buffers 252.
- the egress queues 256 are structured the same as or similar to the ingress queues 228 discussed above.
- different egress queues 256 may exist for different destinations. For example, each port 212 is associated with a respective set of one or more egress queues 256.
- the egress queue 256 to which a data unit is assigned may, for instance, be selected based on forwarding information indicating the target port determined for the packet.
- different egress queues 256 correspond to respective flows or sets of flows. That is, packets for each identifiable traffic flow or group of traffic flows are assigned a respective set of egress queues 256. In some embodiments, different egress queues 256 correspond to different classes of traffic, QoS levels, etc.
- egress queues 256 correspond to respective egress ports 212 and/or respective priority sets.
- a respective set of multiple queues 256 corresponds to each of at least some of the egress ports 212, with respective queues 256 in the set of multiple queues 256 corresponding to respective priority sets.
- the traffic manager circuitry 248 stores (or “enqueues”) the packets in egress queues 256.
- the ingress buffer memory 224 corresponds to a same or different physical memory as the egress buffer memory 252, in various embodiments. In some embodiments in which the ingress buffer memory 224 and the egress buffer memory 252 correspond to a same physical memory, ingress buffers 224 and egress buffers 252 are stored in different portions of the same physical memory, allocated to ingress and egress operations, respectively.
- ingress buffers 224 and egress buffers 252 include at least some of the same physical buffers, and are separated only from a logical perspective.
- metadata or internal markings may indicate whether a given individual buffer entry belongs to an ingress buffer 224 or egress buffer 252.
- ingress buffers 224 and egress buffers 252 may be allotted a certain number of entries in each of the physical buffers that they share, and the number of entries allotted to a given logical buffer is said to be the size of that logical buffer.
- In some embodiments, when a packet is transferred from the ingress portion 204-xa to the egress portion 204-xb within a same packet processing module 204, instead of copying the packet from an ingress buffer entry to an egress buffer, the data unit remains in the same buffer entry, and the designation of the buffer entry (e.g., as belonging to an ingress queue versus an egress queue) changes with the stage of processing.
- the egress portion 204-xb also includes an egress packet processor 268 that is configured to perform egress processing operations for packets such as one of, or any suitable combination of two or more of: packet duplication (e.g., for multicast packets), header alteration, rate limiting, traffic shaping, egress policing, flow control, maintaining statistics regarding packets, etc., according to various embodiments.
- the egress packet processor 268 modifies header information in the egress buffers 252, in some embodiments.
- the egress packet processor 268 is coupled to a group of egress ports 212 via egress arbitration circuitry 272 that is configured to regulate access to the group of egress ports 212 by the egress packet processor 268.
- the egress packet processor 268 is additionally or alternatively coupled to suitable destinations for packets other than egress ports 212, such as one or more internal central processing units (not shown), one or more storage subsystems, etc.
- the egress packet processor 268 may replicate a data unit one or more times.
- a data unit may be replicated for purposes such as multicasting, mirroring, debugging, and so forth.
- a single data unit may be replicated, and stored in multiple egress queues 256.
- Although certain techniques described herein may refer to the original data unit that was received by the network device 200, it will be understood that those techniques equally apply to copies of the data unit that have been generated by the network device 200 for various purposes.
- a copy of a data unit may be partial or complete.
- For example, there may be an actual physical copy of the data unit in the egress buffers 252, or a single copy of the data unit may be linked from a single buffer location (or single set of locations) in the egress buffers 252 to multiple egress queues 256.
- Fig. 2B is another simplified block diagram of the network device 200, according to an embodiment.
- the network device 200 also includes one or more central processing units (CPUs) 276.
- the one or more CPUs 276 are configured to perform management functions for the network device 200, such as configuration of the packet processing modules 204, optimization of the network device 200, data collection, statistics collection, etc.
- the CPU(s) 276 are coupled to one or more memories 278 that store machine-readable instructions, and the CPU(s) 276 are configured to execute the machine-readable instructions.
- the ingress arbitration circuitry 220 includes one or more load-aware PSD modules 280.
- Each load-aware PSD module 280 is configured to initiate measuring PSD information regarding a distribution of sizes of packets processed by the network device in response to determining that a processing load of the network device meets a condition.
- the processing load is represented by a suitable load metric.
- the load metric corresponds to an individual entity corresponding to the ingress portion 204-xa, such as i) a rate at which data is being received at a port 208, ii) a fill level of an ingress queue 228, iii) a length of an ingress queue 228, iv) a time delay between when a packet is added to an ingress queue 228 and when the packet is dequeued from the ingress queue 228, v) an occupancy level of an ingress buffer 224, etc.
- the load-aware PSD module 280 is configured to determine when a processing load of the network device meets a condition at least by comparing an individual load metric to a threshold.
- the load-aware PSD module 280 is configured to determine when a processing load of the network device meets a condition at least by comparing multiple individual load metrics to a threshold (or multiple respective different thresholds), and determining whether any of the multiple individual load metrics meet the corresponding threshold(s). In other embodiments, the load-aware PSD module 280 is configured to determine when a processing load of the network device meets a condition at least by comparing multiple individual load metrics to a threshold (or multiple respective different thresholds), and determining whether all of the multiple individual load metrics meet the corresponding threshold(s).
- the load metric is a suitable mathematical combination of two or more suitable individual load metrics such as described above, and the load-aware PSD module 280 is configured to determine when a processing load of the network device meets a condition at least by comparing the mathematical combination of two or more suitable individual load metrics to a threshold.
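- The sketch below illustrates the three load-condition checks described above (any metric meets its threshold, all metrics meet their thresholds, or a mathematical combination meets a single threshold); the metric names, weights, and threshold values are assumptions for the example, not values from the disclosure.
```python
def any_metric_meets(metrics, thresholds):
    """True if any individual load metric meets its corresponding threshold."""
    return any(metrics[name] >= thresholds[name] for name in metrics)

def all_metrics_meet(metrics, thresholds):
    """True if all individual load metrics meet their corresponding thresholds."""
    return all(metrics[name] >= thresholds[name] for name in metrics)

def combined_metric_meets(metrics, weights, threshold):
    """True if a weighted combination of individual load metrics meets a single threshold."""
    combined = sum(weights[name] * metrics[name] for name in metrics)
    return combined >= threshold

# Example: normalized queue fill level and ingress data rate as individual load metrics.
metrics = {"queue_fill": 0.9, "port_rx_rate": 0.7}
print(any_metric_meets(metrics, {"queue_fill": 0.8, "port_rx_rate": 0.8}))               # True
print(all_metrics_meet(metrics, {"queue_fill": 0.8, "port_rx_rate": 0.8}))               # False
print(combined_metric_meets(metrics, {"queue_fill": 0.5, "port_rx_rate": 0.5}, 0.75))    # True
```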
- the load-aware PSD module 280 is configured to measure PSD information corresponding to an individual entity corresponding to the ingress portion 204-xa, such as PSD information regarding packets received by a port 208, packets stored in an ingress queue 228, packets stored in an ingress buffer 224, etc. For example, when the load metric corresponds to an individual entity, the load-aware PSD module 280 measures PSD information regarding the individual entity, in an embodiment.
- the load-aware PSD module 280 is configured, additionally or alternatively, to measure PSD information corresponding to a group of entities corresponding to the ingress portion 204-xa, such as PSD information regarding packets received by a set of multiple ports 208, packets stored in a set of multiple ingress queues 228, packets stored in a set of multiple ingress buffers 224, etc. For example, when the load-aware PSD module 280 determines whether any of multiple individual load metrics corresponding to multiple entities meet a corresponding threshold(s), the load-aware PSD module 280 measures PSD information regarding all of the multiple entities, in an embodiment.
- each ingress arbitration circuitry 220 includes one load-aware PSD module 280
- each of at least one ingress arbitration circuitry 220 includes multiple load-aware PSD modules 280, in some embodiments.
- Two or more ingress arbitration circuitry 220 include different numbers of multiple load-aware PSD modules 280, in some embodiments.
- At least one ingress arbitration circuitry 220 does not include any load-aware PSD modules 280, in some embodiments.
- the traffic manager circuitry 248 includes one or more load-aware PSD modules 284.
- the load-aware PSD modules 284 are similar to the load-aware PSD modules 280, but measure load metrics and PSD information regarding entities of the egress portion 204-xb.
- the load-aware PSD module 284 uses load metrics such as i) a rate at which data is being transmitted via a port 212, ii) a fill level of an egress queue 256, iii) a length of an egress queue 256, iv) a time delay between when a packet is added to an egress queue 256 and when the packet is dequeued from the egress queue 256, v) an occupancy level of an egress buffer 252, etc.
- the load-aware PSD module 284 is configured to measure PSD information corresponding to an individual entity corresponding to the egress portion 204-xb, such as PSD information regarding packets transmitted by a port 212, packets stored in an egress queue 256, packets stored in an egress buffer 252, etc. For example, when the load metric corresponds to an individual entity, the load-aware PSD module 284 measures PSD information regarding the individual entity, in an embodiment.
- the load-aware PSD module 284 is configured, additionally or alternatively, to measure PSD information corresponding to a group of entities corresponding to the egress portion 204-xb, such as PSD information regarding packets transmitted by a set of multiple ports 212, packets stored in a set of multiple egress queues 256, packets stored in a set of multiple egress buffers 252, etc. For example, when the load-aware PSD module 284 determines whether any of multiple individual load metrics corresponding to multiple entities meet a corresponding threshold(s), the load-aware PSD module 284 measures PSD information regarding all of the multiple entities, in an embodiment.
- the egress arbitration circuitry 272 also includes one or more load-aware PSD modules 288.
- the load-aware PSD modules 288 are similar to the load-aware PSD modules 284, but measure PSD information after packets have been processed by the egress packet processor 268, which may change the size of packets by adding tunnel headers, removing tunnel headers, modifying headers, etc.
- the load-aware PSD module 288 uses load metrics such as i) a rate at which data is being transmitted via a port 212, ii) a fill level of an egress queue 256, iii) a length of an egress queue 256, iv) a time delay between when a packet is added to an egress queue 256 and when the packet is dequeued from the egress queue 256, v) an occupancy level of an egress buffer 252, etc.
- the load-aware PSD module 288 is configured to measure PSD information for packets that have been processed by the egress packet processor 268 and that correspond to an individual entity corresponding to the egress portion 204-xb, such as PSD information regarding packets transmitted by a port 212, packets stored in an egress queue 256, packets stored in an egress buffer 252, etc. For example, when the load metric corresponds to an individual entity, the load-aware PSD module 288 measures PSD information regarding the individual entity, in an embodiment.
- the load-aware PSD module 288 is configured, additionally or alternatively, to measure PSD information for packets that have been processed by the egress packet processor 268 and that correspond to a group of entities corresponding to the egress portion 204-xb, such as PSD information regarding packets transmitted by a set of multiple ports 212, packets stored in a set of multiple egress queues 256, packets stored in a set of multiple egress buffers 252, etc. For example, when the load-aware PSD module 288 determines whether any of multiple individual load metrics corresponding to multiple entities meet a corresponding threshold(s), the load-aware PSD module 288 measures PSD information regarding all of the multiple entities, in an embodiment.
- the PSD modules 280, 284, 288 are implemented using hardware circuitry and/or one or more processors executing machine-readable instructions stored in one or more memories coupled to the one or more processors.
- PSD information generated by the PSD modules 280, 284, 288 is used to control the network device 200, in some embodiments.
- the network device 200 redistributes, within the network device 200, processing of one or more packet types, one or more flows of packets, etc., contributing to the high percentage of short-length packets corresponding to the port 208, 212 or queue 228, 256.
- the network device 200 redirects the one or more packet types, one or more flows of packets, etc., contributing to the high percentage of short-length packets to a dedicated queue 228, 256, and processes (e.g., with the ingress packet processor 232, the egress packet processor 268, etc.) packets in the dedicated queue 228, 256 at a reduced rate to reduce power consumption associated with the processing of the packets in the dedicated queue 228, 256.
- the network device 200 adjusts buffer allocation algorithms, queue allocations, buffer admission policy, buffer storage algorithms, etc., based on the PSD information determined when a load metric indicates a high processing load, according to some embodiments.
- the network device 200 redirects the one or more packet types, one or more flows of packets, etc., contributing to the high percentage of short-length packets to a dedicated queue 228, 256, and reduces a clock rate of a processor that is processing packets in the dedicated queue 228, 256 to reduce power consumption associated with the processing of the packets in the dedicated queue 228, 256.
- the network device 200 redirects the one or more packet types, one or more flows of packets, etc., contributing to the high percentage of short-length packets to another egress port 212 with a lower load, the other egress port 212 corresponding to an alternative path through a network.
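- The sketch below illustrates one of the control actions described above: from per-flow size histograms collected during a high-load measurement window, select flows dominated by small packets as candidates for a dedicated, rate-limited queue. The flow records, the 128-byte cutoff, the selection threshold, and the queue identifier are assumptions for the example.
```python
SMALL_PACKET_CUTOFF = 128       # bytes; assumed boundary for "small" packets
DEDICATED_QUEUE_ID = 7          # assumed dedicated, rate-limited queue

def select_flows_for_dedicated_queue(per_flow_histograms, small_fraction_threshold=0.6):
    """Return flow ids whose measured size distribution is dominated by small packets."""
    selected = []
    for flow_id, histogram in per_flow_histograms.items():
        total = sum(histogram.values())
        small = sum(count for size_bin, count in histogram.items()
                    if size_bin < SMALL_PACKET_CUTOFF)
        if total and small / total >= small_fraction_threshold:
            selected.append(flow_id)
    return selected

# Example: histograms keyed by the lower edge of each packet-size bin, in bytes.
flows = {"flow_a": {64: 900, 256: 100}, "flow_b": {64: 50, 1024: 950}}
print(select_flows_for_dedicated_queue(flows))   # ['flow_a'] -> steer to DEDICATED_QUEUE_ID
```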
- In some embodiments, the network device 200 sends collected PSD information and optionally other telemetry information such as buffer lengths, queue lengths, latency measurements, etc., to an analyzer and/or controller that is external to the network device 200.
- the analyzer and/or controller determines initial operating parameters (e.g., a processing rate for the network device 200, a selection of network paths to be routed through the network device 200, etc.) to reduce power consumption by the network device 200.
- the analyzer and/or controller determines whether operating parameters of the network device and/or other network devices in the network should be adjusted based on collected PSD information and optionally other telemetry information, and responsively adjusts the operating parameters of the network device 200 and/or other network devices, in an embodiment.
- the analyzer and/or controller uses reinforcement learning to determine optimal operating parameters for the network device 200 and/or other network devices, for example, by using the collected PSD information and optionally other telemetry information as feedback, according to an embodiment.
- In some embodiments, the one or more CPUs 276 adjust operations being performed by the CPU(s) 276 based on the PSD information. For example, the CPU(s) 276 reduce a rate at which statistical information regarding the network device 200 is collected by the CPU(s) 276. Additionally or alternatively, in embodiments in which the CPU(s) 276 perform functions related to artificial intelligence and/or machine learning (AI/ML) network analysis and/or control, the CPU(s) 276 reduce a rate at which such functions are performed.
- the PSD information is provided to another device in a communication network to which the network device 200 belongs for network control and/or monitoring operations such as network optimization, congestion management, troubleshooting, etc.
- the CPU 276 generates one or more packets that include PSD information, and the CPU 276 controls the network device 200 to transmit the one or more packets to another network device in the communication network.
- the other network device uses the PSD information in the one or more packets to perform one or more functions related to network optimization, congestion management, troubleshooting, etc., in an embodiment.
- In response to the PSD information indicating a high percentage of small-sized packets during a period of high processing load, the network device 200 identifies one or more other network devices in the communication network that are transmitting high numbers of small-sized packets to the network device 200, and then transmits flow control packets to the one or more other network devices.
- in response to the PSD information indicating a high percentage of small-sized packets during a period of high processing load, the network device 200 identifies one or more packet flows that are contributing high numbers of small-sized packets, and then begins notifying one or more other network devices that packets in the one or more packet flows are causing congestion in the network device 200, such as by explicit congestion notification (ECN) marking packets in the one or more packet flows or by using another suitable congestion notification mechanism.
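- The following Python sketch is a simplified, hypothetical illustration of identifying packet flows that contribute many small-sized packets so that, for example, flow control or ECN marking could be directed toward them; the packet-record format, size cutoff, and threshold are assumptions made only for this example.

```python
from collections import Counter

SHORT_PACKET_BYTES = 128        # hypothetical cutoff for a "small-sized" packet
MIN_SMALL_PACKETS = 100         # hypothetical per-flow count that triggers a notification

def flows_to_notify(observed_packets):
    """Return flow ids contributing many small packets.

    observed_packets is an iterable of (flow_id, packet_size_in_bytes) tuples.
    The returned flows are candidates for ECN marking or flow control messages.
    """
    small_per_flow = Counter()
    for flow_id, size in observed_packets:
        if size <= SHORT_PACKET_BYTES:
            small_per_flow[flow_id] += 1
    return [flow for flow, count in small_per_flow.items() if count >= MIN_SMALL_PACKETS]

packets = [("flow_a", 64)] * 150 + [("flow_b", 1500)] * 200 + [("flow_b", 64)] * 20
print(flows_to_notify(packets))  # ['flow_a']
```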
- PSD information is statistical information regarding the distribution of packet sizes in a set of multiple packets. Counts of packets having packet sizes that fall within respective packet size ranges are an illustrative example of PSD information, and such counts are sometimes referred to as a histogram of packet sizes, or a packet size histogram. Other examples of PSD information include a combination of statistical measurements such as a combination of i) one or more statistics measuring a respective central tendency (e.g., mean, median, mode, etc.) and ii) one or more statistics measuring dispersion or variation (e.g., range, standard deviation, variance, mean absolute difference, median absolute deviation, average deviation, etc.).
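- As a non-authoritative illustration of the two forms of PSD information described above, the following Python sketch computes both a packet size histogram over a set of bins and a combination of central-tendency and dispersion statistics; the bin boundaries and function names are assumptions introduced for this example.

```python
import bisect
import statistics

# Hypothetical bin upper bounds in bytes (any increasing boundaries could be used).
BIN_UPPER_BOUNDS = [64, 127, 511, 2047, 4095, 9216]

def packet_size_histogram(packet_sizes):
    """Counts per size range -- one illustrative form of PSD information."""
    counts = [0] * len(BIN_UPPER_BOUNDS)
    for size in packet_sizes:
        idx = bisect.bisect_left(BIN_UPPER_BOUNDS, size)   # first bound >= size
        counts[min(idx, len(BIN_UPPER_BOUNDS) - 1)] += 1   # clamp oversized packets
    return counts

def packet_size_summary(packet_sizes):
    """Central tendency plus dispersion -- another illustrative form of PSD information."""
    return {
        "mean": statistics.mean(packet_sizes),
        "median": statistics.median(packet_sizes),
        "stdev": statistics.pstdev(packet_sizes),
        "range": max(packet_sizes) - min(packet_sizes),
    }

sizes = [64, 64, 128, 1500, 64, 9000, 512, 64]
print(packet_size_histogram(sizes))   # [4, 0, 1, 2, 0, 1]
print(packet_size_summary(sizes))
```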
- Fig. 3 is a simplified block diagram of an example PSD module 300, according to an embodiment.
- the PSD module 300 corresponds to one or more of the PSD modules 280, 284, 288 of Fig. 2A, in some embodiments. In some embodiments, one or more (or all) of the PSD modules 280, 284, 288 have a suitable structure that is different than that of the PSD module 300. Additionally, the PSD module 300 is used in another suitable network device different than the network device 200 of Figs. 2A-B, in some embodiments.
- the PSD module 300 includes a bank 304 of counters (sometimes referred to herein as the “counter bank 304”) that is used for maintaining counts of packets that fall within different packet size ranges. Counts of packets that fall within different packet size ranges are an example of packet size distribution (PSD) information.
- the counter bank 304 includes H sets of counters, where H is a suitable positive integer.
- Each set of counters is sometimes referred to herein as a “histogram set”.
- Each histogram set corresponds to an entity of the network device (e.g., a port, a queue, a buffer, etc.), or a group of entities, and each histogram set is used to maintain counts of packets, corresponding to the entity or group of entities, that fall within different packet size ranges.
- Each histogram set includes a plurality of counters 312, each counter 312 corresponding to a respective packet size range.
- the different packet size ranges being counted by counters 312 in a histogram set are sometimes referred to herein as “bins,” and the counters 312 in a histogram set are sometimes referred to herein as “bin counters”.
- a granularity of packet size ranges being counted in a histogram set is configurable. Thus, when a histogram set is configured with coarser granularity (i.e., larger size ranges), some counters 312 in the histogram set are not used, in some embodiments.
- At least some counters 312 can be selectively used for different histogram sets. For example, if the granularity of a first histogram set does not need a maximum number of counters 312, counters 312 that are not needed for the first histogram set can be used for another histogram set.
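- A minimal software model of such a counter bank, assuming H histogram sets of equal size and noting (but not implementing) the counter-sharing optimization, is sketched below in Python; the class and parameter names are hypothetical.

```python
class CounterBank:
    """Software model of a bank of bin counters partitioned into histogram sets.

    Each histogram set owns a contiguous slice of counters_per_set counters; in
    hardware, counters left unused by a coarse-granularity set could instead be
    assigned to another set, which this simplified model does not attempt.
    """

    def __init__(self, num_sets, counters_per_set):
        self.counters_per_set = counters_per_set
        self.counters = [0] * (num_sets * counters_per_set)

    def increment(self, histogram_index, bin_index):
        # Merged index = base offset of the histogram set + relative bin index.
        self.counters[histogram_index * self.counters_per_set + bin_index] += 1

    def snapshot(self, histogram_index):
        base = histogram_index * self.counters_per_set
        return list(self.counters[base:base + self.counters_per_set])

    def reset(self, histogram_index):
        base = histogram_index * self.counters_per_set
        self.counters[base:base + self.counters_per_set] = [0] * self.counters_per_set

bank = CounterBank(num_sets=4, counters_per_set=8)
bank.increment(histogram_index=1, bin_index=0)
print(bank.snapshot(1))  # [1, 0, 0, 0, 0, 0, 0, 0]
```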
- the PSD module 300 also includes a histogram index generator 320 that is configured to generate an indicator of a corresponding histogram set within the counter bank 304 (sometimes referred to herein as a “histogram index”) based on an indicator of an entity (or a group of entities) associated with a packet (e.g., a port that received or will transmit the packet, a queue in which the packet was stored, a buffer in which the packet was stored, etc.).
- indicators of entities include port identifiers (e.g., identifiers of ingress ports 208 and/or egress ports 212).
- indicators of entities include identifiers of groups of ports.
- indicators of entities additionally or alternatively include queue identifiers (e.g., identifiers of ingress queues 228 and/or egress queues 256) and/or identifiers of groups of queues.
- indicators of entities additionally or alternatively include buffer identifiers (e.g., identifiers of ingress buffers 224 and/or egress buffers 252) and/or identifiers of groups of buffers.
- the histogram index generator 320 is configured to generate the histogram index further based on one or more characteristics of the packet, such as packet type, a protocol type, a type of packet flow to which the packet belongs, etc.
- a histogram set can be used for generating PSD information for packets associated with a particular entity (or group of entities) and having one or more particular packet characteristics, in an embodiment.
- a first histogram set is used for generating PSD information for packets associated with a particular entity and having a first set of one or more particular characteristics
- a second histogram set is used for generating PSD information for packets associated with the particular entity and having a second set of one or more particular characteristics.
- a histogram set is used for generating PSD information for packets associated with a particular entity and having a set of one or more particular characteristics, but packets associated with the particular entity and not having the set of one or more particular characteristics are not used for generating the PSD information.
- the histogram index generator 320 includes (or is coupled to) a configuration memory 322 that stores configuration information that includes associations between histogram sets and entities. In such embodiments, the histogram index generator 320 uses the associations between histogram sets and entities to determine one or more histogram sets associated with an entity (or group of entities). In some embodiments, the configuration information in the configuration memory 322 includes associations between histogram sets and entity(ies)/packet characteristic tuples. In such embodiments, the histogram index generator 320 uses the associations between histogram sets and the entity(ies)/packet characteristics to determine a histogram set associated with an entity(ies)/packet characteristics tuple.
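- A simplified, hypothetical rendering of such a configuration memory as a lookup keyed by entity identifier and packet characteristic is sketched below; the entity names, packet types, and wildcard convention are assumptions made only for this example.

```python
# Hypothetical configuration "memory": maps an (entity id, packet characteristic)
# tuple to a histogram set index.  A characteristic of None acts as a wildcard.
HISTOGRAM_CONFIG = {
    ("egress_port_3", "tcp"): 0,
    ("egress_port_3", None): 1,
    ("queue_12", None): 2,
}

def histogram_index(entity_id, packet_characteristic):
    """Resolve a histogram set, preferring an exact entity/characteristic match."""
    exact = HISTOGRAM_CONFIG.get((entity_id, packet_characteristic))
    if exact is not None:
        return exact
    return HISTOGRAM_CONFIG.get((entity_id, None))  # None if no set is configured

print(histogram_index("egress_port_3", "tcp"))  # 0
print(histogram_index("egress_port_3", "udp"))  # 1
print(histogram_index("queue_12", "rdma"))      # 2
```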
- the PSD module 300 also includes a granularity table 324 that is configured to store respective indications of granularity of the histogram sets of the counter bank 304.
- the granularity table 324 stores a respective indication of granularity for each histogram set, in an embodiment.
- the indication of granularity indicates a size range of each counter 312 in the histogram set, in an embodiment.
- the indication of granularity additionally or alternatively indicates a number of counters 312 in the histogram set, in another embodiment.
- the granularity table 324 is configured to receive a histogram index from the histogram index generator 320.
- the granularity table 324 uses the histogram index to look up an indication of granularity that corresponds to the histogram index, and outputs the indication of granularity.
- the PSD module 300 also includes a bin counter index generator 328 that is configured to generate a relative index of a bin counter 312 within a histogram set based on i) a packet size of a packet that is to be counted, and ii) an indication of granularity received from the granularity table 324.
- the granularity table 324 is omitted and the bin counter index generator 328 generates the relative index without the indication of granularity received from the granularity table 324.
- a merged index generator 332 is configured to generate a merged index into the counter bank 304 using i) the histogram index generated by the histogram index generator 320 and ii) the bin counter index generated by the bin counter index generator 328.
- the merged index generated by the merged index generator 332 selects a counter 312 from amongst multiple histogram sets in the counter bank 304, in an embodiment.
- the merged index generated by the merged index generator 332 selects a counter 312 from amongst all of the counters 312 in the counter bank 304, in an embodiment.
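- The index arithmetic described above can be illustrated with the following non-authoritative Python sketch, which assumes uniform bin widths within a histogram set and a fixed number of counters per set; real implementations may encode granularity differently.

```python
def bin_counter_index(packet_size, bin_width, num_bins):
    """Relative bin index within a histogram set, assuming uniform bin widths.

    The granularity table entry is modeled as a (bin_width, num_bins) pair;
    hardware could equally encode non-uniform or power-of-two ranges.
    """
    return min(packet_size // bin_width, num_bins - 1)

def merged_index(histogram_index, bin_index, counters_per_set):
    """Flat index into the counter bank combining the set index and bin index."""
    return histogram_index * counters_per_set + bin_index

# Example: 256-byte bins, 8 bins per set, histogram set 2, 700-byte packet.
b = bin_counter_index(packet_size=700, bin_width=256, num_bins=8)        # -> 2
print(merged_index(histogram_index=2, bin_index=b, counters_per_set=8))  # -> 18
```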
- the PSD module 300 also includes an update signal generator 348 that is configured to initiate measurement of PSD information for packets processed by the network device in response to determining that a load metric meets a condition. For example, when a load metric corresponding to an entity (e.g., a port, a queue, a buffer, etc.) meets a condition, the update signal generator 348 initiates measurement of PSD information regarding packets associated with the entity (e.g., packets received via a port, packets transmitted by a port, packets stored in a queue, packets stored in a buffer, etc.), according to an embodiment. In some embodiments, when a load metric corresponding to a group of entities meets a condition, the update signal generator 348 initiates measurement of PSD information regarding packets associated with the group of entities.
- the update signal generator 348 determines when the counter bank 304 is to update the PSD information regarding the entity(ies). For example, the update signal generator 348 generates an update signal that indicates when the counter bank 304 is to update PSD information.
- the update signal generator 348 generates the update signal based on packet events corresponding to entity(ies).
- packet events corresponding to entity(ies) include a packet being received via a port, a packet being transmitted via a port, a packet being scheduled for transmission via a port, a packet being stored in a queue, a packet being dequeued from a queue, a packet being stored in a buffer, a packet being retrieved from a buffer, etc., according to various embodiments.
- the update signal generator 348 generates the update signal further based on one or more characteristics of a packet corresponding to a packet event, such as packet type, a protocol type, a type of packet flow to which the packet belongs, a classification of the packet, etc. For example, the update signal generator 348 generates the update signal only for packets having one or more particular packet characteristics so that PSD information is measured only for packets having the one or more particular packet characteristics, in an embodiment.
- the update signal generator 348 includes (or is coupled to) a configuration memory 352 that stores configuration information that includes associations between entities and packet characteristics, the associations indicating, for each entity (or group of entities), the packet characteristics of packets corresponding to the entity (or group of entities) for which PSD information is to be measured.
- the update signal generator 348 uses the associations between entities and packet characteristics to determine when to generate the update signal so that PSD information is measured only for packets having certain particular packet characteristics, in an embodiment.
- the update signal generator 348 is configured to initiate measurement of PSD information for packets corresponding to an entity (or group of entities) in response to determining that a load metric corresponding to the entity(ies) exceeds a first threshold. In an embodiment, the update signal generator 348 is configured to stop measurement of PSD information for packets corresponding to the entity(ies) in response to determining that the load metric corresponding to the entity(ies) falls below a second threshold. In an embodiment, the second threshold is the same as the first threshold. In another embodiment, the second threshold is below the first threshold to provide hysteresis.
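- A minimal sketch of this threshold-with-hysteresis behavior, assuming a single scalar load metric and illustrative threshold values, is shown below in Python.

```python
def update_measurement_state(measuring, load_metric, start_threshold, stop_threshold):
    """Return whether PSD measurement should be on after observing load_metric.

    Measurement starts when the metric exceeds start_threshold and stops when it
    falls below stop_threshold; stop_threshold < start_threshold gives hysteresis,
    while equal thresholds give a single trigger level.
    """
    if not measuring and load_metric > start_threshold:
        return True
    if measuring and load_metric < stop_threshold:
        return False
    return measuring

measuring = False
for load in [0.2, 0.55, 0.7, 0.45, 0.35, 0.6]:
    measuring = update_measurement_state(measuring, load,
                                         start_threshold=0.5, stop_threshold=0.4)
    print(load, measuring)   # stays on at 0.45 because of the hysteresis band
```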
- Fig. 4 is a graph showing an illustrative example of PSD information 400 measured at less than 50% peak load and at greater than 50% peak load, according to an embodiment.
- the PSD information 400 includes counts of packets falling into six packet size ranges (or “bins”): i) less than or equal to 64 bytes, ii) 65-127 bytes, iii) 128-511 bytes, iv) 512-2047 bytes, v) 2048-4095 bytes, and vi) 4096-9216 bytes.
- PSD information measured by a load-aware PSD module 280, 284, 288, 300 includes more packet size ranges and/or different packet size ranges as compared to the example PSD information 400 of Fig. 4.
- Fig. 5 is a simplified example state diagram 500 for circuitry that controls generation of PSD information in a network device, according to an embodiment.
- the update signal generator 348 of Fig. 3 implements the state diagram 500, according to an embodiment, and the state diagram 500 is described with reference to Fig. 3 for ease of explanation. In other embodiments, the update signal generator 348 implements another suitable set of state transitions different than the state diagram 500. Additionally, the state diagram 500 is implemented by another PSD measurement apparatus different than the PSD module 300 of Fig. 3, in some embodiments.
- PSD measurement is turned off for an entity or group of entities.
- the update signal generator 348 remains in the state 504 while a load metric corresponding to the entity/group of entities remains below a first threshold.
- the update signal generator 348 transitions to a state 508.
- the update signal generator 348 turns PSD measurement on for the entity/group of entities. Additionally, PSD measurement remains on for the entity/group of entities while the update signal generator 348 remains in the state 508.
- the update signal generator 348 remains in the state 508 while the load metric corresponding to the entity/group of entities remains above a second threshold. In response to the load metric falling below the second threshold (and/or equaling the second threshold, in some embodiments), the update signal generator 348 transitions to the state 504. Upon transitioning to the state 504, the update signal generator 348 turns PSD measurement off for the entity/group of entities.
- the second threshold is below the first threshold to provide hysteresis. In another embodiment, the second threshold is equal to the first threshold.
- PSD measurements corresponding to the entity/group of entities are reset upon transitioning to the state 508.
- counters 312 of a histogram set corresponding to the entity/group of entities are reset upon transitioning to the state 508.
- PSD measurements corresponding to the entity/group of entities are not reset upon transitioning to the state 508 so that running PSD measurements are made over multiple distinct instances of high load associated with the entity/group of entities.
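- The following Python sketch is a non-authoritative software analogue of the state diagram 500, with hypothetical state names, thresholds, and a flag selecting between the reset-on-entry and accumulate-across-episodes behaviors described above.

```python
class PsdMeasurementFsm:
    """Two-state controller mirroring the OFF (504) / ON (508) states of Fig. 5."""

    OFF, ON = "state_504", "state_508"

    def __init__(self, start_threshold, stop_threshold, reset_on_start=True):
        self.state = self.OFF
        self.start_threshold = start_threshold
        self.stop_threshold = stop_threshold
        self.reset_on_start = reset_on_start   # False models accumulation across episodes

    def step(self, load_metric, histogram_counters):
        if self.state == self.OFF and load_metric > self.start_threshold:
            self.state = self.ON
            if self.reset_on_start:
                histogram_counters[:] = [0] * len(histogram_counters)
        elif self.state == self.ON and load_metric < self.stop_threshold:
            self.state = self.OFF
        return self.state == self.ON           # True while PSD measurement is enabled

counters = [3, 1, 0, 0]                        # stale counts from an earlier episode
fsm = PsdMeasurementFsm(start_threshold=0.8, stop_threshold=0.6)
for load in [0.5, 0.9, 0.7, 0.5]:
    print(load, fsm.step(load, counters), counters)
```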
- Fig. 6 is another simplified example state diagram 600 for circuitry that controls generation of PSD information in a network device, according to another embodiment.
- the update signal generator 348 of Fig. 3 implements the state diagram 600, according to an embodiment, and the state diagram 600 is described with reference to Fig. 3 for ease of explanation. In other embodiments, the update signal generator 348 implements another suitable set of state transitions different than the state diagram 600. Additionally, the state diagram 600 is implemented by another PSD measurement apparatus different than the PSD module 300 of Fig. 3, in some embodiments.
- PSD measurement is turned off for an entity or group of entities.
- the update signal generator 348 remains in the state 604 while a load metric corresponding to the entity/group of entities remains below a threshold.
- the update signal generator 348 transitions to a state 608.
- the update signal generator 348 starts a timer of the update signal generator 348.
- the timer is configured to measure a suitable time period.
- the update signal generator 348 turns PSD measurement on for the entity/group of entities upon transitioning to the state 608.
- PSD measurement remains on for the entity/group of entities while the update signal generator 348 remains in the state 608.
- the update signal generator 348 remains in the state 608 while the timer has not expired. In response to the timer expiring, the update signal generator 348 transitions to the state 604. Upon transitioning to the state 604, the update signal generator 348 turns PSD measurement off for the entity/group of entities.
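- A corresponding non-authoritative sketch of the timer-based control of the state diagram 600, using illustrative threshold and window values, is shown below.

```python
import time

class TimedPsdMeasurement:
    """Controller mirroring Fig. 6: measurement starts when the load metric crosses
    a threshold and stops when a fixed-duration timer expires, regardless of the
    load level during the measurement window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.expires_at = None                 # None models the OFF state 604

    def step(self, load_metric, now=None):
        now = time.monotonic() if now is None else now
        if self.expires_at is None:
            if load_metric > self.threshold:   # transition 604 -> 608: start the timer
                self.expires_at = now + self.window_seconds
        elif now >= self.expires_at:           # timer expired: transition 608 -> 604
            self.expires_at = None
        return self.expires_at is not None     # True while PSD measurement is enabled

ctrl = TimedPsdMeasurement(threshold=0.8, window_seconds=5.0)
print(ctrl.step(0.9, now=0.0))   # True  -- high load opens the measurement window
print(ctrl.step(0.2, now=3.0))   # True  -- still inside the window despite low load
print(ctrl.step(0.2, now=6.0))   # False -- timer expired, measurement turned off
```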
- the terms “above” and “below” are relative terms that depend on the load metric being compared.
- the load metric being “above” a threshold corresponds to a processing load of a network device being relatively high
- the load metric being “below” a threshold corresponds to a processing load of a network device being relatively low. If a particular load metric is inversely proportional to processing load (such as a load metric indicating available processing capacity), then the particular load metric being below a threshold indicates relatively high processing load, whereas the particular load metric being above a threshold indicates relatively low processing load.
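- A small, hypothetical helper that makes this direction-dependence explicit is sketched below; the parameter names are assumptions introduced for this example.

```python
def is_high_load(load_metric, threshold, higher_means_more_load=True):
    """Interpret 'above'/'below' consistently for both kinds of load metrics.

    For a metric that falls as load rises (e.g., available processing capacity),
    the comparison is simply inverted.
    """
    return load_metric > threshold if higher_means_more_load else load_metric < threshold

print(is_high_load(0.9, 0.8))                                 # utilization-style metric
print(is_high_load(0.1, 0.2, higher_means_more_load=False))   # capacity-style metric
```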
- state transition diagram 500 and/or the state transition diagram 600 are implemented using hardware circuitry (e.g., a hardware state machine) and/or one or more processors executing machine readable instructions stored in one or more memories coupled to the one or more processors.
- Fig. 7 is a simplified flow diagram of an example method 700 for controlling a network device based on PSD measurements, according to an embodiment.
- the network device 200 of Figs. 2A-B implements the method 700, according to an embodiment, and the method 700 is described with reference to Figs. 2A-B for ease of explanation.
- the network device 200 implements another suitable method for controlling the network device 200 based on PSD measurements different than the method 700.
- the method 700 is implemented by another suitable network device different than the network device 200 of Figs. 2A-B, in some embodiments.
- the method 700 is implemented using the PSD module 300 of Fig. 3, and the method 700 is described with reference to Fig. 3 for ease of explanation.
- the PSD module 300 is used to implement another suitable method for controlling a network device based on PSD measurements.
- the method 700 is implemented using another suitable PSD measurement apparatus different than the PSD module 300 of Fig. 3, in some embodiments.
- a network device determines a load metric corresponding to a processing load of the network device.
- the packet processing module 204 determines (e.g., the ingress arbitration circuitry 220 determines, the traffic manager circuitry 248 determines, etc.) the load metric, the load metric corresponding to an entity (or group of entities) of the packet processing module 204, such as a port 208, 212, a queue 228, 256, a buffer 224, 252, etc.
- the network device determines whether the load metric determined at block 704 meets a threshold. In response to determining that the load metric does not meet the threshold, the flow repeats block 708. For example, block 704 involves repeatedly determining the load metric over time, and block 708 involves repeatedly comparing the load metric to the threshold over time.
- the ingress arbitration circuitry 220 and/or the traffic manager circuitry 248 determines whether the load metric meets the threshold.
- the PSD module 300 (e.g., the update signal generator 348) determines whether the load metric meets the threshold.
- in response to determining that the load metric meets the threshold, the flow proceeds to block 712.
- the network device begins generating PSD measurements.
- the load metric determined at block 704 corresponds to an entity or group of entities of the network device, and the PSD measurements that are begun at block 712 are for packets corresponding to the entity or group of entities.
- the ingress arbitration circuitry 220 and/or the traffic manager circuitry 248 begins generating PSD measurements at block 712.
- the PSD module 300 begins generating the PSD measurements at block 712.
- the network device ends the PSD measurements that were begun at block 712.
- the ingress arbitration circuitry 220 and/or the traffic manager circuitry 248 ends the PSD measurements at block 716.
- the PSD module 300 ends the PSD measurements at block 716.
- the threshold to which the load metric is compared at block 708 is a first threshold
- the method 700 further comprises comparing the load metric to a second threshold.
- the PSD measurements are ended at block 716 in response to determining that the load metric falls below the second threshold.
- the method 700 further comprises starting a timer in connection with beginning generation of PSD measurements at block 712.
- the PSD measurements are ended at block 716 in response to determining that the timer has expired.
- the network device uses the PSD measurements made in connection with blocks 712 and 716 to control the network device.
- using the PSD measurements at block 720 includes adjusting a buffer allocation algorithm implemented by the network device based on the PSD measurements. In another embodiment, using the PSD measurements at block 720 includes adjusting queue allocations of the network device based on the PSD measurements.
- using the PSD measurements at block 720 includes, when the load of a port 208, 212 or queue 228, 256 is high and the PSD measurements indicate a relatively high percentage of short-length packets corresponding to the port 208, 212 or queue 228, 256, the network device 200 redistributing, within the network device 200, processing of one or more packet types, one or more flows of packets, etc., contributing to the high percentage of short-length packets corresponding to the port 208, 212 or queue 228, 256.
- using the PSD measurements at block 720 includes the network device 200 redirecting one or more packet types, one or more flows of packets, etc., contributing to a high percentage of short-length packets to another egress port 212 with a lower load, the other egress port 212 corresponding to an alternative path through a network.
- using the PSD measurements at block 720 includes the one or more CPUs 276 adjusting operations being performed by the CPU(s) 276 based on the PSD measurements.
- the method 700 includes, in addition to the block 720 or instead of the block 720, the network device providing the PSD measurements made in connection with blocks 712 and 716 to another network device for use by the other network device in controlling and/or monitoring a communication network in which the network device operates.
- the method 700 includes, in addition to the block 720 or instead of the block 720, the network device transmitting flow control messages and/or congestion notification messages to another network device based on the PSD measurements made in connection with blocks 712 and 716.
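- As a non-authoritative, end-to-end software illustration of blocks 704 through 720, the following Python sketch samples a load metric, collects a packet size histogram only while the load is high, and invokes a caller-supplied control action when measurement ends; all callables, thresholds, and bins are assumptions made only for this example.

```python
import bisect
import random

BIN_BOUNDS = [64, 127, 511, 2047, 4095, 9216]   # hypothetical bin upper bounds (bytes)

def method_700_sketch(sample_load, sample_packet_size, apply_control,
                      threshold=0.8, stop_threshold=0.6, iterations=1000):
    """Monitor a load metric, measure PSD only while the load is high, then act."""
    histogram = [0] * len(BIN_BOUNDS)
    measuring = False
    for _ in range(iterations):
        load = sample_load()                        # block 704: determine the load metric
        if not measuring and load > threshold:      # blocks 708/712: begin measuring
            measuring = True
            histogram = [0] * len(BIN_BOUNDS)
        elif measuring and load < stop_threshold:   # block 716: end the measurements
            measuring = False
            apply_control(histogram)                # block 720: use the PSD measurements
        if measuring:
            size = sample_packet_size()
            idx = min(bisect.bisect_left(BIN_BOUNDS, size), len(BIN_BOUNDS) - 1)
            histogram[idx] += 1

method_700_sketch(
    sample_load=lambda: random.random(),
    sample_packet_size=lambda: random.choice([64, 128, 1500, 9000]),
    apply_control=lambda h: print("PSD at end of high-load episode:", h),
)
```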
- At least some of the various blocks, operations, and techniques described above are suitably implemented utilizing dedicated hardware, such as one or more of discrete components, an integrated circuit, an ASIC, a programmable logic device (PLD), a processor executing firmware instructions, a processor executing software instructions, or any combination thereof.
- the software or firmware instructions may be stored in any suitable computer readable memory such as in a random access memory (RAM), a read-only memory (ROM), a solid state memory, etc.
- the software or firmware instructions may include machine readable instructions that, when executed by one or more processors, cause the one or more processors to perform various acts described herein.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202318141269A | 2023-04-28 | 2023-04-28 | |
| US18/141,269 | 2023-04-28 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024227123A1 true WO2024227123A1 (en) | 2024-10-31 |
Family
ID=91585978
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/026724 Pending WO2024227123A1 (en) | 2023-04-28 | 2024-04-28 | Load-aware packet size distribution measurement in a network device |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024227123A1 (en) |
- 2024-04-28 WO PCT/US2024/026724 patent/WO2024227123A1/en active Pending
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7088678B1 (en) * | 2001-08-27 | 2006-08-08 | 3Com Corporation | System and method for traffic shaping based on generalized congestion and flow control |
Non-Patent Citations (1)
| Title |
|---|
| DUQUE-TORRES ALEJANDRA ET AL: "Heavy-Hitter Flow Identification in Data Centre Networks Using Packet Size Distribution and Template Matching", 2019 IEEE 44TH CONFERENCE ON LOCAL COMPUTER NETWORKS (LCN), 31 October 2019 (2019-10-31), pages 10 - 17, XP093195447, Retrieved from the Internet <URL:https://ecs.wgtn.ac.nz/foswiki/pub/Groups/WiNe/WirelessNetworksResearchGroup/LCN2019.pdf> [retrieved on 20240815], DOI: 10.1109/LCN44214.2019.8990807 * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12360924B2 (en) | Method and system for facilitating lossy dropping and ECN marking | |
| CN116671081B (en) | Delay-based automatic queue management and tail drop | |
| US8248930B2 (en) | Method and apparatus for a network queuing engine and congestion management gateway | |
| US8467342B2 (en) | Flow and congestion control in switch architectures for multi-hop, memory efficient fabrics | |
| US20240056385A1 (en) | Switch device for facilitating switching in data-driven intelligent network | |
| CN113472697A (en) | Network information transmission system | |
| Wang et al. | Flow distribution-aware load balancing for the datacenter | |
| US12231342B1 (en) | Queue pacing in a network device | |
| WO2024227123A1 (en) | Load-aware packet size distribution measurement in a network device | |
| KR20250174099A (en) | Measuring load-aware packet size distribution on network devices | |
| US12216518B2 (en) | Power saving in a network device | |
| US20250286835A1 (en) | Combining queues in a network device to enable high throughput | |
| US7009973B2 (en) | Switch using a segmented ring | |
| CN120856648A (en) | Minimize latency entry arbitration |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24734172; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 1020257039547; Country of ref document: KR; Free format text: ST27 STATUS EVENT CODE: A-0-1-A10-A15-NAP-PA0105 (AS PROVIDED BY THE NATIONAL OFFICE) |
| | WWE | Wipo information: entry into national phase | Ref document number: 2024734172; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2024734172; Country of ref document: EP; Effective date: 20251128 |