WO2025088365A1 - Association-based observability system - Google Patents
- Publication number
- WO2025088365A1 (application PCT/IB2023/060859)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- observability
- observation data
- association
- controller
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
- G06F11/3072—Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/04—Network management architectures or arrangements
- H04L41/046—Network management architectures or arrangements comprising network management agents or mobile agents therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/20—Arrangements for monitoring or testing data switching networks the monitoring system or the monitored elements being virtualised, abstracted or software-defined entities, e.g. SDN or NFV
Definitions
- ASSOCIATION-BASED OBSERVABILITY SYSTEM
- TECHNICAL FIELD
- [0001] The various embodiments described herein relate to the field of communication networks and, more specifically, to an association-based observability system.
- BACKGROUND
- [0002] As systems become larger, more complex, and geographically distributed, the observability of systems becomes more important. Observability is a measure of how well the internal states of the system can be inferred from knowledge of its external outputs. Maintenance of a large and geographically distributed system with a large number of nodes is difficult without using an observability system.
- Various nodes include observation data collectors that collect observation data associated with the various nodes.
- The nodes provide the collected observation data to a central location, such as a controller node, for analysis.
- data collection by the observation data collectors is often static, with unidirectional communication between observation data collectors and the controller node. That is, the observation data collectors are typically configured to collect a pre-defined set of data and to transmit collected data to the controller node, but do not receive data or other communications from the controller node during operation.
- an observation system could be configured to monitor the performance of the system.
- the observation data collectors could each be configured to collect performance information for the corresponding node.
- the controller node could execute analysis software that receives performance information from different nodes in the system and analyzes the performance information to determine performance of the system.
- observation data collectors could be configured to collect and analyze many different types of data.
- the observation data collectors could each be configured to collect a variety of data associated with the corresponding node.
- the controller node could receive the data and perform different data analysis on different portions of the data as desired. In general, however, when more data is collected, more central processing unit (CPU) and network resources are consumed.
- One embodiment of the present application sets forth a method, performed by a first node in a cluster, for managing observation data collected by an observability system.
- the method includes collecting a first set of observation data.
- the method further includes, while collecting the first set of observation data, determining that a system event has started.
- the method further includes determining that the system event belongs to an association that links the system event with a set of observability actions.
- the method further includes, in response to determining that the system event belongs to an association, causing the set of observability actions to be performed, wherein causing the set of observability actions to be performed comprises collecting a second set of observation data relating to the system event in accordance with a set of observation data collection rules.
- One embodiment of the present application includes a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations for managing observation data collected by an observability system.
- the operations include collecting a first set of observation data.
- the operations further include, while collecting the first set of observation data, determining that a system event has started.
- the operations further include determining that the system event belongs to an association that links the system event with a set of observability actions.
- the operations further include, in response to determining that the system event belongs to an association, causing the set of observability actions to be performed, wherein causing the set of observability actions to be performed comprises collecting a second set of observation data relating to the system event in accordance with a set of observation data collection rules.
- One embodiment of the present application includes an agent computing node that includes one or more processors.
- the agent computing node further includes a memory storing instructions which, when executed by the one or more processors, cause the agent computing node to carry out operations for managing observation data collected by an observability system.
- the operations include collecting a first set of observation data.
- the operations further include, while collecting the first set of observation data, determining that a system event has started.
- the operations further include determining that the system event belongs to an association that links the system event with a set of observability actions.
- the operations further include, in response to determining that the system event belongs to an association, causing the set of observability actions to be performed, wherein causing the set of observability actions to be performed comprises collecting a second set of observation data relating to the system event in accordance with a set of observation data collection rules.
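The agent-side operations summarized above can be illustrated with a minimal sketch. It is not the claimed implementation: the `Association` record, the `Agent` class, the `read_metric` callback, and the metric names are all hypothetical, and a "collection rule" is reduced here to a list of extra metric names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Association:
    """Hypothetical record linking a system event to extra collection."""
    event_name: str
    # Simplified collection rule: names of the additional metrics to gather.
    extra_metrics: List[str] = field(default_factory=list)


class Agent:
    """Hypothetical agent node that widens collection during associated events."""

    def __init__(
        self,
        baseline_metrics: List[str],
        associations: Iterable[Association],
        read_metric: Callable[[str], float],
    ) -> None:
        self.baseline_metrics = list(baseline_metrics)
        self.associations: Dict[str, Association] = {
            a.event_name: a for a in associations
        }
        self.read_metric = read_metric

    def collect(self, started_events: Iterable[str]) -> Dict[str, float]:
        # First set of observation data: always collected.
        names = list(self.baseline_metrics)
        for event in started_events:
            # Does the started event belong to an association?
            association = self.associations.get(event)
            if association is None:
                continue
            # Second set: collected only while the associated event runs.
            for metric in association.extra_metrics:
                if metric not in names:
                    names.append(metric)
        return {name: self.read_metric(name) for name in names}
```

With no associated event running, `collect([])` returns only the baseline metrics; passing the name of an associated event adds that association's extra metrics to the result.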
- the method further includes determining that the system event belongs to an association that links the system event with a set of observability actions.
- the method further includes in response to determining that the system event belongs to an association, for each observability action included in the set of observability actions, identifying one or more nodes in the cluster that should perform the observability action and causing the observability action to be performed by the identified nodes.
- One embodiment of the present application includes a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations for managing observation data collected by an observability system. The operations include determining that a system event has started.
- the operations further include determining that the system event belongs to an association that links the system event with a set of observability actions.
- the operations further include in response to determining that the system event belongs to an association, for each observability action included in the set of observability actions, identifying one or more nodes in the cluster that should perform the observability action and causing the observability action to be performed by the identified nodes.
- One embodiment of the present application includes a controller computing node that includes one or more processors.
- the controller computing node further includes a memory storing instructions which, when executed by the one or more processors, cause the controller computing node to carry out operations for managing observation data collected by an observability system.
- the operations include determining that a system event has started.
- the operations further include determining that the system event belongs to an association that links the system event with a set of observability actions.
- the operations further include in response to determining that the system event belongs to an association, for each observability action included in the set of observability actions, identifying one or more nodes in the cluster that should perform the observability action and causing the observability action to be performed by the identified nodes.
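The controller-side operations summarized above (look up the association, then, for each observability action, identify the nodes and cause them to perform it) could be sketched as follows. The function name `dispatch_actions`, the `node_selector` and `send` callbacks, and the action and node names are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Set


def dispatch_actions(
    event: str,
    associations: Dict[str, List[str]],
    node_selector: Callable[[str], Set[str]],
    send: Callable[[str, str], None],
) -> Dict[str, List[str]]:
    """Send each action linked to `event` to the nodes chosen for it.

    Returns the plan that was dispatched (action -> sorted node names),
    or an empty dict if the event belongs to no association.
    """
    actions = associations.get(event)
    if actions is None:
        return {}
    plan: Dict[str, List[str]] = {}
    for action in actions:
        # Identify the nodes in the cluster that should perform this action.
        nodes = sorted(node_selector(action))
        for node in nodes:
            # Cause the action to be performed by the identified node.
            send(node, action)
        plan[action] = nodes
    return plan
```

How nodes are selected is left to the `node_selector` callback, since the text does not fix a particular selection policy.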
- Figure 1 illustrates an observability system configured to implement one or more aspects of the various embodiments
- Figure 2 is a diagram illustrating interactions between components of the system of Figure 1 to create a new association, according to various embodiments
- Figure 3 is a diagram illustrating interactions between components of the system of Figure 1 for tracking an association at a local observability controller, according to various embodiments
- Figure 4 is a diagram illustrating interactions between components of the system of Figure 1 for tracking an association at an observation data collector, according to various embodiments
- Figure 5 is a flowchart of method steps for creating an association at an observability system, according to various embodiments
- Figure 6 is a flowchart of method steps for tracking an association at an agent computing node of an observability system, according to various embodiments
- Figure 7 is a flowchart of method steps for tracking an association at a controller computing node of an observability system, according to various embodiments
- references in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
- Bracketed text and blocks with dashed borders may be used herein to illustrate optional operations that add additional features to embodiments of the invention. However, such notation should not be taken to mean that these are the only options or optional operations, and/or that blocks with solid borders are not optional in certain embodiments of the invention.
- the terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. “Coupled” is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other.
- An electronic device stores and transmits (internally and/or with other electronic devices over a network) code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable media (also called computer-readable media), such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (also called a carrier) (e.g., electrical, optical, radio, acoustical or other form of propagated signals – such as carrier waves, infrared signals).
- An electronic device (e.g., a computer) includes hardware and software, such as a set of one or more processors (e.g., wherein a processor is a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, other electronic circuitry, a combination of one or more of the preceding) coupled to one or more machine-readable storage media to store code for execution on the set of processors and/or to store data.
- an electronic device may include non-volatile memory containing the code since the non-volatile memory can persist code/data even when the electronic device is turned off (when power is removed), and while the electronic device is turned on that part of the code that is to be executed by the processor(s) of that electronic device is typically copied from the slower non-volatile memory into volatile memory (e.g., dynamic random access memory (DRAM), static random access memory (SRAM)) of that electronic device.
- Typical electronic devices also include a set of one or more physical network interface(s) (NI(s)) to establish network connections (to transmit and/or receive code and/or data using propagating signals) with other electronic devices.
- a physical NI may comprise radio circuitry capable of receiving data from other electronic devices over a wireless connection and/or sending data out to other devices via a wireless connection.
- This radio circuitry may include transmitter(s), receiver(s), and/or transceiver(s) suitable for radiofrequency communication.
- the radio circuitry may convert digital data into a radio signal having the appropriate parameters (e.g., frequency, timing, channel, bandwidth, etc.). The radio signal may then be transmitted via antennas to the appropriate recipient(s).
- the set of physical NI(s) may comprise network interface controller(s) (NICs), also known as a network interface card, network adapter, or local area network (LAN) adapter.
- The NIC(s) may facilitate connecting the electronic device to other electronic devices, allowing them to communicate over a wire by plugging a cable into a physical port connected to a NIC.
- One or more parts of an embodiment of the invention may be implemented using different combinations of software, firmware, and/or hardware.
- a network device is an electronic device that communicatively interconnects other electronic devices on the network (e.g., other network devices, end-user devices).
- Some network devices are “multiple services network devices” that provide support for multiple networking functions (e.g., routing, bridging, switching, Layer 2 aggregation, session border control, Quality of Service, and/or subscriber management), and/or provide support for multiple application services (e.g., data, voice, and video).
- embodiments described herein provide a dynamic approach to observing a system that enables associating different system events with collection of additional observation data. By doing so, the observability system can passively collect a smaller amount of observation data during normal operation.
- When the observability system determines that a system event that is associated with collection of additional observation data has started, the observability system can collect additional observation data (e.g., metrics, logs, traces, and/or the like) associated with the event.
- One benefit of the disclosed techniques compared to prior approaches is that the observability system is able to collect detailed observation data for analyzing selected system events while reducing the overall amount (or maintaining a minimal/limited amount) of observation data that is collected and transmitted.
- Because the system increases observation data collection only after an associated event is detected (e.g., collecting observation data associated with the event), the system is able to collect less observation data during times when the increased observation data is not needed (e.g., when the associated event is not occurring).
- the system is able to collect greater amounts of useful observation data while minimizing the performance penalty caused by collecting large amounts of observation data, such as computational resources needed to collect and process the data, data processing time, networking resources for transmitting collected data, data storage needed to store the collected data, and/or the like.
- observation data is transmitted to a central analytical service that receives the data, analyzes the data, and provides instructions to various nodes that are collecting data, such as instructions to collect additional observation data.
- Figure 1 illustrates an observability system configured to implement one or more aspects of the various embodiments.
- the observability system includes a controller computing node 110 and one or more agent computing nodes, such as agent computing nodes 140(1)-(N).
- the controller computing node 110 communicates with the agent computing nodes 140(1)-(N) over a network 130.
- Network 130 can include any suitable combination of wired or wireless communication over local area networks and wide area networks.
- a node can be any suitable type of node, including computing nodes, network nodes, storage nodes, and/or the like.
- The agent computing nodes 140(1)-(N), or a portion thereof, collectively implement a distributed system (e.g., an application composed of a number of microservices).
- the controller computing node 110 is one of the nodes in the distributed system and/or is configured as a controller for the distributed system.
- Agent computing nodes 140 collect local observation data and transmit collected data to controller computing node 110.
- Local observation data comprises data (e.g., metrics, system information, logs, traces, and/or the like) that is associated with and collected at the agent computing node 140.
- local observation data comprises local metrics that are measured at the agent computing node 140.
- Local observation data includes, for example and without limitation, device statistics/measurements, network statistics/measurements, interrupt information, kernel statistics/measurements, file system statistics/measurements, session information/measurements, routing information, device logs, network logs, hardware/software/storage status information, service request status information, and/or the like.
- Controller computing node 110 receives the local observation data from various agent computing nodes (e.g., agent computing nodes 140(1)-(N)). Controller computing node 110 stores, processes, and/or analyzes the received observation data. In some embodiments, controller computing node 110 generates global observation data based on the local observation data received from the agent computing nodes 140. Global observation data comprises data (e.g., metrics, analytical data, system-wide service information, and/or the like) associated with the system being observed. In some embodiments, local observation data includes values (i.e., metrics) that are measured at an agent computing node 140. Global observation data includes values that cannot be directly measured and are instead computed based on the local observation data.
- controller computing node 110 computes various global metrics from multiple locally measured values.
- controller computing node 110 could receive, from various agent computing nodes 140, average latencies between a given computing node and a neighboring computing node.
- the controller computing node 110 could determine, based on the average latencies, the total (average) latency for a service chain across multiple service chain components (i.e., total latency across a plurality of computing nodes).
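As a rough illustration of deriving a global metric from local ones, the sketch below sums per-hop average latencies along a service chain. The function name, the `(source, destination)` key format, and the assumption that the chain latency is a plain sum of hop averages are choices made for this example, not details given in the text.

```python
from typing import Dict, List, Tuple


def service_chain_latency(
    hop_latencies_ms: Dict[Tuple[str, str], float],
    chain: List[str],
) -> float:
    """Sum the locally measured average latency of each hop in `chain`.

    `hop_latencies_ms` maps (source node, neighboring node) to the average
    latency reported by the source node's agent.
    """
    total = 0.0
    # Walk consecutive pairs of nodes: (chain[0], chain[1]), (chain[1], chain[2]), ...
    for src, dst in zip(chain, chain[1:]):
        total += hop_latencies_ms[(src, dst)]
    return total
```

No single agent can measure this value directly, which is what makes it a global rather than a local metric in the sense used above.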
- Global observation data includes, for example and without limitation, power consumption of a service, response latency of a service, service availability (e.g., as a percentage over a period of time), service reliability, service throughput, total system utilization, parallel active (e.g., not yet replied) requests to a service, and/or the like.
- the controller computing node 110 includes an API server 122, observability operator 124, central observability controller 126, and optionally, an AI controller 128. In other embodiments, one or more components of controller computing node 110 could be included in a different node, such as one or more of the agent computing nodes 140.
- Agent computing node 140(1) includes a local observability controller 152, an observation data collector 154, one or more applications 156, and one or more sets of collection instructions 158.
- agent computing nodes 140(2)-(N) may include the same or similar components as agent computing node 140(1) (which are not shown in the diagram to reduce clutter) and can operate in a similar manner to that discussed herein with reference to agent computing node 140(1). Any given agent computing node 140 could include more or fewer components than that shown in Figure 1, depending on the implementation.
- In some embodiments, one or more components of controller computing node 110 (e.g., API server 122, observability operator 124, central observability controller 126, and/or AI controller 128) and/or of an agent computing node 140 (e.g., local observability controller 152, observation data collector 154, application(s) 156, and/or collection instruction(s) 158) are virtualized.
- the controller computing node 110 could implement a virtual computing node 120 that implements the API server 122, observability operator 124, central observability controller 126, and AI controller 128.
- agent computing node 140(1) could implement a virtual computing node 150(1) that implements the local observability controller 152, observation data collector 154, application(s) 156, and collection instruction(s) 158.
- the observability system is configured to perform one or more observability actions (e.g., collecting additional observation data) in response to detecting that an association event has started or occurred.
- an “association event” is a system event that is associated with collection of (additional) observation data via an association.
- An “association” refers to a link between a given system event and one or more observability actions that should be performed, such as one or more changes in the observation data that should be collected by the observability system.
- an association includes an association definition that specifies a set of criteria for causing one or more observability actions to be performed.
- the set of criteria could include, for example, a given system event (i.e., the association event) being detected in the execution system.
- the set of criteria includes one or more values for observation data collected by the observability system (e.g., via one or more observation data collector(s) 154).
- the set of criteria could include observation data value(s) that, when collected by the observability system, indicate (either directly or indirectly) that the given system event has started.
- an association includes a collection definition that describes the one or more observation data collection actions that should be performed when the association event occurs.
- an association defines one or more observation data collection rules associated with a given system event.
- An observation data collection rule specifies, for example and without limitation, how, when, and/or what type(s) of observation data should be collected.
- the observation data collection rules include a plurality of triggers.
- Each trigger specifies a set of one or more trigger conditions and a set of one or more corresponding actions to be performed when the set of trigger conditions are met.
- a trigger could specify a set of observation data to be collected and one or more conditions for collecting the set of observation data.
- The observability system evaluates each trigger to determine whether to start collecting the observation data specified by the trigger. After the corresponding association event has ended, the observability system stops evaluating the plurality of triggers.
- the observability system stops collecting any observation data whose collection was started by a trigger.
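The trigger behavior described above can be sketched as a small evaluator. The `Trigger` and `TriggerEvaluator` names, the sample dictionary keys, and the choice to keep a trigger-started collection running until the event ends (rather than stopping when the condition clears) are assumptions for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Trigger:
    """Hypothetical trigger: a condition plus the data to start collecting."""
    condition: Callable[[Dict[str, float]], bool]
    metrics: Tuple[str, ...]


class TriggerEvaluator:
    def __init__(self, triggers: List[Trigger]) -> None:
        self.triggers = triggers
        self.active: Set[str] = set()  # data whose collection a trigger started

    def step(self, event_running: bool, sample: Dict[str, float]) -> Set[str]:
        """Evaluate triggers for one sample; return what is being collected."""
        if not event_running:
            # Event ended: stop evaluating triggers and stop any
            # collection that a trigger had started.
            self.active.clear()
            return set()
        for trigger in self.triggers:
            if trigger.condition(sample):
                self.active.update(trigger.metrics)
        return set(self.active)
```

Calling `step` once per collected sample, with `event_running` reflecting whether the association event is still in progress, reproduces the start and stop behavior described in the text.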
- The observability system tracks the association to determine whether the criteria specified by the association have been met. For example, the observability system could analyze collected observation data to determine if the collected observation data includes a specified set of data values. As another example, the observability system could detect when the association event has started or when the association event is occurring.
- [0042] If the criteria specified by the association definition are met, then the observability system performs the observability action(s) specified by the collection definition.
- For example, if the collection definition includes a set of instructions, the observability system could execute the set of instructions.
- If the collection definition includes a set of triggers, then the observability system could begin evaluating the triggers.
- the observability action(s) are performed as long as the association remains active (e.g., the association event is running).
- When the association stops being active (e.g., the association event ends), the observability system stops performing the corresponding observability action(s).
- the observability system could stop executing the set of instructions and/or remove the set of instructions from any observation data collectors that were executing the set of instructions.
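The activation and deactivation of an association can be pictured as a two-state tracker that installs collection instructions when the criteria become true and removes them when the association stops being active. The `Collector` and `AssociationTracker` classes and the instruction names are hypothetical; real collectors would execute the instructions rather than merely record them.

```python
from typing import Callable, Dict, Iterable, List, Set


class Collector:
    """Stand-in for an observation data collector (hypothetical)."""

    def __init__(self) -> None:
        self.instructions: Set[str] = set()

    def install(self, name: str) -> None:
        self.instructions.add(name)

    def remove(self, name: str) -> None:
        self.instructions.discard(name)


class AssociationTracker:
    def __init__(
        self,
        criteria: Callable[[Dict[str, float]], bool],
        instructions: Iterable[str],
        collectors: List[Collector],
    ) -> None:
        self.criteria = criteria
        self.instructions = list(instructions)
        self.collectors = collectors
        self.active = False

    def update(self, sample: Dict[str, float]) -> bool:
        """Re-evaluate the criteria and install or remove instructions."""
        met = self.criteria(sample)
        if met and not self.active:
            # Association became active: start the observability actions.
            for collector in self.collectors:
                for name in self.instructions:
                    collector.install(name)
        elif not met and self.active:
            # Association ended: stop and remove the instructions.
            for collector in self.collectors:
                for name in self.instructions:
                    collector.remove(name)
        self.active = met
        return self.active
```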
- a given association can be tracked in an observation data collector at an agent computing node, a local observability controller at the agent computing node, and/or a central observability controller at a controller computing node.
- The component(s) that track an association depend on which collector(s) need to provide the data needed to detect the corresponding association event. That is, whether one or more observation data collectors 154, one or more local observability controllers 152, and/or a central observability controller 126 are responsible for detecting an association event and determining whether to start/end collection of observation data associated with the event depends on which collector(s), and from which node(s) in the system, are needed to detect the association event.
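One plausible reading of this placement rule is sketched below. The text says only that placement depends on which collectors and nodes are needed; the specific mapping used here (one collector: the collector itself; several collectors on one node: the local controller; several nodes: the central controller) is an assumption, as are the function and return-value names.

```python
from typing import Iterable, Tuple


def choose_tracker(required: Iterable[Tuple[str, str]]) -> str:
    """Pick the component that tracks an association.

    `required` lists the (node, collector) pairs whose data is needed to
    detect the association event.
    """
    pairs = set(required)
    if not pairs:
        raise ValueError("at least one (node, collector) pair is required")
    nodes = {node for node, _ in pairs}
    if len(nodes) > 1:
        # Data from several nodes must be combined centrally.
        return "central_observability_controller"
    if len(pairs) > 1:
        # Several collectors on one node: combine at the node level.
        return "local_observability_controller"
    # A single collector can detect the event on its own.
    return "observation_data_collector"
```

Tracking at the lowest component that sees all the required data keeps detection close to the source and avoids shipping data upward only to decide whether to collect more.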
- an observability operator 124 at the controller computing node 110 receives a request to create an association.
- the request includes an association definition and a collection definition.
- the association definition indicates the criteria for triggering one or more observability data collection actions.
- An association definition could include, for example and without limitation, triggering rules, scripts (e.g., Python code), executable code (e.g., WebAssembly), AI logic, and/or the like.
- The collection definition indicates the one or more observability data collection actions that should be taken when the criteria are met.
- A collection definition could include, for example and without limitation, observation data value(s) to be collected, observation data type(s) to be collected, trigger(s) associated with different values to be collected, scripts, extended Berkeley Packet Filter (eBPF) code (e.g., for collecting specific system data), plug-ins, extensions, location(s) of collection instructions (e.g., scripts, code, plug-ins, and/or the like) that should be executed, and/or the like.
- a given association can include and/or indicate the storage location of instructions for collecting the data value(s) specified by the association definition and/or collection definition.
- observation data collector 154 can execute the instructions in order to obtain the specified observation value(s).
- observability operator 124 receives an association request directly from a user.
- observability operator 124 could receive user input indicating the association definition and the collection definition.
- observability operator 124 receives a manifest that describes the association (e.g., a Kubernetes manifest). The manifest could include, for example, the association definition and the collection definition for the association.
- the observability operator 124 receives an association request via API server 122.
- API server 122 provides API(s) for users to interact with the observability system.
- a user provides a manifest to the observability system using API server 122.
- API server 122 forwards/transmits the manifest and/or the association and collection definitions included in the manifest to observability operator 124.
- if the manifest includes additional data or information relating to the association, API server 122 could also forward/transmit the additional data or information to observability operator 124.
- Additional data or information relating to an association could be, for example, a user requesting the association, system application(s) and/or components specified by the association, and/or the like.
- observability operator 124 determines the resources needed for the association request.
- Resources needed could include, for example, system components, system component data, applications, application data, specific metrics or other data values, logs, compute resources, storage resources, network resources, and/or the like.
- observability operator 124 determines whether the resources needed for the association request and, if applicable, the quantity of resources needed, are available. If the observability operator 124 determines that the resources are available, then the observability operator 124 transmits the association request to the central observability controller 126. If the observability operator 124 determines that the resources are not available or that the quantity requested for a given resource is not available, then observability operator 124 could deny the association request.
- observability operator 124 does not transmit the association request to central observability controller 126.
- determining whether the resources are available includes determining whether the user requesting the association has the needed authority (e.g., access rights, privileges, and/or the like) to request the resources.
- observability operator 124 transmits an access verification check request to an access control service (not shown).
- the access verification check request specifies the user requesting the association and the requested resources.
- the access control service performs the requested check and transmits a response indicating whether the user has permission to request the resources.
- the access control service grants access to the requested resources.
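The availability and authority checks described above can be sketched as follows. This is an illustrative assumption, not the claimed implementation: resources are modelled as name-to-quantity mappings, and `has_permission` is a hypothetical callable standing in for the access control service.

```python
def admit_request(needed, available, user, has_permission):
    """Return True if the request should be forwarded to the central controller.

    needed and available map a resource name to a quantity. has_permission is a
    hypothetical stand-in for the access verification check.
    """
    for resource, quantity in needed.items():
        if available.get(resource, 0) < quantity:
            return False  # resource missing or requested quantity not available
        if not has_permission(user, resource):
            return False  # user lacks the authority to request this resource
    return True


available = {"cpu_cores": 4, "storage_gb": 100}


def allow_all(user, resource):
    # Toy access control policy that grants every request.
    return True


def deny_storage(user, resource):
    # Toy access control policy that refuses storage requests.
    return resource != "storage_gb"


accepted = admit_request({"cpu_cores": 2}, available, "alice", allow_all)
too_large = admit_request({"cpu_cores": 8}, available, "alice", allow_all)
forbidden = admit_request({"storage_gb": 10}, available, "alice", deny_storage)
```

When the function returns False, the operator would deny the request and not transmit it to the central observability controller.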
- determining whether the resources are available includes determining the observability data (e.g., metrics, traces, logs, and/or the like) that are available from a given application (e.g., application 156). For example, if the observability system does not know how to retrieve some (or all) of the observability data for a given application, observability operator 124 could request and/or receive a list of available observability data from the given application.
- the list of observability data indicates the observability data, such as metrics, traces, logs, and/or the like, that the given application is willing or able to provide (e.g., to an observation data collector 154) upon request.
- Observability operator 124 determines whether the observability data specified by the association request (that is associated with the given application) is available, based on the list of available observability data.
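A minimal sketch of the comparison between the requested observability data and the list advertised by an application, assuming both are simple lists of names (the names shown are made up for illustration):

```python
def unavailable_data(requested, advertised):
    """Return requested observability data names that the application did not advertise."""
    return sorted(set(requested) - set(advertised))


advertised = ["http_requests_total", "request_traces", "app_logs"]
missing = unavailable_data(["request_traces", "gc_pause_ms"], advertised)
```

An empty result would indicate that everything the association request asks of the application is available.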
- another component of the observability system could request and/or receive the available observability data from an application 156, instead of observability operator 124.
- application 156 could transmit a list of available observability data to a local observability controller 152.
- observability operator 124 could receive the list of available observability data from the local observability controller 152 (either directly or indirectly via central observability controller 126). For example, in response to receiving the list of available observability data from an application 156, local observability controller 152 could forward the list to central observability controller 126. Central observability controller 126 could transmit/forward the list to observability operator 124 (e.g., automatically or in response to receiving a request for the list).
- an application transmits a list of available observation data in response to discovering (e.g., via Kubernetes (K8s) service discovery) a local observability controller 152 that is executing on the same agent computing node 140.
- the controller computing node includes a central observability controller 126.
- the central observability controller 126 is responsible for managing the local observability controllers 152 of the agent computing nodes 140, such as transmitting observation data collection requests to local observability controllers 152, receiving collected observation data from the local observability controllers 152, storing the received observation data, processing the received observation data, analyzing the received observation data, and/or the like.
- creating an association includes storing the association definition and collection definition.
- central observability controller 126 could store the association definition and collection definition in a storage device at or accessible to controller computing node 110.
- Central observability controller 126 can use the stored association definition to determine when the corresponding criteria has been met.
- central observability controller 126 can use the stored collection definition to determine the observability action(s) that should be performed for an association.
- creating an association includes transmitting/forwarding the association request to the one or more local observability controllers 152.
- the central observability controller 126 transmits the association definition and the collection definition that was received from observability operator 124. In some embodiments, the central observability controller 126 transmits a portion of the association request. For example, central observability controller 126 could transmit the association definition without transmitting the collection definition, transmit the collection definition without transmitting the association definition, transmit a portion of the association definition and/or a portion of the collection definition, and/or the like.
- In some embodiments, central observability controller 126 determines, based on the association definition, one or more agent computing nodes 140 at which observation data used in detecting the association event should be detected.
- central observability controller 126 could determine, based on the association definition, one or more observation data collectors 154 for collecting the observation data used in detecting the association event. Central observability controller 126 transmits the association request to the local observability controller 152 at each of the agent computing nodes 140.
- In some embodiments, central observability controller 126 determines a portion of the association definition that is associated with a given agent computing node 140. Central observability controller 126 transmits the portion of the association definition that is associated with the given agent computing node 140 to the given agent computing node 140. Additionally, in such embodiments, central observability controller 126 does not transmit the portion of the association definition that is not associated with the given agent computing node 140 (if any).
- the association definition specifies a first set of criteria associated with observation data that should be collected by observation data collector 154(1) at agent computing node 140(1) and a second set of criteria associated with observation data that should be collected by observation data collector 154(2) at agent computing node 140(2).
- Central observability controller 126 could transmit the first set of criteria to local observability controller 152(1) at agent computing node 140(1) without transmitting the second set of criteria.
- central observability controller 126 could transmit the second set of criteria to local observability controller 152(2) at agent computing node 140(2) without transmitting the first set of criteria.
- central observability controller 126 determines a portion of the collection definition that is associated with a given agent computing node 140.
- Central observability controller 126 transmits the portion of the collection definition that is associated with the given agent computing node 140 to the given agent computing node 140. Additionally, in such embodiments, central observability controller 126 does not transmit the portion of the collection definition that is not associated with the given agent computing node 140 (if any). For example, assume the collection definition specifies a first observability action associated with observation data that should be collected by observation data collector 154(1) at agent computing node 140(1) and a second observability action associated with observation data that should be collected by observation data collector 154(2) at agent computing node 140(2). Central observability controller 126 could transmit the first observability action to local observability controller 152(1) at agent computing node 140(1) without transmitting the second observability action.
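The per-node partitioning of an association definition or collection definition can be illustrated with a short sketch. It assumes each criterion or action is already tagged with the agent computing node it concerns; the tuple encoding and function name are hypothetical.

```python
from collections import defaultdict


def split_by_node(entries):
    """Group (node_id, item) pairs so that each node receives only its own portion."""
    portions = defaultdict(list)
    for node_id, item in entries:
        portions[node_id].append(item)
    return dict(portions)


criteria = [("140(1)", "first set of criteria"), ("140(2)", "second set of criteria")]
actions = [("140(1)", "first observability action"), ("140(2)", "second observability action")]

criteria_by_node = split_by_node(criteria)
actions_by_node = split_by_node(actions)
```

The central controller would then send `criteria_by_node["140(1)"]` and `actions_by_node["140(1)"]` only to the local controller at node 140(1).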
- creating an association includes identifying one or more local observability controllers 152 that should participate in the association.
- Local observability controllers 152 that should participate in the association include, for example and without limitation, a local observability controller 152 that should track the association, that manages an observation data collector 154 that should track the association, that manages an observation data collector 154 that collects observation data for tracking the association, that should perform one or more observability actions specified by the association, that manages an observation data collector 154 that should perform one or more observability actions specified by the association, and/or the like.
- Central observability controller 126 transmits/forwards the association request to the one or more local observability controllers 152 that should participate in the association.
- central observability controller 126 analyzes the association definition to determine which local observability controllers 152 are involved in tracking the association.
- the criteria specified by the association definition could correspond to observation data from a given application, such as application 156.
- Central observability controller 126 could determine the components that are needed to collect the observation data from the given application, such as which agent computing nodes 140 the application is executing on, which observation data collectors 154 are configured to collect or have access to observation data from the given application, and/or the like.
- Central observability controller 126 includes the local observability controllers 152 that are involved in tracking the association in the one or more local observability controllers 152 that should participate in the association.
- In some embodiments, central observability controller 126 analyzes the collection definition to determine which local observability controllers 152 are involved in performing an observability action. For example, a given observability action could specify collecting observation data from a given application. Central observability controller 126 could determine the components that are needed to collect the observation data in the same way as discussed above for tracking an association. Central observability controller 126 includes the local observability controllers 152 that are involved in performing an observability action in the one or more local observability controllers 152 that should participate in the association.
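One way to picture the selection of participating local observability controllers is sketched below. It assumes the central controller has a mapping from each application to the agent nodes it runs on, and that each node hosts one local controller; the mapping and names are illustrative assumptions.

```python
def participating_nodes(app_placement, tracked_apps, collected_apps):
    """Return the agent nodes whose local controllers should participate.

    app_placement maps an application name to the set of agent nodes it runs on.
    tracked_apps are referenced by the association definition, collected_apps by
    the collection definition.
    """
    nodes = set()
    for app in set(tracked_apps) | set(collected_apps):
        nodes |= app_placement.get(app, set())
    return nodes


placement = {
    "checkout": {"node-1", "node-2"},
    "search": {"node-3"},
    "billing": {"node-4"},
}
nodes = participating_nodes(placement, tracked_apps=["checkout"], collected_apps=["search"])
```

Nodes that host neither a tracked nor a collected application (here, `node-4`) are left out, so their local controllers never receive the association request.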
- the local observability controller(s) 152 are responsible for managing the observation data collector(s) 154 at the agent computing node 140.
- a local observability controller 152 receives an association request (or portion thereof) from a central observability controller 126 and configures one or more observation data collector(s) 154 based on the association request.
- local observability controller 152 configures the one or more observation data collectors 154 to collect observation data associated with the request. For example, local observability controller 152 could configure a given observation data collector 154 to collect observation data used in tracking the association during operation of the system under observation and/or to collect observation data specified by the collection definition when the association is active.
- configuring an observation data collector 154 to collect observation data includes transmitting or forwarding the association request (or portion thereof) to the observation data collector 154.
- the observation data collector 154 determines the observation data specified in the association request and collects observation data in accordance with the request.
- configuring an observation data collector 154 includes determining the observation data that should be collected by the observation data collector 154 and transmitting a data collection request to the observation data collector 154.
- the data collection request specifies the observation data that should be collected.
- the data collection request could also include additional information, parameters, criteria, and/or the like associated with collecting the observation data.
- configuring an observation data collector 154 includes configuring the observation data collector 154 to execute the included instructions.
- the association request could include a plug-in for collecting observation data used in detecting the association event.
- the local observability controller 152 could transmit the plug-in or a storage location of the plug-in to the observation data collector 154.
- the observation data collector 154 receives or retrieves the plug-in and installs the plug-in. After installing the plug-in, the observation data collector 154 can execute the plug-in to collect the associated observation data.
- the association request could include collection instructions that should be executed when the association is active.
- the local observability controller 152 could transmit the collection instructions or a storage location of the collection instructions to the observation data collector 154.
- the observation data collector 154 could store the collection instructions in connection with the association. When the association is active, the observation data collector 154 can execute the collection instructions.
- instead of transmitting the collection instructions as part of the association creation process, a local observability controller 152 could transmit the collection instructions after determining that the association is active. That is, during operation of the system under observation, if local observability controller 152 determines that the association event has started or occurred, then local observability controller 152 transmits the collection instructions to the observation data collector 154.
- the association request includes instructions or indicates a storage location of instructions that, when executed, cause an observation data collector 154 to track the association.
- Local observability controller 152 transmits the instructions or the indicated location to observation data collector 154.
- Observation data collector 154 receives or retrieves the instructions and executes the instructions.
- the observation data collector 154 is able to detect when an association event has started/occurred/ended and make determinations as to whether the association is active or not. Additionally, executing the instructions could also cause the observation data collector 154 to perform the observability actions specified by the collection definition when the observation data collector 154 determines that the association is active.
- the observation data collector 154 can detect when an association event has occurred and perform the corresponding observability actions, without waiting for the local observability controller 152 to transmit a data collection request.
- the instructions included or indicated by an association request are injected as an extension to the observation data collector 154.
- an adapter (e.g., an eBPF program)
- injected code can be an extension of association logic, an extension of collection logic, or could even involve making changes to the system under observation, such as dropping or rerouting a packet or changing the kernel scheduling order. Any technically feasible approach can be used to inject instructions into an observation data collector 154.
- an observation data collector 154 could include an API for injecting (and optionally removing) code from the observation data collector 154.
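A toy sketch of an injection API on an observation data collector, assuming extensions are ordinary Python callables registered by name. The `inject`, `remove`, and `collect` methods are hypothetical; a real collector might load plug-ins or eBPF programs instead.

```python
class Collector:
    """Toy observation data collector with a hypothetical inject/remove API."""

    def __init__(self):
        self._extensions = {}

    def inject(self, name, func):
        # Register injected code under a name so it can be removed later.
        self._extensions[name] = func

    def remove(self, name):
        # Removing an unknown extension is a no-op.
        self._extensions.pop(name, None)

    def collect(self, sample):
        # Run every injected extension against the raw sample.
        return {name: func(sample) for name, func in self._extensions.items()}


collector = Collector()
collector.inject("queue_depth", lambda sample: sample["queue"])
first = collector.collect({"queue": 7})
collector.remove("queue_depth")
second = collector.collect({"queue": 7})
```

Because extensions are added and removed at run time, the collector itself does not need to be redeployed to gain or lose a collection capability.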
- an association request includes instructions that should be injected in and/or executed by a local observability controller 152.
- the association request could include association evaluation instructions that, when executed by a local observability controller 152, enable the local observability controller 152 to determine whether an association is active.
- After receiving the association request from central observability controller 126, local observability controller 152 could determine that the association request includes instructions that should be executed by the local observability controller 152 (as opposed to instructions that should be executed by an observation data collector).
- the local observability controller 152 stores, injects, and/or executes the instructions, whichever the case may be.
- the observation data collectors 154 collect observation data for the corresponding agent computing node 140.
- an observation data collector 154 (e.g., node_exporter or opentelemetry_exporter)
- the observation data collector 154 collects observation data related to the execution environment.
- the observation data includes, for example and without limitation, measurement data and/or trace data.
- Measurement data includes numeric information such as the number of received/sent network packets per second, CPU utilization percentage, or the like. Trace data includes information regarding events that are determined to belong together.
- local observability controller 152 could be configured to filter the collected observation data (e.g., in accordance with configuration information, settings, and/or the like) and transmit the filtered observation data to central observability controller 126.
- central observability controller 126 could transmit a request for observation data.
- Local observability controller 152 could identify a portion of the collected observation data that corresponds to the request (e.g., data for a particular application, service, and/or the like) and transmit the identified portion.
- the local observability controller 152 transmits the collected observation data, or a portion thereof, to the central observability controller 126 via an observability application programming interface (API)/framework such as OpenTelemetry.
- local observability controller 152 could transmit observation data using a “push” mechanism (e.g., transmit observation data when the observation data is available) and/or a “pull” mechanism (e.g., transmit the observation data in response to receiving a request for the observation data).
- central observability controller 126 can receive observation data without first requesting the data (e.g., when local observability controller 152 pushes the data) and/or can request observation data from local observability controller 152.
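The push and pull mechanisms, together with request-based filtering, can be sketched in a few lines. This is a simplified in-process model with hypothetical names; it does not use the OpenTelemetry API and only illustrates the control flow.

```python
class LocalController:
    """Toy local observability controller supporting push and pull delivery."""

    def __init__(self, push_sink=None):
        self._records = []
        self._push_sink = push_sink  # callable used for "push"; None disables pushing

    def on_collected(self, record):
        self._records.append(record)
        if self._push_sink is not None:
            self._push_sink(record)  # push: transmit as soon as data is available

    def handle_request(self, app):
        # pull: answer a request with only the matching portion of the data
        return [r for r in self._records if r["app"] == app]


pushed = []  # stands in for the central controller's receive path
controller = LocalController(push_sink=pushed.append)
controller.on_collected({"app": "checkout", "cpu": 40})
controller.on_collected({"app": "search", "cpu": 75})
pulled = controller.handle_request("search")
```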
- central observability controller 126 could request observation data periodically (e.g., every given number of minutes, hour(s), day(s), week(s), and/or the like).
- central observability controller 126 could receive a notification that an association event has started and, in response, request additional observation data (e.g., as specified by a corresponding collection definition) from local observability controller 152.
- local observability controller 152 could request the observation data be collected by one or more observation data collectors 154.
- an observation data collector 154 is configured to track one or more associations in conjunction with collecting observation data. While collecting observation data, observation data collector 154 also determines whether any association events have occurred and/or whether any associations have become active.
- the observation data collector 154 could include injected code that enables observation data collector 154 to detect when a given association event has started.
- observation data collector 154 detects the association event and determines that the corresponding association is active. Additionally, while collecting observation data, observation data collector 154 determines whether any association events have ended and/or whether any associations have become inactive.
- In some embodiments, in response to detecting that an association event has started and/or determining that an association has become active, the observation data collector 154 determines one or more observability actions corresponding to the association. The observation data collector 154 performs the one or more observability actions or causes the one or more observability actions to be performed. In some embodiments, the one or more observability actions corresponding to an association include a plurality of triggers.
- Performing the one or more observability actions includes evaluating the plurality of triggers to determine if the conditions for any triggers have been met. If the conditions for a trigger have been met, then the action associated with the trigger (i.e., observation data collection) is performed. In some embodiments, observation data collector 154 performs the one or more observability actions periodically (e.g., every minute, every ten minutes, every hour, etc.) while the association is active.
- the collection definition for an association indicates when an observability action should be performed (e.g., immediately after association is activated, only when trigger conditions are met, etc.) and/or a frequency at which the observability action should be performed (e.g., periodically, at a specified time, only once, a specified number of times, etc.).
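Trigger evaluation while an association is active can be sketched as below, assuming each trigger is a (condition, action) pair of callables. The encoding and the sample triggers are assumptions for illustration only.

```python
def run_triggers(triggers, observation):
    """Evaluate each (condition, action) pair and perform actions whose condition holds."""
    return [action(observation) for condition, action in triggers if condition(observation)]


triggers = [
    (lambda o: o["cpu"] > 90, lambda o: ("collect_cpu_profile", o["cpu"])),
    (lambda o: o["errors"] > 0, lambda o: ("collect_error_logs", o["errors"])),
]
results = run_triggers(triggers, {"cpu": 95, "errors": 0})
```

A collector that performs actions periodically could call `run_triggers` on a timer for as long as the association remains active.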
- in response to detecting that an association event has started and/or determining that an association has become active, the observation data collector 154 transmits a notification to the local observability controller 152. The notification could indicate that the association has become active and/or that the association event has started.
- observation data collector 154 determines if at least one of the one or more observability actions should be performed by a component of the observability system other than the observation data collector 154 itself. For example, observation data collector 154 could determine that a given observability action should be performed by at least one of another observation data collector 154, a local observability controller 152, or the central observability controller 126. Observation data collector 154 transmits the notification to the local observability controller 152 in response to determining that at least one observability action should be performed by a different component.
- Transmitting the notification to the local observability controller 152 causes the at least one observability action to be performed by the other component(s). For example, as explained in further detail below, when the local observability controller 152 receives the notification, local observability controller 152 causes the observability action(s) for the association to be performed by the responsible components of the observability system (e.g., via transmitting/forwarding a notification to the central observability controller 126, performing observability actions for which the local observability controller 152 is responsible, transmitting requests to other observation data collectors 154 that are managed by the local observability controller 152, and/or the like).
- in response to detecting that an association event has ended and/or determining that an association has become inactive, the observation data collector 154 stops performing the one or more observability actions corresponding to the association and/or causes the one or more observability actions to be stopped. In some embodiments, stopping performance of an observability action includes ceasing to execute a set of instructions corresponding to the observability action, such as a plug-in, code, script, and/or the like. Additionally, observation data collector 154 could remove the set of instructions (e.g., uninstalling a plug-in, deleting a script, etc.).
- in response to detecting that an association event has ended and/or determining that an association has become inactive, the observation data collector 154 transmits a notification to the local observability controller 152.
- the notification indicates that the association has become inactive and/or that the association event has ended/stopped.
- the notification includes identifying information that can be used to determine the specific association that has become inactive from among a plurality of associations.
- observation data collector 154 determines if at least one of the one or more observability actions is being performed by a component of the observability system other than the observation data collector 154 itself, e.g., by at least one of another observation data collector 154, a local observability controller 152, or the central observability controller 126.
- Observation data collector 154 transmits the notification to the local observability controller 152 in response to determining that at least one observability action of the inactive association is being performed by a different component. Transmitting the notification to the local observability controller 152 causes the other component(s) to stop performing the at least one observability action.
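The activation and inactivation notifications can be illustrated with a small state tracker. The notification shape (a dictionary carrying an association identifier and a state) is a hypothetical choice; the description only requires that the notification identify the specific association.

```python
class AssociationTracker:
    """Tracks which associations are active and emits a notification on each change."""

    def __init__(self, notify):
        self.active = set()
        self.notify = notify  # callable standing in for the link to the local controller

    def on_event(self, association_id, started):
        if started and association_id not in self.active:
            self.active.add(association_id)
            self.notify({"association_id": association_id, "state": "active"})
        elif not started and association_id in self.active:
            self.active.remove(association_id)
            self.notify({"association_id": association_id, "state": "inactive"})


sent = []
tracker = AssociationTracker(notify=sent.append)
tracker.on_event("assoc-1", started=True)
tracker.on_event("assoc-1", started=True)   # repeated start produces no new notification
tracker.on_event("assoc-1", started=False)
```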
- the observability system can easily extend the capability of the observation data collectors 154 and their collection configurations without requiring redeployment or reprogramming of the observation data collectors.
- an observation data collector 154 can collect observation data needed for the association that the observation data collector 154 may have been unable to collect without the instructions.
- the observation data collector 154 is able to detect when an association is active and perform observability actions without having to transmit observation data to the local observability controller 152 and wait to receive instructions from the local observability controller 152 (or for local observability controller 152 to transmit observation data to and receive instructions from central observability controller 126).
- a local observability controller 152 is configured to track one or more associations in conjunction with receiving observation data from one or more observation data collectors 154. While receiving observation data, the local observability controller 152 also determines whether any association events have started and/or whether any associations have become active. Additionally, the local observability controller 152 could determine whether any association events (that were previously started) have ended and/or whether any associations have become inactive.
- the local observability controller 152 determines that an association event has started based on receiving a notification from an observation data collector 154 that indicates that the association event has started. In some embodiments, the local observability controller 152 determines that an association has become active based on receiving a notification from an observation data collector 154 that indicates that the association has become active. The notifications could include an identifier or other information that indicates the specific association and/or association event.
- Similarly, in some embodiments, the local observability controller 152 determines that an association event has ended based on receiving a notification from an observation data collector 154 that indicates that the association event has ended.
- the local observability controller 152 determines that an association has become inactive based on receiving a notification from an observation data collector 154 that indicates that the association has become inactive.
- the notifications could include an identifier or other information that indicates the specific association and/or association event.
- determining that an association event has started and/or that an association has become active includes determining that a given system event has occurred. After determining that a system event has occurred, the local observability controller 152 determines whether the system event is an association event and/or is specified by the association definition for any associations.
- Local observability controller 152 compares each system event with the association definition for each association included in a plurality of associations (e.g., associations for which local observability controller 152 has stored information). If an association definition specifies the given system event, then local observability controller 152 determines that the system event is an association event and that the association event has started.
- In some embodiments, local observability controller 152 determines that a given system event occurred based on analyzing observation data received from one or more observation data collectors 154. For example, the observation data could include data indicating that a given system event has started.
- local observability controller 152 determines that a given system event occurred based on receiving a notification or other data from an observation data collector 154 indicating that one or more events have occurred, including the given system event.
- Similarly, in some embodiments, determining that an association event has ended and/or that an association has become inactive includes determining that a given system event has ended. After determining that the given system event has ended, the local observability controller 152 determines whether the system event is an association event and/or is specified by the association definition for any associations. Local observability controller 152 compares each system event with the association definition for each association.
- If an association definition specifies the given system event, then local observability controller 152 determines that the system event is an association event and that the association event has ended.
- In some embodiments, local observability controller 152 determines that a given system event ended based on analyzing observation data received from one or more observation data collectors 154. For example, the observation data could include data indicating that the given system event has ended or completed. As another example, the observation data could include data indicating that a second system event has started, where the second system event indicates the end of the given system event. In some embodiments, local observability controller 152 determines that a given system event ended based on receiving a notification or other data from an observation data collector 154 indicating that one or more events have ended, including the given system event.
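The comparison of a system event against stored association definitions can be sketched as a lookup, assuming each definition is reduced to the set of system event names it specifies. That reduction and the identifiers are assumptions for illustration.

```python
def match_event(event_name, definitions):
    """Return the IDs of associations whose definition specifies the system event."""
    return [aid for aid, events in definitions.items() if event_name in events]


definitions = {
    "assoc-a": {"pod_restart"},
    "assoc-b": {"deploy_started", "pod_restart"},
    "assoc-c": {"disk_full"},
}
matched = match_event("pod_restart", definitions)
```

The same lookup serves both directions: it is applied when a system event starts, to find association events that have started, and when a system event ends, to find association events that have ended.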
- In some embodiments, determining that an association event has started and/or that an association has become active includes determining whether the criteria specified by the association definition has been met.
- For example, local observability controller 152 determines whether collected observation data values satisfy the criteria specified by the association definition. If the criteria has been met, then local observability controller 152 determines that the association has become active. If the criteria has not been met, then local observability controller 152 determines that the association has not become active, even though the association event has started.
- In another example, the criteria for a given association could specify a plurality of observation data values that, when observed/collected, indicate that the association event has started.
- Local observability controller 152 determines, based on observation data values received from one or more observation data collectors 154, whether the criteria for the given association has been met. If the criteria has been met, then local observability controller 152 determines that the association has become active. [0090] In some embodiments, local observability controller 152 evaluates the criteria for a subset of associations that are not currently active (i.e., for inactive associations). In some embodiments, the criteria includes criteria for determining that the corresponding association event has ended and/or that the association has become inactive (i.e., criteria for inactivating an association).
- Additionally or alternatively, an association could become inactive if the criteria for activating the association and/or the criteria for determining that the association event has started (or is still occurring) are no longer being met.
- In such embodiments, local observability controller 152 evaluates the criteria for both active and inactive associations.
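The event matching and criteria evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the source: the names `AssociationDefinition`, `AssociationTracker`, and the representation of criteria as a predicate over a dictionary of observation data values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

Observation = Dict[str, float]  # hypothetical shape for observation data values


@dataclass
class AssociationDefinition:
    name: str
    event: str  # system event specified by the association definition
    criteria: Callable[[Observation], bool]  # assumed: a predicate over values


@dataclass
class AssociationTracker:
    definitions: Dict[str, AssociationDefinition]
    started: Set[str] = field(default_factory=set)  # association events in progress
    active: Set[str] = field(default_factory=set)   # associations currently active

    def on_event_started(self, event: str) -> None:
        # Compare the system event with every stored association definition.
        for definition in self.definitions.values():
            if definition.event == event:
                self.started.add(definition.name)

    def on_event_ended(self, event: str) -> None:
        for definition in self.definitions.values():
            if definition.event == event:
                self.started.discard(definition.name)
                self.active.discard(definition.name)

    def evaluate(self, values: Observation) -> None:
        # Criteria are checked for active and inactive associations alike, so
        # an association is deactivated once its criteria stop being met.
        for name in self.started:
            if self.definitions[name].criteria(values):
                self.active.add(name)
            else:
                self.active.discard(name)
```

In this sketch an association whose event has started stays inactive until the criteria are satisfied, which mirrors the case where the association has not become active even though the association event has started.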
- local observability controller 152 determines one or more observability actions corresponding to the association. Determining the one or more observability actions could include, for example, retrieving the collection definition for the association from storage and determining the observability action(s) specified in the collection definition.
- the local observability controller 152 performs the one or more observability actions or causes the one or more observability actions to be performed. [0092] In some embodiments, for each observability action, local observability controller 152 determines the component(s) that are involved in performing the observability action. If local observability controller 152 determines that the observability action should be performed by the local observability controller 152 itself, then local observability controller 152 performs the observability action. If local observability controller 152 determines that an observability action should be performed by an observation data collector 154 that is managed by the local observability controller 152, then local observability controller 152 transmits a request to the observation data collector 154 to perform the observability action.
- an observability action specifies observation data that should be collected by an observation data collector 154.
- the specified observation data can be additional observation data that was not part of the observation data being collected by the observation data collector 154 (i.e., was not collected prior to the association becoming active).
- the local observability controller 152 causes the observation data collector 154 to collect the specified observation data.
- local observability controller 152 transmits a data collection request to the observation data collector 154 that specifies the observation data that should be collected.
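One way to picture the dispatch step is the sketch below, in which a local controller either performs an action itself or sends a data collection request to a managed collector. The class names, the `performer` field, and the use of the string `"controller"` to mean the local controller itself are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObservabilityAction:
    name: str
    performer: str         # "controller" or a collector id (hypothetical encoding)
    data_items: List[str]  # observation data that should be collected


@dataclass
class Collector:
    collecting: List[str] = field(default_factory=list)

    def handle_collection_request(self, items: List[str]) -> None:
        # Start collecting only the items that were not already collected.
        for item in items:
            if item not in self.collecting:
                self.collecting.append(item)


@dataclass
class LocalController:
    collectors: Dict[str, Collector]
    performed_locally: List[str] = field(default_factory=list)

    def perform(self, actions: List[ObservabilityAction]) -> None:
        for action in actions:
            if action.performer == "controller":
                self.performed_locally.append(action.name)
            else:
                collector = self.collectors[action.performer]
                collector.handle_collection_request(action.data_items)
```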
- the collection definition includes instructions or indicates a location of instructions that should be executed as part of performing or in order to perform an observability action. Performing the observability action includes executing the instructions or causing the instructions to be executed.
- the instructions could include a plug-in that should be added to an observation data collector 154.
- the local observability controller 152 transmits the plug-in to the observation data collector 154 or transmits a request for the observation data collector 154 to retrieve and install the plug-in.
- the observation data collector 154 receives the instructions and begins execution of the instructions.
- the one or more observability actions corresponding to an association include a plurality of triggers. Performing the one or more observability actions includes evaluating the plurality of triggers to determine if the conditions for any triggers have been met. If the conditions for a trigger have been met, then the action associated with the trigger (i.e., observation data collection) is performed.
- local observability controller 152 performs the one or more observability actions periodically (e.g., every minute, every ten minutes, every hour, etc.), or causes the one or more observability actions to be performed periodically, while the association is active.
- the collection definition for an association indicates when an observability action should be performed (e.g., immediately after association is activated, only when trigger conditions are met, etc.) and/or a frequency at which the observability action should be performed (e.g., periodically, at a specified time, only once, a specified number of times, etc.).
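Trigger evaluation and periodic execution could look roughly like the following. The `Trigger` structure, the `evaluate_triggers` and `is_due` helpers, and the choice of seconds as the time unit are hypothetical; the source does not prescribe a data model for triggers or schedules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Trigger:
    condition: Callable[[Dict[str, float]], bool]  # assumed: predicate over values
    collect: List[str]                             # data collected when it fires


def evaluate_triggers(triggers: List[Trigger], values: Dict[str, float]) -> List[str]:
    """Return the observation data to collect for every trigger whose condition is met."""
    items: List[str] = []
    for trigger in triggers:
        if trigger.condition(values):
            items.extend(trigger.collect)
    return items


def is_due(now: float, last_run: Optional[float], period_s: float) -> bool:
    """Periodic actions run immediately the first time, then once per period."""
    return last_run is None or now - last_run >= period_s
```

A controller or collector would call these only while the association is active, which keeps the extra collection scoped to the association event.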
- In response to detecting that an association event has started and/or determining that an association has become active, the local observability controller 152 transmits a notification to the central observability controller 126.
- the notification could indicate that the association has become active and/or that the association event has started.
- the notification includes identifying information that can be used to retrieve or request stored association information (e.g., the association definition and/or collection definition).
- local observability controller 152 determines if at least one of the one or more observability actions should be performed by a component of the observability system that is not at the agent computing node 140.
- local observability controller 152 could determine that a given observability action should be performed by at least one of an observation data collector 154 executing on a different agent computing node 140, another local observability controller 152, or the central observability controller 126.
- Local observability controller 152 transmits the notification to the central observability controller 126 in response to determining that at least one observability action should be performed by a component at a different computing node. Transmitting the notification to the central observability controller 126 causes the at least one observability action to be performed by the other component(s).
- When the central observability controller 126 receives the notification, central observability controller 126 causes the observability action(s) for the association to be performed by the responsible components of the observability system (e.g., by transmitting requests to other local observability controllers 152).
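The notification path can be sketched as below, assuming each action records the node whose component performs it. The `Action.node` field, the `needs_central` helper, and the `sent` list standing in for transmitted requests are illustrative assumptions, not details from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Action:
    name: str
    node: str  # node whose component performs the action (hypothetical field)


def needs_central(actions: List[Action], local_node: str) -> bool:
    """A local controller notifies the central controller only if some action
    must be performed by a component at a different computing node."""
    return any(action.node != local_node for action in actions)


class CentralController:
    def __init__(self, collection_definitions: Dict[str, List[Action]]):
        self.collection_definitions = collection_definitions
        self.sent: List[Tuple[str, str]] = []  # (target node, action name)

    def on_activation(self, association_id: str, origin_node: str) -> None:
        # Look up the stored association by the identifier carried in the
        # notification, then forward each remote action to its node.
        for action in self.collection_definitions[association_id]:
            if action.node != origin_node:
                self.sent.append((action.node, action.name))
```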
- In response to detecting that an association event has ended and/or determining that an association has become inactive, the local observability controller 152 stops performing the one or more observability actions corresponding to the association and/or causes the one or more observability actions to be stopped.
- the local observability controller 152 transmits a request to the observation data collector 154 to stop performing the observability action. In response to receiving the request, the observation data collector 154 stops performing the observability action. In some embodiments, if central observability controller 126 received a notification that the association has become inactive and/or that the association event has ended from the observation data collector 154, then the local observability controller 152 may not need to send a request to stop performing the observability action to the observation data collector 154.
- stopping performance of an observability action includes ceasing to execute a set of instructions corresponding to the observability action, such as a plug-in, code, script, and/or the like. Additionally, local observability controller 152 could remove the set of instructions (e.g., uninstalling a plug-in, deleting a script, etc.). If the instructions are being executed by an observation data collector 154 managed by the local observability controller 152, local observability controller 152 could transmit a request to the observation data collector 154 to stop executing the instructions and/or to remove the instructions. In response to receiving the request, observation data collector 154 stops executing the instructions and/or removes the instructions, whichever the case may be.
- In response to detecting that an association event has ended and/or determining that an association has become inactive, the local observability controller 152 transmits a notification to the central observability controller 126.
- the notification indicates that the association has become inactive and/or that the association event has ended/stopped.
- the notification includes identifying information that can be used to determine the specific association that has become inactive from among a plurality of associations.
- local observability controller 152 determines if at least one of the one or more observability actions is being performed by a component of the observability system that is not at the agent computing node 140, e.g., by at least one of an observation data collector 154 of a different agent computing node 140, another local observability controller 152, or the central observability controller 126.
- Local observability controller 152 transmits the notification to the central observability controller 126 in response to determining that at least one observability action of the inactive association is being performed by a component at a different computing node. Transmitting the notification to the central observability controller 126 causes the other component(s) to stop performing the at least one observability action.
- central observability controller 126 receives observation data collected by various observation data collectors 154 from the local observability controllers 152 of the corresponding agent computing nodes 140. In some embodiments, central observability controller 126 transmits the received observation data to other applications, modules, computing devices, and/or the like for further processing. For example, central observability controller 126 could transmit observation data to an analysis application that analyzes the data and displays results to a user via a user interface. In some embodiments, central observability controller 126 is configured to track one or more associations in conjunction with receiving observation data from one or more local observability controllers 152.
- While receiving observation data, the central observability controller 126 also determines whether any association events have started and/or whether any associations have become active. Additionally, the central observability controller 126 could determine whether any association events (that were previously started) have ended and/or whether any associations have become inactive. [0104] In some embodiments, the central observability controller 126 determines that an association event has started based on receiving a notification from a local observability controller 152 that indicates that the association event has started. In some embodiments, the central observability controller 126 determines that an association has become active based on receiving a notification from a local observability controller 152 that indicates that the association has become active. The notifications could include an identifier or other information that indicates the specific association and/or association event.
- the notifications are forwarded to the central observability controller 126 from an observation data collector 154 by the local observability controller 152.
- the central observability controller 126 determines that an association event has ended based on receiving a notification from a local observability controller 152 that indicates that the association event has ended.
- the central observability controller 126 determines that an association has become inactive based on receiving a notification from a local observability controller 152 that indicates that the association has become inactive.
- the notifications could include an identifier or other information that indicates the specific association and/or association event.
- the notifications are forwarded to the central observability controller 126 from an observation data collector 154 by the local observability controller 152.
- determining that an association event has started and/or that an association has become active includes determining that a given system event has occurred. After determining that a system event has occurred, the central observability controller 126 determines whether the system event is an association event and/or is specified by the association definition for any associations. Central observability controller 126 compares each system event with the association definition for each association included in a plurality of associations (e.g., associations for which central observability controller 126 has stored information).
- If an association definition specifies the given system event, then central observability controller 126 determines that the system event is an association event and that the association event has started. [0107] In some embodiments, central observability controller 126 determines that a given system event occurred based on analyzing observation data received from one or more local observability controllers 152. For example, the observation data could include data indicating that a given system event has started. In some embodiments, central observability controller 126 determines that a given system event occurred based on receiving a notification or other data from a local observability controller 152 indicating that one or more events have occurred, including the given system event.
- determining that an association event has ended and/or that an association has become inactive includes determining that a given system event has ended. After determining that the given system event has ended, the central observability controller 126 determines whether the system event is an association event and/or is specified by the association definition for any associations. Central observability controller 126 compares each system event with the association definition for each association. If an association definition specifies the given system event, then central observability controller 126 determines that the system event is an association event and that the association event has ended. [0109] In some embodiments, central observability controller 126 determines that a given system event ended based on analyzing observation data received from one or more local observability controllers 152.
- the observation data could include data indicating that the given system event has ended or completed.
- the observation data could include data indicating that a second system event has started, where the second system event indicates the end of the given system event.
- central observability controller 126 determines that a system event ended based on receiving a notification or other data from a local observability controller 152 indicating that one or more events have ended, including the system event. [0110] In some embodiments, determining that an association event has started and/or that an association has become active includes determining whether the criteria specified by the association definition has been met.
- central observability controller 126 determines whether collected observation data values satisfy the criteria specified by the association definition. If the criteria has been met, then central observability controller 126 determines that the association has become active. If the criteria has not been met, then central observability controller 126 determines that the association has not become active, even though the association event has started. In another example, the criteria for a given association could specify a plurality of observation data values that, when observed/collected, indicate that the association event has started. Central observability controller 126 determines, based on observation data values received from one or more local observability controllers 152, whether the criteria for the given association has been met.
- If the criteria has been met, then central observability controller 126 determines that the association has become active. [0111] In some embodiments, central observability controller 126 evaluates the criteria for a subset of associations that are not currently active (i.e., for inactive associations). In some embodiments, the criteria includes criteria for determining that the corresponding association event has ended and/or that the association has become inactive (i.e., criteria for inactivating an association). Additionally or alternatively, an association could become inactive if the criteria for activating the association and/or the criteria for determining that the association event has started (or is still occurring) are no longer being met. In such embodiments, central observability controller 126 evaluates the criteria for both active and inactive associations.
- central observability controller 126 determines one or more observability actions corresponding to the association. Determining the one or more observability actions could include, for example, retrieving the collection definition for the association from storage and determining the observability action(s) specified in the collection definition. The central observability controller 126 performs the one or more observability actions or causes the one or more observability actions to be performed. [0113] In some embodiments, for each observability action, central observability controller 126 determines the component(s) that are involved in performing the observability action.
- If central observability controller 126 determines that an observability action should be performed by the central observability controller 126 itself, then central observability controller 126 performs the observability action. If central observability controller 126 determines that an observability action should be performed by a given local observability controller 152, then central observability controller 126 transmits a request to the given local observability controller 152. The request could indicate, for example, the observability action and that the local observability controller 152 should perform the observability action. [0114] If central observability controller 126 determines that an observability action should be performed by a given observation data collector 154, central observability controller 126 causes the given observation data collector 154 to perform the observability action.
- central observability controller 126 could transmit a request to the local observability controller 152 that manages the given observation data collector 154.
- the request could indicate the observability action and the given observation data collector 154 that should perform the observability action. Additionally or alternatively, the request could indicate the association that was activated.
- the local observability controller 152 receives the request and causes the observation data collector 154 to perform the observability action, e.g., by transmitting a data collection request, forwarding the request to the observation data collector, and/or the like.
- an observability action specifies observation data that should be collected by an observation data collector 154.
- the specified observation data can be additional observation data that was not part of the observation data being collected by the observation data collector 154 (i.e., was not collected prior to the association becoming active).
- the central observability controller 126 causes the observation data collector 154 to collect the specified observation data.
- central observability controller 126 transmits a data collection request to the observation data collector 154, via the corresponding local observability controller 152, that specifies the observation data that should be collected.
- the collection definition includes instructions or indicates a location of instructions that should be executed as part of performing or in order to perform an observability action. Performing the observability action includes executing the instructions or causing the instructions to be executed.
- the instructions could include a plug-in that should be added to an observation data collector 154.
- the central observability controller 126 transmits the plug-in, or a request for the observation data collector 154 to retrieve and install the plug-in, to the observation data collector 154 via the corresponding local observability controller 152.
- the observation data collector 154 receives the instructions and begins execution of the instructions.
- the one or more observability actions corresponding to an association include a plurality of triggers. Performing the one or more observability actions includes evaluating the plurality of triggers to determine if the conditions for any triggers have been met. If the conditions for a trigger have been met, then the action associated with the trigger (i.e., observation data collection) is performed.
- central observability controller 126 performs the one or more observability actions periodically (e.g., every minute, every ten minutes, every hour, etc.), or causes the one or more observability actions to be performed periodically, while the association is active.
- the collection definition for an association indicates when an observability action should be performed (e.g., immediately after association is activated, only when trigger conditions are met, etc.) and/or a frequency at which the observability action should be performed (e.g., periodically, at a specified time, only once, a specified number of times, etc.).
- After central observability controller 126 transmits a request or notification to the component(s) that are responsible for performing an observability action, the component(s) determine whether the observability action should be performed (e.g., evaluating triggers, periodically repeating the observability action, and/or the like). That is, central observability controller 126 is not involved in performance of the observability action after transmitting the request or notification to the relevant component(s).
- In response to detecting that an association event has ended and/or determining that an association has become inactive, the central observability controller 126 stops performing the one or more observability actions corresponding to the association and/or causes the one or more observability actions to be stopped.
- central observability controller 126 determines one or more components that are performing or responsible for performing each observability action. If central observability controller 126 determines that an observability action is performed by the central observability controller 126 itself, then central observability controller 126 stops performing the observability action. If central observability controller 126 determines that an observability action is performed by a given local observability controller 152, then central observability controller 126 transmits a request to the given local observability controller 152 to stop performing the observability action. The request could indicate, for example, that the association has become inactive and/or that the association event has ended.
- central observability controller 126 may not need to send a request to stop performing the observability action to the local observability controller 152.
- If central observability controller 126 determines that an observability action is performed by a given observation data collector 154, then central observability controller 126 causes the given observation data collector 154 to stop performing the observability action. For example, central observability controller 126 could transmit a request to stop performing the observability action to the given observation data collector 154 via the local observability controller 152 that manages the given observation data collector 154.
- stopping performance of an observability action includes ceasing to execute a set of instructions corresponding to the observability action, such as a plug-in, code, script, and/or the like. Additionally, central observability controller 126 could remove the set of instructions (e.g., uninstalling a plug-in, deleting a script, etc.). If the instructions are being executed by another component, central observability controller 126 could transmit a request to a corresponding local observability controller 152 (e.g., the local observability controller 152 that is executing the instructions or that is managing an observation data collector 154 that is executing the instructions) to stop executing the instructions and/or to remove the instructions.
- In response to receiving the request, the local observability controller 152 and/or observation data collector 154 stops executing the instructions and/or removes the instructions, whichever the case may be.
- the particular components of the observability system that track an association can vary depending on the particular association. In some embodiments, for a given association, the component that tracks the association depends on which observation data collectors 154 are used to provide the observation data used for determining whether the association is active and/or whether the corresponding association event has started/ended. For example, if the associated event can be detected by a single observation data collector, then the observation data collector could be responsible for detecting the event and starting/ending association data collection.
- If data from multiple collectors at a given node are needed to detect the associated event, then the local observability controller of the given node could be responsible for detecting the event and starting/ending association data collection. If data from collectors at different nodes are needed to detect the associated event, then the central observability controller could be responsible for detecting the event and starting/ending association data collection.
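The choice of tracking component can be expressed as a small decision function. The sketch below assumes that the local observability controller handles the case where several collectors on one node are needed; the function name, the string labels it returns, and the collector-to-node mapping are hypothetical.

```python
from typing import Dict, Set


def tracking_component(required_collectors: Set[str],
                       collector_node: Dict[str, str]) -> str:
    """Pick the component that tracks an association, based on where the
    collectors needed to detect the association event are located."""
    if not required_collectors:
        raise ValueError("at least one observation data collector is required")
    if len(required_collectors) == 1:
        return "collector"  # a single collector can detect the event itself
    nodes = {collector_node[c] for c in required_collectors}
    if len(nodes) == 1:
        return "local_controller"  # several collectors, all on one node
    return "central_controller"    # collectors spread across nodes
```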
- the particular component(s) that are involved in an association (e.g., tracking an association, performing an observability action) depend on, for example and without limitation, which component(s) are able to detect the association event; which node the association event occurred on; which node(s) are targeted or affected by the association event; which application(s) 156 are involved in the association (e.g., source of event, source of observation data, etc.); what types and/or sources of observation data are specified in the association definition and/or collection definition; and/or the like.
- determining which components are involved in an association includes determining which node(s) are involved in the corresponding association event.
- determining which components are involved in an association includes determining an application 156 that is specified in the association definition and/or collection definition.
- the association definition could specify an application 156 that generates the association event.
- observation data items specified in the association definition (e.g., observation data values for the criteria) and/or the collection definition (e.g., observation data values for trigger criteria, observation data to be collected)
- the component making the determination determines one or more of: which node(s) (e.g., agent computing node 140) the application 156 is executing on; which local observability controller(s) 152 is(are) executing on the same node as the application 156; which observation data collector(s) 154 is(are) configured to collect/request data from the application 156; and/or the like.
- determining which components are involved in an association includes determining the observation data that is specified in the association definition and/or collection definition. The component making the determination could determine which observation data collector(s) 154 is(are) configured to collect/request the specified observation data.
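Resolving the involved components from an application named in a definition might be sketched as follows. The `Topology` structure and its three lookup tables are assumptions introduced for illustration; the source does not say how this mapping is stored.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union


@dataclass
class Topology:
    app_node: Dict[str, str]                # application -> node it executes on
    node_controller: Dict[str, str]         # node -> local observability controller
    collector_sources: Dict[str, Set[str]]  # collector -> applications it collects from


def involved_components(app: str, topo: Topology) -> Dict[str, Union[str, List[str]]]:
    """Find the node, local controller, and collectors tied to an application."""
    node = topo.app_node[app]
    collectors = sorted(
        collector
        for collector, apps in topo.collector_sources.items()
        if app in apps
    )
    return {
        "node": node,
        "local_controller": topo.node_controller[node],
        "collectors": collectors,
    }
```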
- the controller computing node 110 optionally includes an AI controller 128.
- the AI controller 128 is responsible for controlling operations of the central observability controller 126. For example, in some embodiments, AI controller 128 analyzes observation data received by central observability controller 126 to determine whether to adjust or change the observation data collected by the observability system.
- In response to determining that the observation data collection should be adjusted or changed, the AI controller causes the central observability controller 126 to transmit requests to one or more local observability controllers 152 to adjust or change the observation data collected by various data collectors.
- AI controller 128 could analyze collected observation data to make determinations relating to an association, such as whether an association event has occurred; whether an association event has ended; whether an association has become active; whether an association has become inactive; whether a trigger included in an association has been activated; and/or the like.
- AI controller 128 could transmit the determination to central observability controller 126 as part of central observability controller 126 determining whether an association is active/inactive and whether to perform the observability actions for an association.
- FIG. 2 is a diagram illustrating interactions between components of the system of Figure 1 to create a new association, according to various embodiments.
- the API server 122 receives a manifest for a new association between a system event and a set of one or more observability actions.
- the system event is an event that occurs within the system under observation.
- the system under observation could be the observability system itself (e.g., the observability system is also the execution system) or could be a system that is separate from the observability system.
- the manifest describes the association, including the association definition and the collection definition for the association.
- the API server 122 transmits the association definition and the collection definition to observability operator 124.
- API server 122 forwards the manifest that was received in operation 202 to observability operator 124.
- API server 122 extracts the association definition and the collection definition, and transmits the extracted definitions to observability operator 124.
- transmitting the association definition and the collection definition includes transmitting one or more sets of instructions that are included with the manifest (e.g., code, scripts, rules, AI logic, etc.) to observability operator 124.
- the observability operator 124 checks the available resources of the observability system to determine whether the resources needed for the association request are available. Additionally, observability operator 124 checks if the quantity of resources needed are available, if applicable. [0133] In some embodiments, determining whether the resources are available includes determining whether the user requesting the association has the needed authority/permissions to request the resources.
- observability operator 124 transmits an access verification check request to an access control service.
- the access control service performs the requested check and transmits a response indicating whether the user has permission to request the resources. If the response indicates that the user has permission to request the resources, observability operator 124 determines that the user has the needed authority. If the response indicates that the user does not have permission to request the resources, observability operator 124 determines that the user does not have the needed authority. [0134] In some embodiments, if the access control service determines that the user has permission to request the resources, the access control service also grants access to the requested resources.
- if the access control service determines that the user does not have permission to request the resources, then the access control service denies access to the requested resources. [0135] If the observability operator 124 determines that the resources are not available or that the quantity requested for a given resource is not available, then observability operator 124 could deny the association request. In such cases, observability operator 124 does not transmit the association request to central observability controller 126 and the association is not created. [0136] If the observability operator 124 determines that the resources are available, then in operation 208, the observability operator 124 transmits an association request to the central observability controller 126. The association request is a request for central observability controller 126 to create the association in the observability system.
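The resource and permission check that precedes operation 208 could be sketched as follows. This is only an illustrative sketch: the function `check_association_request`, the resource names, and the `access_control` stand-in for the access control service are assumptions made for the example, not elements of the disclosure.

```python
from typing import Callable


def check_association_request(
    needed: dict[str, int],
    available: dict[str, int],
    user: str,
    has_permission: Callable[[str, str], bool],
) -> tuple[bool, str]:
    """Return (accepted, reason) for an association request."""
    for resource, quantity in needed.items():
        # The resource itself must exist in the observability system.
        if resource not in available:
            return False, f"resource not available: {resource}"
        # The requested quantity must be available, if applicable.
        if available[resource] < quantity:
            return False, f"insufficient quantity of {resource}"
        # The requesting user must be allowed to request the resource.
        if not has_permission(user, resource):
            return False, f"user {user} may not request {resource}"
    return True, "ok"


# Hypothetical stand-in for the access control service.
GRANTS = {("alice", "cpu_millicores"), ("alice", "storage_mb")}


def access_control(user: str, resource: str) -> bool:
    return (user, resource) in GRANTS


available = {"cpu_millicores": 500, "storage_mb": 1024}
needed = {"cpu_millicores": 200, "storage_mb": 512}
accepted, reason = check_association_request(needed, available, "alice", access_control)
denied, denied_reason = check_association_request(needed, available, "bob", access_control)
```

Only when the check succeeds would the operator go on to transmit the association request to the central observability controller.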
- observability operator 124 includes the association definition and the collection definition for the association in the request. Additionally, if any instructions correspond to the association definition or collection definition (e.g., if any instructions were included with the manifest), observability operator 124 could include the instructions with the request. [0137] In operation 210, the central observability controller 126 stores the association information included in the association request. For example, central observability controller 126 could store the association definition and collection definition in a storage device at or accessible to controller computing node 110. Additionally, if the central observability controller 126 received instructions in conjunction with the association request, central observability controller 126 could store the instructions.
- central observability controller 126 could request or retrieve the set of instructions from the indicated location.
- central observability controller 126 can use the stored association definition to determine when the association is active (e.g., corresponding criteria have been met) or inactive (e.g., association event has ended).
- central observability controller 126 can use the stored collection definition to determine the observability action(s) that should be performed for an association and the components of the observability system that should perform the observability action(s).
- the central observability controller 126 transmits the association request to one or more local observability controllers 152.
- central observability controller 126 transmits the association request to each local observability controller 152 that is included in the observability system.
- central observability controller 126 identifies one or more local observability controllers 152 that are relevant to the association.
- the one or more local observability controllers 152 include a first set of one or more local observability controllers 152 that are involved in tracking the association and/or a second set of one or more local observability controllers 152 that are involved in performing the observability actions in the association.
- a given local observability controller 152 can be included in the first set, the second set, both the first set and the second set, or neither the first set nor the second set, depending on the association.
- Central observability controller 126 transmits the association request to the one or more local observability controllers 152. Central observability controller 126 does not transmit the association request to local observability controllers 152 that were not identified as participating in the association. [0140] In some embodiments, the central observability controller 126 transmits the association definition and the collection definition that was received from observability operator 124. In some embodiments, the central observability controller 126 transmits a portion of the association request.
- central observability controller 126 could transmit, to each local observability controller 152, only the portions of the association definition and collection definition that are relevant to the local observability controller 152.
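The selection of relevant local observability controllers, and the trimming of the request to the portions relevant to each one, might look like the sketch below. The capability model (each controller advertising the observation data it `collects` and the actions it `performs`) and the function name `build_local_requests` are assumptions introduced for this example.

```python
def build_local_requests(association_def, collection_def, controllers):
    """Build one trimmed association request per relevant local controller."""
    needed_data = set(association_def["observation_data"])
    needed_actions = set(collection_def["actions"])
    requests = {}
    for name, caps in controllers.items():
        # First set: controllers involved in tracking the association.
        tracked = sorted(needed_data & caps["collects"])
        # Second set: controllers involved in performing observability actions.
        actions = sorted(needed_actions & caps["performs"])
        if not tracked and not actions:
            continue  # Not identified as participating; no request is sent.
        requests[name] = {
            "event": association_def["event"],
            "track": tracked,
            "actions": actions,
        }
    return requests


# Hypothetical local controllers and their capabilities.
controllers = {
    "node-a": {"collects": {"cpu_usage"}, "performs": set()},
    "node-b": {"collects": set(), "performs": {"collect_logs"}},
    "node-c": {"collects": {"disk_io"}, "performs": {"restart_service"}},
}
association_def = {"event": "high_cpu", "observation_data": ["cpu_usage"]}
collection_def = {"actions": ["collect_logs"]}
requests = build_local_requests(association_def, collection_def, controllers)
```

In this example `node-a` falls only in the first set, `node-b` only in the second, and `node-c` in neither, so no request is built for it.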
- the local observability controller 152 stores the association information included in the association request.
- local observability controller 152 could store the association definition and collection definition in a storage device at or accessible to controller computing node 110. Additionally, if the local observability controller 152 received instructions in conjunction with the association request, local observability controller 152 could store the instructions. Additionally or alternatively, if the association definition or collection definition indicate a storage location of a set of instructions, local observability controller 152 could request or retrieve the set of instructions from the indicated location.
- local observability controller 152 can use the stored association definition to determine when the association is active and when the association is inactive. Similarly, local observability controller 152 can use the stored collection definition to determine the observability action(s) that should be performed for an association and the components of the observability system that should perform the observability action(s). [0143] In operation 216, local observability controller 152 transmits an association request to one or more observation data collectors 154. In some embodiments, local observability controller 152 forwards the association request received from central observability controller 126.
- local observability controller 152 determines a portion of the association definition and/or a portion of the collection definition that is relevant to each observation data collector 154. Local observability controller 152 transmits an association request that includes the relevant portion(s) of the definition(s). Portions of the definitions that are not relevant may not be included in the association request. [0144] In some embodiments, if the association request received by the local observability controller 152 included instructions (e.g., scripts, code, rules, etc.) for collecting observation data, local observability controller 152 transmits the instructions in conjunction with the association request. The instructions could be transmitted together with the association request (e.g., included in the association request) or could be transmitted separately.
- if the association request indicates a storage location for instructions (e.g., scripts, code, rules, etc.), local observability controller 152 could retrieve and transmit the instructions. Alternately, local observability controller 152 could forward the storage location and the observation data collector 154 retrieves the instructions after receiving the association request. [0146] In some embodiments, local observability controller 152 determines, based on the association request, observation data that should be collected by each observation data collector 154 for tracking the association.
- the association request that is transmitted to the observation data collector 154 comprises a data collection request for the observation data that should be collected by the observation data collector 154.
- the observation data collector 154 is configured based on the association information included in the association request.
- the association request indicates a system event corresponding to the association (i.e., the association event).
- the observation data collector 154 is configured to detect when the system event occurs. Additionally, the observation data collector 154 could be configured to detect when the system event ends.
- configuring the observation data collector 154 includes determining, based on the information included in the association request, the observation data that should be collected by the observation data collector 154.
- the association request could be a data collection request that specifies the observation data that should be collected.
- the association request could include an association definition that indicates a set of criteria for activating the association.
- the observation data collector 154 determines the observation data specified by the set of criteria (e.g., type of observation data, data source, and such) and configures data collection operations to include collecting the observation data specified by the set of criteria.
- configuring the observation data collector 154 includes storing and/or executing the instructions. If the association request indicated a storage location for instructions or if an indication of the storage location was received in connection with the association request, observation data collector 154 retrieves/requests the instructions from the indicated storage location. After receiving the instructions, observation data collector 154 stores and/or executes the instructions.
- an association definition could include plug-in(s) or other code for collecting observation data used in tracking the association.
- Observation data collector 154 obtains the plug-in(s) or other code and installs/runs the code to collect the observation data.
- an association definition could include code to be injected in the observation data collector 154 for tracking the association. Observation data collector 154 executes the injected code to track the association and detect when the association has become active.
- Figure 3 is a diagram illustrating interactions between components of the system of Figure 1 for tracking an association at a local observability controller 152, according to various embodiments.
- an observation data collector 154 detects that a system event has started.
- observation data collector 154 executes code that monitors the system and detects when the system event occurs. In some embodiments, observation data collector 154 collects observation data that includes data indicating that the system event has started/occurred. [0152] In operation 304, the observation data collector 154 transmits an event start notification to a local observability controller 152. In some embodiments, the event start notification is a notification that indicates the system event and the state of the system event (i.e., started). In some embodiments, the event start notification comprises collected observation data that indicates that the system event has started. [0153] In operation 306, the local observability controller 152 determines that data associated with the event should be collected.
- determining that data associated with the event should be collected includes determining that the event is an association event (i.e., is part of an association). For example, local observability controller 152 could determine whether any stored associations are associated with the system event. Additionally, in some embodiments, determining that data associated with the event should be collected includes determining that one or more criteria specified in the association definition of the association has been met. [0154] In operation 308, the local observability controller 152 transmits a data collection request to the observation data collector 154. In some embodiments, the data collection request includes the collection definition associated with the system event. In some embodiments, the data collection request includes the observation data specified by the collection definition.
- transmitting a data collection request to the observation data collector 154 includes determining, based on the collection definition associated with the system event, observation data that should be collected for the event.
- the data collection request indicates the observation data that should be collected for the event.
- transmitting the data collection request includes determining, based on the collection definition, a subset of observation data that should be collected by the observation data collector 154.
- the subset can be all of the observation data that should be collected for the event (i.e., only the observation data collector 154 collects observation data) or a portion thereof (i.e., other observation data collectors 154 are responsible for one or more other portions of the observation data).
- the data collection request includes a set of instructions associated with collecting the observation data (e.g., collection code, scripts, plug-ins, and such).
- the observation data that should be collected includes observation data from an application 156 that is configured to provide observation data upon request.
- the observation data collector 154 may not know how to retrieve observation data for the application directly (e.g., the observation data is internal to the application 156).
- the local observability controller 152 requests or receives (or previously requested or received) a list of available observation data from the application 156. Local observability controller 152 determines that the observation data that should be collected includes observation data that is provided by the application 156 based on the list.
- local observability controller 152 transmits a request to application 156 to provide the observation data directly to local observability controller 152 and/or to observation data collector 154.
- the data collection request indicates the observation data that should be requested from the application 156.
- the observation data collector 154 collects observation data associated with the event based on the data collection request.
- the data collection request includes a collection definition associated with the system event. Collecting the observation data includes determining, based on the collection definition, what observation data should be collected. Observation data collector 154 collects the determined observation data.
- the data collection request includes a set of instructions associated with collecting the observation data.
- observation data collector 154 could include or store instructions associated with collecting the observation data. Collecting the observation data includes obtaining, installing, and/or executing the associated instructions. [0159] If the collection definition specifies one or more observability actions, collecting the observation data further includes performing the one or more observability actions. As discussed above, an observability action can include data collection actions, but can also include other actions associated with the event, such as making changes or modifications to the system. [0160] If the collection definition specifies one or more triggers, collecting the observation data further includes evaluating the one or more triggers. When the trigger criteria are met, then observation data collector 154 performs the action (e.g., observation data collection) specified by the trigger.
- If the trigger criteria are not met, then observation data collector 154 does not perform the action. Observation data collector 154 could periodically re-evaluate the triggers and perform the action when the trigger criteria are met.
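Trigger evaluation of the kind described above can be illustrated with a short sketch. The trigger format (a metric name, a comparison operator, a threshold, and an action name) and the function `due_actions` are hypothetical; the disclosure leaves the form of trigger criteria open.

```python
import operator

# Comparison operators supported by this illustrative trigger format.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def due_actions(triggers, sample):
    """Return the actions whose trigger criteria are met by the sample."""
    due = []
    for trigger in triggers:
        value = sample.get(trigger["metric"])
        if value is None:
            continue  # Metric not observed yet; the criteria cannot be met.
        if OPS[trigger["op"]](value, trigger["threshold"]):
            due.append(trigger["action"])
    return due


# Hypothetical triggers taken from a collection definition.
triggers = [
    {"metric": "cpu_usage", "op": ">", "threshold": 0.9, "action": "collect_cpu_profile"},
    {"metric": "error_rate", "op": ">=", "threshold": 0.05, "action": "collect_logs"},
]

# Periodic re-evaluation: each new sample is checked against every trigger.
first = due_actions(triggers, {"cpu_usage": 0.95, "error_rate": 0.01})
second = due_actions(triggers, {"cpu_usage": 0.50, "error_rate": 0.05})
```

A collector could call such a function on each sampling interval and perform only the returned actions, which matches the re-evaluation behavior described above.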
- the data collection request and/or collection definition specifies a frequency with which the observation data should be collected. Observation data collector 154 collects the observation data in accordance with the specified frequency (e.g., a single time, periodically, whenever a corresponding condition is met, etc.).
- the data collection request and/or collection definition specifies observation data from an application 156. Observation data collector 154 may not be configured to read or obtain the specified observation data from the application 156.
- observation data collector 154 receives the observation data from application 156.
- application 156 could transmit the observation data to observation data collector 154 in response to a request from local observability controller 152 and/or from observation data collector 154.
- the observation data collector 154 transmits the collected observation data to the local observability controller 152.
- the local observability controller 152 transmits an event start notification to a central observability controller 126.
- the event start notification indicates the system event that was started.
- the event start notification indicates the association that was activated as a result of the system event starting.
- central observability controller 126 causes observation data associated with the event to be collected at one or more agent computing nodes 140.
- causing observation data associated with the event to be collected at one or more agent computing nodes 140 includes determining that the one or more agent computing nodes 140 are involved in collecting the observation data.
- causing observation data associated with the event to be collected at the one or more agent computing nodes 140 includes transmitting a request to the corresponding local observability controller 152 at each agent computing node 140 of the one or more agent computing nodes.
- the event end notification indicates the system event and the state of the system event (i.e., ended/stopped). In some embodiments, the event end notification comprises collected observation data that indicates that the system event has ended or stopped. [0168] In operation 322, in response to receiving the event end notification, the local observability controller 152 transmits a data collection stop request to the observation data collector 154. [0169] In operation 324, the observation data collector 154 stops collecting observation data associated with the event. In some embodiments, stopping collection of observation data includes no longer monitoring one or more data sources associated with the observation data. In some embodiments, stopping collection of observation data includes stopping execution of data collection instructions.
- stopping collection of observation data includes removing (e.g., deleting, uninstalling, etc.) the data collection instructions.
- local observability controller 152 transmits an event end notification to central observability controller 126.
- central observability controller 126 causes observation data to stop being collected at the one or more agent computing nodes 140.
- causing observation data associated with the event to stop being collected at one or more agent computing nodes 140 includes determining that the one or more agent computing nodes 140 are involved in collecting the observation data.
- causing observation data associated with the event to stop being collected at the one or more agent computing nodes 140 includes transmitting a request to the corresponding local observability controller 152 at each agent computing node 140 of the one or more agent computing nodes.
- the request could indicate, for example, one or more of: the association corresponding to the event; an indication that the association is inactive; an indication that the event ended; the observation data that should stop being collected; and/or the like.
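The event start and event end handling at the local observability controller in Figure 3 can be sketched as a small state holder. The class name `LocalController`, its method names, and the message dictionaries are assumptions for illustration and do not correspond to any interface defined in the disclosure.

```python
class LocalController:
    """Illustrative local observability controller for the Figure 3 flow."""

    def __init__(self, associations):
        # Maps a system event name to its stored collection definition.
        self.associations = dict(associations)
        self.active = set()

    def on_event_start(self, event):
        """Handle an event start notification from an observation data collector."""
        collection = self.associations.get(event)
        if collection is None:
            return None  # Not an association event; nothing extra to collect.
        self.active.add(event)
        return {
            "type": "data_collection_request",
            "event": event,
            "collect": list(collection["observation_data"]),
        }

    def on_event_end(self, event):
        """Handle an event end notification from an observation data collector."""
        if event not in self.active:
            return None
        self.active.remove(event)
        return {"type": "data_collection_stop_request", "event": event}


# Hypothetical stored association: a service crash triggers extra collection.
controller = LocalController({"service_crash": {"observation_data": ["stack_trace", "heap_dump"]}})
start_request = controller.on_event_start("service_crash")
ignored = controller.on_event_start("unrelated_event")
stop_request = controller.on_event_end("service_crash")
```

The returned dictionaries stand in for the data collection request and the data collection stop request sent to the observation data collector; forwarding the notifications to the central observability controller is omitted for brevity.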
- observation data collector 154 collects observation data.
- the observation data includes a first set of data that the observation data collector 154 is configured to collect during normal operation of the execution system, i.e., before any associations have been activated and/or before data collection operations have been modified in any way.
- the observation data collector 154 transmits the collected observation data to a local observability controller 152.
- local observability controller 152 stores received data into a persistent storage (not shown). Additionally or alternatively, local observability controller 152 caches received data in temporary storage (not shown). Additionally or alternatively, local observability controller 152 forwards the received data, or a portion thereof, to central observability controller 126.
- the observation data collector 154 determines that an association has become active. In some embodiments, determining that an association has become active includes determining that a system event included in the association (i.e., the association event) has started.
- determining that an association has become active includes detecting a system event has started, and determining that the system event corresponds to the association. In some embodiments, determining that an association has become active includes determining one or more criteria specified in the association definition and determining that the one or more criteria have been met. [0177] In some embodiments, observation data collector 154 includes injected code that enables the observation data collector 154 to track the association. While executing the injected code, observation data collector 154 is able to detect when the association criteria has been met, when the system event has started, or other indications that the association has become active. [0178] Optionally, in operation 408, the observation data collector 154 transmits an association start notification to the local observability controller 152.
- the local observability controller 152 transmits the association start notification to a central observability controller 126.
- the observation data collector 154 collects observation data based on the association. In some embodiments, collecting observation data based on the association includes determining, based on the collection definition for the association, what observation data should be collected. Observation data collector 154 collects the determined observation data. [0181] In some embodiments, the collection definition includes or references a set of instructions for collecting the observation data. Collecting the observation data includes obtaining, installing, and/or executing the associated instructions. [0182] If the collection definition specifies one or more observability actions, collecting the observation data further includes performing the one or more observability actions.
- an observability action can include data collection actions, but can also include other actions associated with the event, such as making changes or modifications to the system.
- collecting the observation data further includes evaluating the one or more triggers. When the trigger criteria are met, then observation data collector 154 performs the action (e.g., observation data collection) specified by the trigger. If the trigger criteria are not met, then observation data collector 154 does not perform the action. Collecting the observation data could include periodically re-evaluating the triggers and performing the action when the trigger criteria are met.
- the collection definition specifies a frequency with which the observation data should be collected.
- Observation data collector 154 collects the observation data in accordance with the specified frequency (e.g., a single time, periodically, whenever a corresponding condition is met, etc.).
- the data collection request and/or collection definition specifies observation data from an application 156.
- Observation data collector 154 may not be configured to read or obtain the specified observation data from the application 156. Instead, observation data collector 154 receives the observation data from application 156. For example, observation data collector 154 could determine the observation data needed from the application 156 and transmit a request to the application 156.
- Application 156 transmits the requested observation data to observation data collector 154.
- the observation data collector 154 transmits the association observation data to the local observability controller 152.
- the observation data collector 154 determines that the association has become inactive. In some embodiments, determining that the association has become inactive includes determining that the system event included in the association (i.e., the association event) has stopped/ended. In some embodiments, determining that the association has become inactive includes detecting a system event has stopped/ended, and determining that the system event corresponds to the association. In some embodiments, determining that an association has become inactive includes determining one or more criteria specified in the association definition and determining that the one or more criteria are no longer being met. [0188] In some embodiments, observation data collector 154 includes injected code that enables the observation data collector 154 to track the association.
- While executing the injected code, observation data collector 154 is able to detect when the association criteria are no longer being met, when the system event has stopped/ended, or other indications that the association has become inactive. [0189] Optionally, in operation 418, the observation data collector 154 transmits an association end notification to the local observability controller 152. [0190] Optionally, in operation 420, the local observability controller 152 transmits an association end notification to the central observability controller 126. [0191] In operation 422, the local observability controller 152 stops collecting observation data for the association. In some embodiments, stopping collection of observation data includes no longer monitoring one or more data sources associated with the observation data. In some embodiments, stopping collection of observation data includes stopping execution of data collection instructions.
- stopping collection of observation data includes removing (e.g., deleting, uninstalling, etc.) the data collection instructions.
- Although the operations of Figure 4 are shown in an order, in various embodiments, the operations can be performed in a different order. Additionally, various operations can be performed in parallel.
- the association start notifications could be transmitted to the local observability controller 152 and/or the central observability controller 126 after or in conjunction with observation data collector 154 collecting observation data for the association.
- the association end notifications could be transmitted to the local observability controller 152 and/or the central observability controller 126 after or in conjunction with observation data collector 154 stopping collection of observation data for the association.
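The collector-side tracking of Figure 4, in which the observation data collector itself detects when an association becomes active or inactive, can be sketched as a two-state tracker. The class `AssociationTracker`, the notification strings, and the latency-based criterion are hypothetical and serve only to illustrate the transitions.

```python
from typing import Callable, Optional


class AssociationTracker:
    """Tracks one association and reports active/inactive transitions."""

    def __init__(self, name: str, criteria: Callable[[dict], bool]):
        self.name = name
        self.criteria = criteria  # Stand-in for the association definition's criteria.
        self.active = False

    def update(self, sample: dict) -> Optional[str]:
        """Evaluate the criteria on a new sample and return a notification, if any."""
        met = self.criteria(sample)
        if met and not self.active:
            self.active = True
            return "association_start"
        if not met and self.active:
            self.active = False
            return "association_end"
        return None  # No state change, so no notification is needed.


# Hypothetical association: active while latency exceeds 200 ms.
tracker = AssociationTracker("high_latency", lambda s: s.get("latency_ms", 0) > 200)
samples = [{"latency_ms": 120}, {"latency_ms": 250}, {"latency_ms": 300}, {"latency_ms": 90}]
notifications = [tracker.update(sample) for sample in samples]
```

Because the notifications are optional and can be sent after or in conjunction with starting or stopping collection, a collector could begin collecting as soon as `update` reports a start and transmit the notification afterwards.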
- FIG. 5 is a flowchart of method steps for creating an association at an observability system, according to various embodiments. Although the method steps are described with reference to the system of Figure 1, persons skilled in the art will understand that any system configured to implement the method steps, in any order, falls within the scope of the present disclosure.
- a method 500 begins at step 502, where a controller computing node receives a request to associate a system event with observation data collection.
- the request includes an association definition and a collection definition.
- the association definition indicates the criteria for triggering one or more observability data collection actions.
- the collection definition indicates the one or more observability data collection actions that should be taken when the criteria is met.
- the request includes one or more sets of observability action instructions. Each set of observability action instructions, when executed, performs the corresponding observability action. In some embodiments, the request includes one or more sets of association tracking instructions. Each set of association tracking instructions, when executed, tracks the association to detect when the association has become active and when the association has become inactive. In some embodiments, the request includes one or more sets of observation data collection instructions. Each set of observation data collection instructions, when executed, collects the corresponding observation data. Collecting the corresponding observation data can be part of tracking the association and/or part of performing an observability action. [0196] In some embodiments, the request is received from a user.
- a user could transmit a request to an observability operator 124 at controller computing node 110 via a graphical user interface.
- the request is received via an API.
- an API server 122 at controller computing node 110 could receive a manifest that contains an association definition and a collection definition.
- the controller computing node determines whether the system includes sufficient resources to meet the request. In some embodiments, the controller computing node determines, based on the association request, the resources needed to meet the request. The controller computing node determines whether the needed resources are available. Additionally, the controller computing node could determine a quantity of each resource that is needed, if applicable. The controller computing node determines whether the needed quantity of each resource is available.
- determining whether the resources are available includes determining whether the user requesting the association has the needed authority or permission to request the resources. For example, an observability operator 124 at controller computing node 110 could transmit an access verification check request to an access control service. If the user does not have sufficient authority/permission, then the controller computing node determines that the system does not include sufficient resources to meet the request. [0200] In some embodiments, determining whether the resources are available includes determining whether the observability system can request, read, or otherwise obtain information needed for the association. Information needed for the association includes, for example and without limitation, observation data specified by the association definition and/or collection definition, the status of the association event, and/or the like.
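- if the system does not include sufficient resources to meet the request, then at step 506 the controller computing node denies the request.
- if the system includes sufficient resources to meet the request, then at step 508 the controller computing node determines one or more components that are associated with tracking the system event.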
- a component could be, for example, a controller computing node, a central observability controller on the controller computing node, an agent computing node, a local observability controller on the agent computing node, an observation data collector on the agent computing node, and/or the like.
- determining the one or more components that are associated with tracking the system event is based on the association definition. The controller computing node analyzes the association definition to determine the observation data that is needed to track the association.
- the controller computing node determines, based on the observation data that is needed to track the association, one or more observation data collectors (e.g., observation data collector 154) that can collect the observation data. [0204] In some embodiments, if the one or more observation data collectors are located at the same agent computing node, then the controller computing node determines that the agent computing node is associated with tracking the system event (e.g., using the local observability controller of the agent computing node).
- the controller computing node determines that the controller computing node is associated with tracking the system event (e.g., using the central observability controller of the controller computing node).
- the controller computing node determines that each agent computing node is associated with tracking the system event. That is, the controller computing node determines that each agent computing node is capable of individually tracking the system event.
- the controller computing node determines that one or more agent computing nodes are associated with tracking the system event (using the observation data collector(s) at each agent computing node). Determining the one or more components that are associated with tracking the event could include determining which observation data collector(s) should receive the association tracking instructions. For example, the controller computing node could determine that a given observation data collector collects the observation data that is needed to track the association. Based on determining that the given observation data collector collects the necessary observation data, the controller computing node determines that the given observation data collector should receive the association tracking instructions.
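The component selection described for step 508 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Collector` record, the `select_tracking_components` function, and the rule that collectors spread over several agent nodes fall back to the central observability controller are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collector:
    """Hypothetical record of one observation data collector."""
    name: str
    agent_node: str
    provides: frozenset  # names of observation data this collector can collect


def select_tracking_components(needed, collectors):
    """Return (chosen collectors, tracker) for an association.

    needed: set of observation data names required by the association definition.
    collectors: iterable of Collector records known to the controller.
    """
    # Keep every collector that supplies at least one needed piece of data.
    chosen = [c for c in collectors if c.provides & needed]
    covered = set().union(*(c.provides for c in chosen))
    missing = needed - covered
    if missing:
        raise LookupError(f"no collector provides: {sorted(missing)}")
    agents = {c.agent_node for c in chosen}
    # Assumption: one agent node means its local controller tracks the event;
    # several agent nodes means the central controller tracks it.
    if len(agents) == 1:
        tracker = ("local", next(iter(agents)))
    else:
        tracker = ("central", "controller")
    return chosen, tracker
```

The chosen collectors are the ones that would receive the association tracking instructions in this sketch.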
- the controller computing node causes the one or more components to track the system event.
- a component included in the one or more components is the controller computing node or is a component that is at the controller computing node (e.g., a central observability controller).
- causing the one or more components to track the system event includes storing association information at the controller computing node.
- the controller computing node uses the stored association information to determine whether an association has become active/inactive.
- causing the one or more components to track the system event includes transmitting an association request to the agent computing node.
- a central observability controller 126 at a controller computing node 110 could transmit an association request to the local observability controller 152 at an agent computing node 140 that is associated with or includes a component that is associated with tracking the system event.
- the association request includes the association definition, or a portion thereof.
- the association request indicates the system event that should be tracked. [0210]
- the controller computing node transmits the instructions in conjunction with the association request to the agent computing node.
- the controller computing node determines one or more agent computing nodes that are associated with the observation data collection. In some embodiments, determining the one or more components that are associated with the observation data collection is based on the collection definition. The controller computing node analyzes the collection definition to determine the observation data that is collected when the association is active. The controller computing node determines, based on the observation data that is collected when the association is active, one or more observation data collectors (e.g., observation data collector 154) that can collect the observation data. The controller computing node identifies the agent computing node(s) corresponding to the one or more observation data collectors.
- the controller computing node analyzes the collection definition to determine the observability actions that are performed when the association is active. The controller computing node determines, based on the observability actions, the agent computing node(s) that are involved in performing each observability action. [0213] At step 514, the controller computing node causes one or more data collectors at the one or more agent computing nodes to be configured to collect the observation data. In some embodiments, causing the one or more data collectors to be configured to collect the observation data includes transmitting an association request to each agent computing node. Each agent computing node receives the association request and configures one or more observation data collectors at the agent computing node to collect the observation data.
- the association request includes the collection definition, or a portion thereof.
- the association request indicates the observation data that should be collected. Additionally, the association request could indicate the particular observation data collector that should collect the observation data.
- a central observability controller 126 at controller computing node 110 could forward a collection definition to the local observability controller 152 of an agent computing node 140.
- the local observability controller 152 configures the corresponding observation data collector based on the collection definition.
- the local observability controller 152 stores the collection definition and/or observation data that should be collected when the association is active. The local observability controller 152 could wait until it determines that the association is active before transmitting a data collection request to the observation data collector 154.
- causing a data collector to be configured to collect observation data includes transmitting data collection instructions to the agent computing node on which the data collector is executing.
- a central observability controller 126 at controller computing node 110 could forward a set of data collection instructions to the local observability controller 152 of an agent computing node 140.
- the local observability controller 152 transmits the set of data collection instructions to the observation data collector 154 at the agent computing node 140.
- Although the steps of Figure 5 are shown in an order, in various embodiments, the operations can be performed in a different order. For example, in some embodiments, steps 512-514 could be performed prior to steps 508-510.
- the controller computing node transmits an association request to an agent computing node that includes both an association definition and a collection definition.
- steps 510 and 514 are performed together (i.e., causing components to track the system event and causing data collectors to be configured at the same time).
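As a rough sketch of how steps 510 and 514 might be combined, the example below builds one association request per agent computing node and attaches the association definition, the collection definition, or both. The `AssociationRequest` class, its field names, and the dictionary shape of the definitions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssociationRequest:
    """Hypothetical message sent from the controller to one agent node."""
    agent_node: str
    association_definition: Optional[dict] = None
    collection_definition: Optional[dict] = None


def build_requests(tracking_agents, collecting_agents, association_def, collection_def):
    """Build one request per agent node, merging tracking and collection parts."""
    requests = {}
    for node in tracking_agents:
        req = requests.setdefault(node, AssociationRequest(node))
        req.association_definition = association_def
    for node in collecting_agents:
        req = requests.setdefault(node, AssociationRequest(node))
        req.collection_definition = collection_def
    # dicts preserve insertion order, so tracking agents come first.
    return list(requests.values())
```

An agent node that both tracks the system event and collects the observation data receives a single request carrying both definitions; a node that only collects receives the collection definition alone.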
- Figure 6 is a flowchart of method steps for tracking an association at an agent computing node of an observability system, according to various embodiments. Although the method steps are described with reference to the system of Figure 1, persons skilled in the art will understand that any system configured to implement the method steps, in any order, falls within the scope of the present disclosure.
- a method 600 begins at step 602, where the agent computing node collects a first set of observation data.
- the first set of observation data comprises observation data related to an execution environment.
- the first set of observation data is the set of observation data that is collected while a given association is inactive. That is, the first set of observation data does not include observation data that would be collected while the given association is active.
- the set of observation data could also include observation data collected for associations other than the given association, if the other associations are active while the agent computing node is collecting the first set of observation data.
- the agent computing node detects that a system event has started. In some embodiments, an observation data collector at the agent computing node is responsible for tracking the system event.
- the observation data collector detects when the system event has started.
- a local observability controller at the agent computing node is responsible for tracking the system event.
- the local observability controller detects when the system event has started, based on observation data received from one or more observation data collectors.
- the agent computing node determines that the system event is associated with observation data collection.
- determining that the system event is associated with observation data collection includes determining that the system event is an association event. For example, the agent computing node could determine that the system event is specified in the association definition for an association. Additionally, determining that the system event is associated with observation data collection could include evaluating one or more criteria specified in the association definition.
- the agent computing node collects a second set of observation data in accordance with a set of data collection rules associated with the system event.
- the second set of observation data includes observation data associated with the system event.
- the second set of observation data includes observation data specified by the association corresponding to the system event.
- the second set of observation data generally includes observation data that is not included in the first set of observation data. That is, the second set of observation data is different from the first set of observation data that was previously being collected. While the agent computing node is collecting the second set of observation data, the agent computing node can continue to collect the first set of observation data.
- the agent computing node determines the collection definition associated with the system event.
- the agent computing node could retrieve stored association information that includes the collection definition.
- the collection definition indicates the set of data collection rules.
- collecting the second set of observation data includes determining, based on the set of data collection rules, what observation data should be collected.
- the second set of observation data includes the determined observation data.
- the data collection rules include or reference a set of instructions for collecting the observation data. Collecting the second set of observation data includes obtaining, installing, and/or executing the associated instructions.
- the data collection rules specify a plurality of triggers. Collecting the second set of observation data includes evaluating each trigger included in the plurality of triggers.
- When the trigger criteria for a given trigger are met, the agent computing node performs the action (e.g., observation data collection) specified by the trigger. If the trigger criteria are not met, then the agent computing node does not perform the action. Collecting the observation data could include periodically re-evaluating the triggers and performing the action when the trigger criteria are met.
- the data collection rules specify a frequency at which various observation data should be collected. The agent computing node determines, for a given piece of observation data, a frequency at which the piece of observation data should be collected. The agent computing node collects the given piece of observation data at the determined frequency.
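The trigger and frequency rules described above could be evaluated along the following lines. The `Trigger` class, the `evaluate_triggers` and `due_items` functions, and the use of seconds as the period unit are assumptions introduced for this sketch; the source does not specify a rule format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    """Hypothetical trigger: criteria over current observations plus an action."""
    criteria: Callable[[dict], bool]
    action: Callable[[], object]


def evaluate_triggers(triggers, observations):
    """Run the action of every trigger whose criteria are met; skip the rest."""
    return [t.action() for t in triggers if t.criteria(observations)]


def due_items(frequencies, last_collected, now):
    """Return the data names whose collection period has elapsed at time `now`.

    frequencies: data name -> period in seconds (assumed unit).
    last_collected: data name -> time of the previous collection.
    """
    return sorted(
        name
        for name, period in frequencies.items()
        # Data that was never collected is treated as due immediately.
        if now - last_collected.get(name, float("-inf")) >= period
    )
```

Calling `evaluate_triggers` on each polling cycle corresponds to the periodic re-evaluation of triggers, and `due_items` selects which pieces of observation data to collect in that cycle.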
- the association corresponding to the system event could include one or more observability actions.
- Collecting the second set of observation data further includes performing the one or more observability actions.
- the second set of observation data includes observation data relating to an application. Collecting the second set of observation data includes transmitting a request to the application for the observation data relating to the application.
- the agent computing node detects that the system event has ended.
- an observation data collector at the agent computing node is responsible for tracking the system event. The observation data collector detects when the system event has ended.
- a local observability controller at the agent computing node is responsible for tracking the system event. The local observability controller detects when the system event has ended, based on observation data received from one or more observation data collectors.
- the agent computing node stops collecting the second set of observation data.
- stopping collection of the second set of observation data includes no longer monitoring one or more data sources associated with the observation data.
- stopping collection of the second set of observation data includes stopping execution of data collection instructions. Additionally, in some embodiments, stopping collection of observation data includes removing (e.g., deleting, uninstalling, etc.) the data collection instructions.
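The flow of method 600 can be summarized with a small state-keeping sketch. The `AgentNode` class and its method names are hypothetical; the example only models which observation data names are being collected, not the collection itself.

```python
class AgentNode:
    """Hypothetical agent that widens collection while an association event is active."""

    def __init__(self, baseline, associations):
        self.baseline = set(baseline)      # first set of observation data
        self.associations = associations   # event name -> extra data names
        self.active = {}                   # active event name -> extra data names

    def on_event_started(self, event):
        extra = self.associations.get(event)
        if extra is not None:              # only association events change collection
            self.active[event] = set(extra)

    def on_event_ended(self, event):
        # Stop collecting the second set for this event, if it was active.
        self.active.pop(event, None)

    def collected(self):
        """Names currently collected: baseline plus data for every active association."""
        return self.baseline.union(*self.active.values())
```

In this sketch the first set keeps being collected throughout, and a system event that is not part of any association leaves the collected data unchanged.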
- Figure 7 is a flowchart of method steps for tracking an association at a controller computing node of an observability system, according to various embodiments.
- a method 700 begins at step 702, where the controller computing node determines that a system event that is associated with observation data collection has started. In some embodiments, the controller computing node determines that a system event associated with observation data collection has started based on receiving a notification that an association event has started. In some embodiments, the controller computing node determines that a system event associated with observation data collection has started based on receiving a notification that an association has become active.
- the controller computing node determines that a system event associated with observation data collection has started based on stored association information.
- the stored information includes a corresponding association definition for one or more associations.
- the controller computing node determines whether the system event is included in any of the association definitions.
- the controller computing node determines whether one or more criteria specified in an association definition have been met. The controller computing node determines that a system event that is associated with observation data collection has started if the corresponding criteria have been met.
- the controller computing node determines a set of data collection rules associated with the system event.
- determining the set of data collection rules includes determining the collection definition associated with the system event. For example, the controller computing node could retrieve stored association information that includes the collection definition. The collection definition indicates the set of data collection rules. [0236] At step 706, the controller computing node identifies one or more agent computing nodes that should collect observation data associated with the system event, based on the set of data collection rules. [0237] At step 708, the controller computing node causes the one or more agent computing nodes to collect observation data associated with the system event in accordance with the set of data collection rules. In some embodiments, causing observation data associated with the event to be collected at the one or more agent computing nodes includes transmitting a request to each agent computing node.
- the controller computing node could transmit a data collection request to the corresponding local observability controller 152 at each agent computing node 140 of the one or more agent computing nodes.
- the request could indicate, for example, one or more of: the association corresponding to the event; the observation data that should be collected; the collection definition for the association; one or more observability actions that should be performed by component(s) of the agent computing node 140; instructions to be executed at the agent computing node 140; and/or the like.
- the controller computing node determines that the system event has ended. In some embodiments, the controller computing node determines that the system event has ended based on receiving a notification that an association event has ended.
- the controller computing node determines that the system event has ended based on receiving a notification that an association has become inactive. [0239] In some embodiments, the controller computing node determines whether one or more criteria specified in an association definition are no longer being met. The controller computing node determines that a system event that is associated with observation data collection has ended if the corresponding criteria were previously being met but are no longer being met. [0240] At step 712, the controller computing node causes the one or more agent computing nodes to stop collecting observation data associated with the system event. In some embodiments, causing observation data associated with the event to stop being collected at the one or more agent computing nodes includes transmitting a request to each agent computing node.
- the controller computing node could transmit a data collection stop request to the corresponding local observability controller 152 at each agent computing node 140 of the one or more agent computing nodes.
- the request could indicate, for example, one or more of: the association corresponding to the event; an indication that the association is inactive; an indication that the event ended; the observation data that should stop being collected; and/or the like.
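One way to picture method 700 is as edge detection on the association criteria: a collection request is sent when the criteria start being met, and a stop request when they stop being met. The `controller_step` function and the dictionary layout of the requests below are assumptions for illustration only.

```python
def controller_step(was_active, criteria_met, agents, association):
    """Return (is_active, requests) for one evaluation of an association's criteria.

    was_active: whether the association was active after the previous evaluation.
    criteria_met: whether the association criteria are met now.
    agents: agent computing nodes identified from the data collection rules.
    """
    if criteria_met and not was_active:
        kind = "collect"   # system event started
    elif was_active and not criteria_met:
        kind = "stop"      # system event ended
    else:
        return criteria_met, []   # no change, nothing to transmit
    requests = [{"to": a, "type": kind, "association": association} for a in agents]
    return criteria_met, requests
```

Each returned request stands in for the data collection request or data collection stop request sent to the local observability controller of an agent computing node.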
- Exemplary Network Devices [0241]
- Figure 8 illustrates network devices (NDs) 800A-H in an exemplary network, according to various embodiments. In Figure 8, connectivity between the NDs 800A-H is illustrated by way of lines between various NDs.
- the NDs 800A-H are physical devices, and the connections between any two NDs can be a wireless connection or a wired connection (often referred to as a link).
- An additional line extending from NDs 800A, 800E, and 800F illustrates that these NDs connect the network to other network(s) and/or devices, and therefore, can act as ingress and egress points for the network.
- NDs 800A, 800E, and 800F can be referred to as edge NDs.
- NDs 800B-D and 800G-H are referred to as core NDs.
- Figure 8 further illustrates three exemplary implementations of a network device 800: a special-purpose network device 802, a general purpose network device 804, and a hybrid network device 806.
- a special-purpose network device 802 uses custom application-specific integrated circuits (ASICs) and a special-purpose operating system (OS).
- special-purpose network device 802 includes networking hardware 810 comprising a set of one or more processor(s) 812, forwarding resource(s) 814 (which typically include one or more ASICs and/or network processors), and physical network interfaces (NIs) 816 (through which network connections are made, such as those shown by the connectivity between NDs 800A-H).
- Special-purpose network device 802 also includes non-transitory machine-readable storage media 818 which stores networking software 820.
- networking hardware 810 executes the networking software 820 to instantiate a set of one or more networking software instance(s) 822.
- Each of the networking software instance(s) 822, and the portion of the networking hardware 810 that is executing that networking software instance, form a separate virtual network element 830A-R.
- Each of the virtual network element(s) (VNEs) 830A-R includes a control communication and configuration module 832A-R (sometimes referred to as a local control module or control communication module) and forwarding table(s) 834A-R, such that a given virtual network element (e.g., 830A) includes the control communication and configuration module (e.g., 832A), a set of one or more forwarding table(s) (e.g., 834A), and the portion of the networking hardware 810 that executes the virtual network element (e.g., 830A).
- networking software 820 includes observability component 823, which when executed by networking hardware 810, causes the special-purpose network device 802 to perform one or more of the operations described above (e.g., to collect observation data, receive collected observation data, track associations, determine whether to perform observability actions, determine whether to stop performing observability actions, and/or the like).
- the special-purpose network device 802 can be physically and/or logically considered to include: 1) a ND control plane 824 (sometimes referred to as a control plane) comprising the processor(s) 812 that execute the control communication and configuration module(s) 832A-R; and 2) a ND forwarding plane 826 (sometimes referred to as a forwarding plane, a data plane, or a media plane) comprising forwarding resource(s) 814 that utilize the forwarding table(s) 834A-R and the physical NIs 816.
- the ND control plane 824 (the processor(s) 812 executing the control communication and configuration module(s) 832A-R) is typically responsible for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) and storing that routing information in the forwarding table(s) 834A-R, and the ND forwarding plane 826 is responsible for receiving that data on the physical NIs 816 and forwarding that data out the appropriate ones of the physical NIs 816 based on the forwarding table(s) 834A-R.
- General-purpose network device 804 uses common off-the-shelf (COTS) processors and a standard OS. As shown in Figure 8, general-purpose network device 804 includes hardware 840 comprising a set of one or more processor(s) 842 (which are often COTS processors) and physical NIs 846, as well as non-transitory machine readable storage media 848 having stored therein software 850. During operation, the processor(s) 842 execute the software 850 to instantiate one or more sets of one or more applications 864A-R. [0247] In some embodiments, general-purpose network device 804 does not utilize any virtualization. In other embodiments, general-purpose network device 804 uses one or more forms of virtualization.
- virtualization layer 854 represents the kernel of an operating system (or shim executing on a base operating system) that allows for the creation of multiple instances 862A-R that can each be used to execute one or more of the sets of applications 864A-R.
- the multiple instances may also be referred to as software containers, virtualization engines, virtual private servers, jails, and/or the like.
- the multiple instances 862A-R are user spaces (e.g., a virtual memory space) that are separate from each other and/or separate from the kernel space in which the operating system is run.
- the set of applications running in a given user space, unless explicitly allowed, cannot access the memory of the other processes (e.g., other user spaces).
- the virtualization layer 854 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system.
- Each of the sets of applications 864A-R is run on top of a guest operating system within a corresponding instance 862A-R (i.e., virtual machine) that is run on top of the hypervisor.
- the guest operating system and/or application(s) do not know that they are running on a virtual machine as opposed to running on a “bare metal” host electronic device.
- the guest operating system and/or application(s), through para-virtualization, are aware of the presence of virtualization.
- one, some, or all of the applications are implemented as unikernel(s).
- a unikernel can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application.
- a unikernel can be implemented to run directly on hardware 840, directly on a hypervisor (e.g., running within a LibOS virtual machine), in a software container, and/or the like.
- various embodiments can be implemented with unikernels running directly on a hypervisor represented by virtualization layer 854, unikernels running within software containers represented by instances 862A-R, or as a combination of unikernels and the above-described techniques (e.g., unikernels and virtual machines both run directly on a hypervisor, unikernels and sets of applications that are run in different software containers).
- the instantiation of the one or more sets of one or more applications 864A-R, as well as virtualization if implemented, are collectively referred to as software instance(s) 852.
- the virtual network element(s) 860A-R perform similar functionality to the virtual network element(s) 830A-R, e.g., similar to the control communication and configuration module(s) 832A and forwarding table(s) 834A.
- This virtualization of the hardware 840 is sometimes referred to as network function virtualization (NFV).
- NFV can be used to consolidate many network equipment types onto industry standard high volume server hardware, physical switches, and physical storage, which could be located in data centers, NDs, customer premise equipment (CPE), and/or the like. While embodiments of the invention are illustrated with each instance 862A-R corresponding to one VNE 860A-R, other embodiments may implement this correspondence at a finer level of granularity (e.g., line card virtual machines virtualize line cards, control card virtual machines virtualize control cards, etc.). It should be understood that the techniques described herein with reference to a correspondence of instances 862A-R to VNEs also apply to embodiments where such a finer level of granularity and/or unikernels are used.
- the virtualization layer 854 includes a virtual switch that provides similar forwarding services as a physical Ethernet switch. Specifically, this virtual switch forwards traffic between instances 862A-R and the physical NI(s) 846, as well as optionally between the instances 862A-R. In addition, this virtual switch can enforce network isolation between the VNEs 860A-R that, by policy, are not permitted to communicate with each other (e.g., by honoring virtual local area networks (VLANs)).
- software 850 includes an observability component 853, which when executed by processor(s) 842, causes the general-purpose network device 804 to perform one or more of the operations described above (e.g., to collect observation data, receive collected observation data, track associations, determine whether to perform observability actions, determine whether to stop performing observability actions, and/or the like).
- Hybrid network device 806 includes a combination of special-purpose and general-purpose hardware and/or software.
- hybrid network device 806 could include custom ASICs/special-purpose OS and COTS processors/standard OS in a single ND or a single card within an ND.
- a platform VM (i.e., a VM that implements the functionality of the special-purpose network device 802) provides para-virtualization to the networking hardware present in the hybrid network device 806.
- each of the VNEs receives data on the physical NIs (e.g., 816, 846) and forwards that data out the appropriate ones of the physical NIs (e.g., 816, 846).
- a VNE implementing IP router functionality could forward IP packets on the basis of some of the IP header information in the IP packet.
- IP header information includes, for example, source IP address, destination IP address, source port, destination port (where “source port” and “destination port” refer herein to protocol ports, as opposed to physical ports of a ND), transport protocol (e.g., user datagram protocol (UDP), Transmission Control Protocol (TCP)), differentiated services code point (DSCP) values, and/or the like.
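To illustrate forwarding on the basis of IP header information, the sketch below looks up the outgoing NI from the destination IP address alone. Longest-prefix matching is a common routing technique used here as an assumption; the source does not state which lookup a VNE performs, and the `lookup` function and table layout are hypothetical.

```python
import ipaddress


def lookup(forwarding_table, destination):
    """Return the outgoing NI for `destination`, or None if no prefix matches.

    forwarding_table: prefix string (e.g. "10.0.0.0/8") -> outgoing NI name.
    """
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, out_ni in forwarding_table.items():
        net = ipaddress.ip_network(prefix)
        if addr.version == net.version and addr in net:
            # Prefer the most specific (longest) matching prefix.
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, out_ni)
    return best[1] if best else None
```

A real forwarding plane would typically hold this table in specialized structures or hardware rather than scanning a dictionary; the linear scan keeps the example short.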
- a network interface may be physical or virtual, depending on the given implementation.
- an interface address is an IP address assigned to a NI, be it a physical NI or virtual NI.
- a virtual NI may be associated with a physical NI, with another virtual interface, or stand on its own (e.g., a loopback interface, a point-to-point protocol interface).
- a NI (physical or virtual) may be numbered (a NI with an IP address) or unnumbered (a NI without an IP address).
- a loopback interface (and its loopback address) is a specific type of virtual NI (and IP address) of a NE/VNE (physical or virtual) often used for management purposes. Such an IP address is referred to as the nodal loopback address.
- the IP address(es) assigned to the NI(s) of a ND are referred to as IP addresses of that ND.
- At a more granular level, the IP address(es) assigned to NI(s) assigned to a NE/VNE implemented on a ND can be referred to as IP addresses of that NE/VNE.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Computer Networks & Wireless Communication (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Environmental & Geological Engineering (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
Abstract
Embodiments of the present application relate to techniques for managing observation data collected by an observability system. A method includes collecting a first set of observation data. The method further includes determining that a system event has started while collecting the first set of observation data. The method further includes determining that the system event belongs to an association that links the system event to a set of observability actions. The method also includes, in response to determining that the system event belongs to an association, causing the set of observability actions to be performed, wherein causing the set of observability actions to be performed includes collecting a second set of observation data related to the system event in accordance with a set of observation data collection rules.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/IB2023/060859 WO2025088365A1 (fr) | 2023-10-27 | 2023-10-27 | Association-based observability system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/IB2023/060859 WO2025088365A1 (fr) | 2023-10-27 | 2023-10-27 | Association-based observability system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025088365A1 (fr) | 2025-05-01 |
Family
ID=88697496
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IB2023/060859 (Pending), WO2025088365A1 (fr) | Association-based observability system | 2023-10-27 | 2023-10-27 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025088365A1 (fr) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20220083441A1 (en) * | 2019-05-13 | 2022-03-17 | Hewlett-Packard Development Company, L.P. | Data monitoring |
| US20230251902A1 (en) * | 2022-02-10 | 2023-08-10 | Cisco Technology, Inc. | Feedback-based tuning of telemetry collection parameters |
- 2023-10-27: WO application PCT/IB2023/060859 (patent WO2025088365A1, fr), status: active, Pending
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US12333347B2 (en) | Monitoring and policy control of distributed data and control planes for virtual nodes | |
| US10608914B2 (en) | Methods and devices for monitoring of network performance for container virtualization | |
| US11522780B1 (en) | Monitoring networks by detection of noisy agents | |
| EP3382543B1 (fr) | Micro-level monitoring, visibility and control of shared resources internal to a processor of a host machine for a virtual environment | |
| US10374900B2 (en) | Updating a virtual network topology based on monitored application data | |
| US11829797B1 (en) | Dynamic configuration of virtual machines | |
| US20190356971A1 (en) | Out-of-band platform tuning and configuration | |
| US9596141B2 (en) | Representing software defined networks using a programmable graph model | |
| US12007865B2 (en) | Machine learning for rule evaluation | |
| CN109074280B (zh) | Network function virtualization | |
| US20190363965A1 (en) | Monitoring connectivity and latency of a virtual network | |
| CN109075996B (zh) | Monitoring controller for monitoring network performance and method performed thereby | |
| US11144423B2 (en) | Dynamic management of monitoring tasks in a cloud environment | |
| JP2015503274A (ja) | System and method for mitigating congestion in a fat-tree topology using dynamic allocation of virtual lanes | |
| US10411742B2 (en) | Link aggregation configuration for a node in a software-defined network | |
| US11005968B2 (en) | Fabric support for quality of service | |
| EP2974433A2 (fr) | Identifying and modifying attributes by means of hardware introspection and reflection | |
| EP3837660A1 (fr) | Method and system for predicting violation of a smart contract using dynamic state space creation | |
| WO2019228220A1 (fr) | Network slice management method and device | |
| US10855546B2 (en) | Systems and methods for non-intrusive network performance monitoring | |
| US20220283823A1 (en) | Dynamic plugin management for system health | |
| WO2025088365A1 (fr) | Association-based observability system | |
| EP4544756A1 (fr) | Dynamic and adaptive observability system | |
| US20250379804A1 (en) | Dynamic and Adaptive Observability System | |
| WO2025003737A1 (fr) | Observability-based cloud service level agreement enforcement |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23801522; Country of ref document: EP; Kind code of ref document: A1 |