WO2025099763A1 - System and method for determining operational states of a plurality of network services - Google Patents
System and method for determining operational states of a plurality of network services
- Publication number
- WO2025099763A1 (international application PCT/IN2024/052203)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network
- erm
- network service
- timer
- services
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0817—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0681—Configuration of triggering conditions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/50—Network service management, e.g. ensuring proper service fulfilment according to agreements
- H04L41/5003—Managing SLA; Interaction between SLA and QoS
- H04L41/5009—Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
- H04L41/5012—Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF] determining service availability, e.g. which services are available at a certain point in time
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0823—Errors, e.g. transmission errors
- H04L43/0829—Packet loss
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0852—Delays
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0876—Network utilisation, e.g. volume of load or congestion level
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0876—Network utilisation, e.g. volume of load or congestion level
- H04L43/0888—Throughput
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/16—Threshold monitoring
Definitions
- a portion of the disclosure of this patent document contains material which is subject to intellectual property rights such as, but not limited to, copyright, design, trademark, Integrated Circuit (IC) layout design, and/or trade dress protection, belonging to Jio Platforms Limited (JPL) or its affiliates (hereinafter referred to as the owner).
- the owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights whatsoever. All rights to such intellectual property are fully reserved by the owner.
- the present disclosure relates generally to the field of communication systems. More particularly, the present disclosure relates to systems and methods for determining operational conditions of network services for health assessment.
- Event refers to an operation, an occurrence or a change in state within a network or its components that can trigger responses, alerts, or actions.
- Event Routing Manager refers to an element in an event-driven system where events need to be delivered from event sources to the appropriate event consumers or downstream services.
- Network services refers to a collection of independently deployable services of an application that communicate over the network.
- Health monitoring of network services refers to a process of tracking and assessment of the operational status, performance, and availability of network services.
- CPU usage refers to a percentage, indicating how much of the CPU's total capacity is being utilized at a given time.
- Memory utilization refers to the amount of memory being used by a system, an application, or a process compared to the total available memory.
- Network latency refers to a delay between a request for data and the receipt of that data.
- Error rate refers to a frequency of errors that occur in a system, typically expressed as a percentage or ratio of the total number of operations, transmissions, or requests.
- Proactive troubleshooting refers to a preventive approach to identifying and resolving potential issues in the system or the network before they escalate into significant problems.
- Fault isolation refers to a process of identifying and isolating the source of a fault or failure within the system, the network, or the application.
- Fault diagnostics refers to a process of identifying, analyzing, and understanding faults or failures in a system, application, or network. It involves a systematic approach to detect issues, determine their root causes, and assess their impact on the overall performance and functionality of the system.
- Troubleshooting refers to a process used to identify, diagnose, and resolve problems or issues in the system, the application, or the network.
- Monitoring rule refers to a set of predefined conditions and thresholds that help in tracking the performance, availability, and health of the network services.
- Monitoring tasks refers to a variety of activities aimed at ensuring the health, performance, and reliability of the services.
- Defined duration/interval refers to a specific, predetermined length of time during/after which an activity, event, or condition is expected to occur.
- Predefined threshold values refers to specific, established limits or criteria set in advance to monitor the performance, health, and behavior of the network services.
- Defined range refers to a specific interval or set of limits within which certain values or measurements are expected to fall.
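The definitions of monitoring rule, defined interval, predefined threshold value, and defined range can be pictured as one small record per monitored parameter. The following is a minimal sketch only: the class name, field names, and the convention that a breach means exceeding an upper threshold are assumptions made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringRule:
    """One rule for one parameter of one network service (hypothetical layout)."""
    service: str      # network service the rule applies to
    parameter: str    # e.g. "cpu_usage" or "error_rate"
    threshold: float  # predefined threshold value
    low: float        # lower bound of the defined range
    high: float       # upper bound of the defined range
    interval_s: int   # defined interval between monitoring tasks, in seconds

    def breached(self, current: float) -> bool:
        """Assumption: a breach means the current value exceeds the threshold."""
        return current > self.threshold

    def in_range(self, current: float) -> bool:
        """True when the current value lies inside the defined range."""
        return self.low <= current <= self.high


# Illustrative values only.
rule = MonitoringRule("session-management", "cpu_usage",
                      threshold=80.0, low=0.0, high=70.0, interval_s=30)
print(rule.breached(85.0), rule.in_range(65.0))
```

Keeping the threshold and the defined range as separate fields reflects that the text uses the threshold for detection and the range as the target for corrective actions.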
- the central system (for example, a poll-based system) monitors parameters by fetching the data from the network services.
- the parameters of the network services are monitored to track the performance, availability, and overall health of the network services to ensure that the network services function correctly. Further, because of some critical issues, a network service can go down between two poll durations. In such cases, it is difficult to troubleshoot the root cause of the problem.
- a method for determining an operational condition of a plurality of network services comprises initializing, by an event routing manager (ERM), a timer associated with at least one network service of the plurality of network services and determining, by the ERM, whether the initialized timer is equal to a configured timer corresponding to the at least one network service.
- the method further comprises if the initialized timer is equal to the configured timer, establishing, by the ERM, at least one connection with at least one source to fetch information corresponding to the at least one network service and executing, by the ERM, a process.
- the process includes a step of extracting at least one current value corresponding to each of a plurality of parameters associated with the at least one network service from the fetched information.
- the process includes comparing, by the ERM, the extracted at least one current value with a predefined threshold value corresponding to each of the plurality of parameters associated with the at least one network service.
- the process further includes determining, by the ERM, the operational condition of the at least one network service based on the comparison and triggering, by the ERM, an alarm to a monitoring system and logging an event based on the determined operational condition of the at least one network service.
- the method further comprises resetting, by the ERM, the initialized timer after the process execution.
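The steps recited so far (timer initialization, timer comparison, fetch, extraction, threshold comparison, condition determination, alarm and event logging, timer reset) can be sketched as a single monitoring cycle. This is an illustrative sketch under stated assumptions, not the claimed implementation: `ServiceMonitor`, the tick-based timer, the `fetch` callable standing in for the connection to a source, and the two condition labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceMonitor:
    service: str
    configured_timer: int                     # ticks between two checks
    thresholds: Dict[str, float]              # predefined threshold per parameter
    fetch: Callable[[str], Dict[str, float]]  # stand-in for the source connection
    timer: int = 0                            # the initialized timer
    alarms: List[str] = field(default_factory=list)
    events: List[str] = field(default_factory=list)

    def tick(self) -> None:
        self.timer += 1
        if self.timer != self.configured_timer:
            return  # configured timer not reached yet
        info = self.fetch(self.service)                   # fetch information
        current = {p: info[p] for p in self.thresholds}   # extract current values
        breached = sorted(p for p, v in current.items()
                          if v > self.thresholds[p])      # compare with thresholds
        condition = "fault" if breached else "healthy"    # determine condition
        self.events.append(f"{self.service}: {condition}")  # log an event
        if breached:                                      # trigger an alarm
            self.alarms.append(
                f"{self.service}: {', '.join(breached)} above threshold")
        self.timer = 0                                    # reset after the process


# Illustrative run with a fixed, made-up reading.
monitor = ServiceMonitor(
    "upf", configured_timer=3,
    thresholds={"cpu_usage": 80.0, "error_rate": 5.0},
    fetch=lambda service: {"cpu_usage": 91.0, "error_rate": 1.2, "memory": 40.0},
)
for _ in range(3):
    monitor.tick()
print(monitor.events, monitor.alarms)
```

In this sketch the alarm and event lists stand in for the monitoring system and the event log; a real deployment would send them over whatever interfaces the ERM exposes.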
- the at least one source is a network management system (NMS) or an operations, administration, and management (OAM).
- the ERM is connected to the NMS with a first interface, and the ERM is connected to the OAM with a second interface.
- the method further comprises maintaining, by the ERM, a list of the plurality of parameters for each of the plurality of network services in a database.
- the plurality of parameters includes central processing unit (CPU) usage, memory utilization, network latency, and error rates.
- the operational state of the at least one network service is one of a fault state, an available state, and a healthy state.
- the method further comprises employing, by the ERM, one or more corrective actions to bring the at least one current value corresponding to each of the plurality of parameters within a defined range.
- the one or more corrective actions include scaling resources, reallocating workloads, or triggering failover mechanisms.
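One way to picture the selection of corrective actions is a lookup from each out-of-range parameter to an action. The sketch below is hypothetical: the disclosure names the three actions but does not say which parameter triggers which action, so the `ACTION_FOR` mapping and the function name are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping; the pairing of parameter to action is an assumption.
ACTION_FOR: Dict[str, str] = {
    "cpu_usage": "scale_resources",
    "memory_utilization": "scale_resources",
    "network_latency": "reallocate_workloads",
    "error_rate": "trigger_failover",
}


def plan_corrective_actions(current: Dict[str, float],
                            defined_range: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return one action per out-of-range parameter, without repeats."""
    actions: List[str] = []
    for parameter, value in current.items():
        low, high = defined_range[parameter]
        if not (low <= value <= high):
            action = ACTION_FOR[parameter]
            if action not in actions:
                actions.append(action)
    return actions


# Illustrative ranges and readings only.
ranges = {"cpu_usage": (0, 80), "memory_utilization": (0, 85),
          "network_latency": (0, 50), "error_rate": (0, 5)}
readings = {"cpu_usage": 95, "memory_utilization": 92,
            "network_latency": 20, "error_rate": 9}
print(plan_corrective_actions(readings, ranges))
```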
- the method further comprises identifying, by the ERM, a plurality of trends and patterns corresponding to each of the plurality of network services by analyzing the determined operational condition of each network service.
- the method further comprises planning, by the ERM, at least one of a plurality of proactive maintenance activities based on the identified plurality of trends and patterns.
- a system for determining an operational condition of a plurality of network services includes an event routing manager (ERM).
- the ERM comprises a processing unit configured to initialize a timer associated with at least one network service of the plurality of network services, and a determining unit configured to determine whether the initialized timer is equal to a configured timer corresponding to the at least one network service. If the initialized timer is equal to the configured timer, the processing unit is configured to establish at least one connection with at least one source to fetch information corresponding to the at least one network service. The processing unit is configured to execute a process to extract at least one current value corresponding to each of a plurality of parameters associated with the at least one network service from the fetched information. The processing unit is configured to compare the extracted at least one current value with a predefined threshold value corresponding to each of the plurality of parameters associated with the at least one network service. The processing unit is configured to determine the operational condition of the at least one network service based on the comparison. The processing unit is configured to trigger an alarm to a monitoring system and log an event based on the determined operational condition of the at least one network service.
- the processing unit is configured to employ one or more corrective actions to bring the at least one current value corresponding to each of the plurality of parameters within a defined range.
- the one or more corrective actions include scaling resources, reallocating workloads, or triggering failover mechanisms.
- the processing unit is configured to identify a plurality of trends and patterns corresponding to each of the plurality of network services and plan at least one of a plurality of proactive maintenance activities based on the identified plurality of trends and patterns.
- the present disclosure discloses a computer program product comprising a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform a method for determining an operational condition of a plurality of network services.
- the method comprises initializing, by an event routing manager (ERM), a timer associated with at least one network service of the plurality of network services and determining, by the ERM, whether the initialized timer is equal to a configured timer corresponding to the at least one network service.
- the method further comprises if the initialized timer is equal to the configured timer, establishing, by the ERM, at least one connection with at least one source to fetch information corresponding to the at least one network service and executing, by the ERM, a process.
- the process includes a step of extracting at least one current value corresponding to each of a plurality of parameters associated with the at least one network service from the fetched information.
- the process includes comparing, by the ERM, the extracted at least one current value with a predefined threshold value corresponding to each of the plurality of parameters associated with the at least one network service.
- the process further includes determining, by the ERM, the operational condition of the at least one network service based on the comparison and triggering, by the ERM, an alarm to a monitoring system and logging an event based on the determined operational condition of the at least one network service.
- An objective of the present disclosure is to provide a system and a method for continuously monitoring an operational state of a network service to detect anomalies and potential issues early.
- Another objective of the present disclosure is to regularly determine the health and availability of each network service to identify bottlenecks, performance degradation, or other issues that impact the overall system.
- the reliability of network services is enhanced by health monitoring.
- Yet another objective of the present disclosure is to analyze metrics and logs to identify performance bottlenecks, optimize resource utilization, and improve response times.
- Another objective of the present disclosure is to find the root cause of failures or abnormal behavior by analyzing the health and performance metrics of individual network services.
- Another objective of the present disclosure is to plan proactive maintenance activities and prioritize upgrades or patches by identifying trends and patterns in the network services. This helps to avoid service disruptions, security vulnerabilities, or performance degradation due to outdated or unsupported components.
- FIG. 1A illustrates an exemplary network architecture for implementing a system for determining operational conditions of a plurality of network services, in accordance with an embodiment of the present disclosure.
- FIG. 1B illustrates an exemplary system architecture for facilitating autonomous health monitoring of the plurality of network services, in accordance with an embodiment of the present disclosure.
- FIG. 1C illustrates an exemplary block diagram of the system for determining the operational conditions of the plurality of network services, in accordance with an embodiment of the present disclosure.
- FIG. 1D illustrates an exemplary connection architecture for facilitating communication between an event routing manager (ERM), an operations, administration, and management (OAM), and a network management system (NMS), in accordance with an embodiment of the present disclosure.
- FIG. 2 illustrates an exemplary flow diagram for configuring monitoring rules associated with a network service, in accordance with an embodiment of the present disclosure.
- FIG. 3 illustrates an exemplary flow diagram for performing health monitoring of the network service, in accordance with an embodiment of the present disclosure.
- FIG. 4 illustrates an exemplary flow diagram of a method for determining the operational conditions of the plurality of network services by the ERM, in accordance with an embodiment of the present disclosure.
- FIG. 5 illustrates an exemplary computer system in which or with which embodiments of the present disclosure may be implemented.
- the word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration.
- the subject matter disclosed herein is not limited by such examples.
- any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art.
- to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive, like the term “comprising” as an open transition word, without precluding any additional or other elements.
- the terms “mobile device”, “user equipment”, “user device”, “communication device”, “device” and similar terms are used interchangeably for the purpose of describing the invention. These terms are not intended to limit the scope of the invention or imply any specific functionality or limitations on the described embodiments. The use of these terms is solely for convenience and clarity of description. The invention is not limited to any particular type of device or equipment, and it should be understood that other equivalent terms or variations thereof may be used interchangeably without departing from the scope of the invention as defined herein.
- the central system monitors parameters by fetching the data from the service.
- the parameters of the network services are monitored to track the performance, availability, and overall health of the network services to ensure that the network services function correctly. Further, because of some critical issues, the network service can go down between two poll durations. In such cases, it is difficult to troubleshoot the root cause of the problem.
- the present disclosure aims to overcome the above-mentioned and other existing problems by providing a system and a method for continuously monitoring the state of network services.
- a number of predefined parameters associated with the network service are monitored over a configurable time.
- the predefined parameters may include, but are not limited to, latency, throughput, error rates, and resource utilization metrics, all of which are monitored over the configurable time interval defined by the system administrator.
- a critical notification/alarm is generated and dispatched to a centralized monitoring/management system.
- This alarm mechanism is a vital component of the system, enabling prompt identification and response to any anomalies or performance degradations within the network services architecture.
- the continuous surveillance of the health of the network service is crucial for ensuring the overall stability, availability, and efficiency of the network service architecture.
- the disclosed system allows for immediate detection of potential issues and enables the execution of automated responses and recovery protocols in reaction to critical conditions.
- the plurality of network services may include User Plane Function (UPF) network service, session management network service, network slice management network service, radio access network (RAN) optimization network service, edge computing network service, device management network service, analytics network service, security network service, billing and charging network service, or Application programming interface (API) gateway network service.
- the UPF network service efficiently routes data packets based on defined Quality of Service (QoS) parameters, ensuring that applications requiring low latency, such as gaming or video streaming, receive priority. By dynamically adjusting data paths, the UPF enhances user experience and optimizes network resource utilization.
- the session management network service establishes, modifies, and terminates user sessions in a 5G environment.
- the session management network service manages session lifecycles and maintains session states, contributing to improved connectivity and service reliability.
- the network slice management network service enables the creation and management of virtual network slices, which are tailored segments of the network optimized for specific applications or user groups. This capability allows operators to allocate resources dynamically based on the unique requirements of various services, such as enhanced mobile broadband or IoT connectivity. By facilitating network slicing, the network slice management enhances flexibility and supports diverse use cases in a 5G ecosystem.
- the RAN Optimization network service is focused on monitoring and improving the performance of the Radio Access Network. It continuously analyzes network conditions and adjusts resource allocations to minimize interference and maximize coverage.
- the edge computing network service processes data closer to the end-user, significantly reducing latency for real-time applications. By bringing computation and storage capabilities to the network edge, the edge computing network service enables use cases such as augmented reality, autonomous vehicles, and smart city applications requiring instant data processing.
- the device management network service oversees the lifecycle of connected devices within the 5G network.
- the device management network service handles tasks such as device onboarding, configuration management, and software updates.
- the analytics network service collects and processes data from various components of the 5G network, providing valuable insights into performance metrics and user behavior.
- the security network service implements robust security measures to protect user data and maintain the integrity of the 5G network.
- the security network service encompasses functionalities such as authentication, encryption, and real-time threat detection.
- the billing and charging network service manages the financial aspects of service consumption in the 5G network.
- the billing and charging network service tracks user activity in real time, enabling flexible charging models based on usage patterns.
- the API gateway network service serves as a central point of communication between various network services and external applications in the 5G architecture.
- the API gateway network service manages incoming requests, handles authentication, and routes traffic to the appropriate network service.
- the API gateway network service simplifies integration, enhances scalability, and improves overall system performance by providing a unified interface and ensuring secure data exchange.
- exemplary embodiments of the present disclosure will be described with reference to the accompanying drawings, FIGs. 1A-5.
- the network architecture (100A) may include one or more user equipments (UEs) (104-1, 104-2... 104-N) associated with one or more users (102-1, 102-2... 102-N) in an environment.
- a person of ordinary skill in the art will understand that one or more users (102-1, 102-2... 102-N) may be collectively referred to as the users (102).
- the network (106) may also include, by way of example but not limitation, one or more of a radio access network (RAN), a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof.
- although FIG. 1A shows exemplary components of the network architecture (100A), in other embodiments, the network architecture (100A) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 1A. Additionally, or alternatively, one or more components of the network architecture (100A) may perform functions described as being performed by one or more other components of the network architecture (100A).
- FIG. 1B illustrates an exemplary system architecture (100B) of the system (108) for facilitating autonomous health monitoring of the plurality of network services, in accordance with an embodiment of the present disclosure.
- the system architecture (100B) includes the ERM (130), a database (122), a monitoring system (e.g., a central monitoring/management system) (110), and other services (124).
- the ERM (130) may communicate with the database (122).
- the ERM (130) may be in communication with the monitoring system (110) and other services (124).
- the other services (124) include, but are not limited to, network traffic, quality management, security management, configuration management, usage management, billing management, etc.
- the plurality of network services includes, but is not limited to, authorization network services, authentication network services, session management network services, user registration network services, password management network services, access control network services, configuration, billing management, quality management, location management, etc.
- the plurality of network services includes, but is not limited to, network services associated with a management and orchestration (MANO) framework.
- the network services associated with the MANO framework include network services associated with components of the MANO such as a virtualized infrastructure manager (VIM), a containerized network function (CNF) manager (CNFM), and a network functions virtualization orchestrator (NFVO).
- the plurality of network services includes network services associated with 5G core service.
- the 5G core services include, but are not limited to, services associated with network slicing, session management, quality management, policy control, network health monitoring, etc.
- the network service is implemented as a microservice.
- the microservice refers to a software architecture where an application is composed of small, independently deployable services, each responsible for a specific business function.
- the microservices communicate over well-defined application programming interfaces (APIs) and can be developed, deployed, and scaled independently.
- the ERM (130) may perform monitoring of rules, timers, and parameters associated with each network service with a corresponding threshold value stored in memory.
- a list of parameters corresponding to each of the plurality of network services with threshold values is maintained to monitor each network service's operational state over time.
- the parameters corresponding to the plurality of network services include, but are not limited to, latency, throughput, error rate, central processing unit (CPU) usage, memory usage, network traffic, response time, availability/uptime, and user sessions.
- the latency refers to the time taken for requests to be processed.
- the throughput refers to the number of requests processed per second.
- the error rate refers to the percentage of failed requests or errors encountered.
- the CPU usage refers to the percentage of CPU resources being used by the network service.
- the memory usage refers to the amount of memory consumed by the network service.
- the network traffic refers to the volume of data sent and received over the network.
- the response time refers to the time taken to respond to the requests.
- the availability/uptime refers to the percentage of time the network service is operational and accessible.
- the user sessions refer to the number of active user sessions, which provides insight into the service load and usage trends.
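Several of the parameters defined above are simple ratios over a window of observed requests. The sketch below shows one possible computation of latency, throughput, and error rate; the `Request` record, the use of mean latency, and the window length are assumptions for illustration, since the disclosure does not specify how the values are derived.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    duration_ms: float  # time taken to process the request
    ok: bool            # False when the request failed


def summarize(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Compute mean latency, requests per second, and percentage of failures."""
    total = len(requests)
    if total == 0:
        # No traffic in the window: report zeros rather than dividing by zero.
        return {"latency_ms": 0.0, "throughput_rps": 0.0, "error_rate_pct": 0.0}
    failed = sum(1 for r in requests if not r.ok)
    return {
        "latency_ms": sum(r.duration_ms for r in requests) / total,
        "throughput_rps": total / window_s,
        "error_rate_pct": 100.0 * failed / total,
    }


# Made-up sample: four requests observed over a two-second window.
sample = [Request(10.0, True), Request(30.0, True),
          Request(20.0, False), Request(20.0, True)]
metrics = summarize(sample, window_s=2.0)
print(metrics)
```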
- a configurable/predefined threshold value is set for each parameter of the list to detect the threshold break/breach condition.
- the monitoring rules are configured to be executed for a specific duration or interval to determine the parameter values.
- the monitoring task is executed after the configured time has elapsed. Once the monitoring task is executed, the current value of each defined parameter is checked.
- various service parameters are defined for each of the network services.
- the ERM (130) monitors the predefined service parameters over the configurable time. On determining that the monitoring parameters cross the threshold value, the ERM (130) may send an alarm to the monitoring system (110). The ERM (130) may send critical notifications to any other system for corrective action.
- the ERM (130) performs health monitoring to monitor the state of network services continuously.
- the state of network services includes, but is not limited to, an active state, an idle state, a failed state, an unresponsive state, and an under-maintenance state.
- the active state indicates that the network service is active and performing designated tasks.
- the idle state indicates that the network service is running but not handling any requests or performing significant operations.
- the failed state indicates that the network service has encountered an error or issue that has caused the network service to stop functioning properly.
- the unresponsive state indicates that the network service is not responding to requests, and health checks are failing.
- the under-maintenance state indicates that the network service has been intentionally taken out of operation for maintenance and is temporarily not handling requests.
- various metrics are collected, for example, CPU usage, memory utilization, network latency, error rates, etc.
- the metrics are monitored to detect high CPU usage, high memory utilization, high network latency, and high error rates.
- high CPU usage indicates that the network service is under heavy load or is processing requests inefficiently.
- the anomalies in the CPU usage include sudden spikes or sustained high usage.
- high memory utilization causes performance issues, service crashes, an increase in collection time, etc.
- high network latency may be caused by network issues, overloaded services, or inefficient communication protocols.
- high network latency causes network bottlenecks or issues in inter-network service communication.
- a high error rate may be caused by misconfigurations or external dependency failures.
- the high error rates cause failures, delays, service unavailability, etc.
- the early detection of anomalies and potential issues involves proactive monitoring, analyzing metrics and logs, and ensuring that network services operate efficiently and reliably. This enables proactive troubleshooting and helps prevent service degradation or failures.
- the ERM (130) regularly checks the health and availability of each network service to identify any bottlenecks, performance degradation, or other issues that might impact the overall system. This enhances the reliability of the network services. Appropriate actions (for example, scaling resources, reallocating workloads, or triggering failover mechanisms, etc.) are taken to ensure continuous service availability.
- scaling resources for the network services involves adjusting the amount of computational power, memory, and other resources allocated to each network service to handle varying loads and ensure optimal performance.
- reallocating workloads of the network services involves adjusting how and where incoming requests and data are distributed across the network services to optimize performance, improve efficiency, and ensure balanced resource usage.
- triggering failover mechanisms involves detecting failures and automatically redirecting traffic or requests to alternative network services to maintain service availability.
- the predefined parameters of the network service are monitored over the configurable time.
- the ERM (130) may log an error. Further, when the defined threshold value is not reached, the ERM (130) may log the event.
- the ERM (130) may identify performance bottlenecks. Further, analyzing metrics and error logs yields valuable insights into the performance characteristics of the network services.
- MTTR mean time to repair
- the MTTR represents the average time taken to restore the network service to normal operation after a failure has been detected.
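The MTTR definition above can be illustrated with a short worked example. This is a minimal sketch, assuming each incident is recorded as a pair of timestamps (failure detected, service restored); the function name `mean_time_to_repair` and the incident data are hypothetical and do not come from the disclosure.

```python
from datetime import datetime, timedelta

def mean_time_to_repair(incidents):
    """Average time between failure detection and service restoration."""
    if not incidents:
        raise ValueError("at least one incident is required")
    # Sum the repair durations, starting from a zero timedelta.
    total = sum((restored - detected for detected, restored in incidents), timedelta())
    return total / len(incidents)

# Hypothetical incident log: (failure detected, service restored).
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),  # 30 minutes
    (datetime(2024, 1, 5, 14, 0), datetime(2024, 1, 5, 15, 30)),  # 90 minutes
]
print(mean_time_to_repair(incidents))  # 1:00:00
```

Here two repairs of 30 and 90 minutes average to a one-hour MTTR.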
- the ERM (130) may track the overall health of the network services over time. By identifying trends and patterns (e.g., sudden spikes in resource usage or memory leaks, high request rates or sudden traffic spikes, incorrect resource limits), the ERM (130) may plan proactive maintenance activities and prioritize upgrades or patches.
- the proactive maintenance activities include, but are not limited to, routine updates, scaling, configuration management, implementing security measures, and adjusting resource allocation.
- the prioritization of upgrades or patches helps in addressing bugs, security vulnerabilities, and performance improvements. This helps in avoiding service disruptions, security vulnerabilities, or performance degradation due to outdated or unsupported components.
- the ERM (130) performs autonomous health monitoring of the plurality of network services by determining the operational states of the plurality of network services.
- the ERM (130) initializes a timer associated with at least one network service of the plurality of network services.
- the ERM (130) determines whether the initialized timer equals a configured timer corresponding to the at least one network service.
- the configured timer is configured for a defined duration. If the initialized timer is equal to the configured timer, the ERM (130) establishes a connection with at least one data source to fetch information corresponding to the at least one network service.
- the data source includes the NMS and the OAM.
- information corresponding to the at least one network service is received from the OAM and the NMS over the EM_OA interface and the EM_NMS interface, respectively.
- the information includes current values of a plurality of parameters corresponding to the at least one network service.
- the plurality of parameters includes central processing unit (CPU) usage, memory utilization, network latency, and error rates.
- the ERM (130) extracts the current values of each of the plurality of parameters associated with the at least one network service.
- the extracted current values of each of the plurality of parameters are compared with the predefined threshold values of each of the plurality of parameters. Based on the comparison, the ERM (130) determines the operational condition/state of the at least one network service.
- the ERM (130) triggers the alarm to the monitoring system based on the determined operational condition/state of the at least one network service.
- the operational condition/state of the at least one network service is one of a fault state, an available state, and a healthy state.
- the fault state refers to a condition in which a network service is not functioning correctly, resulting in degradation or failure of intended operation.
- an authentication network service experiences a sudden spike in traffic, leading to resource exhaustion (i.e., high CPU usage and high memory usage).
- the available state refers to a condition where the network service is fully operational, can respond to requests, and performs the intended functions without errors.
- a user location service network service is in an available state, functioning optimally within the network, meeting user demands, and maintaining system integrity, with a response time of 150 milliseconds and an error rate of 0%.
- the healthy state of the network service refers to a condition where the network service is not only operational and available but also performing well in terms of resource usage and interactions.
- SMS short messaging service
- the health issues of the network service are resolved by taking corrective actions.
- the corrective actions include scaling resources, reallocating workloads, or triggering failover mechanisms.
- scaling resources refers to adjusting the amount of CPU, memory, or other resources allocated to a network service to meet demand and maintain performance.
- a video streaming network service is experiencing an increase in user traffic during a live event.
- the corrective action such as scaling resources is applied.
- Vertical scaling includes increasing resources (e.g., CPU cores and memory).
- triggering of failover mechanism involves automatically redirecting traffic (i.e., incoming requests) to a backup network service or instance when a primary network service becomes unavailable.
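The failover behavior described above can be sketched as a simple routing decision. This is an illustrative sketch only: the `Instance` type, the `route_request` function, and the instance names are hypothetical, and availability is assumed to have been determined already by the health check.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Instance:
    name: str
    available: bool  # result of the most recent health check (assumed)

def route_request(primary: Instance, backups: Sequence[Instance]) -> Optional[Instance]:
    """Return the instance that should receive traffic, or None if all are down."""
    if primary.available:
        return primary
    # Primary is unavailable: redirect to the first available backup.
    for backup in backups:
        if backup.available:
            return backup
    return None

primary = Instance("auth-primary", available=False)
backups = [Instance("auth-backup-1", available=False), Instance("auth-backup-2", available=True)]
target = route_request(primary, backups)
print(target.name if target else "no instance available")  # auth-backup-2
```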
- FIG. 1C illustrates an exemplary block diagram (110C) of the system (108) for determining the operational conditions of the plurality of network services, in accordance with an embodiment of the present disclosure.
- the one or more processor(s) (111) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that process data based on operational instructions.
- the one or more processor(s) (111) may be configured to fetch and execute computer-readable instructions stored in the memory (112) of the system (108).
- the memory (112) may be configured to store one or more computer-readable instructions or routines in a non-transitory computer readable storage medium, which may be fetched and executed to create or share data packets over a network service.
- the memory (112) may include any non-transitory storage device including, for example, volatile memory such as random-access memory (RAM), or non-volatile memory such as erasable programmable read only memory (EPROM), flash memory, and the like.
- the interface(s) (114) may include a variety of interfaces, for example, interfaces for data input and output (I/O) devices, storage devices, and the like.
- the interface(s) (114) may facilitate communication through the system (108).
- the interface(s) (114) may also provide a communication pathway for one or more components of the system (108). Examples of such components include, but are not limited to, the database (122).
- the ERM (130) includes a determining unit (116) and a processing unit (120).
- the processing unit (120) is configured to initialize the timer associated with at least one network service of the plurality of network services.
- the timer is decided based on need (e.g., periodic tasks, measuring execution time, triggering of events).
- the timer triggers an event to determine the operational condition of the at least one network service.
- the timer is configured for the network service based on configurations (e.g., start the timer on receiving a request, at a defined time, or at a defined interval). Based on the configuration, the timer associated with the at least one network service is initialized (e.g., in an operative aspect, after the defined interval, such as every 5 minutes).
- the determining unit (116) is configured to determine whether the initialized timer is equal to a configured timer corresponding to the at least one network service.
- the initialized timer is configured to reset after the process (having a number of predefined steps) execution.
- the initialized timer may represent an interval between the monitoring tasks. For example, after executing the monitoring task, the determining unit (116) is configured to wait until the initialized timer reaches the configured timer. If the initialized timer is equal to the configured timer, then the determining unit (116) executes the monitoring task again.
- the configured timer is configured for a defined duration. In an example, the defined duration for the configured timer is 2 seconds.
- the determining unit (116) determines whether the initialized timer is equal to the configured timer (i.e., whether the initialized timer is equal to 2 seconds).
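The timer check described above can be sketched with a deterministic, tick-based counter. This is a minimal sketch under stated assumptions: the `MonitoringTimer` class and its method names are hypothetical, the timer advances in whole-second ticks rather than wall-clock time, and `>=` is used instead of strict equality so that a tick larger than one second cannot skip the trigger.

```python
class MonitoringTimer:
    """Counts elapsed seconds and reports when the configured duration is reached."""

    def __init__(self, configured_seconds: int):
        self.configured_seconds = configured_seconds  # the "configured timer"
        self.elapsed_seconds = 0                      # the "initialized timer"

    def tick(self, seconds: int = 1) -> bool:
        """Advance the timer; return True once the configured duration is reached."""
        self.elapsed_seconds += seconds
        return self.elapsed_seconds >= self.configured_seconds

    def reset(self) -> None:
        """Reset after the monitoring process has executed."""
        self.elapsed_seconds = 0

timer = MonitoringTimer(configured_seconds=2)  # 2 seconds, as in the example above
runs = 0
for _ in range(6):        # simulate six one-second ticks
    if timer.tick():
        runs += 1         # the monitoring task would execute here
        timer.reset()
print(runs)  # 3
```

With a 2-second configured timer, six simulated seconds produce three executions of the monitoring task.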
- the processing unit (120) is configured to establish at least one connection with the at least one source to fetch information corresponding to the at least one network service.
- the processing unit (120) is configured to extract the plurality of parameters associated with the at least one network service.
- the plurality of parameters includes, but is not limited to, CPU usage, memory utilization, network latency, and error rates.
- the connection is established between the ERM (130) and one data source (e.g., the OAM or the NMS) to fetch the information corresponding to the at least one network service.
- the information corresponding to the at least one network service includes current values of the plurality of parameters associated with the at least one network service.
- the processing unit (120) is configured to execute the process.
- the process includes extracting at least one current value corresponding to each of the plurality of parameters associated with the at least one network service from the fetched information.
- the initialized timer is configured to reset after the process execution.
- the processing unit (120) may include a comparing unit that is configured to compare the extracted at least one current value with a predefined threshold value corresponding to each of the plurality of parameters associated with the at least one network service.
- the extracted at least one current value is compared with the predefined threshold value corresponding to each of the plurality of parameters associated with the at least one network service.
- the determining unit (116) is configured to determine the operational condition of the at least one network service based on the comparison.
- the operational condition of the at least one network service includes, but is not limited to, the fault state, the available state, and the healthy state. Based on the comparison between the current values and the threshold values of each of the plurality of parameters, the operational condition of the at least one network service is determined. In an aspect, when the current values of at least one network service are within the threshold values, the operational condition of the at least one network service is considered to be in the healthy state. When some of the current values of the at least one network service are approaching the threshold values, the operational condition of the at least one network service is considered to be in the fault state (i.e., warning state).
- the operational condition of the at least one network service is considered to be in the fault state (i.e., critical state).
- the network service is considered to be in the available state.
- the available state of the network service includes, but is not limited to, processing requests within acceptable response times and error rates, having an adequate amount of resources, not facing any critical issues, etc.
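The threshold comparison described above can be sketched as a small classification function. This is an illustrative sketch, not the disclosed implementation: the function name `classify`, the parameter names, the threshold numbers, and the `warn_ratio` of 0.9 (used to decide when a value is "approaching" its threshold) are all assumptions, and every parameter is assumed to be one where higher values are worse.

```python
from typing import Dict

def classify(current: Dict[str, float], thresholds: Dict[str, float],
             warn_ratio: float = 0.9) -> str:
    """Map current parameter values to an operational condition."""
    # Any value above its threshold is treated as a critical fault.
    if any(current[p] > limit for p, limit in thresholds.items()):
        return "fault (critical)"
    # Any value within 10% of its threshold is treated as a warning (assumed margin).
    if any(current[p] >= warn_ratio * limit for p, limit in thresholds.items()):
        return "fault (warning)"
    return "healthy"

# Hypothetical thresholds for one network service.
thresholds = {"cpu_usage": 80.0, "memory_utilization": 75.0,
              "network_latency_ms": 100.0, "error_rate": 1.0}

print(classify({"cpu_usage": 40.0, "memory_utilization": 50.0,
                "network_latency_ms": 30.0, "error_rate": 0.1}, thresholds))  # healthy
print(classify({"cpu_usage": 75.0, "memory_utilization": 50.0,
                "network_latency_ms": 30.0, "error_rate": 0.1}, thresholds))  # fault (warning)
print(classify({"cpu_usage": 95.0, "memory_utilization": 50.0,
                "network_latency_ms": 30.0, "error_rate": 0.1}, thresholds))  # fault (critical)
```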
- the processing unit (120) is configured to trigger an alarm to the monitoring system (110) and log the event based on the determined operational condition of the at least one network service.
- triggering of the alarm to the monitoring system (110) includes sending a notification to the monitoring system (110) and sending automated responses (e.g., scaling resources, restarting of the network service, etc.).
- the processing unit (120) triggers the alarm to the monitoring system (110) when the CPU usage exceeds the predefined threshold value. This allows for early detection of anomalies and potential issues through parameter monitoring, enabling proactive troubleshooting and helping to prevent degradation or failures of the network services.
- the processing unit (120) is configured to resolve health issues of the at least one network service whose current value of the at least one parameter has crossed the predefined threshold value by taking corrective action to bring the values to a defined range.
- the issues are analyzed to determine the root cause.
- the root cause of the network service issues refers to the factors that trigger a failure or malfunction in one or more network services.
- the factors include, but are not limited to, service dependencies (i.e., issues in one network service affecting other network service), configuration errors (e.g., misconfigurations), infrastructure problems (e.g., server problems, resource constraints, etc.), data issues (e.g., problems with data integrity, database performance), deployment errors.
- data corresponding to the issues are collected.
- the collected data is analyzed to determine the issues (e.g., patterns or spikes in error messages, stack traces, unusual log entries, etc.).
- the parameters i.e., performance metrics such as response time, error rate, throughput, etc.
- anomalies e.g., bottlenecks, reasons for failure or slowdown, bugs, mis
- Corrective actions are then identified and implemented to bring the parameter values back within the defined range.
- the network service is monitored to ensure that the parameter values stabilize within the desired range. For example, the CPU usage of the network service exceeds 90%, leading to performance degradation. The CPU usage is analyzed to determine the issues (e.g., inefficient algorithms or unnecessary computations).
- the corrective actions include, e.g., increasing resources or scaling up the network service by increasing the CPU resources allocated to the network service.
- the scaling adds more network service instances to distribute the load and reduce CPU usage per instance.
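The effect of adding instances on per-instance CPU usage can be worked through with simple arithmetic. This is a rough sketch under two assumptions that the disclosure does not state: load is spread evenly across instances, and total CPU demand stays constant as instances are added. The function name `instances_needed` and the target value are hypothetical.

```python
import math

def instances_needed(current_instances: int, cpu_per_instance: float,
                     target_cpu: float) -> int:
    """Instances required so that per-instance CPU usage falls to the target or below."""
    total_demand = current_instances * cpu_per_instance  # aggregate CPU demand in percent
    return math.ceil(total_demand / target_cpu)

# Two instances running at 90% CPU, with a hypothetical target of 60% per instance.
n = instances_needed(current_instances=2, cpu_per_instance=90.0, target_cpu=60.0)
print(n)           # 3
print(2 * 90 / n)  # 60.0 (per-instance CPU after scaling out)
```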
- the processing unit (120) is configured to identify a plurality of trends and patterns corresponding to each of the plurality of network services.
- the trends and patterns corresponding to each of the plurality of network services are identified to detect early warning signs of potential issues, recognize patterns in failures or outages, forecast capacity needs, troubleshoot issues, identify bottlenecks etc.
- identifying the trends and the patterns in the network services refers to a process of understanding performance, usage, and potential issues corresponding to the network services over time. Identifying the trends and patterns involves collecting and analyzing data related to the behavior and parameters (e.g., performance metrics) of the network services. Patterns that occur at specific times, usage metrics, and deviations from expected patterns are identified.
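One simple way to detect deviations from expected patterns is to compare each sample with the statistics of the samples that precede it. The sketch below is one possible technique, not the method of the disclosure: the function name `find_spikes`, the window size, and the factor `k` are assumptions chosen for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence

def find_spikes(samples: Sequence[float], window: int = 5, k: float = 3.0) -> List[int]:
    """Return indices whose value deviates from the preceding window by more than k sigma."""
    spikes = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # Skip flat windows (sigma == 0) to avoid flagging every change.
        if sigma > 0 and abs(samples[i] - mu) > k * sigma:
            spikes.append(i)
    return spikes

# Hypothetical CPU usage samples (percent) with one sudden spike.
cpu_usage = [40, 42, 41, 39, 40, 41, 95, 42, 40]
print(find_spikes(cpu_usage))  # [6]
```

Only the sample at index 6 (95%) is flagged; the samples that follow it are not, because the spike itself inflates the standard deviation of their windows.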
- the processing unit (120) is configured to plan at least one of a plurality of proactive maintenance activities based on the identified plurality of trends and patterns.
- the proactive maintenance activities refer to the ongoing processes and tasks required to ensure that network services are functioning optimally, remain secure, and continue to meet needs.
- the proactive maintenance activities include, but are not limited to, regular health checkups, regular updates, scalability testing, incident response planning, security audits and updates, etc.
- monitoring the at least one network service by determining the operational condition of the at least one network service is performed to assess the health of the at least one network service.
- the health monitoring allows continuous monitoring of the health and availability of the at least one network service.
- the database (122) includes data that may be either stored or generated as a result of functionalities implemented by any of the components of the processor (111).
- the database (122) may include any computer-readable medium known in the art including, for example, volatile memory, such as Static Random Access Memory (SRAM) and Dynamic Random Access Memory (DRAM), and/or non-volatile memory, such as Read Only Memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
- the database (122) may be implemented in the ERM (130).
- the database (122) is configured to store program instructions and data corresponding to the list of the plurality of parameters of the plurality of network services.
- the database (122) is configured to store the threshold values defined for the plurality of parameters.
- the program instructions include a program that implements a method for performing autonomous health monitoring of a plurality of network services by the ERM (130), in accordance with embodiments of the present disclosure and may implement other embodiments described in this specification.
- FIG. 1D illustrates an exemplary connection architecture for facilitating communication between the ERM (130), an OAM (134), and an NMS (132), in accordance with an embodiment of the present disclosure.
- the ERM (130) may communicate with a plurality of data sources.
- the plurality of data sources includes the NMS (132) and the OAM (134).
- the ERM (130) may communicate with the NMS (132) via a first interface (i.e., an event routing manager_network management system interface (EM_NMS interface)).
- the ERM (130) may communicate with the OAM (134) via a second interface (i.e., an event routing manager_operations, administration, and management interface (EM_OA interface)).
- the ERM (130) may perform health monitoring of the network services via the EM_OA interface and the EM_NMS interface.
- the NMS (132) refers to a system for monitoring, controlling, and managing network resources and services.
- the NMS (132) enables network administrators to maintain the performance, availability, and security of the network infrastructure.
- the NMS (132) handles a plurality of network services including, but not limited to, network monitoring, fault management, configuration management, performance management, and security management.
- the OAM (134) refers to the set of functions and processes used to manage and maintain the networks.
- the OAM (134) is essential for ensuring that network services operate efficiently, are reliable, and meet performance standards.
- the OAM (134) handles a plurality of network services including, but not limited to, network monitoring, fault management, configuration management, performance management, resource management, user management, and backup and recovery management.
- the ERM (130) may receive information corresponding to the plurality of network services from the NMS (132) and the OAM (134) via the EM_NMS interface and the EM_OA interface, respectively.
- the information includes current values of the parameters corresponding to the plurality of network services.
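The way the ERM gathers current parameter values from its two data sources can be sketched with a common interface. This is a hypothetical sketch: the `DataSource` protocol, the stub classes, the `collect` function, and the split of parameters between the OAM and the NMS are all assumptions, and the stubs return fixed values in place of real EM_OA and EM_NMS traffic.

```python
from typing import Dict, Iterable, Protocol

class DataSource(Protocol):
    """Anything that can report current parameter values for a service."""
    def fetch(self, service: str) -> Dict[str, float]: ...

class StubOAM:
    """Stand-in for the OAM reached over the EM_OA interface (fixed values)."""
    def fetch(self, service: str) -> Dict[str, float]:
        return {"cpu_usage": 72.0, "memory_utilization": 55.0}

class StubNMS:
    """Stand-in for the NMS reached over the EM_NMS interface (fixed values)."""
    def fetch(self, service: str) -> Dict[str, float]:
        return {"network_latency_ms": 40.0, "error_rate": 0.2}

def collect(service: str, sources: Iterable[DataSource]) -> Dict[str, float]:
    """Merge the current values reported by every source for one service."""
    merged: Dict[str, float] = {}
    for source in sources:
        merged.update(source.fetch(service))
    return merged

values = collect("authentication", [StubOAM(), StubNMS()])
print(sorted(values))
# ['cpu_usage', 'error_rate', 'memory_utilization', 'network_latency_ms']
```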
- FIG. 2 illustrates an example flow diagram (200) for configuring monitoring rules associated with a network service, in accordance with an embodiment of the present disclosure.
- the ERM (130) may accept a request to configure the monitoring rule for one of the network services.
- the monitoring rule is configured to run a timer for a duration/interval to check the parameter values.
- the timer of the defined duration/interval is set for the plurality of parameters (e.g., CPU usage, memory utilization, network latency, error rates) associated with the plurality of network services to be monitored.
- the ERM (130) may determine whether the frequency (i.e., the timer) is defined for a task (i.e., monitoring of health or operational condition of the network service).
- the frequency may refer to a total number of executions to be performed to monitor the network service.
- the ERM (130) may send an error to the monitoring system (110).
- the ERM (130) may accept the request to configure the monitoring rule and create a monitoring task for the request.
- the ERM (130) may determine whether the threshold value for the parameter is defined.
- threshold values for performance and operational parameters are set to maintain network health and ensure optimal performance.
- the ERM (130) may send an error to the monitoring system (110).
- the ERM (130) sends the error (e.g., “Configuration Error: No threshold value set for parameter” or “Missing threshold values for parameters”) to the monitoring system (110).
- the ERM (130) may configure the threshold value for the parameter in the list.
- parameters in cellular networks include bandwidth usage, signal strength, latency, CPU usage, and memory usage.
- the ERM (130) determines appropriate threshold values for the parameters based on operational requirements and historical data.
- threshold: -70 dBm. Signal strength below this level (i.e., -70 dBm) starts causing degradation in service.
- threshold 100 milliseconds.
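The configuration checks of flow diagram (200) can be sketched as a validation function. This is a minimal sketch, assuming a request is a plain dictionary; the function name `configure_rule` and the keys `interval_seconds`, `parameters`, and `thresholds` are hypothetical. The threshold error text is taken from the example above, while the wording of the frequency error is an assumption.

```python
from typing import Any, Dict

def configure_rule(request: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a monitoring-rule request and create a monitoring task."""
    interval = request.get("interval_seconds")
    if not isinstance(interval, (int, float)) or interval <= 0:
        # Frequency/timer missing: report an error (message wording is assumed).
        return {"status": "error",
                "message": "Configuration Error: No frequency defined for task"}
    thresholds = request.get("thresholds", {})
    missing = [p for p in request.get("parameters", []) if p not in thresholds]
    if missing:
        # At least one monitored parameter has no threshold value.
        return {"status": "error",
                "message": "Configuration Error: No threshold value set for parameter",
                "parameters": missing}
    task = {"service": request["service"], "interval_seconds": interval,
            "thresholds": dict(thresholds)}
    return {"status": "configured", "task": task}

result = configure_rule({
    "service": "authentication",
    "interval_seconds": 300,
    "parameters": ["cpu_usage", "network_latency_ms"],
    "thresholds": {"cpu_usage": 80.0},  # latency threshold deliberately omitted
})
print(result["status"], result.get("parameters"))  # error ['network_latency_ms']
```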
- FIG. 3 illustrates an example flow diagram (300) for performing health monitoring of the network service, in accordance with an embodiment of the present disclosure.
- the ERM (130) may start monitoring of the network service.
- a timer i.e., initialized timer
- the monitoring task is started after the timer has reached the specific duration/interval.
- the ERM (130) may determine whether the timer configured for the specific duration/interval is reached.
- the ERM (130) may keep on determining whether the timer configured for the specific duration/interval is reached or not.
- the ERM (130) may determine the current value for parameters.
- the ERM (130) may determine whether the threshold value is defined.
- the ERM (130) determines whether threshold values are defined for the parameters or not.
- the ERM (130) receives current values of parameters of the network service from the source (e.g., the NMS (132) or the OAM (134)). The ERM (130) compares the current values with the defined threshold values of the parameters of the network service.
- the ERM (130) may log the event in the database (122).
- when the ERM (130) determines that the current values of the parameters are below the predefined threshold values, the ERM (130) logs the event (e.g., the network service is working properly) in the database (122).
- the ERM (130) may raise the alarm/send a notification for corrective action to the monitoring system (110).
- the ERM (130) compares the current values with the threshold values.
- the ERM (130) may repeat the steps from step (304) to step (318).
- the ERM (130) is configured to perform health monitoring of network services when the timer has reached the specified duration/interval.
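A single pass of the monitoring task in flow diagram (300) can be sketched as follows. This is an illustrative sketch only: the function name `run_monitoring_task`, the in-memory lists standing in for the database (122) and the monitoring system (110), and the sample values are all hypothetical.

```python
from typing import Dict, List, Tuple

def run_monitoring_task(service: str, current: Dict[str, float],
                        thresholds: Dict[str, float], event_log: List[str],
                        alarms: List[Tuple[str, List[str]]]) -> List[str]:
    """Compare current values with thresholds, then log the event or raise an alarm."""
    breached = sorted(p for p, limit in thresholds.items() if current[p] > limit)
    if breached:
        # Threshold crossed: notify the monitoring system and log the event.
        alarms.append((service, breached))
        event_log.append(f"{service}: threshold breached for {', '.join(breached)}")
    else:
        event_log.append(f"{service}: working properly")
    return breached

event_log: List[str] = []                # stands in for the database
alarms: List[Tuple[str, List[str]]] = []  # stands in for the monitoring system
thresholds = {"cpu_usage": 80.0, "error_rate": 1.0}

run_monitoring_task("authentication", {"cpu_usage": 45.0, "error_rate": 0.2},
                    thresholds, event_log, alarms)
run_monitoring_task("authentication", {"cpu_usage": 93.0, "error_rate": 2.5},
                    thresholds, event_log, alarms)
print(event_log[0])  # authentication: working properly
print(alarms)        # [('authentication', ['cpu_usage', 'error_rate'])]
```

In a full loop this function would be called each time the timer reaches the configured duration/interval, after which the timer is reset.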
- FIG. 4 illustrates an exemplary flow diagram of a method (400) for determining the operational condition of the plurality of network services by the ERM (130), in accordance with an embodiment of the present disclosure.
- the method (400) includes initializing, by the ERM (130), a timer associated with at least one network service of the plurality of network services.
- the timer associated with at least one network service is initialized before determining the operational conditions of the network services.
- the timer associated with the authentication network service is initialized.
- the method (400) includes determining, by the ERM (130), whether the initialized timer is equal to a configured timer corresponding to the at least one network service.
- the configured timer is configured for a defined duration corresponding to the at least one network service.
- the defined duration for the configured timer is 2 seconds.
- the ERM (130) determines whether the initialized timer of the network service (i.e., authentication network service) equals the configured timer (e.g., 2 seconds).
- the method (400) includes, if the initialized timer is equal to the configured timer, establishing, by the ERM (130), at least one connection with at least one source to fetch information corresponding to the at least one network service.
- the at least one source is the NMS (132) or the OAM (134).
- the ERM (130) is connected to the NMS (132) with the first interface (e.g., EM_NMS interface), and the ERM (130) is connected to the OAM (134) with the second interface (e.g., EM_OA interface).
- the initialized timer is equal to the configured timer (i.e.
- the ERM (130) establishes connection with at least one source to fetch information corresponding to the at least one network service.
- the ERM (130) establishes a connection with the one source (e.g., the OAM (134)) to fetch information corresponding to the authentication network service.
- the information corresponding to the authentication network service includes, but is not limited to, response time, service availability, performance metrics (e.g., latency, throughput, error rates), resource utilization (e.g., CPU usage, memory utilization, disk input/output operations), session management (e.g., active sessions, session expiry rates, etc.), security metrics (e.g., successful or failed login attempts), etc.
- the initialized timer is configured to reset after the process execution.
- the method (400) includes executing, by the ERM (130), an extraction process to extract at least one current value corresponding to each of a plurality of parameters associated with the at least one network service from the fetched information.
- the method (400) includes comparing, by the ERM (130), the determined at least one current value with a predefined threshold value corresponding to each of the plurality of parameters associated with the at least one network service.
- the extracted current values of the parameters corresponding to the network service are compared with the threshold values of the parameters.
- the method (400) includes determining, by the ERM (130), the operational condition of the at least one network service based on the comparison.
- the operational condition of the network service is determined based on the comparison between current values and threshold values associated with the network service. In an example, upon comparing current values and threshold values of the authentication network service, the response time, the memory usage, and the active sessions are below the threshold values, while the error rate and the CPU usage are above the threshold values.
- the method (400) includes triggering, by the ERM (130), an alarm to a monitoring system (110) and logging an event based on the determined operational condition of the at least one network service.
- the ERM (130) triggers the alarm to notify the monitoring system (110).
- the ERM (130) employs one or more corrective actions to bring the at least one current value corresponding to each of the plurality of parameters within a defined range.
- the one or more corrective actions include scaling resources, reallocating workloads, or triggering failover mechanisms. In an example, when the error rate and the CPU usage are above the threshold values for the authentication network services, the alarm is triggered to the monitoring system.
- the monitoring system (110) sends the determined exact cause of the high CPU usage and the high error rate to the ERM (130).
- the ERM (130) employs the corrective actions for the high CPU usage and the high error rate.
- the corrective actions for the high CPU usage include, but are not limited to, resource scaling, asynchronous processing of authentication request, load balancing by distributing traffic across multiple instances of the authentication network service.
- the corrective actions for the high error rate include, but are not limited to, rate limiting and throttling, dependency monitoring, security measures, etc.
- the ERM (130) identifies a plurality of trends and patterns corresponding to each of the plurality of network services.
- the ERM (130) plans at least one of a plurality of proactive maintenance activities based on the identified plurality of trends and patterns.
- the plurality of trends and patterns corresponding to the authentication network service includes, but is not limited to, usage patterns (e.g., peak usage time, common login times, devices used, locations, etc.), error rate trends (e.g., invalid credentials, timeouts), time-based trends (i.e., error rates during a specific time period), success rates, failure rates, attacks, unusual access patterns, device usage statistics, browser trends, resource usage, response time, etc.
- the proactive maintenance activities planned by the ERM (130) include, but are not limited to, capacity planning and scaling (e.g., forecasting demand, autoscaling), performance optimization (e.g., regularly reviewing authentication logic, database optimization), security enhancements (e.g., anomaly detection, rate limiting adjustments, security assessment to identify and address potential risks), infrastructure monitoring (e.g., monitoring tool), regular backups, updates, recovery plans, etc.
- FIG. 5 illustrates an exemplary computer system (500) in which or with which embodiments of the present disclosure may be implemented.
- the computer system may include an external storage device (510), a bus (520), a main memory (530), a read-only memory (540), a mass storage device (550), a communication port(s) (560), and a processor (570).
- the processor (570) may include various modules associated with embodiments of the present disclosure.
- the communication port(s) (560) may be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fiber, a serial port, a parallel port, or other existing or future ports.
- the communication port(s) (560) may be chosen depending on a network, such as a Local Area Network (LAN), Wide Area Network (WAN), or any network to which the computer system connects.
- LAN Local Area Network
- WAN Wide Area Network
- the main memory (530) may be random access memory (RAM), or any other dynamic storage device commonly known in the art.
- the read-only memory (540) may be any static storage device(s), e.g., but not limited to, Programmable Read-Only Memory (PROM) chips for storing static information, e.g., start-up or Basic Input/Output System (BIOS) instructions for the processor (570).
- the mass storage device (550) may be any current or future mass storage solution, which can be used to store information and/or instructions.
- Exemplary mass storage device (550) includes, but is not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives (internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces), one or more optical discs, Redundant Array of Independent Disks (RAID) storage, e.g., an array of disks.
- PATA Parallel Advanced Technology Attachment
- SATA Serial Advanced Technology Attachment
- SSD solid-state drives
- USB Universal Serial Bus
- RAID Redundant Array of Independent Disks
- the bus (520) communicatively couples the processor (570) with the other memory, storage, and communication blocks.
- the bus (520) may be, e.g., a Peripheral Component Interconnect (PCI)/PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), Universal Serial Bus (USB), or the like, for connecting expansion cards, drives, and other subsystems as well as other buses, such as a front side bus (FSB), which connects the processor (570) to the computer system.
- PCI Peripheral Component Interconnect
- PCI-X PCI Extended
- SCSI Small Computer System Interface
- USB Universal Serial Bus
- operator and administrative interfaces e.g., a display, keyboard, joystick, and a cursor control device
- operator and administrative interfaces, e.g., a display, keyboard, joystick, and a cursor control device, may also be coupled to the bus (520) to support direct operator interaction with the computer system.
- Other operator and administrative interfaces can be provided through network connections connected through the communication port(s) (560).
- Components described above are meant only to exemplify various possibilities. In no way should the aforementioned exemplary computer system limit the scope of the present disclosure.
- the exemplary computer system (500) is configured to execute a computer program product comprising a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform a method for determining an operational condition of a plurality of network services.
- the method comprises initializing, by an event routing manager (ERM), a timer associated with at least one network service of the plurality of network services and determining, by the ERM, whether the initialized timer is equal to a configured timer corresponding to the at least one network service.
- ERM event routing manager
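- The timer step of the method can be sketched as follows, assuming a simple tick-based counter: each network service has its own timer, and a health check becomes due when the elapsed value equals the configured value. The class name, the tick unit, and the "billing" service are hypothetical and only serve the illustration; the disclosure does not specify how the timer is represented.

```python
from dataclasses import dataclass


@dataclass
class ServiceTimer:
    """Per-service timer (hypothetical representation using integer ticks)."""
    service: str
    configured_ticks: int   # configured timer value for this service
    elapsed_ticks: int = 0  # initialized timer, starts at zero

    def tick(self) -> bool:
        """Advance the timer by one tick.

        Returns True when the elapsed value equals the configured value,
        and re-initializes the timer for the next cycle.
        """
        self.elapsed_ticks += 1
        if self.elapsed_ticks == self.configured_ticks:
            self.elapsed_ticks = 0
            return True
        return False


def services_due(timers):
    """Advance every timer once and return the services whose check is due."""
    return [t.service for t in timers if t.tick()]


if __name__ == "__main__":
    timers = [ServiceTimer("authentication", 2), ServiceTimer("billing", 3)]
    for step in range(1, 5):
        print(step, services_due(timers))
```

For the services that are due, the ERM (130) would then connect to the source and fetch the current parameter values; that part is outside this sketch.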
- the present disclosure provides a technical advancement related to autonomous health monitoring of a plurality of network services by determining operational conditions of the plurality of network services.
- This advancement addresses the limitations of existing solutions by using an Event Routing Manager (ERM) responsible for determining the operational conditions of the plurality of network services.
- the performance metrics include, for example, CPU usage, memory utilization, network latency, and error rates.
- the health monitoring enhances the reliability of the network services by regularly determining the health and availability of each service and identifying bottlenecks, performance degradation, or other issues that might impact the overall system.
- proactive maintenance activities are planned, and upgrades or patches are prioritized. This helps in avoiding service disruptions, security vulnerabilities, or performance degradation due to outdated or unsupported components.
- the present disclosure provides a technically advanced solution by providing a system and a method for continuously monitoring the health state (e.g., operational conditions) of network services.
- the vital metrics include, for example, CPU usage, memory utilization, network latency, and error rates.
- the health monitoring enhances the reliability of the network services by regularly determining the health and availability of each service and identifying bottlenecks, performance degradation, or other issues that might impact the overall system. This helps to take appropriate actions (for example, scaling resources, reallocating workloads, or triggering failover mechanisms, etc.) to ensure continuous service availability.
- the insights are obtained for the performance characteristics of the network services.
- Health monitoring helps in avoiding service disruptions, security vulnerabilities, or performance degradation due to outdated or unsupported components.
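- A minimal sketch of the comparison step, assuming that a service is treated as degraded when any current metric value exceeds its predefined threshold: the code determines the operational condition and then either raises an alarm through a caller-supplied callback or logs an event. The condition labels ("healthy", "degraded"), the metric keys, the threshold values, and the `raise_alarm` callback are assumptions made for illustration only.

```python
import logging

logger = logging.getLogger("erm.health")


def determine_condition(current, thresholds):
    """Compare current metric values with predefined thresholds.

    Returns a (condition, breached) pair, where `breached` lists the
    metrics whose current value exceeds its threshold. Metrics missing
    from `current` are treated as 0 (an assumption of this sketch).
    """
    breached = sorted(
        name for name, limit in thresholds.items()
        if current.get(name, 0) > limit
    )
    return ("degraded" if breached else "healthy"), breached


def report(service, current, thresholds, raise_alarm):
    """Raise an alarm for a degraded service, otherwise log an event."""
    condition, breached = determine_condition(current, thresholds)
    if condition == "degraded":
        # `raise_alarm` stands in for the interface to the monitoring system.
        raise_alarm(service, breached)
    else:
        logger.info("service %s is healthy", service)
    return condition


if __name__ == "__main__":
    # Hypothetical thresholds and readings, in percent.
    thresholds = {"cpu_usage": 80, "memory_utilization": 90, "error_rate": 5}
    current = {"cpu_usage": 95, "memory_utilization": 40, "error_rate": 1}
    report("authentication", current, thresholds,
           lambda svc, metrics: print("ALARM", svc, metrics))
```

Passing the alarm interface as a callback keeps the sketch independent of any particular monitoring system.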
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention relates to a system (108) and a method (400) for determining an operational state of a plurality of network services by an event routing manager (ERM) (130). The method (400) comprises initializing a timer associated with at least one network service and determining whether the initialized timer is equal to a configured timer. If the initialized timer is equal to the configured timer, a connection is established between the ERM (130) and a source to fetch information corresponding to the network service. Current values corresponding to a plurality of parameters associated with the at least one network service are extracted. The extracted current values are compared with predefined threshold values of the plurality of parameters. The operational state of the at least one network service is determined based on the comparison. An alarm is raised to a monitoring system (110), or an event is logged, based on the determined operational state of the at least one network service.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IN202321076730 | 2023-11-09 | ||
| IN202321076730 | 2023-11-09 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025099763A1 (fr) | 2025-05-15 |
Family
ID=95695458
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IN2024/052203 Pending WO2025099763A1 (fr) | System and method for determining operational states of a plurality of network services | 2023-11-09 | 2024-11-08 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025099763A1 (fr) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
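| CN112596975A (zh) * | 2020-12-15 | 2021-04-02 | 中国建设银行股份有限公司 | Method, system, device and storage medium for monitoring and processing network devices |
| CN114302164A (zh) * | 2021-12-31 | 2022-04-08 | 广州华多网络科技有限公司 | Network condition detection method and apparatus, device, medium, and product thereof |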
-
2024
- 2024-11-08 WO PCT/IN2024/052203 patent/WO2025099763A1/fr active Pending
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24888250; Country of ref document: EP; Kind code of ref document: A1 |