WO2025094199A1 - System and method for service response latency profiling - Google Patents
System and method for service response latency profiling
- Publication number
- WO2025094199A1 (PCT/IN2024/052157)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- service request
- latency
- histogram
- received service
- bucket
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/50—Network service management, e.g. ensuring proper service fulfilment according to agreements
- H04L41/5003—Managing SLA; Interaction between SLA and QoS
- H04L41/5009—Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0852—Delays
Definitions
- a portion of the disclosure of this patent document contains material which is subject to intellectual property rights such as, but not limited to, copyright, design, trademark, Integrated Circuit (IC) layout design, and/or trade dress protection, belonging to Jio Platforms Limited (JPL) or its affiliates (hereinafter referred to as the owner).
- the owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights whatsoever. All rights to such intellectual property are fully reserved by the owner.
- the present disclosure relates generally to the field of communication systems. More particularly, the present disclosure relates to a system and a method for service response latency profiling.
- the APIs define the methods and data formats that applications can use to request and exchange information with each other, typically over a network.
- Client refers to a device or software application that initiates communication with a server or service. Clients are fundamental components in client-server architecture, where they interact with servers to request services, resources, or data.
- Subscriber service refers to a service model where users or entities subscribe to access specific resources, features, or content provided by a service provider. This term is commonly used in telecommunications, media, software, and various other industries where services are offered on a subscription basis.
- Network performance analysis refers to the process of evaluating and assessing the operational efficiency, reliability, and overall effectiveness of a computer network.
- the network performance analysis involves examining various metrics, parameters, and behaviors of the network to understand its performance characteristics and identify areas for improvement.
- the network performance analysis is to ensure that the network meets its intended operational requirements, such as throughput, latency, packet loss, bandwidth utilization, and availability, among others.
- the network performance analysis typically involves using specialized tools and techniques to monitor, measure, and interpret data related to network performance, enabling network administrators and engineers to optimize network resources, troubleshoot issues, and enhance the overall performance and user experience.
- Network latency refers to the time delay that occurs when data packets travel from one point to another in a network.
- the network latency is typically measured in milliseconds (ms) and represents the total time it takes for a data packet to be sent from the source device to the destination device and for the acknowledgment of that packet to be received back at the source.
- Response time refers to the total time it takes for a system to respond to a user's request or action.
- Microservices refers to a software architectural style that structures an application as a collection of loosely coupled services.
- Network delay, also referred to as network latency, refers to the time it takes for data packets to travel from one point in a network to another.
- Network bottleneck refers to a point in a network where the flow of data is constrained or limited, causing slower transmission speeds and potentially impacting overall network performance. Bottlenecks can occur due to various factors and can significantly degrade network efficiency and user experience. The factors causing the network bottleneck comprise network congestion, bandwidth limitations, hardware limitations, software limitations and single point of failure, etc.
- Capacity planning refers to forecasting and preparing for the future network requirements to ensure that the network infrastructure can support current and future demands efficiently. It's a proactive approach to scaling and optimizing network resources to meet performance goals while minimizing costs and downtime.
- Average latency refers to the average time it takes for data packets to travel from a source to a destination and back again.
- the average latency is calculated by taking multiple latency measurements over a period of time and computing the arithmetic mean (average) of those measurements. The average latency provides a general indication of the typical delay experienced in transmitting data across the network.
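- expressed as a formula, if $L_1, L_2, \ldots, L_N$ denote the individual latency measurements taken over the period, the average latency is the arithmetic mean:

$$\bar{L} = \frac{1}{N} \sum_{i=1}^{N} L_i$$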
- Peak latency refers to a maximum latency or delay experienced by data packets during periods of high network usage or under specific conditions. Unlike average latency, which represents the typical delay, the peak latency highlights the worst-case scenario where latency spikes occur beyond normal levels.
- Resource allocation refers to strategic distribution and management of network resources such as bandwidth, computing power, storage, and network devices to optimize performance, efficiency, and reliability.
- Histogram refers to a graphical representation of the distribution of numerical data. It consists of a series of contiguous bars, where the height of each bar represents the frequency or count of data points falling into specific intervals. The histograms are often used to visualize and analyze various performance metrics and characteristics of network data.
- the term “Histogram bucket”, as used herein, refers to a specific range or interval into which data points are grouped.
- the histogram bucket is a segment of a histogram that represents a specific range of values, used to visualize the distribution of data across those ranges. Each bucket's size and placement are determined based on the data being analyzed and the insights to gain from the histogram.
- Offset value in histogram refers to the starting point of the range it represents along the axis of the histogram.
- index value in histogram refers to either the position of individual data points within a dataset or the position of bins (buckets) along the x-axis of the histogram.
- operation type refers to the categorization or classification of fundamental operations performed by algorithms or processes.
- operation count refers to the number of fundamental operations or computational steps performed by an algorithm or process.
- API performance latency metrics refers to the measurements used to assess the responsiveness and efficiency of APIs in terms of the time it takes to process requests and return responses.
- the API performance latency metrics comprises response time, latency, throughput, error rates, etc.
- Operation start time refers to the timestamp or moment when a specific operation or task begins its execution within a system or an application.
- Operation end time refers to the timestamp or moment when a specific operation, task, or process completes its execution within a system or application.
- Execution time refers to the amount of time taken by a specific process, task, or operation to complete its execution.
- Histogram bucket refers to a specific range or interval into which data points are grouped for the purpose of creating the histogram.
- value rounded up refers to rounding a data point up to the nearest bucket or offset boundary of the histogram when no exact match is found.
- API call refers to a request made by one software application to another through the API.
- API latency monitoring data is essential for capacity planning. Without it, it might be difficult to obtain accurate insights into resource usage, making it harder to scale microservices effectively. Overall, these problems impede performance, scalability, and reliability, making it essential to identify and resolve issues promptly for a better user experience, and to make informed decisions regarding capacity planning and resource allocation.
- An object of the present disclosure is to provide a system and a method for service response latency profiling.
- Another object of the present disclosure is to perform service response latency profiling for a single API, or a group of APIs based on API performance latency metrics.
- Another object of the present disclosure is to identify operations that cause delay by measuring and analyzing response times of individual microservices.
- An object of the present disclosure is to identify potential bottlenecks in the system.
- Yet another object of the present disclosure is to monitor latency to proactively address performance issues before users are impacted, thereby ensuring a smooth and responsive application experience.
- Yet another object of the present disclosure is to plan resource allocation based on latency monitoring data.
- a method for determining service response latency in a network includes receiving a service request from a client, determining details corresponding to the received service request, where the details comprise an operation type, an operation start time and an operation end time, calculating a latency value or an execution time based on the determined operation start time and the determined operation end time, determining whether a histogram is initiated for the determined operation type of the received service request, on determining the histogram is initiated for the determined operation type of the received service request, determining at least one bucket from a plurality of buckets for the calculated execution time of the received service request, determining whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket, on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, incrementing an index value corresponding to the at least one offset value, and recording a value corresponding to the calculated execution time of the received service request in the histogram, where, on detecting the calculated execution time of the received service request does not match with the at least one offset value of the determined at least one bucket, a value rounded up is found in a nearest bucket and the index value of the corresponding offset value of the nearest bucket is incremented.
- the method further comprises on determining that the histogram is not initiated for the determined operation type of the received service request, calculating the histogram for the determined operation type of the received service request and recording a value corresponding to the calculated execution time of the received service request in the calculated histogram.
- the method further comprises determining a total latency and a recent latency for the determined operation type of the received service request based on the histogram of the determined operation type of the received service request, where the total latency is summation of all latency values of the plurality of service requests of the same operation type and the recent latency is configurable based on occurrence of latencies for latest number of counts.
- the method further comprises maintaining a plurality of buckets for a plurality of operation types in the histogram having offset values and index values, where the offset values of the plurality of buckets in the histogram refer to a plurality of ranges of the execution time of the plurality of operation types, and the index values of the plurality of buckets in the histogram refer to a number of operations having at least one same offset value from the offset values.
- the method further comprises maintaining an operation count for the plurality of service requests of the plurality of operation types, where on receiving the service request for a same operation type, incrementing the operation count.
- the latency value is calculated for a plurality of microservices of a single application programming interface (API) or a plurality of APIs of the same operation type.
- the database is configured to store a plurality of operation types, the operation start time, the operation end time, the operation count, the calculated latency value, the last latency value, the offset values, and the index values of the plurality of buckets.
- a system for determining service response latency includes a receiving unit configured to receive a service request from a client, a determining unit configured to determine details corresponding to the received service request, where the details comprise an operation type, an operation start time and an operation end time, a calculating unit configured to calculate a latency value or an execution time based on the determined operation start time and the determined operation end time, the determining unit configured to determine whether a histogram is initiated for the determined operation type of the received service request, on determining the histogram is initiated for the determined operation type of the received service request, the determining unit is configured to determine at least one bucket from a plurality of buckets for the calculated execution time of the received service request, the determining unit is configured to determine whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket, on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, a processing unit is configured to increment an index value corresponding to the at least one offset value and to record a value corresponding to the calculated execution time of the received service request in the histogram.
- the calculating unit on determining that the histogram is not initiated for the determined operation type of the received service request, is configured to calculate the histogram for the determined operation type of the received service request.
- the processing unit is configured to record a value corresponding to the calculated execution time of the received service request in the calculated histogram.
- the determining unit is configured to determine a total latency and a recent latency for the determined operation type of the received service request based on the histogram of the determined operation type of the received service request.
- the total latency is summation of all latency values of the plurality of service requests of the same operation type and the recent latency is configurable based on occurrence of latencies for latest number of counts.
- the processing unit is configured to maintain a plurality of buckets for a plurality of operation types in the histogram having offset values and index values.
- the offset values of the plurality of buckets in the histogram refer to a plurality of ranges of the execution time of the plurality of operation types
- the index values of the plurality of buckets in the histogram refer to a number of operations having at least one same offset value from the offset values.
- the processing unit is configured to maintain an operation count for the plurality of service requests of the plurality of operation types. On receiving the service request for a same operation type, the processing unit is configured to increment the operation count.
- the latency value is calculated for a plurality of microservices of a single application programming interface (API) or a plurality of APIs of the same operation type.
- the database is configured to store a plurality of operation types, the operation start time, the operation end time, the operation count, the plurality of buckets, the calculated latency value, the last latency value, the offset values, the total latency, the recent latency, the index values of the plurality of buckets and the histogram of the plurality of operation types.
- a user equipment is communicatively coupled with a system.
- the coupling includes steps of receiving, by the system, a connection request, sending, by the system, an acknowledgment of the connection request to the user equipment, and transmitting a plurality of signals in response to the connection request, where the system is configured for determining service response latency in a network.
- the present invention discloses a computer program product comprising a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform a method for determining service response latency in a network, the method includes receiving a service request from a client, determining details corresponding to the received service request, where the details comprise an operation type, an operation start time and an operation end time, calculating a latency value or an execution time based on the determined operation start time and the determined operation end time, determining whether a histogram is initiated for the determined operation type of the received service request, on determining the histogram is initiated for the determined operation type of the received service request, determining at least one bucket from a plurality of buckets for the calculated execution time of the received service request, determining whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket, on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, incrementing an index value corresponding to the at least one offset value, and recording a value corresponding to the calculated execution time of the received service request in the histogram.
- FIG. 1 illustrates an exemplary network architecture for implementing a system, in accordance with an embodiment of the present disclosure.
- FIG. 2A illustrates an exemplary block diagram of the system, in accordance with an embodiment of the present disclosure.
- FIG. 2B illustrates an exemplary system architecture for implementing service response latency profiling, in accordance with an embodiment of the present disclosure.
- FIG. 3 illustrates an exemplary flow diagram for service response latency profiling, in accordance with an embodiment of the present disclosure.
- FIG. 4 illustrates an exemplary flow diagram of a method for determining service response latency in a network, in accordance with an embodiment of the present disclosure.
- FIG. 5 illustrates an exemplary computer system in which or with which embodiments of the present disclosure may be implemented.
- individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged.
- a process is terminated when its operations are completed but could have additional steps not included in a figure.
- a process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
- “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration.
- the subject matter disclosed herein is not limited by such examples.
- any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art.
- to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive, like the term “comprising,” as an open transition word, without precluding any additional or other elements.
- the terms “mobile device”, “user equipment”, “user device”, “communication device”, “device” and similar terms are used interchangeably for the purpose of describing the invention. These terms are not intended to limit the scope of the invention or imply any specific functionality or limitations on the described embodiments. The use of these terms is solely for convenience and clarity of description. The invention is not limited to any particular type of device or equipment, and it should be understood that other equivalent terms or variations thereof may be used interchangeably without departing from the scope of the invention as defined herein.
- the present disclosure aims to overcome the above-mentioned and other existing problems in this field of technology by providing a system and a method for service response latency profiling.
- FIG. 1 illustrates an example network architecture (100) for implementing a system (108), in accordance with an embodiment of the present disclosure.
- the network architecture (100) comprises one or more computing devices (104-1, 104-2... 104-N) that may be connected to the system (108) through a network (106).
- the one or more computing devices (104-1, 104-2... 104-N) may be collectively referred as computing devices (104) and individually referred as a computing device (104).
- One or more users (102-1, 102-2... 102-N) may provide one or more requests to the system (108).
- the one or more users (102-1, 102-2... 102-N) may be collectively referred as users (102) and individually referred as a user (102).
- the computing devices (104) may also be referred as a user equipment (UE) (104) or as UEs (104) throughout the disclosure.
- the computing device (104) may include, but not be limited to, a mobile, a laptop, etc. Further, the computing device (104) may include one or more in-built or externally coupled accessories including, but not limited to, a visual aid device such as a camera, audio aid, microphone, or keyboard. Furthermore, the computing device (104) may include a mobile phone, smartphone, virtual reality (VR) devices, augmented reality (AR) devices, a laptop, a general- purpose computer, a desktop, a personal digital assistant, a tablet computer, and a mainframe computer. Additionally, input devices for receiving input from the user (102) such as a touchpad, touch-enabled screen, electronic pen, and the like may be used.
- the network (106) may include at least one of a Fifth Generation (5G) network, Sixth Generation (6G) network, or the like.
- the network (106) may enable the user equipment (104) to communicate with other devices in the network architecture (100) and/or with the system (108).
- the network (106) may include a wireless card or some other transceiver connection to facilitate this communication.
- the network (106) may be implemented as, or include any of a variety of different communication technologies such as a wide area network (WAN), a local area network (LAN), a wireless network, a mobile network, a Virtual Private Network (VPN), the Internet, the Public-Switched Telephone Network (PSTN), or the like.
- the network (106) may include, by way of example but not limitation, at least a portion of one or more networks having one or more nodes that transmit, receive, forward, generate, buffer, store, route, switch, process, or a combination thereof, etc. one or more messages, packets, signals, waves, voltage or current levels, some combination thereof, or so forth.
- the network (106) may also include, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet- switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof.
- the user equipment (104) is communicatively coupled with the system (108).
- the system (108) may receive a connection request from the UE (104).
- the system (108) may send an acknowledgment of the connection request to the UE (104).
- the UE (104) may transmit a plurality of signals in response to the connection request.
- the system (108) may be configured for determining service response latency in the network (106).
- while FIG. 1 shows exemplary components of the network architecture (100), in other embodiments the network architecture (100) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 1. Additionally, or alternatively, one or more components of the network architecture (100) may perform functions described as being performed by one or more other components of the network architecture (100).
- FIG. 2A illustrates an exemplary block diagram (200A) of the system (108), in accordance with an embodiment of the present disclosure.
- the system (108) may include one or more processor(s) (202).
- the one or more processor(s) (202) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that process data based on operational instructions.
- the one or more processor(s) (202) may be configured to fetch and execute computer- readable instructions stored in a memory (204) of the system (108).
- the memory (204) may be configured to store one or more computer-readable instructions or routines in a non-transitory computer readable storage medium, which may be fetched and executed to create or share data packets over a network service.
- the memory (204) may comprise any non-transitory storage device including, for example, volatile memory such as random-access memory (RAM), or non-volatile memory such as erasable programmable read only memory (EPROM), flash memory, and the like.
- the system (108) may include an interface(s) (206).
- the interface(s) (206) may comprise a variety of interfaces, for example, interfaces for data input and output devices (I/O), storage devices, and the like.
- the interface(s) (206) may facilitate communication through the system (108).
- the interface(s) (206) may also provide a communication pathway for one or more components of the system (108). Examples of such components include, but are not limited to, processing engine(s) (208) and a database (210).
- the processing engine(s) (208) may include one or more engine(s) such as, but not limited to, an input/output engine, an identification engine and an optimization engine.
- the processing engine(s) (208) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) (208).
- programming for the processing engine(s) (208) may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processing engine(s) (208) may comprise a processing resource (for example, one or more processors), to execute such instructions.
- the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) (208).
- system may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to the system and the processing resource.
- processing engine(s) (208) may be implemented by electronic circuitry.
- the database (210) includes data that may be either stored or generated as a result of functionalities implemented by any of the components of the processor (202) or the processing engines (208).
- the database (210) may be separate from the system (108).
- the database (210) may be indicative of including, but not limited to, a relational database, a distributed database, a cloud-based database, or the like.
- the processing engine(s) (208) may comprise a plurality of units.
- the plurality of units of the system (108) may include, but not limited to, a receiving unit (222), a determining unit (224), a calculating unit (226) and a processing unit (228).
- the processing engine may be implemented in an event routing manager (ERM).
- the system (108) may include the ERM.
- the receiving unit (222) is configured to receive a service request from a client.
- the service request refers to commands that users or systems can initiate to access or manage services.
- the service requests in a communication network comprise, but are not limited to, activation requests, deactivation requests, modification requests, support requests, service inquiry requests, network management requests, emergency service requests, etc.
- the activation requests may comprise new service activation, feature activation, etc.
- the deactivation requests may comprise service cancellation, feature deactivation, etc.
- the modification requests may comprise plan change, number porting, etc.
- the support requests may comprise technical support, billing inquiries, etc.
- the service inquiry requests may comprise service availability, network coverage check, etc.
- the network management requests may comprise performance monitoring, network configuration changes, etc.
- the emergency service requests may comprise emergency service activations, network issue reporting, etc.
- the determining unit (224) is configured to determine details corresponding to the received service request.
- the details comprise an operation type, an operation start time and an operation end time.
- the operation type in the service request refers to the specific action or function that a user or a system is requesting from a service provider.
- the operation type includes data creation, data addition, data delete, data validation, termination, data reading, data writing, etc.
- the operation type includes activating a new mobile line for a customer. This involves setting up a new service, assigning a phone number, and configuring the necessary settings.
- the operation type includes changing a customer's service plan. This involves upgrading or downgrading the existing plan based on customer request, adjusting features and pricing accordingly.
- the operation start time in service requests refers to the specific moment when a requested service action or operation begins. The operation start time is used for tracking, managing, and reporting the progress of service requests. For example, if a customer requests to activate a new mobile line at 10:00 AM, the operation start time may be logged as 10:00 AM, marking the beginning of the provisioning process.
- operation end time in service requests refers to the specific moment when a requested service action or operation is completed. The operation end time is important in managing service requests effectively, aiding in performance tracking and customer satisfaction. For example, if a customer requests to activate a new mobile line at 10:00 AM and the activation is completed at 10:15 AM, the operation end time would be logged as 10:15 AM, indicating the duration of the request.
- the calculating unit (226) is configured to calculate a latency value, or an execution time based on the operation start time and the operation end time.
- the latency refers to the time delay experienced in processing service requests from initiation to completion.
- the latency is a performance metric that affects the overall user experience and system efficiency. For example, if a customer submits a request to activate a new mobile line and the time from submission to completion is 200 ms, the latency for the service request is 200 ms.
- the execution time refers to the total time taken to complete a service request, from when it is initiated until the requested action is fully processed and confirmed. For example, if a customer requests to activate a new mobile line at 10:00 AM and receives confirmation that the service is active at 10:05 AM, the execution time for that service request may be 5 minutes.
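- the computation itself is a timestamp subtraction. The following is a minimal sketch (Python and the helper name `execution_time_ms` are illustrative assumptions, not part of the disclosure):

```python
from datetime import datetime

def execution_time_ms(start: datetime, end: datetime) -> float:
    """Latency/execution time in milliseconds from operation start and end times."""
    return (end - start).total_seconds() * 1000.0

# Example: a request starting at 10:00:00.000000 AM and ending at 10:00:00.002400 AM
start = datetime(2024, 1, 1, 10, 0, 0, 0)
end = datetime(2024, 1, 1, 10, 0, 0, 2400)  # 2400 microseconds later
print(execution_time_ms(start, end))  # 2.4 (up to float rounding)
```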
- the processing unit (228) is configured to maintain an operation count for the plurality of service requests of the plurality of operation types. On receiving a service request for the same operation type, the processing unit (228) is configured to increment the operation count.
- the operation count in service requests refers to the number of individual actions or operations performed during the processing of the service request.
- the operation count can be recorded as part of request tracking and used in performance analytics. For example, if a service provider processes 100 service activation requests in a day, and the operation "assign phone number" is performed for each request, the operation count for that specific action is 100.
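- a minimal sketch of such a counter, keyed by operation type (the names are illustrative, and persistence to the database is omitted):

```python
from collections import Counter

operation_count = Counter()

def on_service_request(operation_type: str) -> None:
    # Each incoming request of this operation type increments its count.
    operation_count[operation_type] += 1

# Example: 100 activation requests each performing "assign phone number".
for _ in range(100):
    on_service_request("assign phone number")
print(operation_count["assign phone number"])  # 100
```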
- the determining unit (224) is configured to determine whether a histogram is initiated for the determined operation type of the received service request. On determining that the histogram is not initiated for the determined operation type of the received service request, the calculating unit is configured to calculate the histogram for the determined operation type of the received service request for the calculated latency value or execution time.
- a plurality of buckets is maintained for at least one operation type having offset values and index values in the database (210).
- the offset values of the plurality of buckets in the histogram refer to a plurality of ranges of the execution time of the at least one operation type.
- the index values of the plurality of buckets in the histogram refer to a number of operations having at least one same offset value from the plurality of offset values.
- the bucket for execution time is 0-5 ms.
- the offset values of the bucket (0.0-1.0) ms, (1.1-2.0) ms, (2.1-3.0) ms, (3.1-4.0) ms, (4.1-5.0) ms.
- the index value is 40.
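- one plausible in-memory layout for this example, with the bucket keyed by its millisecond range and each offset range carrying its index value (a sketch; storing ranges as (low, high) tuples is an assumption):

```python
# Histogram bucket 0-5 ms for one operation type: offset ranges in ms
# mapped to index values (the number of operations falling in each range).
histogram = {
    (0.0, 5.0): {
        (0.0, 1.0): 12,
        (1.1, 2.0): 25,
        (2.1, 3.0): 40,  # e.g., an index value of 40 for this offset
        (3.1, 4.0): 9,
        (4.1, 5.0): 3,
    },
}
```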
- the determining unit (224) is configured to determine at least one bucket from the plurality of buckets for the calculated execution time of the received service request.
- the buckets for execution time are (0.0-5.0) ms, (5.1-10.0) ms, (10.1-15.0) ms, (15.1-20.0) ms.
- the operation start time of the received service request is 10:00:00.000000 AM and the operation end time is 10:00:00.002400 AM.
- on detecting that the calculated execution time of the received service request does not match with the offset value of the determined at least one bucket, the processing unit (228) is configured to find a value rounded up in a nearest bucket. The processing unit (228) is configured to increment the index value of the corresponding offset value of the nearest bucket. For example, for the service activation request, the execution time of the received service request is 5.099 ms. So, the value of the bucket for the received service request is rounded to 5.1 ms. The nearest bucket for the received service request is 5.1-10.0 ms. The index value for the offset value 5.1-6.0 ms of the bucket 5.1-10.0 ms is increased.
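- a sketch of that round-up rule under the same one-decimal offset layout (the function names are illustrative assumptions):

```python
import math

def round_up_tenth(value_ms: float) -> float:
    """Round an execution time up to the next 0.1 ms boundary."""
    return math.ceil(value_ms * 10) / 10

def nearest_offset(offsets, value_ms):
    """Return the (low, high) offset range containing the rounded-up value;
    the caller then increments that offset's index value."""
    rounded = round_up_tenth(value_ms)
    for low, high in offsets:
        if low <= rounded <= high:
            return (low, high)
    return None

# Offsets of the bucket 5.1-10.0 ms
offsets = [(5.1, 6.0), (6.1, 7.0), (7.1, 8.0), (8.1, 9.0), (9.1, 10.0)]
print(round_up_tenth(5.099))           # 5.1
print(nearest_offset(offsets, 5.099))  # (5.1, 6.0)
```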
- the determining unit (224) is configured to determine a total latency and a recent latency based on the histogram of the determined operation type of the received service request.
- the total latency is summation of all latency values of the plurality of service requests of the same operation type.
- the recent latency is configurable based on occurrence of latencies for the latest number of counts. For example, the recent latency may be the last three counts or the last five counts. Using the above example, the recent latency for the last three counts can be 1.9, 3.3, 0.7.
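- a sketch of a configurable recent-latency window alongside the running total (the fixed-length buffer is an assumed implementation detail):

```python
from collections import deque

RECENT_WINDOW = 3  # configurable, e.g., last three or last five counts

recent_latencies = deque(maxlen=RECENT_WINDOW)
total_latency_ms = 0.0

def record(latency_ms: float) -> None:
    global total_latency_ms
    total_latency_ms += latency_ms       # total latency: sum of all values
    recent_latencies.append(latency_ms)  # recent latency: latest N values only

for value in (2.4, 5.1, 1.9, 3.3, 0.7):
    record(value)
print(total_latency_ms)        # 13.4 (subject to float rounding)
print(list(recent_latencies))  # [1.9, 3.3, 0.7]
```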
- the latency value is calculated for a plurality of microservices of a single application programming interface (API) or a plurality of APIs of the same operation type.
- the single API in the microservices represents one specific service or functionality.
- Each microservice is designed to perform a distinct task, such as user authentication, payment processing, or data retrieval.
- a user service API may handle user registration and login.
- a group of APIs in the microservices provides a range of functionalities to support various services.
- the group of APIs in a telecom application comprises a user management API, a billing API, a call management API, a messaging API, a network management API, and a service plan management API.
- the user management API handles user registration, authentication and profile management.
- the billing management API manages billing cycles and payment processing.
- the call management API is for controlling call setup, termination and monitoring.
- the messaging API facilitates sending and receiving of messages.
- the network management API monitors network performance, managing configurations and detecting network outages.
- the service plan API is for managing different service plans and subscription details for users.
- the database (210) is configured to store a plurality of operation types, the operation start time, the operation end time, the operation count, the plurality of buckets, the calculated latency value, the last latency value, the offset values, the total latency, the recent latency, the index values of the plurality of buckets and the histogram of the plurality of operation types.
- while FIG. 2A shows exemplary components of the system (108), in other embodiments the system (108) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 2A. Additionally, or alternatively, one or more components of the system (108) may perform functions described as being performed by one or more other components of the system (108).
- FIG. 2B illustrates an exemplary system architecture (200B) for implementing service response latency profiling, in accordance with an embodiment of the present disclosure.
- the system architecture (200B) includes a client/service (212), an event routing manager (ERM) (214), a subscriber service (216) and the database (210).
- the ERM may be implemented in the system (108).
- the client/service (212), the ERM (214) and the subscriber service (216) may be in communication with each other.
- the ERM (214) may be in communication with the database (210) to store monitored values of an operation.
- the database (210) may include histogram bucket offset values and index values.
- the database (210) may also store operation count and last operation count.
- the database (210) stores total latency and last latency.
- the database (210) is configured to store a plurality of operation types, the operation start time, the operation end time, the operation count, the plurality of buckets, the calculated latency value, the last latency value, the offset values, the total latency, the recent latency, the index values of the plurality of buckets and the histogram of the plurality of operation types.
- the client/service (212) may send a service request to the ERM (214).
- the ERM (214) may route the service request to the subscriber service (216).
- the ERM (214) may monitor the number of requests for service response latency profiling.
- the ERM (214) may determine operation metrics such as the type of service operation, whether the histogram offset is available for the service operation, operation start time, operation end time, operation count, etc.
- the ERM (214) also determines the response time and latency based on the operation start time and the operation end time.
- the ERM (214) further calculates the total latency and the last latency based on the determined operation metrics.
- the service response latency profiling measures and analyzes the time delays (latencies) experienced while executing operations corresponding to the service requests.
- the service latency profiling helps in identifying bottlenecks and performance issues and taking corrective measures.
- API performance latency metrics for a single API or a group of APIs of the same operation type are monitored.
- the latency of each API call, whether across microservices or within a microservice, is monitored.
- the latency is plotted in a histogram over time.
- the response times of APIs in a single microservice, or across microservices where processing requires a chain of microservice interactions, are monitored in the histogram.
- the APIs that take longer than expected are identified in the histogram based on the latency metrics. Accordingly, APIs that cause delays, as well as potential bottlenecks in the system, are identified. This may reduce the debugging time and effort required for system optimization.
- an application is configured to perform service response latency profiling.
- the application is a software program designed to perform specific tasks for users.
- the application may maintain details for operation type, operation count, operation start time, and operation end time. Based on the operation start time and the operation end time, the application may determine the execution time or latency for the operation.
- the application may also maintain a bucket for a histogram.
- An offset in the histogram may denote the execution time.
- the execution time is time required to execute the operation from the operation start time to the operation end time.
- the index value in the histogram is a value which denotes the number of operations which take the same execution time.
- the application may record the start time and the end time.
- latency required for the operation may be calculated. Once the latency is calculated, it is recorded (or plotted) in the histogram: the index value of the matching offset is incremented for the operation count. If an exact match is not found, then the value is rounded up in the bucket. In a similar way, the total latency and the recent latency may be calculated. The total latency is the summation of all latency values. The recent latency is configurable based on the occurrence of latencies for the latest number of counts.
- the application stores operation types, the operation start time, the operation end time, the operation count, the plurality of buckets, the calculated latency value, the last latency value, the offset values, the index values of the plurality of buckets, the total latency, the recent latency, and the histogram of the plurality of operation types in the database (210).
- the response times of an individual microservice or of multiple microservices are monitored based on the latency metrics. High response times are indicative of potential service failures or network issues. With the help of latency monitoring, problems such as potential service failures or network issues are easily identified before they cause service outages or significant performance degradation. This reduces the risk of system downtime.
- monitoring of latency metrics enables performance issues to be addressed proactively before users are impacted, thereby ensuring a smooth and responsive application experience.
- API latency monitoring data may be used for capacity planning.
- the average and peak latencies of various microservices are determined from the latency monitoring data. Accordingly, resource allocation can be planned based on the latency monitoring data. This also helps to effectively scale microservices for a single API or a group of APIs.
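- both figures fall directly out of the recorded latency values; a minimal sketch (the sample values and names are illustrative):

```python
latencies_ms = [2.4, 5.1, 1.9, 3.3, 0.7]  # recorded for one microservice

average_latency = sum(latencies_ms) / len(latencies_ms)  # typical delay
peak_latency = max(latencies_ms)                         # worst-case spike

print(round(average_latency, 2))  # 2.68
print(peak_latency)               # 5.1
```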
- the system (108) may comprise ERM (214).
- the ERM (214) is configured to determine service response latency.
- the ERM (214) may receive a service request from a client.
- the ERM (214) may determine details corresponding to the received service request.
- the details comprise an operation type, an operation start time and an operation end time.
- the ERM (214) may calculate a latency value, or an execution time based on the operation start time and the operation end time.
- the ERM (214) may determine whether a histogram is initiated for the determined operation type of the received service request. On determining that histogram is not initiated for the determined operation type of the received service request, the ERM (214) may calculate a histogram for the determined operation type of the received service request.
- the ERM (214) may determine a total latency and a recent latency based on the calculated histogram.
- FIG. 3 illustrates an example flow diagram (300) for service response latency profiling, in accordance with an embodiment of the present disclosure.
- the ERM (214) receives a service request from the client/service (212).
- the ERM (214) determines the type of operation (service).
- the operation type may include creation of cloud-native network function (CNF), creation of convolutional neural network (CNN), terminating cloud native infrastructure solution (CNIS), or terminating centralized network configuration (CNC).
- at step (304) of the flow diagram (300), the ERM (214) determines whether a start time and an end time are defined for the operation. If the start time and end time are not defined, the flow diagram (300) proceeds to step (306). If the start time and end time are defined, the flow diagram (300) proceeds to step (308).
- at step (306) of the flow diagram (300), if the start time and end time are not defined, the operation may fail.
- the ERM (214) calculates an execution time.
- the execution time is the time required for the execution of the operation.
- the execution time is calculated based on the start time of the operation and end time of the operation. In an aspect, the execution time is calculated for single API call.
- the ERM (214) increments value for operation request count.
- the operation request count is the number of operations of the same type which take the same operation time. For example, considering 10 milliseconds as an offset (for example, the average execution time) required for execution of operations of the same type, the operation request count is increased each time an operation of the same type takes 10 milliseconds to execute.
- the ERM (214) determines whether a histogram is initialized. If the histogram is not initialized, the flow diagram (300) proceeds to step (314), and if the histogram is initialized, the flow diagram (300) proceeds to step (316).
- at step (314) of the flow diagram (300), if the histogram is not initialized, the ERM (214) initializes the histogram for the specified bucket. Then the flow diagram (300) proceeds to step (316).
- at step (316) of the flow diagram (300), if the histogram is initialized, the ERM (214) tries to find the bucket in the histogram for the execution time.
- the ERM (214) determines whether the exact bucket is found. If the exact bucket is not found, the flow diagram (300) proceeds to step (320), and if the exact bucket is found, the flow diagram (300) proceeds to step (322).
- the ERM (214) finds the value rounded up in the bucket.
- the bucket may be defined from 10 to 20 milliseconds. If the API takes 20 milliseconds, then the bucket is rounded up.
- the ERM increments the bucket index value for the operation count.
- the bucket may be defined from 10 to 20 milliseconds. If the API call takes between 10 and 20 milliseconds, then the exact bucket has been found. The bucket index value is incremented for the operation count.
- the ERM (214) calculates the total latency and the last latency from previous readings. For the total latency, all latency values are summed. The recent latency is configurable based on the occurrence of latencies for the latest number of counts.
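- read as code, steps (302) to (324) amount to the following sketch (a non-normative illustration: the names, the bucket layout, and the in-memory dictionaries are assumptions, and persistence to the database (210) is omitted):

```python
import math
from datetime import datetime

histograms = {}        # operation type -> {offset (low, high) in ms: index value}
operation_counts = {}  # operation type -> operation request count

def default_offsets():
    # Assumed layout: roughly 1 ms-wide offsets covering 0-20 ms.
    offsets = {(0.0, 1.0): 0}
    for i in range(1, 20):
        offsets[(round(i + 0.1, 1), round(i + 1.0, 1))] = 0
    return offsets

def profile(operation_type: str, start: datetime, end: datetime) -> float:
    # (308) calculate the execution time from the start and end times
    execution_ms = (end - start).total_seconds() * 1000.0
    # (310) increment the operation request count
    operation_counts[operation_type] = operation_counts.get(operation_type, 0) + 1
    # (312)/(314) initialize the histogram for the operation type if needed
    hist = histograms.setdefault(operation_type, default_offsets())
    # (316)-(320) find the offset; round up when there is no exact match
    value = math.ceil(execution_ms * 10) / 10
    for low, high in hist:
        if low <= value <= high:
            hist[(low, high)] += 1  # (322) increment the bucket index value
            break
    return execution_ms

# Example: a 2.4 ms request is recorded against the offset (2.1, 3.0).
t0 = datetime(2024, 1, 1, 10, 0, 0, 0)
t1 = datetime(2024, 1, 1, 10, 0, 0, 2400)
profile("service activation", t0, t1)
print(histograms["service activation"][(2.1, 3.0)])  # 1
```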
- FIG. 4 illustrates an exemplary flow diagram of a method (400) for determining service response latency, in accordance with an embodiment of the present disclosure.
- the method (400) includes receiving a service request from a client.
- the client initiates a service request for a specific service.
- the client sends the service request corresponding to the specific service to the ERM (214).
- the ERM (214) receives the service request from the client.
- the ERM (214) forwards the received service request to a subscriber service (e.g., Service provider of the specific service).
- An operation count is maintained for a plurality of service requests of a plurality of operation types in the database (210). On receiving a service request for the same operation type, the operation count is incremented.
- the method (400) includes determining details corresponding to the received service request.
- the details comprise an operation type, an operation start time and an operation end time.
- the ERM (214) processes content of the received service request and identifies the service request type and operation type by determining specific terms in the service request that indicate the type of service and operation. Further, the ERM (214) determines the operation start time by checking the timestamp when the service request is created. It maintains a log of significant activities along with their corresponding timestamps to track the duration of each step. Upon completion of processing the service request, completion time is logged. The operation end time is determined based on this completion time.
- the operation type is SIM card activation.
- the SIM card activation starts at 12:00:00 PM and ends at 12:00:05 PM. So, the operation start time is 12:00:00 PM (12:00:00) and the operation end time is 12:00:05 PM (12:00:05).
- the method (400) includes calculating a latency value or an execution time based on the operation start time and the operation end time.
- the ERM (214) processes the determined details to calculate the latency value or execution time.
- the latency value or the execution time is calculated based on the operation start time and the operation end time.
- the method (400) includes determining whether a histogram is initiated for the determined operation type of the received service request.
- the histogram is created for the operation types based on the latency value or execution time.
- the ERM (214) determines whether the histogram is created for the determined operation type of the received service request by searching the determined operation type in the database (210). For example, determining whether the histogram is initiated for the operation type (e.g., SIM card activation).
- the method (400) further includes on determining the histogram is initiated for the determined operation type of the received service request, determining at least one bucket from the plurality of buckets for the calculated execution time of the received service request.
- the buckets for execution time are (0.0-5.0) ms, (5.1-10.0) ms, (10.1-15.0) ms, (15.1-20.0) ms.
- the operation start time of the received service request is 10:00:00.000000 AM and the operation end time is 10:00:00.002400 AM.
- the method includes determining whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket.
- at least one bucket of the plurality of buckets is determined by matching the calculated execution time of the received service request with values of the plurality of buckets. Then, the calculated execution time of the received service request is matched with each offset value of the determined at least one bucket.
- the execution time of the received service request is 2.4 ms.
- the bucket for execution time is 0-5 ms. So, the calculated execution time of the received service request is matched with each offset value (i.e., 0.0-1.0, 1.1-2.0, 2.1-3.0, 3.1-4.0 and 4.1-5.0) of the bucket 0-5 ms.
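- a quick check of that matching step in code (with offsets stored as (low, high) tuples, as assumed earlier):

```python
offsets = [(0.0, 1.0), (1.1, 2.0), (2.1, 3.0), (3.1, 4.0), (4.1, 5.0)]
execution_ms = 2.4

# The first offset range containing the execution time is the match.
match = next(((lo, hi) for lo, hi in offsets if lo <= execution_ms <= hi), None)
print(match)  # (2.1, 3.0) -- its index value is then incremented
```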
- the method includes on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, incrementing the index value corresponding to the at least one offset value.
- the execution time of the received service request is 2.4 ms.
- the method (400) includes recording (416) a value corresponding to the calculated execution time of the received service request in the histogram.
- the execution time of the received service request is 2.4 ms.
- the bucket for execution time is 0-5 ms.
- the offset value of the bucket for the received service request is 2.1-3.0 ms.
- the index value is 30.
- the method (400) includes on detecting the calculated execution time of the received service request does not match with the at least one offset value of the determined at least one bucket, finding (418) a value rounded up in a nearest bucket.
- a value rounded up in a nearest bucket is found for the calculated execution time of the received service request.
- the index value of the corresponding offset value of the nearest bucket is incremented. For example, for the service activation request, the execution time of the received service request is 4.099999 ms.
- the value of the bucket for the received service request is rounded to 4.1 ms.
- the offset values of the bucket 0-5 ms are 0.0-1.0, 1.1-2.0, 2.1-3.0, 3.1-4.0 and 4.1-5.0.
- the nearest bucket for the received service request is 4.1-5.0 ms.
- the index value for the offset value 4.1-5.0 ms of the bucket 0-5 ms is increased.
- the method (400) further includes, upon determining that the histogram is not initiated for the determined operation type of the received service request, calculating a histogram for the determined operation type of the received service request and recording a value corresponding to the calculated execution time of the received service request in the calculated histogram.
- the ERM (214) calculates the histogram for the determined operation type for the calculated latency value or execution time.
- the ERM (214) determines a range for the calculated latency value or execution time. A plurality of buckets is determined for the determined range.
- a value corresponding to the calculated execution time of the received service request is recorded in the calculated histogram.
- the service request for operation “SIM card activation” is received.
- the execution time for the SIM card activation is 120 secs.
- the histogram is created for the SIM card activation.
- the plurality of buckets for activation time comprises 0-1 min, 1-2 min, 2-3 min, 3-4 min, 4-5 min.
- the offset values are 0-10 sec, 11-20 sec, 21-30 sec, 31-40 sec, 41-50 sec, 51-60 sec.
- the offset values are 61-70 sec, 71-80 sec, 81-90 sec, 91-100 sec, 101-110 sec, 111-120 sec.
- the offset values are 121-130 sec, 131-140 sec, 141-150 sec, 151-160 sec, 161-170 sec, 171-180 sec.
- the offset values are 181-190 sec, 191-200 sec, 201-210 sec, 211-220 sec, 221-230 sec, 231-240 sec.
- the offset values are 241-250 sec, 251-260 sec, 261-270 sec, 271-280 sec, 281-290 sec, 291-300 sec.
- the received service request of SIM card activation has an execution time of 120 seconds. So, the execution time of the received service request falls in the bucket 1-2 min.
- the index value for the offset value 111-120 sec of the bucket 1-2 min is incremented to 1.
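- this "histogram not initiated" path can be sketched as follows. The snippet is illustrative only (the helper name and the 10-second offset granularity are assumptions): on first sight of an operation type, a histogram of 10-second offset ranges covering 0-300 seconds is created, and the 120-second execution time is recorded in the 111-120 sec range:

    def make_histogram(max_seconds=300, offset_s=10):
        # Create offset ranges 0-10, 11-20, ..., 291-300 seconds
        histogram = {}
        for start in range(0, max_seconds, offset_s):
            low = start + 1 if start else 0
            histogram[(low, start + offset_s)] = 0  # index value starts at zero
        return histogram

    histograms = {}
    operation_type, execution_s = "SIM card activation", 120

    if operation_type not in histograms:            # histogram not initiated yet
        histograms[operation_type] = make_histogram()

    for low, high in histograms[operation_type]:
        if low <= execution_s <= high:
            histograms[operation_type][(low, high)] += 1  # 111-120 sec becomes 1
            break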
- the method (400) includes maintaining a plurality of buckets for a plurality of operation types in the histogram.
- the plurality of buckets is maintained in the database (210).
- the plurality of buckets may have offset values and index values.
- the offset values of the plurality of buckets in the histogram refer to a plurality of ranges of the execution time of the plurality of operation types.
- the index values of the plurality of buckets in the histogram refer to a number of operations having at least one same offset value from the offset values.
- the plurality of buckets is used to categorize different ranges of data.
- the offset values define the starting points for each bucket, while the index values represent specific data points or positions within the buckets.
- Each index value indicates how many occurrences of a specific operation type fall into a particular bucket. Maintaining the plurality of buckets for the plurality of operation types in the histogram allows for detailed insights into the distribution of data and a better understanding of how different operation types behave across varying ranges of values. Furthermore, the plurality of operation types may have distinct characteristics, and the plurality of operation types are categorized and analyzed effectively using the plurality of buckets. The offset values help define the boundaries of each bucket, enhancing data organization. By tracking the plurality of operation types across the plurality of buckets, each operation type's performance metrics (e.g., response times or failure rates) can be monitored, facilitating issue diagnosis and performance optimization.
- performance metrics e.g., response times or failure rates
- Histograms with the plurality of buckets provide better visual representations of data distributions, making complex data relationships easier to understand.
- histogram for operation type such as “SIM card activation”
- plurality of buckets for activation time comprises 0-1 min, 1-2 min, 2-3 min, 3-4 min, 4-5 min.
- the offset values for the 0-1 min bucket are 0-10 sec, 11-20 sec, 21-30 sec, 31-40 sec, 41-50 sec and 51-60 sec.
- the offset values for the 1-2 min bucket are 61-70 sec, 71-80 sec, 81-90 sec, 91-100 sec, 101-110 sec and 111-120 sec.
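- the nested bucket/offset/index layout described above can be pictured as follows; the literal data structure is an illustrative assumption, not something specified by the disclosure:

    # Buckets of one minute, each holding six 10-second offset ranges; each
    # offset range carries an index value (the count of operations falling in it)
    sim_histogram = {
        "0-1 min": {(0, 10): 0, (11, 20): 0, (21, 30): 0,
                    (31, 40): 0, (41, 50): 0, (51, 60): 0},
        "1-2 min": {(61, 70): 0, (71, 80): 0, (81, 90): 0,
                    (91, 100): 0, (101, 110): 0, (111, 120): 0},
        # remaining buckets up to "4-5 min" (241-300 sec) follow the same pattern
    }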
- the method (400) further includes determining a total latency and a recent latency for the determined operation type of the received service request based on the histogram of the determined operation type of the received service request.
- the total latency is the summation of all latency values of the plurality of service requests of the same operation type.
- the recent latency is configurable and is based on the latencies observed for a latest number of counts.
- the ERM, upon receiving a plurality of service requests corresponding to the received service request, calculates the latency values or the execution times for each of the plurality of service requests.
- the ERM (214) plots the calculated latency values or execution times of each of the plurality of service requests on the calculated histogram for the determined operation type of the received service request.
- the recent latency may cover, for example, the last three counts or the last five counts.
- the recent latency for last three counts can be 1.9, 3.3, 0.7.
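- a minimal sketch (illustrative names; the window size stands in for the configurable number of counts) of maintaining the total latency and the recent latency per operation type:

    from collections import deque

    RECENT_COUNTS = 3                # configurable latest number of counts

    class LatencyTracker:
        def __init__(self):
            self.total_latency = 0.0                   # summation of all latencies
            self.recent = deque(maxlen=RECENT_COUNTS)  # keeps only the latest counts

        def add(self, latency):
            self.total_latency += latency
            self.recent.append(latency)

    tracker = LatencyTracker()
    for latency in (2.4, 4.1, 1.9, 3.3, 0.7):
        tracker.add(latency)
    print(tracker.total_latency)  # 12.4
    print(list(tracker.recent))   # [1.9, 3.3, 0.7] -> the example values above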
- the method (400) further includes maintaining an operation count for the plurality of service requests of the plurality of operation types in the database (210). On receiving a service request of the same operation type, the operation count is incremented. Further, the incremented operation count is maintained in the database (210).
- the operation count in service requests refers to the number of individual actions or operations performed while processing the service request. The operation count can be recorded as part of request tracking and used in performance analytics. For example, if a service provider processes 100 service activation requests in a day, and the operation "assign phone number" is performed for each request, the operation count for that specific action is 100.
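- a sketch (illustrative; the counter is an assumption standing in for the database-backed count) of maintaining and incrementing the operation count per operation type:

    from collections import Counter

    operation_counts = Counter()

    def on_service_request(operation_type):
        operation_counts[operation_type] += 1  # increment on same operation type

    for _ in range(100):
        on_service_request("assign phone number")
    print(operation_counts["assign phone number"])  # 100, as in the example above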
- the latency value is calculated for a plurality of microservices of a single application programming interface (API) or a plurality of APIs of the same operation type.
- API application programming interface
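- one possible way to picture this, as an illustrative assumption rather than the claimed implementation, is to sum the per-microservice latencies of a single API call of a given operation type:

    # Hypothetical per-microservice latencies (in ms) for one API call
    microservice_latencies_ms = {"auth": 0.6, "provision": 2.4, "notify": 1.1}
    api_latency_ms = sum(microservice_latencies_ms.values())  # 4.1 ms in total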
- the database (210) is configured to store a plurality of operation types, the operation start time, the operation end time, the operation count, the plurality of buckets, the calculated latency value, the last latency value, the offset values, the total latency, the recent latency, the index values of the plurality of buckets and the histogram of the plurality of operation types.
- the plurality of operation types comprises, but is not limited to, card activation, plan enrollment, feature add-ons, device configuration for service activation, data usage management, security, updation services, etc.
- FIG. 5 illustrates an exemplary computer system (500) in which or with which embodiments of the present disclosure may be implemented.
- the computer system may include an external storage device (510), a bus (520), a main memory (530), a read-only memory (540), a mass storage device (550), communication port(s) (560), and a processor (570).
- the processor (570) may include various modules associated with embodiments of the present disclosure.
- the communication port(s) (560) may be any of an RS-232 port for use with a modem-based dial-up connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fiber, a serial port, a parallel port, or other existing or future ports.
- the communication port(s) (560) may be chosen depending on a network, such as a Local Area Network (LAN), a Wide Area Network (WAN), or any network to which the computer system connects.
- LAN Local Area Network
- WAN Wide Area Network
- the main memory (530) may be random-access memory (RAM), or any other dynamic storage device commonly known in the art.
- the read-only memory (540) may be any static storage device(s), e.g., but not limited to, Programmable Read-Only Memory (PROM) chips for storing static information, e.g., start-up or Basic Input/Output System (BIOS) instructions for the processor (570).
- the mass storage device (550) may be any current or future mass storage solution, which can be used to store information and/or instructions.
- Exemplary mass storage device (550) includes, but is not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives (internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces), one or more optical discs, Redundant Array of Independent Disks (RAID) storage, e.g., an array of disks.
- PATA Parallel Advanced Technology Attachment
- SATA Serial Advanced Technology Attachment
- SSD solid-state drives
- USB Universal Serial Bus
- RAID Redundant Array of Independent Disks
- the bus (520) communicatively couples the processor (570) with the other memory, storage, and communication blocks.
- the bus (520) may be, e.g., a Peripheral Component Interconnect (PCI)/PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), Universal Serial Bus (USB), or the like, for connecting expansion cards, drives, and other subsystems, as well as other buses, such as a front side bus (FSB), which connects the processor (570) to the computer system.
- PCI Peripheral Component Interconnect
- PCI-X Peripheral Component Interconnect Extended
- SCSI Small Computer System Interface
- USB Universal Serial Bus
- operator and administrative interfaces e.g., a display, keyboard, joystick, and a cursor control device
- the operator and administrative interfaces may also be coupled to the bus (520) to support direct operator interaction with the computer system.
- Other operator and administrative interfaces can be provided through network connections connected through the communication port(s) (560).
- Components described above are meant only to exemplify various possibilities. In no way should the aforementioned exemplary computer system limit the scope of the present disclosure.
- the exemplary computer system (500) is configured to execute a computer program product comprising a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform a method for determining service response latency in a network. The method includes receiving a service request from a client; determining details corresponding to the received service request, where the details comprise an operation type, an operation start time and an operation end time; calculating a latency value or an execution time based on the determined operation start time and the determined operation end time; determining whether a histogram is initiated for the determined operation type of the received service request; on determining the histogram is initiated for the determined operation type of the received service request, determining at least one bucket from the plurality of buckets for the calculated execution time of the received service request; determining whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket; on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, incrementing an index value corresponding to the at least one offset value and recording a value corresponding to the calculated execution time of the received service request in the histogram; and, on detecting the calculated execution time of the received service request does not match with the at least one offset value of the determined at least one bucket, finding a value rounded up in a nearest bucket.
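- the method recited above can be outlined in a compact sketch. This is an illustrative outline only (names, bucket layout, and timestamps are assumptions; the rounding-up branch is shown in the earlier sketch):

    from datetime import datetime

    histograms = {}  # operation type -> {(low_ms, high_ms): index value}

    def handle(operation_type, start, end):
        # Latency value / execution time from operation start and end times
        latency_ms = (end - start).total_seconds() * 1000.0
        if operation_type not in histograms:         # histogram not initiated
            histograms[operation_type] = {
                (i + 0.1 if i else 0.0, i + 1.0): 0 for i in range(5)}
        for low, high in histograms[operation_type]:
            if low <= latency_ms <= high:            # offset value matched
                histograms[operation_type][(low, high)] += 1
                break
        return latency_ms

    handle("service activation",
           datetime(2024, 1, 1, 10, 0, 0, 0),
           datetime(2024, 1, 1, 10, 0, 0, 2400))     # 2.4 ms -> offset 2.1-3.0 ms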
- the present disclosure provides a technical advancement related to API latency monitoring. This advancement addresses the limitations of existing solutions through service response latency profiling.
- the disclosure involves analyzing latency metrics (for example, recent latency, total latency, and response time) to identify operations that cause delays and improve them accordingly.
- the APIs that take longer than expected are identified with the help of a histogram that is plotted based on the latency metrics.
- APIs that cause delays and potential bottlenecks are identified, which offers significant improvements in identifying potential service failures or network issues before they cause service outages or significant performance degradation.
- the present disclosure provides a technically advanced solution by providing a system and a method for service response latency profiling by monitoring latency metrics.
- the present disclosure described herein above has several technical advantages, as follows:
- the latency metrics are, for example, recent latency, total latency, and response time.
- developers are enabled to identify operations that cause delays and optimize them accordingly.
- the APIs that take longer than expected are identified with the help of the histogram that is plotted based on the latency metrics.
- APIs that cause delays and potential bottlenecks in the system are identified. This may reduce debugging time and efforts required for the system optimization.
- the problems such as potential service failures or network issues are easily identified before they cause service outages or significant performance degradation. This reduces the risk of system downtime.
- monitoring of latency metrics of APIs enables developers to proactively identify performance issues before users are impacted. This ensures a smooth and responsive application experience and an improved user experience.
- the resource allocation can be planned based on the latency monitoring data. This helps to effectively scale microservices for a single API or a group of APIs.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Pure & Applied Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Algebra (AREA)
- Environmental & Geological Engineering (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
A system (108) and method (400) for determining service response latency is described. An execution time or latency of a service request is calculated based on the start time and end time of the operation of the service request. The calculated execution time is plotted in one of the buckets of a histogram of the service request. Upon determining that the histogram is not initialized for the service request, the histogram for a specified bucket is initialized. Upon determining that an exact bucket is not found in the histogram, a value rounded up to the nearest bucket is found. When the exact bucket is found, a bucket index value is incremented for the operation count. The calculated latencies for each similar service request are plotted in a corresponding histogram over time to monitor the response times during handling of the service requests. The histogram provides a recent latency and a total latency for the service requests.
Description
SYSTEM AND METHOD FOR SERVICE RESPONSE LATENCY
PROFILING
RESERVATION OF RIGHTS
[0001] A portion of the disclosure of this patent document contains material, which is subject to intellectual property rights such as, but are not limited to, copyright, design, trademark, Integrated Circuit (IC) layout design, and/or trade dress protection, belonging to Jio Platforms Limited (JPL) or its affiliates (herein after referred as owner). The owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights whatsoever. All rights to such intellectual property are fully reserved by the owner.
TECHNICAL FIELD
[0002] The present disclosure relates generally to the field of communication systems. More particularly, the present disclosure relates to a system and a method for service response latency profiling.
DEFINITION
[0003] As used in the present disclosure, the following terms are generally intended to have the meaning as set forth below, except to the extent that the context in which they are used to indicate otherwise.
[0004] The term “Application Programming Interface” as used herein, refers to a set of rules and protocols that allows one software application to communicate and interact with another. The APIs define the methods and data formats that applications can use to request and exchange information with each other, typically over a network.
[0005] The term “Event Routing Manager (ERM)” as used herein, refers to a component or system responsible for managing and directing network events, notifications, or messages within the network. This includes routing events to appropriate network devices, applications, or systems based on predefined rules, policies, or conditions.
[0006] The term “Client” as used herein, refers to a device or software application that initiates communication with a server or service. Clients are fundamental components in client-server architecture, where they interact with servers to request services, resources, or data.
[0007] The term “Subscriber service” as used herein, refers to a service model where users or entities subscribe to access specific resources, features, or content provided by a service provider. This term is commonly used in telecommunications, media, software, and various other industries where services are offered on a subscription basis.
[0008] The term “Network performance analysis” as used herein, refers to the process of evaluating and assessing the operational efficiency, reliability, and overall effectiveness of a computer network. The network performance analysis involves examining various metrics, parameters, and behaviors of the network to understand its performance characteristics and identify areas for improvement. The network performance analysis is to ensure that the network meets its intended operational requirements, such as throughput, latency, packet loss, bandwidth utilization, and availability, among others. The network performance analysis typically involves using specialized tools and techniques to monitor, measure, and interpret data related to network performance, enabling network administrators and engineers to optimize network resources, troubleshoot issues, and enhance the overall performance and user experience.
[0009] The term “Network latency” as used herein, refers to the time delay that occurs when data packets travel from one point to another in a network. The network latency is typically measured in milliseconds (ms) and represents the total
time it takes for a data packet to be sent from the source device to the destination device and for the acknowledgment of that packet to be received back at the source.
[0010] The term “Response time” as used herein, refers to the total time it takes for a system to respond to a user's request or action.
[0011] The term “Microservices” as used herein, refers to a software architectural style that structures an application as a collection of loosely coupled services.
[0012] The term “Networking delay” as used herein, also referred to as network latency, refers to the time it takes for data packets to travel from one point in a network to another.
[0013] The term “Network bottleneck” as used herein, refers to a point in a network where the flow of data is constrained or limited, causing slower transmission speeds and potentially impacting overall network performance. Bottlenecks can occur due to various factors and can significantly degrade network efficiency and user experience. The factors causing the network bottleneck comprise network congestion, bandwidth limitations, hardware limitations, software limitations and single point of failure, etc.
[0014] The term “Capacity planning” as used herein, refers to forecasting and preparing for the future network requirements to ensure that the network infrastructure can support current and future demands efficiently. It's a proactive approach to scaling and optimizing network resources to meet performance goals while minimizing costs and downtime.
[0015] The term “Average latency” as used herein, refers to the average time it takes for data packets to travel from a source to a destination and back again. The average latency is calculated by taking multiple latency measurements over a period of time and computing the arithmetic mean (average) of those measurements. The average latency provides a general indication of the typical delay experienced in transmitting data across the network.
[0016] The term “Peak latency” as used herein, refers to a maximum latency or delay experienced by data packets during periods of high network usage or under specific conditions. Unlike average latency, which represents the typical delay, the peak latency highlights the worst-case scenario where latency spikes occur beyond normal levels.
[0017] The term “Resource allocation” as used herein, refers to strategic distribution and management of network resources such as bandwidth, computing power, storage, and network devices to optimize performance, efficiency, and reliability.
[0018] The term “Histogram” as used herein, refers to a graphical representation of the distribution of numerical data. It consists of a series of contiguous bars, where the height of each bar represents the frequency or count of data points falling into specific intervals. The histograms are often used to visualize and analyze various performance metrics and characteristics of network data.
[0019] The term “Histogram bucket”, as used herein, refers to a specific range or interval into which data points are grouped. The histogram bucket is a segment of a histogram that represents a specific range of values, used to visualize the distribution of data across those ranges. Each bucket's size and placement are determined based on the data being analyzed and the insights to gain from the histogram.
[0020] The term “Offset value in histogram” as used herein, refers to the starting point of the range it represents along the axis of the histogram.
[0021] The term “Index value in histogram” as used herein, refers to either the position of individual data points within a dataset or the position of bins (buckets) along the x-axis of the histogram.
[0022] The term “operation type” as used herein, refers to the categorization or classification of fundamental operations performed by algorithms or processes.
[0023] The term “operation count” as used herein, refers to the number of fundamental operations or computational steps performed by an algorithm or process.
[0024] The term “API performance latency metrics” as used herein, refers to the measurements used to assess the responsiveness and efficiency of APIs in terms of the time it takes to process requests and return responses. The API performance latency metrics comprises response time, latency, throughput, error rates, etc.
[0025] The term “Operation start time” as used herein, refers to the timestamp or moment when a specific operation or task begins its execution within a system or an application.
[0026] The term “Operation end time” as used herein, refers to the timestamp or moment when a specific operation, task, or process completes its execution within a system or application.
[0027] The term “Execution time” as used herein, refers to the amount of time taken by a specific process, task, or operation to complete its execution.
[0028] The term “Histogram bucket” as used herein, refers to a specific range or interval into which data points are grouped for the purpose of creating the histogram.
[0029] The term "value rounded up" as used herein, refers to the rounding of data points displayed on the y-axis of the histogram.
[0030] The term “API call” as used herein, refers to a request made by one software application to another through the API.
BACKGROUND
[0031] The following description of related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section be used only to enhance the understanding of the reader with respect to the present disclosure, and not as admissions of prior art.
[0032] In a distributed system, it becomes challenging to pinpoint the exact microservice causing performance issues. This is because multiple microservices/application programs concurrently run on multiple nodes of the system. High response times may be indicative of potential service failures or network issues. Without monitoring, such problems might go unnoticed until they escalate and cause service outages or significant performance degradation, increasing downtime risk for the system. Also, slow response times and high latency negatively impact the user experience. Further, application programming interface (API) latency monitoring data is essential for capacity planning. Without it, it might be difficult to obtain accurate insights into resource usage, making it harder to scale microservices effectively. Overall, these problems impede performance, scalability, and reliability, making it essential to identify and resolve issues promptly for a better user experience, and to make informed decisions regarding capacity planning and resource allocation.
[0033] In a distributed system, identifying the precise microservice responsible for performance issues is challenging due to the concurrent operation of multiple microservices across various system nodes. High response times may signal potential service failures or network problems. Without monitoring, these issues could remain undetected until they escalate, leading to service outages or significant performance deterioration, thereby increasing system downtime risk. Moreover, slow response times and high latency adversely affect user experience. Additionally, monitoring application programming interface (API) latency data is crucial for effective capacity planning. Absence of such data complicates obtaining accurate insights into resource utilization, hindering effective microservice scaling. These challenges collectively obstruct performance, scalability, and reliability. It is essential to identify and resolve issues promptly for a better user experience and make well- informed decisions on capacity planning and resource allocation.
OBJECTS
[0034] Some of the objects of the present disclosure, which at least one embodiment herein satisfies, are as follows:
[0035] An object of the present disclosure is to provide a system and a method for service response latency profiling.
[0036] Another object of the present disclosure is to perform service response latency profiling for a single API, or a group of APIs based on API performance latency metrics.
[0037] Another object of the present disclosure is to identify operations that cause delay by measuring and analyzing response times of individual microservices.
[0038] An object of the present disclosure is to identify potential bottlenecks in the system.
[0039] Yet another object of the present disclosure is to monitor latency to proactively address performance issues before users are impacted, thereby ensuring a smooth and responsive application experience.
[0040] Yet another object of the present disclosure is to plan resource allocation based on latency monitoring data.
[0041] Other objects and advantages of the present disclosure will be more apparent from the following description, which is not intended to limit the scope of the present disclosure.
SUMMARY
[0042] In an exemplary embodiment, a method for determining service response latency in a network, the method includes receiving a service request from a client, determining details corresponding to the received service request, where the details comprise an operation type, an operation start time and an operation end time, calculating a latency value or an execution time based on the determined operation start time and the determined operation end time, determining whether a histogram is initiated for the determined operation type of the received service request, on determining the histogram is initiated for the determined operation type of the received service request, determining at least one bucket from the plurality of buckets for the calculated execution time of the received service request, determining whether
the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket, on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, incrementing an index value corresponding to the at least one offset value, and recording a value corresponding to the calculated execution time of the received service request in the histogram, where on detecting the calculated execution time of the received service request does not match with the at least one offset value of the determined at least one bucket, finding a value rounded up in a nearest bucket.
[0043] In some embodiments, the method further comprises on determining that the histogram is not initiated for the determined operation type of the received service request, calculating the histogram for the determined operation type of the received service request and recording a value corresponding to the calculated execution time of the received service request in the calculated histogram.
[0044] In some embodiments, the method further comprises determining a total latency and a recent latency for the determined operation type of the received service request based on the histogram of the determined operation type of the received service request, where the total latency is summation of all latency values of the plurality of service requests of the same operation type and the recent latency is configurable based on occurrence of latencies for latest number of counts.
[0045] In some embodiments, the method further comprises maintaining a plurality of buckets for a plurality of operation types in the histogram having offset values and index values, where the offset values of the plurality of buckets in the histogram refer to a plurality of ranges of the execution time of the plurality of operation types, and the index values of the plurality of buckets in the histogram refer to a number of operations having at least one same offset value from the offset values.
[0046] In some embodiments, the method further comprises maintaining an operation count for the plurality of service requests of the plurality of operation types,
where on receiving the service request for a same operation type, incrementing the operation count.
[0047] In some embodiments, the latency value is calculated for a plurality of microservices of a single application programming interface (API) or a plurality of API of the same operation type.
[0048] In some embodiments, the database is configured to store a plurality of operation types, the operation start time, the operation end time, the operation count, the calculated latency value, the last latency value, the offset values, and the index values of the plurality of buckets.
[0049] In another exemplary embodiment, a system for determining service response latency, the system includes a receiving unit configured to receive a service request from a client, a determining unit configured to determine details corresponding to the received service request, where the details comprise an operation type, an operation start time and an operation end time, a calculating unit configured to calculate a latency value, or an execution time based on the determined operation start time and the determined operation end time, the determining unit configured to determine whether a histogram is initiated for the determined operation type of the received service request, on determining the histogram is initiated for the determined operation type of the received service request, the determining unit is configured to determine at least one bucket from the plurality of buckets for the calculated execution time of the received service request, the determining unit is configured to determine whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket, on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, a processing unit is configured to increment an index value corresponding to the at least one offset value, and record a value corresponding to the calculated execution time of the received service request in the histogram, where on detecting the calculated execution time of the received service request does not match with the at least one offset value
of the determined at least one bucket, the processing unit is configured to find a value rounded up in a nearest bucket.
[0050] In some embodiments, on determining that the histogram is not initiated for the determined operation type of the received service request, the calculating unit is configured to calculate the histogram for the determined operation type of the received service request. The processing unit is configured to record a value corresponding to the calculated execution time of the received service request in the calculated histogram.
[0051] In some embodiments, the determining unit is configured to determine a total latency and a recent latency for the determined operation type of the received service request based on the histogram of the determined operation type of the received service request. The total latency is summation of all latency values of the plurality of service requests of the same operation type and the recent latency is configurable based on occurrence of latencies for latest number of counts.
[0052] In some embodiments, the processing unit is configured to maintain a plurality of buckets for a plurality of operation types in the histogram having offset values and index values. The offset values of the plurality of buckets in the histogram refer to a plurality of ranges of the execution time of the plurality of operation types, and the index values of the plurality of buckets in the histogram refer to a number of operations having at least one same offset value from the offset values.
[0053] In some embodiments, the processing unit is configured to maintain an operation count for the plurality of service requests of the plurality of operation types. On receiving the service request for a same operation type, the processing unit is configured to increment the operation count.
[0054] In some embodiments, the latency value is calculated for a plurality of microservices of a single application programming interface (API) or a plurality of APIs of the same operation type.
[0055] In some embodiments, the database is configured to store a plurality of operation types, the operation start time, the operation end time, the operation count, the plurality of buckets, the calculated latency value, the last latency value, the offset values, the total latency, the recent latency, the index values of the plurality of buckets and the histogram of the plurality of operation types.
[0056] In yet another exemplary embodiment, a user equipment is communicatively coupled with a system. The coupling includes steps of receiving, by the system, a connection request, sending, by the system, an acknowledgment of the connection request to the user equipment, and transmitting a plurality of signals in response to the connection request, where the system is configured for determining service response latency in a network.
[0057] In yet another exemplary embodiment, the present invention discloses a computer program product comprising a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform a method for determining service response latency in a network, the method includes receiving a service request from a client, determining details corresponding to the received service request, where the details comprise an operation type, an operation start time and an operation end time, calculating a latency value or an execution time based on the determined operation start time and the determined operation end time, determining whether a histogram is initiated for the determined operation type of the received service request, on determining the histogram is initiated for the determined operation type of the received service request, determining at least one bucket from the plurality of buckets for the calculated execution time of the received service request, determining whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket, on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, incrementing an index value corresponding to the at least one offset value, and recording a value corresponding to the calculated execution time of the received service request in the histogram, where
on detecting the calculated execution time of the received service request does not match with the at least one offset value of the determined at least one bucket, finding a value rounded up in a nearest bucket.
[0058] The foregoing general description of the illustrative embodiments and the following detailed description thereof are merely exemplary aspects of the teachings of this disclosure, and are not restrictive.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWING
[0059] The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes disclosure of electrical components, electronic components or circuitry commonly used to implement such components.
[0060] FIG. 1 illustrates an exemplary network architecture for implementing a system, in accordance with an embodiment of the present disclosure.
[0061] FIG. 2A illustrates an exemplary block diagram of the system, in accordance with an embodiment of the present disclosure.
[0062] FIG. 2B illustrates an exemplary system architecture for implementing service response latency profiling, in accordance with an embodiment of the present disclosure.
[0063] FIG. 3 illustrates an exemplary flow diagram for service response latency profiling, in accordance with an embodiment of the present disclosure.
[0064] FIG. 4 illustrates an exemplary flow diagram of a method for determining service response latency in a network, in accordance with an embodiment of the present disclosure.
[0065] FIG. 5 illustrates an exemplary computer system in which or with which embodiments of the present disclosure may be implemented.
[0066] The foregoing shall be more apparent from the following more detailed description of the disclosure.
LIST OF REFERENCE NUMERALS
100 Network architecture
102 User
104 Computing device/User Equipment
106 Network
108 System
200A Block Diagram
200B System Architecture
202 Processor
204 Memory
206 Interface(s)
208 Processing Engine(s)
210 Database
212 Client
214 Event routing manager (ERM)
216 Subscriber Service
222 Receiving Unit
224 Determining Unit
226 Calculating Unit
228 Processing Unit
300 Flow Diagram
400 Method Flow
500 Computing System
510 External Storage Device
520 Bus
530 Main Memory
540 Read Only Memory
550 Mass Storage Device
560 Communication Port
570 Processor
DETAILED DESCRIPTION
[0067] In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address any of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein. Example embodiments of the present disclosure are described below, as illustrated in various drawings in which like reference numerals refer to the same parts throughout the different drawings.
[0068] The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.
[0069] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one
of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
[0070] Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0071] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive like the term “comprising” as an open transition word without precluding any additional or other elements.
[0072] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included
in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0073] The terminology used herein is to describe particular embodiments only and is not intended to be limiting the disclosure. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any combinations of one or more of the associated listed items. It should be noted that the terms “mobile device”, “user equipment”, “user device”, “communication device”, “device” and similar terms are used interchangeably for the purpose of describing the invention. These terms are not intended to limit the scope of the invention or imply any specific functionality or limitations on the described embodiments. The use of these terms is solely for convenience and clarity of description. The invention is not limited to any particular type of device or equipment, and it should be understood that other equivalent terms or variations thereof may be used interchangeably without departing from the scope of the invention as defined herein.
[0074] While considerable emphasis has been placed herein on the components and component parts of the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the disclosure. These and other changes in the preferred embodiment as well as other embodiments of the disclosure will be apparent to those skilled in the art from the disclosure herein,
whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the disclosure and not as a limitation.
[0075] In a distributed system, it becomes challenging to pinpoint the exact microservice causing performance issues. This is because multiple microservices/application programs concurrently run on multiple nodes of the system. High response times may be indicative of potential service failures or network issues. Without monitoring, such problems might go unnoticed until they escalate and cause service outages or significant performance degradation, increasing downtime risk for the system. Also, slow response times and high latency negatively impact the user experience. Further, application programming interface (API) latency monitoring data is essential for capacity planning. Without it, it might be difficult to obtain accurate insights into resource usage, making it harder to scale microservices effectively. Overall, these problems impede performance, scalability, and reliability, making it essential to identify and resolve issues promptly for a better user experience, and to make informed decisions regarding capacity planning and resource allocation.
[0076] Accordingly, there is a need for systems and methods for service response latency profiling.
[0077] The present disclosure aims to overcome the above-mentioned and other existing problems in this field of technology by providing a system and a method for service response latency profiling.
[0078] Hereinafter, exemplary embodiments of the present disclosure will be described with reference to the accompanying drawings.
[0079] FIG. 1 illustrates an example network architecture (100) for implementing a system (108), in accordance with an embodiment of the present disclosure.
[0080] As illustrated in FIG. 1, the network architecture (100) comprises one or more computing devices (104-1, 104-2... 104-N) that may be connected to the system (108) through a network (106). A person of ordinary skill in the art will
understand that the one or more computing devices (104-1, 104-2... 104-N) may be collectively referred as computing devices (104) and individually referred as a computing device (104). One or more users (102-1, 102-2... 102-N) may provide one or more requests to the system (108). A person of ordinary skill in the art will understand that the one or more users (102-1, 102-2... 102-N) may be collectively referred as users (102) and individually referred as a user (102). Further, the computing devices (104) may also be referred as a user equipment (UE) (104) or as UEs (104) throughout the disclosure.
[0081] In an embodiment, the computing device (104) may include, but not be limited to, a mobile, a laptop, etc. Further, the computing device (104) may include one or more in-built or externally coupled accessories including, but not limited to, a visual aid device such as a camera, audio aid, microphone, or keyboard. Furthermore, the computing device (104) may include a mobile phone, smartphone, virtual reality (VR) devices, augmented reality (AR) devices, a laptop, a general- purpose computer, a desktop, a personal digital assistant, a tablet computer, and a mainframe computer. Additionally, input devices for receiving input from the user (102) such as a touchpad, touch-enabled screen, electronic pen, and the like may be used.
[0082] In an embodiment, the network (106) may include at least one of a Fifth Generation (5G) network, Sixth Generation (6G) network, or the like. The network (106) may enable the user equipment (104) to communicate with other devices in the network architecture (100) and/or with the system (108). The network (106) may include a wireless card or some other transceiver connection to facilitate this communication. In another embodiment, the network (106) may be implemented as, or include any of a variety of different communication technologies such as a wide area network (WAN), a local area network (LAN), a wireless network, a mobile network, a Virtual Private Network (VPN), the Internet, the Public-Switched Telephone Network (PSTN), or the like.
[0083] In an embodiment, the network (106) may include, by way of example but not limitation, at least a portion of one or more networks having one or more
nodes that transmit, receive, forward, generate, buffer, store, route, switch, process, or a combination thereof, etc. one or more messages, packets, signals, waves, voltage or current levels, some combination thereof, or so forth. The network (106) may also include, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet- switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof.
[0084] In an embodiment, the user equipment (104) is communicatively coupled with the system (108). The system (108) may receive a connection request from the UE (104). The system (108) may send an acknowledgment of the connection request to the UE (104). The UE (104) may transmit a plurality of signals in response to the connection request. The system (108) may be configured for determining service response latency in the network (106).
[0085] Although FIG. 1 shows exemplary components of the network architecture (100), in other embodiments, the network architecture (100) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 1. Additionally, or alternatively, one or more components of the network architecture (100) may perform functions described as being performed by one or more other components of the network architecture (100).
[0086] FIG. 2A illustrates an exemplary block diagram (200A) of the system (108), in accordance with an embodiment of the present disclosure.
[0087] Referring to FIG. 2A, in an embodiment, the system (108) may include one or more processor(s) (202). The one or more processor(s) (202) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that process data based on operational instructions. Among other capabilities, the one or more processor(s) (202) may be configured to fetch and execute computer-
readable instructions stored in a memory (204) of the system (108). The memory (204) may be configured to store one or more computer-readable instructions or routines in a non-transitory computer readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory (204) may comprise any non-transitory storage device including, for example, volatile memory such as random-access memory (RAM), or non-volatile memory such as erasable programmable read only memory (EPROM), flash memory, and the like.
[0088] In an embodiment, the system (108) may include an interface(s) (206). The interface(s) (206) may comprise a variety of interfaces, for example, interfaces for data input and output devices (I/O), storage devices, and the like. The interface(s) (206) may facilitate communication through the system (108). The interface(s) (206) may also provide a communication pathway for one or more components of the system (108). Examples of such components include, but are not limited to, processing engine(s) (208) and a database (210). Further, the processing engine(s) (208) may include one or more engine(s) such as, but not limited to, an input/output engine, an identification engine and an optimization engine.
[0089] In an embodiment, the processing engine(s) (208) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) (208). In examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processing engine(s) (208) may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processing engine(s) (208) may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) (208). In such examples, the system may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the
machine-readable storage medium may be separate but accessible to the system and the processing resource. In other examples, the processing engine(s) (208) may be implemented by electronic circuitry.
[0090] In an embodiment, the database (210) includes data that may be either stored or generated as a result of functionalities implemented by any of the components of the processor (202) or the processing engines (208). In an embodiment, the database (210) may be separate from the system (108). In an embodiment, the database (210) may be indicative of including, but not limited to, a relational database, a distributed database, a cloud-based database, or the like.
[0091] In an embodiment, the processing engine(s) (208) may comprise a plurality of units. The plurality of units of the system (108) may include, but is not limited to, a receiving unit (222), a determining unit (224), a calculating unit (226) and a processing unit (228). In an aspect, the processing engine may be implemented in an event routing manager (ERM). In an aspect, the system (108) may include the ERM.
[0092] The receiving unit (222) is configured to receive a service request from a client. In an aspect, the service request refers to commands that users or systems can initiate to access or manage services. For example, the service requests in a communication network comprise, but are not limited to, activation requests, deactivation requests, modification requests, support requests, service inquiry requests, network management requests, emergency service requests, etc. The activation requests may comprise new service activation, feature activation, etc. The deactivation requests may comprise service cancellation, feature deactivation, etc. The modification requests may comprise plan change, number porting, etc. The support requests may comprise technical support, billing inquiries, etc. The service inquiry requests may comprise service availability, network coverage check, etc. The network management requests may comprise performance monitoring, network configuration changes, etc. The emergency service requests may comprise emergency service activations, network issue reporting, etc.
[0093] The determining unit (224) is configured to determine details corresponding to the received service request. The details comprise an operation type, an operation start time and an operation end time. In an aspect, the operation type in the service request refers to the specific action or function that a user or a system is requesting from a service provider. In an aspect, for the received service request corresponding to database related requests, the operation type includes data creation, data addition, data delete, data validation, termination, data reading, data writing, etc. For example, in a provisioning service, the operation type includes activating a new mobile line for a customer. This involves setting up a new service, assigning a phone number, and configuring the necessary settings. In a modification service, the operation type includes changing a customer's service plan. This involves upgrading or downgrading the existing plan based on customer request, adjusting features and pricing accordingly. In an aspect, the operation start time in service requests refers to the specific moment when a requested service action or operation begins. The operation start time is used for tracking, managing, and reporting the progress of service requests. For example, if a customer requests to activate a new mobile line at 10:00 AM, the operation start time may be logged as 10:00 AM, marking the beginning of the provisioning process. In an aspect, operation end time in service requests refers to the specific moment when a requested service action or operation is completed. The operation end time is important in managing service requests effectively, aiding in performance tracking and customer satisfaction. For example, if a customer requests to activate a new mobile line at 10:00 AM and the activation is completed at 10:15 AM, the operation end time would be logged as 10:15 AM, indicating the duration of the request.
[0094] The calculating unit (226) is configured to calculate a latency value, or an execution time, based on the operation start time and the operation end time. In an aspect, the latency refers to the time delay experienced in processing a service request from initiation to completion. The latency is a performance metric that affects the overall user experience and system efficiency. For example, if a customer submits a request to activate a new mobile line and the time from submission to completion is 200 ms, the latency for that service request is 200 ms. In an aspect, the execution time refers to the total time taken to complete a service request, from when it is initiated until the requested action is fully processed and confirmed. For example, if a customer requests to activate a new mobile line at 10:00 AM and receives confirmation that the service is active at 10:05 AM, the execution time for that service request may be 5 minutes.
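By way of a non-limiting illustration, the following is a minimal Python sketch of this calculation; the function name, timestamp format, and use of the datetime module are assumptions introduced for illustration and are not prescribed by the present disclosure.

```python
from datetime import datetime

def execution_time_ms(start: str, end: str, fmt: str = "%H:%M:%S") -> float:
    # Latency (execution time) = operation end time - operation start time.
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() * 1000.0

# Example from the text: activation requested at 10:00:00 AM and confirmed
# at 10:05:00 AM gives an execution time of 5 minutes (300,000 ms).
print(execution_time_ms("10:00:00", "10:05:00"))  # 300000.0
```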
[0095] The processing unit (228) is configured to maintain an operation count for the plurality of service requests of the plurality of operation types. On receiving a service request of the same operation type, the processing unit (228) is configured to increment the operation count. In an aspect, the operation count in service requests refers to the number of individual actions or operations performed during the processing of the service request. The operation count can be recorded as part of request tracking and used in performance analytics. For example, if a service provider processes 100 service activation requests in a day, and the operation "assign phone number" is performed for each request, the operation count for that specific action is 100.
[0096] The determining unit (224) is configured to determine whether a histogram is initiated for the determined operation type of the received service request. On determining that the histogram is not initiated for the determined operation type of the received service request, the calculating unit is configured to calculate the histogram for the determined operation type of the received service request for the calculated latency value or execution time. In the histogram, a plurality of buckets, having offset values and index values, is maintained for at least one operation type in the database (210). The offset values of the plurality of buckets in the histogram refer to a plurality of ranges of the execution time of the at least one operation type. The index values of the plurality of buckets in the histogram refer to a number of operations having at least one same offset value from the plurality of offset values. For example, for the service activation request, the bucket for execution time is 0-5 ms. The offset values of the bucket are (0.0-1.0) ms, (1.1-2.0) ms, (2.1-3.0) ms, (3.1-4.0) ms, and (4.1-5.0) ms. For the offset value 0.0-1.0 ms, the number of service activations having that offset value is 25, so the index value for the offset value 0.0-1.0 ms is 25. Similarly, for offset value = 1.1-2.0 ms, the index value is 40. For offset value = 2.1-3.0 ms, the index value is 30. For offset value = 3.1-4.0 ms, the index value is 15. For offset value = 4.1-5.0 ms, the index value is 5.
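A minimal in-memory representation of such a histogram is sketched below in Python; the dictionary layout and variable names are illustrative assumptions, not the structure mandated by the disclosure.

```python
# One histogram per operation type. Each offset is a (low, high) range of
# execution time in ms; the mapped value is the index value, i.e. the count
# of operations whose execution time fell in that range. The counts mirror
# the service-activation example above.
histograms = {
    "service_activation": {
        (0.0, 1.0): 25,
        (1.1, 2.0): 40,
        (2.1, 3.0): 30,
        (3.1, 4.0): 15,
        (4.1, 5.0): 5,
    }
}

requests_in_bucket = sum(histograms["service_activation"].values())
print(requests_in_bucket)  # 115 service activations recorded in the 0-5 ms bucket
```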
[0097] Furthermore, on determining that the histogram is initiated for the determined operation type of the received service request, the determining unit (224) is configured to determine at least one bucket from the plurality of buckets for the calculated execution time of the received service request. For example, for the service activation request, the buckets for execution time are (0.0-5.0) ms, (5.1-10.0) ms, (10.1-15.0) ms, and (15.1-20.0) ms. The operation start time of the received service request is 10:00:00.000000 AM and the operation end time is 10:00:00.002400 AM. The execution time of the received service request is the operation end time minus the operation start time (i.e., 10:00:00.002400 - 10:00:00.000000) = 2.4 ms. So, the bucket for the received service request is 0.0-5.0 ms.
[0098] The determining unit (224) is configured to determine whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket. On detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, the processing unit (228) is configured to increment the index value corresponding to the at least one offset value. For example, for the service activation request, the bucket for execution time is 0-5 ms. The execution time of the received service request is 2.4 ms. So, the offset value of the bucket for the received service request is 2.1-3.0 ms. For offset value = 2.1-3.0 ms, the index value is 30. After receiving the service activation request, the index value for offset value 2.1-3.0 ms is increased to 31.
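The bucket and offset lookup, together with the index increment, may be sketched as follows; this is one possible implementation under the assumption that offsets are stored as (low, high) ranges, and the helper name record_latency is hypothetical.

```python
def record_latency(histogram: dict, exec_ms: float) -> None:
    # Find the offset range that contains the execution time and increment
    # its index value; e.g. 2.4 ms falls in (2.1, 3.0), whose count 30 -> 31.
    for (low, high) in histogram:
        if low <= exec_ms <= high:
            histogram[(low, high)] += 1
            return
    # No exact match: the value is rounded up into a nearest offset,
    # as described in paragraph [0099].
    raise LookupError(f"no offset contains {exec_ms} ms")

offsets = {(0.0, 1.0): 25, (1.1, 2.0): 40, (2.1, 3.0): 30, (3.1, 4.0): 15, (4.1, 5.0): 5}
record_latency(offsets, 2.4)
print(offsets[(2.1, 3.0)])  # 31
```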
[0099] On detecting that the calculated execution time of the received service request does not match with the offset value of the determined at least one bucket, the processing unit (228) is configured to find a value rounded up in a nearest bucket. The processing unit (228) is configured to increment the index value of the corresponding offset value of the nearest bucket. For example, for the service
activation request, the execution time of the received service request is 5.099 ms. So, the value of the bucket for the received service request is rounded to 5.1 ms. The nearest bucket for the received service request is 5.1-10.0 ms. The index value for the offset value 5.1-6.0 ms of the bucket 5.1-10.0 ms is increased.
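The round-up step may be sketched as follows; the assumption that the nearest bucket is the one whose lower bound is the smallest value not below the execution time is an illustrative reading of the example above.

```python
def round_up_to_offset(offsets, exec_ms):
    # exec_ms fell in a gap between offset ranges; round it up into the
    # nearest offset whose lower bound is >= exec_ms.
    candidates = [(low, high) for (low, high) in offsets if low >= exec_ms]
    return min(candidates)  # tuples compare by lower bound first

offsets = [(4.1, 5.0), (5.1, 6.0), (6.1, 7.0)]
# 5.099 ms matches no offset exactly; it is rounded up to 5.1 ms and lands
# in the 5.1-6.0 ms offset of the 5.1-10.0 ms bucket.
print(round_up_to_offset(offsets, 5.099))  # (5.1, 6.0)
```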
[00100] The determining unit (224) is configured to determine a total latency and a recent latency based on the histogram of the determined operation type of the received service request. The total latency is the summation of all latency values of the plurality of service requests of the same operation type. For example, if the latencies for the service activation request are 2.1, 2.4, 3.5, 4.9, 1.9, 3.3, and 0.7, the total latency for the service activation request is (2.1 + 2.4 + 3.5 + 4.9 + 1.9 + 3.3 + 0.7) = 18.8. Further, the recent latency is configurable based on the occurrence of latencies for the latest number of counts. For example, the recent latency may cover the last three counts or the last five counts. Using the above example, the recent latency for the last three counts is 1.9, 3.3, 0.7.
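A short sketch of the two aggregates follows, assuming the recent latency is kept in a fixed-length window; the use of collections.deque is an implementation choice for illustration, not part of the disclosure.

```python
from collections import deque

latencies = [2.1, 2.4, 3.5, 4.9, 1.9, 3.3, 0.7]  # latency values from the example

total_latency = sum(latencies)   # 18.8 (up to floating-point rounding)

RECENT_COUNT = 3                 # configurable "latest number of counts"
recent_latency = deque(latencies, maxlen=RECENT_COUNT)  # keeps the last three
print(total_latency, list(recent_latency))  # 18.8 [1.9, 3.3, 0.7]
```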
[00101] Furthermore, the latency value is calculated for a plurality of microservices of a single application programming interface (API) or a plurality of APIs of the same operation type. In an aspect, a single API in the microservices represents one specific service or functionality. Each microservice is designed to perform a distinct task, such as user authentication, payment processing, or data retrieval. For example, a user service API may handle user registration and login. In an aspect, a group of APIs in the microservices provides a range of functionalities to support various services. For example, the group of APIs in a telecom application comprises a user management API, a billing API, a call management API, a messaging API, a network management API, and a service plan management API. The user management API handles user registration, authentication, and profile management. The billing management API manages billing cycles and payment processing. The call management API controls call setup, termination, and monitoring. The messaging API facilitates sending and receiving of messages. The network management API monitors network performance, manages configurations, and detects network outages. The service plan API manages different service plans and subscription details for users.
[00102] In an aspect, the database (210) is configured to store a plurality of operation types, the operation start time, the operation end time, the operation count, the plurality of buckets, the calculated latency value, the last latency value, the offset values, the total latency, the recent latency, the index values of the plurality of buckets and the histogram of the plurality of operation types.
[00103] Although FIG. 2A shows exemplary components of the system (108), in other embodiments, the system (108) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 2A. Additionally, or alternatively, one or more components of the system (108) may perform functions described as being performed by one or more other components of the system (108).
[00104] FIG. 2B illustrates an exemplary system architecture (200B) for implementing service response latency profiling, in accordance with an embodiment of the present disclosure.
[00105] As shown in FIG. 2B, the system architecture (200B) includes a client/service (212), an event routing manager (ERM) (214), a subscriber service (216) and the database (210). In an aspect, the ERM (214) may be implemented in the system (108).
[00106] The client/service (212), the ERM (214) and the subscriber service (216) may be in communication with each other. The ERM (214) may be in communication with the database (210) to store monitored values of an operation. The database (210) may include histogram bucket offset values and index values. The database (210) may also store operation count and last operation count. Furthermore, the database (210) stores total latency and last latency. In an aspect, the database (210) is configured to store a plurality of operation types, the operation start time, the operation end time, the operation count, the plurality of buckets, the calculated latency value, the last latency value, the offset values, the total latency, the recent
latency, the index values of the plurality of buckets and the histogram of the plurality of operation types.
[00107] In an aspect, the client/service (212) may send a service request to the ERM (214). The ERM (214) may route the service request to the subscriber service (216). At the same time, the ERM (214) may monitor the number of requests for service response latency profiling. The ERM (214) may determine operation metrics such as the type of service operation, whether the histogram offset is available for the service operation, the operation start time, the operation end time, the operation count, etc. The ERM (214) also determines the response time and latency based on the operation start time and the operation end time. The ERM (214) further calculates the total latency and the last latency based on the determined operation metrics. In an aspect, the service response latency profiling measures and analyzes the time delays (latencies) experienced while executing operations corresponding to the service requests. The service latency profiling helps in identifying bottlenecks and performance issues and taking corrective measures. According to an aspect of the present invention, API performance latency metrics for a single API or a group of APIs of the same operation are monitored. The latency of each API call, both across microservices and within a microservice, is monitored. The latency is plotted in a histogram over time. The response times of an API in a single microservice, or across microservices where processing needs a chain of microservice interactions, are monitored in the histogram. The APIs which take longer than expected are identified in the histogram based on the latency metrics. Accordingly, APIs that cause delays, and also potential bottlenecks in the system, are identified. This may reduce the debugging time and effort required for system optimization.
[00108] In an implementation, an application is configured to perform service response latency profiling. In an aspect, the application is a software program designed to perform specific tasks for users. The application may maintain details for the operation type, operation count, operation start time, and operation end time. Based on the operation start time and the operation end time, the application may determine the execution time or latency for the operation. The application may also maintain a bucket for a histogram. An offset in the histogram may denote the execution time. The execution time is the time required to execute the operation from the operation start time to the operation end time. The index value for an offset in the histogram denotes the number of operations which take the same execution time. For an API call, the application may record the start time and the end time. Based on the start time and the end time, the latency required for the operation may be calculated. Once the latency is calculated, the latency is recorded (or plotted) in the histogram. In the histogram, the index value of the matching offset is incremented for the operation count. If an exact match is not found, then the value is rounded up in the bucket. In a similar way, the total latency and recent latency may be calculated. The total latency is the summation of all latency values. The recent latency is configurable based on the occurrence of latencies for the latest number of counts. Further, the application stores the operation types, the operation start time, the operation end time, the operation count, the plurality of buckets, the calculated latency value, the last latency value, the offset values, the index values of the plurality of buckets, the total latency, the recent latency, and the histogram of the plurality of operation types in the database (210).
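The flow described in this paragraph may be combined into a single illustrative profiler; the class below is a minimal sketch under stated assumptions (the names, the in-memory store, and the fixed-window recent latency are all hypothetical), not the application's actual implementation.

```python
from collections import defaultdict, deque

class LatencyProfiler:
    """Sketch of the application flow: count the operation, compute latency,
    record it in the histogram (rounding up on a gap), and keep aggregates."""

    def __init__(self, offsets, recent_count=3):
        self.offsets = sorted(offsets)                       # (low, high) ranges in ms
        self.op_count = defaultdict(int)                     # operation count per type
        self.index = defaultdict(lambda: defaultdict(int))   # type -> offset -> index value
        self.total = defaultdict(float)                      # total latency per type
        self.recent = defaultdict(lambda: deque(maxlen=recent_count))

    def record(self, op_type, start_ms, end_ms):
        latency = end_ms - start_ms                          # execution time
        self.op_count[op_type] += 1
        match = next((o for o in self.offsets if o[0] <= latency <= o[1]), None)
        if match is None:
            # No exact match: round up to the nearest offset; assumes the
            # latency lies below the top offset's lower bound.
            match = min(o for o in self.offsets if o[0] >= latency)
        self.index[op_type][match] += 1
        self.total[op_type] += latency
        self.recent[op_type].append(latency)
        return latency

profiler = LatencyProfiler([(0.0, 1.0), (1.1, 2.0), (2.1, 3.0), (3.1, 4.0), (4.1, 5.0)])
profiler.record("service_activation", 0.0, 2.4)   # lands in the 2.1-3.0 ms offset
```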
[00109] In an aspect, the response times of an individual microservice or multiple microservices are monitored based on the latency metrics. High response times are indicative of potential service failures or network issues. With the help of latency monitoring, problems such as potential service failures or network issues are easily identified before they cause service outages or significant performance degradation. This reduces the risk of system downtime.
[00110] In an implementation, monitoring of latency metrics (for example, execution time, high/low response time, high latency, total latency, average latency, peak latency, etc.) enables performance issues to be addressed proactively before users are impacted, thereby ensuring a smooth and responsive application experience.
[00111] In an aspect, APIs latency monitoring data may be used for capacity planning. The average and peak latencies of various microservices are determined from the latency monitoring data. Accordingly, resource allocation can be planned
based on the latency monitoring data. This also helps to effectively scale microservices for a single API or a group of APIs.
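For capacity planning, average and peak latencies can be estimated directly from the histogram; the sketch below assumes each offset's midpoint represents the latencies counted in it, which is an approximation introduced here purely for illustration.

```python
def approx_average_and_peak(histogram):
    # Estimate average latency using each offset's midpoint weighted by its
    # index value; take the peak as the upper bound of the highest non-empty
    # offset. Assumes the histogram has at least one recorded operation.
    weighted = count = peak = 0.0
    for (low, high), index_value in histogram.items():
        weighted += (low + high) / 2.0 * index_value
        count += index_value
        if index_value and high > peak:
            peak = high
    return weighted / count, peak

hist = {(0.0, 1.0): 25, (1.1, 2.0): 40, (2.1, 3.0): 30, (3.1, 4.0): 15, (4.1, 5.0): 5}
avg, peak = approx_average_and_peak(hist)
print(round(avg, 2), peak)  # ~1.97 ms average, 5.0 ms peak
```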
[00112] In an aspect, the system (108) may comprise the ERM (214). The ERM (214) is configured to determine service response latency. The ERM (214) may receive a service request from a client. The ERM (214) may determine details corresponding to the received service request. The details comprise an operation type, an operation start time and an operation end time. The ERM (214) may calculate a latency value, or an execution time, based on the operation start time and the operation end time. The ERM (214) may determine whether a histogram is initiated for the determined operation type of the received service request. On determining that the histogram is not initiated for the determined operation type of the received service request, the ERM (214) may calculate a histogram for the determined operation type of the received service request. The ERM (214) may determine a total latency and a recent latency based on the calculated histogram.
[00113] FIG. 3 illustrates an example flow diagram (300) for service response latency profiling, in accordance with an embodiment of the present disclosure.
[00114] At step (302) of the flow diagram (300), the ERM (214) receives a service request from the client/service (212). The ERM (214) determines the type of operation (service). The operation type may include creation of a cloud-native network function (CNF), creation of a convolutional neural network (CNN), terminating a cloud native infrastructure solution (CNIS), or terminating a centralized network configuration (CNC).
[00115] At step (304) of the flow diagram (300), the ERM (214) determines whether start time and end time are defined for the operation. If the start time and end time are not defined, the flow diagram (300) proceeds to step (306). If the start time and end time are defined, the flow diagram (300) proceeds to step (308).
[00116] At step (306) of the flow diagram (300), if the start time and end time are not defined, then the operation may fail.
[00117] At step (308) of the flow diagram (300), if the start time and end time are defined, then the ERM (214) calculates an execution time. The execution time is the time required for the execution of the operation. The execution time is calculated based on the start time of the operation and the end time of the operation. In an aspect, the execution time is calculated for a single API call.
[00118] At step (310) of the flow diagram (300), the ERM (214) increments the value of the operation request count. The operation request count is the number of operations of the same type which take the same operation time. For example, considering 10 milliseconds as an offset (for example, the average execution time) required for the execution of operations of the same type, the operation request count is increased each time an operation of the same type takes 10 milliseconds to execute.
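A small sketch of this counting step follows, assuming a simple in-memory counter keyed by operation type; the Counter-based store and the operation-type label are illustrative choices only.

```python
from collections import Counter

operation_request_count = Counter()

def on_request(op_type: str) -> None:
    # Step (310): increment the request count for this operation type.
    operation_request_count[op_type] += 1

for _ in range(3):
    on_request("CNF_creation")
print(operation_request_count["CNF_creation"])  # 3
```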
[00119] At step (312) of the flow diagram (300), the ERM (214) determines whether a histogram is initialized. If the histogram is not initialized, the flow diagram (300) proceeds to step (314), and if the histogram is initialized, the flow diagram (300) proceeds to step (316).
[00120] At step (314) of the flow diagram (300), if the histogram is not initialized, the ERM (214) initializes the histogram for specified bucket. Then the flow diagram (300) proceeds to step (316).
[00121] At step (316) of the flow diagram (300), if the histogram is initialized, the ERM (214) tries to find the bucket in the histogram for the execution time.
[00122] At step (318) of the flow diagram (300), the ERM (214) determines whether exact bucket is found or not. If the exact bucket is not found, the flow diagram (300) proceeds to step (320), and if the exact bucket is found, the flow diagram (300) proceeds to step (322).
[00123] At step (320) of the flow diagram (300), the ERM (214) finds the value rounded up in the bucket. In an example, the bucket may be defined from 10 to 20 milliseconds. If the API call's execution time does not fall exactly on an offset within a bucket, the value is rounded up into the nearest bucket. Then the flow diagram (300) proceeds to step (322).
[00124] At step (322) of the flow diagram (300), if the exact bucket is found, the ERM (214) increments the bucket index value for the operation count. In an example, the bucket may be defined from 10 to 20 milliseconds. If the API call takes between 10 and 20 milliseconds, then the exact bucket has been found, and the bucket index value is incremented for the operation count.
[00125] At step (324) of the flow diagram (300), the ERM (214) calculates the total latency and the last latency from the previous readings. For the total latency, all latency values are summed. The recent latency is configurable based on the occurrence of latencies for the latest number of counts.
[00126] FIG. 4 illustrates an exemplary flow diagram of a method (400) for determining service response latency, in accordance with an embodiment of the present disclosure.
[00127] At step (402), the method (400) includes receiving a service request from a client. In an aspect, the client initiates a service request for a specific service. Then, the client sends the service request corresponding to the specific service to the ERM (214). The ERM (214) receives the service request from the client. The ERM (214) forwards the received service request to a subscriber service (e.g., the service provider of the specific service). An operation count is maintained for a plurality of service requests of a plurality of operation types in the database (210). On receiving a service request of the same operation type, the operation count is incremented. For example, the operation counts for an operation type (e.g., SIM card activation) over a week are: Monday = 150, Tuesday = 120, Wednesday = 200, Thursday = 180, Friday = 220, Saturday = 100, and Sunday = 90. So, the operation count for SIM card activation over the week = 150 + 120 + 200 + 180 + 220 + 100 + 90 = 1,060 activations.
[00128] At step (404), the method (400) includes determining details corresponding to the received service request. The details comprise an operation type, an operation start time and an operation end time. In an aspect, the ERM (214)
processes the content of the received service request and identifies the service request type and operation type by determining specific terms in the service request that indicate the type of service and operation. Further, the ERM (214) determines the operation start time by checking the timestamp when the service request is created. It maintains a log of significant activities along with their corresponding timestamps to track the duration of each step. Upon completion of processing the service request, the completion time is logged. The operation end time is determined based on this completion time. For example, for the service request (e.g., SIM card activation), the operation type is SIM card activation. The SIM card activation starts at 12:00:00 PM and ends at 12:00:05 PM. So, the operation start time is 12:00:00 PM (12:00:00) and the operation end time is 12:00:05 PM (12:00:05).
[00129] At step (406), the method (400) includes calculating a latency value or an execution time based on the operation start time and the operation end time. In an aspect, the ERM (214) processes the determined details to calculate the latency value or execution time. The latency value or the execution time is calculated based on the operation start time and the operation end time: the latency value or execution time = the operation end time - the operation start time. For example, for the operation type (e.g., SIM card activation), the operation start time is 12:00:00 PM (12:00:00) and the operation end time is 12:00:05 PM (12:00:05). So, the latency value or execution time = 12:00:05 - 12:00:00 = 5 seconds.
[00130] At step (408), the method (400) includes determining whether a histogram is initiated for the determined operation type of the received service request. The histogram is created for the operation types based on the latency value or execution time. In an aspect, the ERM (214) determines whether the histogram is created for the determined operation type of the received service request by searching the determined operation type in the database (210). For example, determining whether the histogram is initiated for the operation type (e.g., SIM card activation).
[00131] At step (410), the method (400) further includes on determining the histogram is initiated for the determined operation type of the received service request, determining at least one bucket from the plurality of buckets for the
calculated execution time of the received service request. For example, for the service activation request, the buckets for execution time are (0.0-5.0) ms, (5.1-10.0) ms, (10.1-15.0) ms, (15.1-20.0) ms. The operation start time of the received service request is 10:00:00.000000 AM and the operation end time is 10:00:00.002400 AM. The execution time of the received service request is the operation end time - the operation start time (10:00:00.002400 - 10:00:00.000000) = 2.4 ms. So, the bucket for the received service request is 0.0-5.0 ms.
[00132] At step (412), the method includes determining whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket. In an aspect, at least one bucket of the plurality of buckets is determined by matching the calculated execution time of the received service request with values of the plurality of buckets. Then, the calculated execution time of the received service request is matched with each offset value of the determined at least one bucket. For example, the execution time of the received service request is 2.4 ms. The bucket for execution time is 0-5 ms. So, the calculated execution time of the received service request is matched with each offset value (i.e., 0.0-1.0, 1.1-2.0, 2.1-3.0, 3.1-4.0 and 4.1-5.0) of the bucket 0-5 ms.
[00133] At step (414), the method includes on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, incrementing the index value corresponding to the at least one offset value. For example, the execution time of the received service request is 2.4 ms. The bucket for execution time is 0-5 ms. So, the offset value of the bucket for the received service request is 2.1-3.0 ms. For offset value = 2.1-3.0 ms, the index value is 30. After receiving the service activation request, the index value for offset value 2.1-3.0 ms is increased to 31.
[00134] At step 416, the method (400) includes recording (416) a value corresponding to the calculated execution time of the received service request in the histogram. For example, the execution time of the received service request is 2.4 ms. The bucket for execution time is 0-5 ms. The offset value of the bucket for the received service request is 2.1-3.0 ms. For offset value = 2.1-3.0 ms, the index value is 30. After receiving the service activation request, the index value for offset value 2.1-3.0 ms is increased to 31. So, the execution time (i.e., 2.4 ms) of the service activation request is recorded in the histogram at offset value = 2.1-3.0 ms and index value 31.
[00135] At step 418, the method (400) includes, on detecting that the calculated execution time of the received service request does not match with the at least one offset value of the determined at least one bucket, finding (418) a value rounded up in a nearest bucket. In an aspect, when the calculated execution time of the received service request does not match with the at least one offset value of the determined at least one bucket, a value rounded up in a nearest bucket is found for the calculated execution time of the received service request. When the value rounded up in the nearest bucket is found, the index value of the corresponding offset value of the nearest bucket is incremented. For example, for the service activation request, the execution time of the received service request is 4.099999 ms. So, the value for the received service request is rounded up to 4.1 ms. The offset values of the bucket 0-5 ms are 0.0-1.0, 1.1-2.0, 2.1-3.0, 3.1-4.0 and 4.1-5.0. The nearest offset for the received service request is 4.1-5.0 ms. The index value for the offset value 4.1-5.0 ms of the bucket 0-5 ms is increased.
[00136] In an aspect, the method (400) further includes, upon determining that the histogram is not initiated for the determined operation type of the received service request, calculating a histogram for the determined operation type of the received service request and recording a value corresponding to the calculated execution time of the received service request in the calculated histogram. In an aspect, when the ERM (214) determines that the histogram is not initiated for the determined operation type of the received service request, the ERM (214) calculates the histogram for the determined operation type for the calculated latency value or execution time. The ERM (214) determines a range for the calculated latency value or execution time. A plurality of buckets is determined for the determined range. A value corresponding to the calculated execution time of the received service request is recorded in the calculated histogram. For example, the service request for the operation "SIM card activation" is received. The execution time for the SIM card activation is 120 secs. On determining that the histogram for the operation type "SIM card activation" is not initiated, the histogram is created for the SIM card activation. The plurality of buckets for the activation time comprises 0-1 min, 1-2 min, 2-3 min, 3-4 min, and 4-5 min. For bucket 0-1 min, the offset values are 0-10 sec, 11-20 sec, 21-30 sec, 31-40 sec, 41-50 sec, and 51-60 sec. For bucket 1-2 min, the offset values are 61-70 sec, 71-80 sec, 81-90 sec, 91-100 sec, 101-110 sec, and 111-120 sec. For bucket 2-3 min, the offset values are 121-130 sec, 131-140 sec, 141-150 sec, 151-160 sec, 161-170 sec, and 171-180 sec. For bucket 3-4 min, the offset values are 181-190 sec, 191-200 sec, 201-210 sec, 211-220 sec, 221-230 sec, and 231-240 sec. For bucket 4-5 min, the offset values are 241-250 sec, 251-260 sec, 261-270 sec, 271-280 sec, 281-290 sec, and 291-300 sec. The received service request of SIM card activation has an execution time of 120 seconds. So, the execution time of the received service request falls in the bucket 1-2 min. The index value corresponding to the offset value 111-120 sec of that bucket is increased to 1.
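Initialization of such a histogram may be sketched as follows; the generator below reproduces the bucket and offset layout of the SIM-card-activation example (with the first offset starting at 1 sec rather than 0 sec for uniformity), and the function name is an assumption introduced for illustration.

```python
def init_histogram(bucket_count=5, bucket_span_s=60, offset_span_s=10):
    # Five 1-minute buckets, each split into 10-second offsets, with every
    # index value starting at 0.
    hist = {}
    for b in range(bucket_count):
        bucket = (b * bucket_span_s, (b + 1) * bucket_span_s)
        hist[bucket] = {
            (o + 1, o + offset_span_s): 0
            for o in range(bucket[0], bucket[1], offset_span_s)
        }
    return hist

hist = init_histogram()
# A 120-second activation falls in bucket (60, 120), offset (111, 120):
hist[(60, 120)][(111, 120)] += 1
print(hist[(60, 120)][(111, 120)])  # 1
```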
[00137] In an aspect, the method (400) includes maintaining a plurality of buckets for a plurality of operation types in the histogram. The plurality of buckets is maintained in the database (210). Further, the plurality of buckets may have offset values and index values. The offset values of the plurality of buckets in the histogram refer to a plurality of ranges of the execution time of the plurality of operation types. The index values of the plurality of buckets in the histogram refer to a number of operations having at least one same offset value from the offset values. In an aspect, the plurality of buckets is used to categorize different ranges of data. The offset values define the starting points for each bucket, while the index values represent specific data points or positions within the buckets. Each index value indicates how many occurrences of a specific operation type fall into a particular bucket. Maintaining the plurality of buckets for the plurality of operation types in the histogram allows for detailed insights into the distribution of data and a better understanding of how different operation types behave across varying ranges of values. Furthermore, the plurality of operation types may have distinct characteristics, and the plurality of operation types are categorized and analyzed effectively using the plurality of buckets. The offset values help define the boundaries of each bucket, enhancing data organization. By tracking the plurality of operation types across the plurality of buckets, each operation type's performance metrics (e.g., response times or failure rates) can be monitored, facilitating issue diagnosis and performance optimization. Histograms with the plurality of buckets provide better visual representations of data distributions, making complex data relationships easier to understand. For example, in the histogram for the operation type "SIM card activation", the plurality of buckets for the activation time comprises 0-1 min, 1-2 min, 2-3 min, 3-4 min, and 4-5 min. For bucket 0-1 min, the offset values are 0-10 sec, 11-20 sec, 21-30 sec, 31-40 sec, 41-50 sec, and 51-60 sec. For bucket 1-2 min, the offset values are 61-70 sec, 71-80 sec, 81-90 sec, 91-100 sec, 101-110 sec, and 111-120 sec. For bucket 2-3 min, the offset values are 121-130 sec, 131-140 sec, 141-150 sec, 151-160 sec, 161-170 sec, and 171-180 sec. For bucket 3-4 min, the offset values are 181-190 sec, 191-200 sec, 201-210 sec, 211-220 sec, 221-230 sec, and 231-240 sec. For bucket 4-5 min, the offset values are 241-250 sec, 251-260 sec, 261-270 sec, 271-280 sec, 281-290 sec, and 291-300 sec. The index value corresponding to each offset value represents the number of activations having that offset value. For example, SIM 1 has an activation time of 59 seconds, so the index value corresponding to the offset value 51-60 sec of the bucket 0-1 min is increased. For SIM 2, the activation time is 222 seconds, so the index value corresponding to the offset value 221-230 sec of the bucket 3-4 min is increased.
[00138] In an aspect, the method (400) further includes determining a total latency and a recent latency for the determined operation type of the received service request based on the histogram of the determined operation type of the received service request. The total latency is the summation of all latency values of the plurality of service requests of the same operation type. The recent latency is configurable based on the occurrence of latencies for the latest number of counts. In an aspect, upon receiving a plurality of service requests corresponding to the received service request, the ERM (214) calculates the latency values or the execution times for each of the plurality of service requests. The ERM (214) plots the calculated latency values or execution times of each of the plurality of service requests on the calculated histogram for the determined operation type of the received service request. The ERM (214) determines the total latency and the recent latency for each of the plurality of service requests corresponding to the received service request from the calculated histogram. For example, if the latencies for the service activation request are 2.1, 2.4, 3.5, 4.9, 1.9, 3.3, and 0.7, the total latency for the service activation request is (2.1 + 2.4 + 3.5 + 4.9 + 1.9 + 3.3 + 0.7) = 18.8. Further, the recent latency is configurable based on the occurrence of latencies for the latest number of counts. For example, the recent latency may cover the last three counts or the last five counts. Using the above example, the recent latency for the last three counts is 1.9, 3.3, 0.7.
[00139] In an aspect, the method (400) further includes maintaining an operation count for the plurality of service requests of the plurality of operation types in the database (210). On receiving the service request for the same operation type, incrementing the operation count. Further, the incremented operation count is maintained in the database (210). In an aspect, the operation count in service requests refers to the number of individual actions or operations performed while processing the service request. The operation count can be recorded as part of request tracking and used in performance analytics. For example, if a service provider processes 100 service activation requests in a day, and the operation "assign phone number" is performed for each request, the operation count for that specific action is 100.
[00140] Furthermore, the latency value is calculated for a plurality of microservices of a single application programming interface (API) or a plurality of APIs of the same operation type.
[00141] In an aspect, the database (210) is configured to store a plurality of operation types, the operation start time, the operation end time, the operation count, the plurality of buckets, the calculated latency value, the last latency value, the offset values, the total latency, the recent latency, the index values of the plurality of buckets and the histogram of the plurality of operation types. The plurality of operation types comprises, but is not limited to, card activation, plan enrollment, feature add-ons, device configuration for service activation, data usage management, security, update services, etc.
[00142] FIG. 5 illustrates an exemplary computer system (500) in which or with which embodiments of the present disclosure may be implemented.
[00143] As shown in FIG. 5, the computer system may include an external storage device (510), a bus (520), a main memory (530), a read-only memory (540), a mass storage device (550), communication port(s) (560), and a processor (570). A person skilled in the art will appreciate that the computer system may include more than one processor and communication ports. The processor (570) may include various modules associated with embodiments of the present disclosure. The communication port(s) (560) may be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fiber, a serial port, a parallel port, or other existing or future ports. The communication port(s) (560) may be chosen depending on a network, such as a Local Area Network (LAN), a Wide Area Network (WAN), or any network to which the computer system connects.
[00144] The main memory (530) may be random-access memory (RAM), or any other dynamic storage device commonly known in the art. The read-only memory (540) may be any static storage device(s), e.g., but not limited to, Programmable Read-Only Memory (PROM) chips for storing static information, e.g., start-up or Basic Input/Output System (BIOS) instructions for the processor (570). The mass storage device (550) may be any current or future mass storage solution, which can be used to store information and/or instructions. Exemplary mass storage devices (550) include, but are not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives (internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces), one or more optical discs, and Redundant Array of Independent Disks (RAID) storage, e.g., an array of disks.
[00145] The bus (520) communicatively couples the processor (570) with the other memory, storage, and communication blocks. The bus (520) may be, e.g., a Peripheral Component Interconnect (PCI)/PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), Universal Serial Bus (USB), or the like, for connecting expansion cards, drives, and other subsystems, as well as other buses, such as a front-side bus (FSB), which connects the processor (570) to the computer system.
[00146] Optionally, operator and administrative interfaces, e.g., a display, keyboard, joystick, and a cursor control device, may also be coupled to the bus (520) to support direct operator interaction with the computer system. Other operator and administrative interfaces can be provided through network connections connected through the communication port(s) (560). Components described above are meant only to exemplify various possibilities. In no way should the aforementioned exemplary computer system limit the scope of the present disclosure.
[00147] The exemplary computer system (500) is configured to execute a computer program product comprising a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform a method for determining service response latency in a network. The method includes receiving a service request from a client; determining details corresponding to the received service request, where the details comprise an operation type, an operation start time and an operation end time; calculating a latency value or an execution time based on the determined operation start time and the determined operation end time; determining whether a histogram is initiated for the determined operation type of the received service request; on determining the histogram is initiated for the determined operation type of the received service request, determining at least one bucket from the plurality of buckets for the calculated execution time of the received service request; determining whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket; on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, incrementing the index value corresponding to the at least one offset value, and recording a value corresponding to the calculated execution time of the received service request in the histogram; where, on detecting the calculated execution time of the received service request does not match with the at least one offset value of the determined at least one bucket, finding a value rounded up in a nearest bucket.
[00148] While the foregoing describes various embodiments of the invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. The scope of the invention is determined by the claims that follow. The invention is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the invention when combined with information and knowledge available to the person having ordinary skill in the art.
[00149] The present disclosure provides a technical advancement related to API latency monitoring. This advancement addresses the limitations of existing solutions by service response latency profiling. The disclosure involves analyzing latency metrics (for example, recent latency, total latency, and response time) to identify operations that cause delays and improve them accordingly. The APIs which take longer than expected are identified with the help of a histogram which is plotted based on the latency metrics. As a result, APIs that cause delays and potential bottlenecks are identified, which offers significant improvements in identifying potential service failures or network issues before they cause service outages or significant performance degradation.
TECHNICAL ADVANCEMENTS
[00150] As is evident from the above, the present disclosure provides a technically advanced solution by providing a system and a method for service response latency profiling by monitoring latency metrics. The present disclosure described herein above has several technical advantages, as follows:
1. By analyzing the latency metrics (for example, recent latency, total latency, and response time), developers are enabled to identify operations that cause delays and optimize them accordingly. The APIs which take longer than expected are identified with the help of a histogram which is plotted based on the latency metrics. As a result, APIs that cause delays and potential bottlenecks in the system are identified. This may reduce the debugging time and effort required for system optimization. Problems such as potential service failures or network issues are easily identified before they cause service outages or significant performance degradation. This reduces the risk of system downtime. Further, monitoring of latency metrics of APIs enables developers to proactively determine performance issues before users are impacted. This ensures a smooth and responsive application experience and an improved user experience. Resource allocation can be planned based on the latency monitoring data. This helps to effectively scale microservices for a single API or a group of APIs.
Claims
1. A method (400) for determining service response latency in a network (106), the method (400) comprising: receiving (402) a service request from a client; determining (404) details corresponding to the received service request, wherein the details comprise an operation type, an operation start time and an operation end time; calculating (406) a latency value or an execution time based on the determined operation start time and the determined operation end time; determining (408) whether a histogram is initiated for the determined operation type of the received service request; on determining the histogram is initiated for the determined operation type of the received service request, determining (410) at least one bucket from a plurality of buckets for the calculated execution time of the received service request; determining (412) whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket; on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, incrementing (414) an index value corresponding to the at least one offset value, and recording (416) a value corresponding to the calculated execution time of the received service request in the histogram, wherein on detecting the calculated execution time of the received service request does not match with the at least one offset value of the determined at least one bucket, finding (418) a value rounded up in a nearest bucket.
2. The method (400) as claimed in claim 1 further comprising: on determining that the histogram is not initiated for the determined operation type of the received service request, calculating the histogram for the determined operation type of the received service request and recording a value corresponding to the calculated execution time of the received service request in the calculated histogram.
3. The method (400) as claimed in claim 1 further comprising: determining a total latency and a recent latency for the determined operation type of the received service request based on the histogram of the determined operation type of the received service request, wherein the total latency is summation of all latency values of the plurality of service requests of the same operation type and the recent latency is configurable based on occurrence of latencies for latest number of counts.
4. The method (400) as claimed in claim 1 further comprising: maintaining a plurality of buckets for a plurality of operation types in the histogram having offset values and index values, wherein the offset values of the plurality of buckets in the histogram refer to a plurality of ranges of the execution time of the plurality of operation types, and the index values of the plurality of buckets in the histogram refer to a number of operations having at least one same offset value from the offset values.
5. The method (400) as claimed in claim 1 further comprising: maintaining an operation count for the plurality of service requests of the plurality of operation types, wherein on receiving the service request for a same operation type, incrementing the operation count.
6. The method (400) as claimed in claim 1, wherein
the latency value is calculated for a plurality of microservices of a single application programming interface (API) or a plurality of APIs of the same operation type.
7. The method (400) as claimed in claim 5, wherein the database (210) is configured to store a plurality of operation types, the operation start time, the operation end time, the operation count, the plurality of buckets, the calculated latency value, the last latency value, the offset values, the total latency, the recent latency, the index values of the plurality of buckets and the histogram of the plurality of operation types.
8. A system (108) for determining service response latency in a network (106), the system (108) comprising: a receiving unit (222) configured to receive a service request from a client; a determining unit (224) configured to determine details corresponding to the received service request, wherein the details comprise an operation type, an operation start time and an operation end time; a calculating unit (226) configured to calculate a latency value, or an execution time based on the determined operation start time and the determined operation end time; the determining unit (224) configured to determine whether a histogram is initiated for the determined operation type of the received service request; on determining the histogram is initiated for the determined operation type of the received service request, the determining unit (224) is configured to determine at least one bucket from the plurality of buckets for the calculated execution time of the received service request; the determining unit (224) is configured to determine whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket;
on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, a processing unit (228) is configured to: increment an index value corresponding to the at least one offset value; and record a value corresponding to the calculated execution time of the received service request in the histogram, wherein on detecting the calculated execution time of the received service request does not match with the at least one offset value of the determined at least one bucket, the processing unit (228) is configured to find a value rounded up in a nearest bucket.
9. The system (108) as claimed in claim 8, wherein on determining that the histogram is not initiated for the determined operation type of the received service request, the calculating unit (226) is configured to calculate the histogram for the determined operation type of the received service request; and the processing unit (228) configured to record a value corresponding to the calculated execution time of the received service request in the calculated histogram.
10. The system (108) as claimed in claim 8, wherein the determining unit (224) is configured to determine a total latency and a recent latency for the determined operation type of the received service request based on the histogram of the determined operation type of the received service request, wherein the total latency is summation of all latency values of the plurality of service requests of the same operation type and the recent latency is configurable based on occurrence of latencies for latest number of counts.
11. The system (108) as claimed in claim 8, wherein
the processing unit (228) configured to maintain a plurality of buckets for a plurality of operation types in the histogram having offset values and index values, wherein the offset values of the plurality of buckets in the histogram refer to a plurality of ranges of the execution time of the plurality of operation types, and the index values of the plurality of buckets in the histogram refer to a number of operations having at least one same offset value from the offset values.
12. The system (108) as claimed in claim 8, wherein the processing unit (228) configured to maintain an operation count for the plurality of service requests of the plurality of operation types, wherein on receiving the service request for a same operation type, the processing unit (228) is configured to increment the operation count.
13. The system (108) as claimed in claim 8, wherein the latency value is calculated for a plurality of microservices of a single application programming interface (API) or a plurality of APIs of the same operation type.
14. The system (108) as claimed in claim 12, wherein the database (210) is configured to store a plurality of operation types, the operation start time, the operation end time, the operation count, the plurality of buckets, the calculated latency value, the last latency value, the offset values, the total latency, the recent latency, the index values of the plurality of buckets and the histogram of the plurality of operation types.
15. A user equipment (104) communicatively coupled with a system (108), the coupling comprises steps of: receiving, by the system (108), a connection request;
sending, by the system (108), an acknowledgment of the connection request to the user equipment (104); and transmitting a plurality of signals in response to the connection request, wherein the system (108) is configured for determining service response latency in a network (106), as claimed in claim 8.
16. A computer program product comprising a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to execute a method (400) for determining service response latency in a network (106), the method (400) comprising: receiving (402) a service request from a client; determining (404) details corresponding to the received service request, wherein the details comprise an operation type, an operation start time and an operation end time; calculating (406) a latency value or an execution time based on the determined operation start time and the determined operation end time; determining (408) whether a histogram is initiated for the determined operation type of the received service request; on determining the histogram is initiated for the determined operation type of the received service request, determining (410) at least one bucket from the plurality of buckets for the calculated execution time of the received service request; determining (412) whether the calculated execution time of the received service request matches with at least one offset value of the determined at least one bucket; on detecting the calculated execution time of the received service request matches with the at least one offset value of the determined at least one bucket, incrementing (414) an index value corresponding to the at least one offset value, and
recording (416) a value corresponding to the calculated execution time of the received service request in the histogram, wherein on detecting the calculated execution time of the received service request does not match with the at least one offset value of the determined at least one bucket, finding (418) a value rounded up in a nearest bucket.