US12001310B2 - Approximating activity loads in databases using smoothed time series - Google Patents
- Publication number: US12001310B2
- Application number: US17/222,010
- Authority: US (United States)
- Prior art keywords: time series, time, activity, processor, event
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- All classifications below fall under G (Physics), G06 (Computing or calculating; counting), and G06F (Electric digital data processing).
- G06F11/3419: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time (via G06F11/00 Error detection; Error correction; Monitoring > G06F11/30 Monitoring > G06F11/34 > G06F11/3409)
- G06F11/302: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system (via G06F11/00 > G06F11/30 Monitoring > G06F11/3003)
- G06F16/22: Indexing; Data structures therefor; Storage structures (via G06F16/00 Information retrieval; Database structures therefor; File system structures therefor > G06F16/20 Information retrieval of structured data, e.g. relational data)
- G06F2201/835: Timestamp (via G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring)
- G06F2201/86: Event-based monitoring (via G06F2201/00)
Definitions
- the present techniques relate to analysis of data streams. More specifically, the techniques relate to analysis of data streams in databases.
- the loads of user activities may be measured during a given timeframe.
- user activities are a set of separate events in time, and therefore are aggregated in some manner in order to build activity load statistics.
- One method of aggregating user activities is dividing the time into time windows of a predefined duration.
- the time windows can be either consecutive (non-overlapping) or overlapping; in the overlapping case, sliding windows may be used.
- both time window methods have drawbacks that are especially problematic when dealing with large data streams. For example, these methods may require both high storage capacity and significant computation resources, which can cause problems when the methods are implemented at larger scales.
- the statistical data models used by existing unsupervised and non-time-series methods are based on specific time frames or on approximate aggregations of the time scales.
- the specific time frames may be a number of minutes, an hour, a day, or some other time frame.
- some applications may benefit from multiple time resolutions for every event at the time the event is captured.
- maintaining multiple statistical models, maintaining a data model per time frame, or converting from one model to another may be very expensive. Managing several time windows simultaneously in existing methods may thus be costly in both time and disk space.
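For comparison, the two window-based aggregations described above can be sketched as follows. This is a minimal illustration assuming integer millisecond timestamps; `tumbling_counts` and `SlidingCounter` are hypothetical names, not taken from the source. The sliding counter must retain every timestamp inside its window, and each additional window length needs its own counter, which is where the storage and computation cost of the window-based approach comes from.

```python
from collections import Counter, deque


def tumbling_counts(timestamps_ms, window_ms):
    """Count events per consecutive, non-overlapping window (keyed by window start)."""
    counts = Counter()
    for t in timestamps_ms:
        counts[(t // window_ms) * window_ms] += 1
    return dict(counts)


class SlidingCounter:
    """Count of events in the trailing window (t - window_ms, t].

    Every timestamp still inside the window must be kept, so memory grows with
    event rate x window length, and each window length needs its own instance.
    """

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.events = deque()

    def add(self, t_ms):
        """Record an event at t_ms (non-decreasing order) and return the current count."""
        self.events.append(t_ms)
        # Evict timestamps that have fallen out of the trailing window.
        while self.events[0] <= t_ms - self.window_ms:
            self.events.popleft()
        return len(self.events)
```

For example, with a 1000 ms window and events at 0, 100, 900, 1000, 1500, and 2100 ms, the tumbling counts are `{0: 3, 1000: 2, 2000: 1}`, while the sliding counter reports 1, 2, 3, 3, 3, 2 after each event.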
- a system can include a processor to monitor activity on a database server to generate an events stream.
- the processor can further convert the events stream into a time series that approximates the activity load at the database server using an exponential smoothing.
- the processor can also send the time series to a streaming analytics engine.
- any number of time series analytics may be executed on the activity of the database server in real-time.
- the processor is to monitor activity per user or per database.
- the additional monitoring may enable approximation per user or database in real-time.
- the exponential smoothing is based on a density function of the activity load that uses a smoothing exponential average of events in the events stream. In this embodiment, more recent events receive increasingly higher weights than older events.
- the exponential smoothing is calculated based on a current event timestamp and a previous event timestamp in the events stream.
- the timestamps enable the smoothing to be calculated using minimal resources.
- the processor is to approximate an activity load of the database server for a number of simulated time windows in parallel.
- significant data storage may be saved by approximating simulated time windows in parallel.
- the processor is to approximate an updated activity load of the database server at each detected additional event in the events stream.
- processing resources are saved by not continually calculating activity load using windows of time.
- the streaming analytics engine is to perform anomaly detection on the time series. In this embodiment, anomalies can be detected in real-time on the activity in the database and on different time resolutions.
- a method can include monitoring, via a processor, activity on a database server to generate an events stream.
- the method can further include converting, via the processor, the events stream into a time series that approximates an activity load of the database server using an exponential smoothing.
- the method can also include sending, via the processor, the time series to a streaming analytics engine.
- any number of time series analytics may be executed on the activity of the database server.
- the method can also include separating the events stream into per-user streams or per-database streams. In this embodiment, the separated streams may enable approximation per user or database in real-time.
- the method can also include receiving an adjustable smoothing factor and converting the events stream using the exponential smoothing based on the smoothing factor.
- the smoothing factor enables adjustable simulated time windows.
- the method can also include approximating an updated activity load of the database server at each detected additional event in the events stream.
- processing resources are saved by not continually calculating activity load using windows of time.
- executing the streaming analytics engine includes predicting a future activity load based on the time series.
- a future activity load can be predicted in real-time.
- executing the streaming analytics engine includes calculating statistics on the time series to generate anomaly alerts in real time in response to detecting that a current value of the time series exceeds a deviation threshold.
- anomalies can be detected in real-time on the activity in the database.
- executing the streaming analytics engine includes clustering the time series with another calculated time series. In this embodiment, anomaly detection is enabled by comparison with the clusters.
- a computer program product for approximating activity loads in databases can include a computer-readable storage medium having program code embodied therewith.
- the computer-readable storage medium is not a transitory signal per se.
- the program code is executable by a processor to cause the processor to monitor activity on a database server to generate an events stream.
- the program code can also cause the processor to convert the events stream into a time series that approximates the activity load at the database server using an exponential smoothing.
- the program code can also cause the processor to send the time series to a streaming analytics engine.
- the program code can also cause the processor to separate the events stream into per-user streams or per-database streams. In this embodiment, the separated streams may enable approximation per user or per database.
- the program code can cause the processor to approximate an updated activity load of the database server at each detected additional event in the events stream.
- processing resources are saved by not continually calculating activity load using windows of time.
- the program code can also cause the processor to predict a future activity load based on the time series.
- a future activity load can be predicted in real-time.
- the program code can also cause the processor to calculate statistics on the time series to generate anomaly alerts in real time in response to detecting that a current value of the time series exceeds a deviation threshold.
- anomalies can be detected in real-time on the activity in the database.
- the program code can also cause the processor to cluster the time series with another calculated time series. In this embodiment, anomaly detection is enabled by comparison with the clusters.
- FIG. 1 is a block diagram of an example system for approximating activity loads in databases using smoothed time series;
- FIG. 2 is a diagram of an example exponential smoothing of time series data corresponding to events in a database;
- FIG. 3 is a process flow diagram of an example method that can approximate activity loads in databases using smoothed time series;
- FIG. 4 is a block diagram of an example computing device that can approximate activity loads in databases using smoothed time series;
- FIG. 5 is a diagram of an example cloud computing environment according to embodiments described herein;
- FIG. 6 is a diagram of example abstraction model layers according to embodiments described herein; and
- FIG. 7 is an example tangible, non-transitory computer-readable medium that can approximate activity loads in databases using smoothed time series.
- a system includes a processor to monitor activity on a database server to generate an events stream.
- the processor can convert the events stream into a time series that approximates activity load at the database server using an exponential smoothing.
- the processor can also execute a streaming analytics engine on the time series.
- the embodiments provide a fast, lightweight method that converts a sequence of events into a time series, enabling the application of many existing time series analytics methods.
- the embodiments may be used with any suitable prediction and outlier detection techniques.
- embodiments of the present disclosure enable a continuous approximation of activity load without a need for handling any kind of activity windows.
- the use of continuous approximation of the activity load may enable analyzing any time-window without the need to maintain time-windows aggregation statistics.
- the embodiments also use minimal calculations and no additional storage beyond the previous value and the timestamp of the previous event.
- the embodiments can thus achieve results very similar to those of other methods while using far fewer resources.
- the embodiments can evaluate different time frames of activity in parallel, whereas doing so may double resource usage in existing methods. The embodiments thus enable efficient data streaming analytics with flexible time resolution.
- the evaluation of different time frames of activity may improve detection of anomalies and other potential applications using the embodiments described herein.
- FIG. 1 is a block diagram of an example system for approximating activity loads in databases using smoothed time series.
- the example system 100 of FIG. 1 includes a number of client devices 102A, 102B, and 102C.
- the system 100 also includes a database server 104 communicatively coupled to the client devices 102A, 102B, and 102C.
- the database server 104 includes an event monitor 106, a load approximator 108, and a streaming analytics engine 110.
- the event monitor 106 of the database server 104 can monitor events received at the database server 104 from the client devices 102A, 102B, and 102C.
- the event monitor 106 may generate an events stream based on the monitored events.
- the events stream may be separated per machine or per user.
- the events stream may be separated based on originating Internet Protocol (IP) address or username.
- the events stream may be separated by type of action.
- the events stream may be separated based on the SQL VERB used, the application used, and so on.
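A minimal sketch of separating an events stream into sub streams, assuming a hypothetical `Event` record with user, source IP, and SQL VERB fields (these field names are illustrative, not from the source). Any key function, such as user, IP address, SQL VERB, or a combination, selects how the stream is split. In a streaming deployment each sub stream would more likely be routed straight to its own smoothing state than buffered in a list as done here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event record; field names are illustrative, not from the source.
    timestamp_ms: int
    user: str
    source_ip: str
    sql_verb: str


def split_stream(events, key):
    """Group events into sub streams keyed by key(event), preserving arrival order."""
    streams = defaultdict(list)
    for event in events:
        streams[key(event)].append(event)
    return dict(streams)


events = [
    Event(1, "alice", "10.0.0.1", "SELECT"),
    Event(2, "bob", "10.0.0.2", "INSERT"),
    Event(3, "alice", "10.0.0.1", "UPDATE"),
]
per_user = split_stream(events, key=lambda e: e.user)
per_user_and_verb = split_stream(events, key=lambda e: (e.user, e.sql_verb))
```

Keying on a tuple such as `(user, sql_verb)` gives the finer sub-division by command mentioned for the analytics engine.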
- the load approximator 108 can convert each separated events stream into a time series that describes activity density.
- the load approximator 108 can organize the time domain as a discrete series. For example, every millisecond, or a time interval of any other length, may be treated as a separate interval that either contains an event or does not. Event occurrences in the events stream can therefore be described as a binary sequence B(n), in which each discrete time interval n has the value one if an event occurred in that interval and zero otherwise.
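For illustration only (the description notes that this binary representation need not actually be calculated or stored), the discretization into B(n) can be sketched as follows, assuming integer millisecond timestamps; `binary_sequence` is a hypothetical helper.

```python
def binary_sequence(timestamps_ms, start_ms, end_ms, interval_ms=1):
    """Return B(n) over [start_ms, end_ms): 1 if interval n contains an event, else 0.

    Several events in the same interval still yield a single 1.
    """
    n_intervals = -(-(end_ms - start_ms) // interval_ms)  # ceiling division
    b = [0] * n_intervals
    for t in timestamps_ms:
        if start_ms <= t < end_ms:
            b[(t - start_ms) // interval_ms] = 1
    return b
```

With 1 ms intervals, events at 0, 3, 3, and 7 ms over [0, 10) give `[1, 0, 0, 1, 0, 0, 0, 1, 0, 0]`; widening the interval to 5 ms collapses the same events to `[1, 1]`.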
- the load approximator 108 may convert each separated events stream into a time series using exponential smoothing.
- a function that calculates an exponentially smoothed average of the binary series B(n) (Eq. 1) can be used to evaluate the density function of the activity.
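Eq. 1 itself is not reproduced in this text. A common form of exponential smoothing over the binary series, assumed here for illustration with smoothing factor α, is shown below. Unrolling the recursion (with S(-1) = 0) shows that older intervals receive geometrically decreasing weights, consistent with more recent events receiving higher weights, and that the smoothed value stays between 0 and 1, so it reads as an activity density.

```latex
\begin{aligned}
S(n) &= \alpha\,B(n) + (1-\alpha)\,S(n-1), \qquad 0 < \alpha \le 1,\; S(-1) = 0 \\
     &= \alpha \sum_{k=0}^{n} (1-\alpha)^{k}\, B(n-k) \\
0 \le S(n) &\le \alpha \sum_{k=0}^{n} (1-\alpha)^{k} = 1 - (1-\alpha)^{n+1} \le 1
\end{aligned}
```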
- the formula of Eq. 1 may be adjusted so that the load approximator 108 only calculates the value of the function at event times.
- the load approximator 108 may calculate the function in response to detecting an event in the events stream from the event monitor 106.
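Because B(n) is zero between events, the smoothed value only decays between events, so it can be updated at event times alone. Assuming the recursion S(n) = αB(n) + (1 - α)S(n - 1) (a common form; the source's own Eq. 1 and Eq. 2 are not reproduced here), the update at an event in interval t_k after a previous event at t_(k-1) is S(t_k) = α + (1 - α)^(t_k - t_(k-1)) S(t_(k-1)). The sketch below keeps only the previous value and the previous event time; `SmoothedLoad` is a hypothetical name.

```python
class SmoothedLoad:
    """Event-driven exponential smoothing of the binary event series B(n).

    Assumes S(n) = alpha * B(n) + (1 - alpha) * S(n - 1) over unit intervals.
    State is only the previous value and the previous event's interval.
    """

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = 0.0
        self.last_t = None

    def value_at(self, t):
        """Smoothed load at interval t >= last event, assuming no events since then."""
        if self.last_t is None:
            return 0.0
        return (1.0 - self.alpha) ** (t - self.last_t) * self.value

    def on_event(self, t):
        """Update at an event in interval t: S(t_k) = alpha + (1 - alpha)**dt * S(t_{k-1})."""
        if self.last_t is not None:
            if t < self.last_t:
                raise ValueError("events must arrive in time order")
            if t == self.last_t:
                return self.value  # B(n) is binary: extra events in one interval add nothing
        self.value = self.alpha + self.value_at(t)
        self.last_t = t
        return self.value
```

The result matches the dense per-interval recursion exactly: with α = 0.1 and events in intervals 2 and 5, both give 0.1 + 0.9³ × 0.1 = 0.1729 at interval 5. Calling `value_at(t)` between events gives the decayed load at any time without storing a window, which corresponds to reading a continuous smoothed curve such as the one described for FIG. 2.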
- a simulated time window may be adjusted by adjusting the value of the smoothing factor α.
- the value of α may be set such that the simulated time window is a millisecond, a minute, an hour, a shift of any number of hours, a day, a week, etc.
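The mapping from a simulated window to α is not given in this text. One common convention, assumed here, treats the window as the decay's time constant: an event's weight falls to 1/e once the window has elapsed, which gives α = 1 - exp(-interval / window). Another widely used convention is α = 2 / (N + 1) for a span of N intervals; either yields a smaller α for a longer simulated window. `alpha_for_window` is a hypothetical helper.

```python
import math


def alpha_for_window(window, interval):
    """Smoothing factor whose decay time constant equals the simulated window.

    Assumption: after `window` time units (window / interval steps), an event's
    weight has decayed to 1/e, i.e. (1 - alpha) ** (window / interval) == 1/e.
    `window` and `interval` must use the same time unit.
    """
    if window <= 0 or interval <= 0:
        raise ValueError("window and interval must be positive")
    return 1.0 - math.exp(-interval / window)


MS = 1  # discrete interval of one millisecond
for name, window_ms in [("minute", 60_000), ("hour", 3_600_000), ("day", 86_400_000)]:
    print(name, alpha_for_window(window_ms, MS))
```

With 1 ms intervals, a one-minute simulated window gives α ≈ 1.67 × 10⁻⁵ and a one-hour window gives α ≈ 2.78 × 10⁻⁷.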
- the streaming analytics engine 110 can analyze the approximated activity loads.
- the streaming analytics engine 110 can use any suitable classical time series algorithm to analyze the streaming activity density data.
- the streaming analytics engine 110 can perform forecasting, data mining, pattern recognition, or machine learning.
- the streaming analytics engine 110 can perform time series analysis used for clustering, classification, query by content, anomaly detection as well as forecasting.
- the current values of the series can be used as a general description of the user behavior when applied to different subsections of user activity. For example, activities may be divided by user, object, command, records affected, or source program.
- an object may be a data table.
- commands may include SQL VERB.
- SQL VERBs may include any of the keywords DELETE, INSERT, MERGE, SELECT, or UPDATE.
- user activity may be divided by SQL VERB, used application, etc., and the streaming analytics engine 110 can detect which part of a user activity is done with different SQL VERBs, applications, etc.
- the streaming analytics engine 110 can generate an up-to-date, concise description of a user behavior model that can be used for various goals.
- the user behavior model may be used for analysis of user behavior, user clustering, risk assessments, among other suitable uses.
- the streaming analytics engine 110 can detect anomalies based on the approximated activity load.
- the streaming analytics engine 110 can then calculate statistics on the time series. For example, the streaming analytics engine 110 can calculate a mean, minimum, maximum, standard deviation, or any combination thereof, on the time series.
- the streaming analytics engine 110 can calculate these statistics as smoothing averages or in similar manner, such that anomaly alerts may be received in real-time without having to work according to any time window scheduling.
- the streaming analytics engine 110 can generate anomaly detections in real time when the current value of the time series violates one or more of these statistics. For example, the streaming analytics engine 110 can generate an anomaly detection in response to detecting that the approximated activity load exceeds a calculated maximum value by more than half a standard deviation.
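A minimal sketch of the statistics-based rule above, assuming the mean and variance are kept as exponentially weighted averages (an incremental form, so no time window needs to be scheduled) and the maximum as a plain running maximum. `beta`, `warmup`, and the class name are illustrative assumptions; `k = 0.5` mirrors the half-standard-deviation example.

```python
import math


class StreamingAnomalyDetector:
    """Flag values that exceed the running maximum by more than k standard deviations.

    Mean and variance are exponentially weighted and updated per value, so no
    window scheduling is needed. The maximum is a plain running maximum.
    beta, k, and warmup are illustrative choices.
    """

    def __init__(self, beta=0.01, k=0.5, warmup=10):
        self.beta, self.k, self.warmup = beta, k, warmup
        self.mean = 0.0
        self.var = 0.0
        self.max = -math.inf
        self.count = 0

    def update(self, x):
        """Return True if x is anomalous w.r.t. the statistics so far, then absorb x."""
        std = math.sqrt(self.var)
        anomalous = self.count > 0 and self.count >= self.warmup and x > self.max + self.k * std
        if self.count == 0:
            self.mean = x
        else:
            # Incremental exponentially weighted mean and variance.
            diff = x - self.mean
            incr = self.beta * diff
            self.mean += incr
            self.var = (1.0 - self.beta) * (self.var + diff * incr)
        self.max = max(self.max, x)
        self.count += 1
        return anomalous
```

Fed one smoothed load value per event, a steady series around 0.10 to 0.13 followed by a jump to 0.50 raises a single alert on the jump once the warm-up count has been reached.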
- the streaming analytics engine 110 can perform clustering on a time series. For example, users of a database may be clustered together based on their associated time series metrics. As one example, the streaming analytics engine 110 can perform both clustering and classification on received user streams. The classification may be learned from examples to recognize a type of user according to the time series pattern of that type of user, and a clustering may be calculated to determine groups of similar users according to their time series. In various examples, the streaming analytics engine 110 can use these groups to find a user whose behavior deviates from the normal behavior of the group to which the user belongs.
- it is to be understood that the block diagram of FIG. 1 is not intended to indicate that the system 100 is to include all of the components shown in FIG. 1. Rather, the system 100 can include fewer or additional components not illustrated in FIG. 1 (e.g., additional client devices, or additional resource servers, etc.).
- the binary representation described above is conceptual, serving to explain the method, and may not actually be calculated or stored at any stage of the system.
- the streaming analytics engine 110 may alternatively be located on a second server (not shown) communicatively coupled to the database server 104, and the database server 104 may include a stream transmitter (not shown) that transmits the time series to the streaming analytics engine 110 on the second server.
- FIG. 2 is a diagram that shows an example exponential smoothing of the time series data corresponding to events in a database.
- the example exponential smoothing is generally referred to by the reference number 200.
- FIG. 2 includes a first graph 202 of a binary time series and a second graph 204 of a smoothed time series.
- the first graph 202 includes a set of events 206 detected at various time intervals. For example, each of the events 206 may have been detected at a particular millisecond, second, minute, hour, etc., depending on the interval used.
- the events 206 of the first graph 202 may be input into a smoothing function to generate a continuous output 208 of approximated event load values over time.
- the y-axis of graph 204 may indicate the approximated event load at any point in time, and the x-axis may indicate the passage of time.
- the continuous set of smoothed time series values 208 may be output and used in various applications.
- the output values 208 may be input into a machine learning model during training to generate a trained machine learning model that can analyze input time series data using various simulated time windows in parallel.
- the diagram of FIG. 2 is not intended to indicate that the exponential smoothing 200 is to include all of the components shown in FIG. 2. Rather, the exponential smoothing 200 can include fewer or additional components not illustrated in FIG. 2.
- FIG. 3 is a process flow diagram of an example method that can approximate activity loads in databases using smoothed time series.
- the method 300 can be implemented with any suitable computing device, such as the computing device 400 of FIG. 4, and is described with reference to the system 100 and exponential smoothing 200 of FIGS. 1 and 2.
- the methods described below can be implemented by the processors 402 or 702 of FIGS. 4 and 7.
- activity on a database server is monitored to generate an events stream.
- the activity monitored may include a variety of users or databases.
- the events stream may be separated into per-user streams or per-database streams.
- the per-user streams or per-database streams may further be sub-divided into object, command, records-affected, source program, or any combination thereof.
- the events stream is converted into a time series that approximates activity load at the database server using an exponential smoothing.
- an adjustable smoothing factor may also be received, and the events stream may be converted using the exponential smoothing based on the smoothing factor.
- an updated activity load of the database server may be approximated at each detected additional event in the events stream.
- any of a number of streams may be converted into time series.
- the time series is sent to a streaming analytics engine.
- the streaming analytics engine may be on an external server or implemented locally on the same computing device.
- the streaming analytics engine may then be executed on the time series.
- the streaming analytics engine may perform prediction, anomaly detection, or clustering based on the time series. For example, a future activity load may be predicted based on the time series.
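The description does not specify the forecasting method. As one classical option (a swapped-in technique, not necessarily the one intended), Holt's linear exponential smoothing can forecast the load a few steps ahead from load values sampled at equal spacing; sampling the event-driven series at a fixed cadence is itself an assumption. The weights are named `level_weight` and `trend_weight` to keep them distinct from the load approximator's α, and their defaults are illustrative.

```python
def holt_forecast(series, level_weight=0.5, trend_weight=0.3, horizon=1):
    """Holt's linear (double) exponential smoothing forecast `horizon` steps ahead.

    series: activity load values sampled at equal spacing (at least two points).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step projection.
        level = level_weight * y + (1 - level_weight) * (level + trend)
        # Smooth the change in level to track the trend.
        trend = trend_weight * (level - prev_level) + (1 - trend_weight) * trend
    return level + horizon * trend
```

On a perfectly linear series such as 1, 2, 3, 4, 5 the forecast continues the line (6 one step ahead, 8 three steps ahead); on a flat series it returns the flat level.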
- statistics may be calculated on the time series to generate anomaly alerts in real time in response to detecting that a current value of the time series exceeds a deviation threshold.
- the time series may be clustered with another calculated time series.
- users of a database may be clustered together based on their associated time series metrics. For example, given the mean values of SELECT, UPDATE, and INSERT activity for each of a number of user sub streams, the sub streams can be clustered into groups by similarity.
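The description does not name a clustering algorithm. As one standard choice, plain k-means (Lloyd's algorithm) over per-user feature vectors of mean SELECT, UPDATE, and INSERT load is sketched below. The user names and values are made up for illustration, and the deterministic farthest-point initialization is a simplification.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic farthest-point initialization.

    points: list of equal-length tuples. Returns (labels, centers).
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))

    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical per-user features: mean smoothed load of (SELECT, UPDATE, INSERT).
users = {
    "alice": (0.30, 0.01, 0.01),
    "bob": (0.28, 0.02, 0.00),
    "carol": (0.02, 0.20, 0.25),
    "dave": (0.03, 0.22, 0.24),
}
names = list(users)
labels, centers = kmeans([users[n] for n in names], k=2)
groups = {n: lab for n, lab in zip(names, labels)}
```

Here the read-heavy users and the write-heavy users land in separate groups. The distance from a user's vector to its own cluster center could then serve as one possible signal for a user whose behavior deviates from its group.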
- the process flow diagram of FIG. 3 is not intended to indicate that the operations of the method 300 are to be executed in any particular order, or that all of the operations of the method 300 are to be included in every case. Additionally, the method 300 can include any suitable number of additional operations.
- the techniques described herein may be implemented in a cloud computing environment.
- a computing device configured to approximate activity loads in databases using smoothed time series may be implemented in a cloud computing environment. It is understood in advance that although this disclosure may include a description of cloud computing, implementation of the teachings recited herein is not limited to a cloud computing environment. Rather, embodiments of the present invention are capable of being implemented in conjunction with any other type of computing environment now known or later developed.
- Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service.
- This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.
- On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service's provider.
- Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
- Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
- Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
- Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure.
- the applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email).
- the consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
- Platform as a Service (PaaS): the consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
- Infrastructure as a Service (IaaS): the consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
- Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.
- Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
- Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load-balancing between clouds).
- a cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability.
- At the heart of cloud computing is an infrastructure comprising a network of interconnected nodes.
- FIG. 4 is a block diagram of an example computing device that can approximate activity loads in databases using smoothed time series.
- the computing device 400 may be, for example, a server, desktop computer, laptop computer, tablet computer, or smartphone.
- computing device 400 may be a cloud computing node.
- Computing device 400 may be described in the general context of computer system executable instructions, such as program modules, being executed by a computer system.
- program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types.
- Computing device 400 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- program modules may be located in both local and remote computer system storage media including memory storage devices.
- the computing device 400 may include a processor 402 that is to execute stored instructions, and a memory device 404 that provides temporary memory space for the instructions during operation.
- the processor can be a single-core processor, multi-core processor, computing cluster, or any number of other configurations.
- the memory 404 can include random access memory (RAM), read only memory, flash memory, or any other suitable memory systems.
- the processor 402 may be connected through a system interconnect 406 (e.g., PCI®, PCI-Express®, etc.) to an input/output (I/O) device interface 408 adapted to connect the computing device 400 to one or more I/O devices 410.
- the I/O devices 410 may include, for example, a keyboard and a pointing device, wherein the pointing device may include a touchpad or a touchscreen, among others.
- the I/O devices 410 may be built-in components of the computing device 400, or may be devices that are externally connected to the computing device 400.
- the processor 402 may also be linked through the system interconnect 406 to a display interface 412 adapted to connect the computing device 400 to a display device 414.
- the display device 414 may include a display screen that is a built-in component of the computing device 400 .
- the display device 414 may also include a computer monitor, television, or projector, among others, that is externally connected to the computing device 400 .
- a network interface controller (NIC) 416 may be adapted to connect the computing device 400 through the system interconnect 406 to the network 418.
- the NIC 416 can transmit data using any suitable interface or protocol, such as the internet small computer system interface, among others.
- the network 418 may be a cellular network, a radio network, a wide area network (WAN), a local area network (LAN), or the Internet, among others.
- An external computing device 420 may connect to the computing device 400 through the network 418.
- the external computing device 420 may be an external webserver 420.
- the external computing device 420 may be a cloud computing node.
- the processor 402 may also be linked through the system interconnect 406 to a storage device 422 that can include a hard drive, an optical drive, a USB flash drive, an array of drives, or any combinations thereof.
- the storage device 422 may include an event monitor module 424, a load approximator module 426, and a streaming analytics module 428.
- the event monitor module 424 can monitor activity on a database server to generate an events stream.
- the event monitor module 424 can monitor user activity per user or per database.
- the events stream can be separated by parameters into any number of sub streams, such as per-user streams or per-database streams.
- the load approximator module 426 can convert the events stream into a time series that approximates activity load at the database server using an exponential smoothing.
- the exponential smoothing may be based on a density function of the activity load that uses a smoothing exponential average of events in the events stream.
- the exponential smoothing is calculated based on a current event timestamp and a previous event timestamp in the events stream.
- the exponential smoothing may be calculated using Eq. 2.
- the load approximator module 426 can approximate an activity load of the database server for any number of simulated time windows in parallel. For example, the load approximator module 426 can calculate an activity load for each of the simulated time windows using a different value of the smoothing factor α.
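A minimal sketch of running several simulated windows in parallel: each extra window adds only one stored value, because all windows share the same last-event timestamp. The event-time update rule S(t_k) = α + (1 - α)^(t_k - t_(k-1)) S(t_(k-1)) is an assumption (a common form of exponential smoothing; Eq. 2 is not reproduced in this text), and `MultiWindowLoad`, the window names, and the α values are illustrative.

```python
class MultiWindowLoad:
    """Smoothed activity load for several simulated windows in parallel.

    Stores one value per smoothing factor plus a single shared last-event time.
    Assumes the update S(t_k) = alpha + (1 - alpha)**(t_k - t_{k-1}) * S(t_{k-1}).
    """

    def __init__(self, alphas):
        self.alphas = dict(alphas)  # window name -> smoothing factor alpha
        self.values = {name: 0.0 for name in self.alphas}
        self.last_t = None

    def on_event(self, t):
        """Update every simulated window at an event in interval t; return a snapshot."""
        if self.last_t is not None:
            if t < self.last_t:
                raise ValueError("events must arrive in time order")
            if t == self.last_t:
                return dict(self.values)  # binary series: one event per interval
        for name, alpha in self.alphas.items():
            if self.last_t is None:
                decayed = 0.0
            else:
                decayed = (1.0 - alpha) ** (t - self.last_t) * self.values[name]
            self.values[name] = alpha + decayed
        self.last_t = t
        return dict(self.values)
```

For events in intervals 0 and 2, a short window with α = 0.5 reads 0.5 + 0.25 × 0.5 = 0.625, while a long window with α = 0.01 reads 0.01 + 0.99² × 0.01 = 0.019801, both from the same single pass over the events.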
- the load approximator module 426 can approximate an updated activity load of the database server at each detected additional event in the events stream.
- the load approximator module 426 can then send the events stream to the streaming analytics engine 428 .
- the streaming analytics engine 428 can execute streaming analytics on the time series.
- the streaming analytics engine 428 can perform anomaly detection on the time series.
- the streaming analytics engine 428 can cluster activities in the time series into groups based on one or more parameters.
- the streaming analytics engine 428 can cluster the time series in order to associate a particular user with a group of users. For example, users of a database may be clustered together based on their associated time series parameters.
- the streaming analytics engine 428 can predict a future activity load based on the time series. In various examples, the streaming analytics engine 428 can calculate statistics on the time series to generate anomalies in real time in response to detecting that a current value of the time series exceeds a deviation threshold.
- the block diagram of FIG. 4 is not intended to indicate that the computing device 400 is to include all of the components shown in FIG. 4. Rather, the computing device 400 can include fewer or additional components not illustrated in FIG. 4 (e.g., additional memory components, embedded controllers, modules, additional network interfaces, etc.).
- the streaming analytics engine 428 can be implemented as a client on the storage device 422 and an engine on an external computing device 420 .
- any of the functionalities of the event monitor module 424, the load approximator module 426, and the streaming analytics engine 428 may be partially, or entirely, implemented in hardware and/or in the processor 402.
- the functionality may be implemented with an application specific integrated circuit, logic implemented in an embedded controller, or in logic implemented in the processor 402 , among others.
- the functionalities of the event monitor module 424 , load approximator module 426 , and streaming analytics engine 428 can be implemented with logic, wherein the logic, as referred to herein, can include any suitable hardware (e.g., a processor, among others), software (e.g., an application, among others), firmware, or any suitable combination of hardware, software, and firmware.
- cloud computing environment 500 comprises one or more cloud computing nodes 502 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone 504 A, desktop computer 504 B, laptop computer 504 C, and/or automobile computer system 504 N may communicate.
- Nodes 502 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof.
- This allows cloud computing environment 500 to offer infrastructure, platforms and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device.
- the types of computing devices 504 A-N shown in FIG. 5 are intended to be illustrative only, and computing nodes 502 and cloud computing environment 500 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).
- Virtualization layer 602 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers; virtual storage; virtual networks, including virtual private networks; virtual applications and operating systems; and virtual clients.
- management layer 604 may provide the functions described below.
- Resource provisioning provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment.
- Metering and Pricing provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may comprise application software licenses.
- Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources.
- User portal provides access to the cloud computing environment for consumers and system administrators.
- Service level management provides cloud computing resource allocation and management such that required service levels are met.
- Service Level Agreement (SLA) planning and fulfillment provide pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.
- Workloads layer 606 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions which may be provided from this layer include: mapping and navigation; software development and lifecycle management; virtual classroom education delivery; data analytics processing; transaction processing; and data streaming analysis.
- the present invention may be a system, a method and/or a computer program product at any possible technical detail level of integration.
- the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
- a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
- a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
- the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- FIG. 7 depicts a block diagram of an example tangible, non-transitory computer-readable medium 700 that can approximate activity loads in databases using smoothed time series.
- the tangible, non-transitory, computer-readable medium 700 may be accessed by a processor 702 over a computer interconnect 704 .
- the tangible, non-transitory, computer-readable medium 700 may include code to direct the processor 702 to perform the operations of the method 300 of FIG. 3 .
- an event monitor module 706 includes code to monitor activity on a database server to generate an events stream.
- the event monitor module 706 includes code to separate the events stream into per-user streams or per-database streams.
- a load approximator module 708 includes code to convert the events stream into a time series that approximates activity load at the database server using exponential smoothing.
- the load approximator module 708 includes code to approximate an updated activity load of the database server at each detected additional event in the events stream.
- the load approximator module 708 further includes code to send the events stream to a streaming analytics engine.
- the streaming analytics engine module 710 includes code to execute streaming analytics on the time series.
- the streaming analytics engine module 710 may include code to predict a future activity load based on the time series.
- the streaming analytics engine module 710 may include code to calculate statistics on the time series to generate anomalies in real time in response to detecting that a current value of the time series exceeds a deviation threshold.
- the streaming analytics engine module 710 may include code to cluster the time series with another calculated time series.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions. It is to be understood that any number of additional software components not shown in FIG. 7 may be included within the tangible, non-transitory, computer-readable medium 700 , depending on the specific application.
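The event monitor, load approximator, and streaming analytics modules described above can be sketched as a small in-process pipeline. This is a minimal sketch under stated assumptions, not the patented implementation: the `Event` record, the per-user split, the class and function names (`StreamLoad`, `DeviationDetector`, `monitor`), and the k-sigma rule with a warm-up period are all hypothetical choices. The text only says that statistics are calculated on the time series and that an anomaly is generated when the current value exceeds a deviation threshold. The load update follows Eq. 2 with one smoothing factor α per stream.

```python
# Hypothetical names throughout; a sketch, not the patented implementation.
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event record produced by an event monitor."""
    user: str
    database: str
    ts: int  # timestamp in whole time units, non-decreasing per user


class StreamLoad:
    """Activity load of one sub-stream, updated only at event times (Eq. 2)."""

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.last_ts: Optional[int] = None
        self.value = 0.0

    def update(self, ts: int) -> float:
        gap = 0 if self.last_ts is None else ts - self.last_ts
        self.value = self.alpha + self.value * (1 - self.alpha) ** gap
        self.last_ts = ts
        return self.value


class DeviationDetector:
    """Running mean and variance (Welford's method) with a k-sigma threshold."""

    def __init__(self, k: float = 3.0, warmup: int = 5) -> None:
        self.k = k
        self.warmup = max(warmup, 2)  # need at least two samples for a variance
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def check(self, x: float) -> bool:
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = abs(x - self.mean) > self.k * std
        # Fold x into the statistics only after scoring it against history.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous


def monitor(events: Iterable[Event], alpha: float = 0.1,
            k: float = 3.0) -> Iterator[Tuple[str, int, float, bool]]:
    """Split events into per-user streams, update each load, and score it."""
    loads: Dict[str, StreamLoad] = defaultdict(lambda: StreamLoad(alpha))
    detectors: Dict[str, DeviationDetector] = defaultdict(
        lambda: DeviationDetector(k))
    for ev in events:
        value = loads[ev.user].update(ev.ts)
        yield ev.user, ev.ts, value, detectors[ev.user].check(value)
```

Keying the dictionaries by `ev.database` instead of `ev.user` would give per-database streams, and keeping several `StreamLoad` instances with different α values per key would give several simulated time windows in parallel. Scoring each value before adding it to the running statistics keeps a sudden burst from diluting its own threshold.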
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
$$F(n) = \alpha\,B(n) + (1-\alpha)\,F(n-1) \qquad \text{(Eq. 1)}$$
where α is a smoothing factor with 0<α<1, and the relative weight of new values versus historical values is adjusted by choosing different α values. For example, with a smaller value of α, the function retains the influence of past events for longer. Unlike a plain average over a time window, each value may have a different weight; for example, more recent values may have larger weights than older ones. However, calculating F(n) explicitly may not be practical because doing so may involve a calculation for every time unit, which becomes impractical if, for example, the time unit is a millisecond. Therefore, in various examples, the formula of Eq. 1 may be adjusted such that the
$$F(n) = \alpha + F(p)\,(1-\alpha)^{\,n-p} \qquad \text{(Eq. 2)}$$
where n is the current event timestamp and p is the previous event timestamp. In various examples, the simulated time window may be adjusted by adjusting the value of the smoothing factor α. Because different α values assign different weights to historical events versus present events, the choice of α controls how much history is taken into account. However, the resulting density is only an approximation of the activity in the time window, because the events in the simulated window are not weighted equally. In various examples, the value of α may be set such that the simulated time window is a millisecond, a minute, an hour, a shift of any number of hours, a day, a week, etc. Eq. 2 is thus calculated only at event times and provides an approximation to the density of events in a given time window. Because the
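The relationship between Eq. 1 and Eq. 2 can be checked numerically. The sketch below assumes that B(n) is the number of events observed in time unit n and that F is 0 before the first event; neither definition is stated in this excerpt, and the names `dense_load` and `EventDrivenLoad` are hypothetical. `dense_load` evaluates Eq. 1 at every time unit, while `EventDrivenLoad` applies Eq. 2 only when an event arrives and keeps one value per α, mirroring several simulated time windows computed in parallel.

```python
# Hypothetical names; assumes B(n) = number of events in time unit n, F(-1) = 0.
from collections import Counter
from typing import Dict, List, Optional, Sequence


def dense_load(events: Sequence[int], alpha: float, horizon: int) -> List[float]:
    """Eq. 1 evaluated at every time unit n = 0..horizon (the costly form)."""
    counts = Counter(events)  # missing time units count as 0 events
    series: List[float] = []
    f = 0.0
    for n in range(horizon + 1):
        f = alpha * counts[n] + (1 - alpha) * f
        series.append(f)
    return series


class EventDrivenLoad:
    """Eq. 2 evaluated only at event times, for several alphas in parallel.

    Each alpha plays the role of a differently sized simulated time window.
    """

    def __init__(self, alphas: Sequence[float]) -> None:
        if not all(0 < a < 1 for a in alphas):
            raise ValueError("each alpha must satisfy 0 < alpha < 1")
        self.last_ts: Optional[int] = None
        self.loads: Dict[float, float] = {a: 0.0 for a in alphas}

    def on_event(self, ts: int) -> Dict[float, float]:
        """Apply F(n) = alpha + F(p) * (1 - alpha) ** (n - p) with n = ts."""
        gap = 0 if self.last_ts is None else ts - self.last_ts
        if gap < 0:
            raise ValueError("events must arrive in timestamp order")
        self.loads = {a: a + f * (1 - a) ** gap for a, f in self.loads.items()}
        self.last_ts = ts
        return dict(self.loads)

    def load_at(self, ts: int) -> Dict[float, float]:
        """Decayed load at a query time ts >= last event, with no new event."""
        if self.last_ts is None:
            return dict(self.loads)
        gap = ts - self.last_ts
        return {a: f * (1 - a) ** gap for a, f in self.loads.items()}
```

Because Eq. 2 touches the state only at event times, its cost grows with the number of events rather than with the number of elapsed time units. Under the counting assumption for B(n), two events sharing a timestamp (gap 0) also reproduce Eq. 1 exactly, since each adds α on top of the undecayed previous value.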
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/222,010 US12001310B2 (en) | 2021-04-05 | 2021-04-05 | Approximating activity loads in databases using smoothed time series |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/222,010 US12001310B2 (en) | 2021-04-05 | 2021-04-05 | Approximating activity loads in databases using smoothed time series |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20220318119A1 (en) | 2022-10-06 |
| US12001310B2 (en) | 2024-06-04 |
Family
ID=83450281
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/222,010 Active 2041-04-20 US12001310B2 (en) | 2021-04-05 | 2021-04-05 | Approximating activity loads in databases using smoothed time series |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US12001310B2 (en) |
Citations (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030200134A1 (en) | 2002-03-29 | 2003-10-23 | Leonard Michael James | System and method for large-scale automatic forecasting |
| US20050089063A1 (en) * | 2003-10-24 | 2005-04-28 | Takaaki Haruna | Computer system and control method thereof |
| US20060191161A1 (en) * | 2000-05-02 | 2006-08-31 | Wunderlin William J | System and method for controlling a dryer appliance |
| US7310590B1 (en) * | 2006-11-15 | 2007-12-18 | Computer Associates Think, Inc. | Time series anomaly detection using multiple statistical models |
| US20130046725A1 (en) * | 2011-08-15 | 2013-02-21 | Software Ag | Systems and/or methods for forecasting future behavior of event streams in complex event processing (cep) environments |
| US8566483B1 (en) | 2009-12-17 | 2013-10-22 | Emc Corporation | Measuring data access activity |
| US8799225B2 (en) | 2003-11-05 | 2014-08-05 | Lumigent Technologies, Inc. | Process and system for auditing database activity |
| US20140258254A1 (en) | 2013-03-08 | 2014-09-11 | Oracle International Corporation | Analyzing database cluster behavior by transforming discrete time series measurements |
| US20150351670A1 (en) * | 2014-06-06 | 2015-12-10 | Dexcom, Inc. | Fault discrimination and responsive processing based on data and context |
| US20160336154A1 (en) * | 2015-05-12 | 2016-11-17 | Hitachi High-Technologies Corporation | Plasma processing apparatus, data processing apparatus and data processing method |
| US20180278634A1 (en) * | 2017-03-23 | 2018-09-27 | International Business Machines Corporation | Cyber Security Event Detection |
| US20190042727A1 (en) | 2017-08-01 | 2019-02-07 | International Business Machines Corporation | Database access monitoring with selective session information retrieval |
| US10380290B1 (en) * | 2014-06-24 | 2019-08-13 | Ansys, Inc. | Systems and methods for parallel transient analysis and simulation |
| US20190370229A1 (en) | 2012-09-28 | 2019-12-05 | Oracle International Corporation | Techniques for activity tracking, data classification, and in database archiving |
Non-Patent Citations (4)
| Title |
|---|
| "Converting table data to a time series", GitHub, Mar. 2, 2021, 6 pages. https://cloud.ibm.com/docs/sql-query?topic=sql-query-examples_common. |
| Cao, Wei et al., "Timon: A Timestamped Event Database for Efficient Telemetry Data Processing and Analytics", SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Jun. 2020, 15 pages. |
| Dix ("The Power of Time Series Databases with Paul Dix", Screaming in the Cloud Podcast, Oct. 23, 2019) (Year: 2019). * |
| Mintz, Josh, "IBM Releases SQL-Native Time Series Processing in Cloud", IBM Cloud Technologies, Jul. 3, 2019, 9 pages. https://www.ibm.com/cloud/blog/announcements/ibm-releases-sql-native-time-series-processing-in-cloud. |
Also Published As
| Publication number | Publication date |
|---|---|
| US20220318119A1 (en) | 2022-10-06 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US11062231B2 (en) | Supervised learning system training using chatbot interaction | |
| US11770305B2 (en) | Distributed machine learning in edge computing | |
| US11269714B2 (en) | Performance anomaly detection | |
| US10191792B2 (en) | Application abnormality detection | |
| US10795937B2 (en) | Expressive temporal predictions over semantically driven time windows | |
| US20200106856A1 (en) | Cognitive allocation of monitoring resources for cloud applications | |
| US12456074B2 (en) | Incremental machine learning for a parametric machine learning model | |
| US11354338B2 (en) | Cognitive classification of workload behaviors in multi-tenant cloud computing environments | |
| US11665180B2 (en) | Artificially intelligent security incident and event management | |
| US11212162B2 (en) | Bayesian-based event grouping | |
| US12223419B2 (en) | Controlling performance of deployed deep learning models on resource constrained edge device via predictive models | |
| US20210117775A1 (en) | Automated selection of unannotated data for annotation based on features generated during training | |
| US11164047B2 (en) | Object detection optimization | |
| US20200150957A1 (en) | Dynamic scheduling for a scan | |
| US12210939B2 (en) | Explaining machine learning based time series models | |
| US11221938B2 (en) | Real-time collaboration dynamic logging level control | |
| US12393476B2 (en) | Early detection of information technology (IT) failures using multimodal correlation and prediction | |
| US12045317B2 (en) | Feature selection using hypergraphs | |
| US12001310B2 (en) | Approximating activity loads in databases using smoothed time series | |
| US20240103903A1 (en) | Dynamic pod priority inference utilizing service mesh telemetry data | |
| US11811520B2 (en) | Making security recommendations | |
| US20230024397A1 (en) | Classification of mouse dynamics data using uniform resource locator category mapping | |
| US11132556B2 (en) | Detecting application switches in video frames using min and max pooling | |
| US20230274169A1 (en) | Generating data slice rules for data generation | |
| US20230280982A1 (en) | Real-time computing resource deployment and integration |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BILLER, OFER HAIM;SOFER, ODED;SIGNING DATES FROM 20210331 TO 20210404;REEL/FRAME:055821/0249 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |