
WO2025071598A1 - Ensemble machine learning for time series data with gaps - Google Patents

Ensemble machine learning for time series data with gaps

Info

Publication number
WO2025071598A1
Authority
WO
WIPO (PCT)
Prior art keywords
time series
data
training
models
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2023/075273
Other languages
French (fr)
Inventor
Pengfei Li
Dan Wang
Pei YANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Visa International Service Association
Original Assignee
Visa International Service Association
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Visa International Service Association filed Critical Visa International Service Association
Priority to PCT/US2023/075273 priority Critical patent/WO2025071598A1/en
Publication of WO2025071598A1 publication Critical patent/WO2025071598A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N20/20: Ensemble learning
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/01: Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Definitions

  • FIG. 1 shows a simplified block diagram of a system in which some embodiments can operate.
  • FIG. 2 is a flow diagram of a process for using ensemble machine learning to make predictions from time series data according to some embodiments.
  • FIG. 3 shows an example of how training sets can be defined from a set of time series data according to some embodiments.
  • FIG. 4 shows a simplified schematic diagram illustrating training of an ensemble of machine learning models according to some embodiments.
  • FIG. 5 shows a simplified schematic diagram illustrating training of an expanded ensemble of machine learning models according to some embodiments.
  • a “computer system” refers generally to a device or apparatus that is capable of executing program code (also referred to as “instructions”).
  • a computer system can include a processor and a memory, as well as other components such as user interfaces that enable human interaction with the computer system and/or communication interfaces that enable computer systems to exchange information-bearing signals with other computer systems.
  • a “processor” may refer to any suitable data computation device or devices.
  • a processor may comprise one or more microprocessors working together to achieve a desired function.
  • the processor may include a CPU that comprises at least one high-speed data processor adequate to execute program components for executing user and/or system generated requests.
  • the CPU may be a microprocessor such as AMD’s Athlon, Duron, and/or Opteron; IBM and/or Motorola’s PowerPC; IBM’s and Sony’s Cell processor; Intel’s Celeron, Itanium, Pentium, Xeon, and/or XScale; and/or the like processor(s).
  • a processor can also include one or more co-processors that operate under control of a CPU to perform specific tasks; examples include graphics processors, neural processors, and the like.
  • a “server computer,” or “server,” may refer to a computer or cluster of computers.
  • a server computer may be a powerful computing system, such as a large mainframe. Server computers can also include minicomputer clusters or a group of servers functioning as a unit.
  • a server computer can include a database server coupled to a web server.
  • a server computer can include a collection of processors, a communication interface that receives requests to execute jobs using the processors, and a control system that assigns jobs to specific processors.
  • a server computer may comprise one or more computational apparatuses and may use any of a variety of computing structures, arrangements, and compilations for servicing requests from one or more client computers.
  • a “client computer,” or “client,” may refer to a computer or cluster of computers that receives some service from a server computer (or another computing system). The client computer may access this service via a communication network such as the internet or any other appropriate communication network.
  • a client computer may make requests to server computers including requests for data or requests to execute a job (or program).
  • a client computer can send a request to a server to process a batch of data (e.g., transaction data), a request to modify content of a database maintained at the server (e.g., adding or updating database records), or a request to retrieve data from a database maintained at the server.
  • a client computer may comprise one or more computational apparatuses and may use a variety of computing structures, arrangements, and compilations for performing its functions, including requesting and receiving data or services from server computers.
  • Time series data may refer to a data set consisting of a sequence of data points, each of which is associated with a particular time according to a regular temporal pattern. Examples include daily data, hourly data, weekly data, or the like. Each data point includes one or more features, or variables, that are measured or computed at the associated time. In some instances, one or more data points may be “missing” from the time series, meaning that a data point is absent that, according to the regular temporal pattern, should be present. Such missing data points are also referred to as “gaps” in the time series data. A data point is said to “immediately” follow a gap if it is the first data point in the time series after a gap of one or more missing data points. For instance, if time series data includes daily data for Monday through Friday and no data for Saturday or Sunday, then Saturday and Sunday constitute a gap, and the Monday data point is the data point immediately following the gap. In some cases, gaps may occur with a regular pattern.
  • a “machine learning model” may refer to a file, program, software executable, instruction set, etc., that has been “trained” to recognize patterns or make predictions.
  • a machine learning model can take time series data as input and predict, or forecast, values for future data points in the series.
  • a machine learning model can take weather data as an input and predict a likelihood that it will rain later in the week.
  • a machine learning model can be trained using “training data” (e.g., to identify patterns in the training data) and then apply this training when it is used for its intended purpose.
  • a machine learning model may be defined by “model parameters,” which can comprise numerical values that define how the machine learning model performs its function. Training a machine learning model can comprise an iterative process used to determine a set of model parameters that achieve the best performance for the model.
  • Interpolation techniques can be used to fill in gaps. Where gaps have no particular pattern or effect on the following data point(s), interpolation may be appropriate. Where the gaps have a recurring (or regular) pattern, however, interpolation can lead to reduced accuracy.
  • the data associated with Mondays may be systematically different from data associated with other weekdays. For instance, on Monday the system may need to perform extra operations responsive to events that occurred during the weekend.
  • One specific example relates to transaction volume at a financial institution, where transactions are not processed during the weekend. However, transactions may occur during the weekend, and transaction processing on Monday may include weekend transactions in addition to same-day transactions. In such cases, interpolating to fill in the gaps in the time series data may produce inaccurate predictions, particularly for a data point immediately following a gap (Mondays in the example above).
  • certain embodiments described herein relate to techniques that can improve accuracy and/or stability of machine learning predictions using time series data with recurring gaps. Ensemble machine learning can be used, in which multiple models are trained on different (overlapping) sets or portions of the available training data and the predictions from the different models are aggregated to generate predictions.
  • a first model can be trained using all of the time series data while a second model is trained using just the data points that immediately follow the gaps (e.g., only the Monday data points in the example above).
  • Additional (overlapping) subsets of the training data can also be defined and used to train additional models in the ensemble.
  • each data point in the time series may include multiple features.
  • the features can be divided into primary features and secondary features, where the primary features include the variable(s) to be predicted (and optionally other data items that strongly affect the variable, such as date-related features) and secondary features include other available information that may affect the variable.
  • the training data is divided into four overlapping training sets: a first training set that includes the one or more primary features and the one or more secondary features of all of the data points; a second training set that includes the one or more primary features but not the one or more secondary features of all of the data points; a third training set that includes the one or more primary features and the one or more secondary features of only data points associated with times immediately following one of the regularly occurring gaps; and a fourth training set that includes the one or more primary features but not the one or more secondary features of only the data points associated with times immediately following one of the regularly occurring gaps.
  • FIG. 1 shows a simplified block diagram of a system 100 in which some embodiments can operate.
  • System 100 includes a server system 102 communicating with client systems 104 via a network 106, which can be, e.g., the internet, a local area network, a private network or any other network.
  • server system 102 is a scalable system (e.g., a server farm) with constituent subsystems (e.g., server racks or blades) that can be brought online or taken offline as desired. For instance, when demand from clients 104 is low, it may be desirable to power down constituent subsystems that are not needed. When demand increases, additional constituent subsystems can be powered up.
  • a supervisor system 110 can be or include a computer system that monitors and manages operations of server system 102. For instance, supervisor system 110 can predict future demand for (or load on) sever system 102 and can power up or power down constituent subsystems in anticipation of changing needs.
  • supervisor system 110 can include a resource manager 112 configured to determine processing needs and change operating parameters of server system 102 accordingly (e.g., powering up or down constituent subsystems or allocating constituent subsystems to particular processing tasks). Resource manager 112 can make determinations based at least in part on previous patterns of usage of server system 102. For instance, supervisor system 110 can collect usage data 120 as a function of time.
  • Examples of usage data include the number of client requests per day, the total volume of data received and/or sent per day, the number of CPU cycles required to service client requests in a given day, and so on. While daily data is used as an example, it should be understood that other granularity of data (e.g., hourly or weekly) may also be used.
  • Usage data 120 can be treated as time series data, in which each data point is associated with a particular time (e.g., a particular day, such as Friday, September 1, 2023).
  • Each data point includes a value for a “primary” variable whose future value is to be predicted and may also include other data values representing other variables, some or all of which may correlate (to at least some degree) with the primary variable. For instance, if the primary variable is the total volume of data received in a day, additional variables may include the number of client requests, the number of different clients that sent at least one request, or information about the particular day (e.g., whether it immediately follows or precedes a holiday).
  • Machine learning (ML) model ensemble 122 can include multiple ML models 124 that are separately trained using different (and overlapping) subsets of usage data 120 to make predictions of future values of the primary variable based on time series data (in this case a particular subset of usage data 120). Examples of ML models 124 and subsets of usage data 120 are described below. Once trained, the ML models 124 can generate predictions of future values of the primary variable. ML model ensemble 122 can aggregate (or merge) the predictions from ML models 124 into a single prediction, referred to herein as a “net” prediction. This net prediction can be provided to resource manager 112. Resource manager 112 can use the net prediction to inform decisions about operations of server system 102.
  • resource manager 112 can determine whether and when to increase or decrease availability of computation resources (e.g., powering up or down various constituent subsystems of server system 102) to accommodate the load.
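The disclosure does not specify how resource manager 112 maps a net prediction to a scaling action, so the following is only a minimal illustrative sketch; the capacity constants and function names are assumptions, not taken from the disclosure.

```python
import math

# Hypothetical scaling rule: constants and names below are illustrative
# assumptions, not from the disclosure.
CAPACITY_PER_SUBSYSTEM = 10_000   # client requests/day one subsystem can absorb
HEADROOM = 1.2                    # keep 20% spare capacity

def subsystems_needed(net_prediction: float) -> int:
    """Constituent subsystems to keep powered up for a predicted load."""
    return math.ceil(net_prediction * HEADROOM / CAPACITY_PER_SUBSYSTEM)

def scaling_delta(current_online: int, net_prediction: float) -> int:
    """Positive: power up this many subsystems; negative: power down."""
    return subsystems_needed(net_prediction) - current_online
```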
  • system 100 is illustrative of one context in which making predictions from time series data is useful.
  • supervisor system 110 can use predictions to allocate resources of server system 102 (e.g., constituent subsystems of server system 102) to different processing tasks as relative demand for different tasks fluctuates.
  • predictions from time series data have numerous other uses beyond management of computational resources, and techniques described herein can be applied in any context where predictions are being made from time series data that includes a pattern of recurring gaps.
  • ML model ensemble 122 is designed and constructed to provide accurate and stable predictions from time series data that includes a regular pattern of gaps.
  • FIG. 2 is a flow diagram of a process 200 for using ensemble machine learning to make predictions according to some embodiments.
  • Process 200 can be implemented in a computer system such as supervisor system 110 of FIG. 1 or any other computer system where making predictions from time series data is desired.
  • time series data is obtained.
  • the time series data for each week includes a data point associated with each weekday (Monday, Tuesday, Wednesday, Thursday, Friday) but no data points associated with weekend days (Saturday, Sunday).
  • the time series data may also have occasional gaps on weekdays (e.g., if a holiday falls on a Thursday); however, such gaps generally occur with lower frequency than the weekend gaps and do not occur in a regular pattern.
  • Each data point in the time series data can include a time stamp identifying the associated time.
  • the time stamp can have any format or granularity desired, provided that different data points in the time series data have different time stamps.
  • the time stamp can be the associated date; for hourly data, a date and hour can be included; and so on.
  • Each data point can also include multiple features associated with the time stamp.
  • the features can be divided into “primary” features and “secondary” features.
  • the primary features can include the primary variable to be predicted and can also include other features that strongly correlate with the primary variable.
  • the secondary features can include other features that correlate less strongly with the primary variable.
  • system 100 can be implemented for settlement of payment-card transactions, and the primary variable can be the volume of data (e.g., number of transactions).
  • Other primary features can include date-related features such as the day of the week, the month, whether the day immediately precedes or follows a holiday, indicators of external events that may have disrupted normal activity patterns (e.g., a natural disaster or public health emergency), and the like.
  • Secondary features in this example can include information about types of transactions (e.g., authorizations, clearing or settlement data, etc.).
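To make the feature split concrete, a hypothetical data layout might look as follows; the column names and values are illustrative assumptions (note that September 2-3, 2023, a weekend, form a gap in this sample).

```python
import pandas as pd

# Hypothetical daily time series; column names are assumed for illustration.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-09-01", "2023-09-04", "2023-09-05"]),
    "num_transactions": [1_204_331, 1_890_002, 1_110_478],   # primary variable
    "day_of_week": [4, 0, 1],                 # primary, date-related (Fri, Mon, Tue)
    "follows_holiday": [False, False, False],  # primary, date-related
    "num_authorizations": [1_500_210, 2_310_880, 1_399_402],  # secondary
    "num_settlements": [1_190_050, 1_871_230, 1_098_700],     # secondary
})

PRIMARY = ["num_transactions", "day_of_week", "follows_holiday"]
SECONDARY = ["num_authorizations", "num_settlements"]
```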
  • FIG. 3 shows an example of how four training sets can be defined from a set of time series data according to some embodiments.
  • a time series data set 300 can include daily data for some number (M) of weeks (e.g., a calendar quarter, a year, or multiple years); weeks are represented as data blocks 302-1 through 302-M.
  • data points are available for Monday, Tuesday, Wednesday, Thursday, and Friday (represented as rows 304-ia through 304-ie for block 302-i, where index i runs from 1 to M).
  • time series data set 300 includes a total of M*5 data points.
  • Each data point includes primary features and secondary features.
  • each Monday data point is preceded by a gap (indicated as a space between data blocks 302).
  • a first training set can include all of time series data set 300.
  • three other training sets are defined in this example by selecting different (overlapping) subsets of time series data set 300.
  • Training set 320 (Set 2) includes just the primary features of each data point 304-1a through 304-Me in time series data set 300.
  • Training set 330 (Set 3) includes the primary and secondary features of just the data points 304-1a, 304-2a, ..., 304-Ma that immediately follow the gaps (Mondays in this example).
  • Training set 340 (Set 4) includes just the primary features for just the data points 304-1a, 304-2a, ..., 304-Ma that immediately follow the gaps (Mondays in this example).
  • training sets 300 and 320 each include M*5 data points, while training sets 330 and 340 each include M data points.
  • all of the training sets are extracted from the same time series data and cover the same M-week time period.
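A minimal pandas sketch of this four-way split, reusing the hypothetical `df`, `PRIMARY`, and `SECONDARY` names from the earlier example and simplifying "immediately follows a gap" to "falls on a Monday", per the example:

```python
# Sets 1/2 use all data points; Sets 3/4 use only post-gap (Monday) points.
# Sets 1/3 keep secondary features; Sets 2/4 keep primary features only.
monday_mask = df["date"].dt.dayofweek == 0

set1 = df[["date"] + PRIMARY + SECONDARY]                    # Set 1
set2 = df[["date"] + PRIMARY]                                # Set 2
set3 = df.loc[monday_mask, ["date"] + PRIMARY + SECONDARY]   # Set 3
set4 = df.loc[monday_mask, ["date"] + PRIMARY]               # Set 4
```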
  • each training set is used to train a separate machine learning model (e.g., the different models 124-1 through 124-N shown in FIG. 1) to predict a value for the primary variable.
  • All of the ML models can be based on the same algorithm, and any ML algorithm suitable for generating predictions from time series data can be used.
  • Examples of suitable algorithms include Prophet models (where Prophet refers to the open-source ML software for time series forecasting released by Facebook’s Core Data Science team, available online via GitHub); autoregressive integrated moving average (ARIMA) models; gradient boosting regressor models (e.g., using the open-source XGBoost library); long short-term memory (LSTM) models; random forest models; and time delay neural networks.
  • Training of a machine learning model involves automated processes to determine, or “learn,” optimal values for internal parameters of the model, such as the weights for each node or coefficients of a parametric function such as a curve-fitting function or a transform function.
  • a standard approach to training involves iteratively processing data samples through the model and adjusting the parameters of the model, with the goal of minimizing a loss function that characterizes a difference between the output of the model for a given input and an expected result determined from a source other than the model.
  • Loss functions can be selected based in part on the particular model, and optimization of loss functions can proceed using various techniques. In the case of predictive models trained on time series data, learning can involve “predicting the past,” with actual data from previous events being used to establish expected results.
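For a regression-style forecaster, one common choice of loss (an illustrative example; the disclosure does not fix a particular loss function) is the mean squared error over the n training samples, where f_θ(x_t) is the model output for input features x_t, y_t is the observed value, and θ denotes the model parameters:

```latex
\mathcal{L}(\theta) = \frac{1}{n} \sum_{t=1}^{n} \bigl( f_\theta(x_t) - y_t \bigr)^2
```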
  • Training typically occurs across multiple “epochs,” where each epoch corresponds to a pass through the training sample set. Adjustment to parameters of the model (e.g., weights or coefficients) can occur multiple times during an epoch; for instance, the training data can be divided into “batches” or “mini-batches” and weight adjustment can occur after each batch or mini-batch.
  • an aggregation function can be defined to merge predictions from different models in the ensemble.
  • the aggregation function can take as inputs a prediction from each model and can produce an output, also referred to as a “net prediction,” based on the predictions of the models.
  • the aggregation function can be an average (mean) or median value computed from the predictions.
  • the aggregation function can be based on a trained model, such as a weighted average or a multilayer perceptron. Training of such a model can be based on predicting the past as described above and can occur after training of the models in the ensemble.
  • the aggregation function can be dependent on what is being predicted. For instance, in the example shown in FIG. 3, models trained using training sets 330 and 340 only have data pertaining to Mondays and therefore can be expected to have no predictive power for any day other than Monday. Accordingly, predictions for Mondays can be aggregated from all four trained models, while predictions for weekdays other than Mondays can be aggregated from just the models trained using data sets 300 and 320.
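A minimal sketch of such day-dependent merging, assuming one prediction per training set, a median aggregation function, and the hypothetical "set1".."set4" labels from the earlier sketches:

```python
import statistics

def net_prediction(preds: dict[str, float], follows_gap: bool) -> float:
    """Merge per-model predictions into a net prediction.

    `preds` maps training-set labels ("set1".."set4") to each trained
    model's prediction. Models trained only on post-gap (Monday) data
    are consulted only when the target day immediately follows a gap.
    """
    keys = ["set1", "set2", "set3", "set4"] if follows_gap else ["set1", "set2"]
    return statistics.median(preds[k] for k in keys)
```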
  • the trained models and the defined aggregation function can be used to predict the value of the variable. Such predictions can be used to inform various decisions (e.g., resource management decisions in system 100 of FIG. 1).
  • FIG. 4 shows a simplified schematic diagram illustrating training of an ensemble of ML models according to some embodiments using the training sets of FIG. 3.
  • four ML models 424-1 through 424-4 are provided.
  • Each ML model 424-1 through 424-4 can be based on the same algorithm, such as a Prophet model, an XGBoost model, an ARIMA model, or the like.
  • different algorithms can be used for different ML models 424-1 through 424-4.
  • each ML model 424-1 through 424-4 can be separately trained using a different training set extracted from the time series data as described above with reference to FIG. 3.
  • ML model 424-1 is trained using first training set 300, which includes all of the time series data.
  • ML model 424-2 is trained using second training set 320, which includes just the primary features for all of the data points in the time series data.
  • ML model 424-3 is trained using third training set 330, which includes all of the features for just the data points immediately following the recurring gaps (Mondays in this example).
  • ML model 424-4 is trained using fourth training set 340, which includes just the primary features for just the data points immediately following the recurring gaps (Mondays in this example).
  • ML models 424-1 and 424-2 can treat their training data as daily time series data with regular gaps, while ML models 424-3 and 424-4 can treat their training data as weekly time series data without regular gaps.
  • the time series data may have other missing data points where no data is associated with a particular date that corresponds to a weekday. Missing data points can occur, for example, due to a holiday, a reporting lapse, or the like. Such missing data points do not have a regular pattern and can be filled in for purposes of training predictive ML models, e.g., using interpolation or other techniques.
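One hedged sketch of filling such irregular weekday gaps with pandas, while leaving the regular weekend gaps untouched (pd.bdate_range generates Monday-Friday dates only; numeric feature columns are assumed):

```python
import pandas as pd

def fill_irregular_weekday_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Interpolate occasional missing weekdays (holidays, reporting
    lapses). Weekend dates are never added to the index, so the
    regular weekend gaps are preserved rather than filled."""
    expected = pd.bdate_range(df["date"].min(), df["date"].max())
    return (df.set_index("date")
              .reindex(expected)              # NaN rows for missing weekdays
              .interpolate(method="linear")   # fill them from neighbors
              .rename_axis("date")
              .reset_index())
```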
  • models 424-1 through 424-4 can output respective predictions 426-1 through 426-4. Because the models are trained on different (though overlapping) data sets, predictions 426-1 through 426-4 are not necessarily the same. It should also be noted that ML models 424-3 and 424-4 are trained using only the data points immediately following the recurring gaps (Mondays in this example); consequently, models 424-3 and 424-4 may provide predictions 426-3 and 426-4 only for days that immediately follow a recurring gap.
  • Predictions 426-1 through 426-4 can be received at merging logic 430.
  • Merging logic 430 can implement an aggregation function (e.g., as described above with reference to block 208 of FIG. 2) to generate a final, or net, prediction 432. For instance, if the prediction is being generated for a day immediately following a gap (a Monday in this example), merging logic 430 can aggregate all four predictions 426-1 through 426-4. If the prediction is made for any other day (e.g., Tuesday through Friday), merging logic 430 can aggregate predictions 426-1 and 426-2 while ignoring any predictions 426-3, 426-4 from models 424-3 and 424-4.
  • It should be understood that FIG. 4 is illustrative and that many variations and modifications are possible.
  • Different training sets can be defined using different (and overlapping) subsets of the available time series data.
  • the features in the data set could be divided into more than two categories (e.g., primary, secondary, and tertiary features), and different training sets could be defined using features in various combinations of these categories.
  • the ML models trained on different training sets can be based on the same algorithm or different algorithms as desired, provided that each ML model can be trained to generate predictions of future values from time series data. Net predictions such as prediction 432 can be used in the same manner as any other prediction from time series data.
  • one ML model is trained on each training set.
  • further improvements in accuracy and/or stability of predictions can be obtained by training multiple models on some or all of the training sets.
  • FIG. 5 shows a simplified schematic diagram illustrating training of an expanded ensemble of ML models according to some embodiments, again using the training sets of FIG. 3.
  • FIG. 5 is generally similar to FIG. 4, except that in this example, two ML models are trained using each data set.
  • ML models 524-1a and 524-1b are each trained using first training set 300.
  • ML models 524-1a and 524-1b can be based on different algorithms from each other.
  • ML model 524-1a can be a Prophet model while ML model 524-1b can be an XGBoost model; other combinations of models can be substituted.
  • ML models 524-2a and 524-2b, which can be based on different algorithms from each other, are each trained using second training set 320; ML models 524-3a and 524-3b, which can be based on different algorithms from each other, are each trained using third training set 330; and ML models 524-4a and 524-4b, which can be based on different algorithms from each other, are each trained using fourth training set 340. It should be understood that ML models trained using different training sets can use the same algorithm.
  • ML models 524-1a, 524-2a, 524-3a, and 524-4a can each be a Prophet model while ML models 524-1b, 524-2b, 524-3b, and 524-4b are each an XGBoost model.
  • this is not required, and any number or combination of models can be trained using any of the training sets.
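As a sketch of the expanded ensemble (assuming the open-source `prophet` and `xgboost` Python packages; the wrapper below is illustrative, not the disclosed implementation), two models could be trained per training set, reusing the hypothetical set1..set4 and PRIMARY/SECONDARY names sketched earlier:

```python
from prophet import Prophet        # pip install prophet
from xgboost import XGBRegressor   # pip install xgboost

def train_pair(train_df, feature_cols, target_col="num_transactions"):
    """Train one Prophet model and one XGBoost model on one training set.
    Prophet expects columns 'ds' (timestamp) and 'y' (target); XGBoost
    is fit on the remaining feature columns."""
    prophet_model = Prophet()
    prophet_model.fit(train_df.rename(columns={"date": "ds", target_col: "y"}))

    xgb_model = XGBRegressor(n_estimators=200)
    xgb_model.fit(train_df[feature_cols], train_df[target_col])
    return prophet_model, xgb_model

# Hypothetical ensemble of eight models, two per training set.
date_features = [c for c in PRIMARY if c != "num_transactions"]
ensemble = {
    "set1": train_pair(set1, date_features + SECONDARY),
    "set2": train_pair(set2, date_features),
    "set3": train_pair(set3, date_features + SECONDARY),
    "set4": train_pair(set4, date_features),
}
```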
  • models 524-1a through 524-4b can output respective predictions 526-1a through 526-4b. Because the models use different algorithms and are trained on different (though overlapping) data sets, predictions 526-1a through 526-4b are not necessarily the same. It should also be noted that ML models 524-3a, 524-3b, 524-4a, and 524-4b are trained using only the data points immediately following the recurring gaps (Mondays in this example); consequently, models 524-3a, 524-3b, 524-4a, and 524-4b may provide predictions 526-3a, 526-3b, 526-4a, and 526-4b only for days that immediately follow a recurring gap.
  • Predictions 526-1a through 526-4b can be received at merging logic 530.
  • Merging logic 530 can implement an aggregation function (e.g., as described above with reference to block 208 of FIG. 2) to generate a final, or net, prediction 532. For instance, if the prediction is desired for a day immediately following a gap (a Monday in this example), merging logic 530 can aggregate all eight predictions 526-1a through 526-4b.
  • Otherwise, merging logic 530 can aggregate predictions 526-1a, 526-1b, 526-2a, and 526-2b while ignoring any predictions 526-3a, 526-3b, 526-4a, and 526-4b from models 524-3a, 524-3b, 524-4a, and 524-4b.
  • the models trained on different training sets can implement different algorithms or different combinations of algorithms.
  • the number of models trained on different training sets can also be different; for instance, two models can be trained on each of training sets 300 and 330 while one model is trained on each of training sets 320 and 340. Any number and combination of ML models can be trained on a given training set and included in the ensemble. Net predictions such as prediction 532 can be used in the same manner as any other prediction from time series data.
  • Any number and combination of machine learning models can be used in combination with each other, e.g., on different training sets as described above.
  • the models can be retrained from time to time (e.g., on a daily, weekly, monthly, or yearly basis) as new data becomes available.
  • old data points can expire and be removed from the training sets prior to retraining.
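A sliding-window retraining policy might be sketched as follows; the two-year retention window is an assumed value, not from the disclosure.

```python
import pandas as pd

def expire_and_retrain(df: pd.DataFrame, retain_weeks: int = 104) -> pd.DataFrame:
    """Drop data points older than the retention window before retraining.
    The 104-week window is an illustrative assumption."""
    cutoff = df["date"].max() - pd.Timedelta(weeks=retain_weeks)
    recent = df[df["date"] >= cutoff]
    # ...re-derive the overlapping training sets from `recent` and
    # re-run the training processes described above...
    return recent
```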
  • Ensemble machine learning models of the kind described herein can be used to generate predictions of any quantifiable variable from any time series data that includes a pattern of regularly occurring gaps.
  • One example relates to transactions (e.g., financial transactions such as settlement of payment-card transactions, stock trades, currency exchanges, or the like): the number of transactions or dollar volume of transactions on any given weekday can be predicted using techniques described herein.
  • similar techniques can apply to other time series having a different pattern of recurring gaps.
  • time series data can include hourly data points with recurring gaps corresponding to some portion of each day (e.g., midnight to 8 a.m.); daily data points with recurring gaps corresponding to the last day of each month; daily or weekly data points with recurring seasonal gaps; and so on.
  • any of the embodiments of the present invention can be implemented in the form of control logic using hardware (e.g., an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner.
  • a processor includes a single-core processor, multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked.
  • Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C, C++, C#, Objective-C, Swift, or scripting language such as Perl or Python using, for example, conventional or object-oriented techniques.
  • the software code may be stored as a series of instructions or commands on a computer readable storage medium; suitable media include random access memory (RAM), read-only memory (ROM), a magnetic medium such as a hard drive or a floppy disk, an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like.
  • the computer readable storage medium may be any combination of one or more such storage devices, and suitable media may be packaged with a compatible device. Any such computer readable storage medium may reside on or within a single computer product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network.
  • Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet.
  • a computer readable transmission medium may be created using a data signal encoded with such programs, e.g., to download via the internet. It should be understood that transmission media are transitory and distinct from computer readable storage media, which are non-transitory.
  • any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps, e.g., by providing suitable program code for execution by the processors.
  • embodiments can involve computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps.
  • steps of methods described herein can be performed at the same time or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, some or all steps of any of the methods can be performed with logic modules, circuits, or other means for performing these steps.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Ensemble machine learning can be used to make predictions based on time series data with gaps. Multiple models are trained on different (overlapping) sets or portions of the available time series data, and the predictions from the different models are aggregated to generate predictions. The models can include one model trained on all of the time series data and a second model trained using just the data points that immediately follow the gaps. Models in an ensemble can also include models that use all features of the data points and models that use only a subset of features of the data points.

Description

Ensemble Machine Learning for Time Series Data with Gaps
TECHNICAL FIELD
[0001] This disclosure relates generally to systems that leverage predictive machine learning and in particular to ensemble machine learning for making predictions from time series data with gaps.
BACKGROUND
[0002] Forecasting, or making predictions regarding future events, is important to planning. For instance, in a computer system that dynamically allocates processing resources among various tasks or clients, predicting future demand by the various tasks and clients can help to avoid processing bottlenecks. To be useful, forecasts should be reliable.
[0003] Machine learning techniques have been developed to generate forecasts using time series data. “Time series” data refers generally to data samples that are collected over time, generally at regular intervals (e.g., hourly, daily, weekly), with each data point representing a measurement (or set of measurements) associated with a given time. Examples of time series data include weather data (e.g., temperature, wind, barometric pressure, etc. at a given date/time); data related to resource use in a computer system as a function of time (e.g., number of active processes, memory or CPU utilization); data representing a volume of activity as a function of time (e.g., daily number of transactions or total dollar value of transactions at a shop, online store, or financial institution), and so on. Time series data can reflect a combination of long-term trends and cyclical fluctuations (e.g., daily, weekly, or seasonal changes), as well as random variability. By analyzing time series data, machine learning models can find hidden patterns that enable reliable predictions regarding specific future time points. Quality of predictions made by machine learning models can be assessed using criteria such as accuracy (how well the predictions match reality) and stability (how much a small change to the input data alters the predictions).
[0004] One confounding factor that machine learning models can encounter when analyzing time series data is the occurrence of gaps in the time series, where a data point that should be present is absent. Because of the temporal dimension of the analysis, machine learning algorithms that encounter gaps in time series data typically use interpolation techniques (e.g., a moving average) to fill in any gaps with assumed data values.
SUMMARY
[0005] Interpolation techniques can address occasional gaps in a time series without sacrificing accuracy or stability. However, where the gaps themselves have a recurring pattern, interpolation can lead to reduced accuracy and/or stability. By way of example, consider a time series involving daily data from an activity that occurs only on weekdays (Monday through Friday). The time series data will have a gap every weekend (Saturday and Sunday). In addition, the data on Mondays, the day immediately following the gap, may be systematically different from other days due to the absence of weekend activity.
[0006] Accordingly, certain embodiments described herein relate to techniques for machine learning predictions from time series data with recurring gaps. Ensemble machine learning can be used, in which multiple models are trained on different (overlapping) sets or portions of the available training data and the predictions from the different models are aggregated to generate predictions. For instance, a first model can be trained using all of the time series data while a second model is trained using just the data points that immediately follow the gaps (e.g., only the Monday data points in the example above). Additional (overlapping) subsets of the training data can also be defined and used to train additional models in the ensemble. For instance, each data point in the time series may include multiple features. The features can be divided into primary features and secondary features, where the primary features include the variable(s) to be predicted (and optionally other data items that strongly affect the variable, such as date-related features) and the secondary features include other available information that may be correlated with the variable. Thus, in some embodiments, the training data is divided into four overlapping training sets: a first training set that includes the one or more primary features and the one or more secondary features of all of the data points; a second training set that includes the one or more primary features but not the one or more secondary features of all of the data points; a third training set that includes the one or more primary features and the one or more secondary features of only data points associated with times immediately following one of the regularly occurring gaps; and a fourth training set that includes the one or more primary features but not the one or more secondary features of only the data points associated with times immediately following one of the regularly occurring gaps. One or more machine learning models can be trained on each training set, independently of any other models. After training the machine learning models, predictions made by the different models can be aggregated (e.g., by computing the mean or median value or by training another machine learning model to perform an aggregation such as a weighted average using learned parameters) to produce a net prediction.
[0007] Certain embodiments relate to computer-implemented methods that can include: obtaining time series data including a plurality of data points, each data point associated with a particular time according to a regular temporal pattern and having one or more primary features including a variable, one or more secondary features, wherein the time series data includes a pattern of regularly occurring gaps in which data points according to the regular temporal pattern are absent; defining a number (N) of overlapping training sets from the time series data, the N overlapping training sets including: a first training set that includes the one or more primary features and the one or more secondary features of all of the data points; a second training set that includes the one or more primary features but not the one or more secondary features of all of the data points; a third training set that includes the one or more primary features and the one or more secondary features of only data points for which the associated time immediately follows one of the regularly occurring gaps; and a fourth training set that includes the one or more primary features but not the one or more secondary features of only the data points for which the associated time immediately follows one of the regularly occurring gaps; training an ensemble of time series prediction models to predict a value of the variable corresponding to a future time stamp, wherein the time series prediction models are machine learning models and wherein the training of the ensemble of time series prediction models includes performing a number N of training processes, each training process using a different one of the N overlapping training sets, thereby producing N trained time series prediction models; using each of the N trained time series prediction models to generate a respective prediction of the value of the variable at a first future time that immediately follows a future one of the regularly occurring gaps; and aggregating the respective predictions of the N trained time series prediction models to provide a net prediction of the value of the variable at the first future time.
[0008] In these and other embodiments, various time series prediction models can be used, including one or more of a Prophet model, an XGBoost model, or an ARIMA model.
[0009] In these and other embodiments, the data points can correspond to different days and the regularly occurring gaps can correspond to weekends.
[0010] In these and other embodiments, the primary features of each data point can include date-related information, and the secondary features of each data point can include features representing one or more items of secondary information that have a nonzero correlation with the variable.
[0011] In these and other embodiments, aggregating the respective predictions of the N trained time series prediction models to provide a net prediction can include computing a mean or a median of the respective predictions of the N trained time series prediction models.
[0012] In these and other embodiments, the variable can represent a load on a server system, and the method can further include increasing or decreasing availability of a computational resource at the server system based at least in part on the net prediction of the value of the variable.
[0013] Certain embodiments relate to computer systems that include a memory and a processor coupled to the memory. The memory can store time series data including a plurality of data points, each data point associated with a particular time according to a regular temporal pattern and having one or more features including a variable, wherein the time series data includes a pattern of regularly occurring gaps in which data points according to the regular temporal pattern are absent. The processor can be configured to: define a number (N) of overlapping training sets from the time series data, the N overlapping training sets including at least a first training set that includes all of the data points in the time series data and a second training set that includes only data points in the time series data for which the associated time immediately follows one of the regularly occurring gaps; train an ensemble of time series prediction models to predict values of the variable corresponding to a future time, wherein the ensemble includes a plurality of machine learning models and wherein the training includes performing a number N of training processes, each training process using a different one of the N overlapping training sets, thereby producing N trained machine learning models; use each of the N trained machine learning models to generate a respective prediction of the value of the variable at a first future time that immediately follows a future one of the regularly occurring gaps; and aggregate the respective predictions of the N trained machine learning models to provide a net prediction of the value of the variable at the first future time.
[0014] In these and other embodiments, each data point can have at least one primary feature and at least one secondary feature, and the processor can be further configured such that the N overlapping training sets further include: a third training set that includes only the primary features of all of the data points in the time series data; and a fourth training set that includes only the primary features of only the data points for which the associated time immediately follows one of the regularly occurring gaps.
[0015] In these and other embodiments, the processor can be further configured such that the ensemble of time series prediction models further includes at least two different machine learning models trained on a same one of the N overlapping training sets.
[0016] In these and other embodiments, at least one of the machine learning models can be one of a Prophet model, an XGBoost model, or an ARIMA model.
[0017] In these and other embodiments, the data points can correspond to different days and the regularly occurring gaps can correspond to weekends.
[0018] In these and other embodiments, the variable can represent a load on a server system, and the processor can be further configured to increase or decrease availability of a computational resource at the server system based at least in part on the net prediction of the value of the variable.
[0019] Certain embodiments relate to a computer-readable storage medium having stored therein program code instructions that, when executed by a processor in a computer system, cause the processor to perform a method that includes: obtaining time series data including a plurality of data points, each data point having one or more primary features including a variable, one or more secondary features, and an associated time stamp, wherein the time series data includes a pattern of regularly occurring gaps between the time stamps; defining a number (N) of overlapping training sets from the time series data, the N overlapping training sets including: a first training set that includes the one or more primary features and the one or more secondary features of all of the data points; a second training set that includes the one or more primary features but not the one or more secondary features of all of the data points; a third training set that includes the one or more primary features and the one or more secondary features of only data points having time stamps immediately following one of the regularly occurring gaps; and a fourth training set that includes the one or more primary features but not the one or more secondary features of only the data points having time stamps immediately following one of the regularly occurring gaps; training a number (T) of time series prediction models to predict a value of the variable corresponding to a future time stamp, wherein each time series prediction model is a machine learning model and wherein the training of each time series prediction model includes performing a number N of training processes, each training process using a different one of the N overlapping training sets, thereby producing a total number N*T of trained time series prediction models; using each of the N*T trained time series prediction models to generate a respective prediction of the value of the variable at a first future time stamp that immediately follows a future one of the regularly occurring gaps; and aggregating the respective predictions of the N*T trained time series prediction models to provide a net prediction of the value of the variable at the first future time stamp. In these and other embodiments, T can be at least 2.
[0020] In these and other embodiments, the time series prediction models can include one or more of a Prophet model, an XGBoost model, or an ARIMA model.
[0021] In these and other embodiments, the data points can correspond to different days and the regularly occurring gaps can correspond to weekends.
[0022] In these and other embodiments, aggregating the respective predictions of the N*T trained time series prediction models to provide a net prediction can include computing a mean or a median of the respective predictions of the N*T trained time series prediction models.
[0023] In these and other embodiments, aggregating the respective predictions of the N*T trained time series prediction models to provide a net prediction includes using an additional machine learning model that has been trained to compute a net prediction from the respective predictions of the N*T trained time series prediction models.
[0024] In these and other embodiments, the variable can represent a load on a server system and the method can further include increasing or decreasing availability of a computational resource at the server system based at least in part on the net prediction of the value of the variable.
[0025] The following detailed description, together with the accompanying drawings, will provide a better understanding of the nature and advantages of the claimed invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] FIG. 1 shows a simplified block diagram of a system in which some embodiments can operate.
[0027] FIG. 2 is a flow diagram of a process for using ensemble machine learning to make predictions from time series data according to some embodiments.
[0028] FIG. 3 shows an example of how training sets can be defined from a set of time series data according to some embodiments.
[0029] FIG. 4 shows a simplified schematic diagram illustrating training of an ensemble of machine learning models according to some embodiments.
[0030] FIG. 5 shows a simplified schematic diagram illustrating training of an expanded ensemble of machine learning models according to some embodiments.
TERMS
[0031] The following terms may be used herein.
[0032] A “computer system” refers generally to a device or apparatus that is capable of executing program code (also referred to as “instructions”). A computer system can include a processor and a memory, as well as other components such as user interfaces that enable human interaction with the computer system and/or communication interfaces that enable computer systems to exchange information-bearing signals with other computer systems.
[0033] A “processor” may refer to any suitable data computation device or devices. A processor may comprise one or more microprocessors working together to achieve a desired function. The processor may include a CPU that comprises at least one high-speed data processor adequate to execute program components for executing user and/or system generated requests. The CPU may be a microprocessor such as AMD’s Athlon, Duron and/or Opteron; IBM and/or Motorola’s PowerPC; IBM’s and Sony’s Cell processor; Intel’s Celeron, Itanium, Pentium, Xeon, and/or XScale; and/or the like processor(s). A processor can also include one or more co-processors that operate under control of a CPU to perform specific tasks; examples include graphics processors, neural processors, and the like.
[0034] A “server computer,” or “server,” may refer to a computer or cluster of computers. A server computer may be a powerful computing system, such as a large mainframe. Server computers can also include minicomputer clusters or a group of servers functioning as a unit. In one example, a server computer can include a database server coupled to a web server. In another example, a server computer can include a collection of processors, a communication interface that receives requests to execute jobs using the processors, and a control system that assigns jobs to specific processors. A server computer may comprise one or more computational apparatuses and may use any of a variety of computing structures, arrangements, and compilations for servicing requests from one or more client computers.
[0035] A “client computer,” or “client,” may refer to a computer or cluster of computers that receives some service from a server computer (or another computing system). The client computer may access this service via a communication network such as the internet or any other appropriate communication network. A client computer may make requests to server computers including requests for data or requests to execute a job (or program). As some examples, a client computer can send a request to a server to process a batch of data (e.g., transaction data), a request to modify content of a database maintained at the server (e.g., adding or updating database records), or a request to retrieve data from a database maintained at the server. A client computer may comprise one or more computational apparatuses and may use a variety of computing structures, arrangements, and compilations for performing its functions, including requesting and receiving data or services from server computers.
[0036] “Time series data” may refer to a data set consisting of a sequence of data points, each of which is associated with a particular time according to a regular temporal pattern. Examples include daily data, hourly data, weekly data, or the like. Each data point includes one or more features, or variables, that are measured or computed at the associated time. In some instances, one or more data points may be “missing” from the time series, meaning that a data point is absent that, according to the regular temporal pattern, should be present. Such missing data points are also referred to as “gaps” in the time series data. A data point is said to “immediately” follow a gap if it is the first data point in the time series after a gap of one or more missing data points. For instance, if time series data includes daily data for Monday through Friday and no data for Saturday or Sunday, then Saturday and Sunday constitute a gap, and the Monday data point is the data point immediately following the gap. In some cases, gaps may occur with a regular pattern.
[0037] A “machine learning model” may refer to a file, program, software executable, instruction set, etc., that has been “trained” to recognize patterns or make predictions. For example, a machine learning model can take time series data as input and predict, or forecast, values for future data points in the series. As a more specific example, a machine learning model can take weather data as an input and predict a likelihood that it will rain later in the week. A machine learning model can be trained using “training data” (e.g., to identify patterns in the training data) and then apply this training when it is used for its intended purpose. A machine learning model may be defined by “model parameters,” which can comprise numerical values that define how the machine learning model performs its function. Training a machine learning model can comprise an iterative process used to determine a set of model parameters that achieve the best performance for the model.
DETAILED DESCRIPTION
[0038] The following description of exemplary embodiments is presented for the purpose of illustration and description. It is not intended to be exhaustive or to limit the claimed embodiments to the precise form described, and persons skilled in the art will appreciate that many modifications and variations are possible. The embodiments have been chosen and described in order to best explain their principles and practical applications to thereby enable others skilled in the art to best make and use various embodiments and with various modifications as are suited to the particular use contemplated.
[0039] In machine learning analysis of time series data, interpolation techniques (e.g., using a moving average or the like) can be used to fill in gaps. In cases where gaps have no particular pattern or effect on the following data point(s), interpolation may be appropriate. However, where the gaps have a recurring (or regular) pattern, interpolation can lead to reduced accuracy. By way of example, if a particular computer system operates only on weekdays (Monday through Friday), daily time series data related to operation of the system may have a gap every weekend (Saturday and Sunday). In addition, the data associated with Mondays may be systematically different from data associated with other weekdays. For instance, on Monday the system may need to perform extra operations responsive to events that occurred during the weekend. One specific example relates to transaction volume at a financial institution, where transactions are not processed during the weekend. However, transactions may occur during the weekend, and transaction processing on Monday may include weekend transactions in addition to same-day transactions. In such cases, interpolating to fill in the gaps in the time series data may produce inaccurate predictions, particularly for a data point immediately following a gap (Mondays in the example above).
[0040] Accordingly, certain embodiments described herein relate to techniques that can improve accuracy and/or stability of machine learning predictions using time series data with recurring gaps. Ensemble machine learning can be used, in which multiple models are trained on different (overlapping) sets or portions of the available training data and their predictions are aggregated. For instance, a first model can be trained using all of the time series data while a second model is trained using just the data points that immediately follow the gaps (e.g., only the Monday data points in the example above). Additional (overlapping) subsets of the training data can also be defined and used to train additional models in the ensemble. For instance, each data point in the time series may include multiple features. The features can be divided into primary features and secondary features, where the primary features include the variable(s) to be predicted (and optionally other data items that strongly affect the variable, such as date-related features) and secondary features include other available information that may affect the variable. Thus, in some embodiments, the training data is divided into four overlapping training sets: a first training set that includes the one or more primary features and the one or more secondary features of all of the data points; a second training set that includes the one or more primary features but not the one or more secondary features of all of the data points; a third training set that includes the one or more primary features and the one or more secondary features of only data points associated with times immediately following one of the regularly occurring gaps; and a fourth training set that includes the one or more primary features but not the one or more secondary features of only the data points associated with times immediately following one of the regularly occurring gaps.
System Overview
[0041] FIG. 1 shows a simplified block diagram of a system 100 in which some embodiments can operate. System 100 includes a server system 102 communicating with client systems 104 via a network 106, which can be, e.g., the internet, a local area network, a private network or any other network. For purposes of illustration, it is assumed that server system 102 is a scalable system (e.g., a server farm) with constituent subsystems (e.g., server racks or blades) that can be brought online or taken offline as desired. For instance, when demand from clients 104 is low, it may be desirable to power down constituent subsystems that are not needed. When demand increases, additional constituent subsystems can be powered up.
[0042] A supervisor system 110 can be or include a computer system that monitors and manages operations of server system 102. For instance, supervisor system 110 can predict future demand for (or load on) server system 102 and can power up or power down constituent subsystems in anticipation of changing needs. In some embodiments, supervisor system 110 can include a resource manager 112 configured to determine processing needs and change operating parameters of server system 102 accordingly (e.g., powering up or down constituent subsystems or allocating constituent subsystems to particular processing tasks). Resource manager 112 can make determinations based at least in part on previous patterns of usage of server system 102. For instance, supervisor system 110 can collect usage data 120 as a function of time. Examples of usage data include the number of client requests per day, the total volume of data received and/or sent per day, the number of CPU cycles required to service client requests in a given day, and so on. While daily data is used as an example, it should be understood that other granularity of data (e.g., hourly or weekly) may also be used.
[0043] Usage data 120 can be treated as time series data, in which each data point is associated with a particular time (e.g., a particular day, such as Friday, September 1, 2023). Each data point includes a value for a “primary” variable whose future value is to be predicted and may also include other data values representing other variables, some or all of which may correlate (to at least some degree) with the primary variable. For instance, if the primary variable is the total volume of data received in a day, additional variables may include the number of client requests, the number of different clients that sent at least one request, or information about the particular day (e.g., whether it immediately follows or precedes a holiday).
[0044] Machine learning (ML) model ensemble 122 can include multiple ML models 124 that are separately trained using different (and overlapping) subsets of usage data 120 to make predictions of future values of the primary variable based on time series data (in this case a particular subset of usage data 120). Examples of ML models 124 and subsets of usage data 120 are described below. Once trained, the ML models 124 can generate predictions of future values of the primary variable. ML model ensemble 122 can aggregate (or merge) the predictions from ML models 124 into a single prediction, referred to herein as a “net” prediction. This net prediction can be provided to resource manager 112. Resource manager 112 can use the net prediction to inform decisions about operations of server system 102. For instance, if the prediction relates to a load on (or demand for resources of) server system 102, resource manager 112 can determine whether and when to increase or decrease availability of computation resources (e.g., powering up or down various constituent subsystems of server system 102) to accommodate the load.
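By way of illustration, the following Python sketch shows one way a resource manager such as resource manager 112 might translate a net prediction into a scaling decision. The function name, the request-based load measure, and the per-subsystem capacity figure are assumptions made for illustration only, not a prescribed implementation.

```python
import math

def subsystems_needed(net_prediction: float, capacity_per_subsystem: float) -> int:
    """Number of constituent subsystems required to serve the predicted load,
    keeping at least one subsystem online at all times."""
    return max(1, math.ceil(net_prediction / capacity_per_subsystem))

# Example: a predicted load of 4.2M daily requests, with subsystems rated at
# 1M requests each, suggests keeping five subsystems powered up.
print(subsystems_needed(4_200_000, 1_000_000))  # -> 5
```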
[0045] It will be appreciated that system 100 is illustrative of one context in which making predictions from time series data is useful. As another example, supervisor system 110 can use predictions to allocate resources of server system 102 (e.g., constituent subsystems of server system 102) to different processing tasks as relative demand for different tasks fluctuates. Further, predictions from time series data have numerous other uses beyond management of computational resources, and techniques described herein can be applied in any context where predictions are being made from time series data that includes a pattern of recurring gaps.
Machine Learning Model Ensemble
[0046] According to some embodiments, ML model ensemble 122 is designed and constructed to provide accurate and stable predictions from time series data that includes a regular pattern of gaps. FIG. 2 is a flow diagram of a process 200 for using ensemble machine learning to make predictions according to some embodiments. Process 200 can be implemented in a computer system such as supervisor system 110 of FIG. 1 or any other computer system where making predictions from time series data is desired.
[0047] At block 202, time series data is obtained. For clarity of description, an example will be used in which the time series data is collected on a daily basis during the week and not at all on weekends. In this example, the time series data for each week includes a data point associated with each weekday (Monday, Tuesday, Wednesday, Thursday, Friday) but no data points associated with weekend days (Saturday, Sunday). It should be understood that the time series data may also have occasional gaps on weekdays (e.g., if a holiday falls on a Thursday); however, such gaps generally occur with lower frequency than the weekend gaps and do not occur in a regular pattern. Each data point in the time series data can include a time stamp identifying the associated time. The time stamp can have any format or granularity desired, provided that different data points in the time series data have different time stamps. (For instance, for daily data, the time stamp can be the associated date; for hourly data, a date and hour can be included; and so on.) Each data point can also include multiple features associated with the time stamp. In some embodiments, the features can be divided into “primary” features and “secondary” features. The primary features can include the primary variable to be predicted and can also include other features that strongly correlate with the primary variable. The secondary features can include other features that correlate less strongly with the primary variable. By way of a specific example, system 100 can be implemented for settlement of payment-card transactions, and the primary variable can be the volume of data (e.g., number of transactions). Other primary features can include date-related features such as the day of the week, the month, whether the day immediately precedes or follows a holiday, indicators of external events that may have disrupted normal activity patterns (e.g., a natural disaster or public health emergency), and the like. Secondary features in this example can include information about types of transactions (e.g., authorizations, clearing or settlement data, etc.).
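By way of illustration, the following Python sketch (using the pandas library) builds a toy version of such time series data and flags the data points that immediately follow a gap. The column names, the feature choices, and the synthetic values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic daily usage data covering eight weeks of weekdays (Mon-Fri only).
rng = np.random.default_rng(0)
ds = pd.bdate_range("2023-08-07", periods=40)  # business days: weekends absent
df = pd.DataFrame({
    "ds": ds,
    "y": 100 + rng.normal(0, 5, len(ds)) + (ds.dayofweek == 0) * 60,  # Mondays run higher
    "day_of_week": ds.dayofweek,                 # primary date-related feature
    "n_clients": rng.integers(30, 60, len(ds)),  # secondary feature
})

# A data point "immediately follows a gap" when more than one calendar day
# separates it from its predecessor; the first point has no predecessor and
# is conservatively left unflagged.
df["follows_gap"] = df["ds"].diff().dt.days.gt(1)
```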
[0048] At block 204, multiple overlapping training sets can be defined from the time series data. FIG. 3 shows an example of how four training sets can be defined from a set of time series data according to some embodiments. As shown, a time series data set 300 can include daily data for some number (M) of weeks (e.g., a calendar quarter, a year, or multiple years); weeks are represented as data blocks 302-1 through 302-M. For each week, data points are available for Monday, Tuesday, Wednesday, Thursday, and Friday (represented as rows 304-ia through 304-ie for block 302-i, where index i runs from 1 to M). In this example, time series data set 300 includes a total of M*5 data points. Each data point includes primary features and secondary features. In this example, each Monday data point is preceded by a gap (indicated as a space between data blocks 302).
[0049] A first training set (Set 1) can include all of time series data set 300. In addition, three other training sets are defined in this example by selecting different (overlapping) subsets of time series data set 300. Training set 320 (Set 2) includes just the primary features of each data point 304-1a through 304-Me in time series data set 300. Training set 330 (Set 3) includes the primary and secondary features of just the data points 304-1a, 304-2a, . . . 304-Ma that immediately follow the gaps (Mondays in this example). Training set 340 (Set 4) includes just the primary features for just the data points 304-1a, 304-2a, . . . 304-Ma that immediately follow the gaps (Mondays in this example). It should be noted that training sets 300 and 320 each include M*5 data points, while training sets 330 and 340 each include M data points. Further, all of the training sets are extracted from the same time series data and cover the same M-week time period.
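Continuing the toy data frame from the sketch above, the four overlapping training sets of FIG. 3 might be carved out as follows (the partition of columns into primary and secondary features is an illustrative assumption):

```python
PRIMARY = ["ds", "y", "day_of_week"]  # primary features (illustrative choice)
SECONDARY = ["n_clients"]             # secondary features (illustrative choice)
post_gap = df["follows_gap"]

train_set1 = df[PRIMARY + SECONDARY]                # Set 1: all points, all features
train_set2 = df[PRIMARY]                            # Set 2: all points, primary only
train_set3 = df.loc[post_gap, PRIMARY + SECONDARY]  # Set 3: post-gap points, all features
train_set4 = df.loc[post_gap, PRIMARY]              # Set 4: post-gap points, primary only
```

Note that train_set1 and train_set2 retain all forty points, while train_set3 and train_set4 retain only the post-gap Monday points (seven here, since the first Monday has no predecessor in the toy data), mirroring the M*5 versus M counts noted above.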
[0050] Referring again to FIG. 2, at block 206, each training set is used to train a separate machine learning model (e.g., the different models 124-1 through 124-N shown in FIG. 1) to predict a value for the primary variable. For instance, in the example shown in FIG. 3, four training sets are defined, and four ML models (N = 4) are trained. All of the ML models can be based on the same algorithm, and any ML algorithm suitable for generating predictions from time series data can be used. Examples of suitable algorithms include Prophet models (where Prophet refers to the open-source ML software for time series forecasting released by Facebook’s Core Data Science team, available online via GitHub); autoregressive integrated moving average (ARIMA) models; gradient boosting regressor models (e.g., using the open-source XGBoost library); long short-term memory (LSTM) models; random forest models; and time delay neural networks.
[0051] Training of a machine learning model involves automated processes to determine, or “learn,” optimal values for internal parameters of the model, such as the weights for each node or coefficients of a parametric function such as a curve-fitting function or a transform function. A standard approach to training involves iteratively processing data samples through the model and adjusting the parameters of the model, with the goal of minimizing a loss function that characterizes a difference between the output of the model for a given input and an expected result determined from a source other than the model. Loss functions can be selected based in part on the particular model, and optimization of loss functions can proceed using various techniques. In the case of predictive models trained on time series data, learning can involve “predicting the past,” with actual data from previous events being used to establish expected results. Training typically occurs across multiple “epochs,” where each epoch corresponds to a pass through the training sample set. Adjustment to parameters of the model (e.g., weights or coefficients) can occur multiple times during an epoch; for instance, the training data can be divided into “batches” or “mini-batches” and weight adjustment can occur after each batch or mini-batch. Aspects of machine learning models and training that are relevant to understanding the present disclosure are described herein; any other aspects can be modified as desired.
[0052] Training of the models at block 206 can proceed independently for each training set, and the models can be trained sequentially or in parallel as desired. For training sets that include all of the data points, the time series can be treated as a time series with gaps. For instance, training sets 300 and 320 in FIG. 3 can each be treated as a daily time series with gaps corresponding to the weekends. For training sets that include only the data points following the gaps, the time series can be treated as a gapless time series with a lower frequency of data points. For instance, training sets 330 and 340 in FIG. 3 can each be treated as a weekly time series without gaps rather than a daily time series with gaps.
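For instance, using the open-source Prophet library identified above, the four models might be fit as in the following sketch, which continues the toy training sets defined earlier. This is an illustrative pairing only; any of the other listed algorithms could be substituted, and a real deployment would use far more history than the toy data provides.

```python
from prophet import Prophet

# Models for Sets 1 and 2: daily series in which weekends simply appear as
# missing dates; Prophet accommodates such gaps natively.
m1 = Prophet()
m1.add_regressor("n_clients")   # Set 1 carries the secondary feature
m1.fit(train_set1)

m2 = Prophet()
m2.fit(train_set2[["ds", "y"]])

# Models for Sets 3 and 4: Monday-only points treated as a gapless weekly
# series; within-week seasonality is meaningless here, so it is disabled.
m3 = Prophet(weekly_seasonality=False)
m3.add_regressor("n_clients")
m3.fit(train_set3)

m4 = Prophet(weekly_seasonality=False)
m4.fit(train_set4[["ds", "y"]])
```

One practical point: models fitted with add_regressor also require future values of the secondary feature at prediction time, which is one possible motivation for including primary-only models in the ensemble.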
[0053] At block 208, an aggregation function can be defined to merge predictions from different models in the ensemble. The aggregation function can take as inputs a prediction from each model and can produce an output, also referred to as a “net prediction,” based on the predictions of the models. For example, the aggregation function can be an average (mean) or median value computed from the predictions. In some embodiments, the aggregation function can be based on a trained model, such as a weighted average or a multilayer perceptron. Training of such a model can be based on predicting the past as described above and can occur after training of the models in the ensemble.
[0054] The aggregation function can be dependent on what is being predicted. For instance, in the example shown in FIG. 3, models trained using training sets 330 and 340 only have data pertaining to Mondays and therefore can be expected to have no predictive power for any day other than Monday. Accordingly, predictions for Mondays can be aggregated from all four trained models, while predictions for weekdays other than Mondays can be aggregated from just the models trained using data sets 300 and 320.
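A minimal mean-based aggregator consistent with this scheme might look like the following sketch; a median, or a trained aggregation model as described above, could be substituted.

```python
import statistics

def merge_predictions(full_data_preds, post_gap_preds, follows_gap: bool) -> float:
    """Net prediction from per-model predictions. Post-gap models contribute
    only for days that immediately follow a recurring gap (Mondays here)."""
    preds = list(full_data_preds)
    if follows_gap:
        preds += list(post_gap_preds)
    return statistics.mean(preds)

# Monday: all four models contribute; Wednesday: only the two full-data models.
monday_net = merge_predictions([182.0, 176.5], [185.0, 179.0], follows_gap=True)
wednesday_net = merge_predictions([101.2, 99.8], [185.0, 179.0], follows_gap=False)
```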
[0055] Referring again to FIG. 2, at block 210, the trained models and the defined aggregation function can be used to predict the value of the variable. Such predictions can be used to inform various decisions (e.g., resource management decisions in system 100 of FIG. 1).
[0056] Further illustrating the operation of ensemble machine learning, FIG. 4 shows a simplified schematic diagram illustrating training of an ensemble of ML models according to some embodiments using the training sets of FIG. 3. In this example, four ML models 424-1 through 424-4 are provided. Each ML model 424-1 through 424-4 can be based on the same algorithm, such as a Prophet model, an XGBoost model, an ARIMA model, or the like. Alternatively, different algorithms can be used for different ML models 424-1 through 424-4. In any event, each ML model 424-1 through 424-4 can be separately trained using a different training set extracted from the time series data as described above with reference to FIG. 3. ML model 424-1 is trained using first training set 300, which includes all of the time series data. ML model 424-2 is trained using second training set 320, which includes just the primary features for all of the data points in the time series data. ML model 424-3 is trained using third training set 330, which includes all of the features for just the data points immediately following the recurring gaps (Mondays in this example). ML model 424-4 is trained using fourth training set 340, which includes just the primary features for just the data points immediately following the recurring gaps (Mondays in this example). As described above, ML models 424-1 and 424-2 can treat their training data as daily time series data with regular gaps, while ML models 424-3 and 424-4 can treat their training data as weekly time series data without regular gaps.
[0057] In some embodiments, in addition to the regular gaps, the time series data may have other missing data points where no data is associated with a particular date that corresponds to a weekday. Missing data points can occur, for example, due to a holiday, a reporting lapse, or the like. Such missing data points do not have a regular pattern and can be filled in for purposes of training predictive ML models, e.g., using interpolation or other techniques.
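Continuing the pandas sketch above, such irregular weekday holes might be filled as follows. The recurring weekend gaps remain untouched because the business-day index already excludes Saturdays and Sundays.

```python
import pandas as pd

# Reindex onto every business day in the observed range so that isolated
# weekday holes (e.g., holidays) surface as NaN rows, then interpolate the
# numeric measurement columns only.
bdays = pd.bdate_range(df["ds"].min(), df["ds"].max())
filled = df.set_index("ds").reindex(bdays).rename_axis("ds")
filled[["y", "n_clients"]] = filled[["y", "n_clients"]].interpolate(method="time")
filled["day_of_week"] = filled.index.dayofweek  # recompute date-derived features
filled = filled.reset_index()
```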
[0058] Once trained, models 424-1 through 424-4 can output respective predictions 426-1 through 426-4. Because the models are trained on different (though overlapping) data sets, predictions 426-1 through 426-4 are not necessarily the same. It should also be noted that ML models 424-3 and 424-4 are trained using only the data points immediately following the recurring gaps (Mondays in this example); consequently, models 424-3 and 424-4 may provide predictions 426-3 and 426-4 only for days that immediately follow a recurring gap.
[0059] Predictions 426-1 through 426-4 can be received at merging logic 430. Merging logic 430 can implement an aggregation function (e.g., as described above with reference to block 208 of FIG. 2) to generate a final, or net, prediction 432. For instance, if the prediction is being generated for a day immediately following a gap (a Monday in this example), merging logic 430 can aggregate all four predictions 426-1 through 426-4. If the prediction is made for any other day (e.g., Tuesday through Friday), merging logic 430 can aggregate predictions 426-1 and 426-2 while ignoring any predictions 426-3, 426-4 from models 424-3 and 424-4.
[0060] It will be appreciated that the example shown in FIG. 4 is illustrative and that many variations and modifications are possible. Different training sets can be defined using different (and overlapping) subsets of the available time series data. For instance, the features in the data set could be divided into more than two categories (e.g., primary, secondary, and tertiary features), and different training sets could be defined using features in various combinations of these categories. The ML models trained on different training sets can be based on the same algorithm or different algorithms as desired, provided that each ML model can be trained to generate predictions of future values from time series data. Net predictions such as prediction 432 can be used in the same manner as any other prediction from time series data.
Expanded Machine Learning Model Ensembles
[0061] In the example of FIG. 4, one ML model is trained on each training set. In some embodiments, further improvements in accuracy and/or stability of predictions can be obtained by training multiple models on some or all of the training sets.
[0062] By way of example, FIG. 5 shows a simplified schematic diagram illustrating training of an expanded ensemble of ML models according to some embodiments, again using the training sets of FIG. 3. FIG. 5 is generally similar to FIG. 4, except that in this example, two ML models are trained using each data set. For example, ML models 524-1a and 524-1b are each trained using first training set 300. ML models 524-1a and 524-1b can be based on different algorithms from each other. For example, ML model 524-1a can be a Prophet model while ML model 524-1b can be an XGBoost model; other combinations of models can be substituted. Similarly, ML models 524-2a and 524-2b, which can be based on different algorithms from each other, are each trained using second training set 320; ML models 524-3a and 524-3b, which can be based on different algorithms from each other, are each trained using third training set 330; and ML models 524-4a and 524-4b, which can be based on different algorithms from each other, are each trained using fourth training set 340. It should be understood that ML models trained using different training sets can use the same algorithm. For instance, ML models 524-1a, 524-2a, 524-3a, and 524-4a can each be a Prophet model while ML models 524-1b, 524-2b, 524-3b, and 524-4b are each an XGBoost model. However, this is not required, and any number or combination of models can be trained using any of the training sets.
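The following sketch illustrates this expanded pairing, again continuing the earlier toy training sets, with one Prophet model and one XGBoost model per training set. The XGBoost feature encoding shown is an assumption for illustration: XGBoost is not natively a time series model and is commonly fed date-derived (or lagged) features.

```python
import pandas as pd
from prophet import Prophet
from xgboost import XGBRegressor

ensemble = []
for train_df in (train_set1, train_set2, train_set3, train_set4):
    prophet_model = Prophet()
    prophet_model.fit(train_df[["ds", "y"]])

    # Regress y on simple date-derived features; richer encodings (lags,
    # holiday flags, the secondary features) could be used instead.
    X = pd.DataFrame({
        "dayofweek": train_df["ds"].dt.dayofweek,
        "dayofyear": train_df["ds"].dt.dayofyear,
    })
    xgb_model = XGBRegressor(n_estimators=200)
    xgb_model.fit(X, train_df["y"])

    ensemble.append((prophet_model, xgb_model))  # N*T = 4*2 = 8 trained models
```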
[0063] Once trained, models 524-1a through 524-4b can output respective predictions 526-1a through 526-4b. Because the models use different algorithms and are trained on different (though overlapping) data sets, predictions 526-1a through 526-4b are not necessarily the same. It should also be noted that ML models 524-3a, 524-3b, 524-4a, and 524-4b are trained using only the data points immediately following the recurring gaps (Mondays in this example); consequently, models 524-3a, 524-3b, 524-4a, and 524-4b may provide predictions 526-3a, 526-3b, 526-4a, and 526-4b only for days that immediately follow a recurring gap.
[0064] Predictions 526-1a through 526-4b can be received at merging logic 530. Merging logic 530 can implement an aggregation function (e.g., as described above with reference to block 208 of FIG. 2) to generate a final, or net, prediction 532. For instance, if the prediction is desired for a day immediately following a gap (a Monday in this example), merging logic 530 can aggregate all eight predictions 526-1a through 526-4b. If the prediction is made for any other day (e.g., Tuesday through Friday), merging logic 530 can aggregate predictions 526-1a, 526-1b, 526-2a, and 526-2b while ignoring any predictions 526-3a, 526-3b, 526-4a, 526-4b from models 524-3a, 524-3b, 524-4a, and 524-4b.
[0065] It will be appreciated that many variations of an expanded ensemble are possible. For instance, the models trained on different training sets can implement different algorithms or different combinations of algorithms. The number of models trained on different training sets can also be different; for instance, two models can be trained on each of training sets 300 and 330 while one model is trained on each of training sets 320 and 340. Any number and combination of ML models can be trained on a given training set and included in the ensemble. Net predictions such as prediction 532 can be used in the same manner as any other prediction from time series data.
Additional Embodiments
[0066] While the invention has been described with reference to specific embodiments, those skilled in the art will appreciate that variations and modifications are possible. For instance, different machine learning models can be used. Any number and combination of machine learning models can be used in combination with each other, e.g., on different training sets as described above. The models can be retrained from time to time (e.g., on a daily, weekly, monthly, or yearly basis) as new data becomes available. In some embodiments, old data points can expire and be removed from the training sets prior to retraining.
[0067] Ensemble machine learning models of the kind described herein can be used to generate predictions of any quantifiable variable from any time series data that includes a pattern of regularly occurring gaps. By way of example, in a system that processes transactions (e.g., financial transactions such as settlement of payment card transactions, stock trades, currency exchanges, or the like) only on weekdays, the number of transactions or dollar volume of transactions on any given weekday can be predicted using techniques described herein.
[0068] Further, while daily time series data with gaps corresponding to weekends has been used as an example, similar techniques can apply to other time series having a different pattern of recurring gaps. For example, time series data can include hourly data points with recurring gaps corresponding to some portion of each day (e.g., midnight to 8 a.m.); daily data points with recurring gaps corresponding to the last day of each month; daily or weekly data points with recurring seasonal gaps; and so on. As long as there is a recurring pattern of gaps and the next data point following a recurring gap is expected to be affected in some manner by the existence of the gap, techniques of the kind described herein can be used to provide more accurate and/or stable predictions for a variable at a time point immediately following a gap.
[0069] All processes described herein are illustrative and can be modified. Operations can be performed in a different order from that described, to the extent that logic permits; operations described above may be omitted or combined; and operations not expressly described above may be added.
[0070] It should be understood that any of the embodiments of the present invention can be implemented in the form of control logic using hardware (e.g., an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner. As used herein a processor includes a single-core processor, multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present invention using hardware and a combination of hardware and software.
[0071] Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C, C++, C#, Objective-C, Swift, or a scripting language such as Perl or Python using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable storage medium; suitable media include random access memory (RAM), read-only memory (ROM), a magnetic medium such as a hard drive or a floppy disk, an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like. The computer readable storage medium may be any combination of one or more such storage devices, and suitable media may be packaged with a compatible device. Any such computer readable storage medium may reside on or within a single computer product (e.g., a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network.
[0072] Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable transmission medium may be created using a data signal encoded with such programs, e.g., to download via the internet. It should be understood that transmission media are transitory and distinct from computer readable storage media, which are non-transitory.
[0073] Any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps, e.g., by providing suitable program code for execution by the processors. Thus, embodiments can involve computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps. Although presented as numbered steps or blocks, steps of methods described herein can be performed at a same time or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, some or all steps of any of the methods can be performed with logic modules, circuits, or other means for performing these steps.
[0074] While various components are described herein with reference to particular blocks, it is to be understood that these blocks are defined for convenience of description and are not intended to imply a particular physical arrangement of component parts. The blocks need not correspond to physically distinct components, and the same physical components can be used to implement aspects of multiple blocks. Components described as dedicated or fixed-function circuits can be configured to perform operations by providing a suitable arrangement of circuit components (e.g., logic gates, registers, switches, etc.); automated design tools can be used to generate appropriate arrangements of circuit components implementing operations described herein. Components described as processors or microprocessors can be configured to perform operations described herein by providing suitable program code. Various blocks might or might not be reconfigurable depending on how the initial configuration is obtained. Embodiments of the present invention can be realized in a variety of apparatus including electronic devices implemented using a combination of circuitry and software.
[0075] A recitation of “a”, “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary. The use of “or” is intended to mean an “inclusive or,” and not an “exclusive or” unless specifically indicated to the contrary.
[0076] All patents, patent applications, publications and description mentioned herein are incorporated by reference in their entirety for all purposes. None is admitted to be prior art.
[0077] The above description is illustrative and is not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of the disclosure. The scope of patent protection should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the following claims along with their full scope or equivalents.

Claims

WHAT IS CLAIMED IS:
1. A computer-implemented method comprising: obtaining time series data including a plurality of data points, each data point associated with a particular time according to a regular temporal pattern and having one or more primary features including a variable and one or more secondary features, wherein the time series data includes a pattern of regularly occurring gaps in which data points according to the regular temporal pattern are absent; defining a number (N) of overlapping training sets from the time series data, the N overlapping training sets including: a first training set that includes the one or more primary features and the one or more secondary features of all of the data points; a second training set that includes the one or more primary features but not the one or more secondary features of all of the data points; a third training set that includes the one or more primary features and the one or more secondary features of only data points for which the associated time immediately follows one of the regularly occurring gaps; and a fourth training set that includes the one or more primary features but not the one or more secondary features of only the data points for which the associated time immediately follows one of the regularly occurring gaps; training an ensemble of time series prediction models to predict a value of the variable corresponding to a future time stamp, wherein the time series prediction models are machine learning models and wherein the training of the ensemble of time series prediction models includes performing a number N of training processes, each training process using a different one of the N overlapping training sets, thereby producing N trained time series prediction models; using each of the N trained time series prediction models to generate a respective prediction of the value of the variable at a first future time that immediately follows a future one of the regularly occurring gaps; and aggregating the respective predictions of the N trained time series prediction models to provide a net prediction of the value of the variable at the first future time.
2. The method of claim 1 wherein the time series prediction models include a Prophet model, an XGBoost model, or an ARIMA model.
3. The method of claim 1 wherein the data points correspond to different days and the regularly occurring gaps correspond to weekends.
4. The method of claim 1 wherein the primary features of each data point include date-related information.
5. The method of claim 1 wherein the one or more secondary features of each data point represent one or more items of secondary information that have a nonzero correlation with the variable.
6. The method of claim 1 wherein aggregating the respective predictions of the N trained time series prediction models to provide a net prediction includes computing a mean or a median of the respective predictions of the N trained time series prediction models.
7. The method of claim 1 wherein the variable represents a load on a server system and wherein the method further comprises: increasing or decreasing availability of a computational resource at the server system based at least in part on the net prediction of the value of the variable.
8. A computer system comprising: a memory to store time series data including a plurality of data points, each data point associated with a particular time according to a regular temporal pattern and having one or more features including a variable, wherein the time series data includes a pattern of regularly occurring gaps in which data points according to the regular temporal pattern are absent; and a processor coupled to the memory and configured to: define a number (N) of overlapping training sets from the time series data, the N overlapping training sets including at least a first training set that includes all of the data points in the time series data and a second training set that includes only data points in the time series data for which the associated time immediately follows one of the regularly occurring gaps; train an ensemble of time series prediction models to predict values of the variable corresponding to a future time, wherein the ensemble includes a plurality of machine learning models and wherein the training includes performing a number N of training processes, each training process using a different one of the N overlapping training sets, thereby producing N trained machine learning models; use each of the N trained machine learning models to generate a respective prediction of the value of the variable at a first future time that immediately follows a future one of the regularly occurring gaps; and aggregate the respective predictions of the N trained machine learning models to provide a net prediction of the value of the variable at the first future time.
9. The computer system of claim 8 wherein each data point has a plurality of features including a primary feature and a secondary feature, and wherein the processor is further configured such that the N overlapping training sets further include: a third training set that includes only the primary features of all of the data points in the time series data; and a fourth training set that includes only the primary features of only the data points for which the associated time immediately follows one of the regularly occurring gaps.
10. The computer system of claim 8 wherein the processor is further configured such that the ensemble of time series prediction models further includes at least two different machine learning models trained on a same one of the N overlapping training sets.
11. The computer system of claim 8 wherein at least one of the machine learning models is one of a Prophet model, an XGBoost model, or an ARIMA model.
12. The computer system of claim 8 wherein the data points correspond to different days and the regularly occurring gaps correspond to weekends.
13. The computer system of claim 8 wherein the variable represents a load on a server system and wherein the processor is further configured to: increase or decrease availability of a computational resource at the server system based at least in part on the net prediction of the value of the variable.
14. A computer-readable storage medium having stored therein program code instructions that, when executed by a processor in a computer system, cause the processor to perform a method comprising: obtaining time series data including a plurality of data points, each data point having one or more primary features including a variable, one or more secondary features, and an associated time stamp, wherein the time series data includes a pattern of regularly occurring gaps between the time stamps; defining a number (N) of overlapping training sets from the time series data, the N overlapping training sets including: a first training set that includes the one or more primary features and the one or more secondary features of all of the data points; a second training set that includes the one or more primary features but not the one or more secondary features of all of the data points; a third training set that includes the one or more primary features and the one or more secondary features of only data points having time stamps immediately following one of the regularly occurring gaps; and a fourth training set that includes the one or more primary features but not the one or more secondary features of only the data points having time stamps immediately following one of the regularly occurring gaps; training a number (T) of time series prediction models to predict a value of the variable corresponding to a future time stamp, wherein each time series prediction model is a machine learning model and wherein the training of each time series prediction model includes performing a number N of training processes, each training process using a different one of the N overlapping training sets, thereby producing a total number N*T of trained time series prediction models; using each of the N*T trained time series prediction models to generate a respective prediction of the value of the variable at a first future time stamp that immediately follows a future one of the regularly occurring gaps; and aggregating the respective predictions of the N*T trained time series prediction models to provide a net prediction of the value of the variable at the first future time stamp.
15. The computer-readable storage medium of claim 14 wherein T is at least 2.
16. The computer-readable storage medium of claim 14 wherein the time series prediction models include one or more of a Prophet model, an XGBoost model, or an ARIMA model.
17. The computer-readable storage medium of claim 14 wherein the data points correspond to different days and the regularly occurring gaps correspond to weekends.
18. The computer-readable storage medium of claim 14 wherein aggregating the respective predictions of the N*T trained time series prediction models to provide a net prediction includes computing a mean or a median of the respective predictions of the N*T trained time series prediction models.
19. The computer-readable storage medium of claim 14 wherein aggregating the respective predictions of the N*T trained time series prediction models to provide a net prediction includes using an additional machine learning model that has been trained to compute a net prediction from the respective predictions of the N*T trained time series prediction models.
20. The computer-readable storage medium of claim 14 wherein the variable represents a load on a server system and wherein the method further comprises: increasing or decreasing availability of a computational resource at the server system based at least in part on the net prediction of the value of the variable.
PCT/US2023/075273 2023-09-27 2023-09-27 Ensemble machine learning for time series data with gaps Pending WO2025071598A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2023/075273 WO2025071598A1 (en) 2023-09-27 2023-09-27 Ensemble machine learning for time series data with gaps

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2023/075273 WO2025071598A1 (en) 2023-09-27 2023-09-27 Ensemble machine learning for time series data with gaps

Publications (1)

Publication Number Publication Date
WO2025071598A1 true WO2025071598A1 (en) 2025-04-03

Family

ID=95201966

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/075273 Pending WO2025071598A1 (en) 2023-09-27 2023-09-27 Ensemble machine learning for time series data with gaps

Country Status (1)

Country Link
WO (1) WO2025071598A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130245949A1 (en) * 2011-12-31 2013-09-19 Saudi Arabian Oil Company Apparatus, computer readable media, and computer programs for estimating missing real-time data for intelligent fields
US20160048766A1 (en) * 2014-08-13 2016-02-18 Vitae Analytics, Inc. Method and system for generating and aggregating models based on disparate data from insurance, financial services, and public industries
EP3101599A2 (en) * 2015-06-04 2016-12-07 The Boeing Company Advanced analytical infrastructure for machine learning
US20180046926A1 (en) * 2014-05-23 2018-02-15 DataRobot, Inc. Systems for time-series predictive data analytics, and related methods and apparatus
US20230148135A1 (en) * 2020-02-24 2023-05-11 Agt International Gmbh Tracking user and object dynamics using a computerized device

Similar Documents

Publication Publication Date Title
CN110400021B (en) Bank branch cash usage prediction method and device
CN110400022B (en) Cash consumption prediction method and device for self-service teller machine
EP2184681A1 (en) Capacity control
CN110705719A (en) Method and apparatus for performing automatic machine learning
CN112598443A (en) Online channel business data processing method and system based on deep learning
CN117972367A (en) Data storage prediction method, data storage subsystem and intelligent computing platform
CN117033726A (en) Methods, devices, processors and electronic devices for forecasting report access duration
CN119809593A (en) A method and system for processing exhibition and conference information
CN107590747A (en) Power grid asset turnover rate computational methods based on the analysis of comprehensive energy big data
US10515381B2 (en) Spending allocation in multi-channel digital marketing
US12265446B1 (en) Multi-factor anomaly detection for application execution environments
CN120196436A (en) Resource quota adjustment method, device, equipment and storage medium of micro-service
CN110910241B (en) Cash flow evaluation method, apparatus, server device and storage medium
CN112529682B (en) Cash distribution method based on data modeling, intelligent terminal and storage medium
WO2025071598A1 (en) Ensemble machine learning for time series data with gaps
Agarwal An approach of SLA violation prediction and QoS optimization using regression machine learning techniques
CN118153773A (en) Time series prediction method and device
CN118396416A (en) Bank system performance prediction method and device
CN117853158A (en) Enterprise operation data prediction method and device based on dynamic cost-benefit analysis
CN117632512A (en) Resource allocation method, device, computer equipment and storage medium
CN117081943A (en) Server resource adjustment method and device, electronic equipment and medium
Ananthi et al. An insight of deep neural networks based on demand forecasting in using: ANN algorithm
CN119597460B (en) GPU demand forecasting method and Spot GPU inventory forecasting method
EP4607352A1 (en) Workload prediction methods and apparatuses for service in service cluster
CN111861544A (en) Participant account liquidity prediction method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23954540

Country of ref document: EP

Kind code of ref document: A1