US20170140278A1 - Using machine learning to predict big data environment performance - Google Patents
- Publication number
- US20170140278A1 (U.S. application Ser. No. 14/944,969)
- Authority
- US
- United States
- Prior art keywords
- metadata
- machine learning
- new active
- performance
- data processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G06N99/005—
Definitions
- the present disclosure relates to computing systems, and, in particular, to methods, systems, and computer program products for predicting the performance of a data processing system in performing an analysis of a big data dataset.
- Big data is a term or catch-phrase that is often used to describe data sets of structured and/or unstructured data that are so large or complex that they are often difficult to process using traditional data processing applications. Data sets tend to grow to such large sizes because the data are increasingly being gathered by cheap and numerous information generating devices. Big data can be characterized by 3Vs: the extreme volume of data, the variety of types of data, and the velocity at which the data is processed. Although big data doesn't refer to any specific quantity or amount of data, the term is often used in referring to petabytes or exabytes of data. The big data datasets can be processed using various analytic and algorithmic tools to reveal meaningful information that may have applications in a variety of different disciplines including government, manufacturing, health care, retail, real estate, finance, and scientific research.
- a method comprises performing operations as follows on a processor: receiving a big data dataset comprising new active data; receiving a request to predict a level of performance with respect to a performance parameter of a data processing system in analyzing the new active data; selecting a machine learning algorithm from a plurality of machine learning algorithms based on the performance parameter to obtain a selected machine learning algorithm; selecting a group of historical metadata from a plurality of groups of historical metadata of datasets that have previously been analyzed using the data processing system to provide a selected group of historical metadata; applying the selected machine learning algorithm to the selected group of historical metadata to generate a model of the selected group of historical metadata; obtaining metadata of the new active data; applying the model to the metadata of the new active data to generate a prediction of the level of performance with respect to the performance parameter; and configuring the data processing system for analyzing the new active data based on the prediction.
- a system comprises a processor and a memory coupled to the processor, which comprises computer readable program code embodied in the memory that when executed by the processor causes the processor to perform operations comprising: receiving a big data dataset comprising new active data; receiving a request to predict a level of performance with respect to a performance parameter of a data processing system in analyzing the new active data; selecting a machine learning algorithm from a plurality of machine learning algorithms based on the performance parameter to obtain a selected machine learning algorithm; selecting a group of historical metadata from a plurality of groups of historical metadata of datasets that have previously been analyzed using the data processing system to provide a selected group of historical metadata; applying the selected machine learning algorithm to the selected group of historical metadata to generate a model of the selected group of historical metadata; obtaining metadata of the new active data; applying the model to the metadata of the new active data to generate a prediction of the level of performance with respect to the performance parameter; and configuring the data processing system for analyzing the new active data based on the prediction.
- a computer program product comprises a tangible computer readable storage medium comprising computer readable program code embodied in the medium that when executed by a processor causes the processor to perform operations comprising: receiving a big data dataset comprising new active data; receiving a request to predict a level of performance with respect to a performance parameter of a data processing system in analyzing the new active data; selecting a machine learning algorithm from a plurality of machine learning algorithms based on the performance parameter to obtain a selected machine learning algorithm; selecting a group of historical metadata from a plurality of groups of historical metadata of datasets that have previously been analyzed using the data processing system to provide a selected group of historical metadata; applying the selected machine learning algorithm to the selected group of historical metadata to generate a model of the selected group of historical metadata; obtaining metadata of the new active data; applying the model to the metadata of the new active data to generate a prediction of the level of performance with respect to the performance parameter; and configuring the data processing system for analyzing the new active data based on the prediction.
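- The sequence of operations recited above (select an algorithm for the performance parameter, select the closest group of historical metadata, build a model from that group, apply it to the new active data's metadata) can be sketched in Python as follows. Every function, structure, and name here is a hypothetical illustration, not part of the disclosure.

```python
# Hypothetical sketch of the claimed prediction pipeline. The shapes of
# algorithm_library and historical_groups are illustrative assumptions.

def distance(a, b):
    """Euclidean distance between two metadata feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_performance(new_active_metadata, performance_parameter,
                        algorithm_library, historical_groups):
    """Return a predicted performance level for analyzing new active data."""
    # Select the machine learning algorithm mapped to this parameter.
    algorithm = algorithm_library[performance_parameter]
    # Select the group of historical metadata closest to the new metadata.
    group = min(historical_groups,
                key=lambda g: distance(g["centroid"], new_active_metadata))
    # Fit a model on the selected historical group, then apply it.
    model = algorithm(group["records"])
    return model(new_active_metadata)
```

In this sketch an "algorithm" is any callable that turns historical records into a model, and a "model" is any callable that maps new metadata to a predicted level of performance.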
- FIG. 1 is a block diagram of a decision support system for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter;
- FIG. 2 illustrates a data processing system that may be used to implement the big data environment advisor system of FIG. 1 in accordance with some embodiments of the inventive subject matter;
- FIG. 3 is a block diagram that illustrates a software/hardware architecture for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the present inventive subject matter
- FIG. 4 is a block diagram that illustrates functional relationships between the modules of FIG. 3 ;
- FIG. 5 is a flowchart that illustrates operations for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter.
- a “service” includes, but is not limited to, a software and/or hardware service, such as cloud services in which software, platforms, and infrastructure are provided remotely through, for example, the Internet.
- a service may be provided using Software as a Service (SaaS), Platform as a Service (PaaS), and/or Infrastructure as a Service (IaaS) delivery models.
- In the SaaS model, customers generally access software residing in the cloud using a thin client, such as a browser, for example.
- In the PaaS model, the customer typically creates and deploys the software in the cloud, sometimes using tools, libraries, and routines provided through the cloud service provider.
- the cloud service provider may provide the network, servers, storage, and other tools used to host the customer's application(s).
- the cloud service provider provides physical and/or virtual machines along with hypervisor(s). The customer installs operating system images along with application software on the provided infrastructure.
- data processing facility includes, but is not limited to, a hardware element, firmware component, and/or software component.
- a data processing system may be configured with one or more data processing facilities.
- Some embodiments of the inventive subject matter stem from a realization that big data datasets may differ in a variety of ways, including the traditional 3V characteristics of volume, variety, and velocity as well as other characteristics, such as variability (e.g., data inconsistency), veracity (quality of the data), and complexity.
- a data processing environment used to analyze or process one big data dataset may be less suitable for analyzing or processing a different big data dataset.
- Some embodiments of the inventive subject matter may provide the operators of a big data analysis data processing system a prediction of how well the data processing may perform in analyzing a big data dataset with respect to one or more performance parameters.
- the performance parameters may include, but are not limited to, time of execution for performing an analysis, a probability of success (e.g., determining a pattern in the big data dataset), the amount of processor resources used in performing the analysis, and the amount of memory resources used in performing the analysis.
- Some embodiments of the inventive subject matter may provide a Decision Support System (DSS) for generating the prediction of how well a data processing system may perform in analyzing a given big data dataset, which can then be used to configure the data processing system for improved performance.
- the decision support system may generate the performance prediction in response to a new prediction request for a new big data dataset based on historical job data corresponding to previous big data datasets that have been analyzed and based on various machine learning algorithms that have been used in predicting the performance of analyzing previous big data datasets, which have had their accuracy evaluated based on actual results.
- FIG. 1 is a block diagram of a DSS for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter.
- a DSS big data environment advisor data processing system 105 is configured to receive a big data dataset comprising new active data along with a prediction request to predict the performance of a data processing system with respect to one or more performance parameters in analyzing the new active data.
- the big data environment advisor data processing system 105 may generate the performance prediction based on historical job metadata corresponding to previous big data datasets that have been analyzed and based on various machine learning algorithms that have been used in predicting the performance of analyzing previous big data datasets, which have had their accuracy evaluated based on actual results.
- the performance prediction generated by the DSS big data environment advisor 105 may be used as a basis for configuring a data processing system to analyze the new active data in the big data dataset.
- Configuring a data processing system may involve various operations including, but not limited to, adjusting the processing, memory, networking, and other resources associated with the data processing system.
- Configuring the data processing system may also involve scheduling which jobs are run at certain times and/or re-assigning jobs between the data processing system and other data processing systems.
- the particular analytic tools and applications that are used to process the big data dataset may be selected to enhance efficiency.
- Although FIG. 1 illustrates a decision support system for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter, it will be understood that embodiments of the present invention are not limited to such configurations, but are intended to encompass any configuration capable of carrying out the operations described herein.
- a data processing system 200 that may be used to implement the DSS big data environment advisor 105 of FIG. 1 , in accordance with some embodiments of the inventive subject matter, comprises input device(s) 202 , such as a keyboard or keypad, a display 204 , and a memory 206 that communicate with a processor 208 .
- the data processing system 200 may further include a storage system 210 , a speaker 212 , and an input/output (I/O) data port(s) 214 that also communicate with the processor 208 .
- the storage system 210 may include removable and/or fixed media, such as floppy disks, ZIP drives, hard disks, or the like, as well as virtual storage, such as a RAMDISK.
- the I/O data port(s) 214 may be used to transfer information between the data processing system 200 and another computer system or a network (e.g., the Internet). These components may be conventional components, such as those used in many conventional computing devices, and their functionality, with respect to conventional operations, is generally known to those skilled in the art.
- the memory 206 may be configured with a DSS big data environment advisor module 216 that may provide functionality that may include, but is not limited to, configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter.
- FIG. 3 illustrates a processor 300 and memory 305 that may be used in embodiments of data processing systems, such as the data processing system 200 of FIG. 2 , respectively, for configuring a data processing system for analyzing a big data dataset according to some embodiments of the inventive subject matter.
- the processor 300 communicates with the memory 305 via an address/data bus 310 .
- the processor 300 may be, for example, a commercially available or custom microprocessor.
- the memory 305 is representative of the one or more memory devices containing the software and data used for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter.
- the memory 305 may include, but is not limited to, the following types of devices: cache, ROM, PROM, EPROM, EEPROM, flash, SRAM, and DRAM.
- the memory 305 may contain two or more categories of software and/or data: an operating system 315 and a DSS big data environment advisor module 320 .
- the operating system 315 may manage the data processing system's software and/or hardware resources and may coordinate execution of programs by the processor 300 .
- the DSS big data environment advisor module 320 may comprise a data classification module 325 , an algorithm mapping module 330 , a prediction engine module 335 , and a data center management interface module 340 .
- the data classification module 325 may be configured to collect metadata corresponding to the analysis jobs performed previously on other big data datasets by various data processing systems and data processing system configurations, including the data processing system targeted for a current active data dataset.
- the algorithm mapping module 330 may be configured to select a machine learning algorithm from a plurality of machine learning algorithms that may be the most accurate in determining a prediction for the performance of a data processing system in analyzing a current active data dataset. This selection may be made based on one or more previous predictions with respect to various data processing systems and data processing system configurations.
- the prediction engine module 335 may be configured to generate a prediction of the performance of a data processing system with respect to one or more performance parameters in response to a request identifying the one or more performance parameters and new active data forming part of a big data dataset to be analyzed.
- the prediction engine module 335 may select a group of historical metadata (i.e., metadata for data that has already been analyzed by one or more data processing systems) that most closely matches the metadata of the new active data to be analyzed from the data classification module 325 and may select a machine learning algorithm that is the most efficient at generating a prediction for the particular performance parameter(s) from the algorithm mapping module 330 .
- the prediction engine module 335 may then apply the particular machine learning algorithm received from the algorithm mapping module 330 to the group of historical metadata to build a prediction model, which may be an equation, graph, or other mechanism for specifying a relationship between the data points in the group of historical metadata.
- the prediction model may then be applied to the metadata of the new active data to generate a prediction of the level of performance with respect to one or more performance parameters in analyzing the new active data on the data processing system.
- the data center management interface module 340 may be configured to communicate changes to a configuration of a data processing system based on the prediction generated by the prediction engine module 335 .
- the DSS big data environment advisor data processing system 105 may be integrated as part of a data center management system or may be a stand-alone system that communicates with a data center management system over a network or suitable communication connection.
- Although FIG. 3 illustrates hardware/software architectures that may be used in data processing systems, such as the data processing system 200 of FIG. 2 , for configuring a data processing system for analyzing a big data dataset according to some embodiments of the inventive subject matter, it will be understood that the present invention is not limited to such a configuration but is intended to encompass any configuration capable of carrying out operations described herein.
- Computer program code for carrying out operations of data processing systems discussed above with respect to FIGS. 1-3 may be written in a high-level programming language, such as Python, Java, C, and/or C++, for development convenience.
- computer program code for carrying out operations of the present invention may also be written in other programming languages, such as, but not limited to, interpreted languages.
- Some modules or routines may be written in assembly language or even micro-code to enhance performance and/or memory usage. It will be further appreciated that the functionality of any or all of the program modules may also be implemented using discrete hardware components, one or more application specific integrated circuits (ASICs), or a programmed digital signal processor or microcontroller.
- the functionality of the DSS big data environment advisor data processing system 105 , the data processing system 200 of FIG. 2 , and hardware/software architecture of FIG. 3 may each be implemented as a single processor system, a multi-processor system, a multi-core processor system, or even a network of stand-alone computer systems, in accordance with various embodiments of the inventive subject matter.
- Each of these processor/computer systems may be referred to as a “processor” or “data processing system.”
- the data processing apparatus of FIGS. 1-3 may be used to determine how to configure a product for localization to a geographic region according to various embodiments described herein.
- These apparatus may be embodied as one or more enterprise, application, personal, pervasive and/or embedded computer systems and/or apparatus that are operable to receive, transmit, process and store data using any suitable combination of software, firmware and/or hardware and that may be standalone or interconnected by any public and/or private, real and/or virtual, wired and/or wireless network including all or a portion of the global communication network known as the Internet, and may include various types of tangible, non-transitory computer readable media.
- the memory 206 coupled to the processor 208 and the memory 305 coupled to the processor 300 include computer readable program code that, when executed by the respective processors, causes the respective processors to perform operations including one or more of the operations described herein with respect to FIGS. 4-5 .
- FIG. 4 is a block diagram that illustrates functional relationships between the modules of FIG. 3 .
- the data classification module 325 provides an active data metadata procurement module 405 and a passive data metadata procurement module 410 .
- the active data metadata procurement module 405 may be configured to obtain metadata for new active data that is received for processing as it is received.
- the passive data metadata procurement module 410 may be configured to fetch the historical metadata for all datasets that have previously been analyzed using the data processing system, the data processing system as configured differently, and/or other data processing systems.
- the collected metadata is compiled at block 415 as metadata and statistical metadata.
- a clustering module 420 may be configured to perform a cluster analysis on the historical metadata of block 415 based on a plurality of attributes to generate groups of historical metadata with similar attribute sets represented as module 425 .
- the attributes may include, but are not limited to, an analysis job name, a data processing system name, a time of execution for performing an analysis, an amount of memory used in performing an analysis, type of analysis performed, and an amount of data processed during performing an analysis.
- the number of groups that are created for each attribute set is determined by the clustering algorithm used, where a new sub-group is formed when there is a sufficient amount of similar data.
- the cardinality of the groups depends on correlation in the historical metadata.
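- The grouping behavior described above can be sketched with a minimal k-means routine over a single numeric attribute (e.g., execution time). A production clustering module would cluster multi-attribute vectors; this one-dimensional version, and its initialization strategy, are illustrative assumptions.

```python
# Minimal 1-D k-means: group scalar historical-metadata values into k
# clusters of similar values. Returns the groups as lists of values.

def kmeans_1d(values, k, iterations=20):
    # Initialize centroids with evenly spaced sorted values (an assumption).
    centroids = sorted(values)[::max(1, len(values) // k)][:k]
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Recompute each centroid as the mean of its group.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return groups
```

Consistent with the passage above, a group's cardinality falls out of how the data correlate: values only land together when they sit near the same centroid.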
- the algorithm mapping module 330 provides a library of possible machine learning algorithms that can be used in generating a model for predicting the performance of a data processing system in analyzing a big data dataset. Different machine learning algorithms may generate better models than others depending on the particular performance parameter of interest. Thus, the algorithm mapping module 330 may maintain information on the accuracy of the resulting performance predictions when various machine learning algorithms were previously used for various performance parameters. The algorithm mapping module 330 may provide to the prediction engine 335 the machine learning algorithm that has resulted in the most accurate predictions for a particular performance parameter at block 435 . The algorithm mapping module 330 may also provide one or more default machine learning algorithms when no historical prediction accuracy data is available for a particular performance parameter.
- Various machine learning algorithms can be used in accordance with embodiments of the inventive subject matter, including, but not limited to, kernel density estimation, K-means, kernel principal components analysis, linear regression, neighbors, non-negative matrix factorization, support vector machines, dimensionality reduction, fast singular value decomposition, and decision tree.
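- The accuracy bookkeeping performed by the algorithm mapping module 330 can be sketched as follows. The class name, scoring scheme, and mean-accuracy criterion are illustrative assumptions; the disclosure only requires that the historically most accurate algorithm be provided, with a default when no history exists.

```python
# Sketch of algorithm mapping: track prediction accuracy per
# (algorithm, performance parameter) pair and return the historically
# best algorithm, falling back to a default when no history exists.

class AlgorithmMapper:
    def __init__(self, default_algorithm):
        self.default = default_algorithm
        self.accuracy = {}  # (algorithm_name, parameter) -> list of scores

    def record(self, algorithm_name, parameter, score):
        """Log how accurate an algorithm's past prediction turned out to be."""
        self.accuracy.setdefault((algorithm_name, parameter), []).append(score)

    def best_for(self, parameter):
        """Algorithm with the highest mean historical accuracy, or the default."""
        candidates = {name: sum(scores) / len(scores)
                      for (name, p), scores in self.accuracy.items()
                      if p == parameter}
        if not candidates:
            return self.default
        return max(candidates, key=candidates.get)
```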
- the remaining blocks of FIG. 4 may comprise components of the prediction engine module 335 .
- a big data dataset comprising new active data may be received at block 440 .
- embodiments of the present invention can be used to generate a prediction of the performance of the data processing system in analyzing the new active data.
- a prediction request may be received at block 445 that comprises a request to predict a level of performance of the data processing system with respect to one or more parameters.
- the performance parameters may include, but are not limited to, a time for execution for performing an analysis, a probability of determining a pattern in the new active data, resources, such as processing, memory, and network used in performing the analysis, and the like in accordance with various embodiments of the inventive subject matter.
- the prediction engine module 335 communicates with the algorithm mapping module 330 at block 450 to obtain the best machine learning algorithm for the particular performance parameter to be predicted at block 455 .
- the prediction engine module 335 obtains metadata of the new active data at block 460 and communicates with the data classification module 325 to perform a comparison to determine which group of historical metadata most closely resembles the metadata of the new active data.
- the selected group of historical metadata, which was identified based on the comparison, is output at block 465 .
- a model or prediction model is generated at block 470 based on the selected machine learning algorithm at block 455 and the selected group of historical metadata at block 465 .
- the model may be an equation, graph, or other construct/mechanism for specifying a relationship between the data points in the group of historical metadata. For example, if linear regression is chosen as the machine learning algorithm, an equation may be generated that most fits the data points in the group of historical metadata.
- the resulting model is output at block 475 .
- the prediction engine module 335 applies the model obtained at block 475 to the metadata of the new active data at block 480 to generate a prediction 485 of the level of performance with respect to the requested performance parameter.
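- For the linear regression case mentioned above, model generation and application at blocks 470 and 480 can be sketched as an ordinary least-squares fit. The choice of feature (data size predicting execution time) is an assumption made for illustration.

```python
# Sketch of blocks 470/480 with linear regression: fit a least-squares
# line to (data size, execution time) points drawn from the selected
# historical metadata group, then apply it to the new metadata.

def fit_line(points):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Usage: historical jobs of 10, 20, and 40 units took 5, 10, and 20 hours;
# predict the execution time for a new dataset of size 50.
a, b = fit_line([(10, 5.0), (20, 10.0), (40, 20.0)])
predicted_time = a * 50 + b
```

Here the fitted pair (a, b) is the "equation" the passage refers to, and applying the model is a single evaluation of that equation on the new active data's metadata.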
- a makespan value (i.e., the total time to complete the analysis job) may be computed by applying the model generated by the machine learning algorithm to the metadata of the new active data of the big data dataset to be analyzed.
- the prediction 485 can be used to configure the data processing system for analyzing the big data dataset comprising the new active data. For example, various thresholds may be defined for one or more parameters that when compared to the predicted performance level provide an indication that changes need to be made to the data processing system before the big data dataset is provided to the data processing system for analysis to improve the performance of the data processing system.
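- The threshold comparison described above can be sketched as a simple check of the predicted levels against configured limits; the parameter names and limit values are assumptions for the sketch.

```python
# Sketch of the threshold check: compare predicted performance against
# configured limits and flag the parameters that need attention before
# the job is submitted. Names and values here are illustrative.

THRESHOLDS = {"execution_time_hours": 4.0, "memory_gb": 256}

def configuration_changes(prediction):
    """Return the parameters whose predicted values exceed their thresholds."""
    return [p for p, limit in THRESHOLDS.items()
            if prediction.get(p, 0) > limit]

# Usage: only the execution-time threshold is exceeded here, signaling
# that the data processing system should be reconfigured for speed.
changes = configuration_changes({"execution_time_hours": 6.5, "memory_gb": 128})
```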
- an ensemble methodology may be used where multiple machine learning algorithms are applied to the selected group of historical metadata to generate a plurality of models.
- the plurality of models may then be applied to the metadata of the new active data to generate a plurality of predictions, which can then be processed using an ensemble methodology to provide a final prediction.
- the ensemble methodology may be used when the models generated by the machine learning algorithms are independent of each other.
- the ensemble methods may include, but are not limited to, Bayes optimal classifier, bagging, boosting, Bayesian parameter averaging, Bayesian model combination, bucket of models, and stacking.
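- A minimal version of the ensemble step can be sketched as plain averaging of independent model outputs; the listed methods (bagging, boosting, stacking, and the Bayesian variants) would combine the outputs in more sophisticated ways.

```python
# Sketch of the ensemble step: apply several independently generated
# models to the metadata of the new active data and combine their
# predictions. Simple averaging is shown as the combining rule.

def ensemble_predict(models, new_metadata):
    """Average the predictions of independent models (a basic ensemble)."""
    predictions = [model(new_metadata) for model in models]
    return sum(predictions) / len(predictions)

# Usage: three hypothetical models predicting execution time in hours.
models = [lambda m: 4.0, lambda m: 5.0, lambda m: 6.0]
final = ensemble_predict(models, {"size_gb": 50})
```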
- FIG. 5 is a flowchart that illustrates operations for configuring a data processing system for analyzing a big dataset in accordance with some embodiments of the inventive subject matter.
- operations begin at block 500 where the prediction engine module 335 receives a big data dataset comprising new active data along with a performance prediction request at block 505 .
- the performance prediction request is a request to predict a level of performance of the data processing system that will be assigned to analyze the big data dataset comprising the new active data based on one or more performance parameters.
- the prediction engine module 335 selects a machine learning algorithm at block 510 provided by the algorithm mapping module 330 based on the one or more performance parameters contained in the request.
- the prediction engine module 335 selects a group of historical metadata at block 515 from a plurality of groups of historical metadata that have previously been analyzed using the data processing system and/or other data processing systems including the present data processing system configured differently.
- the selected machine learning algorithm is applied to the selected group of historical metadata at block 520 to generate a model of the selected group of historical metadata.
- the prediction engine module 335 obtains metadata of the new active data at block 525 and applies the model generated at block 520 to the metadata of the new active data to generate a prediction of the level of performance of the data processing system with respect to the one or more performance parameters at block 530 .
- the data processing system may be configured at block 535 based on the prediction of the level of performance of the data processing system with respect to the performance parameter.
- Some embodiments of the inventive subject matter may provide a DSS that can assist users of a big data analysis center in configuring their data processing system for a particular big data analysis task to meet, for example, requirements of service level agreements. Unexpected alerts and breakdowns may be reduced as a data processing system may be better configured to process a big data analysis job before the job starts. As big data is by definition resource intensive in terms of the amount and complexity of the data to be analyzed, even minor improvements in data processing system performance can result in large savings in terms of cost, resource usage, and time.
- a prediction of the performance of a data processing system is generated in a technology agnostic manner and uses ensemble approaches of machine learning, progressive clustering, and online learning.
- the DSS described herein is self-tuning by improving historical metadata group selection used in model generation based on newly arriving metadata corresponding to new big data analysis jobs.
- aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of software and hardware that may all generally be referred to herein as a "circuit," "module," "component," or "system." Furthermore, aspects of the present disclosure may take the form of a computer program product comprising one or more computer readable media having computer readable program code embodied thereon.
- the computer readable media may be a computer readable signal medium or a computer readable storage medium.
- a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
- a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
- a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, or the like; conventional procedural programming languages, such as the “C” programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, or ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS).
- These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which, when executed, cause a computer to implement the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer program instructions may also be loaded onto a computer, other programmable instruction execution apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatuses or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Abstract
A method includes performing operations as follows on a processor: receiving a big data dataset comprising new active data; receiving a request to predict a level of performance with respect to a performance parameter of a data processing system in analyzing the new active data; selecting a machine learning algorithm from a plurality of machine learning algorithms based on the performance parameter to obtain a selected machine learning algorithm; selecting a group of historical metadata from a plurality of groups of historical metadata of datasets that have previously been analyzed using the data processing system to provide a selected group of historical metadata; applying the selected machine learning algorithm to the selected group of historical metadata to generate a model of the selected group of historical metadata; obtaining metadata of the new active data; applying the model to the metadata of the new active data to generate a prediction of the level of performance with respect to the performance parameter; and configuring the data processing system for analyzing the new active data based on the prediction.
Description
- The present disclosure relates to computing systems, and, in particular, to methods, systems, and computer program products for predicting the performance of a data processing system in performing an analysis of a big data dataset.
- Big data is a term or catch-phrase that is often used to describe data sets of structured and/or unstructured data that are so large or complex that they are often difficult to process using traditional data processing applications. Data sets tend to grow to such large sizes because the data are increasingly being gathered by cheap and numerous information generating devices. Big data can be characterized by 3Vs: the extreme volume of data, the variety of types of data, and the velocity at which the data is processed. Although big data doesn't refer to any specific quantity or amount of data, the term is often used in referring to petabytes or exabytes of data. The big data datasets can be processed using various analytic and algorithmic tools to reveal meaningful information that may have applications in a variety of different disciplines including government, manufacturing, health care, retail, real estate, finance, and scientific research.
- In some embodiments of the inventive subject matter, a method comprises performing operations as follows on a processor: receiving a big data dataset comprising new active data; receiving a request to predict a level of performance with respect to a performance parameter of a data processing system in analyzing the new active data; selecting a machine learning algorithm from a plurality of machine learning algorithms based on the performance parameter to obtain a selected machine learning algorithm; selecting a group of historical metadata from a plurality of groups of historical metadata of datasets that have previously been analyzed using the data processing system to provide a selected group of historical metadata; applying the selected machine learning algorithm to the selected group of historical metadata to generate a model of the selected group of historical metadata; obtaining metadata of the new active data; applying the model to the metadata of the new active data to generate a prediction of the level of performance with respect to the performance parameter; and configuring the data processing system for analyzing the new active data based on the prediction.
- In other embodiments of the inventive subject matter, a system comprises a processor and a memory coupled to the processor, which comprises computer readable program code embodied in the memory that when executed by the processor causes the processor to perform operations comprising: receiving a big data dataset comprising new active data; receiving a request to predict a level of performance with respect to a performance parameter of a data processing system in analyzing the new active data; selecting a machine learning algorithm from a plurality of machine learning algorithms based on the performance parameter to obtain a selected machine learning algorithm; selecting a group of historical metadata from a plurality of groups of historical metadata of datasets that have previously been analyzed using the data processing system to provide a selected group of historical metadata; applying the selected machine learning algorithm to the selected group of historical metadata to generate a model of the selected group of historical metadata; obtaining metadata of the new active data; applying the model to the metadata of the new active data to generate a prediction of the level of performance with respect to the performance parameter; and configuring the data processing system for analyzing the new active data based on the prediction.
- In still other embodiments of the inventive subject matter, a computer program product comprises a tangible computer readable storage medium comprising computer readable program code embodied in the medium that when executed by a processor causes the processor to perform operations comprising: receiving a big data dataset comprising new active data; receiving a request to predict a level of performance with respect to a performance parameter of a data processing system in analyzing the new active data; selecting a machine learning algorithm from a plurality of machine learning algorithms based on the performance parameter to obtain a selected machine learning algorithm; selecting a group of historical metadata from a plurality of groups of historical metadata of datasets that have previously been analyzed using the data processing system to provide a selected group of historical metadata; applying the selected machine learning algorithm to the selected group of historical metadata to generate a model of the selected group of historical metadata; obtaining metadata of the new active data; applying the model to the metadata of the new active data to generate a prediction of the level of performance with respect to the performance parameter; and configuring the data processing system for analyzing the new active data based on the prediction.
- It is noted that aspects described with respect to one embodiment may be incorporated in different embodiments although not specifically described relative thereto. That is, all embodiments and/or features of any embodiments can be combined in any way and/or combination. Moreover, other methods, systems, articles of manufacture, and/or computer program products according to embodiments of the inventive subject matter will be or become apparent to one with skill in the art upon review of the following drawings and detailed description. It is intended that all such additional systems, methods, articles of manufacture, and/or computer program products be included within this description, be within the scope of the present inventive subject matter, and be protected by the accompanying claims. It is further intended that all embodiments disclosed herein can be implemented separately or combined in any way and/or combination.
- Other features of embodiments will be more readily understood from the following detailed description of specific embodiments thereof when read in conjunction with the accompanying drawings, in which:
- FIG. 1 is a block diagram of a decision support system for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter;
- FIG. 2 illustrates a data processing system that may be used to implement the big data environment advisor system of FIG. 1 in accordance with some embodiments of the inventive subject matter;
- FIG. 3 is a block diagram that illustrates a software/hardware architecture for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the present inventive subject matter;
- FIG. 4 is a block diagram that illustrates functional relationships between the modules of FIG. 3; and
- FIG. 5 is a flowchart that illustrates operations for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter.
- In the following detailed description, numerous specific details are set forth to provide a thorough understanding of embodiments of the present disclosure. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In some instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present disclosure. It is intended that all embodiments disclosed herein can be implemented separately or combined in any way and/or combination. Aspects described with respect to one embodiment may be incorporated in different embodiments although not specifically described relative thereto. That is, all embodiments and/or features of any embodiments can be combined in any way and/or combination.
- As used herein, a “service” includes, but is not limited to, a software and/or hardware service, such as cloud services in which software, platforms, and infrastructure are provided remotely through, for example, the Internet. A service may be provided using Software as a Service (SaaS), Platform as a Service (PaaS), and/or Infrastructure as a Service (IaaS) delivery models. In the SaaS model, customers generally access software residing in the cloud using a thin client, such as a browser, for example. In the PaaS model, the customer typically creates and deploys the software in the cloud sometimes using tools, libraries, and routines provided through the cloud service provider. The cloud service provider may provide the network, servers, storage, and other tools used to host the customer's application(s). In the IaaS model, the cloud service provider provides physical and/or virtual machines along with hypervisor(s). The customer installs operating system images along with application software on the physical and/or virtual infrastructure provided by the cloud service provider.
- As used herein, the term “data processing facility” includes, but is not limited to, a hardware element, firmware component, and/or software component. A data processing system may be configured with one or more data processing facilities.
- Some embodiments of the inventive subject matter stem from a realization that big data datasets may differ in a variety of ways, including the traditional 3V characteristics of volume, variety, and velocity, as well as other characteristics, such as variability (e.g., data inconsistency), veracity (quality of the data), and complexity. As a result, a data processing environment used to analyze or process one big data dataset may be less suitable for analyzing or processing a different big data dataset. Some embodiments of the inventive subject matter may provide the operators of a big data analysis data processing system with a prediction of how well the data processing system may perform in analyzing a big data dataset with respect to one or more performance parameters. The performance parameters may include, but are not limited to, the time of execution for performing an analysis, a probability of success (e.g., determining a pattern in the big data dataset), the amount of processor resources used in performing the analysis, and the amount of memory resources used in performing the analysis.
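As a concrete illustration, the performance parameters listed above could be encoded as a simple enumeration. The names and strings below are purely hypothetical and are not part of the disclosure:

```python
from enum import Enum

class PerformanceParameter(Enum):
    """Hypothetical encoding of the performance parameters named above."""
    EXECUTION_TIME = "time of execution for performing an analysis"
    SUCCESS_PROBABILITY = "probability of determining a pattern"
    PROCESSOR_USAGE = "processor resources used in the analysis"
    MEMORY_USAGE = "memory resources used in the analysis"
```

A prediction request would then identify one or more of these members along with the new active data to be analyzed.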
- Some embodiments of the inventive subject matter may provide a Decision Support System (DSS) for generating the prediction of how well a data processing system may perform in analyzing a given big data dataset, which can then be used to configure the data processing system for improved performance. The decision support system may generate the performance prediction in response to a new prediction request for a new big data dataset based on historical job data corresponding to previous big data datasets that have been analyzed and based on various machine learning algorithms that have been used in predicting the performance of analyzing previous big data datasets, which have had their accuracy evaluated based on actual results.
- Although described herein with respect to evaluating the performance of a data processing system for analyzing big data datasets, it will be understood that embodiments of the present inventive subject matter are not limited thereto and may be applicable to evaluating the performance of data processing systems generally with respect to a variety of different tasks.
- FIG. 1 is a block diagram of a DSS for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter. A DSS big data environment advisor data processing system 105 is configured to receive a big data dataset comprising new active data along with a prediction request to predict the performance of a data processing system with respect to one or more performance parameters in analyzing the new active data. The big data environment advisor data processing system 105 may generate the performance prediction based on historical job metadata corresponding to previous big data datasets that have been analyzed and based on various machine learning algorithms that have been used in predicting the performance of analyzing previous big data datasets, which have had their accuracy evaluated based on actual results.
- The performance prediction generated by the DSS big data environment advisor 105 may be used as a basis for configuring a data processing system to analyze the new active data in the big data dataset. Configuring a data processing system may involve various operations including, but not limited to, adjusting the processing, memory, networking, and other resources associated with the data processing system. Configuring the data processing system may also involve scheduling which jobs are run at certain times and/or re-assigning jobs between the data processing system and other data processing systems. In addition, the particular analytic tools and applications that are used to process the big data dataset may be selected to enhance efficiency.
- Although
FIG. 1 illustrates a decision support system for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter, it will be understood that embodiments of the present invention are not limited to such configurations, but are intended to encompass any configuration capable of carrying out the operations described herein. - Referring now to
FIG. 2, a data processing system 200 that may be used to implement the DSS big data environment advisor 105 of FIG. 1, in accordance with some embodiments of the inventive subject matter, comprises input device(s) 202, such as a keyboard or keypad, a display 204, and a memory 206 that communicate with a processor 208. The data processing system 200 may further include a storage system 210, a speaker 212, and input/output (I/O) data port(s) 214 that also communicate with the processor 208. The storage system 210 may include removable and/or fixed media, such as floppy disks, ZIP drives, hard disks, or the like, as well as virtual storage, such as a RAMDISK. The I/O data port(s) 214 may be used to transfer information between the data processing system 200 and another computer system or a network (e.g., the Internet). These components may be conventional components, such as those used in many conventional computing devices, and their functionality, with respect to conventional operations, is generally known to those skilled in the art. The memory 206 may be configured with a DSS big data environment advisor module 216 that may provide functionality that may include, but is not limited to, configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter. -
FIG. 3 illustrates a processor 300 and memory 305 that may be used in embodiments of data processing systems, such as the data processing system 200 of FIG. 2, for configuring a data processing system for analyzing a big data dataset according to some embodiments of the inventive subject matter. The processor 300 communicates with the memory 305 via an address/data bus 310. The processor 300 may be, for example, a commercially available or custom microprocessor. The memory 305 is representative of the one or more memory devices containing the software and data used for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter. The memory 305 may include, but is not limited to, the following types of devices: cache, ROM, PROM, EPROM, EEPROM, flash, SRAM, and DRAM. - As shown in
FIG. 3, the memory 305 may contain two or more categories of software and/or data: an operating system 315 and a DSS big data environment advisor module 320. In particular, the operating system 315 may manage the data processing system's software and/or hardware resources and may coordinate execution of programs by the processor 300. The DSS big data environment advisor module 320 may comprise a data classification module 325, an algorithm mapping module 330, a prediction engine module 335, and a data center management interface module 340. - The
data classification module 325 may be configured to collect metadata corresponding to the analysis jobs performed previously on other big data datasets by various data processing systems and data processing system configurations, including the data processing system targeted for a current active data dataset. The algorithm mapping module 330 may be configured to select, from a plurality of machine learning algorithms, the machine learning algorithm that may be the most accurate in determining a prediction for the performance of a data processing system in analyzing a current active data dataset. This selection may be made based on one or more previous predictions with respect to various data processing systems and data processing system configurations. The prediction engine module 335 may be configured to generate a prediction of the performance of a data processing system with respect to one or more performance parameters in response to a request identifying the one or more performance parameters and new active data forming part of a big data dataset to be analyzed. The prediction engine module 335 may select, from the data classification module 325, a group of historical metadata (i.e., metadata for data that has already been analyzed by one or more data processing systems) that most closely matches the metadata of the new active data to be analyzed, and may select, from the algorithm mapping module 330, a machine learning algorithm that is the most efficient at generating a prediction for the particular performance parameter(s). The prediction engine module 335 may then apply the particular machine learning algorithm received from the algorithm mapping module 330 to the group of historical metadata to build a prediction model, which may be an equation, graph, or other mechanism for specifying a relationship between the data points in the group of historical metadata.
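The group-selection step just described can be sketched minimally, assuming that job metadata is summarized as numeric attribute vectors and that each historical group carries a centroid; every name below (`MetadataGroup`, `select_group`, the attribute choices) is an illustrative assumption, not part of the disclosure:

```python
from dataclasses import dataclass
from math import dist
from statistics import fmean

@dataclass
class MetadataGroup:
    centroid: tuple   # mean attribute vector, e.g. (data_gb, velocity)
    outcomes: list    # observed values of the performance parameter

def select_group(new_metadata, groups):
    # Pick the group of historical metadata whose attribute centroid
    # most closely matches the metadata of the new active data.
    return min(groups, key=lambda g: dist(g.centroid, new_metadata))

groups = [
    MetadataGroup(centroid=(10.0, 1.0), outcomes=[12.0, 14.0]),
    MetadataGroup(centroid=(500.0, 8.0), outcomes=[300.0, 340.0]),
]
chosen = select_group((480.0, 7.5), groups)
prediction = fmean(chosen.outcomes)   # naive stand-in for a fitted model
```

Here the "model" is simply the mean outcome of the selected group; the disclosure instead fits the selected machine learning algorithm to the group's data points and applies the resulting model to the new metadata.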
The prediction model may then be applied to the metadata of the new active data to generate a prediction of the level of performance with respect to one or more performance parameters in analyzing the new active data on the data processing system. The data center management interface module 340 may be configured to communicate changes to a configuration of a data processing system based on the prediction generated by the prediction engine module 335. The DSS big data environment advisor data processing system 105 may be integrated as part of a data center management system or may be a stand-alone system that communicates with a data center management system over a network or suitable communication connection. - Although
FIG. 3 illustrates hardware/software architectures that may be used in data processing systems, such as the data processing system 200 of FIG. 2, for configuring a data processing system for analyzing a big data dataset according to some embodiments of the inventive subject matter, it will be understood that the present invention is not limited to such a configuration but is intended to encompass any configuration capable of carrying out operations described herein. - Computer program code for carrying out operations of data processing systems discussed above with respect to
FIGS. 1-3 may be written in a high-level programming language, such as Python, Java, C, and/or C++, for development convenience. In addition, computer program code for carrying out operations of the present invention may also be written in other programming languages, such as, but not limited to, interpreted languages. Some modules or routines may be written in assembly language or even micro-code to enhance performance and/or memory usage. It will be further appreciated that the functionality of any or all of the program modules may also be implemented using discrete hardware components, one or more application specific integrated circuits (ASICs), or a programmed digital signal processor or microcontroller. - Moreover, the functionality of the DSS big data environment advisor
data processing system 105, the data processing system 200 of FIG. 2, and the hardware/software architecture of FIG. 3 may each be implemented as a single processor system, a multi-processor system, a multi-core processor system, or even a network of stand-alone computer systems, in accordance with various embodiments of the inventive subject matter. Each of these processor/computer systems may be referred to as a “processor” or “data processing system.” - The data processing apparatus of
FIGS. 1-3 may be used to configure a data processing system for analyzing a big data dataset according to various embodiments described herein. These apparatus may be embodied as one or more enterprise, application, personal, pervasive and/or embedded computer systems and/or apparatus that are operable to receive, transmit, process and store data using any suitable combination of software, firmware and/or hardware and that may be standalone or interconnected by any public and/or private, real and/or virtual, wired and/or wireless network including all or a portion of the global communication network known as the Internet, and may include various types of tangible, non-transitory computer readable media. In particular, the memory 206 coupled to the processor 208 and the memory 305 coupled to the processor 300 include computer readable program code that, when executed by the respective processors, causes the respective processors to perform operations including one or more of the operations described herein with respect to FIGS. 4-5. -
FIG. 4 is a block diagram that illustrates functional relationships between the modules of FIG. 3. Referring now to FIG. 4, the data classification module 325 provides an active data metadata procurement module 405 and a passive data metadata procurement module 410. The active data metadata procurement module 405 may be configured to obtain metadata for new active data as it is received for processing. The passive data metadata procurement module 410 may be configured to fetch the historical metadata for all datasets that have previously been analyzed using the data processing system, the data processing system as configured differently, and/or other data processing systems. The collected metadata is compiled at block 415 as metadata and statistical metadata. A clustering module 420 may be configured to perform a cluster analysis on the historical metadata of block 415 based on a plurality of attributes to generate groups of historical metadata with similar attribute sets, represented as module 425. In accordance with various embodiments of the inventive subject matter, the attributes may include, but are not limited to, an analysis job name, a data processing system name, a time of execution for performing an analysis, an amount of memory used in performing an analysis, a type of analysis performed, and an amount of data processed during performing an analysis. The number of groups that are created for each attribute set is determined by the clustering algorithm used; a new sub-group is formed when there is a sufficient amount of similar data. The cardinality of the groups depends on correlation in the historical metadata. - The algorithm mapping module 330 provides a library of possible machine learning algorithms that can be used in generating a model for predicting the performance of a data processing system in analyzing a big data dataset.
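The cluster analysis performed by the clustering module 420 might be sketched with a toy k-means over two hypothetical job attributes (execution time and data volume). This is an illustrative stand-in, not the disclosed algorithm; the naive initialization means the grouping depends on point order:

```python
from math import dist
from statistics import fmean

def kmeans(points, k, iters=20):
    """Toy k-means: group historical job metadata vectors into k clusters."""
    centroids = list(points[:k])          # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each job's metadata vector to its nearest centroid.
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:                   # recompute each centroid
                centroids[i] = tuple(map(fmean, zip(*members)))
    return clusters

# (execution_minutes, data_gb) for four hypothetical historical jobs.
jobs = [(5.0, 1.0), (6.0, 1.2), (300.0, 500.0), (310.0, 520.0)]
small_jobs, large_jobs = kmeans(jobs, k=2)
```

The two returned clusters correspond to the "groups of historical metadata with similar attribute sets" of module 425; their cardinality falls out of the correlation in the data, as described above.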
Different machine learning algorithms may generate better models than others depending on the particular performance parameter of interest. Thus, the algorithm mapping module 330 may maintain information on the accuracy of the resulting performance predictions when various machine learning algorithms were previously used for various performance parameters. The algorithm mapping module 330 may provide to the
prediction engine 335 the machine learning algorithm that has resulted in the most accurate predictions for a particular performance parameter at block 435. The algorithm mapping module 330 may also provide one or more default machine learning algorithms when no historical prediction accuracy data is available for a particular performance parameter. Various machine learning algorithms can be used in accordance with embodiments of the inventive subject matter, including, but not limited to, kernel density estimation, K-means, kernel principal components analysis, linear regression, nearest neighbors, non-negative matrix factorization, support vector machines, dimensionality reduction, fast singular value decomposition, and decision trees. - The remaining blocks of
FIG. 4 may comprise components of the prediction engine module 335. A big data dataset comprising new active data may be received at block 440. Before sending the new active data to a data processing system for processing, embodiments of the present invention can be used to generate a prediction of the performance of the data processing system in analyzing the new active data. Thus, a prediction request may be received at block 445 that comprises a request to predict a level of performance of the data processing system with respect to one or more performance parameters. The performance parameters may include, but are not limited to, a time of execution for performing an analysis, a probability of determining a pattern in the new active data, the resources, such as processing, memory, and network, used in performing the analysis, and the like in accordance with various embodiments of the inventive subject matter. The prediction engine module 335 communicates with the algorithm mapping module 330 at block 450 to obtain the best machine learning algorithm for the particular performance parameter to be predicted at block 455. The prediction engine module 335 obtains metadata of the new active data at block 460 and communicates with the data classification module 325 to perform a comparison to determine which group of historical metadata most closely resembles the metadata of the new active data. The selected group of historical metadata, which was identified based on the comparison, is output at block 465. - A model or prediction model is generated at
block 470 based on the machine learning algorithm selected at block 455 and the group of historical metadata selected at block 465. In accordance with various embodiments of the inventive subject matter, the model may be an equation, graph, or other construct/mechanism for specifying a relationship between the data points in the group of historical metadata. For example, if linear regression is chosen as the machine learning algorithm, an equation may be generated that best fits the data points in the group of historical metadata. The resulting model is output at block 475. The prediction engine module 335 applies the model obtained at block 475 to the metadata of the new active data at block 480 to generate a prediction 485 of the level of performance with respect to the requested performance parameter. For example, if the performance parameter is a time of execution for performing an analysis, the makespan value may be computed by applying the model generated by the machine learning algorithm to the metadata of the new active data of the big data dataset to be analyzed. The prediction 485 can be used to configure the data processing system for analyzing the big data dataset comprising the new active data. For example, various thresholds may be defined for one or more performance parameters; when the predicted performance level is compared against these thresholds, the result indicates whether changes need to be made to the data processing system, to improve its performance, before the big data dataset is provided to the data processing system for analysis. - In some embodiments of the inventive subject matter, to improve the accuracy of the prediction, rather than using a single machine learning algorithm that is considered the most accurate for generating a prediction for a particular performance parameter, an ensemble methodology may be used in which multiple machine learning algorithms are applied to the selected group of historical metadata to generate a plurality of models.
The plurality of models may then be applied to the metadata of the new active data to generate a plurality of predictions, which can then be processed using an ensemble methodology to provide a final prediction. The ensemble methodology may be used when the models generated by the machine learning algorithms are independent of each other. In accordance with various embodiments of the inventive subject matter, the ensemble methods may include, but are not limited to, Bayes optimal classifier, bagging, boosting, Bayesian parameter averaging, Bayesian model combination, bucket of models, and stacking.
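The ensemble path just described can be sketched as follows. Everything in this example is illustrative rather than part of the disclosure: the historical metadata is reduced to a single hypothetical attribute (data volume) with made-up values, and two simple stand-in algorithms (a least-squares linear fit and a nearest-neighbor lookup) play the role of the plurality of machine learning algorithms, with plain averaging standing in for the richer combining methods named above (bagging, stacking, etc.).

```python
# Sketch of the ensemble methodology: two independent "machine learning
# algorithms" are each applied to the same selected group of historical
# metadata to generate two models, whose predictions for the metadata of
# the new active data are then combined into one final prediction.

def linear_model(group):
    """Least-squares fit y = a*x + b over (attribute, makespan) records."""
    xs, ys = zip(*group)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda x: a * x + b

def nearest_neighbor_model(group):
    """Predict the makespan of the historical record closest to x."""
    return lambda x: min(group, key=lambda rec: abs(rec[0] - x))[1]

def ensemble_predict(models, x):
    """Combine the independent per-model predictions by plain averaging."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)

# Hypothetical historical group: (data volume in GB, makespan in minutes).
group = [(10, 12), (20, 22), (30, 32), (40, 42)]
models = [linear_model(group), nearest_neighbor_model(group)]

# Final prediction for a new active dataset of 25 GB.
final_prediction = ensemble_predict(models, 25)
```

Because the two models here are generated independently of each other, they satisfy the independence condition mentioned above for applying an ensemble method.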
FIG. 5 is a flowchart that illustrates operations for configuring a data processing system for analyzing a big data dataset in accordance with some embodiments of the inventive subject matter. Referring to FIG. 5, operations begin at block 500 where the prediction engine module 335 receives a big data dataset comprising new active data along with a performance prediction request at block 505. The performance prediction request is a request to predict a level of performance of the data processing system that will be assigned to analyze the big data dataset comprising the new active data based on one or more performance parameters. The prediction engine module 335 selects a machine learning algorithm at block 510 provided by the algorithm mapping module 330 based on the one or more performance parameters contained in the request. The prediction engine module 335 selects a group of historical metadata at block 515 from a plurality of groups of historical metadata of datasets that have previously been analyzed using the data processing system and/or other data processing systems, including the present data processing system configured differently. The selected machine learning algorithm is applied to the selected group of historical metadata at block 520 to generate a model of the selected group of historical metadata. The prediction engine module 335 obtains metadata of the new active data at block 525 and applies the model generated at block 520 to the metadata of the new active data to generate a prediction of the level of performance of the data processing system with respect to the one or more performance parameters at block 530. The data processing system may be configured at block 535 based on the prediction of the level of performance with respect to the performance parameter.
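The operations of blocks 500-535 can be tied together in a single pipeline sketch. The example is a simplification under stated assumptions: the metadata is reduced to one hypothetical attribute (data volume in GB), group selection at block 515 is done by nearest group mean, and a least-squares linear fit stands in for whichever machine learning algorithm the algorithm mapping module 330 would actually select.

```python
# End-to-end sketch of FIG. 5: select a historical group (block 515),
# fit a model to it (block 520), predict performance for the new active
# data (block 530), and decide whether to reconfigure (block 535).

def run_prediction_pipeline(historical_groups, new_volume, threshold):
    # Block 515: choose the group whose mean volume most closely
    # resembles the volume of the new active data.
    group = min(
        historical_groups,
        key=lambda g: abs(sum(v for v, _ in g) / len(g) - new_volume),
    )
    # Block 520: least-squares fit y = a*x + b over (volume, makespan).
    xs, ys = zip(*group)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    # Block 530: predicted level of performance for the new active data.
    predicted = a * new_volume + b
    # Block 535: signal reconfiguration if the prediction breaches the
    # threshold defined for this performance parameter.
    return predicted, predicted > threshold

# Hypothetical groups of (volume GB, makespan minutes) historical records.
groups = [
    [(10, 12), (20, 22)],        # small jobs
    [(800, 300), (1000, 380)],   # large jobs
]
predicted, reconfigure = run_prediction_pipeline(groups, 15, threshold=60)
```

Here a 15 GB dataset is matched to the small-job group and its predicted makespan falls under the 60-minute threshold, so no reconfiguration is signaled; a 900 GB dataset would instead be matched to the large-job group and trigger the block 535 reconfiguration path.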
- Some embodiments of the inventive subject matter may provide a DSS that can assist users of a big data analysis center in configuring their data processing system for a particular big data analysis task to meet, for example, requirements of service level agreements. Unexpected alerts and breakdowns may be reduced because a data processing system can be better configured to process a big data analysis job before the job starts. As big data is by definition resource intensive in terms of the amount and complexity of the data to be analyzed, even minor improvements in data processing system performance can result in large savings in terms of cost, resource usage, and time. A prediction of the performance of a data processing system, according to embodiments of the inventive subject matter, is generated in a technology-agnostic manner and uses ensemble approaches of machine learning, progressive clustering, and online learning. Moreover, the DSS described herein is self-tuning, improving the historical metadata group selection used in model generation based on newly arriving metadata corresponding to new big data analysis jobs.
- In the above description of various embodiments of the present disclosure, aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in an implementation combining software and hardware, all of which may generally be referred to herein as a “circuit,” “module,” “component,” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product comprising one or more computer readable media having computer readable program code embodied thereon.
- Any combination of one or more computer readable media may be used. The computer readable media may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an appropriate optical fiber with a repeater, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python or the like, conventional procedural programming languages, such as the “C” programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS).
- Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable instruction execution apparatus, create a mechanism for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer program instructions may also be stored in a computer readable medium that when executed can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions when stored in the computer readable medium produce an article of manufacture including instructions which when executed, cause a computer to implement the function/act specified in the flowchart and/or block diagram block or blocks. The computer program instructions may also be loaded onto a computer, other programmable instruction execution apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatuses or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various aspects of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
- The terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Like reference numbers signify like elements throughout the description of the figures.
- The corresponding structures, materials, acts, and equivalents of any means or step plus function elements in the claims below are intended to include any disclosed structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the disclosure. The aspects of the disclosure herein were chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure with various modifications as are suited to the particular use contemplated.
Claims (20)
1. A method comprising:
performing operations as follows on a processor:
receiving a big data dataset comprising new active data;
receiving a request to predict a level of performance with respect to a performance parameter of a data processing system in analyzing the new active data;
selecting a machine learning algorithm from a plurality of machine learning algorithms based on the performance parameter to obtain a selected machine learning algorithm;
selecting a group of historical metadata from a plurality of groups of historical metadata of datasets that have previously been analyzed using the data processing system to provide a selected group of historical metadata;
applying the selected machine learning algorithm to the selected group of historical metadata to generate a model of the selected group of historical metadata;
obtaining metadata of the new active data;
applying the model to the metadata of the new active data to generate a prediction of the level of performance with respect to the performance parameter; and
configuring the data processing system for analyzing the new active data based on the prediction.
2. The method of claim 1, wherein the data processing system is one of a plurality of data processing systems, wherein the metadata of the new active data and the metadata of the historical metadata correspond to a plurality of attributes; and
wherein selecting the group of historical metadata comprises:
performing a cluster analysis of the metadata of the datasets that have been previously analyzed based on the plurality of attributes;
generating the plurality of groups of historical metadata based on the cluster analysis; and
selecting the group of historical metadata from the plurality of groups of historical metadata based on a comparison of the metadata of the new active data with the plurality of groups of historical metadata.
3. The method of claim 2, wherein the plurality of attributes comprises an analysis job name, a data processing system name, a time of execution for performing an analysis, an amount of memory used in performing an analysis, a type of analysis performed, and an amount of data processed during performing an analysis.
4. The method of claim 1, wherein selecting the machine learning algorithm comprises:
collecting a plurality of previous predictions of the level of performance of the data processing system for a plurality of previous requests to predict the level of performance of the data processing system with respect to a plurality of performance parameters; and
selecting the machine learning algorithm based on the performance parameter and the plurality of previous predictions.
5. The method of claim 4, wherein the performance parameter is one of the plurality of performance parameters; and
wherein the plurality of performance parameters comprises a time of execution for performing an analysis, a probability of determining a pattern in the new active data, and memory resources used in performing an analysis.
6. The method of claim 4, wherein applying the selected machine learning algorithm to the selected group of historical metadata to generate the model of the selected group of historical metadata comprises:
applying a plurality of machine learning algorithms to the selected group of historical metadata to generate a plurality of models, respectively.
7. The method of claim 6, wherein applying the model to the metadata of the new active data to generate the prediction of the level of performance with respect to the performance parameter comprises:
applying the plurality of models to the metadata of the new active data using an ensemble method to generate the prediction.
8. The method of claim 7, wherein the ensemble method comprises one of Bayes optimal classifier, bagging, boosting, Bayesian parameter averaging, Bayesian model combination, bucket of models, and stacking.
9. The method of claim 8, wherein the plurality of machine learning algorithms comprise kernel density estimation, K-means, kernel principal components analysis, linear regression, neighbors, non-negative matrix factorization, support vector machines, dimensionality reduction, fast singular value decomposition, and decision tree.
10. A system, comprising:
a processor; and
a memory coupled to the processor and comprising computer readable program code embodied in the memory that when executed by the processor causes the processor to perform operations comprising:
receiving a big data dataset comprising new active data;
receiving a request to predict a level of performance with respect to a performance parameter of a data processing system in analyzing the new active data;
selecting a machine learning algorithm from a plurality of machine learning algorithms based on the performance parameter to obtain a selected machine learning algorithm;
selecting a group of historical metadata from a plurality of groups of historical metadata of datasets that have previously been analyzed using the data processing system to provide a selected group of historical metadata;
applying the selected machine learning algorithm to the selected group of historical metadata to generate a model of the selected group of historical metadata;
obtaining metadata of the new active data;
applying the model to the metadata of the new active data to generate a prediction of the level of performance with respect to the performance parameter; and
configuring the data processing system for analyzing the new active data based on the prediction.
11. The system of claim 10, wherein the data processing system is one of a plurality of data processing systems, wherein the metadata of the new active data and the metadata of the historical metadata correspond to a plurality of attributes; and
wherein selecting the group of historical metadata comprises:
performing a cluster analysis of the metadata of the datasets that have been previously analyzed based on the plurality of attributes;
generating the plurality of groups of historical metadata based on the cluster analysis; and
selecting the group of historical metadata from the plurality of groups of historical metadata based on a comparison of the metadata of the new active data with the plurality of groups of historical metadata.
12. The system of claim 10, wherein selecting the machine learning algorithm comprises:
collecting a plurality of previous predictions of the level of performance of the data processing system for a plurality of previous requests to predict the level of performance of the data processing system with respect to a plurality of performance parameters; and
selecting the machine learning algorithm based on the performance parameter and the plurality of previous predictions.
13. The system of claim 12, wherein applying the selected machine learning algorithm to the selected group of historical metadata to generate the model of the selected group of historical metadata comprises:
applying a plurality of machine learning algorithms to the selected group of historical metadata to generate a plurality of models, respectively.
14. The system of claim 13, wherein applying the model to the metadata of the new active data to generate the prediction of the level of performance with respect to the performance parameter comprises:
applying the plurality of models to the metadata of the new active data using an ensemble method to generate the prediction.
15. The system of claim 14, wherein the plurality of machine learning algorithms comprise kernel density estimation, K-means, kernel principal components analysis, linear regression, neighbors, non-negative matrix factorization, support vector machines, dimensionality reduction, fast singular value decomposition, and decision tree.
16. A computer program product, comprising:
a tangible computer readable storage medium comprising computer readable program code embodied in the medium that when executed by a processor causes the processor to perform operations comprising:
receiving a big data dataset comprising new active data;
receiving a request to predict a level of performance with respect to a performance parameter of a data processing system in analyzing the new active data;
selecting a machine learning algorithm from a plurality of machine learning algorithms based on the performance parameter to obtain a selected machine learning algorithm;
selecting a group of historical metadata from a plurality of groups of historical metadata of datasets that have previously been analyzed using the data processing system to provide a selected group of historical metadata;
applying the selected machine learning algorithm to the selected group of historical metadata to generate a model of the selected group of historical metadata;
obtaining metadata of the new active data;
applying the model to the metadata of the new active data to generate a prediction of the level of performance with respect to the performance parameter; and
configuring the data processing system for analyzing the new active data based on the prediction.
17. The computer program product of claim 16, wherein the data processing system is one of a plurality of data processing systems, wherein the metadata of the new active data and the metadata of the historical metadata correspond to a plurality of attributes; and
wherein selecting the group of historical metadata comprises:
performing a cluster analysis of the metadata of the datasets that have been previously analyzed based on the plurality of attributes;
generating the plurality of groups of historical metadata based on the cluster analysis; and
selecting the group of historical metadata from the plurality of groups of historical metadata based on a comparison of the metadata of the new active data with the plurality of groups of historical metadata.
18. The computer program product of claim 16, wherein selecting the machine learning algorithm comprises:
collecting a plurality of previous predictions of the level of performance of the data processing system for a plurality of previous requests to predict the level of performance of the data processing system with respect to a plurality of performance parameters; and
selecting the machine learning algorithm based on the performance parameter and the plurality of previous predictions.
19. The computer program product of claim 18, wherein applying the selected machine learning algorithm to the selected group of historical metadata to generate the model of the selected group of historical metadata comprises:
applying a plurality of machine learning algorithms to the selected group of historical metadata to generate a plurality of models, respectively.
20. The computer program product of claim 19, wherein applying the model to the metadata of the new active data to generate the prediction of the level of performance with respect to the performance parameter comprises:
applying the plurality of models to the metadata of the new active data using an ensemble method to generate the prediction.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/944,969 US20170140278A1 (en) | 2015-11-18 | 2015-11-18 | Using machine learning to predict big data environment performance |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20170140278A1 true US20170140278A1 (en) | 2017-05-18 |
Family
ID=58690120
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US14/944,969 Abandoned US20170140278A1 (en) | 2015-11-18 | 2015-11-18 | Using machine learning to predict big data environment performance |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US20170140278A1 (en) |
Cited By (37)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170236061A1 (en) * | 2016-02-11 | 2017-08-17 | International Business Machines Corporation | Performance comparison |
| US20190095299A1 (en) * | 2017-09-28 | 2019-03-28 | Cnex Labs, Inc. | Storage system with machine learning mechanism and method of operation thereof |
| CN110766232A (en) * | 2019-10-30 | 2020-02-07 | 支付宝(杭州)信息技术有限公司 | Dynamic prediction method and system thereof |
| CN111291071A (en) * | 2020-01-21 | 2020-06-16 | 北京字节跳动网络技术有限公司 | Data processing method and device and electronic equipment |
| CN111291027A (en) * | 2020-01-15 | 2020-06-16 | 杭州华网信息技术有限公司 | Data preprocessing method |
| US10721239B2 (en) * | 2017-03-31 | 2020-07-21 | Oracle International Corporation | Mechanisms for anomaly detection and access management |
| CN111612022A (en) * | 2019-02-25 | 2020-09-01 | 日本电气株式会社 | Method, apparatus and computer storage medium for analyzing data |
| CN111625440A (en) * | 2020-06-04 | 2020-09-04 | 中国银行股份有限公司 | Method and device for predicting performance parameters |
| WO2020177862A1 (en) * | 2019-03-06 | 2020-09-10 | Telefonaktiebolaget Lm Ericsson (Publ) | Prediction of device properties |
| CN111679952A (en) * | 2020-06-08 | 2020-09-18 | 中国银行股份有限公司 | Alarm threshold generation method and device |
| CN112134310A (en) * | 2020-09-18 | 2020-12-25 | 贵州电网有限责任公司 | Big data-based artificial intelligent power grid regulation and control operation method and system |
| US20210110305A1 (en) * | 2019-10-09 | 2021-04-15 | Mastercard International Incorporated | Device monitoring system and method |
| CN112686433A (en) * | 2020-12-21 | 2021-04-20 | 上海东普信息科技有限公司 | Express quantity prediction method, device, equipment and storage medium |
| US20210125104A1 (en) * | 2019-10-25 | 2021-04-29 | Onfido Ltd | Machine learning inference system |
| US11003493B2 (en) | 2018-07-25 | 2021-05-11 | International Business Machines Corporation | Application and storage based scheduling |
| CN113158585A (en) * | 2021-05-25 | 2021-07-23 | 国网陕西省电力公司电力科学研究院 | Method, device and equipment for predicting arc resistance of arc-proof fabric |
| US20220036486A1 (en) * | 2020-07-31 | 2022-02-03 | CBRE, Inc. | Systems and methods for deriving rating for properties |
| KR20220029004A (en) * | 2020-09-01 | 2022-03-08 | 국민대학교산학협력단 | Cloud-based deep learning task execution time prediction system and method |
| CN114819391A (en) * | 2022-05-19 | 2022-07-29 | 中山大学 | Photovoltaic power generation power prediction method based on historical data set time span optimization |
| CN114970307A (en) * | 2022-02-25 | 2022-08-30 | 海仿(上海)科技有限公司 | General reverse calculation method applied to high-end equipment material design optimization |
| US11501191B2 (en) | 2018-09-21 | 2022-11-15 | International Business Machines Corporation | Recommending machine learning models and source codes for input datasets |
| US11516255B2 (en) | 2016-09-16 | 2022-11-29 | Oracle International Corporation | Dynamic policy injection and access visualization for threat detection |
| CN115658419A (en) * | 2022-09-14 | 2023-01-31 | 平安科技(深圳)有限公司 | Model data monitoring method, device, medium and equipment |
| CN115767072A (en) * | 2022-10-19 | 2023-03-07 | 中国电信股份有限公司 | Early warning method and device, electronic equipment and storage medium |
| CN115982139A (en) * | 2022-11-23 | 2023-04-18 | 中国地质大学(北京) | Method, device, electronic equipment and storage medium for cleaning terrain data in mining area |
| CN116070938A (en) * | 2022-12-26 | 2023-05-05 | 深圳市中政汇智管理咨询有限公司 | Automatic generation method, device, equipment and storage medium of performance standard |
| US20230206287A1 (en) * | 2021-12-23 | 2023-06-29 | Microsoft Technology Licensing, Llc | Machine learning product development life cycle model |
| WO2023091784A3 (en) * | 2021-11-22 | 2023-07-06 | Jabil Inc. | Apparatus, engine, system and method for predictive analytics in a manufacturing system |
| CN116502544A (en) * | 2023-06-26 | 2023-07-28 | 武汉新威奇科技有限公司 | Electric screw press life prediction method and system based on data fusion |
| WO2023158621A1 (en) * | 2022-02-15 | 2023-08-24 | Applied Materials, Inc. | Process control knob estimation |
| WO2023158887A1 (en) * | 2022-02-18 | 2023-08-24 | Mattertraffic Inc. | Analyzing and tracking user actions over digital twin models and in the metaverse |
| CN116882597A (en) * | 2023-09-07 | 2023-10-13 | 国网信息通信产业集团有限公司 | Virtual power plant control method, device, electronic equipment and readable medium |
| CN117033876A (en) * | 2023-07-26 | 2023-11-10 | 北京半人科技有限公司 | Digital matrix processing method based on multistage coupling algorithm |
| CN117272839A (en) * | 2023-11-20 | 2023-12-22 | 北京阿迈特医疗器械有限公司 | Support press-holding performance prediction method and device based on neural network |
| CN117390079A (en) * | 2023-09-05 | 2024-01-12 | 西安易诺敬业电子科技有限责任公司 | Data processing method and system for data center |
| US11914349B2 (en) | 2016-05-16 | 2024-02-27 | Jabil Inc. | Apparatus, engine, system and method for predictive analytics in a manufacturing system |
| WO2025017744A1 (en) * | 2023-07-19 | 2025-01-23 | Jio Platforms Limited | Method and system to pre-process raw network performance data |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070288414A1 (en) * | 2006-06-07 | 2007-12-13 | Barajas Leandro G | System and method for selection of prediction tools |
| US7480640B1 (en) * | 2003-12-16 | 2009-01-20 | Quantum Leap Research, Inc. | Automated method and system for generating models from data |
| US8311967B1 (en) * | 2010-05-14 | 2012-11-13 | Google Inc. | Predictive analytical model matching |
| US20140173618A1 (en) * | 2012-10-14 | 2014-06-19 | Xplenty Ltd. | System and method for management of big data sets |
| US20140372346A1 (en) * | 2013-06-17 | 2014-12-18 | Purepredictive, Inc. | Data intelligence using machine learning |
| US20150310335A1 (en) * | 2014-04-29 | 2015-10-29 | International Business Machines Corporation | Determining a performance prediction model for a target data analytics application |
| US20160048415A1 (en) * | 2014-08-14 | 2016-02-18 | Joydeep Sen Sarma | Systems and Methods for Auto-Scaling a Big Data System |
| US20170017521A1 (en) * | 2015-07-13 | 2017-01-19 | Palo Alto Research Center Incorporated | Dynamically adaptive, resource aware system and method for scheduling |
| US20180181641A1 (en) * | 2015-06-23 | 2018-06-28 | Entit Software Llc | Recommending analytic tasks based on similarity of datasets |
Non-Patent Citations (1)
| Title |
|---|
| Herodotou, Herodotos, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong, Fatma Bilgen Cetin, and Shivnath Babu. "Starfish: A self-tuning system for big data analytics." In CIDR, vol. 11, pp. 261-272, 2011. (Year: 2011) * |
Cited By (43)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9953263B2 (en) * | 2016-02-11 | 2018-04-24 | International Business Machines Corporation | Performance comparison for determining a travel path for a robot |
| US20170236061A1 (en) * | 2016-02-11 | 2017-08-17 | International Business Machines Corporation | Performance comparison |
| US11914349B2 (en) | 2016-05-16 | 2024-02-27 | Jabil Inc. | Apparatus, engine, system and method for predictive analytics in a manufacturing system |
| US11516255B2 (en) | 2016-09-16 | 2022-11-29 | Oracle International Corporation | Dynamic policy injection and access visualization for threat detection |
| US10721239B2 (en) * | 2017-03-31 | 2020-07-21 | Oracle International Corporation | Mechanisms for anomaly detection and access management |
| US11265329B2 (en) | 2017-03-31 | 2022-03-01 | Oracle International Corporation | Mechanisms for anomaly detection and access management |
| US20190095299A1 (en) * | 2017-09-28 | 2019-03-28 | Cnex Labs, Inc. | Storage system with machine learning mechanism and method of operation thereof |
| US11003493B2 (en) | 2018-07-25 | 2021-05-11 | International Business Machines Corporation | Application and storage based scheduling |
| US11501191B2 (en) | 2018-09-21 | 2022-11-15 | International Business Machines Corporation | Recommending machine learning models and source codes for input datasets |
| CN111612022A (en) * | 2019-02-25 | 2020-09-01 | 日本电气株式会社 | Method, apparatus and computer storage medium for analyzing data |
| WO2020177862A1 (en) * | 2019-03-06 | 2020-09-10 | Telefonaktiebolaget Lm Ericsson (Publ) | Prediction of device properties |
| US11569909B2 (en) * | 2019-03-06 | 2023-01-31 | Telefonaktiebolaget Lm Ericsson (Publ) | Prediction of device properties |
| US20210110305A1 (en) * | 2019-10-09 | 2021-04-15 | Mastercard International Incorporated | Device monitoring system and method |
| US20210125104A1 (en) * | 2019-10-25 | 2021-04-29 | Onfido Ltd | Machine learning inference system |
| CN110766232A (en) * | 2019-10-30 | 2020-02-07 | 支付宝(杭州)信息技术有限公司 | Dynamic prediction method and system thereof |
| CN111291027A (en) * | 2020-01-15 | 2020-06-16 | 杭州华网信息技术有限公司 | Data preprocessing method |
| CN111291071A (en) * | 2020-01-21 | 2020-06-16 | 北京字节跳动网络技术有限公司 | Data processing method and device and electronic equipment |
| CN111625440A (en) * | 2020-06-04 | 2020-09-04 | 中国银行股份有限公司 | Method and device for predicting performance parameters |
| CN111679952A (en) * | 2020-06-08 | 2020-09-18 | 中国银行股份有限公司 | Alarm threshold generation method and device |
| US20220036486A1 (en) * | 2020-07-31 | 2022-02-03 | CBRE, Inc. | Systems and methods for deriving rating for properties |
| WO2022050477A1 (en) * | 2020-09-01 | 2022-03-10 | 국민대학교산학협력단 | System and method for predicting execution time of cloud-based deep learning task |
| KR102504939B1 (en) | 2020-09-01 | 2023-03-02 | 국민대학교산학협력단 | Cloud-based deep learning task execution time prediction system and method |
| KR20220029004A (en) * | 2020-09-01 | 2022-03-08 | 국민대학교산학협력단 | Cloud-based deep learning task execution time prediction system and method |
| CN112134310A (en) * | 2020-09-18 | 2020-12-25 | 贵州电网有限责任公司 | Big data-based artificial intelligent power grid regulation and control operation method and system |
| CN112686433A (en) * | 2020-12-21 | 2021-04-20 | 上海东普信息科技有限公司 | Express quantity prediction method, device, equipment and storage medium |
| CN113158585A (en) * | 2021-05-25 | 2021-07-23 | 国网陕西省电力公司电力科学研究院 | Method, device and equipment for predicting arc resistance of arc-proof fabric |
| WO2023091784A3 (en) * | 2021-11-22 | 2023-07-06 | Jabil Inc. | Apparatus, engine, system and method for predictive analytics in a manufacturing system |
| US20230206287A1 (en) * | 2021-12-23 | 2023-06-29 | Microsoft Technology Licensing, Llc | Machine learning product development life cycle model |
| WO2023158621A1 (en) * | 2022-02-15 | 2023-08-24 | Applied Materials, Inc. | Process control knob estimation |
| US12191126B2 (en) | 2022-02-15 | 2025-01-07 | Applied Materials, Inc. | Process control knob estimation |
| WO2023158887A1 (en) * | 2022-02-18 | 2023-08-24 | Mattertraffic Inc. | Analyzing and tracking user actions over digital twin models and in the metaverse |
| CN114970307A (en) * | 2022-02-25 | 2022-08-30 | 海仿(上海)科技有限公司 | General reverse calculation method applied to high-end equipment material design optimization |
| CN114819391A (en) * | 2022-05-19 | 2022-07-29 | 中山大学 | Photovoltaic power generation power prediction method based on historical data set time span optimization |
| CN115658419A (en) * | 2022-09-14 | 2023-01-31 | 平安科技(深圳)有限公司 | Model data monitoring method, device, medium and equipment |
| CN115767072A (en) * | 2022-10-19 | 2023-03-07 | 中国电信股份有限公司 | Early warning method and device, electronic equipment and storage medium |
| CN115982139A (en) * | 2022-11-23 | 2023-04-18 | 中国地质大学(北京) | Method, device, electronic equipment and storage medium for cleaning terrain data in mining area |
| CN116070938A (en) * | 2022-12-26 | 2023-05-05 | 深圳市中政汇智管理咨询有限公司 | Automatic generation method, device, equipment and storage medium of performance standard |
| CN116502544A (en) * | 2023-06-26 | 2023-07-28 | 武汉新威奇科技有限公司 | Electric screw press life prediction method and system based on data fusion |
| WO2025017744A1 (en) * | 2023-07-19 | 2025-01-23 | Jio Platforms Limited | Method and system to pre-process raw network performance data |
| CN117033876A (en) * | 2023-07-26 | 2023-11-10 | 北京半人科技有限公司 | Digital matrix processing method based on multistage coupling algorithm |
| CN117390079A (en) * | 2023-09-05 | 2024-01-12 | 西安易诺敬业电子科技有限责任公司 | Data processing method and system for data center |
| CN116882597A (en) * | 2023-09-07 | 2023-10-13 | 国网信息通信产业集团有限公司 | Virtual power plant control method, device, electronic equipment and readable medium |
| CN117272839A (en) * | 2023-11-20 | 2023-12-22 | 北京阿迈特医疗器械有限公司 | Support press-holding performance prediction method and device based on neural network |
Similar Documents
| Publication | Title |
|---|---|
| US20170140278A1 (en) | Using machine learning to predict big data environment performance |
| US11455322B2 (en) | Classification of time series data | |
| US11836578B2 (en) | Utilizing machine learning models to process resource usage data and to determine anomalous usage of resources | |
| CN114616540B (en) | Autonomous Cloud Node Scoping Framework for Big Data Machine Learning Use Cases | |
| US11386128B2 (en) | Automatic feature learning from a relational database for predictive modelling | |
| US11048718B2 (en) | Methods and systems for feature engineering | |
| JP6649405B2 (en) | Computer-implemented method, computer program, and system for estimating computational resources for performing data mining tasks on a distributed computing system | |
| US11443228B2 (en) | Job merging for machine and deep learning hyperparameter tuning | |
| US12124961B2 (en) | System for continuous update of advection-diffusion models with adversarial networks | |
| US9329837B2 (en) | Generating a proposal for selection of services from cloud service providers based on an application architecture description and priority parameters | |
| US11366809B2 (en) | Dynamic creation and configuration of partitioned index through analytics based on existing data population | |
| US20190156243A1 (en) | Efficient Large-Scale Kernel Learning Using a Distributed Processing Architecture | |
| US10373071B2 (en) | Automated intelligent data navigation and prediction tool | |
| US11302096B2 (en) | Determining model-related bias associated with training data | |
| US12223432B2 (en) | Using disentangled learning to train an interpretable deep learning model | |
| US11521749B2 (en) | Library screening for cancer probability | |
| US12333379B2 (en) | Optimized selection of data for quantum circuits | |
| US20210357781A1 (en) | Efficient techniques for determining the best data imputation algorithms | |
| US12314777B2 (en) | Efficient adaptive allocation of resources for computational systems via statistically derived linear models | |
| US20230128532A1 (en) | Distributed computing for dynamic generation of optimal and interpretable prescriptive policies with interdependent constraints | |
| JP2024531141A (en) | Providing a machine learning model based on desired metric values | |
| US12242780B2 (en) | Agent assisted model development | |
| US20250208911A1 (en) | INTERFERENCE DETECTION-BASED SCHEDULING FOR SHARING GPUs | |
| US20230136461A1 (en) | Data allocation with user interaction in a machine learning system | |
| US20230128821A1 (en) | Multi-polytope machine for classification |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: CA, INC., NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GUPTA, SMRATI;DOMINIAK, JACEK;MARIMADAIAH, SANJAI;SIGNING DATES FROM 20151118 TO 20151125;REEL/FRAME:037620/0525 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |