WO2025080271A2 - Automated feature-engineering-based machine learning model building and deployment - Google Patents
- Publication number
- WO2025080271A2 (PCT/US2023/076781)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- feature
- runtime
- configurations
- deployment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
Definitions
- the processor 810 also communicates with a storage device 830.
- the storage device 830 may comprise any appropriate information storage device, including combinations of magnetic storage devices (e.g., a hard disk drive), optical storage devices, mobile telephones, and/or semiconductor memory devices.
- the storage device 830 stores a program 812 and/or ML model building and deployment 814 for controlling the processor 810.
- the processor 810 performs instructions of the programs 812, 814, and thereby operates in accordance with any of the embodiments described herein.
- the processor 810 may perform feature-engineering based on historical industrial time-series data.
- the processor 810 may create a ML model based on selected features and output information about the ML model.
- the processor 810 may also extract required features associated with the ML model and automatically create, via a feature configuration compiler, feature runtime configurations based on the required features associated with the ML model. The processor 810 may then generate at least one deployable object based on the generated feature runtime configurations and the ML model. In some embodiments, the processor 810 can then perform on-demand feature calculations based on the feature runtime configurations, execute the ML model based on industrial runtime data, and generate model output.
- the programs 812, 814 may be stored in a compressed, uncompiled and/or encrypted format.
- the programs 812, 814 may furthermore include other program elements, such as an operating system, clipboard application, a database management system, and/or device drivers used by the processor 810 to interface with peripheral devices.
- information may be “received” by or “transmitted” to, for example: (i) the stateful, nonlinear embedding platform 800 from another device; or (ii) a software application or module within the stateful, nonlinear embedding platform 800 from another software application, module, or any other source.
- the storage device 830 further stores feature configurations 860 (e.g., describing selected features for an industrial asset), model/model fusion configurations 870, and/or a ML model database 900.
- An example of a database that may be used in connection with the stateful, nonlinear embedding platform 800 will now be described in detail with respect to FIG. 9. Note that the database described herein is only one example, and additional and/or different information may be stored therein. Moreover, various databases might be split or combined in accordance with any of the embodiments described herein.
- a table that represents the ML model database 900 that may be stored at the platform 800 according to some embodiments.
- the table may include, for example, entries identifying monitoring nodes (sensor nodes and other types of nodes) associated with a cyber-physical system.
- the table may also define fields 902, 904, 906, 908, 910 for each of the entries.
- the fields 902, 904, 906, 908, 910 may, according to some embodiments, specify: a ML model identifier 902, feature configurations 904, post processing configurations 906, a deployment date and time 908, and a status 910.
- the ML model database 900 may be created and updated, for example, when a new physical system is monitored or modeled and/or on-line operation values are received from monitoring nodes.
- the ML model identifier 902 may be, for example, a unique alphanumeric code identifying a ML model that has been (or is being) built and/or deployed by the system.
- the feature configurations 904 and post processing configurations 906 may facilitate an automatic creation and deployment of the ML model.
- the deployment date and time 908 may indicate when the ML model was deployed (e.g., to an edge computing environment).
- the status 910 might indicate that the ML model has been deployed, has been replaced (e.g., with a newer model), is currently being built by the system, etc.
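The record layout of the ML model database 900 (fields 902-910) can be sketched as a small data class; field types and the example values are illustrative assumptions, not taken from the patent:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MLModelRecord:
    """One entry of the ML model database 900 (fields 902-910)."""
    model_id: str                 # 902: unique alphanumeric ML model identifier
    feature_configurations: dict  # 904: selected feature definitions
    post_processing: dict         # 906: post-processing configurations
    deployed_at: datetime         # 908: deployment date and time
    status: str                   # 910: e.g. "DEPLOYED", "BUILDING", "REPLACED"


# Hypothetical entry for a deployed model:
record = MLModelRecord(
    model_id="ML_0001",
    feature_configurations={"features": ["temp_mean", "vib_fft_peak"]},
    post_processing={"alert_threshold": 0.5},
    deployed_at=datetime(2023, 10, 13, 12, 0, 0),
    status="DEPLOYED",
)
```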
- embodiments may provide automated feature-engineering-based ML model building and deployment.
- the ML model building and deployment process (software) might consist of two libraries (a feature library and a model library) and four execution components (feature pre-computing, feature selection, model building, and runtime executor).
- Embodiments may use configuration files to define component internal settings, library referencing, and/or inter-component referencing.
- feature configuration meta data is managed together with pre-computed feature data and model configuration meta data is managed together with trained model objects.
- Embodiments may analyze feature and model configuration data to compile a simplified feature configuration that includes only those feature definitions selected for runtime use.
- Some embodiments create a runtime model object with on-demand feature calculation using the simplified feature configuration without re-coding for runtime feature calculation.
- Embodiments may be associated with various types of ML models.
- industrial asset control systems that operate physical systems (e.g., associated with power turbines, jet engines, locomotives, autonomous vehicles, etc.) are increasingly connected to the Internet.
- these control systems may be vulnerable to threats, such as cyber-attacks (e.g., associated with a computer virus, malicious software, etc.), that could disrupt electric power generation and distribution, damage engines, inflict vehicle malfunctions, etc.
- Current methods primarily consider threat detection in Information Technology (“IT,” such as computers that store, retrieve, transmit, and manipulate data) and Operation Technology (“OT,” such as direct monitoring devices and communication bus interfaces).
- FDIA: Fault Detection, Isolation, and Accommodation
- a threat might occur even in other types of threat monitoring nodes such as actuators, control logical(s), etc.
- FDIA is limited only to naturally occurring faults in one sensor at a time.
- FDIA systems do not address multiple simultaneously occurring faults, as those are normally due to malicious intent. Note that quickly detecting an attack may be important when responding to threats in an industrial asset (e.g., to reduce damage, to prevent the attack from spreading to other assets, etc.).
- FIG. 10 illustrates an interactive Graphical User Interface (“GUI”) display 1000 that provides a current status analysis 1010 for an industrial asset such as a power grid 1020.
- the analysis 1010 might be based on cyber-physical system information (e.g., including a feature vector 1030 and decision boundaries).
- User selection of an “Edit” icon 1040 might let an operator or administrator update or adjust the system.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Stored Programmes (AREA)
- Feedback Control In General (AREA)
Abstract
A system associated with Machine Learning ("ML") models may include a ML model building platform to perform feature-engineering based on historical industrial time-series data. The ML model building platform may create a ML model based on selected features and output information about the ML model. A model deployment engine may extract required features associated with the ML model and automatically create, via a feature configuration compiler, feature runtime configurations based on the required features associated with the ML model. The model deployment engine may then generate at least one deployable object based on the generated feature runtime configurations and the ML model. In some embodiments, a model runtime deployment platform can then perform on-demand feature calculations based on the feature runtime configurations, execute the ML model based on industrial runtime data, and generate model output.
Description
AUTOMATED FEATURE-ENGINEERING-BASED MACHINE LEARNING MODEL BUILDING AND DEPLOYMENT
[0001] This invention was made with government support under Contract DE-CR0000005 awarded by the US Department of Energy. The government has certain rights in the invention.
BACKGROUND
[0002] A Machine Learning (“ML”) model may solve problems for which development of algorithms by programmers would be cost-prohibitive. Instead, a ML model may solve such problems by being trained to “learn” its own algorithms without being explicitly told what to do. Feature-engineering (including feature extraction and feature selection) is a traditional and effective approach for building a ML model (e.g., based on industrial time-series data). When the training data is substantially large, it may be more efficient to first pre-compute and store feature data, and then perform feature selection, before finally building a ML model based on only a subset of the pre-computed features. When the model is deployed (e.g., to a performance-critical edge environment), only those selected features are computed. For example, the features may be computed on-demand and fed to the model directly instead of passing through an intermediate feature database (as during model building). The difference in feature computing and feeding between the development and deployment stages usually leads to manual coding efforts during model deployment, which can be a time-consuming and error-prone process, especially when a substantial number of features are involved. Commercial machine learning platforms, though providing an automated process for feature transformation (e.g., from model building to model deployment), do not address the need for feature-engineering-based ML in situations such as a performance-critical edge environment.
[0003] It would therefore be desirable to provide automated feature-engineering-based ML model building and deployment in an automatic, rapid, and accurate manner.
SUMMARY
[0004] According to some embodiments, a system associated with ML models may include a ML model building platform to perform feature-engineering based on historical industrial time-series data. The ML model building platform may create a ML model based on selected features and output information about the ML model. A model deployment engine may extract required features associated with the ML model and automatically create, via a feature configuration compiler, feature runtime configurations based on the required features associated with the ML model. The model deployment engine may then generate at least one deployable object based on the generated feature runtime configurations and the ML model. In some embodiments, a model runtime deployment platform can then perform on-demand feature calculations based on the feature runtime configurations, execute the ML model based on industrial runtime data, and generate model output.
[0005] Some embodiments comprise: means for performing, by a ML model building platform, feature-engineering based on historical industrial time-series data; means for creating at least one ML model based on selected features; means for extracting required features associated with the at least one ML model at a model deployment engine; means for automatically creating, by the model deployment engine via a feature configuration compiler, feature runtime configurations based on the required features associated with the at least one ML model; means for generating at least one deployable object based on the generated feature runtime configurations and the at least one ML model; and means for performing, by a model runtime deployment platform, on-demand feature calculations based on the feature runtime configurations, executing the at least one ML model based on industrial runtime data, and generating model output.
[0006] Some technical advantages of some embodiments disclosed herein are improved systems and methods to provide automated feature-engineering-based ML model building and deployment in an automatic, rapid, and accurate manner.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] FIGS. 1A, 1B, 1C, and 1D show existing approaches to ML model building and deployment.
[0008] FIG. 2 is a high-level block diagram of a ML model system in accordance with some embodiments.
[0009] FIGS. 3A and 3B are ML model methods according to some embodiments.
[0010] FIG. 4 is a ML model auto-build framework in accordance with some embodiments.
[0011] FIG. 5 is a version of a feature-engineering based ML model development and deployment framework that includes a feature calculation object according to some embodiments.
[0012] FIG. 6 is a system with a deployable object associated with compilation of multiple configurations to a single computational graph in accordance with some embodiments.
[0013] FIG. 7 shows aspects of an auto-build framework according to some embodiments.
[0014] FIG. 8 is a block diagram of a platform according to some embodiments of the present invention.
[0015] FIG. 9 is a tabular portion of a ML model database in accordance with some embodiments.
[0016] FIG. 10 is a display according to some embodiments.
DETAILED DESCRIPTION
[0017] In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of embodiments. However, it will be understood by those of ordinary skill in the art that the embodiments may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the embodiments.
[0018] FIGS. 1A, 1B, 1C, and 1D show existing approaches to ML model building and deployment. In particular, FIG. 1A is a system 101 that performs on-demand feature calculation for both model development and deployment (e.g., as performed by AMAZON™ ML). Both historical data 111 and runtime data 161 are provided for on-demand data transformation and feature calculation 151. The calculation 151 is then used for both model building 121 (to create a model object 131) and model execution 171 (to create a model output 181). That is, the feature calculation 151 is defined as on-demand transformation procedures and applied to both historical data 111 and runtime data 161. Such an approach may be appropriate for applications where the feature calculation task on the historical data 111 is light because it allows for a fully automated process to transition from model development to deployment.
[0019] As another example, FIG. 1B is a system 102 that uses a feature database as a medium for both model development and deployment (e.g., as performed by AMAZON™ SageMaker). As before, both historical data 112 and runtime data 162 are provided for on-demand data transformation and feature calculation 152. The result of the calculation 152 is then stored in a feature store 192. Information from the feature store 192 can then be used for both model building 192 (to create a model object 132) and model execution 194 (to create a model output 182). That is, the feature calculation 152 and model execution 194 execute in two asynchronous processes which each rely on the feature store 192 as a medium to connect. The feature calculation 152 procedures and feature store 192 handle both offline and runtime execution. Such an approach may be good for applications with a substantial amount of historical data 112 and no stringent requirements on runtime performance (because it allows for a fully automated process to transition from model development to deployment). FIG. 1C is one example of a system 103 that utilizes this type of commercial feature store solution. An application integration component 123 accesses information from a data warehouse 113. The component 123 might be associated with, for example, the FEAST® standalone, open-source feature store that can be used to store and serve features consistently for offline training and online inference. The component 123 may provide information to a feature data frame 163 that is used by a model trainer 173. A feature vector 143 from an online store 133 can be provided to a model 153 to generate a ML prediction.
[0020] As still another example, FIG. 1D is a system 100 that incorporates a feature-engineering-enabled ML solution for performance-critical edge deployment. Here, historical data 110 is provided to a local, cluster, and/or cloud computing environment 120. The environment 120 includes feature pre-calculation 122, a feature store 124, feature selection 126, and model building 128 that create a ML model object 130. A manual conversion 156 is then performed using the feature pre-calculation 122 and model object 130. After the manual conversion 156, an edge computing environment 170 may be used to deploy the model object 130 using runtime data 160. The edge computing environment 170 includes on-demand feature calculation 172 and model execution 174 to generate a ML model output 180. Such an approach may conduct feature-engineering and model building in the local/cluster/cloud environment 120 using the best tools available and perform feature pre-calculation to improve computational efficiency for large historical time-series data. A relatively small subset of features may then be selected from the pre-calculated ones and used to build models as appropriate. For deployment, the system 100 only needs to calculate those selected features in real-time to serve to the model object 130. Because of differences associated with the two environments 120, 170, the on-demand feature calculation module 172 needs to be manually converted 156 prior to deployment, which can be a time-consuming and error-prone task.
[0021] To avoid these problems, some embodiments described herein provide a system and method for creating and deploying a ML model. Specifically, a system and method may provide automated feature-engineering-based ML model building and deployment in an automatic, rapid, and accurate manner. This can help avoid manual code conversion during model deployment to an edge environment. Moreover, embodiments may use descriptive feature configurations to consolidate feature pre-calculation and on-demand calculation. According to some embodiments, a software library such as the TensorFlow free and open-source software library for ML and artificial intelligence may be used to generate models directly deployable in a C/C++ computing environment.
[0022] FIG. 2 is a high-level architecture of a ML model system 200 in accordance with some embodiments. The system 200 may include historical industrial time-series data 210 (e.g., storing a plurality of time-series measurements that represent normal or abnormal operation of a cyber-physical system). Industrial runtime data 260 might be associated with substantially real-time values (e.g., during online operation of the cyber-physical system) from each of a plurality of “monitoring nodes” (e.g., “MN1,” “MN2,” . . . , “MNN” for “1, 2, . . . , N” different monitoring nodes). As used herein, the phrase “monitoring node” might refer to, for example, sensor data, signals sent to actuators, motors, pumps, and auxiliary equipment, intermediary parameters that are neither direct sensor signals nor the signals sent to auxiliary equipment, control logical(s), etc. These may represent, for example, threat monitoring nodes that receive data from a threat monitoring system in a continuous fashion in the form of continuous signals or streams of data or combinations thereof. Moreover, the nodes may be used to monitor occurrences of cyber-threats or other abnormal events (e.g., sensor faults). This data path may be designated specifically with encryptions or other protection mechanisms so that the information may be secured and cannot be tampered with via cyber-attacks.
[0023] The historical industrial time-series data 210 may be provided to a ML model building platform 220 that uses feature-engineering 225 to create a ML model 230. Information from the ML model building platform 220 may be provided to a model deployment engine 250 that executes an automated feature-engineering compiler 255 to create a feature runtime configuration. As used herein, the term “automated” may refer to a process that requires little or no human intervention. A model runtime deployment platform 270 may then use the feature runtime configuration and the industrial runtime data 260 to execute on-demand feature calculations 275 and generate a ML model output 280 (e.g., indicating whether an industrial asset is currently under cyber-attack or experiencing a fault).
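As a rough sketch, the division of labor among the three components of FIG. 2 might look as follows. All function names, data structures, and the stand-in model below are illustrative assumptions for exposition; the patent does not prescribe this code.

```python
def ml_model_building_platform(feature_defs):
    """Platform 220 (stubbed): feature-engineering 225 selects features and
    trains a model; here the 'model' only records which inputs it needs."""
    selected = [d["name"] for d in feature_defs if d.get("selected")]
    return {"inputs": selected}  # stand-in for a trained ML model 230


def model_deployment_engine(model, feature_defs):
    """Engine 250: the compiler 255 keeps only the feature definitions the
    trained model actually requires (the feature runtime configuration)."""
    required = set(model["inputs"])
    return [d for d in feature_defs if d["name"] in required]


def model_runtime_deployment_platform(model, runtime_config, window):
    """Platform 270: on-demand feature calculation 275 feeds the model
    directly (no feature database) and produces the model output 280."""
    features = {d["name"]: d["func"](window) for d in runtime_config}
    inputs = [features[name] for name in model["inputs"]]
    return sum(inputs) / len(inputs)  # stand-in for model execution


# Hypothetical usage with two candidate features, only one selected:
defs = [
    {"name": "mean", "selected": True, "func": lambda w: sum(w) / len(w)},
    {"name": "peak", "selected": False, "func": max},
]
model = ml_model_building_platform(defs)
runtime_config = model_deployment_engine(model, defs)
output = model_runtime_deployment_platform(model, runtime_config, [1.0, 2.0, 3.0])
```

At runtime only the selected "mean" feature is computed; the unselected "peak" definition never reaches the deployment platform.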
[0024] As used herein, devices, including those associated with the system 200 and any other device described herein, may exchange information via any communication network which may be one or more of a Local Area Network (“LAN”), a Metropolitan Area Network (“MAN”), a Wide Area Network (“WAN”), a proprietary network, a Public Switched Telephone Network (“PSTN”), a Wireless Application Protocol (“WAP”) network, a Bluetooth network, a wireless LAN network, and/or an Internet Protocol (“IP”) network such as the Internet, an intranet, or an extranet. Note that any devices described herein may communicate via one or more such communication networks.
[0025] The various data sources may be locally stored or reside remote from the ML model building platform 220, the model deployment engine 250, and/or the model runtime deployment platform 270 (which might also be associated with, for example, offline or online learning). Although a single ML model building platform 220, model deployment engine 250, and model runtime deployment platform 270 are shown in FIG. 2, any number of such devices may be included. Moreover, various devices described herein might be combined according to embodiments of the present invention. For example, in some embodiments, the ML model building platform 220, the model deployment engine 250, and/or the model runtime deployment platform 270 and one or more data sources might comprise a single apparatus. The ML model building platform 220, model deployment engine 250, and/or
model runtime deployment platform 270 functions may be performed by a constellation of networked apparatuses in a distributed processing or cloud-based architecture.
[0026] A user may access the system 200 via one of the monitoring devices (e.g., a Personal Computer (“PC”), tablet, smartphone, or remotely through a remote gateway connection) to view information about and/or manage information in accordance with any of the embodiments described herein. In some cases, an interactive graphical display interface may let a user define and/or adjust certain parameters (e.g., time-series measurement properties or data about a cyber-physical system) and/or provide or receive automatically generated recommendations or results from the system 200 (as well as other devices).
[0027] For example, FIG. 3A illustrates a ML model method that might be performed by some or all of the elements of the system 200 described with respect to FIG. 2. The flow charts described herein do not imply a fixed order to the steps, and embodiments of the present invention may be practiced in any order that is practicable. Note that any of the methods described herein may be performed by hardware, software, or any combination of these approaches. For example, a computer-readable storage medium may store thereon instructions that when executed by a machine result in performance according to any of the embodiments described herein.
[0028] At S310, a ML model building platform may perform feature-engineering on historical data to create at least one ML model at S320. The ML model building platform may be associated with, for example, a local computing environment, a cluster computing environment, and/or a cloud computing environment. In some embodiments, the ML model building platform is further to perform feature pre-calculations on the historical industrial time-series data to create results that are stored in a feature database. Note that the ML model may be associated with a model object and model metadata.
[0029] At S330, a model deployment engine extracts required features associated with the at least one ML model. At S340, the model deployment engine may automatically create, via a feature configuration compiler, feature runtime configurations based on the required features associated with the at least one ML model. The feature configuration compiler may, for example, create the feature runtime configurations in accordance with feature computing configurations and model metadata.
[0030] In some embodiments (as illustrated by dashed lines in FIG. 3A), the feature configuration compiler generates at least one deployable object at S350 based on the generated feature runtime configurations and the at least one ML model. The deployable object may be associated with a TensorFlow or PyTorch software library for ML (e.g., the feature configuration compiler may utilize a TensorFlow or PyTorch computation graph). The deployable object may comprise, in some embodiments, a deployable feature calculation object. At S360, a model runtime deployment platform may perform on-demand feature calculations based on the feature runtime configurations, execute the at least one ML model based on industrial runtime data, and generate model output. The model runtime deployment platform may be associated with an edge computing environment. According to some embodiments, the edge computing environment is associated with read-only access to original data source, read-write access to intermediate storage, and/or read-write access to model artifact storage.
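A minimal sketch of the feature configuration compiler step (S330-S340) might filter the full feature pre-computing configuration down to only the features the trained model requires. The configuration layout and key names here are assumptions, not the patented format:

```python
def compile_feature_runtime_config(pre_computing_config, model_metadata):
    """Keep only those feature definitions that the trained model consumes
    (S330/S340); fail loudly if the model references an undefined feature."""
    required = set(model_metadata["input_features"])
    runtime_config = {name: spec
                      for name, spec in pre_computing_config.items()
                      if name in required}
    missing = required - runtime_config.keys()
    if missing:
        raise ValueError(f"model requires undefined features: {sorted(missing)}")
    return runtime_config


# Hypothetical usage: three pre-computed features, two selected by the model.
full = {"mean": {"window": 30}, "stdev": {"window": 30}, "kurtosis": {"window": 60}}
meta = {"input_features": ["mean", "stdev"]}
runtime = compile_feature_runtime_config(full, meta)  # "kurtosis" is dropped
```

The validation check mirrors the motivation in the text: because the runtime configuration is derived mechanically from model metadata, a feature the model needs can never be silently omitted, which is exactly the class of error manual re-coding invites.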
[0031] In some embodiments, the historical industrial time-series data is associated with an industrial asset, such as a turbine, a gas turbine, a wind turbine, an engine, a jet engine, a locomotive engine, a refinery, a power grid, an autonomous vehicle, etc. Moreover, the historical industrial time-series data might include information from a sensor monitoring node, an actuator monitoring node, a control monitoring node, etc. In some embodiments, the ML model uses a feature space decision boundary to detect a local or global abnormal operating condition that represents at least one of a cyber-attack and a fault (e.g., and generate an alert signal).
[0032] FIG. 3B is a ML model method according to another embodiment. As before, at S311 at least one ML model is created based on historical industrial time-series data. At S321, a model deployment engine may automatically create feature runtime configurations based on the required features associated with the at least one ML model. A set of modules (e.g., including the feature runtime configurations) may then be compiled at S331 into a single deployable object to be used at runtime. According to some embodiments, the deployable object is associated with a single computational graph and the multiple modules further include pre-processing configurations, model configurations, post processing configurations, etc.
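A plain-Python analogue of compiling the configured modules into one deployable object (S331) is function composition; a production version might instead trace these stages into a single TensorFlow or PyTorch computation graph. The stage functions below are hypothetical stand-ins:

```python
def compile_deployable(preprocess, feature_calc, model, postprocess):
    """Compose pre-processing, on-demand feature calculation, model
    execution, and post-processing into one runtime callable."""
    def deployable(sample):
        x = preprocess(sample)
        features = feature_calc(x)
        return postprocess(model(features))
    return deployable


# Hypothetical stages for a single time-series window:
deployable = compile_deployable(
    preprocess=lambda w: [v / 10.0 for v in w],        # normalize raw values
    feature_calc=lambda w: {"mean": sum(w) / len(w)},  # selected feature only
    model=lambda f: f["mean"] * 2.0,                   # stand-in model execution
    postprocess=lambda y: {"score": y, "alert": y > 1.0},
)
result = deployable([10.0, 20.0, 30.0])
```

Packaging all four stages behind one callable mirrors the single-computational-graph idea: the edge environment invokes one object per runtime sample rather than orchestrating four separately deployed modules.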
[0033] Thus, embodiments may provide a software solution to automate the transition from model building to deployment for ML applications that use feature-engineering on large historical industrial time-series data and deployment in a performance-critical edge environment. For example, FIG. 4 is a ML model auto-build framework 400 in accordance with some embodiments. In particular, the framework 400 generates feature runtime configurations and then interprets the configurations at runtime to perform on-demand feature calculations. The framework 400 includes historical data 410 that is provided to a local, cluster, and/or cloud computing environment 420. The environment 420 includes feature pre-calculation 422, feature data 424, feature selection 426, and model building 428 that create a set of at least one model object 432 and model metadata 434 (e.g., to track model inputs). The environment 420 further includes feature pre-computing configurations 442 and a feature configuration compiler 444. The feature configuration compiler 444 may create feature runtime configurations 446 that include only essential features. An edge computing environment 470 may be used to perform on-demand feature calculation using the feature runtime configurations 446 and runtime data 460. The edge computing environment 470 includes on-demand feature calculation 472 and model execution 474 to generate a ML model output 480 (a collection and presentation of results). At runtime, the edge computing environment 470 may simply execute a runtime feature execution module and a model object 432. Such an approach may simplify deployment effort and improve runtime execution performance.
[0034] Thus, embodiments may substantially speed up the transition from model development to model deployment for ML applications that require feature-engineering (especially those that involve large industrial time-series data) and have a substantial performance-critical deployment need. Moreover, embodiments may use the selected feature configurations to describe feature pre-calculation so that a simplified on-demand version of feature calculation can be automatically created for runtime use without manual coding. Feature pre-calculation may be optimized for batch processing to handle large amounts of historical data, while on-demand feature calculation may be optimized for speed and minimal redundancy (note that both pre-calculation and on-demand calculation might share the same core feature function code base). If feature pre-calculation is used during model development, it may be difficult to avoid producing a runtime feature calculation wrapper that is related to, but different from, feature pre-calculation. Using interpretable feature configuration files is a relatively simple and straightforward way to realize such a conversion without manual coding.
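A minimal sketch of sharing one core feature function code base between batch pre-calculation and on-demand calculation (all function names and data shapes here are illustrative assumptions):

```python
import statistics

# One core feature function code base shared by both modes
FEATURE_FNS = {"mean": statistics.mean, "max": max}

def precalculate(history, configs):
    """Batch mode: compute every configured feature over many samples."""
    return [{name: FEATURE_FNS[cfg["fn"]](sample[cfg["column"]])
             for name, cfg in configs.items()}
            for sample in history]

def on_demand(sample, runtime_configs):
    """Runtime mode: compute only the essential features for one sample."""
    return {name: FEATURE_FNS[cfg["fn"]](sample[cfg["column"]])
            for name, cfg in runtime_configs.items()}

configs = {"mean_temp": {"fn": "mean", "column": "temp"},
           "max_vib":   {"fn": "max",  "column": "vib"}}
runtime_configs = {"mean_temp": configs["mean_temp"]}  # after feature selection

batch = precalculate([{"temp": [1.0, 3.0], "vib": [2.0, 5.0]}], configs)
live = on_demand({"temp": [4.0, 6.0], "vib": [1.0]}, runtime_configs)
```

Both paths dispatch into the same `FEATURE_FNS` table, so no runtime wrapper has to be re-coded by hand.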
[0035] FIG. 5 is a system 500 incorporating feature-engineering-based ML model development and deployment that includes a feature calculation object according to some embodiments. That is, the system generates feature runtime configurations and then generates a feature runtime calculation object which is used to perform on-demand feature calculation. The system 500 includes historical data 510 that is provided to a local, cluster, and/or cloud computing environment 520. The environment 520 includes feature pre-calculation 522, feature data 524, feature selection 526, and model building 528 that create a set of ML model objects 532 and model metadata 534. The metadata 534 and feature pre-computing configurations 542 may be used by a feature configuration compiler 544 to automatically create feature runtime configurations 546 that define only essential features. The feature runtime configurations 546 are used to generate a deployable feature calculation object 548. An edge computing environment 570 may then use the object 548 and runtime data 560. The edge computing environment 570 includes an on-demand feature calculation 572 (which uses the feature calculation object 548) and model execution 574 to generate a ML model output 580. Embodiments may track model inputs (as metadata 534) together with trained model objects 532.
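The deployable feature calculation object 548 might be generated from the runtime configurations roughly as follows (a hedged sketch; the class name and configuration layout are assumptions, not from the specification):

```python
class FeatureCalculationObject:
    """Callable object generated from feature runtime configurations."""

    _FNS = {"mean": lambda xs: sum(xs) / len(xs), "max": max}

    def __init__(self, runtime_configs):
        self.configs = runtime_configs

    def __call__(self, sample):
        # Interpret each configuration entry to compute one feature value
        return {name: self._FNS[cfg["fn"]](sample[cfg["column"]])
                for name, cfg in self.configs.items()}

# Generate the object from feature runtime configurations (element 546)
calc = FeatureCalculationObject({"mean_temp": {"fn": "mean", "column": "temp"}})
features = calc({"temp": [10.0, 20.0]})   # on-demand calculation at the edge
```

The object carries its own configuration, so the edge environment needs no feature-specific code of its own.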
[0036] FIG. 6 is a system 600 with a deployable object associated with compilation of multiple configurations to a single computational graph in accordance with some embodiments. The system 600 generates feature runtime configurations and then compiles all modules (including a feature runtime module, models, etc.) into a single deployable object to be used at runtime. The system 600 includes historical data 610 that is provided to a local, cluster, and/or cloud computing environment 620. The environment 620 includes feature pre-calculation 622, feature data 624, feature selection 626, and model building 628 that create a set of ML models 630. The models 630 and feature pre-computing configurations 642 may be used by a feature configuration compiler 644 to automatically create feature runtime configurations 654 that define only essential features. The system 600 consolidates (compiles) independently-developed data processing steps into a single deployment package or deployable object 650 (e.g., compiled to a single computational graph). In particular, the data processing steps may include pre-processing configurations 652 (such as non-learning-based transforms, feature calculations, domain rules, etc.), the feature runtime configurations 654, model/model fusion configurations 656 (e.g., associated with a fusion of multiple ML models 630 or a simple collection of multiple ML models 630), and post-processing configurations 658 (e.g., associated with conformance logics). An edge computing environment 670 may use the deployed object 650 and runtime data 660. The edge computing environment 670 includes a model execution 674 that uses the deployable object 650 to generate output collection and presentation 680.
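The consolidation of independently-developed steps into one executable unit can be sketched in plain Python, using a fused callable as a stand-in for a single computational graph (every step body below is an illustrative assumption):

```python
from functools import reduce

def compile_pipeline(steps):
    """Fuse independently developed steps into one callable
    (a stand-in for compiling them to a single computational graph)."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)

pipeline = compile_pipeline([
    lambda raw: {k: v * 0.1 for k, v in raw.items()},  # pre-processing step
    lambda d: {"mean_temp": d["temp"]},                # feature runtime step
    lambda feats: feats["mean_temp"] > 5.0,            # model (threshold stand-in)
    lambda pred: {"alert": bool(pred)},                # post-processing step
])

result = pipeline({"temp": 80.0})
```

The edge environment then executes only `pipeline`, with no knowledge of how the individual steps were developed.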
[0037] FIG. 7 shows aspects 700 of an auto-build framework according to some embodiments. At S710, model development might be associated with Python and/or be deployable in a Python or C/C++ environment (through TensorFlow). At S720, modularized model creation may be used to develop a solution step-by-step (e.g., to assemble a full pipeline afterwards for runtime use). Moreover, the steps may be defined through JavaScript Object Notation (“JSON”)-based configuration files or Application Programming Interface (“API”) calls. At S730, industrial feature-engineering practice may pre-calculate features during model development (and use on-demand feature calculation during runtime) and select features by name during model creation (to auto-compile selected feature functions for runtime). At S740, data management may arrange for raw data, feature data, and prediction outputs to be managed by a standard data access interface. In addition, the data may be customized for DOE-based time-series data with metadata support. Some embodiments may provide for flexible model creation sample selection and implement a file-based solution (potentially adopting a third-party data store as an underlying technology). At S750, model management may manage models and evaluation results with a standard model access interface and implement a file-based solution (potentially adopting a third-party model store as an underlying technology). At S760, the framework may be extensible at various levels with defined APIs, specifications, and conventions.
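A JSON-based configuration describing modularized steps (S720) might look like the following sketch; every field name here is a hypothetical illustration, not the framework's actual schema:

```python
import json

# Hypothetical JSON configuration describing two modularized steps
pipeline_config = json.loads("""
[
  {"step": "feature_selection",
   "input": "feature_data",
   "select_by_name": ["mean_temp", "max_vib"],
   "output": "selected_features"},
  {"step": "model_building",
   "input": "selected_features",
   "model_type": "classifier",
   "output": "model_object"}
]
""")

# Each step names its input and output, so a full runtime pipeline
# can be assembled afterwards by chaining outputs to inputs.
step_names = [s["step"] for s in pipeline_config]
```

Because each step declares its input and output, an assembler can chain the steps into a full pipeline without code changes.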
[0038] The embodiments described herein may be implemented using any number of different hardware configurations. For example, FIG. 8 is a block diagram of a platform 800 that may be, for example, associated with the system 200 of FIG. 2. The platform 800 comprises a processor 810, such as one or more commercially available Central Processing Units (“CPUs”) in the form of one-chip microprocessors, coupled to a communication device 820 configured to communicate via a communication network (not shown in FIG. 8). The communication device 820 may be used to communicate, for example, with one or more remote monitoring nodes, user platforms, etc. The platform 800 further includes an input device 840 (e.g., a computer mouse and/or keyboard to input model or sensor configuration data, etc.) and an output device 850 (e.g., a computer monitor to render a display, provide alerts, transmit recommendations, and/or create reports). According to some embodiments, a mobile device, monitored physical system, and/or PC may be used to exchange information with the platform 800.
[0039] The processor 810 also communicates with a storage device 830. The storage device 830 may comprise any appropriate information storage device, including combinations of magnetic storage devices (e.g., a hard disk drive), optical storage devices, mobile telephones, and/or semiconductor memory devices. The storage device 830 stores a program 812 and/or ML model building and deployment 814 for controlling the processor 810. The processor 810 performs instructions of the programs 812, 814, and thereby operates in accordance with any of the embodiments described herein. For example, the processor 810 may perform feature-engineering based on historical industrial time-series data. The processor 810 may create a ML model based on selected features and output information about the ML model. The processor 810 may also extract required features associated with the ML model and automatically create, via a feature configuration compiler, feature runtime configurations based on the required features associated with the ML model. The processor 810 may then generate at least one deployable object based on the generated feature runtime configurations and the ML model. In some embodiments, the processor 810 may then perform on-demand feature calculations based on the feature runtime configurations, execute the ML model based on industrial runtime data, and generate model output.
[0040] The programs 812, 814 may be stored in a compressed, uncompiled and/or encrypted format. The programs 812, 814 may furthermore include other program elements, such as an operating system, clipboard application, a database management system, and/or device drivers used by the processor 810 to interface with peripheral devices.
[0041] As used herein, information may be “received” by or “transmitted” to, for example: (i) the platform 800 from another device; or (ii) a software application or module within the platform 800 from another software application, module, or any other source.
[0042] In some embodiments (such as the one shown in FIG. 8), the storage device 830 further stores feature configurations 860 (e.g., describing selected features for an industrial asset), model/model fusion configurations 870, and/or a ML model database 900. An example of a database that may be used in connection with the platform 800 will now be described in detail with respect to FIG. 9. Note that the database described herein is only one example, and additional and/or different information may be stored therein. Moreover, various databases might be split or combined in accordance with any of the embodiments described herein.
[0043] Referring to FIG. 9, a table is shown that represents the ML model database 900 that may be stored at the platform 800 according to some embodiments. The table may include, for example, entries identifying monitoring nodes (sensor nodes and other types of nodes) associated with a cyber-physical system. The table may also define fields 902, 904, 906, 908, 910 for each of the entries. The fields 902, 904, 906, 908, 910 may, according to some embodiments, specify: a ML model identifier 902, feature configurations 904, post processing configurations 906, a deployment date and time 908, and a status 910. The ML model database 900 may be created and updated, for example, when a new physical system is monitored or modeled and/or on-line operation values are received from monitoring nodes.
[0044] The ML model identifier 902 may be, for example, a unique alphanumeric code identifying a ML model that has been (or is being) built and/or deployed by the system. The feature configurations 904 and post processing configurations 906 may facilitate an automatic creation and deployment of the ML model. The deployment date and time 908 may indicate when the ML model was deployed (e.g., to an edge computing environment). The status 910 might indicate that the ML model has been deployed, has been replaced (e.g., with a newer model), is currently being built by the system, etc.
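The fields 902 through 910 could be represented in an ordinary relational table, sketched here with SQLite (column names and the sample row are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ml_model (
        model_id                TEXT PRIMARY KEY,  -- ML model identifier 902
        feature_configs         TEXT,              -- feature configurations 904
        post_processing_configs TEXT,              -- post processing configurations 906
        deployed_at             TEXT,              -- deployment date and time 908
        status                  TEXT               -- status 910
    )
""")
conn.execute(
    "INSERT INTO ml_model VALUES (?, ?, ?, ?, ?)",
    ("M_1001", '{"features": ["mean_temp"]}', "{}",
     "2023-10-13T12:00:00", "DEPLOYED"),
)
(status,) = conn.execute(
    "SELECT status FROM ml_model WHERE model_id = 'M_1001'"
).fetchone()
```

Storing the configurations as JSON text keeps the table schema stable as feature and post-processing formats evolve.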
[0045] Thus, embodiments may provide automated feature-engineering-based ML model building and deployment. The ML model building and deployment process (software) might consist of two libraries (a feature library and a model library) and four execution components (feature pre-computing, feature selection, model building, and runtime executor). Embodiments may use configuration files to define component internal settings, library referencing, and/or inter-component referencing. According to some embodiments, feature configuration metadata is managed together with pre-computed feature data, and model configuration metadata is managed together with trained model objects. Embodiments may analyze feature and model configuration data to compile a simplified feature configuration that includes only those feature definitions selected for runtime use. Some embodiments create a runtime model object with on-demand feature calculation using the simplified feature configuration without re-coding for runtime feature calculation.
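The two-library/four-component structure might be wired together through configurations roughly like this (a sketch; every name and the configuration layout are illustrative assumptions):

```python
# Two libraries: a feature library and a model library
FEATURE_LIBRARY = {"mean": lambda xs: sum(xs) / len(xs)}
MODEL_LIBRARY = {"threshold": lambda value, limit: value > limit}

# Configuration defining component settings and inter-component referencing
config = {
    "feature_pre_computing": {"mean_temp": {"fn": "mean", "column": "temp"}},
    "feature_selection": ["mean_temp"],
    "model_building": {"type": "threshold", "input": "mean_temp", "limit": 50.0},
}

def runtime_executor(sample, config):
    """Fourth component: compute only selected features, then run the model."""
    feats = {name: FEATURE_LIBRARY[cfg["fn"]](sample[cfg["column"]])
             for name, cfg in config["feature_pre_computing"].items()
             if name in config["feature_selection"]}
    model_cfg = config["model_building"]
    model = MODEL_LIBRARY[model_cfg["type"]]
    return model(feats[model_cfg["input"]], model_cfg["limit"])

alert = runtime_executor({"temp": [60.0, 70.0]}, config)
```

Because every component resolves its functions by name from the libraries, swapping a feature or model only touches the configuration, not the executor.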
[0046] The following illustrates various additional embodiments of the invention. These do not constitute a definition of all possible embodiments, and those skilled in the art will understand that the present invention is applicable to many other embodiments. Further, although the following embodiments are briefly described for clarity, those skilled in the art will understand how to make any changes, if necessary, to the above-described apparatus and methods to accommodate these and other embodiments and applications.
[0047] Although specific hardware and data configurations have been described herein, note that any number of other configurations may be provided in accordance with embodiments of the present invention (e.g., some of the information associated with the databases described herein may be combined or stored in external systems).
[0048] Embodiments may be associated with various types of ML models. For example, industrial asset control systems that operate physical systems (e.g., associated with power turbines, jet engines, locomotives, autonomous vehicles, etc.) are increasingly connected to the Internet. As a result, these control systems may be vulnerable to threats, such as cyber-attacks (e.g., associated with a computer virus, malicious software, etc.), that could disrupt electric power generation and distribution, damage engines, inflict vehicle malfunctions, etc. Current methods primarily consider threat detection in Information Technology (“IT,” such as computers that store, retrieve, transmit, and manipulate data) and Operation Technology (“OT,” such as direct monitoring devices and communication bus interfaces). Cyber-threats can still penetrate through these protection layers and reach the physical “domain,” as seen in 2010 with the Stuxnet attack. Such attacks can diminish the performance of an industrial asset and may cause a total shutdown or even catastrophic damage to a plant. Currently, Fault Detection Isolation and Accommodation (“FDIA”) approaches only analyze sensor data, but a threat might occur in other types of threat monitoring nodes such as actuators, control logic(s), etc. Also note that FDIA is limited only to naturally occurring faults in one sensor at a time. FDIA systems do not address multiple simultaneously occurring faults, as these are normally due to malicious intent. Note that quickly detecting an attack may be important when responding to threats in an industrial asset (e.g., to reduce damage, to prevent the attack from spreading to other assets, etc.). Making such a detection quickly (e.g., at substantially sample speed) can be aided by ML models, and cyber-physical systems often have an overwhelmingly large number of physical measurements (which makes attack detection directly based on the physical measurements challenging). For example, FIG. 10 illustrates an interactive Graphical User Interface (“GUI”) display 1000 that provides a current status analysis 1010 for an industrial asset such as a power grid 1020. The analysis 1010 might be based on cyber-physical system information (e.g., including a feature vector 1030 and decision boundaries). User selection of an “Edit” icon 1040 might let an operator or administrator update or adjust the system.
[0049] The present invention has been described in terms of several embodiments solely for the purpose of illustration. Persons skilled in the art will recognize from this description that the invention is not limited to the embodiments described but may be practiced with modifications and alterations limited only by the spirit and scope of the appended claims.
Claims
1. A system associated with Machine Learning (“ML”) models, comprising: a ML model building platform to: perform feature-engineering on historical industrial time-series data, create at least one ML model based on selected features, and output information about the at least one ML model; and a model deployment engine to: extract required features associated with the at least one ML model, and automatically create, via a feature configuration compiler, feature runtime configurations based on the required features associated with the at least one ML model.
2. The system of claim 1, wherein the model deployment engine is further to: generate at least one deployable object based on the generated feature runtime configurations and the at least one ML model.
3. The system of claim 2, wherein the at least one deployable object is associated with a TensorFlow or PyTorch software library for ML.
4. The system of claim 3, wherein the at least one deployable object comprises a deployable feature calculation object.
5. The system of claim 1, wherein the ML model building platform is associated with at least one of: (i) a local computing environment, (ii) a cluster computing environment, and (iii) a cloud computing environment.
6. The system of claim 1, further comprising: a model runtime deployment platform to perform on-demand feature calculations in accordance with the feature runtime configurations, execute the at least one ML model based on industrial runtime data, and generate model output.
7. The system of claim 6, wherein the model runtime deployment platform is associated with an edge computing environment.
8. The system of claim 7, wherein the edge computing environment is associated with at least one of: (i) read-only access to an original data source, (ii) read-write access to intermediate storage, and (iii) read-write access to model artifact storage.
9. The system of claim 1, wherein the ML model building platform is further to perform feature pre-calculations on the historical industrial time-series data to create results that are stored in a feature database.
10. The system of claim 1, wherein a ML model is associated with a model object and model metadata.
11. The system of claim 10, wherein the feature configuration compiler creates the feature runtime configurations in accordance with feature computing configurations and the model metadata.
12. The system of claim 1, wherein the feature-engineering is associated with at least one of feature selection and feature extraction.
13. The system of claim 1, wherein the historical industrial time-series data is associated with an industrial asset comprising at least one of: (i) a turbine, (ii) a gas turbine,
(iii) a wind turbine, (iv) an engine, (v) a jet engine, (vi) a locomotive engine, (vii) a refinery, (viii) a power grid, and (ix) an autonomous vehicle.
14. The system of claim 13, wherein the historical industrial time-series data includes information from at least one of: (i) a sensor monitoring node, (ii) an actuator monitoring node, and (iii) a control monitoring node.
15. The system of claim 14, wherein the at least one ML model uses a feature space decision boundary to detect an abnormal operating condition that represents at least one of a cyber-attack and a fault.
16. A system associated with Machine Learning (“ML”) models, comprising: a ML model framework to: create at least one ML model based on historical industrial time-series data; automatically create, by a model deployment engine, feature runtime configurations based on required features associated with the at least one ML model; and compile a set of multiple modules, including the feature runtime configurations, into a deployable object to be used at runtime.
17. The system of claim 16, wherein the set of multiple modules further includes at least one of: (i) pre-processing configurations, (ii) non-learning-based transforms, (iii) feature calculations, (iv) domain rules, (v) ML model fusion configurations, (vi) post processing configurations, and (vii) conformance logic.
18. A computerized method associated with Machine Learning (“ML”) models, comprising: performing, by a ML model building platform, feature-engineering based on historical industrial time-series data; creating at least one ML model based on selected features; extracting required features associated with the at least one ML model at a model deployment engine; automatically creating, by the model deployment engine via a feature configuration compiler, feature runtime configurations based on the required features associated with the at least one ML model; and deploying the at least one ML model.
19. The method of claim 18, further comprising: generating at least one deployable object based on the generated feature runtime configurations and the at least one ML model.
20. A non-transitory, computer-readable medium storing instructions that, when executed by a computer processor, cause the computer processor to perform a method associated with Machine Learning (“ML”) models, the method comprising: performing, by a ML model building platform, feature-engineering based on historical industrial time-series data; creating at least one ML model based on selected features; extracting required features associated with the at least one ML model at a model deployment engine; automatically creating, by the model deployment engine via a feature configuration compiler, feature runtime configurations based on the required features associated with the at least one ML model; and deploying the at least one ML model.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2023/076781 WO2025080271A2 (en) | 2023-10-13 | 2023-10-13 | Automated feature-engineering-based machine learning model building and deployment |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2025080271A2 true WO2025080271A2 (en) | 2025-04-17 |
| WO2025080271A3 WO2025080271A3 (en) | 2025-09-04 |
Family
ID=95395197
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23955605; Country of ref document: EP; Kind code of ref document: A2 |