
WO2025223683A1 - Systems and methods for generating natural language descriptions of time series data - Google Patents

Systems and methods for generating natural language descriptions of time series data

Info

Publication number
WO2025223683A1
WO2025223683A1 PCT/EP2024/069575 EP2024069575W WO2025223683A1 WO 2025223683 A1 WO2025223683 A1 WO 2025223683A1 EP 2024069575 W EP2024069575 W EP 2024069575W WO 2025223683 A1 WO2025223683 A1 WO 2025223683A1
Authority
WO
WIPO (PCT)
Prior art keywords
time series
series data
deviation
processors
variables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/EP2024/069575
Other languages
French (fr)
Inventor
Aravind D
Naveen Kumar
Bhanu venkata sai phani CHATTI
Abhinav NIRMAL
Shantanu SABOO
Leny Thangiah
Umesh Uppili
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens AG
Siemens Corp
Original Assignee
Siemens AG
Siemens Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens AG, Siemens Corp
Publication of WO2025223683A1
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/20Administration of product repair or maintenance
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00Testing or monitoring of control systems or parts thereof
    • G05B23/02Electric testing or monitoring
    • G05B23/0205Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0218Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
    • G05B23/0224Process history based detection method, e.g. whereby history implies the availability of large amounts of data
    • G05B23/024Quantitative history assessment, e.g. mathematical relationships between available data; Functions therefor; Principal component analysis [PCA]; Partial least square [PLS]; Statistical classifiers, e.g. Bayesian networks, linear regression or correlation analysis; Neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition

Definitions

  • the present invention relates to the field of data analytics systems and more particularly relates to systems and methods for generating natural language descriptions of time series data using an industrial intelligence module having a large language model (LLM) and a knowledge base.
  • the assets may include mechanical systems, electromechanical systems, electronic systems, and other systems. These assets may have different attributes such as availability, performance, behaviour, efficiency, maintainability, reliability, serviceability, and other attributes. These attributes of each of the assets may affect the overall performance of the assets in the industrial environment. In order to monitor the overall performance of the assets, a plant operator at any point in time must monitor multiple sensors that are providing time series data critical to the process.
  • the operator is required to interpret the time series data based on their domain understanding to identify and comprehend trends that could lead to potential problems in the asset within the industrial environment.
  • the domain understanding of the trends in the time series data that could lead to potential anomalies is gained through experience and is not normally documented.
  • the operator with limited experience in these systems might not be able to identify these trends and even experienced operators might miss some of these trends in the early stages as they are focused on monitoring several variables.
  • This problem arises as there are several limitations in comprehending the time series data.
  • the time series data is complex and difficult to interpret.
  • the time series data is continuously generated and therefore the volume of the time series data is huge. Further, as the systems include multiple sensors, the dimensions of the time series data grow exponentially. Furthermore, as the time series data is captured in the form of unstructured data, it becomes even more difficult to comprehend.
  • Some conventional prediction systems use AI models which require an expert operator to interpret the time series data. However, these conventional prediction systems do not translate results into a form that eases decision-making for the operator. Such conventional prediction systems use domain knowledge which is in textual form, and associating the domain knowledge with the time series data is not feasible because of the multimodality of the data. Another limitation of these conventional prediction systems is the large amount of time the expert operator needs, working manually or via an AI system, to identify the anomaly and/or derive insights/KPIs for assets in the plant and then apply the domain knowledge to feed in the information. Further, few expert operators are available to turn inferences from the time series data into knowledge.
  • Some conventional prediction systems identify a change from a particular operating range of the equipment and flag it as an anomaly using alarms, without considering the underlying context or collating the results in a textual form that would provide better inferences.
  • These conventional prediction systems mostly provide insights based on the time series data and algorithms that operate on this time series data to identify the problem.
  • these conventional prediction systems lack key insights that could be derived from the time series data for anomaly detection as well as resolution.
  • assets such as a power plant, wind farm, power grid, manufacturing facility, process plants, buildings (residential or non-residential areas) and so on.
  • Examples of an industrial environment or technical installation may include a complex industrial set-up such as a manufacturing facility, process plants, storage facility, transportation.
  • the industrial environment may refer to any vertical and/or domain in business.
  • different verticals treated as an industrial environment for the purpose of this disclosure may include, but are not limited to, automobiles, textiles, energy distribution, energy production, buildings, factories, and medical equipment.
  • the term “industrial intelligence module” refers to a component of one or more processors configured to process, analyse, and interpret data in the technical installation.
  • the industrial intelligence module may alternatively be referred to as the industrial intelligence layer (IIL), within the scope of the present invention.
  • the meaning of the term “industrial intelligence module” is consistent with the system disclosed in priority application 202441032988.
  • the industrial intelligence module may utilize advanced artificial intelligence techniques and tools to generate actionable insights and recommendations based on a time series data received.
  • the industrial intelligence module provides a comprehensive solution for monitoring, analysing, and improving industrial processes.
  • the industrial intelligence module generates understandable and actionable insights, ultimately helping to optimize the performance and reliability of the technical installation.
  • the term “sensors” refers to devices or instruments used to measure various physical, chemical, or environmental parameters within the technical installation.
  • the sensors collect data over time, which forms the basis of the time series data used for analysis.
  • the “sensors” comprise position sensors, rotary encoders, dynamometers, proximity sensors, current sensors, accelerometers, temperature sensors, acoustic sensors, and voltage sensors associated with the assets in the technical installation that provide data related to the assets.
  • assets may refer to any device, system, instrument or machinery manufactured or used in an industry that may be employed for performing an operation.
  • assets may also include any devices or instruments deployed or functioning in a non-industrial environment such as buildings.
  • Examples of assets include any machinery in a technical system or technical installation/facility such as motors, gears, bearings, shafts, switchgears, rotors, circuit breakers, protection devices, remote terminal units, transformers, reactors, disconnectors, gear-drive, gradient coils, magnet, radio frequency coils, appliances, electronic devices, chillers, pumps, heat exchangers, cooling towers, air compressors, boilers, fluid bed driers, coating machines, carbonation towers etc.
  • time series data refers to a sequence of data points collected or recorded at successive points in time, usually at uniform intervals.
  • the time series data may be used to monitor and analyse the behaviour of various variables within the technical installation over time.
  • the time series data may be fundamental for analysing trends, detecting anomalies, and making predictions in a technical installation. By collecting and examining this data over time, the system can identify deviations from expected behaviour, determine relationships between variables, classify the severity of issues, and generate natural language descriptions and recommendations for addressing these deviations.
  • the term “one or more variables” refers to different measurable quantities or attributes within the technical installation that are being monitored by sensors.
  • the one or more variables represent the specific aspects of the system’s operation that can change over time and provide insight into the system’s performance, condition, and any potential issues.
  • the one or more variables may be temperature, pressure, flow rate, or humidity within the technical installation.
  • the one or more variables may be crucial for understanding the operational state of the technical installation.
  • Monitoring the one or more variables in real time via sensors may enable the system to collect time series data, detect deviations, determine relationships between variables, classify the severity of deviations, and generate natural language descriptions and recommendations to address any issues.
  • the comprehensive monitoring of the one or more variables may ensure the efficient, safe, and reliable operation of the technical installation.
  • the term “deviation” refers to any anomaly or unexpected change in the measured variables that suggests a potential issue or abnormal condition within the technical installation. Detecting these deviations is critical for maintaining the safety, efficiency, and reliability of the system. In an example, detection of the deviations may be used to trigger further analysis, determine relationships, classify the severity of the deviations, and generate natural language descriptions and recommendations for addressing them. The deviation may be classified into major deviation and minor deviation.
  • the term “knowledge base” as used herein refers to a heterogeneous database comprising information pertaining to the domain and the industrial environment.
  • the knowledge base is a centralized repository or database that contains domain-specific information, asset-specific information, process-specific information, sensor-specific information, etc.
  • the knowledge base also contains information relevant to the specific industry, sector, or domain in which the organization operates. This may include technical specifications, manufacturing processes, equipment manuals, safety procedures, regulatory requirements, and industry standards.
  • the knowledge base also comprises information retrieved from one or more AI models deployed in the industrial environment.
  • the knowledge base also comprises information extracted from documentation of performance parameters, efficiency parameters, anomalies, root cause analysis, prediction, and resolution of anomalies, etc.
  • the knowledge base also comprises information extracted from knowledge graphs comprising information of the plant and its assets in a hierarchical manner.
  • the knowledge base also comprises images, videos, audios etc. having information of the industrial environment.
  • the term “natural language description” refers to a textual output from the industrial intelligence module post analysing and processing the time series data into human-readable text.
  • the natural language description provides the technical analysis and findings of the industrial intelligence module articulated in a clear, concise, and understandable manner for users who may not be familiar with the technical details.
  • the natural language description might be something like: "The temperature sensor has recorded a significant deviation from the baseline, indicating a potential overheating issue. It is recommended to inspect the cooling system and ensure proper ventilation."
  • the term “user” as used herein refers to a human interacting with the system for responses.
  • the user may also refer to a virtual assistant or co-pilot capable of querying the system with natural language query.
  • the term “one or more AI models” refers to a plurality of neural networks.
  • Examples of neural networks include, but are not limited to, convolutional neural network (CNN), deep neural network (DNN), recurrent neural network (RNN), and Restricted Boltzmann Machine (RBM).
  • the learning technique for training each AI model uses a plurality of learning data to cause, allow, or control the system 100 to make a determination or analysis.
  • Examples of learning techniques include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning. At least one of a plurality of CNN, DNN, RNN, RBM models and the like may be implemented to thereby achieve execution of the present subject matter’s mechanism through the AI models.
  • LLM Large Language Model
  • Transformers employ self-attention mechanisms to process input sequences and capture long-range dependencies, enabling the model to generate coherent and contextually relevant text.
  • LLMs undergo two main stages of training: pre-training and fine-tuning.
  • During pre-training, the model is trained on a large corpus of text data using unsupervised learning techniques to learn general language patterns and semantics.
  • During fine-tuning, the pre-trained model is further optimized on domain-specific or task-specific datasets to adapt its knowledge and capabilities to specific applications.
  • LLMs have the ability to generate human-like text based on given prompts or inputs. They can produce coherent paragraphs, articles, stories, code, or responses to questions by predicting the next words or tokens in the sequence based on the context provided.
  • LLMs exhibit a strong understanding of context and semantics in natural language. They can infer meaning, resolve ambiguity, and generate text that is contextually relevant and coherent with the given input.
  • the object of the present invention is achieved by a computer-implemented method for generating natural language descriptions of time series data in a system.
  • the method comprises receiving, by one or more processors, the time series data for one or more variables measured by one or more sensors in the technical installation.
  • the method further comprises periodically splitting time series data into pre-defined time segments based on a predefined time interval.
  • By dividing the time series data into smaller, manageable segments, it becomes easier to analyse and interpret the data.
  • patterns, trends, and anomalies that may not be evident in the full dataset (time series data) may be more readily identified in smaller time intervals.
  • advantageously, processing the split time series data may reduce the computational load on the system, leading to faster processing and more efficient use of resources, which is especially important when dealing with large datasets.
  • the method comprises detecting, by the one or more processors, a deviation in the time series data based on a comparison with a predefined baseline for each variable of the one or more variables in the time series data using a deviation detection model.
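The segmentation step described above can be illustrated with a minimal Python sketch. The function name `split_into_segments` and the representation of a sample as a `(timestamp, value)` pair are assumptions made for illustration; the source does not specify how segments are stored or how the predefined time interval is chosen.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Assumed representation: one sample is a (timestamp, measured value) pair.
Sample = Tuple[datetime, float]


def split_into_segments(samples: List[Sample], interval: timedelta) -> List[List[Sample]]:
    """Group samples into consecutive, non-overlapping windows of length `interval`.

    Windows are aligned to the earliest timestamp; empty windows are skipped.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if not samples:
        return []
    start = min(ts for ts, _ in samples)
    buckets: Dict[int, List[Sample]] = {}
    for ts, value in samples:
        # Integer index of the window this sample falls into.
        index = (ts - start) // interval
        buckets.setdefault(index, []).append((ts, value))
    return [buckets[i] for i in sorted(buckets)]
```

Each returned segment can then be analysed and described independently, which is what keeps the per-segment work small.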
  • the method further comprises determining the predefined baseline for each variable of the one or more variables based on domain expertise knowledge available in the knowledge base.
  • the predefined baseline corresponds to realistic and expected values derived from domain expertise incorporating contextual knowledge about the processes and variables.
  • contextual knowledge ensures that the predefined baseline may account for normal operational variations and conditions, reducing false positives in deviation detection. Therefore, a predefined baseline based on domain expertise knowledge may distinguish between normal operational fluctuations and true anomalies.
  • such a distinction may improve the sensitivity of the deviation detection model in identifying significant deviations that may indicate underlying issues or opportunities for improvement.
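As a rough illustration of comparing each variable against its predefined baseline, the sketch below treats a baseline as a simple lower and upper bound per variable. This band representation, and the names `Baseline`, `Deviation`, and `detect_deviations`, are assumptions; the source leaves the form of the baseline and of the deviation detection model open.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Baseline:
    """Assumed baseline form: an expected operating band for one variable."""
    low: float
    high: float


@dataclass(frozen=True)
class Deviation:
    variable: str
    index: int      # position of the sample within the segment
    value: float
    excess: float   # signed distance outside the band (positive = above)


def detect_deviations(segment: Dict[str, List[float]],
                      baselines: Dict[str, Baseline]) -> List[Deviation]:
    """Return every sample that falls outside its variable's baseline band."""
    found: List[Deviation] = []
    for variable, values in segment.items():
        band = baselines.get(variable)
        if band is None:
            continue  # no baseline known for this variable, nothing to compare
        for i, value in enumerate(values):
            if value > band.high:
                found.append(Deviation(variable, i, value, value - band.high))
            elif value < band.low:
                found.append(Deviation(variable, i, value, value - band.low))
    return found
```

A band derived from domain knowledge, rather than from the data alone, is what lets normal operational fluctuation pass without being flagged.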
  • the method comprises determining, by the one or more processors, relationships between variables of the detected deviations.
  • the method further includes identifying the dependencies and causalities between the variables and determining relationships between variables based on the identification of dependencies. Further, the method comprises classifying, by the one or more processors, the detected deviations as one of a major deviation, a minor deviation, or a normal deviation using statistical techniques and an AI model trained on specific domain knowledge. The classification may be based on the determined relationships and a comparison of the detected deviation with the baseline.
  • the statistical techniques provide a robust foundation for detecting and classifying deviations by analysing historical data patterns and relationships, thus ensuring baseline accuracy for detecting deviations.
  • the method includes leveraging domain-specific knowledge to refine the classification process. Thus, the classification may be easily adaptable to various domains and applications, enhancing the method’s scalability and flexibility across different industrial scenarios.
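The relationship and classification steps could be sketched as follows. Pearson correlation is swapped in here as a simple statistical stand-in for the dependency analysis; it captures linear association only and does not establish causality. The thresholds (`0.8`, `0.05`, `0.20`) and the escalation rule are hypothetical values chosen for illustration, not values from the source, and the trained AI model mentioned above is not represented.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def related_variables(series: Dict[str, Sequence[float]], target: str,
                      threshold: float = 0.8) -> List[str]:
    """Variables whose absolute correlation with `target` meets the threshold."""
    return [name for name, values in series.items()
            if name != target and abs(pearson(series[target], values)) >= threshold]


def classify(value: float, low: float, high: float, related_deviating: int,
             minor_ratio: float = 0.05, major_ratio: float = 0.20) -> str:
    """Classify a reading as 'normal', 'minor', or 'major'.

    The excess outside the band [low, high] is scaled by the band width.
    A minor excess is escalated to major when related variables also deviate.
    """
    excess = max(value - high, low - value, 0.0)
    ratio = excess / (high - low)
    if ratio < minor_ratio:
        return "normal"
    if ratio >= major_ratio or related_deviating > 0:
        return "major"
    return "minor"
```

Escalating on related deviations is one way to let the determined relationships influence severity, as the text requires; other weighting schemes would fit equally well.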
  • the method comprises extracting, by the one or more processors, using the industrial intelligence module comprising a large language model and a knowledge base, one or more recommendations for addressing the classified deviation, based on the determined relationships.
  • the one or more recommendations may be tailored to the specific context and domain, leveraging the knowledge base and large language model, thus ensuring that the suggested actions are relevant and effective in addressing the deviations identified.
  • the method further comprises extracting a set of templates associated with the detected deviation from the knowledge base, wherein the set of templates comprises statistically relevant information capable of describing the behaviour of the time series data.
  • the method comprises generating, by the one or more processors using the industrial intelligence module, natural language descriptions of the time series data based on the extracted one or more recommendations.
  • the method comprises generating the natural language descriptions of the time series data based on a predefined template, and the generated descriptions include a summary for the major deviation and the minor deviation.
  • the predefined template ensures that the generated natural language descriptions follow a consistent standard format. This standardization makes it easier for users to understand and compare reports across different time periods and datasets.
  • the summaries corresponding to both the major deviation and the minor deviation may ensure that all significant aspects of the time series data are covered, consequently providing a clear and detailed overview of the deviations and aiding thorough analysis and decision-making.
  • summarizing major and minor deviations separately may assist in prioritizing issues. For instance, major deviations may be highlighted for immediate attention, while minor deviations may be monitored and addressed as needed, ensuring that critical problems may not be overlooked.
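A template-driven description with separate major and minor summaries might look like the sketch below. The template wording, the `TEMPLATES` dictionary, and the field names are hypothetical; in the described method the templates are extracted from the knowledge base and the text is produced by the industrial intelligence module rather than by plain string formatting.

```python
from typing import Dict, List

# Hypothetical templates standing in for those stored in the knowledge base.
TEMPLATES: Dict[str, str] = {
    "major": ("MAJOR: {variable} reached {value:.1f} {unit}, outside the baseline "
              "{low:.1f}-{high:.1f} {unit}. Recommended action: {recommendation}"),
    "minor": ("Minor: {variable} drifted to {value:.1f} {unit} (baseline "
              "{low:.1f}-{high:.1f} {unit}). Recommended action: {recommendation}"),
}


def describe_segment(segment_label: str, deviations: List[Dict[str, object]]) -> str:
    """Render one summary per severity, major deviations first."""
    lines = [f"Summary for {segment_label}:"]
    for severity in ("major", "minor"):
        matching = [d for d in deviations if d["severity"] == severity]
        if not matching:
            lines.append(f"No {severity} deviations.")
            continue
        for deviation in matching:
            # Unused keys such as "severity" are ignored by str.format.
            lines.append(TEMPLATES[severity].format(**deviation))
    return "\n".join(lines)
```

Listing major deviations before minor ones reflects the prioritisation described above.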
  • the natural language descriptions are generated for each time segment of the pre-defined time segments, and the generated description is displayed on a display screen to provide an alert.
  • generating the natural language descriptions for each time segment and displaying them as alerts may enable real-time monitoring, for instance by allowing operators to quickly identify and respond to deviations or anomalies as they occur, thus enhancing the responsiveness and timeliness of interventions.
  • the method comprises providing the summary for the major deviation and the minor deviation as a textual description of determined anomalies in the time series data.
  • the textual descriptions provide a clear and concise explanation of the deviations, for example, anomalies detected in the time series data. Consequently, the textual descriptions help users quickly understand the nature and significance of the deviations without needing to interpret complex data visualizations or raw data.
  • the textual descriptions are accessible to a broader audience, including those who may not have technical expertise in data analysis. Thus, the broader accessibility may ensure that all relevant stakeholders may understand and act on the information (deviation summary).
  • describing anomalies in the textual description may allow for the inclusion of contextual information which may consequently assist in diagnosing issues and planning effective responses.
  • the textual descriptions may be used as training materials for new operators and as the knowledge base for future reference.
  • the method comprises analysing, by one or more processors, a status of the system at a particular point in time to determine anomalies in the time series data.
  • analysing the status at specific points in time may allow for the immediate detection of anomalies, enabling quick responses to potential issues before any escalation.
  • the method comprises implementing, by the one or more processors, a sequence of troubleshooting queries to identify a root cause for the determined anomalies using the industrial intelligence module.
  • the sequence of queries may streamline the root cause analysis, thus, reducing the time and effort required to pinpoint underlying problems.
  • the method includes identifying, by the one or more processors, one or more insights, corrective actions, or preventive actions based on the identified root cause using the industrial intelligence module.
  • the industrial intelligence module may integrate domain-specific knowledge and advanced analytical models to enhance the accuracy and relevance of the root cause analysis and subsequent recommendations. Consequently, the industrial intelligence module may consider a wide range of factors and data points, providing a more holistic view of the system and its anomalies.
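One simple way to picture a sequence of troubleshooting queries is an ordered list of checks evaluated against the system status at a point in time, as sketched below. The `Query` structure, the status keys, and the example checks are all hypothetical; in the described method the queries are issued through the industrial intelligence module and need not be hard-coded rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Assumed status snapshot: variable name -> latest reading.
Status = Dict[str, float]


@dataclass(frozen=True)
class Query:
    question: str
    check: Callable[[Status], bool]   # True when this query identifies the cause
    root_cause: str
    corrective_action: str


# Hypothetical query sequence for an overheating anomaly.
OVERHEATING_QUERIES: List[Query] = [
    Query("Is coolant flow below 10 l/min?",
          lambda s: s.get("coolant_flow", float("inf")) < 10.0,
          "Insufficient coolant flow", "Inspect the cooling system."),
    Query("Is ambient temperature above 40 degC?",
          lambda s: s.get("ambient_temperature", float("-inf")) > 40.0,
          "High ambient temperature", "Ensure proper ventilation."),
]


def find_root_cause(status: Status, queries: List[Query]) -> Optional[Query]:
    """Ask the queries in order and return the first one whose check holds."""
    for query in queries:
        if query.check(status):
            return query
    return None
```

Ordering the queries from most to least likely cause keeps the number of questions asked small in the common case.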
  • the object of the present invention is also achieved by an apparatus for generating natural language descriptions of time series data.
  • the apparatus comprises a memory and one or more processors communicatively coupled to the memory.
  • the memory comprises programmable instructions executable by the one or more processors.
  • the programmable instructions when executed by the one or more processors, cause the one or more processors to perform one or more methods, as discussed throughout the present invention.
  • the object of the present invention is also achieved by a system for generating natural language descriptions of time series data in a technical installation.
  • the system includes the industrial intelligence module comprising the knowledge base. Further, the industrial intelligence module is communicatively coupled with the one or more AI models and the apparatus for generating descriptions of time series data in the technical installation based on performing one or more methods, as discussed throughout the present invention.
  • the object of the present invention is also achieved by a computer-program product being disclosed.
  • the computer-program product has machine-readable instructions stored therein, that when executed by one or more processors, cause the one or more processors to perform one or more methods, as discussed throughout the present invention.
  • the object of the present invention is also achieved by a non-transitory computer-readable medium being disclosed.
  • the non-transitory computer-readable medium is encoded with executable instructions, that when executed by one or more processors, cause the one or more processors to perform one or more methods, as discussed throughout the present invention.
  • FIGs 1A and 1B illustrate a block diagram of a system for generating natural language descriptions of time series data, in accordance with an embodiment of the present invention.
  • FIG 2 illustrates a block diagram of the apparatus for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention.
  • FIG 3 illustrates a diagram depicting a process flow for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention.
  • FIG 4 illustrates a functional block diagram of the system for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention.
  • FIG 5 illustrates a flow chart of a method for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention.
  • FIG 1A illustrates a block diagram of a system 100 for generating natural language descriptions of time series data in a technical installation, in accordance with an embodiment of the present invention.
  • the system includes one or more assets 102, a sensing unit 104, an industrial intelligence layer 106 comprising a knowledge base 106a and a large language model (LLM) 106b. Further, the system includes one or more AI models 108, a client device 110, an edge device 112, a network 114, and an apparatus 116.
  • the system 100 can be understood as an industrial intelligence layer (IIL), a chatbot or a query engine capable of assisting the users/plant operators with a structured response related to time series data.
  • the one or more assets 102 include mechanical systems, electromechanical systems, electronic systems, and other systems in an industrial environment such as a power plant, wind farm, power grid, manufacturing facility, process plants, and so on.
  • assets refers to any device, system, instrument, or machinery manufactured or used in the industrial environment that may be employed for performing an operation.
  • assets 102 include any machinery in a technical system or technical installation/facility such as motors, gears, bearings, shafts, switch gears, rotors, circuit breakers, protection devices, remote terminal units, transformers, reactors, disconnectors, gear-drive, gradient coils, magnet, radio frequency coils, appliances, electronic devices, chillers, pumps, heat exchangers, cooling towers, air compressors, boilers, fluid bed driers, coating machines, carbonation towers etc.
  • the term sensing unit 104 may refer to one or more sensors for acquiring and transmitting time series data from the one or more assets 102.
  • the sensing unit 104 may include, but is not limited to, position sensors, rotary encoders, dynamometers, proximity sensors, current sensors, accelerometers, temperature sensors, acoustic sensors, and voltage sensors.
  • the sensing unit 104 may acquire condition data indicative of one or more operating conditions of the one or more assets 102 in real time.
  • the term one or more variables may refer to aspects that are being measured by the sensing unit 104.
  • the one or more variables may include, but are not limited to, position, rotation, force, distance, electric current, acceleration, temperature, sound or vibrations, and voltage.
  • the industrial intelligence module 106 may refer to a component of one or more processors configured to process, analyse, and interpret data in the technical installation.
  • the industrial intelligence module may alternatively be referred to as the industrial intelligence layer (IIL), within the scope of the present disclosure.
  • the industrial intelligence module may utilize advanced artificial intelligence techniques and tools to generate actionable insights and recommendations based on a time series data received.
  • the industrial intelligence module provides a comprehensive solution for monitoring, analysing, and improving industrial processes.
  • the industrial intelligence module generates understandable and actionable insights, ultimately helping to optimize the performance and reliability of the technical installation.
  • the industrial intelligence module 106 may be communicatively coupled with the one or more AI models 108.
  • the term “knowledge base” 106a may refer to a database comprising data and information pertaining to the one or more assets 102 in the form of a knowledge graph comprising a plurality of nodes linked to each other.
  • the knowledge base 106a may include an algorithm or domain expert knowledge to identify normal, minor, and major deviations for measured time series data.
  • the knowledge base 106a may also include historical data including information of past deviations in the time series data.
  • the knowledge base 106a may also include historical analysis data including information of past analysis of the time series data.
  • the knowledge base 106a may also include information corresponding to the one or more sensors in a plant and standard operating procedures in the plant.
  • the knowledge base 106a may also include information from an external knowledge base or documentation to determine the normal, minor, and major deviations for the measured time series data. Furthermore, the knowledge base 106a may also have information extracted from the schematics of the plant, including graphical representations of the process flow such as piping and instrumentation diagrams, asset design sheets, and the like.
  • the industrial intelligence module 106 comprises the knowledge base 106a, which contains domain-specific information, best practices, historical data, and expert recommendations related to the technical installation.
  • the knowledge base 106a may serve as a reference for the industrial intelligence module 106 to extract relevant information while extracting one or more recommendations for addressing deviation in the time series data.
  • the industrial intelligence module 106 may ensure that the one or more recommendations are based not only on real-time data but also on historical insights and expert knowledge, leading to more accurate and actionable recommendations.
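  • as an illustration only, the following minimal sketch shows how a knowledge graph of linked nodes could be queried for recommendations. The `KnowledgeGraph` class, the relation names, and the node contents are hypothetical and are not taken from the disclosure; they merely mirror the furnace example used elsewhere in this description.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory graph: nodes linked by labelled, directed edges."""

    def __init__(self):
        # node -> list of (relation, target) pairs
        self._edges = defaultdict(list)

    def add_edge(self, source, relation, target):
        self._edges[source].append((relation, target))

    def neighbours(self, source, relation):
        """Return all targets linked from `source` by `relation`, in insertion order."""
        return [t for r, t in self._edges[source] if r == relation]


# Hypothetical content; a real knowledge base would be populated from plant data.
kg = KnowledgeGraph()
kg.add_edge("furnace", "has_sensor", "temperature_sensor")
kg.add_edge("furnace", "has_sensor", "gas_flow_sensor")
kg.add_edge("temperature_high", "recommended_action", "reduce gas flow rate")
kg.add_edge("temperature_high", "recommended_action", "inspect gas valve")

recommendations = kg.neighbours("temperature_high", "recommended_action")
print(recommendations)
```

  • in this sketch a recommendation lookup is a single edge traversal; historical deviations or standard operating procedures could be attached to the same nodes under additional relation names.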
  • the one or more AI models 108 comprise a plurality of neural network layers.
  • examples of neural networks include, but are not limited to, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), and a Restricted Boltzmann Machine (RBM).
  • the one or more AI models 108 are configured to process and analyse large volumes of the time series data to identify patterns, make predictions, and provide insights that are valuable for decision-making.
  • the one or more AI models 108 may analyse the time series data collected from the one or more sensors 104 in the technical installation.
  • the one or more sensors 104 measure various variables over time, such as temperature, pressure, or voltage, providing a continuous stream of data.
  • the one or more AI models 108 are configured to detect deviations in the time series data by comparing the measured data with predefined baselines. Thus, the one or more AI models 108 may identify deviations from expected patterns. These deviations may indicate potential issues or anomalies in the operation of the technical installation. Furthermore, the one or more AI models 108 are configured to determine relationships between variables. Advantageously, based on the determined relationships, the system 100 may determine root causes of the deviations and any potential impact.
  • the one or more AI models 108 are configured to classify the deviations based on the determined relationships and the magnitude of the detected deviations.
  • the one or more AI models 108 may classify each deviation as a major deviation, a minor deviation, or no deviation.
  • the classification may assist the system 100 in prioritizing and focusing attention on the most critical issues of the technical installation.
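  • the disclosure does not specify how relationships between variables are determined. As one possible illustration, the sketch below uses the Pearson correlation coefficient; the sample values and the 0.8 cut-off are assumptions chosen for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical readings: gas flow rises in step with temperature.
temperature = [1100.0, 1110.0, 1135.0, 1170.0, 1200.0]
gas_flow = [50.0, 51.0, 53.5, 57.0, 60.0]

r = pearson(temperature, gas_flow)
related = abs(r) >= 0.8  # assumed cut-off for treating two variables as related
print(round(r, 3), related)
```

  • correlation alone does not establish a root cause; it only flags candidate variables for further analysis.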
  • the industrial intelligence module 106, which is communicatively coupled with the one or more AI models 108, leverages the knowledge base 106a to extract one or more recommendations for addressing the classified deviations.
  • the one or more recommendations comprise maintenance actions, operational adjustments, or other interventions aimed at resolving or mitigating the detected issues in the technical installation.
  • the industrial intelligence module 106 utilizes the large language model (LLM) 106b to generate the natural language descriptions of the time series data based on the extracted one or more recommendations.
  • the LLM 106b may be trained on the domain knowledge pertinent to a particular industrial environment. Consequently, the LLM 106b understands the terminology, context, and nuances specific to that industry.
  • the industrial intelligence module 106 utilizes the LLM 106b to generate a summary like “The temperature of Reactor has shown a consistent upward trend over the past 24 hours, indicating a possible need for cooling system adjustment”.
  • generation of the natural language descriptions may allow for easy interpretation and communication of the insights derived from the time series data, thus facilitating informed decision-making by users or operators, engineers, or other stakeholders involved in managing the technical installation.
  • the LLM 106b is trained with the domain knowledge pertinent to the particular industrial environment or the technical installation in order to understand and effectively operate within the context of the technical installation.
  • the LLM 106b is trained on datasets that include the specific terminologies, processes, workflows, and common scenarios relevant to the technical installation. Consequently, the LLM 106b becomes adept at recognizing patterns, understanding context, and generating text that is both accurate and contextually appropriate for that specific field.
  • training the LLM 106b with the domain knowledge advantageously yields outputs that are highly accurate and relevant to the technical installation, thus reducing the chances of misinterpretation or irrelevant suggestions.
  • the LLM 106b provides insights that are practical and actionable within the technical installation.
  • the LLM 106b becomes proficient in the technical natural language used in the industry, ensuring clear and effective communication.
  • operations may involve various complex processes such as mixing, heating, and chemical reactions, with stringent requirements for quality control and safety.
  • the one or more assets used might include reactors, mixers, filtration units, and quality testing machines, all of which generate continuous streams of the time series data.
  • the LLM 106b may be trained for this specific environment, using data gathered from the factory, including sensor data, operational logs, maintenance records, and quality control reports. Further, the LLM 106b may also be trained using domain-specific documents such as technical manuals, safety protocols, process descriptions, and industry standards to help the LLM 106b understand the procedural and regulatory context.
  • the one or more AI models 108 analyse the time series data to detect deviations and understand their implications, while the industrial intelligence module 106 enhances the analysis of the one or more AI models 108 by providing actionable recommendations and generating the natural language descriptions such as descriptive reports, thereby improving the overall intelligence and efficiency of managing the technical installation.
  • an operator 118 provides a requirement for generating natural language descriptions of the time series data via the client device 110.
  • the requirements may include a request for continuous monitoring of the one or more assets 102.
  • the requirements may include a request for calculating efficiency or metrics of performance of the one or more assets 102.
  • the requirements may include a request for generating one or more insights, corrective actions, and preventive actions for the deviation of the time series data.
  • the requirements may be stored in a memory of the edge device 112 or may be input to the edge device 112 by the operator 118.
  • the edge device 112 may be communicatively coupled to the client device 110.
  • client devices 110 include personal computers, workstations, personal digital assistants, and human-machine interfaces.
  • the client device 110 may enable the operator to input one or more requirements through a web-based interface.
  • upon receiving the one or more requirements from the operator, the edge device 112 transmits a request for generating the summary for the major deviation and the minor deviation of the time series data.
  • the apparatus 116 is deployed in a cloud computing environment.
  • cloud computing environment refers to a processing environment comprising configurable computing physical and logical resources, for example, networks, servers, storage, applications, services, etc., and data distributed over the network 114, for example, the internet.
  • the cloud computing environment provides on-demand network access to a shared pool of configurable computing physical and logical resources.
  • the apparatus 116 comprises a network interface (not shown) for communicating with the one or more edge devices 112 via the network 114.
  • the apparatus 116 is an edge computing device.
  • edge computing refers to a computing environment that is capable of being performed on an edge device (e.g., connected to one or more sensing units in an industrial setup and to a remote server(s) such as for computing server(s) or cloud computing server(s) on other end), which may be a compact computing device that has a small form factor and resource constraints in terms of computing power.
  • a network of the edge computing devices can also be used to implement the apparatus. Such a network of edge computing devices is referred to as a fog network.
  • FIG 1b illustrates a block diagram of the system 100 for generating the natural language descriptions of the time series data in the technical installation, in accordance with an embodiment of the present invention.
  • the functional blocks of the system 100 are implemented in conjunction with an industrial predictive analytics engine (IPAE) 122 and an industrial intelligence layer (IIL) 120.
  • the constructional and operational features of the IIL 120 that are already explained in application 202441032988, the entire contents of which are incorporated herein by reference, are not explained in detail in the description of FIG 1b.
  • the operator or the user 118 initiates the query via the client device 110.
  • the IIL 120 may receive the query from the client device 110 and provide it to the system 100 to relate and generate descriptions of the time series data.
  • the system 100 receives the time series data from the one or more sensors 104 via the IPAE 122.
  • the IPAE 122 is described in application PCT/EP2022/058317, the entire contents of which are incorporated herein by reference, and a detailed description is omitted herein for the sake of brevity.
  • the system 100 is configured to process the received time series data by selecting among the one or more AI models 108, for instance, the domain-tuned AI models 124a, the statistical data AI models 124b, and the vision-based AI models 124c, depending on the nature of the time series data.
  • the one or more AI models 108 are configured to process incoming time series data by employing various analytical techniques tailored to the specific characteristics of the time series data and a source of the one or more sensors 104. This includes utilizing at least one of the domain-tuned AI models 124a, the statistical data AI models 124b, and the vision-based AI models 124c, depending on the nature of the time series data being received.
  • the system 100 may dynamically select the one or more AI models 108 based on the source of the one or more sensors 104, thus ensuring a finely tuned approach to extracting meaningful insights from the time series data.
  • the selected one or more AI models 108 may detect anomalies, identify relationships between one or more variables, and classify deviations based on their significance.
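  • a minimal sketch of such source-based selection is given below, assuming a simple lookup from a sensor source label to an analysis routine. The source labels and the three placeholder functions are hypothetical stand-ins for the domain-tuned, statistical data, and vision-based AI models; the disclosure does not prescribe this mechanism.

```python
def analyse_domain_tuned(samples):
    """Placeholder for a domain-tuned model (e.g. machine performance data)."""
    return f"domain-tuned analysis of {len(samples)} samples"


def analyse_statistical(samples):
    """Placeholder for a statistical model (e.g. temperature or humidity)."""
    return f"statistical analysis of {len(samples)} samples"


def analyse_vision(samples):
    """Placeholder for a vision-based model (e.g. camera frames)."""
    return f"vision analysis of {len(samples)} samples"


# Hypothetical mapping from sensor source to model family.
MODEL_BY_SOURCE = {
    "machine_performance": analyse_domain_tuned,
    "environment": analyse_statistical,
    "camera": analyse_vision,
}


def select_model(source):
    """Return the analysis routine registered for a sensor source."""
    try:
        return MODEL_BY_SOURCE[source]
    except KeyError:
        raise ValueError(f"no model registered for source {source!r}") from None


model = select_model("environment")
print(model([21.5, 21.7, 22.0]))
```

  • a lookup table keeps the selection rule explicit and easy to extend; a deployed system might instead infer the source from sensor metadata.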
  • the system 100 is configured to utilize the domain-tuned AI models 124a to analyse the time series data. For instance, if the time series data relates to machine performance, the system 100 may use the domain-tuned AI models 124a specifically trained on manufacturing data. For another instance, the system 100 may use the domain-tuned AI models 124a trained to detect anomalies in equipment behaviour or to predict when maintenance might be needed in the technical installation. Further, the domain-tuned AI models 124a are trained using the IPAE 122, which is described in application PCT/EP2022/058317, the entire contents of which are incorporated herein by reference, and a detailed description is omitted herein for the sake of brevity.
  • the system 100 is configured to utilize the statistical data AI models 124b in response to receiving the time series data related to environmental conditions such as temperature and humidity.
  • the system 100 may use the statistical data AI models 124b, such as time-series analysis, to identify trends or seasonal patterns in the time series data.
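  • as a simple illustration of trend identification, the sketch below fits a least-squares line to equally spaced samples and labels the slope. The function names, the sample values, and the tolerance are assumptions; the disclosure does not name a specific statistical technique.

```python
def linear_trend(values):
    """Least-squares slope of values sampled at equally spaced time steps."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def describe_trend(values, tolerance=0.05):
    """Label the trend; `tolerance` is an assumed dead band around zero slope."""
    slope = linear_trend(values)
    if slope > tolerance:
        return "upward"
    if slope < -tolerance:
        return "downward"
    return "flat"


humidity = [40.0, 41.0, 42.0, 43.0, 44.0]  # hypothetical hourly readings
print(linear_trend(humidity), describe_trend(humidity))
```

  • seasonal patterns would need more than a single slope, for example a comparison of the same hour across several days.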
  • the system 100 is configured to utilize the vision AI models 124c.
  • the one or more sensors 104, such as surveillance cameras monitoring the production floor of the technical installation, may capture visual data.
  • in such cases, the vision AI models 124c may be used.
  • the vision AI models 124c may analyse images or video feeds to detect defects in products or identify safety hazards at the technical installation.
  • the industrial intelligence layer 106 receives the time series data from the one or more AI models 108, such as the domain-tuned AI models 124a, the statistical data AI models 124b, and the vision-based AI models 124c.
  • the industrial intelligence layer 106 includes the knowledge base 106a which stores information about the technical installation, its components, historical data, and best practices for maintenance and operation.
  • the industrial intelligence module 106 is in constant communication with the one or more AI models 108.
  • the industrial intelligence module 106 leverages the knowledge base 106a to extract one or more recommendations for addressing the classified deviation.
  • the one or more recommendations comprise actions to be taken to rectify the anomaly, such as adjusting operating parameters, scheduling maintenance, or investigating potential equipment failures.
  • the industrial intelligence layer 106 is configured to generate natural language descriptions of the time series data using the Large Language Model (LLM) 106b.
  • the LLM 106b is trained on vast amounts of text data and is capable of generating human-like text based on input prompts or queries from the user 118.
  • the natural language descriptions of the time series data may provide context, insights, and actionable information based on the detected deviations and recommended actions.
  • the LLM 106b is already explained in the application 202441032988 and is not explained in detail herein for the sake of brevity.
  • the natural language descriptions of the time series data are displayed on the client device 110.
  • FIG 2 illustrates a block diagram of the apparatus 116 for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention.
  • the apparatus 116 comprises a processing unit 202, a memory unit 204, a communication unit 206, an I/O interface 208, and an output unit 210.
  • the apparatus 116 can be a computer, a workstation, or a virtual machine running on host hardware.
  • the apparatus 116 can be a real or a virtual group of computers (the technical term for a real group of computers is “cluster”, the technical term for a virtual group of computers is “cloud”).
  • the processing unit 202 may include one or more processors as a single processing unit or several units.
  • the processing unit 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
  • the one or more processors are configured to fetch and execute computer-readable instructions and data stored in the memory unit 204.
  • the memory unit 204 includes one or more computer-readable storage media.
  • the memory unit 204 may include non-volatile storage elements. Examples of such non-volatile storage elements may include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories.
  • the memory may, in some examples, be considered a non-transitory storage medium.
  • the term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term “non-transitory” should not be interpreted to mean that the memory is non-movable.
  • a non-transitory storage medium may store data that can, over time, change (e.g., in Random Access Memory (RAM) or cache).
  • the memory unit 204 may further include any non-transitory computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read-only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
  • the memory unit 204 includes a database 212 and an AI/ML module 214.
  • the database 212 is configured to be accessed by the processing unit 202 and stores information as required by the processing unit 202 to perform the one or more functions.
  • the database 212 may store the time series data for the one or more variables measured by one or more sensors of the sensing unit 104.
  • the database 212 may also store historical data including information of past deviations in the time series data and extracted one or more insights, corrective actions, and preventive actions in the past.
  • the database 212 may serve dual functions in storing different types of information. Primarily, it holds the time series data, most of which is collected from one or more sensors of the sensing unit 104.
  • the time series data from the one or more sensors of the sensing unit 104 provides a continuous stream of information that reflects the operational status over time.
  • the database 212 may also maintain the knowledge base, which includes historical data on past deviations in the time series data.
  • the knowledge base comprises extracted insights, corrective actions, and preventive actions that have been previously implemented.
  • the AI/ML module 214 may include an Artificial Intelligence (AI) model and Large Language Models (LLMs).
  • each AI model may include a plurality of neural network layers.
  • examples of neural networks include, but are not limited to, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), and a Restricted Boltzmann Machine (RBM).
  • the learning technique for training each AI model uses a plurality of learning data to cause, allow, or control the system 100 to make a determination or analysis. Examples of learning techniques include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.
  • at least one of a plurality of CNN, DNN, RNN, and RBM models and the like may be implemented to thereby achieve execution of the present subject matter’s mechanism through the AI models.
  • a function associated with the AI/ML module 214 may be performed through the non-volatile memory, the volatile memory, and the processing unit 202.
  • LLMs may refer to a type of AI model designed to understand and generate human-like text based on vast amounts of natural language data. These models are built using deep learning architectures, particularly transformer-based architectures. LLMs are trained on massive datasets containing billions or even trillions of words from diverse sources such as books, articles, websites, and other textual content. The large-scale training enables the model to learn complex patterns, structures, and nuances of human language. LLMs undergo two main stages of training: pre-training and fine-tuning. During pre-training, the model is trained on a large corpus of text data using unsupervised learning techniques to learn general language patterns and semantics.
  • LLMs have the ability to generate human-like text based on given prompts or inputs. They can produce coherent paragraphs, articles, stories, code, or responses to questions by predicting the next words or tokens in the sequence based on the context provided. LLMs exhibit a strong understanding of context and semantics in natural language. They can infer meaning, resolve ambiguity, and generate text that is contextually relevant and coherent with the given input.
  • the industrial intelligence module 106 generates the natural language descriptions of the time series data, such as contextually relevant and coherent text, using the LLM 106b, which predictively constructs sentences based on the query from the client device 110. For instance, the LLM 106b may take the extracted one or more recommendations about the deviations and produce detailed, human-readable natural language descriptions that explain the data trends, deviations, and suggested actions in a manner that is easily understandable by the users.
  • the integration of the LLM 106b with the industrial intelligence module 106, which is communicatively coupled to the apparatus 116, generates insightful and articulate descriptions of the time series data, making complex time series analyses more accessible and actionable for users.
  • the one or more sensors 104 detect a significant deviation in temperature within a manufacturing process.
  • the one or more AI models 108 classify it as a major deviation and identify the relationship between temperature and other variables, such as pressure.
  • the industrial intelligence module 106 then extracts the one or more recommendations, such as adjusting the cooling system. Consequently, the LLM 106b, based on the one or more recommendations, generates a comprehensive report describing the detected temperature anomaly, its potential causes, and the recommended corrective action, thereby facilitating quick and informed decision-making.
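  • how the detected deviation and the recommendations are passed to the LLM 106b is not detailed here. One plausible approach, shown purely as a sketch, is to assemble them into a structured text prompt; the `build_prompt` function and the prompt wording are hypothetical, and no particular LLM interface is assumed or called.

```python
def build_prompt(variable, classification, related_variable, recommendations):
    """Assemble a plain-text prompt from analysis results (hypothetical format)."""
    lines = [
        "Describe the following time series finding for a plant operator.",
        f"Variable: {variable}",
        f"Deviation class: {classification}",
        f"Related variable: {related_variable}",
        "Recommended actions:",
    ]
    # One bullet per recommendation extracted from the knowledge base.
    lines.extend(f"- {action}" for action in recommendations)
    return "\n".join(lines)


prompt = build_prompt(
    variable="temperature",
    classification="major",
    related_variable="pressure",
    recommendations=["adjust the cooling system"],
)
print(prompt)
```

  • the resulting string would then be submitted to whichever language model the installation uses; that call is deliberately left out because its interface is not specified in this description.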
  • the communication unit 206 is configured to communicate sensor data, or any other content, over a communication network. Further, the communication unit 206 may include a communication port or a communication interface for sending and receiving signals from the apparatus 116 via the communication network.
  • the communication port or the communication interface may be a part of the processing unit 202 or may be a separate component.
  • the communication port may be created in software or may be a physical connection in hardware.
  • the communication port may be configured to connect with the communication network, external media, the display, or any other components in the system 100, or combinations thereof.
  • the connection with the communication network may be a physical connection, such as a wired Ethernet connection, or may be established wirelessly as discussed above. Likewise, the additional connections with other components of the system 100 may be physical or may be established wirelessly.
  • the communication unit 206 may include the Wi-Fi module or Bluetooth module for enabling wireless communication capability and data exchange capability between various modules of the system 100.
  • the I/O interface 208 refers to hardware or software components that enable communication between various modules of the system 100.
  • the I/O interface 208 serves as a communication medium for exchanging information, commands, signals, or query responses with other devices or systems.
  • the I/O interface 208 may be a part of the processing unit 202 or may be a separate component.
  • the I/O interface 208 may be created in software or may be a physical connection in hardware.
  • the I/O interface 208 may be configured to connect with an external network, external media, the display, or any other components, or combinations thereof.
  • the external network may be a physical connection, such as a wired Ethernet connection, or may be established wirelessly.
  • the output unit 210 comprises a display device.
  • the display device may be an Augmented Reality/Virtual Reality (AR/VR) device to display a virtual environment to the user.
  • the display device may include a display screen.
  • the display screen may be a Light Emitting Diode (LED), Liquid Crystal Display (LCD), Organic Light Emitting Diode (OLED), Active Matrix Organic Light Emitting Diode (AMOLED), or Super Active Matrix Organic Light Emitting Diode (Super AMOLED) screen.
  • the display screen may be of varied resolutions.
  • the output unit 210 is further configured for presenting the generated summary for the major deviation and the minor deviation on the client device 110.
  • the processing unit 202 is configured to extract information from the knowledge base 106a, which comprises domain knowledge stored in the form of a knowledge graph. Further, the processing unit 202 is configured to obtain time series data from the sensing unit 104. Further, the processing unit 202 is configured to detect a deviation in the time series data, using the deviation detection model, based on a comparison with a predefined baseline for each variable of the one or more variables in the time series data. For instance, the processing unit 202 periodically splits the time series data into pre-defined time segments based on a predefined time interval. Then, the processing unit 202 determines the deviation in the time series data for each time segment of the pre-defined time segments.
  • the processing unit 202 is configured to classify the deviation as one of a major deviation, a minor deviation, or a normal deviation. For instance, the processing unit 202 determines a threshold value corresponding to each of the major deviation, the minor deviation, and the normal deviation. Then, the processing unit 202 classifies the deviation by comparing the deviation with the corresponding threshold values. Furthermore, in addition to statistical analysis such as comparing the deviation with the corresponding threshold value, additional AI/ML techniques, such as clustering or classification, may be combined with causality or explainer modules to classify the detected deviations of the time series data. Advantageously, this enables a more nuanced and accurate classification by identifying patterns and relationships within the time series data that might not be apparent through statistical analysis alone.
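  • the segment-wise detection and threshold-based classification described above can be sketched as follows. The deviation measure (largest absolute distance from the baseline within a segment), the segment length, and the threshold values are assumptions made for illustration; the disclosure leaves them open.

```python
def split_into_segments(values, segment_length):
    """Split a series into consecutive segments of at most `segment_length` samples."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [values[i:i + segment_length] for i in range(0, len(values), segment_length)]


def segment_deviation(segment, baseline):
    """Largest absolute distance from the baseline within one segment."""
    return max(abs(v - baseline) for v in segment)


def classify(deviation, minor_threshold, major_threshold):
    """Map a deviation magnitude to 'normal', 'minor', or 'major'."""
    if deviation >= major_threshold:
        return "major"
    if deviation >= minor_threshold:
        return "minor"
    return "normal"


# Hypothetical furnace temperatures (°C), baseline 1100 °C, four samples per segment.
readings = [1100, 1102, 1098, 1101, 1120, 1125, 1130, 1128, 1150, 1180, 1200, 1195]
segments = split_into_segments(readings, 4)
labels = [classify(segment_deviation(s, 1100), 20, 80) for s in segments]
print(labels)
```

  • with the assumed thresholds of 20 °C and 80 °C, the three segments deviate by at most 2 °C, 30 °C, and 100 °C respectively.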
  • the processing unit 202 is configured to extract one or more insights, corrective actions, and preventive actions for the classified deviation. For instance, the processing unit 202 uses the one or more AI models 108 to extract the one or more insights, the corrective actions, and the preventive actions.
  • the processing unit 202 is configured to extract a set of templates associated with the detected deviation from the knowledge base 106a.
  • the set of templates comprises statistically relevant information capable of describing the behaviour of the time series data.
  • the set of templates corresponds to structured frameworks or outlines of the expected patterns and behaviours exhibited by the time series data under normal circumstances.
  • the set of templates comprises key statistical information, such as typical ranges, trends, and fluctuations, which may be considered indicative of normal operations of the technical installation.
  • the processing unit 202 may utilize the set of templates to compare the observed data against the expected requirements of the technical installation. For example, if the time series data represents temperature readings in the manufacturing process, the set of templates may include information about the expected temperature range, typical variations, and any known patterns related to temperature fluctuations during different production phases.
  • the processing unit 202 may refer to the set of templates to understand the nature and significance of the deviation. This allows the apparatus 116 to assess whether the observed deviation falls within acceptable bounds or if it requires further investigation and corrective action.
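  • the structure of a template is not defined in this description. The sketch below assumes a template holds only a variable name, a unit, and an expected range, and shows how an observed value could be checked against it; the `BehaviourTemplate` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviourTemplate:
    """Expected behaviour of one variable under normal operation (assumed fields)."""
    variable: str
    unit: str
    expected_min: float
    expected_max: float

    def within_bounds(self, value):
        """True if the observed value lies inside the expected range."""
        return self.expected_min <= value <= self.expected_max

    def describe(self, value):
        """One-sentence statement comparing an observation with the template."""
        status = "within" if self.within_bounds(value) else "outside"
        return (
            f"{self.variable} reading of {value} {self.unit} is {status} "
            f"the expected range {self.expected_min} to {self.expected_max} {self.unit}"
        )


# Hypothetical template for a furnace temperature.
template = BehaviourTemplate("temperature", "°C", 1050.0, 1150.0)
print(template.describe(1200.0))
```

  • a fuller template could also carry typical variation and phase-specific patterns, as mentioned above; those are omitted to keep the example short.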
  • the LLM 106b may generate the natural language descriptions based on the set of templates.
  • the LLM 106b may add a layer of linguistic sophistication to the analysis, enabling the apparatus 116 to convey complex statistical findings in a more accessible and intuitive manner.
  • the processing unit 202 presents the summary for the major deviation and the minor deviation as the textual description of determined anomalies in the time series data on the output device.
  • the usability of the apparatus 116 is enhanced by providing users with clear and concise summaries of the detected deviations.
  • the textual descriptions generated by the processors may encapsulate the key insights from the analysis, allowing stakeholders to quickly grasp the nature and significance of the deviations without needing to delve into the raw data or statistical metrics.
  • where the apparatus 116 is based in a technical installation such as a manufacturing plant for monitoring and managing production processes, the following workflow is followed:
  • the one or more sensors 104 may be installed on various assets, such as machines, conveyor belts, and environmental controls, to measure variables like temperature, pressure, speed, and humidity.
  • the apparatus 116 includes the one or more AI models 108 that may be configured to analyse the time series data from the one or more sensors 104 to detect deviations and classify them.
  • the industrial intelligence module 106 with the knowledge base 106a, and in communication with the apparatus 116, may be used to recommend corrective actions based on detected deviations.
  • the LLM 106b may be integrated with the industrial intelligence module 106 to generate the natural language descriptions of the time series data and the recommended actions.
  • the apparatus 116 may be configured to monitor a critical process involving a high-temperature furnace used for metal forging.
  • the one or more sensors 104 on the furnace measure variables like temperature, pressure, and gas flow rates.
  • the one or more AI models 108 may detect that the furnace temperature has deviated significantly from its baseline. The detected deviation shows the temperature spiked to 1200°C, while the baseline temperature is 1100°C.
  • the one or more AI models 108 may analyse the relationship between temperature and other variables, consequently determining that the increase in temperature correlates with an unexpected increase in gas flow rate. Thus, based on the severity and potential impact, the deviation is classified as a major deviation.
  • the industrial intelligence module 106 may refer to the knowledge base 106a and extract the one or more recommendations, consequently suggesting reducing the gas flow rate and checking the gas valve for malfunctions.
  • the LLM 106b takes the technical details and recommendations and generates the natural language description.
  • the generated natural language description may be in the form of a summary, such as: “On October 17, 2024, at 07:10, the furnace temperature in the metal forging process was detected to have a significant deviation. The temperature spiked to 1200°C, exceeding the baseline of 1100°C. This increase in temperature was found to correlate with an unexpected rise in the gas flow rate. Immediately, reduce the gas flow rate to bring the temperature back to its baseline. Proceed by inspecting the gas valve for any malfunctions or blockages that may have caused the increase in flow rate. Thus, address these recommendations promptly to stabilize the production process and prevent potential damage to the furnace.”
  • the present invention also contemplates a computer-program product having machine-readable instructions stored therein which, when executed by the one or more processors, cause the one or more processors to perform a method for generating natural language descriptions of the time series data in the industrial environment.
  • the details on the method(s) performed by the one or more processors have been elaborated in subsequent paragraphs at least with reference to FIG. 5.
  • FIG 3 illustrates a diagram depicting a process flow 300 for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention.
  • the time series data is input to the system 100.
  • the one or more processors of the processing unit 202 receive the time series data measured by the one or more sensors of the sensing unit 104.
  • the one or more processors set up algorithms and models to generate process-driven knowledge.
  • the one or more processors use the knowledge base 106 to set up the algorithms and the models.
  • the knowledge base 106 includes the algorithm or the domain expert knowledge, the historical data, the historical analysis data, the external knowledge base, and the documentation.
  • the one or more processors use the knowledge base 106 to not only detect anomalies in the time series data but also identify any deviations, insights, or patterns present within the time series data.
  • the expanded functionality allows the one or more processors to leverage the comprehensive information stored in the knowledge base 106a, enabling them to detect a broader range of data anomalies and extract valuable insights that may inform decision-making processes or trigger proactive interventions in the technical installation.
  • the one or more processors may detect the anomaly in terms of events with direct consequences for business such as shutdown, upset, or quality by utilizing the historical data.
  • an LSTM autoencoder trained on historical health data is used to detect the anomaly.
  • the one or more processors use the knowledge base 106 to define or identify a baseline for the normal, minor, and major deviation for the measured time series data for different variables.
  • the one or more processors identify the baseline using the domain expert knowledge. For example, in the case of batch processes in the pharmaceutical industry, there are specific quality control aspects that restrict the variation of each variable and define the path each variable should take when a batch progresses. This information may be used to define specific descriptions for what can be called as normal for each variable and what should be considered as a deviation.
  • the one or more processors use the knowledge base 106 to identify dependencies of measured variables on other variables. The dependencies may be identified using the existing algorithms stored in the knowledge base 106. For example, in the case of operation in the pharmaceutical industry, the speed of the drum motor is causal for the temperature spike in the time series data. This information is stored in the knowledge base 106 and may be used to identify the dependencies.
  • the one or more processors use the external knowledge base or the documentation to identify a consequence or remedial action.
  • the one or more processors may generate the knowledge base including cause, remedial actions, preventive and corrective action, limitations, and other information for different components, sensor tags, and the variables.
  • the external knowledge base or the documentation may include external and internal data sources such as public Large Language Models (LLMs), blogs, etc.
  • the one or more processors may also generate metadata for the knowledge corpus.
  • the one or more processors set up a model or algorithm to detect deviation parameters for sensor tags or the variables and detect the major, minor, and normal deviations.
  • the one or more processors may set the AI model to detect and identify the deviation parameters for the variables using the time series data.
  • the AI model is used to analyze the time series data for specific patterns that indicate the major or minor deviations.
  • the analysis may include trend analysis, seasonality detection, change point detection, or pattern-matching techniques to identify abnormal behavior.
  • the one or more processors may detect the deviation parameter for temperature depending on the time taken using the AI model. For instance, a temperature spike of 10°C within 5 seconds is an anomalous behavior.
  • examples of the algorithm may include statistical methods (e.g., Z-score, Grubbs' test), machine learning approaches (e.g., Isolation Forest, One-Class SVM), or time series-specific techniques (e.g., Seasonal Hybrid ESD, LSTM autoencoders). These methods learn the patterns of normal behavior and flag instances that significantly deviate from those patterns as anomalies.
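  • as a minimal illustrative sketch of the simplest statistical method named above, a Z-score check could look as follows. The function name `zscore_anomalies`, the threshold of 3.0, and the sample readings are assumptions made for illustration and are not taken from the specification.

```python
from statistics import mean, stdev


def zscore_anomalies(reference, readings, threshold=3.0):
    """Return (index, value, z) for readings whose |z| exceeds threshold.

    The mean and standard deviation are estimated from `reference`,
    a sample of readings assumed to represent normal behaviour.
    """
    mu = mean(reference)
    sigma = stdev(reference)
    if sigma == 0:
        raise ValueError("reference readings have zero variance")
    flagged = []
    for i, value in enumerate(readings):
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, value, z))
    return flagged


# Hypothetical furnace temperatures in degrees Celsius.
normal = [1098.0, 1101.0, 1100.0, 1099.0, 1102.0, 1100.0, 1097.0, 1103.0]
live = [1100.0, 1101.0, 1200.0, 1099.0]
anomalies = zscore_anomalies(normal, live)
print(anomalies)
```

  • in this sketch only the 1200°C reading is flagged, because the reference sample has a mean of 1100°C and a sample standard deviation of 2°C. A Z-score assumes roughly stationary, unimodal data, so seasonal or batch processes would likely need one of the time series-specific techniques instead.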
  • the one or more processors split the time series data into the pre-defined time segments based on the predefined time interval. Further, the one or more processors split the time series data based on a determination whether the process is a continuous process or a batch process.
  • the one or more processors determine the deviation in the time series data and classify the deviation as one of the major deviation, the minor deviation, or the normal deviation.
  • the one or more processors may use the AI model that is set up using the knowledge corpus such as the definitions for normal (baseline) as well as deviations.
  • the one or more AI models monitor the individual variables for specific sliding windows to identify the deviations.
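  • a sliding-window monitor of a single variable could be sketched as below. This is a simple rule-based stand-in for the AI model, shown only to make the windowing concrete; the function name `sliding_window_spikes`, the window length, and the rise threshold are hypothetical parameters chosen to mirror the example of a 10°C spike within 5 seconds.

```python
from collections import deque


def sliding_window_spikes(samples, window_seconds, max_rise):
    """Yield (time, rise) when a value exceeds the lowest value seen in the
    preceding `window_seconds` by at least `max_rise`.

    `samples` is an iterable of (time_in_seconds, value) in time order.
    """
    window = deque()
    for t, value in samples:
        # Drop samples that have fallen out of the look-back window.
        while window and t - window[0][0] > window_seconds:
            window.popleft()
        if window:
            rise = value - min(v for _, v in window)
            if rise >= max_rise:
                yield (t, rise)
        window.append((t, value))


# Hypothetical temperature samples: (seconds, degrees Celsius).
samples = [(0, 20.0), (1, 21.0), (2, 22.0), (3, 31.0), (4, 31.5), (10, 32.0)]
spikes = list(sliding_window_spikes(samples, window_seconds=5, max_rise=10.0))
print(spikes)
```

  • the reading at 10 seconds is not flagged in this sketch because all earlier samples have left the 5-second window by then, so there is nothing to compare it against.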
  • the one or more AI models may be trained based on the historical data, the historical analysis data, the information corresponding to the one or more sensors in the plant and standard operating procedure in the plant, and the domain expertise information for managing the deviation.
  • the one or more processors identify and classify the deviation in each time segment.
  • the one or more processors may store the result in a database.
  • the one or more processors may create a set of major deviation, a set of normal deviation, and a set of minor deviation.
  • the one or more processors describe each deviation in the set of major deviation and the set of minor deviation.
  • a description of a deviation for the temperature variable may follow a template such as “Temperature changed from 20°C to 30°C during the initial batch window at the rate of 2°C per minute which is abnormal”.
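  • filling such a template from measured values could be sketched as follows. The function `describe_change`, its parameters, and the `max_rate` limit of 1°C per minute are hypothetical and serve only to show how the rate and the verdict in the sentence might be derived.

```python
def describe_change(variable, start, end, minutes, max_rate, window, unit="°C"):
    """Fill a fixed sentence template for a change in one variable.

    `max_rate` is the largest rate of change (unit per minute) still
    treated as normal; it is an assumed, illustrative parameter.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    rate = abs(end - start) / minutes
    verdict = "abnormal" if rate > max_rate else "normal"
    return (
        f"{variable} changed from {start:g}{unit} to {end:g}{unit} during "
        f"{window} at the rate of {rate:g}{unit} per minute which is {verdict}"
    )


text = describe_change(
    "Temperature", 20, 30, 5, max_rate=1.0, window="the initial batch window"
)
print(text)
```

  • keeping the sentence structure fixed and computing only the numbers and the verdict means the text handed to later summarization stays consistent across variables and time segments.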
  • the one or more processors extract, using the industrial intelligence layer 106, the one or more insights, the corrective actions, and the preventive actions for each deviation in the set of major deviation and the set of minor deviation. Further, the one or more processors may generate, using the industrial intelligence layer 106, the set of templates for the major deviation and the minor deviation based on the extracted one or more insights, the corrective actions, and the preventive actions. The one or more processors may also detect anomalies in the time series data using the knowledge base 106a. The one or more processors may also use the information of dependencies of the measured variable on other variables.
  • the one or more processors generate, using the LLMs 106b, the natural language descriptions that include a summary for the major deviation and the minor deviation based on the generated set of templates. For instance, the one or more processors generate the summary for each deviation in the time window.
  • the summary provides an overview of problems in the process or the asset.
  • the one or more processors may generate the summary based on requirements such as for every day, week, or as required by the operator. Further, the one or more processors consider the knowledge base metadata information with other references along with the textual description generated from deviation for the summarization task to get the summary report for the user or operator.
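  • one way to picture the summarization input is a prompt assembled from the per-deviation template text and the extracted recommendations. The sketch below only builds that text and does not call any language model; the `DeviationRecord` fields and the prompt wording are assumptions, not the format used by the LLM 106b.

```python
from dataclasses import dataclass


@dataclass
class DeviationRecord:
    severity: str        # "major" or "minor"
    description: str     # filled template text for the deviation
    recommendation: str  # action taken from the knowledge base


def build_summary_prompt(records, period):
    """Assemble the text that would be handed to a language model."""
    order = {"major": 0, "minor": 1}
    # Major deviations are listed first; sorted() keeps the input order
    # within each severity level.
    ranked = sorted(records, key=lambda r: order[r.severity])
    lines = [f"Summarise the following deviations for {period}."]
    for n, rec in enumerate(ranked, start=1):
        lines.append(
            f"{n}. [{rec.severity.upper()}] {rec.description} "
            f"Recommended action: {rec.recommendation}"
        )
    return "\n".join(lines)


records = [
    DeviationRecord("minor", "Pressure drifted slightly above its baseline.",
                    "Monitor the pressure valve."),
    DeviationRecord("major", "Temperature spiked to 1200°C against a baseline of 1100°C.",
                    "Reduce the gas flow rate and inspect the gas valve."),
]
prompt = build_summary_prompt(records, "the last day")
print(prompt)
```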
  • FIG 4 illustrates a functional block diagram 400 of the system 100 for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention.
  • the functional blocks 400 of the system are implemented as an industrial predictive analytics engine (IPAE).
  • the functional blocks 400 include a configuration application programming interface (API) 401, a Time Series (TS) verbalization API 403, a data source API 405, a model management API 407, an SQL database 409, an influx database 411, the knowledge base 106, an internet server 413, an ML model selection module 415, an external User Interface (UI) 417, and an output module 419.
  • the TS verbalization API 403 includes a TS verbalization backend layer 421, a data acquisition module 423, an information extraction module 425, a verbalization module 427, and an AI tool interface 429.
  • the configuration application API 401 configures the one or more assets 102 in the system 100.
  • the configuration application API 401 also adds any new asset in the system 100. Any new asset may be added to the system 100 by adding all related metadata, diagrams, sensors, KPIs, and any other available information to the SQL database 409. The new asset is added in the system 100 based on an input received by the operator 118 of the client device 108.
  • the operator 118 inputs a set of requirements that are stored in the memory of the edge device 110.
  • the set of requirements may be input by the operator 118 to the client device 108.
  • the set of requirements may comprise information pertaining to one or more parameters for predicting the condition of the one or more assets 102.
  • the set of requirements may include conditions for estimating the remaining useful life of the one or more assets 102.
  • the set of requirements may include information for continuous monitoring of the one or more assets 102.
  • the set of requirements may include information for calculating efficiency or metrics of performance of the one or more assets 102.
  • the TS verbalization backend layer 421 receives the information of the one or more assets 102 from the SQL database 409.
  • the information may include sensors, KPIs, and any other available information associated with the one or more assets 102.
  • the data source API 405 is configured for acquiring data from the sensing units 104.
  • the data source API 405 reads the multiple data sources and stores the acquired data in the influx database 411.
  • the data acquisition module 423 obtains the set of requirements for managing the one or more assets 102. Further, the data acquisition module 423 acquires the time series data from the influx database 411 and the information of the one or more assets 102 from the TS verbalization backend layer 421.
  • the information extraction module 425 is configured for extracting feature information associated with the selected one or more assets 105 based on the received data by the data acquisition module 423.
  • the features may be, for example, a pump temperature or a pump pressure. A pressure or a temperature may therefore be the feature to be verbalized.
  • the knowledge base 106 refers to the database comprising data and information pertaining to the one or more assets 102 in the form of the knowledge graph.
  • the knowledge base 106 is further connected to an internet server 413 to use features of AI tools via the internet.
  • the model management API 407 manages a plurality of AI models stored in the memory unit 204.
  • the ML model selection module 415 selects the AI model among the plurality of AI models stored in the memory unit 204.
  • the verbalization module 427 manages, monitors, and detects anomalies and recommends actions using the AI model.
  • the verbalization module may generate the natural language descriptions.
  • the generated natural language descriptions may include a short-term verbalization summary or a long-term verbalization summary based on the input from the operator 118.
  • the output module 419 is configured to provide the generated natural language descriptions that include the short-term verbalization summary or the long-term verbalization summary to the operator 118.
  • the short-term verbalization summary may include KPI trends for a short duration, variance information in the time series data, outlier factor in the time series data, health information of the asset, information of control parameter for the asset, alarms, the one or more insights, KPI forecasts, or optimization result.
  • the long-term verbalization summary may include health degradation data, downtime information, or a summary of the KPI of the last month.
  • the AI tool interface 429 uses the LLM 106b and AI tools to compare the generated summary with a query given by the operator 118.
  • the external UI 417 presents the answer to the query on the client device 108.
  • FIG 5 illustrates a flow chart of a method 500 for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention.
  • the method 500 includes a series of operation steps 502 through 512 performed by the processing unit 202 of the system 100.
  • the processing unit 202 receives the time series data for the one or more variables measured by the one or more sensors of the sensing unit 104.
  • the one or more variables are associated with the one or more assets 102. Further, the processing unit 202 may split the time series data into pre-defined time segments based on the predefined time interval.
  • a chemical dye factory may install a network of the one or more sensors 104 to monitor various parameters critical to operations of the chemical dye factory.
  • the one or more sensors 104 may continuously collect the time series data representing variables like temperature, pressure, and chemical concentrations across different stages of the production process in the chemical dye factory.
  • the processing unit 202 in the chemical dye factory receives the time series data transmitted by the one or more sensors 104 throughout the facility. This time series data may be obtained in real-time and reflects the current operational status of the dye factory.
  • each variable in the time series data may correspond to a specific asset or piece of equipment within the dye factory.
  • temperature readings may come from sensors installed in reactors, pressure measurements from valves, and chemical concentrations from analysers.
  • the processing unit 202 may split the time series data into pre-defined time segments based on the predefined time interval. For instance, the time series data may be segmented into hourly intervals, each representing the measurements collected over the past hour. Thus, if the predefined time interval is one hour, the processing unit 202 segments the time series data into hourly segments. For each segment, the processing unit 202 may collect and aggregate the measurements recorded by the one or more sensors 104 during that hour.
  • the splitting of the time series data enables the processing unit 202 to efficiently identify trends, detect deviations, and generate insights on an hourly basis.
  • this provides a structured approach to monitoring the chemical dye factory’s performance over time and facilitates the detection of changes or abnormalities within each time segment.
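  • the hourly segmentation and aggregation described above could be sketched as follows. The function `split_into_segments`, the alignment of segments to midnight, and the use of the mean as the aggregate are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def split_into_segments(samples, interval):
    """Group (timestamp, value) samples into fixed-length time segments.

    Each segment is keyed by its start time, aligned to multiples of
    `interval` counted from midnight of the first sample's day.
    """
    if not samples:
        return {}
    samples = sorted(samples)
    origin = samples[0][0].replace(hour=0, minute=0, second=0, microsecond=0)
    segments = defaultdict(list)
    for ts, value in samples:
        index = (ts - origin) // interval  # timedelta // timedelta gives an int
        segments[origin + index * interval].append(value)
    return dict(segments)


# Hypothetical reactor temperatures in degrees Celsius.
readings = [
    (datetime(2024, 10, 17, 7, 10), 1100.0),
    (datetime(2024, 10, 17, 7, 40), 1200.0),
    (datetime(2024, 10, 17, 8, 5), 1105.0),
]
segments = split_into_segments(readings, timedelta(hours=1))
hourly_mean = {start: mean(values) for start, values in segments.items()}
print(hourly_mean)
```

  • a batch process would more likely be split at batch boundaries than at clock intervals; this sketch covers only the fixed-interval case.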
  • the flow of the method 500 now proceeds to step 504.
  • the processing unit 202 detects, by the one or more processors, a deviation in the time series data based on a comparison with a predefined baseline for each variable of the one or more variables in the time series data using the deviation detection model.
  • the deviation detection model may be among the one or more AI models 108, trained to recognize patterns and deviations in the time series data.
  • the deviation detection model may be configured to compare the real-time measurements with the predefined baselines established for each variable.
  • if the observed values deviate significantly from the expected ranges included in the predefined baselines, the deviation detection model may flag such instances as deviations.
  • the predefined baseline for each variable of the one or more variables may be determined based on domain expertise knowledge available in the knowledge base 106a.
  • the knowledge base 106a comprises historical data, operational guidelines, and expert insights gathered from years of experience in the chemical dye processing. If the baseline temperature for a reactor is 80°C based on historical data and expert recommendations, the deviation detection model may identify any deviations from this baseline temperature during production. Further, over time, as the chemical dye factory’s processes evolve or equipment undergoes maintenance or upgrades, the baselines may be adjusted by the users accordingly to reflect changes in operational norms.
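  • a baseline that combines historical data with an expert-supplied range could be sketched as below. The rule of mean plus or minus three standard deviations, the precedence given to the expert range, and the names `derive_baseline` and `is_deviation` are assumptions; the specification does not state how the baseline is computed.

```python
from statistics import mean, stdev


def derive_baseline(history, k=3.0, expert_range=None):
    """Return a (low, high) band describing normal values of a variable.

    An expert-supplied range takes precedence; otherwise the band is
    mean +/- k sample standard deviations of the historical readings.
    """
    if expert_range is not None:
        low, high = expert_range
        if low > high:
            raise ValueError("expert_range must be (low, high)")
        return (low, high)
    if len(history) < 2:
        raise ValueError("need at least two historical readings")
    mu = mean(history)
    sigma = stdev(history)
    return (mu - k * sigma, mu + k * sigma)


def is_deviation(value, baseline):
    """True when the value lies outside the baseline band."""
    low, high = baseline
    return value < low or value > high


# Hypothetical reactor temperatures in degrees Celsius.
history = [74.0, 76.0, 75.0, 73.0, 77.0, 75.0, 74.0, 76.0]
statistical = derive_baseline(history)
expert = derive_baseline(history, expert_range=(70.0, 80.0))
print(statistical, is_deviation(90.0, statistical), is_deviation(79.5, expert))
```

  • re-running `derive_baseline` on newer history, or passing an updated expert range, is one simple way to reflect changes in operational norms after maintenance or upgrades.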
  • the deviation detection model continuously analyses the time series data in real time to detect deviations or anomalies as soon as they occur in the technical installation.
  • the early detection enables prompt intervention by the technical installation operators, minimizing the potential impact of abnormal conditions on production efficiency, product quality, and safety.
  • the deviation detection model may allow for the establishment of predefined baselines tailored to the specific operational requirements and characteristics of the technical installation, in the example scenario such as the chemical dye factory.
  • the deviation detection model may adapt and refine its detection algorithms based on feedback from detected deviations and their outcomes.
  • the deviation detection model may learn from past incidents, identify recurring patterns, and continuously improve its ability to distinguish between normal variations and abnormal events.
  • the deviation detection model may be able to continuously monitor a wide range of process variables in real time and is thus well suited for large-scale industrial environments like chemical dye factories.
  • the deviation detection model may scale seamlessly to accommodate additional sensors, variables, and production units as the plant expands or undergoes modifications, ensuring comprehensive coverage of all critical processes.
  • determining relationships between variables of the detected deviations may include identifying the dependencies and causalities between the variables and determining the relationships between the variables based on the identified dependencies.
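  • as one simple, hedged illustration of finding related variables, a Pearson correlation over the same time segment could be used. Correlation indicates association only and does not establish the causalities mentioned above, which would still come from the knowledge base. The names `pearson` and `related_variables`, the 0.8 threshold, and the sample values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no linear relationship
    return cov / sqrt(vx * vy)


def related_variables(target, others, min_abs_r=0.8):
    """Return {name: r} for variables whose |r| with target meets the threshold."""
    result = {}
    for name, series in others.items():
        r = pearson(target, series)
        if abs(r) >= min_abs_r:
            result[name] = r
    return result


# Hypothetical furnace readings over one segment.
temperature = [1100.0, 1110.0, 1150.0, 1200.0]
others = {
    "gas_flow": [50.0, 52.0, 60.0, 70.0],
    "pressure": [2.0, 2.1, 1.9, 2.0],
}
related = related_variables(temperature, others)
print(related)
```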
  • the flow of the method 500 now proceeds to step 508.
  • the processing unit 202 classifies the detected deviations of the time series data as one of the major deviation, the minor deviation, or no deviation using statistical techniques based on the determined relationships and a comparison of the detected deviation with the baseline.
  • the processing unit 202 may classify the deviation using the one or more AI models 108.
  • the deviation may be classified not only utilizing statistical techniques but also contextual analysis, domain expertise, and other relevant factors. For instance, in addition to statistical techniques, contextual factors such as operational constraints, equipment specifications, and historical performance data may also influence the classification of the time series data.
  • the temperature inside a reactor of the chemical dye factory may remain within a predefined range, for instance between 70°C and 80°C.
  • the processing unit 202 may continuously collect the time series data from temperature sensors installed within the reactor. Further, the processing unit 202 may detect the deviation in the time series data, indicating a change in the reactor’s temperature outside the normal range. For instance, the temperature suddenly increases to 90°C, thus exceeding the predefined upper limit of 80°C.
  • the processing unit 202 may use statistical techniques to compare the detected deviation (i.e., 90°C) with the predefined baseline temperature range (i.e., 70°C - 80°C). Consequently, based on this comparison and the severity of the deviation, the processing unit 202 may classify the deviation as one of the major deviation, the minor deviation, or no deviation.
  • the deviation may be classified as the major deviation if the temperature exceeds the predefined threshold significantly, such as reaching 90°C.
  • the major deviations indicate potentially critical issues that require immediate attention.
  • the deviation may be classified as the minor deviation if the temperature falls slightly outside the normal range, such as reaching 81°C or 82°C.
  • the minor deviations may suggest deviations from normal operation but may not pose immediate risks to safety or production.
  • if the temperature remains within the predefined range, such as between 70°C and 80°C, no deviation is detected, and normal operation may continue.
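  • the three-way classification in the reactor example could be sketched with a simple threshold rule. The `minor_margin` of 5°C that separates a minor from a major deviation is an assumed value chosen so that 81°C and 82°C are minor and 90°C is major; the specification does not give the boundary, and the disclosed classification may also take relationships and context into account.

```python
def classify_deviation(value, low, high, minor_margin):
    """Classify a reading against a [low, high] baseline range.

    Readings outside the range by at most `minor_margin` are minor;
    anything further out is major.
    """
    if low > high or minor_margin < 0:
        raise ValueError("invalid baseline range or margin")
    if low <= value <= high:
        return "no deviation"
    excess = (low - value) if value < low else (value - high)
    return "minor deviation" if excess <= minor_margin else "major deviation"


# Reactor temperature range of 70°C to 80°C from the example above.
for reading in (75.0, 81.0, 82.0, 90.0):
    print(reading, classify_deviation(reading, 70.0, 80.0, minor_margin=5.0))
```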
  • the flow of the method 500 now proceeds to step 510.
  • the processing unit 202 extracts, using the industrial intelligence module 106 comprising the knowledge base 106a, the one or more recommendations for addressing the classified deviation, based on the determined relationships. Furthermore, the processing unit 202 extracts the set of templates associated with the detected deviation from the knowledge base 106a.
  • the set of templates comprises statistically relevant information capable of describing the behaviour of the time series data. Further, a detailed description of the various steps performed by the industrial intelligence module 106 is already provided in the description related to Figures 1-4 and is omitted herein for the sake of brevity.
  • the processing unit 202 may classify the deviation in the reactor’s temperature as the major deviation, indicating a critical issue requiring immediate attention.
  • the processing unit 202 may extract recommendations for addressing the major deviation.
  • the recommendations may include “Immediately shutting down the reactor to prevent further temperature escalation”, “Initiating emergency cooling procedures to bring the temperature back within the normal range”, and “Notifying maintenance personnel to inspect the reactor for potential malfunctions or leaks”.
  • the users of the chemical dye factory may use the extracted recommendations to guide immediate actions for addressing the major deviation, ensuring operational safety and minimizing production disruptions.
  • the set of templates may provide a structured framework for understanding the behaviour of temperature deviations in the reactor, facilitating informed decision-making and long-term process optimization efforts.
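  • at its simplest, extracting recommendations can be pictured as a lookup keyed by variable and deviation class. The dictionary layout, the key names, and the function `extract_recommendations` below are hypothetical; an actual knowledge base may be a knowledge graph or document store rather than an in-memory mapping.

```python
# Hypothetical in-memory stand-in for the knowledge base; the entries are
# the recommendations quoted in the reactor example.
KNOWLEDGE_BASE = {
    ("reactor_temperature", "major deviation"): [
        "Immediately shutting down the reactor to prevent further temperature escalation",
        "Initiating emergency cooling procedures to bring the temperature back within the normal range",
        "Notifying maintenance personnel to inspect the reactor for potential malfunctions or leaks",
    ],
}


def extract_recommendations(knowledge_base, variable, severity):
    """Return the stored recommendations, or an empty list when none exist."""
    return list(knowledge_base.get((variable, severity), []))


actions = extract_recommendations(KNOWLEDGE_BASE, "reactor_temperature", "major deviation")
for action in actions:
    print("-", action)
```

  • returning an empty list for an unknown combination lets the calling code decide whether to fall back to a generic alert or escalate to an operator.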
  • the flow of the method 500 now proceeds to step 512.
  • the processing unit 202 generates, using the industrial intelligence module 106, the natural language descriptions of the time series data based on the extracted one or more recommendations.
  • the natural language descriptions of the time series data may be based on the predefined template and include the summary for the major deviation and the minor deviation. Furthermore, the natural language descriptions may be generated for each time segment of the pre-defined time segments. Furthermore, the generated natural language descriptions may be presented on the display screen to provide an alert to the user.
  • the processing unit 202 extracts recommendations for addressing the major deviation in the reactor’s temperature, such as initiating emergency cooling procedures and notifying maintenance personnel.
  • the processing unit 202 may utilize the industrial intelligence module 106, to generate the natural language descriptions of the time series data based on the extracted recommendations.
  • the natural language descriptions may be structured according to the predefined templates tailored to the reactor temperature control, ensuring consistency and clarity in the presented information.
  • the natural language description comprises a summary of the detected major deviation, highlighting the temperature spike to 90°C and the recommended actions for mitigating the issue.
  • the natural language description comprises an overview of historical trends in the reactor temperature fluctuations, including past instances of major deviations and their corresponding responses.
  • the processing unit 202 may generate the natural language descriptions for each time segment of the pre-defined intervals, allowing operators of the chemical dye factory to track the progression of temperature fluctuations in the reactor over time.
  • the natural language descriptions may be generated for hourly intervals, providing insights into temperature trends and deviations during each hour of operation.
  • the processing unit 202 may present the generated natural language descriptions on the display screen, such as in the chemical dye factory’s control room to provide alerts to the users.
  • the operators may quickly review the natural language descriptions to understand the current status of the reactor temperature control, assess the severity of deviations, and take appropriate actions as needed.
  • the alerts may include visual indicators, audible alarms, and text notifications to ensure timely response to critical deviations and maintain operational safety and efficiency.
  • a use case scenario for the disclosed method may be in an automobile logistics centre.
  • the process involves a lot of tightly coupled equipment and a simple anomaly/deviation in one piece of equipment could cause a major production delay.
  • These processes require specific equipment (like Robotic Arms) to perform specific operations at a specific point in time.
  • the normal/baseline actions of this equipment may be easily defined from some SOPs/ User Manuals.
  • the disclosed method provides the operator with a textual description of the problem covering the causal analysis instead of a simple threshold-based alarm. The operator may easily comprehend the textual description, and the chances of an oversight are thereby reduced. Providing the textual description also reduces the heavy reliance on experienced operators to evaluate and resolve the problem.
  • the present invention provides various technical advancements based on the key features discussed above.
  • the disclosed method allows users to understand the patterns, trends, and insights of the time series data without requiring specialized knowledge or expertise in data analysis. Further, the disclosed method enables effective communication between data analysts and decision-makers by describing the key findings, important trends, or anomalies in natural language that facilitates clear and concise communication, enabling stakeholders to make informed decisions based on the information provided. Further, the disclosed method enhances the interpretability and transparency of time series models in critical application areas such as finance, healthcare, or fraud detection. Furthermore, the verbalization helps decision-makers understand the potential impact of different actions or scenarios thereby providing support in strategic planning, risk assessment, resource allocation, and other decision-making processes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Automation & Control Theory (AREA)
  • Operations Research (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed is a computer-implemented method for generating natural language descriptions of time series data. The method includes receiving the time series data for one or more variables measured by one or more sensors. Further, the method includes detecting a deviation in the time series data based on a comparison with predefined baseline for each variable in the time series data using a deviation detection model. Further, the method includes determining relationships between variables. Further, the method includes classifying the detected deviations as one of a major deviation, a minor deviation, or no deviation. Furthermore, the method includes extracting one or more recommendations for addressing the classified deviations. Furthermore, the method includes generating the natural language descriptions that include a summary of the major deviation and the minor deviation based on the generated set of templates.

Description

SYSTEMS AND METHODS FOR GENERATING NATURAL LANGUAGE DESCRIPTIONS OF TIME SERIES DATA
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to and the benefit of patent application number 202441032988 titled “SYSTEM AND METHOD FOR ANSWERING A NATURAL LANGUAGE USER QUERY IN AN INDUSTRIAL ENVIRONMENT”, filed in the Indian Patent Office on April 25, 2024. The specification of the above referenced patent application is incorporated herein by reference in its entirety.
The present invention relates to the field of data analytics systems and more particularly relates to systems and methods for generating natural language descriptions of time series data using an industrial intelligence module having a large language model (LLM) and a knowledge base.
In an industrial environment, numerous assets are employed for performing different processes. The assets may include mechanical systems, electromechanical systems, electronic systems, and other systems. These assets may have different attributes such as availability, performance, behaviour, efficiency, maintainability, reliability, serviceability, and other attributes. These attributes of each of the assets may affect the overall performance of the assets in the industrial environment. In order to monitor the overall performance of the assets, a plant operator at any point in time must monitor multiple sensors that are providing time series data critical to the process.
The operator is required to interpret the time series data based on his domain understanding to identify and comprehend trends that could lead to potential problems in the asset within the industrial environment. The domain understanding of the trends in the time series data that could lead to potential anomalies is gained through experience and is not documented normally. The operator with limited experience in these systems might not be able to identify these trends and even experienced operators might miss some of these trends in the early stages as they are focused on monitoring several variables. This problem arises as there are several limitations in comprehending the time series data. The time series data is complex and difficult to interpret. Also, the time series data is continuously generated and therefore the volume of the time series data is huge. Further, as the systems include multiple sensors, the dimensions of the time series data grow exponentially. Furthermore, as the time series data is captured in the form of unstructured data, the data comprehension difficulty is further enhanced.
Some conventional prediction systems use AI models which require an expert operator to interpret the time series data. However, these conventional prediction systems do not translate results into a form that eases the decision-making for the operator. Such conventional prediction systems use domain knowledge which is in textual form, and associating the domain knowledge with the time series data is not feasible because of the multimodality of the data. Further, another limitation of these conventional prediction systems is the large amount of time taken by the expert operator, working manually or via an AI system, to identify the anomaly and/or derive insights/KPIs for assets in the plant and then apply the domain knowledge to feed in the information. Further, only a small number of expert operators are available to draw inferences that turn the time series data into knowledge.
Some conventional prediction systems identify change from a particular operating range of the equipment and flag that as an anomaly using alarms without the aspect of underlying context and collating the results in textual form providing better inferences. These conventional prediction systems mostly provide insights based on the time series data and algorithms that operate on this time series data to identify the problem. However, these conventional prediction systems lack key insights that could be derived from the time series data for anomaly detection as well as resolution.
In light of the above, there exists a need for an improved system and method for generating natural language descriptions of the time series in a technical installation.
Therefore, it is an object of the present invention to provide a system, apparatus, and method for analyzing the time series data and generating natural language descriptions of the time series data in the technical installation.
Throughout the present disclosure, the term “industrial environment” and “technical installation” as used herein interchangeably refers to a technical set-up with a plurality of assets such as a power plant, wind farm, power grid, manufacturing facility, process plants, buildings (residential or non-residential areas) and so on. Examples of an industrial environment or technical installation may include a complex industrial set-up such as a manufacturing facility, process plants, storage facility, transportation. It will be appreciated that the industrial environment may refer to any vertical and/or domain in business. For example, different verticals treated as industrial environment for the purpose of this disclosure, may include but not limited to automobiles, textiles, every distribution, energy production, buildings, factories, and medical equipment. For the sake of simplicity and brevity of the invention, the invention is explained with respect to industrial environment. A person skilled in the art would understand that the concepts disclosed herein can be applied across multiple domains such as finance, marketing, legal, medicine, etc. Therefore, the claims appended herein shall be construed limiting to the industrial environment only.
Throughout the present disclosure, the term “industrial intelligence module” refers to a component of one or more processors configured to process, analyse, and interpret data in the technical installation. The industrial intelligence module may alternatively be referred to as the industrial intelligence layer (IIL), within the scope of the present invention. For the context of the present invention, the meaning of the term “industrial intelligence module” is consistent with the system disclosed in priority application 202441032988. The industrial intelligence module may utilize advanced artificial intelligence techniques and tools to generate actionable insights and recommendations based on the received time series data. The industrial intelligence module provides a comprehensive solution for monitoring, analysing, and improving industrial processes. The industrial intelligence module generates understandable and actionable insights, ultimately helping to optimize the performance and reliability of the technical installation.
Throughout the present disclosure, the term “sensors” refers to devices or instruments used to measure various physical, chemical, or environmental parameters within the technical installation. The sensors collect data over time, which forms the basis of the time series data used for analysis. The “sensors” comprise position sensors, rotary encoders, dynamometers, proximity sensors, current sensors, accelerometers, temperature sensors, acoustic sensors, and voltage sensors associated with the assets in the technical installation that provide data related to the assets.
Throughout the present disclosure, the term “assets” or “one or more assets” may refer to any device, system, instrument or machinery manufactured or used in an industry that may be employed for performing an operation. In some cases, assets may also include any devices or instruments deployed or functioning in a non-industrial environment such as buildings. Examples of assets include any machinery in a technical system or technical installation/facility such as motors, gears, bearings, shafts, switchgears, rotors, circuit breakers, protection devices, remote terminal units, transformers, reactors, disconnectors, gear-drive, gradient coils, magnet, radio frequency coils, appliances, electronic devices, chillers, pumps, heat exchangers, cooling towers, air compressors, boilers, fluid bed driers, coating machines, carbonation towers etc.
Throughout the present disclosure, the term “time series data” refers to a sequence of data points collected or recorded at successive points in time, usually at uniform intervals. The time series data may be used to monitor and analyse the behaviour of various variables within the technical installation over time. The time series data may be fundamental for analysing trends, detecting anomalies, and making predictions in a technical installation. By collecting and examining this data over time, the system can identify deviations from expected behaviour, determine relationships between variables, classify the severity of issues, and generate natural language descriptions and recommendations for addressing these deviations.
Throughout the present disclosure, the term “one or more variables” refers to different measurable quantities or attributes within the technical installation that are being monitored by sensors. The one or more variables represent the specific aspects of the system’s operation that can change over time and provide insight into the system’s performance, condition, and any potential issues. In an example, the one or more variables may be temperature, pressure, flow rate, or humidity within the technical installation. The one or more variables may be crucial for understanding the operational state of the technical installation. Monitoring of the one or more variables in real time via sensors may enable the system to collect time series data, detect deviations, determine relationships between variables, classify the severity of deviations, and generate natural language descriptions and recommendations to address any issues. The comprehensive monitoring of the one or more variables may ensure the efficient, safe, and reliable operation of the technical installation.
Throughout the present disclosure, the term “deviation” refers to any anomaly or unexpected change in the measured variables that suggests a potential issue or abnormal condition within the technical installation. Detecting these deviations is critical for maintaining the safety, efficiency, and reliability of the system. In an example, detection of the deviations may be used to trigger further analysis, determine relationships, classify the severity of the deviations, and generate natural language descriptions and recommendations for addressing them. The deviation may be classified into major deviation and minor deviation.
Throughout the present disclosure, the term “knowledge base” as used herein refers to a heterogeneous database comprising information pertaining to the domain and the industrial environment. The knowledge base is a centralized repository or database that contains domain-specific information, asset-specific information, process-specific information, sensor-specific information, etc. The knowledge base also contains information relevant to the specific industry, sector, or domain in which the organization operates. This may include technical specifications, manufacturing processes, equipment manuals, safety procedures, regulatory requirements, and industry standards. The knowledge base also comprises information retrieved from one or more AI models deployed in the industrial environment. The knowledge base also comprises information extracted from documentation of performance parameters, efficiency parameters, anomalies, root cause analysis, prediction, and resolution anomalies, etc. The knowledge base also comprises information extracted from knowledge graphs comprising information of the plant and its assets in a hierarchical manner. The knowledge base also comprises images, videos, audios etc. having information of the industrial environment.
Throughout the present disclosure, the term “natural language description” refers to a textual output from the industrial intelligence module after analysing and processing the time series data into human-readable text. The natural language description provides the technical analysis and findings of the industrial intelligence module articulated in a clear, concise, and understandable manner for users who may not be familiar with the technical details. In an example, if the time series data from a sensor shows a major deviation in temperature, the natural language description might be something like: "The temperature sensor has recorded a significant deviation from the baseline, indicating a potential overheating issue. It is recommended to inspect the cooling system and ensure proper ventilation."
Throughout the present disclosure, the term “user” as used herein refers to a human interacting with the system for responses. The user may also refer to a virtual assistant or co-pilot capable of querying the system with a natural language query.
Throughout the present disclosure, the term “one or more AI models” refers to a plurality of neural networks. Examples of neural networks include, but are not limited to, convolutional neural network (CNN), deep neural network (DNN), recurrent neural network (RNN), and Restricted Boltzmann Machine (RBM). The learning technique for training each AI model uses a plurality of learning data to cause, allow, or control the system 100 to make a determination or analysis. Examples of learning techniques include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning. At least one of a plurality of CNN, DNN, RNN, RBM models and the like may be implemented to thereby achieve execution of the present subject matter’s mechanism through the AI models.
Throughout the present disclosure, the term “Large Language Model (LLM)” refers to a type of artificial intelligence model designed to understand and generate human-like text based on vast amounts of natural language data. These models are built using deep learning architectures, particularly transformer-based architectures. LLMs are trained on massive datasets containing billions or even trillions of words from diverse sources such as books, articles, websites, and other textual content. The large-scale training enables the model to learn complex patterns, structures, and nuances of human language. Transformers employ self-attention mechanisms to process input sequences and capture long-range dependencies, enabling the model to generate coherent and contextually relevant text. LLMs undergo two main stages of training: pre-training and fine-tuning. During pre-training, the model is trained on a large corpus of text data using unsupervised learning techniques to learn general language patterns and semantics. In fine-tuning, the pre-trained model is further optimized on domain-specific or task-specific datasets to adapt its knowledge and capabilities to specific applications. LLMs have the ability to generate human-like text based on given prompts or inputs. They can produce coherent paragraphs, articles, stories, code, or responses to questions by predicting the next words or tokens in the sequence based on the context provided. LLMs exhibit a strong understanding of context and semantics in natural language. They can infer meaning, resolve ambiguity, and generate text that is contextually relevant and coherent with the given input.
The object of the present invention is achieved by a computer-implemented method for generating natural language descriptions of time series data in a system. The method comprises receiving, by one or more processors, the time series data for one or more variables measured by one or more sensors in the technical installation.
In an embodiment, the method further comprises periodically splitting the time series data into pre-defined time segments based on a predefined time interval. Advantageously, by dividing the time series data into smaller, manageable segments, it becomes easier to analyze and interpret the data. In an example, patterns, trends, and anomalies that may not be evident in the full dataset (time series data) may be more readily identified in smaller time intervals. Further, advantageously, processing split time series data may reduce the computational load on the system, leading to faster processing and more efficient use of resources, which is especially important when dealing with large datasets.
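As an illustration only, the splitting step may be sketched as follows. This is a minimal sketch and not the disclosed implementation: the function name `split_into_segments`, the representation of samples as `(timestamp, value)` pairs, and the five-minute interval are all assumptions introduced for the example.

```python
from datetime import datetime, timedelta

def split_into_segments(samples, interval):
    """Group time-sorted (timestamp, value) samples into consecutive
    windows of length `interval` (a timedelta). Empty windows are skipped."""
    if not samples:
        return []
    segments = []
    window_start = samples[0][0]  # first window starts at the first sample
    current = []
    for ts, value in samples:
        # Advance the window until this sample falls inside it.
        while ts >= window_start + interval:
            if current:
                segments.append(current)
                current = []
            window_start += interval
        current.append((ts, value))
    if current:
        segments.append(current)
    return segments

# Hypothetical readings taken at minutes 0, 1, 2, 5, 6 and 11.
t0 = datetime(2024, 1, 1, 0, 0)
samples = [(t0 + timedelta(minutes=m), 20.0 + m) for m in (0, 1, 2, 5, 6, 11)]
segments = split_into_segments(samples, timedelta(minutes=5))
print([len(s) for s in segments])
```

Each returned segment can then be analysed independently, which is the property the embodiment relies on to limit the amount of data processed at once.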
Further, the method comprises detecting, by one or more processors, a deviation in the time series data based on a comparison with a predefined baseline for each variable of the one or more variables in the time series data using a deviation detection model.
In an embodiment, the method further comprises determining the predefined baseline for each variable of the one or more variables based on domain expertise knowledge available in the knowledge base. Advantageously, the predefined baseline corresponds to realistic and expected values derived from domain expertise incorporating contextual knowledge about the processes and variables. Thus, contextual knowledge ensures that the predefined baseline may account for normal operational variations and conditions, reducing false positives in deviation detection. Therefore, predefined baseline based on domain expertise knowledge may distinguish between normal operational fluctuations and true anomalies. Advantageously, such distinction may improve sensitivity of the deviation detection model, to identify significant deviations that may indicate underlying issues or opportunities for improvement.
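The comparison against a predefined baseline may be pictured with the following minimal sketch. It assumes, purely for illustration, that the baseline for each variable is a fixed lower and upper bound; the disclosure does not prescribe this form, and the names `Baseline` and `detect_deviations`, the variable names, and the numeric values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Baseline:
    """Assumed form of a predefined baseline: an acceptable value band."""
    low: float
    high: float

def detect_deviations(readings, baselines):
    """Return {variable: [(index, value, excess), ...]} for every sample
    outside its baseline band. `excess` is the signed distance from the
    nearest bound (positive above `high`, negative below `low`)."""
    deviations = {}
    for variable, values in readings.items():
        band = baselines[variable]
        hits = []
        for i, v in enumerate(values):
            if v > band.high:
                hits.append((i, v, v - band.high))
            elif v < band.low:
                hits.append((i, v, v - band.low))
        if hits:
            deviations[variable] = hits
    return deviations

# Hypothetical baselines and readings.
baselines = {
    "temperature_c": Baseline(60.0, 80.0),
    "pressure_bar": Baseline(2.0, 4.0),
}
readings = {
    "temperature_c": [70.0, 82.5, 79.0, 55.0],
    "pressure_bar": [3.0, 3.5, 3.9, 2.1],
}
found = detect_deviations(readings, baselines)
print(found)
```

In this sketch only the temperature variable is reported, because all pressure samples stay within their band; a band derived from domain expertise is what keeps ordinary fluctuations from being flagged.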
Further, the method comprises determining, by the one or more processors, relationships between variables of the detected deviations.
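One simple way to approximate relationships between variables is pairwise Pearson correlation, sketched below. This is a stand-in technique chosen for illustration: the disclosure does not state which method is used, correlation alone does not establish causality, and the helper names `pearson` and `related_pairs`, the 0.8 threshold, and the sample data are assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no linear relationship
    return cov / sqrt(vx * vy)

def related_pairs(series, threshold=0.8):
    """Return (var_a, var_b, r) for pairs whose |r| meets the threshold."""
    names = sorted(series)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(series[a], series[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs

# Hypothetical variables sampled at the same instants.
series = {
    "flow": [1, 2, 3, 4, 5],
    "pressure": [2, 4, 6, 8, 10],
    "humidity": [5, 1, 4, 2, 3],
}
print(related_pairs(series))
```

Pairs that pass the threshold are candidates for the dependency analysis described in the text; identifying actual causal links would require additional domain knowledge.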
In an embodiment, the method further includes identifying the dependencies and causalities between the variables and determining relationships between variables based on the identification of dependencies.
Further, the method comprises classifying, by the one or more processors, the detected deviations as one of a major deviation, a minor deviation, or a normal deviation using statistical techniques and an AI model trained on specific domain knowledge. The classification may be based on the determined relationships and a comparison of the detected deviation with the baseline. Advantageously, the statistical techniques provide a robust foundation for detecting and classifying deviations by analysing historical data patterns and relationships. This ensures baseline accuracy for detecting deviations. Furthermore, advantageously, with the trained AI model, the method includes leveraging domain-specific knowledge to refine the classification process. Thus, the classification may be easily adaptable to various domains and applications, enhancing the method’s scalability and flexibility to different industrial scenarios.
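The statistical part of the classification may be illustrated with a z-score rule, as in the minimal sketch below. The thresholds of 2 and 3 standard deviations, the function name `classify_deviation`, and the sample history are assumptions made for the example; the trained AI model and the use of determined relationships described in the text are not modelled here.

```python
from statistics import mean, stdev

def classify_deviation(value, history, minor_z=2.0, major_z=3.0):
    """Label `value` as 'normal', 'minor' or 'major' by how many sample
    standard deviations it lies from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Degenerate history: any departure from the constant is flagged.
        return "normal" if value == mu else "major"
    z = abs(value - mu) / sigma
    if z >= major_z:
        return "major"
    if z >= minor_z:
        return "minor"
    return "normal"

# Hypothetical temperature history (mean 70.0, sample std about 0.8).
history = [70.0, 71.0, 69.0, 70.5, 69.5, 70.0, 71.0, 69.0]
for reading in (70.5, 72.0, 75.0):
    print(reading, classify_deviation(reading, history))
```

Keeping the thresholds as parameters reflects the point made in the text that the classification should be adaptable to different domains.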
Furthermore, the method comprises extracting, by the one or more processors, using the industrial intelligence module comprising a large language model and a knowledge base, one or more recommendations for addressing the classified deviation, based on the determined relationships. Advantageously, the one or more recommendations may be tailored to the specific context and domain, leveraging the knowledge base and large language model, thus ensuring that the suggested actions are relevant and effective in addressing the deviations identified.
In an embodiment, the method further comprises extracting a set of templates associated with the detected deviation from the knowledge base, wherein the set of templates comprises statistically relevant information capable of describing the behaviour of the time series data.
Furthermore, the method comprises generating, by the one or more processors using the industrial intelligence module, natural language descriptions of the time series data based on the extracted one or more recommendations.
In an embodiment, the method comprises generating the natural language descriptions of the time series data based on a predefined template, and the generated descriptions include a summary for the major deviation and the minor deviation. Advantageously, the predefined template ensures that the generated natural language descriptions follow a consistent standard format. This standardization makes it easier for users to understand and compare reports across different time periods and datasets. Furthermore, advantageously, the summaries corresponding to both the major deviation and the minor deviation may ensure that all significant aspects of the time series data are covered, consequently providing a clear and detailed overview of the deviations and aiding in thorough analysis and decision-making. Furthermore, in another aspect, summarizing major and minor deviations separately may assist in prioritizing issues. For instance, major deviations may be highlighted for immediate attention, while minor deviations may be monitored and addressed as needed, ensuring that critical problems may not be overlooked.
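Template-based generation of the summaries may be sketched as follows. This is a simplified illustration using fixed string templates; the template wording, the field names, the `describe` function, and the example deviations and recommendations are hypothetical, and the role of the large language model and the knowledge base is not represented.

```python
from string import Template

# Assumed templates, one per severity level.
TEMPLATES = {
    "major": Template(
        "MAJOR: $variable reached $value $unit, outside the baseline of "
        "$low to $high $unit. Recommended action: $action"
    ),
    "minor": Template(
        "MINOR: $variable reached $value $unit, slightly outside the "
        "baseline of $low to $high $unit. Suggested follow-up: $action"
    ),
}

def describe(deviations):
    """Render classified deviations as text, major deviations first."""
    lines = []
    for severity in ("major", "minor"):
        for d in deviations:
            if d["severity"] == severity:
                lines.append(TEMPLATES[severity].substitute(d))
    if not lines:
        return "No major or minor deviations were detected in this segment."
    return "\n".join(lines)

# Hypothetical classified deviations for one time segment.
example = [
    {"severity": "minor", "variable": "pressure", "value": 4.2, "unit": "bar",
     "low": 2.0, "high": 4.0, "action": "monitor the relief valve."},
    {"severity": "major", "variable": "temperature", "value": 95.0, "unit": "C",
     "low": 60.0, "high": 80.0, "action": "inspect the cooling system."},
]
report = describe(example)
print(report)
```

Ordering the output by severity mirrors the prioritization described above, where major deviations are surfaced for immediate attention ahead of minor ones.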
In an embodiment, the natural language descriptions are generated for each time segment of the pre-defined time segments, and the generated description is displayed on a display screen to provide an alert. Advantageously, generating the natural language descriptions for each time segment and displaying them as alerts may enable real-time monitoring, for instance by allowing operators to quickly identify and respond to deviations or anomalies as they occur, thus enhancing the responsiveness and timeliness of interventions.
In an embodiment, the method comprises providing the summary for the major deviation and the minor deviation as a textual description of determined anomalies in the time series data. Advantageously, the textual descriptions provide a clear and concise explanation of the deviations, for example, anomalies detected in the time series data. Consequently, the textual descriptions help users quickly understand the nature and significance of the deviations without needing to interpret complex data visualizations or raw data. Furthermore, advantageously, the textual descriptions are accessible to a broader audience, including those who may not have technical expertise in data analysis. Thus, the broader accessibility may ensure that all relevant stakeholders may understand and act on the information (deviation summary). Furthermore, describing anomalies in the textual description may allow for the inclusion of contextual information which may consequently assist in diagnosing issues and planning effective responses. The textual descriptions may be used as training materials for new operators and as the knowledge base for future reference.
In an embodiment, the method comprises analysing, by one or more processors, a status of the system at a particular point in time to determine anomalies in the time series data. Advantageously, analysing the status at specific points in time may allow for the immediate detection of anomalies, enabling quick responses to potential issues before any escalation. Furthermore, the method comprises implementing, by one or more processors, a sequence of troubleshooting queries to identify a root cause for the determined anomalies using the industrial intelligence module. Advantageously, the sequence of queries may streamline the root cause analysis, thus reducing the time and effort required to pinpoint underlying problems. Furthermore, the method includes identifying, by one or more processors, one or more insights, corrective actions, or preventive actions based on the identified root cause using the industrial intelligence module. Advantageously, the industrial intelligence module may integrate domain-specific knowledge and advanced analytical models to enhance the accuracy and relevance of the root cause analysis and subsequent recommendations. Consequently, the industrial intelligence module may consider a wide range of factors and data points, providing a more holistic view of the system and its anomalies.
The object of the present invention is also achieved by an apparatus for generating natural language descriptions of time series data. The apparatus comprises a memory and one or more processors communicatively coupled to the memory. The memory comprises programmable instructions executable by the one or more processors. The programmable instructions, when executed by the one or more processors, cause the one or more processors to perform one or more methods, as discussed throughout the present invention.
The object of the present invention is also achieved by a system for generating natural language descriptions of time series data in a technical installation. In an embodiment, the system includes the industrial intelligence module comprising the knowledge base. Further, the industrial intelligence module is communicatively coupled with the one or more AI models and the apparatus for generating descriptions of time series data in the technical installation based on performing one or more methods, as discussed throughout the present invention.
The object of the present invention is also achieved by a computer-program product being disclosed. The computer-program product has machine-readable instructions stored therein, that when executed by one or more processors, cause the one or more processors to perform one or more methods, as discussed throughout the present invention.
The object of the present invention is also achieved by a non-transitory computer-readable medium being disclosed. The non-transitory computer-readable medium is encoded with executable instructions, that when executed by one or more processors, cause the one or more processors to perform one or more methods, as discussed throughout the present invention.
To further clarify the advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
These and other features, aspects, and advantages of the present invention will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
FIG 1A, 1B illustrate a block diagram of a system for generating natural language descriptions of time series data, in accordance with an embodiment of the present invention;
FIG 2 illustrates a block diagram of the apparatus for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention;
FIG 3 illustrates a diagram depicting a process flow for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention;
FIG 4 illustrates a functional block diagram of the system for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention; and
FIG 5 illustrates a flow chart of a method for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention.
Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not have necessarily been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help to improve understanding of aspects of the present invention. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present invention so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
For the purpose of promoting an understanding of the principles of the invention, reference will now be made to the various embodiments and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates.
It will be understood by those skilled in the art that the foregoing general description and the following detailed description are explanatory of the invention and are not intended to be restrictive thereof.
Reference throughout this specification to “an aspect”, “another aspect” or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrase “in an embodiment”, “in another embodiment” and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
The terms “comprises”, “comprising”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by “comprises... a” does not, without more constraints, preclude the existence of other devices or other sub-systems or other elements or other structures or other components or additional devices or additional sub-systems or additional elements or additional structures or additional components.
FIG 1A illustrates a block diagram of a system 100 for generating natural language descriptions of time series data in a technical installation, in accordance with an embodiment of the present invention. The system includes one or more assets 102, a sensing unit 104, and an industrial intelligence layer 106 comprising a knowledge base 106a and a large language model (LLM) 106b. Further, the system includes one or more AI models 108, a client device 110, an edge device 112, a network 114, and an apparatus 116. The system 100 can be understood as an industrial intelligence layer (IIL), a chatbot or a query engine capable of assisting the users/plant operators with a structured response related to time series data.
The one or more assets 102 include mechanical systems, electromechanical systems, electronic systems, and other systems in an industrial environment such as a power plant, wind farm, power grid, manufacturing facility, process plants, and so on. Throughout the present disclosure, the term “assets” 102 refers to any device, system, instrument, or machinery manufactured or used in the industrial environment that may be employed for performing an operation. Examples of assets 102 include any machinery in a technical system or technical installation/facility such as motors, gears, bearings, shafts, switch gears, rotors, circuit breakers, protection devices, remote terminal units, transformers, reactors, disconnectors, gear-drive, gradient coils, magnet, radio frequency coils, appliances, electronic devices, chillers, pumps, heat exchangers, cooling towers, air compressors, boilers, fluid bed driers, coating machines, carbonation towers etc.
Throughout the present disclosure, the term sensing unit 104 may refer to one or more sensors for acquiring and transmitting time series data from the one or more assets 102. The sensing unit 104 may include, but is not limited to, position sensors, rotary encoders, dynamometers, proximity sensors, current sensors, accelerometers, temperature sensors, acoustic sensors, and voltage sensors. The sensing unit 104 may acquire, in real time, condition data indicative of one or more operating conditions of the one or more assets 102.
Throughout the present disclosure, the term one or more variables may refer to aspects that are being measured by the sensing unit 104. The one or more variables may include, but are not limited to, position, rotation, force, distance, electric current, acceleration, temperature, sound or vibrations, and voltage.
In an embodiment, the industrial intelligence module 106 may refer to a component of one or more processors configured to process, analyse, and interpret data in the technical installation. The industrial intelligence module may alternatively be referred to as the industrial intelligence layer (IIL), within the scope of the present disclosure. The industrial intelligence module may utilize advanced artificial intelligence techniques and tools to generate actionable insights and recommendations based on the received time series data. The industrial intelligence module provides a comprehensive solution for monitoring, analysing, and improving industrial processes. The industrial intelligence module generates understandable and actionable insights, ultimately helping to optimize the performance and reliability of the technical installation. The industrial intelligence module 106 may be communicatively coupled with the one or more AI models 108.
Throughout the present disclosure, the term ‘knowledge base' 106a may refer to a database comprising data and information pertaining to the one or more assets 102 in the form of a knowledge graph comprising a plurality of nodes linked to each other. The knowledge base 106a may include an algorithm or domain expert knowledge to identify normal, minor, and major deviations for measured time series data. The knowledge base 106a may also include historical data including information of past deviations in the time series data. The knowledge base 106a may also include historical analysis data including information of past analysis of the time series data. The knowledge base 106a may also include information corresponding to the one or more sensors in a plant and standard operating procedures in the plant. The knowledge base 106a may also include information of an external knowledge base or documentation to determine the normal, minor, and major deviation for the measured time series data. Furthermore, the knowledge base 106a may also have information extracted from the schematics of the plant including graphical representation of the process flow such as piping and instrumentation diagram, asset design sheets, and alike.
In an embodiment, the industrial intelligence module 106 comprises the knowledge base 106a which contains domain-specific information, best practices, historical data, and expert recommendations related to the technical installation. The knowledge base 106a may serve as a reference from which the industrial intelligence module 106 extracts one or more recommendations for addressing a deviation in the time series data. Thus, by leveraging the knowledge base 106a, the industrial intelligence module 106 may ensure that the one or more recommendations are based not only on real-time data but also on historical insights and expert knowledge, leading to more accurate and actionable recommendations.
In an embodiment, the one or more AI models 108 comprise a plurality of neural network layers. Examples of neural networks include, but are not limited to, convolutional neural network (CNN), deep neural network (DNN), recurrent neural network (RNN), and Restricted Boltzmann Machine (RBM). The one or more AI models 108 are configured to process and analyse large volumes of the time series data, to identify patterns, make predictions, and provide insights that are valuable for decision-making. In an example, the one or more AI models 108 may analyse the time series data collected from the one or more sensors 104 in the technical installation. The one or more sensors 104 measure various variables over time, such as temperature, pressure, or voltage, providing a continuous stream of data. The one or more AI models 108 are configured to detect deviations in the time series data by comparing the measured data with predefined baselines. Thus, the one or more AI models 108 may identify deviations from expected patterns. These deviations may indicate potential issues or anomalies in the operation of the technical installation. Furthermore, the one or more AI models 108 are configured to determine relationships between variables. Advantageously, based on the determined relationships, the system 100 may determine root causes of the deviations and any potential impact.
Furthermore, the one or more AI models 108 are configured to classify the deviations based on the determined relationships and the magnitude of the detected deviations. In an example, the one or more AI models 108 may classify a deviation as a major deviation, a minor deviation, or no deviation. Advantageously, the classification may assist the system 100 in prioritizing and focusing attention on the most critical issues of the technical installation.
In an embodiment, the industrial intelligence module 106 is communicatively coupled with the one or more AI models 108 and leverages the knowledge base 106a to extract one or more recommendations for addressing the classified deviations. The one or more recommendations comprise maintenance actions, operational adjustments, or other interventions aimed at resolving or mitigating the detected issues in the technical installation.
Furthermore, the industrial intelligence module 106 utilizes the large language model (LLM) 106b to generate the natural language descriptions of the time series data based on the extracted one or more recommendations. In an example, the LLM 106b may be trained on the domain knowledge pertinent to a particular industrial environment. Consequently, the LLM 106b understands the terminology, context, and nuances specific to that industry. For example, instead of presenting raw data or charts, the industrial intelligence module 106 utilizes the LLM 106b to generate a summary like “The temperature of Reactor has shown a consistent upward trend over the past 24 hours, indicating a possible need for cooling system adjustment”. Advantageously, generation of the natural language descriptions may allow for easy interpretation and communication of the insights derived from the time series data, thereby facilitating informed decision-making by users, operators, engineers, or other stakeholders involved in managing the technical installation.
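As a minimal illustration of how a relationship between variables might be quantified, the sketch below computes the Pearson correlation between a target variable and every other measured variable. The disclosure does not specify which relationship measure the AI models 108 use; the correlation measure, the function names, the 0.8 cut-off, and the sample readings are all assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no linear relationship
    return cov / sqrt(vx * vy)


def related_variables(series, target, min_abs_corr=0.8):
    """Return the variables whose |correlation| with `target` meets the cut-off.

    `min_abs_corr` is a hypothetical cut-off, not a value from the disclosure.
    """
    related = {}
    for name, values in series.items():
        if name == target:
            continue
        r = pearson(series[target], values)
        if abs(r) >= min_abs_corr:
            related[name] = r
    return related


# Illustrative readings only (not measured data).
readings = {
    "temperature": [1100, 1105, 1120, 1160, 1200],
    "gas_flow": [50, 51, 54, 61, 68],
    "pressure": [3.0, 3.1, 2.9, 3.0, 3.1],
}
related = related_variables(readings, "temperature")
print(sorted(related))
```

In this illustrative data set, gas flow rises with temperature while pressure does not, so only gas flow is reported as related. A correlation of this kind indicates association rather than causation; a root-cause determination would need further domain knowledge.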
Furthermore, in an embodiment, the LLM 106b is trained with the domain knowledge pertinent to the particular industrial environment or the technical installation, so as to understand and effectively operate within the context of the technical installation. Thus, the LLM 106b is trained on datasets that include the specific terminologies, processes, workflows, and common scenarios relevant to the technical installation. Consequently, the LLM 106b becomes adept at recognizing patterns, understanding context, and generating text that is both accurate and contextually appropriate for that specific field. Furthermore, training the LLM 106b with the domain knowledge advantageously yields outputs that are highly accurate and relevant to the technical installation, thus reducing the chances of misinterpretation or irrelevant suggestions. In an advantageous aspect, by being trained with the domain knowledge or the specific context in which it operates, the LLM 106b provides insights that are practical and actionable within the technical installation. Thus, the LLM 106b becomes proficient in the technical natural language used in the industry, ensuring clear and effective communication.
In an example scenario, in a chemical dye processing factory, operations may involve various complex processes such as mixing, heating, and chemical reactions, with stringent requirements for quality control and safety. The one or more assets used might include reactors, mixers, filtration units, and quality testing machines, all of which generate continuous streams of the time series data. Thus, in such a scenario, the LLM 106b may be trained for this specific environment, using data gathered from the factory, including sensor data, operational logs, maintenance records, and quality control reports. Further, the LLM 106b may also be trained using domain-specific documents such as technical manuals, safety protocols, process descriptions, and industry standards to help the LLM 106b understand the procedural and regulatory context.
Consequently, the one or more AI models 108 analyse the time series data to detect deviations and understand their implications, while the industrial intelligence module 106 enhances the analysis of the one or more AI models 108 by providing actionable recommendations and generating the natural language descriptions such as descriptive reports, thereby improving the overall intelligence and efficiency of managing the technical installation.
In an embodiment, an operator 118 provides a requirement for generating natural language descriptions of the time series data via the client device 110. For instance, the requirements may include a request for continuous monitoring of the one or more assets 102. In another example, the requirements may include a request for calculating efficiency or metrics of performance of the one or more assets 102. In yet another example, the requirements may include a request for generating one or more insights, corrective actions, and preventive actions for the deviation of the time series data.
The requirements may be stored in a memory of the edge device 112 or may be input to the edge device 112 by the operator 118. For example, the edge device 112 may be communicatively coupled to the client device 110. Non-limiting examples of client devices 110 include personal computers, workstations, personal digital assistants, and human-machine interfaces. The client device 110 may enable the operator to input one or more requirements through a web-based interface. Upon receiving the one or more requirements from the operator, the edge device 112 transmits a request for generating the summary for the major deviation and the minor deviation of the time series data.
In the present embodiment, the apparatus 116 is deployed in a cloud computing environment. As used herein, “cloud computing environment” refers to a processing environment comprising configurable computing physical and logical resources, for example, networks, servers, storage, applications, services, etc., and data distributed over the network 114, for example, the internet. The cloud computing environment provides on-demand network access to a shared pool of configurable computing physical and logical resources. The apparatus 116 comprises a network interface (not shown) for communicating with the one or more edge devices 112 via the network 114. In another embodiment, the apparatus 116 is an edge computing device. As used herein, “edge computing” refers to a computing environment that is capable of being performed on an edge device (e.g., connected to one or more sensing units in an industrial setup at one end and to one or more remote servers, such as computing servers or cloud computing servers, at the other end), which may be a compact computing device that has a small form factor and resource constraints in terms of computing power. A network of the edge computing devices can also be used to implement the apparatus. Such a network of edge computing devices is referred to as a fog network.
FIG 1b illustrates a block diagram of the system 100 for generating the natural language descriptions of the time series data in the technical installation, in accordance with an embodiment of the present invention.
The functional blocks of the system 100 are implemented in conjunction with an industrial predictive analytics engine (IPAE) 122 and an industrial intelligence layer (IIL) 120. For the sake of brevity, the constructional and operational features of the IPAE 122 that are already explained in application PCT/EP2022/058317 are not explained in detail in the description of FIG 1b. Similarly, the constructional and operational features of the IIL 120 that are already explained in application 202441032988, the entire contents of which are incorporated herein by reference, are not explained in detail in the description of FIG 1b.
In an embodiment, the operator or the user 118 initiates the query via the client device 110. The IIL 120 may receive the query from the client device 110 and provide it to the system 100 to relate and generate descriptions of the time series data. Further details of the IIL 120 are explained in application 202441032988, the entire contents of which are incorporated herein by reference, and are omitted herein for the sake of brevity. In an embodiment, the system 100 receives the time series data from the one or more sensors 104 via the IPAE 122. Further details of the IPAE 122 are explained in application PCT/EP2022/058317, the entire contents of which are incorporated herein by reference, and are omitted herein for the sake of brevity.
In an embodiment, the system 100 is configured to process the received time series data based on selecting the one or more AI models 108, for instance, domain-tuned AI models 124a, statistical data AI models 124b, and vision-based AI models 124c, depending on the nature of the time series data. In an embodiment, the one or more AI models 108 are configured to process incoming time series data by employing various analytical techniques tailored to the specific characteristics of the time series data and the source of the one or more sensors 104. This includes utilizing at least one of the domain-tuned AI models 124a, the statistical data AI models 124b, and the vision-based AI models 124c, depending on the nature of the time series data being received. Additionally, the system 100 may dynamically select the one or more AI models 108 based on the source of the one or more sensors 104, thus ensuring a finely tuned approach to extracting meaningful insights from the time series data. Thus, the selected one or more AI models 108 may detect anomalies, identify relationships between one or more variables, and classify deviations based on their significance.
In an example, the system 100 is configured to utilize the domain-tuned AI models 124a to analyse the time series data. For instance, if the time series data relates to machine performance, the system 100 may use the domain-tuned AI models 124a specifically trained on manufacturing data. In another instance, the system 100 may use the domain-tuned AI models 124a trained to detect anomalies in equipment behaviour or to predict when maintenance might be needed in the technical installation. Further, the domain-tuned AI models 124a are trained using the IPAE 122, which is explained in application PCT/EP2022/058317, the entire contents of which are incorporated herein by reference, and is not explained in detail herein for the sake of brevity.
In an example, the system 100 is configured to utilize the statistical data AI models 124b in response to receiving the time series data related to environmental conditions such as temperature and humidity. The system 100 may use the statistical data AI models 124b such as time-series analysis to identify trends or seasonal patterns in the time series data.
In an example, the system 100 is configured to utilize the vision AI models 124c. For instance, where the one or more sensors 104 may capture visual data, such as surveillance cameras monitoring the production floor of the technical installation, the vision AI models 124c may be used. Advantageously, the vision AI models 124c may analyse images or video feeds to detect defects in products or identify safety hazards at the technical installation.
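One simple form of trend identification is a least-squares line fitted over equally spaced samples, whose slope indicates the direction of the trend. The sketch below is illustrative only; the disclosure does not state which time-series analysis is used, and the function names and the tolerance value are assumptions.

```python
def trend_slope(values):
    """Least-squares slope of `values` against their sample index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 samples to fit a trend")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def describe_trend(values, tolerance=0.05):
    """Label the trend; `tolerance` (units per sample) is a hypothetical dead band."""
    slope = trend_slope(values)
    if slope > tolerance:
        return "upward"
    if slope < -tolerance:
        return "downward"
    return "flat"


# Illustrative hourly temperature readings in degrees Celsius.
print(describe_trend([20.0, 20.5, 21.1, 21.4, 22.0]))
```

Seasonal patterns would need a longer history and a decomposition step, which this sketch does not attempt.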
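The selection among the three model families described in the examples above can be pictured as a routing table keyed by the nature of the incoming data. The sketch below is a hypothetical illustration: the category labels, the `ROUTES` table, and the fallback choice are assumptions and are not taken from the disclosure.

```python
from enum import Enum


class ModelFamily(Enum):
    """Model families named in the description, keyed by reference numeral."""
    DOMAIN_TUNED = "124a"
    STATISTICAL = "124b"
    VISION = "124c"


# Hypothetical routing table from data category to model family.
ROUTES = {
    "machine_performance": ModelFamily.DOMAIN_TUNED,
    "environmental": ModelFamily.STATISTICAL,
    "visual": ModelFamily.VISION,
}


def select_model(data_kind, default=ModelFamily.STATISTICAL):
    """Pick a model family for `data_kind`; the fallback is an assumption."""
    return ROUTES.get(data_kind, default)


for kind in ("machine_performance", "environmental", "visual"):
    print(kind, "->", select_model(kind).name)
```

A table lookup keeps the routing rule explicit and easy to audit; an actual system might instead derive the category from sensor metadata.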
In an embodiment, the industrial intelligence layer 106 receives the time series data from the one or more AI models 108 such as the domain-tuned AI models 124a, the statistical data AI models 124b, and the vision-based AI models 124c. The industrial intelligence layer 106 includes the knowledge base 106a which stores information about the technical installation, its components, historical data, and best practices for maintenance and operation. The industrial intelligence module 106 is in constant communication with the one or more AI models 108. Thus, once the one or more AI models 108 have classified the deviation in the time series data, the industrial intelligence module 106 leverages the knowledge base 106a to extract one or more recommendations for addressing the classified deviation. The one or more recommendations comprise actions to be taken to rectify the anomaly, such as adjusting operating parameters, scheduling maintenance, or investigating potential equipment failures.
In an embodiment, the industrial intelligence layer 106 is configured to generate natural language descriptions of the time series data using the Large Language Model (LLM) 106b. The LLM 106b is trained on vast amounts of text data and is capable of generating human-like text based on input prompts or a query by the user 118. In an example, the natural language descriptions of the time series data may provide context, insights, and actionable information based on the detected deviations and recommended actions. Further, the LLM 106b is already explained in the application 202441032988 and is not explained in detail herein for the sake of brevity.
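The extraction of recommendations from a knowledge base organised as a knowledge graph may be sketched as a two-hop lookup: from a classified deviation to its possible causes, and from each cause to its mitigating actions. The graph structure, the relation names (`possible_cause`, `mitigated_by`), and the node labels below are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory graph of (subject, relation) -> [objects]."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].append(obj)

    def objects(self, subject, relation):
        # .get avoids creating empty entries for unknown keys
        return list(self._edges.get((subject, relation), []))


def recommendations(graph, variable, severity):
    """Follow deviation -> possible_cause -> mitigated_by (hypothetical relations)."""
    node = f"{variable}:{severity}"
    actions = []
    for cause in graph.objects(node, "possible_cause"):
        actions.extend(graph.objects(cause, "mitigated_by"))
    return actions


kg = KnowledgeGraph()
kg.add("furnace_temperature:major", "possible_cause", "gas_flow_increase")
kg.add("gas_flow_increase", "mitigated_by", "reduce the gas flow rate")
kg.add("gas_flow_increase", "mitigated_by", "check the gas valve for malfunctions")

for action in recommendations(kg, "furnace_temperature", "major"):
    print(action)
```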
In an embodiment, the natural language descriptions of the time series data is displayed on the client device 110.
FIG 2 illustrates a block diagram of the apparatus 116 for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention.
As shown in FIG 2, the apparatus 116 comprises a processing unit 202, a memory unit 204, a communication unit 206, an I/O interface 208, and an output unit 210. The apparatus 116 can be a computer, a workstation, or a virtual machine running on host hardware. As an alternative, the apparatus 116 can be a real or a virtual group of computers (the technical term for a real group of computers is “cluster”, the technical term for a virtual group of computers is “cloud”).
The processing unit 202 may include one or more processors as a single processing unit or several units. The processing unit 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more processors are configured to fetch and execute computer-readable instructions and data stored in the memory unit 204.
The memory unit 204 includes one or more computer-readable storage media. The memory unit 204 may include non-volatile storage elements. Examples of such non-volatile storage elements may include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories. In addition, the memory may, in some examples, be considered a non-transitory storage medium. The term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term “non-transitory” should not be interpreted to mean that the memory is non-movable. In certain examples, a non-transitory storage medium may store data that can, over time, change (e.g., in Random Access Memory (RAM) or cache).
The memory unit 204 may further include any non-transitory computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read-only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
The memory unit 204 includes a database 212 and an AI/ML module 214.
The database 212 is configured to be accessed by the processing unit 202 and stores information as required by the processing unit 202 to perform the one or more functions. The database 212 may store the time series data for the one or more variables measured by one or more sensors of the sensing unit 104. The database 212 may also store historical data including information of past deviations in the time series data and one or more insights, corrective actions, and preventive actions extracted in the past. Additionally, the database 212 may serve dual functions in storing different types of information. Primarily, it holds the time series data, most of which is collected from one or more sensors of the sensing unit 104. Thus, the time series data from the one or more sensors of the sensing unit 104 provides a continuous stream of information that reflects the operational status over time. In addition, the database 212 may also maintain the knowledge base, which includes historical data on past deviations in the time series data. The knowledge base comprises extracted insights, corrective actions, and preventive actions that have been previously implemented.
The AI/ML module 214 may include an Artificial Intelligence (AI) model and Large Language Models (LLMs). Each AI model may include a plurality of neural network layers. Examples of neural networks include, but are not limited to, convolutional neural network (CNN), deep neural network (DNN), recurrent neural network (RNN), and Restricted Boltzmann Machine (RBM). The learning technique for training each AI model uses a plurality of learning data to cause, allow, or control the system 100 to make a determination or analysis. Examples of learning techniques include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning. At least one of a plurality of CNN, DNN, RNN, RBM models and the like may be implemented to thereby achieve execution of the present subject matter’s mechanism through the AI models. A function associated with the AI/ML module 214 may be performed through the non-volatile memory, the volatile memory, and the processing unit 202.
The term “Large Language Models (LLM)” may refer to a type of AI model designed to understand and generate human-like text based on vast amounts of natural language data. These models are built using deep learning architectures, particularly transformer-based architectures. LLMs are trained on massive datasets containing billions or even trillions of words from diverse sources such as books, articles, websites, and other textual content. The large-scale training enables the model to learn complex patterns, structures, and nuances of human language. LLMs undergo two main stages of training: pre-training and fine-tuning. During pre-training, the model is trained on a large corpus of text data using unsupervised learning techniques to learn general language patterns and semantics. In fine-tuning, the pre-trained model is further optimized on domain-specific or task-specific datasets to adapt its knowledge and capabilities to specific applications. LLMs have the ability to generate human-like text based on given prompts or inputs. They can produce coherent paragraphs, articles, stories, code, or responses to questions by predicting the next words or tokens in the sequence based on the context provided. LLMs exhibit a strong understanding of context and semantics in natural language. They can infer meaning, resolve ambiguity, and generate text that is contextually relevant and coherent with the given input. In an embodiment, the industrial intelligence module 106 generates the natural language descriptions of the time series data using the LLM 106b, such as contextually relevant and coherent text, predictively constructing sentences based on the query from the client device 110.
For instance, the LLM 106b may take the extracted one or more recommendations about the deviations and produce detailed, human-readable natural language descriptions that explain the data trends, deviations, and suggested actions in a manner that is easily understandable by the users.
Advantageously, the integration of the LLM 106b with the industrial intelligence module 106, which is communicatively coupled to the apparatus 116, generates insightful and articulate descriptions of the time series data, making complex time series analyses more accessible and actionable for users. For example, if the one or more sensors 104 detect a significant deviation in temperature within a manufacturing process, the one or more AI models 108 classify it as a major deviation and identify the relationship between temperature and other variables, such as pressure. The industrial intelligence module 106 then extracts the one or more recommendations, such as adjusting the cooling system. Consequently, the LLM 106b, based on the one or more recommendations, generates a comprehensive report describing the detected temperature anomaly, its potential causes, and the recommended corrective action, thereby facilitating quick and informed decision-making.
The communication unit 206 is configured to communicate sensor data, or any other content over a communication network. Further, the communication unit 206 may include a communication port or a communication interface for sending and receiving signals from the apparatus 116 via the communication network. The communication port or the communication interface may be a part of the processing unit 202 or may be a separate component. The communication port may be created in software or may be a physical connection in hardware. The communication port may be configured to connect with the communication network, external media, the display, or any other components in the system 100, or combinations thereof. The connection with the communication network may be a physical connection, such as a wired Ethernet connection, or may be established wirelessly as discussed above. Likewise, the additional connections with other components of the system 100 may be physical or may be established wirelessly. The communication unit 206 may include a Wi-Fi module or a Bluetooth module for enabling wireless communication capability and data exchange capability between various modules of the system 100.
The I/O interface 208 refers to hardware or software components that enable communication between various modules of the system 100. The I/O interface 208 serves as a communication medium for exchanging information, commands, signals, or query responses with other devices or systems. The I/O interface 208 may be a part of the processing unit 202 or may be a separate component. The I/O interface 208 may be created in software or may be a physical connection in hardware. The I/O interface 208 may be configured to connect with an external network, external media, the display, or any other components, or combinations thereof. The connection with the external network may be a physical connection, such as a wired Ethernet connection, or may be established wirelessly.
The output unit 210 comprises a display device. The display device may be an Augmented Reality/Virtual Reality (AR/VR) device to display a virtual environment to the user. The display device may include a display screen. As a non-limiting example, the display screen may be a Light Emitting Diode (LED), Liquid Crystal Display (LCD), Organic Light Emitting Diode (OLED), Active Matrix Organic Light Emitting Diode (AMOLED), or Super Active Matrix Organic Light Emitting Diode (Super AMOLED) screen. The display screen may be of varied resolutions. The output unit 210 is further configured for presenting the generated summary for the major deviation and the minor deviation on the client device 110.
The processing unit 202 is configured to extract information from the knowledge base 106a which comprises domain knowledge stored in the form of a knowledge graph. Further, the processing unit 202 is also configured for obtaining time series data from the sensing unit 104. Further, the processing unit 202 is configured to detect a deviation in the time series data based on a comparison with a predefined baseline for each variable of the one or more variables in the time series data using the deviation detection model. For instance, the processing unit 202 periodically splits the time series data into pre-defined time segments based on a predefined time interval. Then, the processing unit 202 determines the deviation in the time series data for each time segment of the pre-defined time segments.
Further, the processing unit 202 is configured to classify the deviation as one of a major deviation, a minor deviation, or a normal deviation. For instance, the processing unit 202 determines a threshold value corresponding to each of the major deviation, the minor deviation, and the normal deviation. Then, the processing unit 202 classifies the deviation by comparing the deviation with the corresponding threshold values. Furthermore, in addition to statistical analysis, such as comparing the deviation with the corresponding threshold value, additional AI/ML techniques, such as clustering or classification, may be combined with causality or explainer modules to classify the detected deviations of the time series data. Advantageously, this enables a more nuanced and accurate classification by identifying patterns and relationships within the time series data that might not be apparent through statistical analysis alone.
Thereafter, the processing unit 202 is configured to extract one or more insights, corrective actions, and preventive actions for the classified deviation. For instance, the processing unit 202 uses the one or more AI models 108 to extract the one or more insights, the corrective actions, and the preventive actions.
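The segmentation and per-segment comparison described above may be sketched as follows. This is a minimal illustration, assuming that samples arrive as (timestamp in seconds, value) pairs, that each segment is summarised by its mean, and that the deviation is expressed relative to a non-zero baseline; none of these choices is prescribed by the disclosure, and the function names are hypothetical.

```python
from statistics import mean


def split_segments(samples, interval_s):
    """Group (timestamp_s, value) samples into consecutive fixed-width windows."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    segments = {}
    for ts, value in samples:
        index = int(ts // interval_s)  # window number the sample falls into
        segments.setdefault(index, []).append(value)
    return [segments[k] for k in sorted(segments)]


def segment_deviations(samples, baseline, interval_s):
    """Relative deviation of each segment mean from a non-zero baseline."""
    if baseline == 0:
        raise ValueError("relative deviation is undefined for a zero baseline")
    return [
        (mean(segment) - baseline) / baseline
        for segment in split_segments(samples, interval_s)
    ]


# Illustrative furnace temperatures sampled every 30 s, in degrees Celsius.
samples = [(0, 1100), (30, 1100), (60, 1150), (90, 1250)]
deviations = segment_deviations(samples, baseline=1100, interval_s=60)
print([f"{d:+.1%}" for d in deviations])
```

With a 60-second interval the four samples fall into two segments; the first matches the baseline and the second averages 1200, about 9.1% above it. Windows that contain no samples are skipped rather than reported.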
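The threshold comparison may be illustrated with the short sketch below. The threshold values (2% and 5% of the baseline) and the function name are hypothetical; in practice the thresholds would come from the knowledge base 106a or from domain experts.

```python
def classify_deviation(deviation, minor_threshold=0.02, major_threshold=0.05):
    """Classify a relative deviation as 'major', 'minor', or 'normal'.

    Both thresholds are illustrative assumptions, expressed as fractions
    of the baseline. The sign of the deviation is ignored.
    """
    if not 0 <= minor_threshold <= major_threshold:
        raise ValueError("expected 0 <= minor_threshold <= major_threshold")
    magnitude = abs(deviation)
    if magnitude >= major_threshold:
        return "major"
    if magnitude >= minor_threshold:
        return "minor"
    return "normal"


# 1200 degrees Celsius observed against a baseline of 1100 degrees Celsius.
print(classify_deviation((1200 - 1100) / 1100))
```

Checking the larger threshold first guarantees that a deviation exceeding both thresholds is reported as major.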
Further, the processing unit 202 is configured to extract a set of templates associated with the detected deviation from the knowledge base 106a. The set of templates comprises statistically relevant information capable of describing the behaviour of the time series data.
In an embodiment, the set of templates corresponds to structured frameworks or outlines of the expected patterns and behaviours exhibited by the time series data under normal circumstances. The set of templates comprises key statistical information, such as typical ranges, trends, and fluctuations, which may be considered indicative of normal operations of the technical installation. Upon detecting the deviation, the processing unit 202 may utilize the set of templates to compare the observed data against the expected requirements of the technical installation. For example, if the time series data represents temperature readings in the manufacturing process, the set of templates may include information about the expected temperature range, typical variations, and any known patterns related to temperature fluctuations during different production phases. Thus, upon detecting the deviation in temperature, the processing unit 202 may refer to the set of templates to understand the nature and significance of the deviation. This allows the apparatus 116 to assess whether the observed deviation falls within acceptable bounds or if it requires further investigation and corrective action.
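A template of this kind may be pictured as a small record holding the expected range of a variable for a given production phase, against which an observed value is assessed. The field names, the `assess` method, and the numeric range below are assumptions made for illustration and do not reflect a format defined in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical template: expected range of one variable in one phase."""
    variable: str
    phase: str
    expected_min: float
    expected_max: float

    def assess(self, value):
        """Describe where `value` sits relative to the expected range."""
        if self.expected_min <= value <= self.expected_max:
            return "within expected range"
        if value > self.expected_max:
            return f"{value - self.expected_max:g} above expected range"
        return f"{self.expected_min - value:g} below expected range"


# Illustrative range only, in degrees Celsius.
forging = Template("temperature", "forging", expected_min=1050.0, expected_max=1150.0)
print(forging.assess(1200.0))
```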
Additionally, while extracting the set of templates associated with the detected deviation from the knowledge base 106a the LLM 106b may generate the natural language descriptions based on the set of templates. Advantageously, the LLM 106b may add a layer of linguistic sophistication to the analysis, enabling the apparatus 116 to convey complex statistical findings in a more accessible and intuitive manner.
Furthermore, the processing unit 202 presents the summary for the major deviation and the minor deviation as the textual description of determined anomalies in the time series data on the output device. Advantageously, the usability of the apparatus 116 is enhanced by providing users with clear and concise summaries of the detected deviations. The textual descriptions generated by the processors may encapsulate the key insights from the analysis, allowing stakeholders to quickly grasp the nature and significance of the deviations without needing to delve into the raw data or statistical metrics.
In a scenario, the apparatus 116 may be based in the technical installation such as a manufacturing plant for monitoring and managing production processes, then the below workflow is followed: a. The one or more sensors 104 may be installed on various assets, such as machines, conveyor belts, and environmental controls, to measure variables like temperature, pressure, speed, and humidity. b. The apparatus 116 includes the one or more Al models 108 that may be configured to analyse the time series data from the one or more sensors 104 to detect deviations and classify them. c. The industrial intelligence module 106 with the knowledge base 106a, and in communication with the apparatus 116, may be used to recommend corrective actions based on detected deviations. d. The LLM 106b may be integrated with the industrial intelligence module 106 to generate the natural language descriptions of the time series data and the recommended actions.
Thus, the apparatus 116 may be configured to monitor a critical process involving a high-temperature furnace used for metal forging.
a) The one or more sensors 104 on the furnace measure variables like temperature, pressure, and gas flow rates.
b) The one or more AI models 108 may detect that the furnace temperature has deviated significantly from its baseline. The detected deviation shows that the temperature spiked to 1200°C, while the baseline temperature is 1100°C.
c) The one or more AI models 108 may analyse the relationship between temperature and other variables, and consequently determine that the increase in temperature correlates with an unexpected increase in gas flow rate.
d) Thus, based on the severity and potential impact, the deviation is classified as a major deviation.
e) The industrial intelligence module 106 may refer to the knowledge base 106a and extract the one or more recommendations, consequently suggesting reducing the gas flow rate and checking the gas valve for malfunctions.
f) The LLM 106b takes the technical details and recommendations and generates the natural language description.
g) The generated natural language description may be in the form of a summary: “On October 17, 2024, at 07:10, the furnace temperature in the metal forging process was detected to have a significant deviation. The temperature spiked to 1200°C, exceeding the baseline of 1100°C. This increase in temperature was found to correlate with an unexpected rise in the gas flow rate. Immediately, reduce the gas flow rate to bring the temperature back to its baseline. Proceed by inspecting the gas valve for any malfunctions or blockages that may have caused the increase in flow rate. Thus, address these recommendations promptly to stabilize the production process and prevent potential damage to the furnace.”
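The furnace scenario above can be traced with a small sketch. It is an illustration only: the `DeviationRecord` fields, the `render_summary` helper, and the fixed sentence layout are assumptions introduced here, and plain string formatting stands in for the LLM 106b, whose prompts and outputs the description does not specify.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DeviationRecord:
    """Hypothetical container for what steps b) to e) produce."""
    timestamp: datetime
    process: str
    variable: str
    observed: float
    baseline: float
    unit: str
    correlated_with: str
    severity: str
    recommendations: list[str] = field(default_factory=list)


def render_summary(rec: DeviationRecord) -> str:
    """Deterministic stand-in for the LLM step: fill a fixed sentence layout."""
    when = rec.timestamp.strftime("%B %d, %Y, at %H:%M")
    direction = "exceeding" if rec.observed > rec.baseline else "falling below"
    parts = [
        f"On {when}, the {rec.variable} in the {rec.process} process "
        f"showed a {rec.severity} deviation.",
        f"The value reached {rec.observed:g}{rec.unit}, {direction} "
        f"the baseline of {rec.baseline:g}{rec.unit}.",
        f"The change correlates with {rec.correlated_with}.",
    ]
    parts.extend(f"Recommended action: {r}." for r in rec.recommendations)
    return " ".join(parts)


# Values taken from the furnace example in the text.
record = DeviationRecord(
    timestamp=datetime(2024, 10, 17, 7, 10),
    process="metal forging",
    variable="furnace temperature",
    observed=1200.0,
    baseline=1100.0,
    unit="°C",
    correlated_with="an unexpected rise in the gas flow rate",
    severity="major",
    recommendations=[
        "reduce the gas flow rate",
        "inspect the gas valve for malfunctions or blockages",
    ],
)
summary = render_summary(record)
print(summary)
```

Keeping the structured record separate from the rendering step means the same record could be passed to a language model or to a fixed template without changing the upstream detection logic.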
In an embodiment, the present invention also contemplates a computer-program product having machine-readable instructions stored therein which, when executed by the one or more processors, cause the one or more processors to perform a method for generating natural language descriptions of the time series data in the industrial environment. The details of the method(s) performed by the one or more processors are elaborated in subsequent paragraphs at least with reference to FIG. 5.
Further, the present invention also contemplates a non-transitory computer-readable medium encoded with executable instructions. The executable instructions, when executed by the one or more processors, cause the one or more processors to perform a method for generating natural language descriptions of the time series data in the industrial environment. The details on the method(s) performed by the one or more processors have been elaborated in subsequent paragraphs at least with reference to FIG. 5.
FIG 3 illustrates a diagram depicting a process flow 300 for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention.
At step 302 of the process flow 300, the time series data is input to the system 100. The one or more processors of the processing unit 202 receive the time series data measured by the one or more sensors of the sensing unit 104.
At step 304 of the process flow 300, the one or more processors set up algorithms and models to generate process-driven knowledge. The one or more processors use the knowledge base 106 to set up the algorithms and the models. The knowledge base 106 includes the algorithm or the domain expert knowledge, the historical data, the historical analysis data, the external knowledge base, and the documentation.
In an embodiment, the one or more processors use the knowledge base 106 not only to detect anomalies in the time series data but also to identify any deviations, insights, or patterns present within the time series data. Thus, the expanded functionality allows the one or more processors to leverage the comprehensive information stored in the knowledge base 106a, enabling them to detect a broader range of data anomalies and extract valuable insights that may inform decision-making processes or trigger proactive interventions in the technical installation. The one or more processors may detect the anomaly in terms of events with direct consequences for business, such as shutdown, upset, or quality, by utilizing the historical data. For example, in the case of batch processes in the pharmaceutical industry, where the anomaly is driven by a temperature setting that can lead to abnormal organic material growth in the batch and hence render the batch a waste, an LSTM autoencoder trained on historical health data is used to detect the anomaly.
In an embodiment, the one or more processors use the knowledge base 106 to define or identify a baseline for the normal, minor, and major deviation for the measured time series data for different variables. The one or more processors identify the baseline using the domain expert knowledge. For example, in the case of batch processes in the pharmaceutical industry, there are specific quality control aspects that restrict the variation of each variable and define the path each variable should take as a batch progresses. This information may be used to define specific descriptions of what can be called normal for each variable and what should be considered a deviation. In an embodiment, the one or more processors use the knowledge base 106 to identify dependencies of measured variables on other variables. The dependencies may be identified using the existing algorithms stored in the knowledge base 106. For example, in the case of operation in the pharmaceutical industry, the speed of the drum motor is causal for the temperature spike in the time series data. This information is stored in the knowledge base 106 and may be used to identify the dependencies.
In an embodiment, the one or more processors use the external knowledge base or the documentation to identify a consequence or remedial action. The one or more processors may generate the knowledge base including cause, remedial actions, preventive and corrective action, limitations, and other information for different components, sensor tags, and the variables. The external knowledge base or the documentation may include external and internal data sources such as public Large Language Models (LLMs), blogs, etc. The one or more processors may also generate metadata for the knowledge corpus.
In an embodiment, the one or more processors set up a model or algorithm to detect deviation parameters for sensor tags or the variables and to detect the major, minor, and normal deviations. The one or more processors may set the AI model to detect and identify the deviation parameters for the variables using the time series data. The AI model is used to analyze the time series data for specific patterns that indicate the major or minor deviations. The analysis may include trend analysis, seasonality detection, change point detection, or pattern-matching techniques to identify abnormal behavior. For example, in a case of operation in a thermal process, the one or more processors may detect the deviation parameter for temperature depending on the time taken using the AI model. For instance, a temperature spike of 10°C within 5 seconds is an anomalous behavior. Further, examples of the algorithm may include statistical methods (e.g., Z-score, Grubbs' test), machine learning approaches (e.g., Isolation Forest, One-Class SVM), or time series-specific techniques (e.g., Seasonal Hybrid ESD, LSTM autoencoders). These methods learn the patterns of normal behavior and flag instances that significantly deviate from those patterns as anomalies.
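Two of the simpler checks named above can be sketched as follows: a rate-of-change rule matching the "10°C within 5 seconds" example, and a Z-score test. This is a minimal sketch under stated assumptions, not the claimed AI model; the function names, the sample layout of `(time_in_seconds, value)` pairs, and the Z-score threshold of 3 are illustrative choices not taken from the description.

```python
from statistics import mean, pstdev


def rate_of_change_flags(samples, max_delta=10.0, window_s=5.0):
    """Return indices whose value moved by at least max_delta within window_s.

    samples is a list of (time_in_seconds, value) pairs sorted by time.
    """
    flagged = []
    for i, (t_i, v_i) in enumerate(samples):
        for t_j, v_j in samples[:i]:
            # Compare against every earlier sample still inside the window.
            if t_i - t_j <= window_s and abs(v_i - v_j) >= max_delta:
                flagged.append(i)
                break
    return flagged


def zscore_flags(values, threshold=3.0):
    """Return indices whose Z-score magnitude exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # constant series: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Illustrative data: a 15-degree jump between t=0 s and t=3 s.
temps = [(0, 100.0), (1, 101.0), (2, 102.0), (3, 115.0), (4, 116.0)]
print(rate_of_change_flags(temps))

# Illustrative data: twenty steady readings followed by one outlier.
steady_then_spike = [10.0] * 20 + [50.0]
print(zscore_flags(steady_then_spike))
```

The rate rule needs no training data and reacts to fast transients, while the Z-score test depends on the spread of the series it is given, so the two would typically complement rather than replace each other.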
At step 306 of the process flow 300, the one or more processors split the time series data into the pre-defined time segments based on the predefined time interval. Further, the one or more processors split the time series data based on a determination of whether the process is a continuous process or a batch process.
At step 308 of the process flow 300, the one or more processors determine the deviation in the time series data and classify the deviation as one of the major deviation, the minor deviation, or the normal deviation. The one or more processors may use the AI model that is set up using the knowledge corpus, such as the definitions for normal (baseline) as well as deviations. The one or more AI models monitor the individual variables for specific sliding windows to identify the deviations. In an embodiment, the one or more AI models may be trained based on the historical data, the historical analysis data, the information corresponding to the one or more sensors in the plant and standard operating procedure in the plant, and the domain expertise information for managing the deviation.
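The sliding-window monitoring mentioned for step 308 can be sketched as below. The window size, the use of the window mean as the monitored statistic, and the 70 to 80 range are assumptions chosen for illustration; the description does not state which statistic the AI models evaluate per window.

```python
def sliding_windows(values, size, step=1):
    """Yield (start_index, window) pairs of fixed size over a sequence."""
    for start in range(0, len(values) - size + 1, step):
        yield start, values[start:start + size]


def windows_outside_baseline(values, size, low, high):
    """Return start indices of windows whose mean leaves the [low, high] band."""
    flagged = []
    for start, window in sliding_windows(values, size):
        window_mean = sum(window) / len(window)
        if not low <= window_mean <= high:
            flagged.append(start)
    return flagged


# Illustrative readings drifting above an assumed 70-80 band.
readings = [75, 76, 78, 84, 90, 91]
print(windows_outside_baseline(readings, size=3, low=70, high=80))
```

Averaging over a window smooths single-sample noise, at the cost of reacting a few samples later than a point-wise check would.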
At step 310 of the process flow 300, the one or more processors identify and classify the deviation in each time segment. The one or more processors may store the result in a database. The one or more processors may create a set of major deviation, a set of normal deviation, and a set of minor deviation.
At step 312 of the process flow 300, the one or more processors describe each deviation in the set of major deviation and the set of minor deviation. In a non-limiting example, a description of deviation for temperature variable may be a template such as “Temperature changed from 20C to 30C during the initial batch window at the rate of 2C per minute which is abnormal”.
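The template in this example can be filled programmatically. The sketch below assumes a hypothetical `describe_change` helper, a 5-minute window (so that a change from 20C to 30C gives the quoted 2C per minute), and an assumed maximum normal rate of 1C per minute; neither the window length nor the rate limit is stated in the description.

```python
def describe_change(variable, start, end, minutes, window, max_rate):
    """Fill the deviation template and label the rate against an assumed limit."""
    rate = abs(end - start) / minutes  # degrees per minute
    label = "abnormal" if rate > max_rate else "normal"
    return (
        f"{variable} changed from {start:g}C to {end:g}C during the {window} "
        f"at the rate of {rate:g}C per minute which is {label}"
    )


text = describe_change(
    variable="Temperature",
    start=20,
    end=30,
    minutes=5,          # assumed duration of the window
    window="initial batch window",
    max_rate=1.0,       # assumed limit for a normal rate of change
)
print(text)
```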
For instance, the one or more processors extract, using the industrial intelligence layer 106, the one or more insights, the corrective actions, and the preventive actions for each deviation in the set of major deviation and the set of minor deviation. Further, the one or more processors may generate, using the industrial intelligence layer 106, the set of templates for the major deviation and the minor deviation based on the extracted one or more insights, the corrective actions, and the preventive actions. The one or more processors may also detect anomalies in the time series data using the knowledge base 106a. The one or more processors may also use the information of dependencies of the measured variable on other variables.
At step 314 of the process flow 300, the one or more processors generate, using the LLMs 106b, the natural language descriptions that include a summary for the major deviation and the minor deviation based on the generated set of templates. For instance, the one or more processors generate the summary for each deviation in the time window. The summary provides an overview of problems in the process or the asset. The one or more processors may generate the summary based on requirements such as for every day, week, or as required by the operator. Further, the one or more processors consider the knowledge base metadata information with other references along with the textual description generated from deviation for the summarization task to get the summary report for the user or operator.
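A periodic roll-up of the per-deviation descriptions, for example one report per week, might be assembled as in the sketch below before being handed to a summarization step. The record layout `(date, severity, text)`, the `summarize_period` helper, and the report wording are all assumptions; the LLM-based summarization itself is not reproduced here.

```python
from collections import Counter
from datetime import date


def summarize_period(records, start, end):
    """Collect deviation descriptions within [start, end] into one text report."""
    in_range = [r for r in records if start <= r[0] <= end]
    counts = Counter(severity for _, severity, _ in in_range)
    header = (
        f"{start.isoformat()} to {end.isoformat()}: "
        f"{counts['major']} major and {counts['minor']} minor deviation(s)."
    )
    lines = [f"- [{sev}] {day.isoformat()}: {text}" for day, sev, text in in_range]
    return "\n".join([header, *lines])


# Illustrative records; the texts are placeholders, not outputs of the system.
records = [
    (date(2024, 10, 17), "major", "Furnace temperature spiked above baseline."),
    (date(2024, 10, 18), "minor", "Gas flow rate slightly above normal range."),
    (date(2024, 10, 25), "major", "Furnace pressure dropped below baseline."),
]
report = summarize_period(records, date(2024, 10, 14), date(2024, 10, 20))
print(report)
```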
FIG 4 illustrates a functional block diagram 400 of the system 100 for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention.
The functional blocks 400 of the system are implemented as an industrial predictive analytics engine (IPAE). The functional blocks 400 include a configuration application programming interface (API) 401, a Time Series (TS) verbalization API 403, a data source API 405, a model management API 407, an SQL database 409, an influx database 411, the knowledge base 106, an internet server 413, an ML model selection module 415, an external User Interface (UI) 417, and an output module 419.
The TS verbalization API 403 includes a TS verbalization backend layer 421, a data acquisition module 423, an information extraction module 425, a verbalization module 427, and an AI tool interface 429.
The configuration application API 401 configures the one or more assets 102 in the system 100. The configuration application API 401 also adds any new asset in the system 100. Any new asset may be added to the system 100 by adding all related metadata, diagrams, sensors, KPIs, and any other available information to the SQL database 409. The new asset is added in the system 100 based on an input received by the operator 118 of the client device 108.
The operator 118 inputs a set of requirements that are stored in the memory of the edge device 110. The set of requirements may be input by the operator 118 to the client device 108. For example, the set of requirements may comprise information pertaining to one or more parameters for predicting the condition of the one or more assets 102. In another example, the set of requirements may include conditions for estimating the remaining useful life of the one or more assets 102. In yet another example, the set of requirements may include information for continuous monitoring of the one or more assets 102. In yet another example, the set of requirements may include information for calculating efficiency or metrics of performance of the one or more assets 102. The TS verbalization backend layer 421 receives the information of the one or more assets 102 from the SQL database 409. The information may include sensors, KPIs, and any other available information associated with the one or more assets 102.
The data source API 405 is configured for acquiring data from the sensing units 104. The data source API 405 reads the multiple data sources and stores the acquired data in the influx database 411.
The data acquisition module 423 obtains the set of requirements for managing the one or more assets 102. Further, the data acquisition module 423 acquires the time series data from the influx database 411 and the information of the one or more assets 102 from the TS verbalization backend layer 421.
The information extraction module 425 is configured for extracting feature information associated with the selected one or more assets 102 based on the data received by the data acquisition module 423. For example, if the one or more assets 102 include a pump, then the features may be a pump temperature or a pump pressure. Therefore, pressure or temperature may be the feature to be verbalized.
The knowledge base 106 refers to the database comprising data and information pertaining to the one or more assets 102 in the form of the knowledge graph. The knowledge base 106 is further connected to an internet server 413 to use features of AI tools via the internet.
The model management API 407 manages a plurality of AI models stored in the memory unit 204. The ML model selection module 415 selects the AI model among the plurality of AI models stored in the memory unit 204.
The verbalization module 427 manages, monitors, and detects anomalies and recommends actions using the Al model. The verbalization module may generate the natural language descriptions. The generated natural language descriptions may include a short-term verbalization summary or a long-term verbalization summary based on the input from the operator 118.
The output module 419 is configured to provide the generated natural language descriptions that include the short-term verbalization summary or the long-term verbalization summary to the operator 118. The short-term verbalization summary may include KPI trends for a short duration, variance information in the time series data, an outlier factor in the time series data, health information of the asset, information of control parameters for the asset, alarms, the one or more insights, KPI forecasts, or optimization results.
The long-term verbalization summary may include health degradation data, downtime information, or a summary of the KPI of the last month.
The AI tool interface 429 uses the LLM 106b and AI tools, which may compare the generated summary with a query given by the operator 118. The external UI 417 presents the answer to the query on the client device 108.
FIG 5 illustrates a flow chart of a method 500 for generating the natural language descriptions of the time series data, in accordance with an embodiment of the present invention. The method 500 includes a series of operation steps 502 through 512 performed by the processing unit 202 of the system 100.
At step 502, the processing unit 202 receives the time series data for the one or more variables measured by the one or more sensors of the sensing unit 104. The one or more variables are associated with the one or more assets 102. Further, the processing unit 202 may split the time series data into pre-defined time segments based on the predefined time interval.
For instance, a chemical dye factory may install a network of the one or more sensors 104 to monitor various parameters critical to operations of the chemical dye factory. The one or more sensors 104 may continuously collect the time series data representing variables like temperature, pressure, and chemical concentrations across different stages of the production process in the chemical dye factory.
Thus, the processing unit 202 in the chemical dye factory receives the time series data transmitted by the one or more sensors 104 throughout the facility. This time series data may be obtained in real-time and reflects the current operational status of the dye factory.
Further, each variable in the time series data may correspond to a specific asset or piece of equipment within the dye factory. For example, temperature readings may come from sensors installed in reactors, pressure measurements from valves, and chemical concentrations from analysers. Furthermore, upon receiving the time series data, the processing unit 202 may split it into pre-defined time segments based on the predefined time interval. For instance, the time series data may be segmented into hourly intervals, each representing the measurements collected over the past hour. Thus, if the predefined time interval is one hour, the processing unit 202 segments the time series data into hourly segments. For each segment, the processing unit 202 may collect and aggregate the measurements recorded by the one or more sensors 104 during that hour.
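The hourly segmentation and per-segment aggregation described above can be sketched with the standard library alone. The helper names, the `(timestamp, value)` input layout, and the choice of the mean as the aggregate are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def split_into_segments(readings, interval=timedelta(hours=1)):
    """Group (timestamp, value) pairs into fixed-length time segments.

    Each key is the start of a segment, aligned to multiples of interval
    counted from datetime.min (midnight), so hourly segments start on the hour.
    """
    segments = defaultdict(list)
    for ts, value in readings:
        start = datetime.min + ((ts - datetime.min) // interval) * interval
        segments[start].append(value)
    return dict(segments)


def aggregate(segments):
    """Reduce each segment to its mean value, ordered by segment start."""
    return {start: mean(values) for start, values in sorted(segments.items())}


# Illustrative reactor temperature readings spanning two hourly segments.
readings = [
    (datetime(2024, 10, 17, 7, 10), 80.0),
    (datetime(2024, 10, 17, 7, 50), 82.0),
    (datetime(2024, 10, 17, 8, 5), 90.0),
]
hourly = aggregate(split_into_segments(readings))
for start, value in hourly.items():
    print(start.isoformat(), value)
```

Passing a different `interval`, such as `timedelta(minutes=15)`, changes the segment length without touching the rest of the code.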
Advantageously, the splitting of the time series data enables the processing unit 202 to efficiently identify trends, detect deviations, and generate insights on an hourly basis. This provides a structured approach to monitoring the chemical dye factory’s performance over time and facilitates the detection of changes or abnormalities within each time segment. The flow of the method 500 now proceeds to step 504.
At step 504, the processing unit 202 detects, by the one or more processors, a deviation in the time series data based on a comparison with a predefined baseline for each variable of the one or more variables in the time series data using the deviation detection model. In an example, the deviation detection model may be among the one or more AI models 108, trained to recognize patterns and deviations in the time series data. The deviation detection model may be configured to compare the real-time measurements with the predefined baselines established for each variable.
In an example, the deviation detection model is configured to compare the current measurements with the predefined baselines for each variable. Further, if the observed values deviate significantly from the expected ranges as included in the predefined baselines, the deviation detection model may flag such instances as deviations.
Further, the predefined baseline for each variable of the one or more variables may be determined based on domain expertise knowledge available in the knowledge base 106a.
In an example scenario, the knowledge base 106a comprises historical data, operational guidelines, and expert insights gathered from years of experience in chemical dye processing. If the baseline temperature for a reactor is 80°C based on historical data and expert recommendations, the deviation detection model may identify any deviations from this baseline temperature during production. Further, over time, as the chemical dye factory’s processes evolve or equipment undergoes maintenance or upgrades, the baselines may be adjusted by the users accordingly to reflect changes in operational norms.
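A per-variable baseline comparison of this kind can be sketched as a lookup table of expected ranges. The variable names, the pressure range, and the `detect_deviations` helper are hypothetical; only the 70°C to 80°C reactor temperature range comes from the examples in this description. A plain range check stands in for the trained deviation detection model.

```python
# Assumed baselines: variable name -> (lower bound, upper bound).
BASELINES = {
    "reactor_temperature_c": (70.0, 80.0),
    "line_pressure_bar": (1.8, 2.2),  # illustrative range, not from the text
}


def detect_deviations(snapshot, baselines):
    """Return how far each out-of-range variable lies beyond its baseline.

    Positive values are above the upper bound, negative values below the
    lower bound. Variables inside their range are omitted.
    """
    deviations = {}
    for variable, value in snapshot.items():
        low, high = baselines[variable]
        if value < low:
            deviations[variable] = value - low
        elif value > high:
            deviations[variable] = value - high
    return deviations


snapshot = {"reactor_temperature_c": 90.0, "line_pressure_bar": 2.0}
print(detect_deviations(snapshot, BASELINES))

# Baselines can be adjusted by users as operating norms change.
adjusted = dict(BASELINES, reactor_temperature_c=(70.0, 95.0))
print(detect_deviations(snapshot, adjusted))
```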
Advantageously, the deviation detection model continuously analyses the time series data in real time to detect deviations or anomalies as soon as they occur in the technical installation. Thus, the early detection enables prompt intervention by the technical installation operators, minimizing the potential impact of abnormal conditions on production efficiency, product quality, and safety. Further, the deviation detection model may allow for the establishment of predefined baselines tailored to the specific operational requirements and characteristics of the technical installation, such as the chemical dye factory in the example scenario. Furthermore, the deviation detection model may adapt and refine its detection algorithms based on feedback from detected deviations and their outcomes. Thus, the deviation detection model may learn from past incidents, identify recurring patterns, and continuously improve its ability to distinguish between normal variations and abnormal events. Furthermore, the deviation detection model may be able to continuously monitor a wide range of process variables in real time and is thus well-suited for large-scale industrial environments like chemical dye factories. Advantageously, the deviation detection model may scale to accommodate additional sensors, variables, and production units as the plant expands or undergoes modifications, ensuring comprehensive coverage of all critical processes.
The flow of the method 500 now proceeds to step 506.
At step 506, the processing unit 202 determines relationships between variables of the detected deviations. In an embodiment, determining relationships between variables of the detected deviations may include, identifying the dependencies and causalities between the variables and determining relationships between variables based on the identification of dependencies. The flow of the method 500 now proceeds to step 508.
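One simple way to approximate the dependency identification of step 506 is a pairwise Pearson correlation between the deviating variable and the other measured variables. This is a sketch under assumptions: the description does not name the technique used, correlation alone does not establish causality, and the 0.8 cut-off and sample values are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def related_variables(target, series, min_abs_r=0.8):
    """Return variables whose correlation with target is at least min_abs_r."""
    related = {}
    for name, values in series.items():
        if name == target:
            continue
        r = pearson(series[target], values)
        if abs(r) >= min_abs_r:
            related[name] = r
    return related


# Illustrative aligned samples for three variables.
series = {
    "temperature": [1100.0, 1105.0, 1150.0, 1200.0],
    "gas_flow": [10.0, 10.2, 12.0, 14.0],
    "pressure": [5.0, 5.1, 4.9, 5.0],
}
related = related_variables("temperature", series)
print(sorted(related))
```

Domain knowledge from the knowledge base, such as the known causal link between drum motor speed and temperature mentioned earlier, would be needed to turn a correlated pair into a stated cause.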
At step 508, the processing unit 202 classifies the detected deviations of the time series data as one of the major deviation, the minor deviation, or no deviation using statistical techniques based on the determined relationships and a comparison of the detected deviation with the baseline. The processing unit 202 may classify the deviation using the one or more AI models 108. Furthermore, the deviation may be classified utilizing not only statistical techniques but also contextual analysis, domain expertise, and other relevant factors. For instance, in addition to statistical techniques, contextual factors such as operational constraints, equipment specifications, and historical performance data may also influence the classification of the time series data.
In an example scenario, during normal operation, the temperature inside a reactor of the chemical dye factory may remain within a predefined range, for instance between 70°C and 80°C. The processing unit 202 may continuously collect the time series data from temperature sensors installed within the reactor. Further, the processing unit 202 may detect the deviation in the time series data, indicating a change in the reactor’s temperature outside the normal range. For instance, the temperature suddenly increases to 90°C, thus exceeding the predefined upper limit of 80°C. In the non-limiting example, the processing unit 202 may use statistical techniques to compare the detected deviation (i.e., 90°C) with the predefined baseline temperature range (i.e., 70°C - 80°C). Consequently, based on this comparison and the severity of the deviation, the processing unit 202 may classify the deviation as one of the major deviation, the minor deviation, or no deviation.
In the example scenario, if the temperature exceeds the predefined threshold significantly, such as reaching 90°C, the deviation may be classified as a major deviation. Thus, the major deviations indicate potentially critical issues that require immediate attention.
In the example scenario, if the temperature falls slightly outside the normal range, such as reaching 81°C or 82°C, the deviation may be classified as a minor deviation. The minor deviations may suggest departures from normal operation but may not pose immediate risks to safety or production.
In the example scenario, if the temperature remains within the predefined range, such as between 70°C and 80°C, no deviation is detected, and normal operation may continue. The flow of the method 500 now proceeds to step 510.
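The three-way classification in this example can be expressed as a threshold rule. The sketch below uses the 70°C to 80°C range from the text; the 5°C margin separating minor from major deviations is an assumption chosen so that 81°C and 82°C come out as minor and 90°C as major, since the description gives no explicit cut-off. A fixed rule stands in for the statistical and contextual techniques of step 508.

```python
def classify_deviation(value, low=70.0, high=80.0, major_margin=5.0):
    """Classify a reading against a baseline range.

    Readings inside [low, high] are "no deviation". Readings outside by
    more than major_margin are "major deviation", otherwise "minor deviation".
    """
    if low <= value <= high:
        return "no deviation"
    excess = low - value if value < low else value - high
    return "major deviation" if excess > major_margin else "minor deviation"


for reading in (75.0, 81.0, 82.0, 90.0):
    print(reading, classify_deviation(reading))
```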
At step 510, the processing unit 202 extracts, using the industrial intelligence module 106 comprising the knowledge base 106a, the one or more recommendations for addressing the classified deviation, based on the determined relationships. Furthermore, the processing unit 202 extracts the set of templates associated with the detected deviation from the knowledge base 106a. The set of templates comprises statistically relevant information capable of describing the behaviour of the time series data. Further, a detailed description of the various steps performed by the industrial intelligence module 106 is already covered in the description related to Figures 1-4 and is omitted herein for the sake of brevity.
In an example scenario, as the temperature of the reactor in the chemical dye factory spiked to 90°C, exceeding the normal range of 70°C to 80°C, the processing unit 202 may classify the deviation in the reactor’s temperature as the major deviation, indicating a critical issue requiring immediate attention. Thus, based on the classified deviation and the determined relationships between variables e.g., temperature, pressure, and flow rate, the processing unit 202 may extract recommendations for addressing the major deviation. In the example scenario, the recommendations may include “Immediately shutting down the reactor to prevent further temperature escalation”, “Initiating emergency cooling procedures to bring the temperature back within the normal range”, and “Notifying maintenance personnel to inspect the reactor for potential malfunctions or leaks”.
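The recommendation extraction of step 510 can be illustrated with a dictionary lookup. This is a deliberate simplification: the description presents the knowledge base as a knowledge graph, whereas the sketch uses a flat mapping keyed by an assumed `(variable, severity)` pair, and the `extract_recommendations` helper is hypothetical. The recommendation texts are those quoted in the example scenario.

```python
# Simplified stand-in for the knowledge base: (variable, severity) -> actions.
KNOWLEDGE_BASE = {
    ("reactor_temperature", "major deviation"): [
        "Immediately shutting down the reactor to prevent further "
        "temperature escalation",
        "Initiating emergency cooling procedures to bring the temperature "
        "back within the normal range",
        "Notifying maintenance personnel to inspect the reactor for "
        "potential malfunctions or leaks",
    ],
}


def extract_recommendations(variable, severity, kb=KNOWLEDGE_BASE):
    """Return the recommendations stored for a classified deviation, if any."""
    return list(kb.get((variable, severity), []))


actions = extract_recommendations("reactor_temperature", "major deviation")
for number, action in enumerate(actions, start=1):
    print(f"{number}. {action}")
```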
Advantageously, the users of the chemical dye factory may use the extracted recommendations to guide immediate actions for addressing the major deviation, ensuring operational safety and minimizing production disruptions. Further, the set of templates may provide a structured framework for understanding the behaviour of temperature deviations in the reactor, facilitating informed decision-making and long-term process optimization efforts.
The flow of the method 500 now proceeds to step 512.
At step 512, the processing unit 202 generates, using the industrial intelligence module 106, the natural language descriptions of the time series data based on the extracted one or more recommendations. The natural language descriptions of the time series data may be based on the predefined template and include the summary for the major deviation and the minor deviation. Furthermore, the natural language descriptions may be generated for each time segment of the pre-defined time segments. Furthermore, the generated natural language descriptions may be presented on the display screen to provide an alert to the user.
In an example scenario, the processing unit 202 extracts recommendations for addressing the major deviation in the reactor’s temperature, such as initiating emergency cooling procedures and notifying maintenance personnel. The processing unit 202 may utilize the industrial intelligence module 106 to generate the natural language descriptions of the time series data based on the extracted recommendations. The natural language descriptions may be structured according to the predefined templates tailored to the reactor temperature control, ensuring consistency and clarity in the presented information. For example, the natural language description comprises a summary of the detected major deviation, highlighting the temperature spike to 90°C and the recommended actions for mitigating the issue. Further, the natural language description comprises an overview of historical trends in the reactor temperature fluctuations, including past instances of major deviations and their corresponding responses.
Furthermore, in the example scenario the processing unit 202 may generate the natural language descriptions for each time segment of the pre-defined intervals, allowing operators of the chemical dye factory to track the progression of temperature fluctuations in the reactor over time. For instance, the natural language descriptions may be generated for hourly intervals, providing insights into temperature trends and deviations during each hour of operation.
Furthermore, in the example scenario the processing unit 202 may present the generated natural language descriptions on the display screen, such as in the chemical dye factory’s control room to provide alerts to the users. Advantageously, the operators may quickly review the natural language descriptions to understand the current status of the reactor temperature control, assess the severity of deviations, and take appropriate actions as needed. In a non-limiting example, the alerts may include visual indicators, audible alarms, and text notifications to ensure timely response to critical deviations and maintain operational safety and efficiency.
Further, a use case scenario for the disclosed method may be in an automobile logistics centre. In the automobile logistics centre, the process involves a large amount of tightly coupled equipment, and a simple anomaly or deviation in one piece of equipment could cause a major production delay. These processes require specific equipment (such as robotic arms) to perform specific operations at a specific point in time. The normal or baseline actions of this equipment may be easily defined from SOPs or user manuals. Further, due to the high number of alarms and the complexity of the system, there is a high chance that an operator misses an alarm, making the situation even worse. The disclosed method provides the operator with a textual description of the problem, covering the causal analysis, instead of a simple threshold-based alarm. The operator may easily comprehend the textual description, and thereby the chances of overlooking a problem are reduced. Providing the textual description also reduces the heavy reliance on experienced operators to evaluate and resolve the problem.
The present invention provides various technical advancements based on the key features discussed above. The disclosed method allows users to understand the patterns, trends, and insights of the time series data without requiring specialized knowledge or expertise in data analysis. Further, the disclosed method enables effective communication between data analysts and decision-makers by describing the key findings, important trends, or anomalies in natural language that facilitates clear and concise communication, enabling stakeholders to make informed decisions based on the information provided. Further, the disclosed method enhances the interpretability and transparency of time series models in critical application areas such as finance, healthcare, or fraud detection. Furthermore, the verbalization helps decision-makers understand the potential impact of different actions or scenarios thereby providing support in strategic planning, risk assessment, resource allocation, and other decision-making processes.
The various actions, acts, blocks, steps, or the like in the flow diagrams may be performed in the order presented, in a different order, or simultaneously. Further, in some embodiments, some of the actions, acts, blocks, steps, or the like may be omitted, added, modified, skipped, or the like without departing from the scope of the invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting.
While specific language has been used to describe the present subject matter, any limitations arising on account thereof are not intended. As would be apparent to a person skilled in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein. The drawings and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment.
The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practised with modification within the scope of the embodiments as described herein.
List of Reference Numerals

Claims

1. A computer-implemented method for generating natural language descriptions of time series data in a technical installation, the method comprising: receiving, by one or more processors, the time series data for one or more variables measured by one or more sensors in the technical installation; detecting, by the one or more processors, a deviation in the time series data based on a comparison with a predefined baseline for each variable of the one or more variables in the time series data using a deviation detection model; determining, by the one or more processors, relationships between variables of the detected deviations; classifying, by the one or more processors, the detected deviations of the time series data as one of a major deviation, a minor deviation, or no deviation using statistical techniques based on the determined relationships and a comparison of the detected deviation with the baseline; extracting, by the one or more processors using an industrial intelligence module comprising a knowledge base, one or more recommendations for addressing the classified deviation, based on the determined relationships; and generating, by the one or more processors using the industrial intelligence module, natural language descriptions of the time series data based on the extracted one or more recommendations.
2. The method according to claim 1, further comprising generating the natural language descriptions of the time series data based on a predefined template.
3. The computer-implemented method according to claim 1 and 2, wherein the generated descriptions include a summary for the major deviation and the minor deviation.
4. The computer-implemented method according to any of the preceding claims, further comprising: determining the predefined baseline for each variable of the one or more variables based on domain expertise knowledge available in the knowledge base.
5. The computer-implemented method according to any of the preceding claims, wherein determining relationships between variables of the detected deviations comprises: identifying the dependencies and causalities between the variables; and determining relationships between variables based on the identification of dependencies.
6. The computer-implemented method according to any of the preceding claims, further comprising: extracting a set of templates associated with the detected deviation from the knowledge base, wherein the set of templates comprises statistically relevant information capable of describing the behaviour of the time series data.
7. The computer-implemented method according to any of the preceding claims, further comprising splitting, by the one or more processors, the time series data into pre-defined time segments based on a predefined time interval.
8. The computer-implemented method according to claim 7, wherein the natural language descriptions are generated for each time segment of the pre-defined time segments.
9. The computer-implemented method according to any of the preceding claims, further comprising presenting, by the one or more processors on an output device, the generated description on a display screen to provide an alert.
10. The computer-implemented method according to claim 1, wherein the knowledge base comprises: the historical data including information of past deviations in the time series data; historical analysis data including information of past analysis of the time series data; information corresponding to the one or more sensors in a plant and standard operating procedure in the plant; and domain expertise information for managing the deviation.
11. The computer-implemented method according to any of the preceding claims, further comprising: analysing, by one or more processors, a status of the system at a particular point in time to determine anomalies in the time series data; implementing, by one or more processors, a sequence of troubleshooting queries to identify a root cause for the determined anomalies using the industrial intelligence module; and identifying, by one or more processors, one or more insights, corrective actions, or preventive actions based on the identified root cause using the industrial intelligence module.
12. The computer-implemented method according to claim 11, further comprising presenting, by one or more processors on an output device, the summary for the major deviation and the minor deviation as a textual description of determined anomalies in the time series data.
13. An apparatus for generating natural language descriptions of time series data, the apparatus comprising: a memory; and one or more processors communicatively coupled to the memory, wherein the memory comprises programmable instructions which, when executed by the one or more processors, cause the one or more processors to perform the method steps of claims 1 to 12.
14. A system for generating natural language descriptions of time series data in a technical installation, the system comprising: one or more sensors associated with one or more assets in the technical installation, wherein the one or more sensors are configured to measure time series data for one or more variables; one or more AI models associated with the one or more assets in the technical installation, wherein the one or more AI models are communicatively coupled to the one or more sensors and configured to: detect a deviation in the time series data based on a comparison with a predefined baseline for each variable of the one or more variables in the time series data; determine relationships between variables of the detected deviations; classify the detected deviations of the time series data as one of a major deviation, a minor deviation, or no deviation based on the determined relationships and the detected deviation; an industrial intelligence module comprising a knowledge base, the industrial intelligence module communicatively coupled with the one or more AI models and configured to: extract one or more recommendations for addressing the classified deviation, based on the determined relationships; generate, using a large language model, descriptions of the time series data based on the extracted one or more recommendations; and an apparatus according to claim 13, communicatively coupled with the industrial intelligence module, wherein the apparatus is configured for generating descriptions of time series data in the technical installation according to any of the preceding claims 1 to 12.
15. A computer-program product, having machine-readable instructions stored therein, which when executed by one or more processors, cause the one or more processors to perform a method according to any of the claims 1 to 12.
16. A non-transitory computer readable medium encoded with executable instructions which, when executed by one or more processors, cause the one or more processors to perform a method according to any one of the claims 1 to 12.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202441032988 2024-04-25
IN202441032988 2024-04-25

Publications (1)

Publication Number Publication Date
WO2025223683A1 (en) 2025-10-30

Family

ID=91946556

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/EP2024/069575 Pending WO2025223683A1 (en) 2024-04-25 2024-07-11 Systems and methods for generating natural language descriptions of time series data
PCT/EP2024/074558 Pending WO2025223685A1 (en) 2024-04-25 2024-09-03 System and method for answering a natural language user query in an industrial environment

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/EP2024/074558 Pending WO2025223685A1 (en) 2024-04-25 2024-09-03 System and method for answering a natural language user query in an industrial environment

Country Status (1)

Country Link
WO (2) WO2025223683A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200379454A1 (en) * 2019-05-31 2020-12-03 Panasonic Intellectual Property Management Co., Ltd. Machine learning based predictive maintenance of equipment
US20240125675A1 (en) * 2022-10-12 2024-04-18 Baker Hughes Holdings Llc Anomaly detection for industrial assets

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11803556B1 (en) * 2018-12-10 2023-10-31 Townsend Street Labs, Inc. System for handling workplace queries using online learning to rank
WO2024015321A1 (en) * 2022-07-11 2024-01-18 Pryon Incorporated Methods and systems for improved document processing and information retrieval

Also Published As

Publication number Publication date
WO2025223685A1 (en) 2025-10-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24742563

Country of ref document: EP

Kind code of ref document: A1