
CN120196528B - Operation and maintenance fault positioning method, device, equipment and storage medium based on large model - Google Patents


Info

Publication number
CN120196528B
Authority
CN
China
Legal status
Active
Application number
CN202510678067.9A
Other languages
Chinese (zh)
Other versions
CN120196528A (en)
Inventor
赵兴业
李廷
韩同
Current Assignee
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202510678067.9A
Publication of CN120196528A
Application granted
Publication of CN120196528B
Legal status: Active


Classifications

    • G PHYSICS
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/30 Monitoring
              • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
                • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
                • G06F11/3452 Performance evaluation by statistical analysis
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N20/00 Machine learning
            • G06N20/20 Ensemble learning
    • G06N20/20Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract


The present application discloses a large-model-based method, apparatus, device, and storage medium for locating operation and maintenance faults, relating to the field of computer technology. The method includes: if a system fault is detected, collecting current system operation and maintenance data in real time and preprocessing the data to obtain target preprocessed system operation and maintenance data; extracting data features corresponding to the target preprocessed data and inputting them into a preset large model, which analyzes the data features to obtain a fault analysis result; and constructing, based on a fault tree analysis method, a fault tree corresponding to the fault analysis result, determining the cause of the system fault through the fault tree, and analyzing that cause to complete fault localization. Thus, when a fault occurs, information related to the fault can be collected quickly and its root cause located quickly and accurately, which reduces troubleshooting time and improves fault-handling efficiency.

Description

Operation and maintenance fault positioning method, device, equipment and storage medium based on large model
Technical Field
The present invention relates to the field of computer technology, and in particular to a large-model-based method, apparatus, device, and storage medium for locating operation and maintenance faults.
Background
In the present digital age, information systems, network systems, and industrial control systems keep growing in scale and complexity. These systems typically consist of many hardware devices, software components, and network connections, and their operating state is affected by many factors, such as hardware aging, software vulnerabilities, network fluctuations, and human operational errors. A system failure can cause serious consequences such as service interruption, data loss, and degraded service quality, bringing heavy losses to enterprises and users.
Traditional operation and maintenance management relies mainly on manual experience and simple monitoring tools. Operation and maintenance personnel monitor key system indicators, such as processor utilization, memory usage, and network traffic, by setting fixed thresholds; when an indicator exceeds its threshold, the system raises an alarm to notify the operator. However, this approach has two problems: rigid, static thresholds give poor fault-prediction capability, and the large volume of complex alarm information makes it difficult to locate the root cause of a fault.
Disclosure of Invention
Accordingly, the objective of the invention is to provide a large-model-based method, apparatus, device, and storage medium for locating operation and maintenance faults that, when a fault occurs, can quickly collect fault-related information to locate the root cause quickly and accurately, reduce troubleshooting time, and improve fault-handling efficiency. The specific solution is as follows:
In a first aspect, the present application discloses a large-model-based operation and maintenance fault localization method, comprising the following steps:
if a system fault is detected, collecting current system operation and maintenance data in real time, and preprocessing the current system operation and maintenance data to obtain target preprocessed system operation and maintenance data;
extracting data features corresponding to the target preprocessed system operation and maintenance data, and inputting the data features into a preset large model, which analyzes the data features to obtain a fault analysis result;
constructing, based on a fault tree analysis method, a fault tree corresponding to the fault analysis result, determining a fault cause of the system fault through the fault tree, and analyzing the fault cause to complete fault localization.
Optionally, the collecting current system operation and maintenance data in real time if a system fault is detected, and preprocessing the data to obtain target preprocessed system operation and maintenance data, includes:
if a system fault is detected, collecting current system operation and maintenance data of a local system in real time, and labeling and classifying the data to obtain first preprocessed system operation and maintenance data;
determining invalid data, duplicate data, and abnormal data in the first preprocessed system operation and maintenance data, and removing them to obtain second preprocessed system operation and maintenance data;
normalizing the second preprocessed system operation and maintenance data by converting it into a preset data format, to obtain the target preprocessed system operation and maintenance data.
Optionally, the collecting current system operation and maintenance data of the local system in real time, and labeling and classifying the data to obtain first preprocessed system operation and maintenance data, includes:
collecting current system logs, alarm information, performance indicator data, network traffic data, and hardware state data of the system in real time;
adding timestamps to the system logs, and sorting the system logs by timestamp to obtain sorted system logs;
generating an alarm event chain from the alarm information, classifying the performance indicator data by indicator type, classifying the hardware state data by component type, and classifying the network traffic data by traffic type.
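The labeling, timestamp sorting, and type-based classification above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each raw record is a dict with a Unix-seconds `ts` field and a type field such as `metric`; the patent defines no record schema, and `label_and_sort_logs` and `classify` are illustrative names, not part of the disclosed method.

```python
from collections import defaultdict
from datetime import datetime, timezone


def label_and_sort_logs(logs):
    """Add an ISO-8601 timestamp label to each log record, then sort by time."""
    labeled = []
    for rec in logs:
        item = dict(rec)  # copy so the caller's records are not mutated
        item["timestamp"] = datetime.fromtimestamp(rec["ts"], tz=timezone.utc).isoformat()
        labeled.append(item)
    return sorted(labeled, key=lambda r: r["ts"])


def classify(records, type_field):
    """Group records by a type field, e.g. indicator type, component type, or traffic type."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[type_field]].append(rec)
    return dict(groups)


# Example: performance indicators grouped by a hypothetical indicator-type field.
metrics = [
    {"ts": 1, "metric": "cpu_util", "value": 0.93},
    {"ts": 1, "metric": "mem_util", "value": 0.71},
    {"ts": 2, "metric": "cpu_util", "value": 0.95},
]
by_metric = classify(metrics, "metric")
```

The same `classify` helper serves all three classification steps; only the type field passed to it changes.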
Optionally, the analyzing the data features through the preset large model to obtain a fault analysis result includes:
extracting information from the data features through the preset large model, and matching the extracted target information against preset fault cases to determine the target fault case that matches the target information;
constructing a fault causal relationship graph from the alarm event chain in the data features;
comparing the performance indicator data with historical performance indicator data to determine abnormal changes in performance indicators;
analyzing the network traffic data to determine abnormal network behavior in the network traffic data;
comparing the hardware state data with preset hardware data thresholds to determine abnormal hardware states in the hardware state data;
taking the fault causal relationship graph, the abnormal changes in performance indicators, the abnormal network behavior, the abnormal hardware states, and the target fault case as the fault analysis result.
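Two of the comparisons listed above, performance indicators against their history and hardware readings against preset thresholds, might look as follows. This is a sketch under stated assumptions: the patent does not say how the comparison is scored, so the z-score rule, the default limit of 3.0, and the `(low, high)` threshold ranges are hypothetical choices.

```python
import math
from statistics import mean, pstdev


def performance_anomalies(current, history, z_limit=3.0):
    """Return {metric: z-score} for metrics that deviate from their historical baseline.

    The z-score rule and the default limit of 3.0 are illustrative assumptions.
    """
    flagged = {}
    for metric, value in current.items():
        past = history.get(metric, [])
        if len(past) < 2:
            continue  # too little history to form a baseline
        mu, sigma = mean(past), pstdev(past)
        if sigma == 0:
            # A perfectly flat history: any change at all is treated as abnormal.
            if value != mu:
                flagged[metric] = math.copysign(math.inf, value - mu)
            continue
        z = (value - mu) / sigma
        if abs(z) > z_limit:
            flagged[metric] = z
    return flagged


def hardware_anomalies(states, thresholds):
    """Return readings that fall outside their preset (low, high) range."""
    return {
        name: reading
        for name, reading in states.items()
        if name in thresholds and not thresholds[name][0] <= reading <= thresholds[name][1]
    }
```

Readings without a configured threshold, and metrics without enough history, are skipped rather than flagged.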
Optionally, the constructing a fault tree corresponding to the fault analysis result based on the fault tree analysis method, to determine the fault cause of the system fault through the fault tree, includes:
identifying, based on the fault analysis result, a top event, intermediate events, and bottom events corresponding to the system fault, and constructing the fault tree corresponding to the fault analysis result from the top event, the intermediate events, and the bottom events;
determining, through the fault tree, the target bottom event that has the highest probability of causing the top event among the bottom events, and taking the target bottom event as the fault cause of the system fault.
Optionally, the analyzing the fault cause to complete fault localization includes:
analyzing the fault cause to determine the fault occurrence time, the faulty component, and the abnormal fault data corresponding to the fault cause, so as to complete fault localization.
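A minimal sketch of turning a determined fault cause into a location record is shown below. It assumes, hypothetically, that the preprocessed records carry `ts`, `component`, and `abnormal` fields and that the faulty component is already known from the fault tree; `FaultLocation` and `locate` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FaultLocation:
    cause: str
    component: str
    occurred_at: Optional[float]  # earliest abnormal timestamp, or None if none was found
    abnormal_data: list = field(default_factory=list)


def locate(cause, component, records):
    """Collect the faulty component's abnormal records; the earliest gives the occurrence time.

    `records` are dicts with hypothetical keys 'ts', 'component' and 'abnormal'.
    """
    hits = sorted(
        (r for r in records if r["component"] == component and r["abnormal"]),
        key=lambda r: r["ts"],
    )
    return FaultLocation(
        cause=cause,
        component=component,
        occurred_at=hits[0]["ts"] if hits else None,
        abnormal_data=hits,
    )
```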
Optionally, the large-model-based operation and maintenance fault localization method further includes:
constructing a fault prediction model from a pre-trained model by an ensemble learning method;
collecting current target system operation and maintenance data in real time at a preset time interval, and inputting the data into the fault prediction model to predict system faults in real time and obtain a real-time fault prediction result;
adjusting system parameters based on the fault prediction result to prevent system faults.
In a second aspect, the present application discloses a large-model-based operation and maintenance fault localization apparatus, comprising:
a data preprocessing module, configured to collect current system operation and maintenance data in real time if a system fault is detected, and to preprocess the data to obtain target preprocessed system operation and maintenance data;
a fault analysis module, configured to extract data features corresponding to the target preprocessed system operation and maintenance data and input them into a preset large model, which analyzes the data features to obtain a fault analysis result;
a fault localization module, configured to construct, based on a fault tree analysis method, a fault tree corresponding to the fault analysis result, determine the fault cause of the system fault through the fault tree, and analyze the fault cause to complete fault localization.
In a third aspect, the present application discloses an electronic device, comprising:
a memory, configured to store a computer program;
a processor, configured to execute the computer program to implement the large-model-based operation and maintenance fault localization method described above.
In a fourth aspect, the present application discloses a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the large-model-based operation and maintenance fault localization method described above.
In the present application, when a system fault is detected, current system operation and maintenance data are collected in real time and preprocessed to obtain target preprocessed system operation and maintenance data; data features corresponding to these data are extracted and input into a preset large model, which analyzes them to obtain a fault analysis result; a fault tree corresponding to the fault analysis result is constructed based on a fault tree analysis method; and the fault cause of the system fault is determined through the fault tree and analyzed to complete fault localization. In this way, when a fault occurs, information related to the fault can be collected rapidly and its root cause located quickly and accurately, which shortens troubleshooting time, improves fault-handling efficiency, and helps the system recover quickly and resume normal operation.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a large-model-based operation and maintenance fault locating method according to the present application;
FIG. 2 is a timing diagram of a large-model-based operation and maintenance fault locating method according to the present application;
FIG. 3 is a schematic diagram of a large-model-based operation and maintenance fault locating device according to the present application;
FIG. 4 is a block diagram of an electronic device according to the present application.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
In the prior art, traditional operation and maintenance management relies mainly on manual experience and simple monitoring tools; for example, thresholds are set manually to monitor key system indicators and raise alarms. However, rigid, static thresholds give poor fault-prediction capability, and the large volume of complex alarm information makes the root cause of a fault difficult to locate.
To overcome these technical problems, the present application discloses a large-model-based operation and maintenance fault localization method, apparatus, device, and storage medium that, when a fault occurs, can quickly collect fault-related information to locate the root cause quickly and accurately, reduce troubleshooting time, and improve fault-handling efficiency.
Referring to FIG. 1, an embodiment of the present invention discloses a large-model-based operation and maintenance fault localization method, comprising the following steps:
S11: If a system fault is detected, collect current system operation and maintenance data in real time, and preprocess the data to obtain target preprocessed system operation and maintenance data.
In this embodiment, as shown in FIG. 2, when a system fault is detected, the current system operation and maintenance data of the local system are collected in real time, labeled, and classified to obtain first preprocessed system operation and maintenance data. Specifically, the current system logs, alarm information, performance indicator data, network traffic data, and hardware state data are collected in real time. System logs, such as those of servers, applications, and databases, are timestamped once collected and then sorted by timestamp to obtain sorted system logs. Alarm information is obtained from hardware, network, and application performance monitoring systems; it is integrated, classified, and deduplicated, and related alarms are correlated to form an alarm event chain. Performance indicator data, such as processor utilization and memory usage, are classified by indicator type, and their historical data and trends can be displayed as charts. Hardware state data, such as temperature, fan speed, and power supply status, are classified by the type of hardware component they belong to. Network traffic data are obtained with a network traffic collection tool and classified by traffic type, so that the direction, volume, and packet characteristics of the traffic can be analyzed.
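The alarm deduplication and event-chain formation described above could be approximated as follows. The patent does not state the correlation rule, so this sketch assumes purely time-based correlation: repeats of the same `(source, code)` pair within `dedup_window` seconds are suppressed, and alarms no more than `chain_gap` seconds apart join the same chain. Field names and default windows are hypothetical.

```python
def build_alarm_chains(alarms, dedup_window=60, chain_gap=120):
    """Deduplicate alarms, then split them into chains of temporally adjacent alarms.

    alarms: dicts with hypothetical keys 'ts' (seconds), 'source' and 'code'.
    An alarm repeating the same (source, code) within `dedup_window` seconds of the
    previous occurrence is dropped; each repeat also refreshes that timer.
    Consecutive surviving alarms at most `chain_gap` seconds apart share a chain.
    """
    last_seen = {}
    unique = []
    for alarm in sorted(alarms, key=lambda a: a["ts"]):
        key = (alarm["source"], alarm["code"])
        is_repeat = key in last_seen and alarm["ts"] - last_seen[key] <= dedup_window
        last_seen[key] = alarm["ts"]
        if not is_repeat:
            unique.append(alarm)

    chains = []
    for alarm in unique:
        if chains and alarm["ts"] - chains[-1][-1]["ts"] <= chain_gap:
            chains[-1].append(alarm)
        else:
            chains.append([alarm])
    return chains
```

Time proximity alone can merge unrelated alarms during an alarm storm; topology-aware correlation, such as the fault causal relationship graph built in step S12, is a natural refinement.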
Next, the first preprocessed system operation and maintenance data are processed further: invalid, duplicate, and abnormal data in them are identified and removed to obtain second preprocessed system operation and maintenance data. Removing such data effectively reduces noise in the data and lets the data reflect the system's characteristics more faithfully.
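One way to implement the removal of invalid, duplicate, and abnormal data is sketched below. The patent does not define these categories, so the sketch assumes that invalid means a missing timestamp or metric name or a non-numeric value, duplicate means a repeated `(ts, metric)` pair, and abnormal means a value outside Tukey's IQR fences for its metric. All field names are hypothetical.

```python
import math
from statistics import quantiles


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool) and not math.isnan(value)


def clean(records, k=1.5):
    """Drop invalid, duplicate and outlying records.

    records: dicts with hypothetical keys 'ts', 'metric' and 'value'.
    Outliers are detected per metric with Tukey's IQR fences (an assumed rule).
    """
    # 1. Invalid: missing timestamp or metric name, or a non-numeric value.
    valid = [r for r in records
             if r.get("ts") is not None and r.get("metric") and _is_number(r.get("value"))]

    # 2. Duplicate: keep the first record for each (ts, metric) pair.
    seen, deduped = set(), []
    for r in valid:
        key = (r["ts"], r["metric"])
        if key not in seen:
            seen.add(key)
            deduped.append(r)

    # 3. Abnormal: values outside [Q1 - k*IQR, Q3 + k*IQR] for their metric.
    values = {}
    for r in deduped:
        values.setdefault(r["metric"], []).append(r["value"])
    fences = {}
    for metric, vals in values.items():
        if len(vals) >= 4:  # need a few points for meaningful quartiles
            q1, _, q3 = quantiles(vals, n=4)
            fences[metric] = (q1 - k * (q3 - q1), q3 + k * (q3 - q1))
    return [r for r in deduped
            if r["metric"] not in fences
            or fences[r["metric"]][0] <= r["value"] <= fences[r["metric"]][1]]
```

The multiplier `k` controls how aggressive the filter is; a larger value keeps more extreme readings, which matters here because genuine fault symptoms can look like outliers.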
The second preprocessed system operation and maintenance data are then normalized by converting them into a preset, unified data format, namely JSON (JavaScript Object Notation), to obtain the target preprocessed system operation and maintenance data. Converting all data into a unified format ensures data consistency and avoids analysis errors caused by contradictory data.
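The normalization into a unified JSON format can be sketched with Python's standard `json` module. The field-alias table, the `category` tag, and the choice of JSON Lines (one JSON object per line) are assumptions for illustration; the patent specifies only that the data are converted to JSON.

```python
import json

# Hypothetical aliases mapping source-specific field names onto one unified schema.
FIELD_ALIASES = {
    "ts": "timestamp", "time": "timestamp", "@timestamp": "timestamp",
    "host": "source", "hostname": "source",
    "msg": "message", "text": "message",
}


def normalize(record, category):
    """Rename fields to the unified schema and tag the record with its data category."""
    unified = {"category": category}
    for key, value in record.items():
        unified[FIELD_ALIASES.get(key, key)] = value
    return unified


def to_json_lines(records):
    """Serialize normalized records as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(r, sort_keys=True, ensure_ascii=False) for r in records)


log = normalize({"ts": 10, "host": "db-01", "msg": "disk full"}, "log")
alarm = normalize({"time": 12, "hostname": "db-01", "code": "DISK_FULL"}, "alarm")
payload = to_json_lines([log, alarm])
```

Logs and alarms arrive with different field names but leave with the same `timestamp` and `source` keys, which is what makes later correlation across sources straightforward.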
S12: Extract data features corresponding to the target preprocessed system operation and maintenance data, and input the data features into a preset large model, which analyzes them to obtain a fault analysis result.
In this embodiment, as shown in FIG. 2, features are extracted from the target preprocessed system operation and maintenance data, and the extracted features are analyzed by the preset large model to obtain the corresponding fault analysis result. The extracted features include: time-domain features, such as the mean, variance, maximum, and median of performance indicators and the occurrence frequency and time intervals of specific events in the logs; frequency-domain features, such as the dominant frequency and harmonics obtained by Fourier transform, which reflect the periodicity and stability of system operation; trend features, such as rising, falling, or stable trends of performance indicators, extracted with time-series methods like moving averages and exponential smoothing; and text features extracted from text data such as system logs, which help reveal abnormal system performance early. The extracted features are then input into the preset large model. Drawing on its knowledge of the operation and maintenance domain and on the data, the large model judges which features are important for fault prediction, which prevents redundant features from degrading model performance. The large model also extracts information from the data features and matches the extracted target information against preset fault cases from the operation and maintenance domain, thereby determining the target fault case that matches the target information.
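The time-domain, frequency-domain, and trend features described above can be computed with the standard library alone, as in the sketch below. The function names, the window size, the smoothing factor `alpha`, and the `tol` threshold for a stable trend are hypothetical; text features and the large-model stage are not shown.

```python
import cmath
from statistics import mean, median, pvariance


def time_domain_features(values):
    """Mean, variance, maximum and median of one performance indicator."""
    return {"mean": mean(values), "variance": pvariance(values),
            "max": max(values), "median": median(values)}


def event_features(timestamps):
    """Occurrence count and mean interval of one log event type."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return {"count": len(ts), "mean_interval": mean(gaps) if gaps else None}


def dominant_frequency(values, sample_rate=1.0):
    """Frequency of the strongest non-DC component of a naive O(n^2) DFT."""
    n = len(values)
    mu = mean(values)  # remove the mean so the DC component does not dominate
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coef = sum((x - mu) * cmath.exp(-2j * cmath.pi * k * t / n)
                   for t, x in enumerate(values))
        if abs(coef) > best_mag:
            best_k, best_mag = k, abs(coef)
    return best_k * sample_rate / n


def moving_average(values, window=3):
    return [mean(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


def exponential_smoothing(values, alpha=0.3):
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def trend(values, window=3, tol=1e-3):
    """Label the trend by the change in the moving average across the series."""
    ma = moving_average(values, window)
    change = ma[-1] - ma[0]
    return "rising" if change > tol else "falling" if change < -tol else "stable"
```

The naive DFT keeps the sketch dependency-free; for real metric series, `numpy.fft.rfft` computes the same spectrum in O(n log n).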
Further, a fault causal relationship graph is constructed from the alarm event chain in the data features to determine which alarms trigger other alarms and what the root cause of the fault is. The performance indicator data are compared with historical performance indicator data to determine abnormal changes in performance indicators; by contrasting the system's normal and fault patterns, the fault correlation can be judged and the process and cause of the abnormal indicators analyzed. The network traffic data are analyzed to determine abnormal network behavior, for example, whether there is a network attack, congestion, or abnormal network behavior by an application. The hardware state data are compared with preset hardware data thresholds to determine abnormal hardware states, for example, to determine why a server's temperature is too high and how this affects other system components. Finally, the fault causal relationship graph, the abnormal changes in performance indicators, the abnormal network behavior, the abnormal hardware states, and the target fault case are taken together as the fault analysis result.
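Building the fault causal relationship graph from an alarm chain could be sketched as follows. The patent does not describe how causality is inferred; this sketch assumes a hypothetical dependency map between components and links alarm A to alarm B when B's component depends on A's and B fired no earlier than A. Alarms with no incoming edge are reported as root-cause candidates.

```python
def build_causal_graph(chain, depends_on):
    """Link alarm A -> alarm B when B's component depends on A's and B did not fire earlier.

    chain: alarms with hypothetical keys 'id', 'ts' and 'component'.
    depends_on: component -> set of components it depends on (hypothetical topology).
    Returns (edges, roots); roots are alarms with no incoming edge, i.e. root-cause candidates.
    """
    edges = {a["id"]: [] for a in chain}
    indegree = {a["id"]: 0 for a in chain}
    for cause in chain:
        for effect in chain:
            if (effect["id"] != cause["id"]
                    and effect["ts"] >= cause["ts"]
                    and cause["component"] in depends_on.get(effect["component"], ())):
                edges[cause["id"]].append(effect["id"])
                indegree[effect["id"]] += 1
    roots = [a["id"] for a in chain if indegree[a["id"]] == 0]
    return edges, roots
```

For a chain where a switch alarm precedes an application timeout and then a web error, the switch alarm is the only node without a cause and is returned as the root candidate.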
S13: Construct a fault tree corresponding to the fault analysis result based on the fault tree analysis method, determine the fault cause of the system fault through the fault tree, and analyze the fault cause to complete fault localization.
In this embodiment, the fault cause of the system fault is determined through the constructed fault tree, and fault localization is completed accordingly. Specifically, the top event, intermediate events, and bottom events corresponding to the system fault are identified from the fault analysis result, and a fault tree corresponding to the fault analysis result is constructed from them. That is, the fault tree is built with the fault tree analysis method on the basis of the large model's analysis result: the observed fault phenomenon is taken as the top event, and intermediate and bottom events are expanded step by step according to the fault causes and causal relationships identified by the large model. For example, if the top event is that the system service is unavailable, the intermediate events may be a network failure, a server hardware failure, or an application error, and the network failure may be further expanded into a network device failure, a line interruption, or a configuration error.
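The fault tree in the example above can be represented and analyzed with a small data structure. This sketch assumes that the alternative causes in the example combine through OR gates, and it also supports AND gates; it computes minimal cut sets, the smallest combinations of bottom events sufficient to trigger the top event. `Event` and `minimal_cut_sets` are illustrative names, not the API of any fault tree analysis tool.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Event:
    name: str
    gate: str = "BASIC"  # "OR" / "AND" for top or intermediate events, "BASIC" for bottom events
    children: list = field(default_factory=list)


def minimal_cut_sets(event):
    """Sets of bottom events whose joint occurrence is sufficient to cause `event`."""
    if event.gate == "BASIC":
        return [frozenset([event.name])]
    child_sets = [minimal_cut_sets(c) for c in event.children]
    if event.gate == "OR":
        candidates = {s for sets in child_sets for s in sets}
    else:  # AND: one cut set from every child must occur together
        candidates = {frozenset().union(*combo) for combo in product(*child_sets)}
    # Keep only minimal sets: drop any set that strictly contains another candidate.
    minimal = [s for s in candidates if not any(other < s for other in candidates)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


# The example tree from the text; OR gates are assumed because the causes are alternatives.
service_unavailable = Event("System service unavailable", "OR", [
    Event("Network failure", "OR", [
        Event("Network device failure"),
        Event("Line interruption"),
        Event("Configuration error"),
    ]),
    Event("Server hardware failure"),
    Event("Application error"),
])
```

With only OR gates, every bottom event is its own single-element cut set, so each one alone can make the service unavailable.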
Then, the target bottom event with the highest probability of causing the top event is determined among the bottom events through the fault tree and taken as the fault cause of the system fault. Specifically, fault tree analysis software is given parameters such as the fault tree structure, the logical relationships between events, and the occurrence probabilities of the bottom events; it calculates how much each bottom event contributes to the occurrence of the top event, and the bottom event with the greatest contribution is selected as the target bottom event, that is, the fault cause of the system fault. The fault cause is then analyzed to determine the corresponding fault occurrence time, faulty component, and abnormal data, which completes fault localization. After fault localization, a fault report can be generated that includes the time, location, and phenomenon of the fault, the root cause analysis process, the fault tree structure, and proposed solutions. The report describes the large model's line of analysis and the fault tree construction process in detail, giving operation and maintenance personnel a clear basis for handling the fault. Specific operational steps and suggestions are provided according to the root cause; for example, for a network device port failure, the report may suggest replacing the port, checking line connections, and reconfiguring parameters, together with related technical documentation and reference links. Finally, the fault analysis report is provided to the operation and maintenance personnel.
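The contribution calculation is not specified in the text. A common choice in fault tree analysis is criticality importance: the Birnbaum importance of a bottom event, weighted by its probability and divided by the top-event probability. The sketch below assumes independent bottom events that each appear only once in the tree, and uses hypothetical probabilities for the example tree from step S13.

```python
def top_probability(node, p):
    """Probability of `node`, assuming independent bottom events that appear only once.

    node: ("BASIC", name) or ("OR" | "AND", [child nodes]); p: bottom-event probabilities.
    """
    kind, payload = node
    if kind == "BASIC":
        return p[payload]
    child_probs = [top_probability(child, p) for child in payload]
    product_of = 1.0
    if kind == "AND":
        for q in child_probs:
            product_of *= q
        return product_of
    for q in child_probs:  # OR: 1 - P(no child occurs)
        product_of *= 1.0 - q
    return 1.0 - product_of


def rank_bottom_events(tree, p):
    """Rank bottom events by criticality importance: Birnbaum importance * p_i / P(top)."""
    p_top = top_probability(tree, p)
    scores = {}
    for name, p_i in p.items():
        birnbaum = top_probability(tree, {**p, name: 1.0}) - top_probability(tree, {**p, name: 0.0})
        scores[name] = birnbaum * p_i / p_top if p_top > 0 else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tree = ("OR", [
    ("OR", [("BASIC", "Network device failure"), ("BASIC", "Line interruption"),
            ("BASIC", "Configuration error")]),
    ("BASIC", "Server hardware failure"),
    ("BASIC", "Application error"),
])
# Hypothetical probabilities; real values would come from historical failure statistics.
probabilities = {"Network device failure": 0.02, "Line interruption": 0.01,
                 "Configuration error": 0.05, "Server hardware failure": 0.03,
                 "Application error": 0.04}
ranking = rank_bottom_events(tree, probabilities)
```

Under these assumptions, for a tree made only of OR gates the ranking simply follows p/(1-p), so the most probable bottom event wins; AND gates are where the calculation adds information beyond sorting by probability.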
Thus, when a fault occurs, fault-related information, including system logs, alarm information, and performance indicators, can be collected quickly, and the root cause can be located quickly and accurately using the knowledge reasoning and semantic understanding capabilities of the large model. This shortens troubleshooting time, improves fault-handling efficiency, and helps the system recover quickly and resume normal operation.
It should be further noted that a fault prediction model may also be constructed to predict system faults. Specifically, the fault prediction model may be constructed from pre-trained models by an ensemble learning method. The pre-trained models can be chosen as required, for example, support vector machine (SVM), random forest (RF), long short-term memory (LSTM), or convolutional neural network (CNN) algorithms, and a suitable algorithm or combination of algorithms can be selected according to the characteristics of the system and its data. After the fault prediction model is constructed, current target system operation and maintenance data are collected in real time at a preset time interval and input into the fault prediction model, which predicts system faults in real time to produce a real-time fault prediction result. Furthermore, the preset large model can combine the fault prediction results with operation and maintenance cases and data to learn how the prediction model performs under different parameter settings and recommend optimal parameters, for example, the kernel function parameters and penalty factor of an SVM model. It can also suggest architecture optimizations, such as adjusting the number of network layers and neurons according to the complexity of the system data and the fault patterns, or recommending an LSTM or CNN model, to improve prediction capability. In this way, faults can be predicted in real time and parameters tuned with the large model, effectively improving the efficiency and accuracy of fault prediction.
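The text names SVM, RF, LSTM, and CNN as candidate base models but does not specify the ensemble procedure. The sketch below swaps in the simplest dependency-free instance of ensemble learning: bagging (bootstrap aggregation) of one-level decision stumps with majority voting. Random forest applies the same bagging idea to full decision trees with feature subsampling. The feature vectors, labels, and the default of 15 models are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Decision stump: the (feature, threshold, direction) split with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for direction in (1, -1):  # 1: predict fault above thr, -1: predict fault below thr
                errors = sum((direction * (row[f] - thr) > 0) != bool(label)
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, thr, direction)
    _, f, thr, direction = best
    return lambda row: int(direction * (row[f] - thr) > 0)


def fit_bagging(X, y, n_models=15, seed=0):
    """Bootstrap aggregation: train each stump on a resampled copy, predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        return Counter(model(row) for model in models).most_common(1)[0][0]

    return predict
```

In practice, a library ensemble such as scikit-learn's `BaggingClassifier`, `RandomForestClassifier`, or a `VotingClassifier` over SVM and random-forest estimators would replace these hand-written pieces.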
In this embodiment, if a system fault is detected, current system operation and maintenance data are collected in real time and preprocessed to obtain target preprocessed system operation and maintenance data; data features corresponding to the target preprocessed data are extracted and input into a preset large model, which analyzes them to obtain a fault analysis result; a fault tree corresponding to the fault analysis result is then constructed by fault tree analysis, the cause of the system fault is determined through the fault tree, and the fault cause is analyzed to complete fault positioning. In other words, after a system fault is detected, the system's operation and maintenance data are collected in real time and preprocessed, and data features are extracted from the preprocessed data. The preset large model then analyzes these features, and a fault tree corresponding to the analysis result is constructed so that the cause of the system fault can be determined from the resulting fault tree. On the one hand, when a fault occurs, the related information can be collected quickly so that the root cause is located quickly and accurately, which shortens troubleshooting time, improves fault handling efficiency, and ensures that the system recovers quickly and resumes normal operation. On the other hand, automated fault prediction and root cause location reduce dependence on manual operation and maintenance, lowering the workload and expertise required of operation and maintenance personnel and thereby reducing enterprise operation and maintenance costs.
In addition, by collecting and analyzing in real time the multi-source heterogeneous data generated while the system runs, and applying the constructed prediction model, potential fault risks in the system can be discovered in advance. This enables accurate fault prediction, gives operation and maintenance personnel sufficient time for fault prevention and handling, and reduces both the probability and the impact of faults.
Referring to Fig. 3, an embodiment of the present invention discloses a large-model-based operation and maintenance fault positioning device, comprising:
a data preprocessing module 11, configured to collect current system operation and maintenance data in real time if a system fault is detected, and to preprocess the current system operation and maintenance data to obtain target preprocessed system operation and maintenance data;
a fault analysis module 12, configured to extract data features corresponding to the target preprocessed system operation and maintenance data and input the data features into a preset large model, so that the preset large model analyzes the data features to obtain a fault analysis result; and
a fault positioning module 13, configured to construct, by a fault tree analysis method, a fault tree corresponding to the fault analysis result, determine the cause of the system fault through the fault tree, and analyze the fault cause to complete fault positioning.
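The three-module structure can be sketched as a simple pipeline that runs when a fault is detected. This is a skeleton under stated assumptions, not the patented implementation: the class names, the record format, the prompt text, and the `LargeModel` callable (prompt in, text out) are hypothetical, the preprocessing step is a placeholder, and the fault tree analysis of module 13 is reduced to passing through the model's answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]
# Hypothetical interface for the preset large model: prompt text in, analysis text out.
LargeModel = Callable[[str], str]


@dataclass
class DataPreprocessingModule:
    """Module 11 (placeholder): keeps only records that carry a timestamp."""

    def run(self, raw: List[Record]) -> List[Record]:
        return [r for r in raw if r.get("ts") is not None]


@dataclass
class FaultAnalysisModule:
    """Module 12: summarizes data features and asks the large model to analyze them."""
    large_model: LargeModel

    def run(self, records: List[Record]) -> Dict[str, object]:
        features = {
            "record_count": len(records),
            "sources": sorted({str(r.get("source", "unknown")) for r in records}),
        }
        prompt = f"Analyze these O&M data features and suggest fault causes: {features}"
        return {"features": features, "analysis": self.large_model(prompt)}


@dataclass
class FaultPositioningModule:
    """Module 13 (placeholder): fault tree analysis would refine this result."""

    def run(self, analysis: Dict[str, object]) -> str:
        return str(analysis["analysis"])


@dataclass
class FaultPositioningDevice:
    preprocessing: DataPreprocessingModule
    fault_analysis: FaultAnalysisModule
    fault_positioning: FaultPositioningModule

    def on_fault_detected(self, raw: List[Record]) -> str:
        cleaned = self.preprocessing.run(raw)
        analysis = self.fault_analysis.run(cleaned)
        return self.fault_positioning.run(analysis)


# Usage with a stub standing in for a real large model.
def stub_model(prompt: str) -> str:
    return "suspected cause: database connection pool exhausted"


device = FaultPositioningDevice(
    DataPreprocessingModule(), FaultAnalysisModule(stub_model), FaultPositioningModule()
)
print(device.on_fault_detected([{"ts": 1, "source": "alarm"}, {"ts": None, "source": "log"}]))
```

Injecting the large model as a plain callable keeps the pipeline testable without network access and lets any model client be swapped in.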
In this embodiment, if a system fault is detected, current system operation and maintenance data are collected in real time and preprocessed to obtain target preprocessed system operation and maintenance data; data features corresponding to the target preprocessed data are extracted and input into a preset large model, which analyzes them to obtain a fault analysis result; and a fault tree corresponding to the fault analysis result is constructed by fault tree analysis, the cause of the system fault is determined through the fault tree, and the fault cause is analyzed to complete fault positioning. In this way, after a system fault is detected, the system's operation and maintenance data are collected in real time and preprocessed, data features are extracted from the preprocessed data and analyzed by the preset large model, and a fault tree corresponding to the analysis result is constructed so that the cause of the system fault can be determined from it. When a fault occurs, the related information can therefore be collected quickly to locate the root cause quickly and accurately, which shortens troubleshooting time, improves fault handling efficiency, and ensures that the system recovers quickly and resumes normal operation.
In some embodiments, the data preprocessing module 11 may specifically include:
a first preprocessing sub-module, configured to collect the current system operation and maintenance data of the local system in real time if a system fault is detected, and to perform data labeling and data classification on the current system operation and maintenance data to obtain first preprocessed system operation and maintenance data;
a second preprocessing sub-module, configured to identify invalid data, duplicate data, and abnormal data in the first preprocessed system operation and maintenance data, and to remove them to obtain second preprocessed system operation and maintenance data; and
a data conversion sub-module, configured to normalize the second preprocessed system operation and maintenance data so as to convert it into a preset data format, thereby obtaining the target preprocessed system operation and maintenance data.
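The second preprocessing and data conversion steps can be sketched as a cleaning pass followed by normalization. The record layout `(timestamp, metric, value)`, the `VALID_RANGES` table used to reject abnormal values, the duplicate key `(timestamp, metric)`, and the choice of per-metric min-max scaling into a uniform dict format are all assumptions for illustration; the source does not specify the detection rules or the preset data format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical raw record: (timestamp, metric_name, value); None marks a missing field.
RawRecord = Tuple[Optional[int], Optional[str], Optional[float]]

# Assumed plausible ranges used to reject abnormal values (illustrative only).
VALID_RANGES: Dict[str, Tuple[float, float]] = {
    "cpu_util": (0.0, 100.0),
    "latency_ms": (0.0, 60_000.0),
}


def clean(records: List[RawRecord]) -> List[Tuple[int, str, float]]:
    """Drop invalid (missing field), duplicate, and abnormal (out-of-range) records."""
    seen = set()
    out = []
    for ts, metric, value in records:
        if ts is None or metric is None or value is None:
            continue  # invalid: a required field is missing
        if (ts, metric) in seen:
            continue  # duplicate: this metric was already kept for this timestamp
        lo, hi = VALID_RANGES.get(metric, (float("-inf"), float("inf")))
        if not lo <= value <= hi:
            continue  # abnormal: outside the assumed plausible range
        seen.add((ts, metric))
        out.append((ts, metric, value))
    return out


def normalize(records: List[Tuple[int, str, float]]) -> List[Dict[str, object]]:
    """Min-max scale each metric to [0, 1] and emit one uniform dict per record."""
    by_metric: Dict[str, List[float]] = {}
    for _, metric, value in records:
        by_metric.setdefault(metric, []).append(value)
    bounds = {m: (min(v), max(v)) for m, v in by_metric.items()}
    result = []
    for ts, metric, value in records:
        lo, hi = bounds[metric]
        scaled = 0.0 if hi == lo else (value - lo) / (hi - lo)
        result.append({"ts": ts, "metric": metric, "value": scaled})
    return result


raw = [(1, "cpu_util", 20.0), (1, "cpu_util", 20.0), (2, "cpu_util", None),
       (3, "cpu_util", 180.0), (4, "cpu_util", 60.0), (5, "latency_ms", 30.0)]
cleaned = clean(raw)
target = normalize(cleaned)
print(target)
```

Scaling each metric separately keeps quantities with different units (percent, milliseconds) comparable before they are turned into features.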
In some embodiments, the first preprocessing sub-module may specifically include:
a data acquisition unit, configured to collect the system's current system logs, alarm information, performance indicator data, network traffic data, and hardware status data in real time;
a log sorting unit, configured to add timestamps to the system logs and sort the system logs by timestamp to obtain sorted system logs; and
a data classification unit, configured to generate an alarm event chain from the alarm information, classify the performance indicator data by indicator type, classify the hardware status data by component type, and classify the network traffic data by traffic type.
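The first preprocessing units can be sketched as three small functions. Several details are assumptions rather than statements from the source: logs without a timestamp receive the collection time, an alarm event chain is formed by grouping time-ordered alarms and starting a new chain after a gap longer than `max_gap_s`, and classification is a bucketing on a type field. The function names and the 60-second default gap are hypothetical.

```python
import time
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Log = Tuple[Optional[float], str]   # (epoch seconds or None, message)
Alarm = Tuple[float, str]           # (epoch seconds, alarm name)


def stamp_and_sort(logs: List[Log], collected_at: Optional[float] = None) -> List[Tuple[float, str]]:
    """Fill missing timestamps with the collection time, then sort logs by time."""
    default_ts = time.time() if collected_at is None else collected_at
    stamped = [(ts if ts is not None else default_ts, msg) for ts, msg in logs]
    return sorted(stamped, key=lambda item: item[0])


def build_alarm_chains(alarms: List[Alarm], max_gap_s: float = 60.0) -> List[List[str]]:
    """Group time-ordered alarms; a gap longer than max_gap_s starts a new chain."""
    chains: List[List[str]] = []
    last_ts: Optional[float] = None
    for ts, name in sorted(alarms):
        if last_ts is None or ts - last_ts > max_gap_s:
            chains.append([])
        chains[-1].append(name)
        last_ts = ts
    return chains


def classify(records: List[Dict[str, str]], key: str) -> Dict[str, List[Dict[str, str]]]:
    """Bucket records by a type field (indicator, component, or traffic type)."""
    buckets: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for rec in records:
        buckets[rec.get(key, "unknown")].append(rec)
    return dict(buckets)


sorted_logs = stamp_and_sort([(30.0, "disk warn"), (None, "app restart"), (10.0, "link down")],
                             collected_at=50.0)
chains = build_alarm_chains([(0.0, "link_down"), (20.0, "bgp_flap"), (500.0, "disk_full")])
by_component = classify([{"component": "disk", "state": "hot"},
                         {"component": "nic", "state": "ok"},
                         {"component": "disk", "state": "ok"}], "component")
print(sorted_logs, chains, sorted(by_component))
```

A time-gap rule is only one way to delimit alarm chains; topology or alarm-correlation rules could replace it without changing the interface.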
In some embodiments, the fault analysis module 12 may specifically include:
an information matching unit, configured to extract information from the data features through the preset large model and match the extracted target information against preset fault cases, so as to determine the target fault case among the preset fault cases that matches the target information;
a relationship graph construction unit, configured to construct a fault causal relationship graph from the alarm event chain in the data features;
a first data comparison unit, configured to compare the performance indicator data with historical performance indicator data to determine abnormal changes in performance indicators;
a second data comparison unit, configured to analyze the network traffic data to determine abnormal network behavior in the network traffic data;
a third data comparison unit, configured to compare the hardware status data with preset hardware data thresholds to determine abnormal hardware states in the hardware status data; and
a data definition unit, configured to take the fault causal relationship graph, the abnormal changes in performance indicators, the abnormal network behavior, the abnormal hardware states, and the target fault case as the fault analysis result.
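The units of the fault analysis module can be sketched as follows, assembling the five outputs into one fault analysis result. These are stand-ins under explicit assumptions: `match_case` uses keyword Jaccard overlap in place of the large model's case matching; `causal_graph` treats temporal order within an alarm chain as a proxy for causality; performance and network anomalies use a z-score test against history (the 3.0 limit is arbitrary); and hardware anomalies are simple upper-threshold checks. All function names, case IDs, and data are hypothetical.

```python
import statistics
from typing import Dict, List, Optional, Set


def match_case(keywords: Set[str], cases: Dict[str, Set[str]], min_score: float = 0.3) -> Optional[str]:
    """Stand-in for large-model case matching: best Jaccard overlap of keywords."""
    best_id, best_score = None, 0.0
    for case_id, case_keywords in cases.items():
        union = keywords | case_keywords
        score = len(keywords & case_keywords) / len(union) if union else 0.0
        if score > best_score:
            best_id, best_score = case_id, score
    return best_id if best_score >= min_score else None


def causal_graph(chains: List[List[str]]) -> Dict[str, Set[str]]:
    """Edge earlier -> later for consecutive alarms in each chain (temporal proxy)."""
    graph: Dict[str, Set[str]] = {}
    for chain in chains:
        for name in chain:
            graph.setdefault(name, set())
        for cause, effect in zip(chain, chain[1:]):
            if cause != effect:
                graph[cause].add(effect)
    return graph


def root_candidates(graph: Dict[str, Set[str]]) -> List[str]:
    """Alarms with no incoming edge, i.e. nothing observed before them in a chain."""
    targets: Set[str] = set().union(*graph.values())
    return sorted(node for node in graph if node not in targets)


def metric_anomalies(current: Dict[str, float], history: Dict[str, List[float]],
                     z_limit: float = 3.0) -> Dict[str, float]:
    """Z-score of each current value against its history; keep |z| > z_limit."""
    flagged: Dict[str, float] = {}
    for name, value in current.items():
        past = history.get(name, [])
        if len(past) < 2:
            continue  # statistics.stdev needs at least two samples
        mu, sigma = statistics.mean(past), statistics.stdev(past)
        if sigma == 0:
            if value != mu:
                flagged[name] = float("inf")  # any change from a flat history
            continue
        z = (value - mu) / sigma
        if abs(z) > z_limit:
            flagged[name] = z
    return flagged


def hardware_anomalies(state: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Hardware readings above their preset upper threshold."""
    return sorted(k for k, v in state.items() if k in limits and v > limits[k])


# Assemble a fault analysis result from hypothetical data.
result = {
    "target_case": match_case({"db", "timeout", "pool"},
                              {"CASE-17": {"db", "pool", "exhausted"}, "CASE-42": {"link", "down"}}),
    "causal_graph": causal_graph([["db_slow", "api_timeout", "http_5xx"], ["disk_warn"]]),
    "perf_anomalies": metric_anomalies({"api_p99_ms": 2400.0},
                                       {"api_p99_ms": [210.0, 190.0, 205.0, 198.0]}),
    "net_anomalies": metric_anomalies({"retransmits_per_s": 12.0},
                                      {"retransmits_per_s": [10.0, 11.0, 13.0, 12.0]}),
    "hw_anomalies": hardware_anomalies({"disk_temp_c": 78.0, "cpu_temp_c": 60.0},
                                       {"disk_temp_c": 70.0, "cpu_temp_c": 85.0}),
}
print(result["target_case"], root_candidates(result["causal_graph"]), result["hw_anomalies"])
```

Temporal precedence is a weak signal of causation; in practice the graph edges would more likely come from service topology or learned alarm correlations, with the large model reviewing them.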
In some embodiments, the fault location module 13 may specifically include:
a fault tree construction unit, configured to identify, based on the fault analysis result, the top fault event, intermediate fault events, and bottom fault events corresponding to the system fault, and to construct from them a fault tree corresponding to the fault analysis result; and
a fault cause analysis unit, configured to determine, through the fault tree, the target bottom fault event that has the greatest probability of affecting the top fault event, and to take the target bottom fault event as the cause of the system fault.
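The fault tree step can be sketched with AND/OR gates under the standard assumption that bottom events are independent: an AND gate occurs with probability p1 × p2 × ... × pn, and an OR gate with probability 1 − (1 − p1)(1 − p2)...(1 − pn). The source does not say how "greatest probability of affecting the top event" is computed; this sketch assumes the criticality importance measure, I_C(i) = [P(top | i occurs) − P(top | i absent)] × p_i / P(top), which approximates the probability that event i occurred and was critical given that the top event occurred. The tree encoding, event names, and probabilities are hypothetical.

```python
import math
from typing import Dict, List, Tuple, Union

# A node is a bottom (basic) event name or a gate: ("AND" | "OR", [children]).
Node = Union[str, Tuple[str, List["Node"]]]


def event_probability(node: Node, p: Dict[str, float]) -> float:
    """Probability of a (sub)tree event, assuming independent bottom events."""
    if isinstance(node, str):
        return p[node]
    gate, children = node
    probs = [event_probability(child, p) for child in children]
    if gate == "AND":
        return math.prod(probs)
    if gate == "OR":
        return 1.0 - math.prod(1.0 - q for q in probs)
    raise ValueError(f"unknown gate: {gate!r}")


def criticality_importance(tree: Node, p: Dict[str, float]) -> Dict[str, float]:
    """I_C(i) = [P(top | i occurs) - P(top | i absent)] * p_i / P(top)."""
    p_top = event_probability(tree, p)
    scores: Dict[str, float] = {}
    for event, p_i in p.items():
        p_if_occurs = event_probability(tree, {**p, event: 1.0})
        p_if_absent = event_probability(tree, {**p, event: 0.0})
        scores[event] = (p_if_occurs - p_if_absent) * p_i / p_top if p_top > 0 else 0.0
    return scores


def most_likely_cause(tree: Node, p: Dict[str, float]) -> str:
    """Bottom event with the highest criticality importance."""
    scores = criticality_importance(tree, p)
    return max(scores, key=scores.__getitem__)


# Hypothetical top event "service unavailable" with one intermediate AND gate.
tree: Node = ("OR", [
    "power_failure",
    ("AND", ["disk_full", "log_rotation_failed"]),  # intermediate event
    "db_conn_pool_exhausted",
])
probs = {"power_failure": 0.01, "disk_full": 0.3,
         "log_rotation_failed": 0.5, "db_conn_pool_exhausted": 0.2}
print(round(event_probability(tree, probs), 4), most_likely_cause(tree, probs))
# → 0.3268 db_conn_pool_exhausted
```

Here the top event probability is 1 − 0.99 × 0.85 × 0.8 = 0.3268. Birnbaum importance (the bracketed difference alone) is an alternative if sensitivity, rather than likelihood of having caused the fault, is the goal.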
In some embodiments, the fault location module 13 may specifically include:
a fault locating unit, configured to analyze the fault cause to determine the fault occurrence time, the faulty component, and the abnormal fault data corresponding to the fault cause, thereby completing fault positioning.
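Once a root-cause bottom event is chosen, the fault locating step can be sketched as a lookup that maps the event to a component and takes the earliest abnormal observation on that component as the fault occurrence time. The event-to-component mapping, the observation tuple layout, and the "earliest observation" rule are assumptions for illustration; the source only states that the time, component, and abnormal data are determined.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical abnormal observation: (epoch_seconds, component, metric, value).
Observation = Tuple[float, str, str, float]


@dataclass
class FaultLocation:
    occurred_at: float                # earliest abnormal observation on the component
    component: str                    # component where the fault occurred
    abnormal_data: List[Observation]  # time-ordered abnormal observations


def locate_fault(root_cause: str,
                 event_to_component: Dict[str, str],
                 observations: List[Observation]) -> Optional[FaultLocation]:
    """Map a root-cause bottom event to its component, occurrence time, and data."""
    component = event_to_component.get(root_cause)
    if component is None:
        return None  # root cause not mapped to a known component
    related = sorted(o for o in observations if o[1] == component)
    if not related:
        return None  # no abnormal data observed on that component
    return FaultLocation(related[0][0], component, related)


observations = [(120.0, "db-01", "pool_used_pct", 100.0),
                (90.0, "db-01", "conn_wait_ms", 850.0),
                (100.0, "web-01", "http_5xx_per_s", 42.0)]
location = locate_fault("db_conn_pool_exhausted",
                        {"db_conn_pool_exhausted": "db-01"}, observations)
print(location)
```

Returning `None` for unmapped or unobserved causes makes it explicit when the fault tree points to an event that the collected data cannot confirm.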
In some embodiments, the operation and maintenance fault positioning device based on the large model may further include:
a model building unit, configured to build a fault prediction model on a pre-trained model by an ensemble learning method;
a real-time fault prediction unit, configured to collect current target system operation and maintenance data in real time at a preset time interval and input it into the fault prediction model, so that the fault prediction model performs system fault prediction in real time and outputs a real-time fault prediction result; and
a parameter adjustment unit, configured to adjust system parameters based on the fault prediction result so as to prevent system faults.
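The real-time prediction and parameter adjustment units can be sketched as a polling loop: collect data at a preset interval, predict failure risk, and apply a preventive adjustment when risk crosses a threshold. The specific adjustment (doubling a hypothetical `worker_pool_size` up to `max_pool_size`), the 0.7 risk threshold, the 30-second interval, and the injected `sleep` function (used so the loop can be tested without waiting) are all assumptions.

```python
import time
from typing import Callable, Dict, Iterator, Optional

Metrics = Dict[str, float]
Params = Dict[str, int]


def adjust_parameters(params: Params, risk: float, threshold: float) -> Params:
    """Hypothetical preventive action: double the worker pool when risk is high."""
    if risk < threshold:
        return params
    updated = dict(params)
    updated["worker_pool_size"] = min(params["worker_pool_size"] * 2, params["max_pool_size"])
    return updated


def monitor(collect: Callable[[], Metrics],
            predict: Callable[[Metrics], float],
            params: Params,
            interval_s: float = 30.0,
            threshold: float = 0.7,
            max_cycles: Optional[int] = None,
            sleep: Callable[[float], None] = time.sleep) -> Params:
    """Collect, predict, and adjust every interval_s seconds; stop after max_cycles if set."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        risk = predict(collect())
        params = adjust_parameters(params, risk, threshold)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            sleep(interval_s)
    return params


# Usage with canned readings and a trivial predictor (both hypothetical).
readings: Iterator[Metrics] = iter([{"cpu": 0.50}, {"cpu": 0.95}, {"cpu": 0.97}])
final = monitor(lambda: next(readings), lambda m: m["cpu"],
                {"worker_pool_size": 4, "max_pool_size": 10},
                max_cycles=3, sleep=lambda s: None)
print(final)  # {'worker_pool_size': 10, 'max_pool_size': 10}
```

In a real deployment, `predict` would be the trained fault prediction model and the adjustment policy would be specific to the system being protected.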
Further, an embodiment of the present application discloses an electronic device. Fig. 4 is a block diagram of an electronic device 20 according to an exemplary embodiment; its content should not be regarded as limiting the scope of use of the present application in any way.
Fig. 4 is a schematic structural diagram of the electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 stores a computer program that is loaded and executed by the processor 21 to implement the relevant steps of the large-model-based operation and maintenance fault positioning method disclosed in any of the foregoing embodiments. In this embodiment, the electronic device 20 may specifically be a computer.
In this embodiment, the power supply 23 provides the operating voltage for each hardware device on the electronic device 20. The communication interface 24 establishes a data transmission channel between the electronic device 20 and external devices, following any communication protocol applicable to the technical solution of the present application, which is not specifically limited here. The input/output interface 25 obtains external input data or outputs data to external devices; its specific interface type may be selected according to application needs and is not specifically limited here.
The memory 22 serves as a carrier for storing resources and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it may include an operating system 221, a computer program 222, and so on, and the storage may be temporary or permanent.
The operating system 221 manages and controls the hardware devices on the electronic device 20 and the computer program 222, and may be Windows Server, NetWare, Unix, Linux, or the like. In addition to the computer program that performs the large-model-based operation and maintenance fault positioning method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs for performing other specific tasks.
Furthermore, the present application discloses a computer-readable storage medium for storing a computer program which, when executed by a processor, implements the large-model-based operation and maintenance fault positioning method described above. For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Because the device disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for relevant details, refer to the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should be noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Specific examples are used herein to illustrate the principles and embodiments of the present application; they are intended only to aid understanding of those principles and embodiments and are not limiting in any way. Those of ordinary skill in the art may, in light of the above teachings, vary the specific embodiments and the scope of application of the present application.

Claims (9)

1. A large-model-based operation and maintenance fault location method, comprising:
if a system fault is detected, collecting current system operation and maintenance data in real time, and preprocessing the current system operation and maintenance data to obtain target preprocessed system operation and maintenance data;
extracting data features corresponding to the target preprocessed system operation and maintenance data, and inputting the data features into a preset large model, so as to analyze the data features through the preset large model to obtain a fault analysis result; and
constructing, based on a fault tree analysis method, a fault tree corresponding to the fault analysis result, determining the cause of the system fault through the fault tree, and analyzing the fault cause to complete fault location;
wherein analyzing the data features through the preset large model to obtain the fault analysis result comprises:
extracting information from the data features through the preset large model, and matching the extracted target information against preset fault cases to determine a target fault case, among the preset fault cases, that matches the target information;
constructing a fault causal relationship graph based on an alarm event chain in the data features, the alarm event chain being an event chain generated based on alarm information;
comparing performance indicator data with historical performance indicator data to determine abnormal changes in performance indicators;
analyzing network traffic data to determine abnormal network behavior present in the network traffic data;
comparing hardware status data with a preset hardware data threshold to determine an abnormal hardware status in the hardware status data; and
taking the fault causal relationship graph, the abnormal changes in performance indicators, the abnormal network behavior, the abnormal hardware status, and the target fault case as the fault analysis result;
wherein the alarm information, the performance indicator data, the network traffic data, and the hardware status data are data collected from the system in real time.

2. The large-model-based operation and maintenance fault location method according to claim 1, wherein collecting current system operation and maintenance data in real time if a system fault is detected, and preprocessing the current system operation and maintenance data to obtain target preprocessed system operation and maintenance data, comprises:
if a system fault is detected, collecting the current system operation and maintenance data of the local system in real time, and performing data labeling and data classification on the current system operation and maintenance data to obtain first preprocessed system operation and maintenance data;
determining invalid data, duplicate data, and abnormal data in the first preprocessed system operation and maintenance data, and removing the invalid data, the duplicate data, and the abnormal data from the first preprocessed system operation and maintenance data to obtain second preprocessed system operation and maintenance data; and
normalizing the second preprocessed system operation and maintenance data to convert it into a preset data format, so as to obtain the target preprocessed system operation and maintenance data.

3. The large-model-based operation and maintenance fault location method according to claim 2, wherein collecting the current system operation and maintenance data of the local system in real time, and performing data labeling and data classification on the current system operation and maintenance data to obtain the first preprocessed system operation and maintenance data, comprises:
collecting the system's current system logs, alarm information, performance indicator data, network traffic data, and hardware status data in real time;
adding timestamps to the system logs, and sorting the system logs based on the timestamps to obtain sorted system logs; and
generating an alarm event chain based on the alarm information, classifying the performance indicator data by indicator type, classifying the hardware status data by component type, and classifying the network traffic data by traffic type.

4. The large-model-based operation and maintenance fault location method according to claim 1, wherein constructing, based on a fault tree analysis method, a fault tree corresponding to the fault analysis result to determine the cause of the system fault through the fault tree comprises:
identifying, based on the fault analysis result, a top fault event, intermediate fault events, and bottom fault events corresponding to the system fault, and constructing, based on the top fault event, the intermediate fault events, and the bottom fault events, a fault tree corresponding to the fault analysis result; and
determining, through the fault tree, a target bottom fault event that has the greatest probability of affecting the top fault event among the bottom fault events, and taking the target bottom fault event as the cause of the system fault.

5. The large-model-based operation and maintenance fault location method according to claim 1, wherein analyzing the fault cause to complete fault location comprises:
analyzing the fault cause to determine the fault occurrence time, the faulty component, and the abnormal fault data corresponding to the fault cause, so as to complete fault location.

6. The large-model-based operation and maintenance fault location method according to claim 1, further comprising:
building a fault prediction model based on a pre-trained model through an ensemble learning method;
collecting current target system operation and maintenance data in real time at a preset time interval, and inputting the target system operation and maintenance data into the fault prediction model, so as to perform system fault prediction in real time through the fault prediction model and obtain a real-time fault prediction result; and
adjusting system parameters based on the fault prediction result to prevent system faults.

7. A large-model-based operation and maintenance fault location device, comprising:
a data preprocessing module, configured to collect current system operation and maintenance data in real time if a system fault is detected, and to preprocess the current system operation and maintenance data to obtain target preprocessed system operation and maintenance data;
a fault analysis module, configured to extract data features corresponding to the target preprocessed system operation and maintenance data, and to input the data features into a preset large model, so as to analyze the data features through the preset large model and obtain a fault analysis result; and
a fault location module, configured to construct, based on a fault tree analysis method, a fault tree corresponding to the fault analysis result, to determine the cause of the system fault through the fault tree, and to analyze the fault cause to complete fault location;
wherein the fault analysis module comprises:
an information matching unit, configured to extract information from the data features through the preset large model, and to match the extracted target information against preset fault cases to determine a target fault case, among the preset fault cases, that matches the target information;
a relationship graph construction unit, configured to construct a fault causal relationship graph based on an alarm event chain in the data features, the alarm event chain being an event chain generated based on alarm information;
a first data comparison unit, configured to compare performance indicator data with historical performance indicator data to determine abnormal changes in performance indicators;
a second data comparison unit, configured to analyze network traffic data to determine abnormal network behavior present in the network traffic data;
a third data comparison unit, configured to compare hardware status data with a preset hardware data threshold to determine an abnormal hardware status in the hardware status data; and
a data definition unit, configured to take the fault causal relationship graph, the abnormal changes in performance indicators, the abnormal network behavior, the abnormal hardware status, and the target fault case as the fault analysis result;
wherein the alarm information, the performance indicator data, the network traffic data, and the hardware status data are data collected from the system in real time.

8. An electronic device, comprising:
a memory, configured to store a computer program; and
a processor, configured to execute the computer program to implement the large-model-based operation and maintenance fault location method according to any one of claims 1 to 6.

9. A computer-readable storage medium, configured to store a computer program, wherein the computer program, when executed by a processor, implements the large-model-based operation and maintenance fault location method according to any one of claims 1 to 6.
CN202510678067.9A 2025-05-26 2025-05-26 Operation and maintenance fault positioning method, device, equipment and storage medium based on large model Active CN120196528B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510678067.9A CN120196528B (en) 2025-05-26 2025-05-26 Operation and maintenance fault positioning method, device, equipment and storage medium based on large model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510678067.9A CN120196528B (en) 2025-05-26 2025-05-26 Operation and maintenance fault positioning method, device, equipment and storage medium based on large model

Publications (2)

Publication Number Publication Date
CN120196528A CN120196528A (en) 2025-06-24
CN120196528B true CN120196528B (en) 2025-08-19

Family

ID=96068700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510678067.9A Active CN120196528B (en) 2025-05-26 2025-05-26 Operation and maintenance fault positioning method, device, equipment and storage medium based on large model

Country Status (1)

Country Link
CN (1) CN120196528B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117170925A (en) * 2023-09-14 2023-12-05 建信金融科技有限责任公司 Method, device, equipment and storage medium for processing system faults
CN118827342A (en) * 2023-09-25 2024-10-22 中国移动通信集团浙江有限公司 Internet equipment fault diagnosis method and system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5099436A (en) * 1988-11-03 1992-03-24 Allied-Signal Inc. Methods and apparatus for performing system fault diagnosis
CN110659173B (en) * 2018-06-28 2023-05-26 中兴通讯股份有限公司 Operation and maintenance system and method
CN115718472A (en) * 2022-11-17 2023-02-28 中国长江电力股份有限公司 Fault scanning and diagnosing method for hydroelectric generating set
CN118052281A (en) * 2023-12-29 2024-05-17 北京航天测控技术有限公司 System-level fault tree auxiliary modeling method based on large model
CN119311447A (en) * 2024-09-12 2025-01-14 上海天玑科技股份有限公司 An intelligent operation and maintenance method and system based on natural language processing technology
CN119646541A (en) * 2024-11-08 2025-03-18 浪潮通信技术有限公司 A cloud computer end-to-end fault root cause analysis method and system
CN119887161A (en) * 2024-12-20 2025-04-25 中企云链股份有限公司 Root cause analysis implementation method in intelligent operation and maintenance system and intelligent operation and maintenance system
CN119722374B (en) * 2025-02-25 2025-05-06 深圳市联特微电脑信息技术开发有限公司 Intelligent manufacturing optimization method, device, equipment and storage medium based on large model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant