CN118467403B - Database detection method, device, equipment and storage medium - Google Patents
- Publication number
- CN118467403B CN202410933798.9A
- Authority
- CN
- China
- Prior art keywords
- log
- target
- time window
- sliding time
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Prevention of errors by analysis, debugging or testing of software
- G06F11/3668—Testing of software
- G06F11/3672—Test management
- G06F11/3692—Test management for test results analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The disclosure provides a database detection method, device, equipment and storage medium, and relates to the technical field of database operation and maintenance. The method comprises: acquiring target log data of an index item to be detected of a database within a target time range; vectorizing the target log data into target log vectors and segmenting the target log vectors through a sliding time window to obtain a target log sequence; determining, through an anomaly detection model and based on the target log sequence, a possible log event set of the next sliding time window of the target time range; and, if the coincidence degree of the actual log event set and the possible log event set of the next sliding time window is smaller than a first threshold, determining that the target log data is abnormal. The database detection method provided by the disclosure can improve the efficiency of database detection and the accuracy of detection results.
Description
Technical Field
The disclosure relates to the technical field of database operation and maintenance, and in particular relates to a database detection method, device, equipment and storage medium.
Background
At present, most enterprises store daily operation data in a database, and continuous normal operation of the database is critical to users, but abnormal conditions of the database occur due to the large data volume, complex data form, unreasonable database setting and the like. In the prior art, various behavior characteristics of a database system are analyzed mainly through an anomaly detection model to detect whether the database system is abnormal or not.
However, conventional anomaly detection models ignore how the log content in the database system changes over time, which easily leads to misunderstanding of log information during log analysis. In addition, when processing sequence data they make insufficient use of context information, which may cause loss of log information and an inaccurate understanding of the current time step. Meanwhile, a model that depends too heavily on the time order of the sequence is easily affected by changes in the order of the sequence data, which affects the accuracy of the detection result.
Disclosure of Invention
The present disclosure provides a database detection method, apparatus, device, and storage medium, so as to at least solve the above technical problems in the prior art.
According to a first aspect of the present disclosure, there is provided a database detection method, the method comprising: acquiring target log data of an index item to be detected of a database within a target time range; splitting the target log vector after vectorization of the target log data through a sliding time window to obtain a target log sequence; determining a set of possible log events for a next sliding time window of the target time range based on the target log sequence by an anomaly detection model; if the coincidence ratio of the actual log event set and the possible log event set of the next sliding time window is smaller than a first threshold value, determining that the target log data is abnormal; the anomaly detection model is obtained by training an initial detection model based on historical log data of index items related to the running state of the database and historical log events corresponding to the historical log data, and is used for predicting the log events which possibly occur in the next sliding time window.
In an embodiment, before said determining the set of possible log events for the next sliding time window of the target time range, the method further comprises: acquiring history log data of index items related to the running state of the database and the history log events corresponding to the history log data; segmenting the history log vectors obtained by vectorizing the history log data through a sliding time window to obtain a history log sequence; and training the initial detection model based on the history log sequence and the history log events corresponding to the history log vectors to obtain the anomaly detection model, wherein the initial detection model comprises a Mogrifier bidirectional gated recurrent unit (Mogrifier BiGRU) layer, a self-attention layer and a fully connected layer.
In one embodiment, the loss function of the initial detection model is defined by a formula that is not reproduced in this text.
The formula is expressed in terms of: the loss function; the total number of sliding time windows in the history log sequence; the total number of categories of history log events; the i-th history log event in the current sliding time window; and the occurrence probability of that history log event as output by the fully connected layer.
In an embodiment, the determining the set of possible log events for the next sliding time window of the target time range includes: inputting the target log sequence to a Mogrifier BiGRU layer of the anomaly detection model, wherein the Mogrifier BiGRU layer outputs a hidden state corresponding to each sliding time window in the target log sequence, and the hidden state represents the context information of all target log vectors in the sliding time window; inputting the hidden state to a self-attention layer of the anomaly detection model, and outputting an association vector corresponding to each sliding time window in the target log sequence by the self-attention layer, wherein the association vector represents the association degree between the sliding time window and other sliding time windows; inputting the association vector to a full connection layer of the anomaly detection model, and outputting the occurrence probability of each type of log event in a next sliding time window of the target time range by the full connection layer; and forming the possible log event set by the log events with the occurrence probability larger than a second threshold value.
In one embodiment, the fused attention in the self-attention layer is computed by a formula that is not reproduced in this text.
The formula is expressed in terms of: the fused attention; the query vectors Q, the key vectors K and the value vectors V corresponding to the sliding time windows; the total number N of sliding time windows in the target log sequence; the n-th query vector in Q, the n-th key vector in K and the n-th value vector in V; the transpose symbol T, which marks a transposed vector; and a scaling factor.
In one embodiment, the activation function of the fully connected layer is defined by a formula that is not reproduced in this text.
The formula is expressed in terms of: the activation function; the i-th log event; the weight vector corresponding to the i-th log event, which is a column of the weight matrix W; the association vector corresponding to the i-th sliding time window in the target log sequence; the j-th log event; the total number of categories of log events; and the included angle between the association vector and the weight vector.
In an embodiment, the method further comprises: if the target log data is abnormal, determining index items corresponding to abnormal actual log events which do not coincide with the possible log event set in the actual log event set; and displaying the index item corresponding to the abnormal actual log event and the abnormal actual log event.
According to a second aspect of the present disclosure, there is provided a database detection apparatus, the apparatus comprising: the acquisition module is used for acquiring target log data of the index items to be detected of the database within a target time range; the segmentation module is used for segmenting the target log vector subjected to the vectorization of the target log data through a sliding time window to obtain a target log sequence; a first determining module for determining a set of possible log events for a next sliding time window of the target time range based on the target log sequence by means of an anomaly detection model; a second determining module, configured to determine that the target log data is abnormal if the coincidence ratio of the actual log event set and the possible log event set of the next sliding time window is smaller than a first threshold; the anomaly detection model is obtained by training an initial detection model based on historical log data of index items related to the running state of the database and historical log events corresponding to the historical log data, and is used for predicting the log events which possibly occur in the next sliding time window.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods described in the present disclosure.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the present disclosure.
According to the database detection method, device, equipment and storage medium of the present disclosure, target log data of an index item to be detected of a database within a target time range is first acquired, and the target log vectors obtained by vectorizing the target log data are segmented through a sliding time window to obtain a target log sequence. A possible log event set of the next sliding time window of the target time range is then determined by an anomaly detection model based on the target log sequence, and if the coincidence degree of the actual log event set and the possible log event set of the next sliding time window is smaller than a first threshold, the target log data is determined to be abnormal. The anomaly detection model thus performs targeted detection on the time-dimension characteristics of the target log sequence: even if the order of the data within a certain sliding time window changes, only that sliding time window is affected. Because the anomaly detection model takes into account both how the log content changes over time and the context information of each sliding time window in the target log sequence, the accuracy of the detection result can be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 is a schematic flow chart of a database detection method according to an embodiment of the disclosure;
FIG. 2 illustrates a second flow diagram of a database detection method according to an embodiment of the disclosure;
FIG. 3 shows a schematic structural diagram of an initial detection model in an embodiment of the present disclosure;
FIG. 4 illustrates a third flow diagram of a database detection method according to an embodiment of the disclosure;
FIG. 5 illustrates a fourth flow diagram of a database detection method according to an embodiment of the disclosure;
FIG. 6 is a schematic diagram of a database detection apparatus according to an embodiment of the present disclosure;
FIG. 7 illustrates a schematic diagram of a database detection system according to an embodiment of the present disclosure;
FIG. 8 shows a schematic diagram of a composition structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, features and advantages of the present disclosure more comprehensible, the technical solutions in the embodiments of the present disclosure will be clearly described in conjunction with the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person skilled in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
Fig. 1 shows a first flowchart of a database detection method according to an embodiment of the present disclosure. As shown in Fig. 1, the database detection method includes:
Step S101, obtaining target log data of the index items to be detected in the database within a target time range.
In this embodiment, the index item to be detected is at least one of the index items related to the running state of the database, and the target time range is the time range over which the index item to be detected needs to be checked. A user may select the index item to be detected and its target time range in an index management unit of the database detection system; in the index management unit, the target time range defaults to 1 hour and has a maximum of two weeks. The target log data can then be collected directly according to the index item to be detected and the target time range set by the user.
In one embodiment, the index items related to the running state of the database comprise 25 index items in 7 index categories: a capacity layer, a resource layer, an exception event layer, an object layer, a session layer, an SQL (Structured Query Language) layer and a parameter layer. The capacity layer includes 4 index items: UNDO tablespace analysis, UNDO active block usage, tablespace usage rate and temporary tablespace usage rate. The resource layer includes 2 index items: process usage rate and PGA (Program Global Area) usage rate. The exception event layer includes 7 index items: abnormal GC (Global Cache) waits, common GC-class waits, row-level lock waits, table-level lock waits, other ENQ (enqueue) lock waits, latch waits and Log File Sync waits. The object layer includes 1 index item: objects whose statistics have expired. The session layer includes 2 index items: total session count and active session count. The SQL layer includes 8 index items: monitoring of TOP SQL statements occupying TEMP (temporary tablespace), the proportion of CPU Time (central processing unit time) in DB Time (database time), Redo Generated Per Sec (amount of redo data generated per second), read IO (input/output) response time, write IO response time, physical write IO, physical read IO, and excessive RAC (Real Application Clusters) private network traffic. The parameter layer includes 1 index item: standard parameter comparison.
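As an illustration only, the taxonomy of this embodiment can be written down as a small lookup structure and checked against the stated totals (7 categories, 25 index items). The dictionary layout, the key names and the English item labels are assumptions made for this sketch and are not part of the disclosure.

```python
# Index taxonomy as listed in the embodiment. The dictionary layout and the
# English labels are illustrative assumptions, not the patent's data model.
INDEX_ITEMS = {
    "capacity": [
        "UNDO tablespace analysis",
        "UNDO active block usage",
        "tablespace usage rate",
        "temporary tablespace usage rate",
    ],
    "resource": ["process usage rate", "PGA usage rate"],
    "exception_event": [
        "abnormal GC wait",
        "common GC-class wait",
        "row-level lock wait",
        "table-level lock wait",
        "other ENQ lock wait",
        "latch wait",
        "Log File Sync wait",
    ],
    "object": ["objects with expired statistics"],
    "session": ["total session count", "active session count"],
    "sql": [
        "TOP SQL occupying TEMP",
        "CPU Time share of DB Time",
        "Redo Generated Per Sec",
        "read IO response time",
        "write IO response time",
        "physical write IO",
        "physical read IO",
        "RAC private network traffic too high",
    ],
    "parameter": ["standard parameter comparison"],
}


def count_index_items(taxonomy):
    """Return (number of categories, total number of index items)."""
    return len(taxonomy), sum(len(items) for items in taxonomy.values())


print(count_index_items(INDEX_ITEMS))  # (7, 25)
```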
Step S102, segmenting the target log vector after target log data vectorization through a sliding time window to obtain a target log sequence.
In this embodiment, after the target log data is obtained, the unstructured target log data may be parsed into structured target log data, and the structured target log data is then vectorized based on semantic word embedding and lexical information embedding to obtain the target log vectors corresponding to the target log data. The target log vectors are then segmented through a sliding time window to obtain a target log sequence made up of successive sliding time windows, each of which contains a plurality of target log vectors. The size of the sliding time window can be determined according to actual conditions and is not limited by the present disclosure.
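The segmentation step can be sketched as follows. This is a minimal sketch under stated assumptions: each target log vector is assumed to carry a timestamp in seconds, windows are taken as half-open intervals, and the function name `split_by_sliding_window` together with the `size` and `step` parameters are hypothetical. The disclosure does not fix the window size or say whether consecutive windows overlap.

```python
def split_by_sliding_window(logs, start, end, size, step):
    """Group (timestamp, vector) pairs into windows [t, t + size).

    Windows begin at start, start + step, ... and never extend past end.
    Returns one list of vectors per window, in time order.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    windows = []
    t = start
    while t < end:
        upper = min(t + size, end)
        windows.append([vec for ts, vec in logs if t <= ts < upper])
        t += step
    return windows


# Four hypothetical log vectors with timestamps in seconds.
logs = [
    (0.0, [0.1, 0.2]),
    (30.0, [0.3, 0.1]),
    (70.0, [0.5, 0.5]),
    (130.0, [0.9, 0.0]),
]
sequence = split_by_sliding_window(logs, start=0.0, end=180.0, size=60.0, step=60.0)
print([len(window) for window in sequence])  # [2, 1, 1]
```

Setting `step` smaller than `size` would produce overlapping windows; whether that is desirable depends on the actual conditions mentioned in the embodiment.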
Step S103, determining, by the anomaly detection model, a set of possible log events for a next sliding time window of the target time range based on the target log sequence.
In this embodiment, the anomaly detection model is obtained by training an initial detection model based on history log data of index items related to the running state of the database and the history log events corresponding to the history log data, and is used for predicting the log events that may occur in the next sliding time window. After the target log sequence is obtained, it is input into the anomaly detection model, and the anomaly detection model outputs the possible log event set of the next sliding time window of the target time range, that is, of the sliding time window that follows the last sliding time window in the target log sequence.
Step S104, if the coincidence degree of the actual log event set and the possible log event set of the next sliding time window is smaller than a first threshold value, determining that the target log data is abnormal.
In this embodiment, the actual log event set that actually occurs in the next sliding time window of the target time range is obtained. If the coincidence degree between the actual log events in the actual log event set and the possible log events in the possible log event set is smaller than the first threshold, the target log data is determined to be abnormal, and an anomaly alarm can be sent.
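The comparison in step S104 can be sketched as below. The disclosure does not define how the coincidence degree is computed, so this sketch assumes it is the fraction of actual log events that also appear in the possible log event set; the function names and the threshold value 0.8 are likewise hypothetical.

```python
def coincidence_degree(actual_events, possible_events):
    """Fraction of actual events that were predicted as possible.

    Assumed definition: |actual ∩ possible| / |actual|. An empty actual
    set is treated as fully coinciding.
    """
    actual, possible = set(actual_events), set(possible_events)
    if not actual:
        return 1.0
    return len(actual & possible) / len(actual)


def is_abnormal(actual_events, possible_events, first_threshold):
    """Flag the target log data when the coincidence degree is too low."""
    return coincidence_degree(actual_events, possible_events) < first_threshold


possible = {"E1", "E2", "E3"}
actual = {"E1", "E7"}
print(coincidence_degree(actual, possible))                # 0.5
print(is_abnormal(actual, possible, first_threshold=0.8))  # True
```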
In the present disclosure, target log data of an index item to be detected of a database within a target time range is first acquired, and the target log vectors obtained by vectorizing the target log data are segmented through a sliding time window to obtain a target log sequence. A possible log event set of the next sliding time window of the target time range is then determined by the anomaly detection model based on the target log sequence, and if the coincidence degree of the actual log event set and the possible log event set of the next sliding time window is smaller than a first threshold, the target log data is determined to be abnormal. The anomaly detection model thus performs targeted detection on the time-dimension characteristics of the target log sequence: even if the order of the data within a certain sliding time window changes, only that sliding time window is affected. Because the anomaly detection model takes into account both how the log content changes over time and the context information of each sliding time window in the target log sequence, the accuracy of the detection result can be improved.
Fig. 2 shows a second flowchart of a database detection method according to an embodiment of the present disclosure. As shown in Fig. 2, before "determining a possible log event set of a next sliding time window of a target time range" in step S103, the database detection method further includes:
Step S201, acquiring the history log data of the index item related to the database running state and the history log event corresponding to the history log data.
In this embodiment, the anomaly detection model needs to be trained. First, the history log data of the index items related to the running state of the database and the history log events corresponding to the history log data are acquired. The unstructured history log data is then parsed into structured history log data, which is vectorized based on semantic word embedding and lexical information embedding to obtain the history log vectors corresponding to the history log data.
Step S202, the history log vector after the history log data vectorization is segmented through a sliding time window, and a history log sequence is obtained.
In this embodiment, after the history log data and the corresponding history log events are obtained, the history log vectors obtained by vectorizing the history log data may be segmented through a sliding time window to obtain a history log sequence made up of successive sliding time windows. Each sliding time window contains a plurality of history log vectors, and each history log vector is labeled with its corresponding history log event.
Step S203, training the initial detection model based on the history log sequence and the history log events corresponding to the history log vectors to obtain the anomaly detection model.
Fig. 3 shows a schematic structural diagram of an initial detection model in an embodiment of the present disclosure. As shown in Fig. 3, the initial detection model includes a time sequence input layer, a Mogrifier bidirectional gated recurrent unit (Mogrifier BiGRU) layer, a self-attention layer and a fully connected layer. The history log sequence is input into the initial detection model through its time sequence input layer to train the initial detection model, so as to obtain the anomaly detection model.
In one embodiment, the loss function of the initial detection model is defined by a formula that is not reproduced in this text.
The formula is expressed in terms of: the loss function; the total number of sliding time windows in the history log sequence; the total number of categories of history log events; the i-th history log event in the current sliding time window; and the occurrence probability of that history log event as output by the fully connected layer. The output prediction of the fully connected layer in Fig. 3 includes the respective occurrence probabilities of the k kinds of history log events in each sliding time window.
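For orientation only: the terms listed for the loss function (a sum over sliding time windows, a sum over event categories, a labeled event and its predicted occurrence probability) match the shape of a standard multi-class cross-entropy. The expression below is such a cross-entropy written under that assumption. It is not the formula of the disclosure, and the symbols N, y and p are chosen here for illustration, with k taken as the number of event categories.

```latex
\mathcal{L} \;=\; -\frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{k}
    y_i^{(n)} \, \log p\!\left(y_i^{(n)}\right)
```

Here N would be the total number of sliding time windows in the history log sequence, k the total number of categories of history log events, y_i^{(n)} an indicator of the i-th history log event in the n-th sliding time window, and p(y_i^{(n)}) the occurrence probability output by the fully connected layer.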
In the present disclosure, the initial detection model connects a self-attention layer after the Mogrifier BiGRU layer, so that the importance of each input log event to the anomaly detection task can be calculated. This enhances the robustness of the anomaly detection model and addresses the degradation of anomaly detection performance caused by changes in the wording of log statements. In addition, the loss between input and output given by the loss function of the initial detection model is minimized by gradient descent to update the weights of the model, so that a more accurate anomaly detection model can be obtained.
Fig. 4 shows a third flowchart of a database detection method according to an embodiment of the present disclosure. As shown in Fig. 4, the "determining a possible log event set of a next sliding time window of a target time range" in step S103 includes:
Step S301, the target log sequence is input to the Mogrifier BiGRU layer of the anomaly detection model, and the Mogrifier BiGRU layer outputs the hidden state corresponding to each sliding time window in the target log sequence.
In this embodiment, after the target log sequence is obtained, it is input to the Mogrifier BiGRU layer of the anomaly detection model, where each target log vector in the target log sequence corresponds to one Mogrifier BiGRU unit. The Mogrifier BiGRU layer constructs a forward GRU network structure and a backward GRU network structure. At time t, the forward GRU network structure takes the candidate state and the hidden state of the previous time step as input and outputs a forward hidden state vector; the backward GRU network structure takes its corresponding input and the hidden state of the following time step as input and outputs a backward hidden state vector. The hidden state vector at time t is computed using an update gate and multiplication between vectors; the formula is not reproduced in this text. The output sequence of the last hidden layer of the Mogrifier BiGRU layer consists of the hidden state vectors corresponding to the sliding time windows in the target log sequence, and each hidden state vector represents the context information of all target log vectors in its sliding time window.
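A rough sketch of a bidirectional GRU with Mogrifier-style gating is given below, in pure Python with random, untrained weights. Several things are assumptions rather than details from the disclosure: the mutual gating rounds follow the publicly described Mogrifier idea (the input and the previous hidden state alternately scale each other before the recurrent update), the forward and backward hidden states are combined by concatenation, and each sliding time window is represented by a single input vector. All function and parameter names are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(vector):
    return [1.0 / (1.0 + math.exp(-x)) for x in vector]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(x_dim, h_dim, rounds, rng):
    """Random, untrained weights; only the shapes matter in this sketch."""
    return {
        "Q": [rand_matrix(x_dim, h_dim, rng) for _ in range(rounds)],
        "R": [rand_matrix(h_dim, x_dim, rng) for _ in range(rounds)],
        "Wz": rand_matrix(h_dim, x_dim + h_dim, rng),
        "Wr": rand_matrix(h_dim, x_dim + h_dim, rng),
        "Wh": rand_matrix(h_dim, x_dim + h_dim, rng),
    }


def mogrify(x, h, params):
    """Mogrifier-style rounds: gate x with h, then h with the updated x."""
    for q, r in zip(params["Q"], params["R"]):
        x = [2.0 * g * xi for g, xi in zip(sigmoid(matvec(q, h)), x)]
        h = [2.0 * g * hi for g, hi in zip(sigmoid(matvec(r, x)), h)]
    return x, h


def gru_step(x, h_prev, params):
    """One GRU update applied to the mutually gated input and hidden state."""
    x, h = mogrify(x, h_prev, params)
    z = sigmoid(matvec(params["Wz"], x + h))  # update gate
    r = sigmoid(matvec(params["Wr"], x + h))  # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    candidate = [math.tanh(v) for v in matvec(params["Wh"], x + gated_h)]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]


def mogrifier_bigru(sequence, fwd, bwd, h_dim):
    """Return one hidden state per input: forward and backward states concatenated."""
    def run(inputs, params):
        h, states = [0.0] * h_dim, []
        for x in inputs:
            h = gru_step(x, h, params)
            states.append(h)
        return states

    forward = run(sequence, fwd)
    backward = run(sequence[::-1], bwd)[::-1]
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
windows = [[0.1, 0.2, 0.3], [0.0, 0.5, 0.1], [0.9, 0.1, 0.4]]  # one vector per window
fwd = init_params(x_dim=3, h_dim=4, rounds=2, rng=rng)
bwd = init_params(x_dim=3, h_dim=4, rounds=2, rng=rng)
hidden = mogrifier_bigru(windows, fwd, bwd, h_dim=4)
print(len(hidden), len(hidden[0]))  # 3 8
```

The backward pass reads the windows in reverse order, so the hidden state for each window reflects both earlier and later windows, which is the context property the embodiment attributes to the layer.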
Step S302, the hidden state is input to a self-attention layer of the anomaly detection model, and the self-attention layer outputs the associated vector corresponding to each sliding time window in the target log sequence.
In this embodiment, the hidden states output by the Mogrifier BiGRU layer are input to the self-attention layer of the anomaly detection model. The self-attention layer applies a linear transformation to each element of the hidden state sequence to obtain the query (Query) vector, key (Key) vector and value (Value) vector corresponding to each sliding time window: each of the three vectors is obtained by multiplying the hidden state vector of the sliding time window by a corresponding weight matrix. The three weight matrices are neural network parameters and are adjusted by back propagation during computation. From the query vector, key vector and value vector, the self-attention layer determines and outputs the association vector corresponding to each sliding time window, where the association vector characterizes the degree of association between that sliding time window and the other sliding time windows.
In one embodiment, the fused attention in the self-attention layer is computed by a formula that is not reproduced in this text.
The formula is expressed in terms of: the fused attention; the query vectors Q, the key vectors K and the value vectors V corresponding to the sliding time windows; the total number N of sliding time windows in the target log sequence; the n-th query vector in Q, the n-th key vector in K and the n-th value vector in V; the transpose symbol T, which marks a transposed vector; and a scaling factor.
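For readers unfamiliar with the query/key/value mechanism, the sketch below shows ordinary scaled dot-product self-attention over the per-window hidden states. It is the standard textbook formulation and not the redesigned fused attention of the disclosure, whose formula is not available here. The weight matrices `w_q`, `w_k`, `w_v` and the function names are hypothetical, and the scaling factor is assumed to be the square root of the key dimension.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden_states, w_q, w_k, w_v):
    """Standard scaled dot-product self-attention, one output per window."""
    queries = [matvec(w_q, h) for h in hidden_states]
    keys = [matvec(w_k, h) for h in hidden_states]
    values = [matvec(w_v, h) for h in hidden_states]
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


identity = [[1.0, 0.0], [0.0, 1.0]]
hidden_states = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
association = self_attention(hidden_states, identity, identity, identity)
print(len(association), len(association[0]))  # 3 2
```

Each output row is a weighted average of the value vectors, with weights reflecting how strongly one window's query matches every window's key, which is the sense in which the association vector relates a window to the others.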
Step S303, the association vector is input to the full connection layer of the anomaly detection model, and the full connection layer outputs the occurrence probability of each category log event in the next sliding time window of the target time range.
In this embodiment, the association vector output by the self-attention layer is input to the fully connected layer of the anomaly detection model. The fully connected layer maps the association vector into a k-dimensional probability vector, which represents the model's predicted probability distribution over the k categories of log events at time t. The fully connected layer uses an activation function to normalize the occurrence probability of the i-th category of log event into the range [0, 1].
In one embodiment, the activation function of the fully connected layer is defined by a formula that is not reproduced in this text.
The formula is expressed in terms of: the activation function; the i-th log event; the weight vector corresponding to the i-th log event, which is a column of the weight matrix W; the association vector corresponding to the i-th sliding time window in the target log sequence; the j-th log event; the total number of categories of log events; and the included angle between the association vector and the weight vector.
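Because the activation function is described in terms of the angle between the association vector and a weight vector, it may help to see how an ordinary softmax can be written in that angular form, using the identity w · x = ‖w‖ ‖x‖ cos θ. The sketch below is that ordinary softmax only. The redesigned activation function of the disclosure may differ, and the function names and example weights are hypothetical. Non-zero vectors are assumed.

```python
import math


def norm(vector):
    return math.sqrt(sum(a * a for a in vector))


def angle(u, v):
    """Included angle between two non-zero vectors, in radians."""
    cosine = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cosine)))


def angular_softmax(association_vector, weight_columns):
    """Ordinary softmax with logits written as |w| * |x| * cos(theta)."""
    logits = [
        norm(w) * norm(association_vector) * math.cos(angle(association_vector, w))
        for w in weight_columns
    ]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


x = [1.0, 0.0]                                    # association vector
columns = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # one weight vector per category
probabilities = angular_softmax(x, columns)
print(probabilities.index(max(probabilities)))    # 0
```

The category whose weight vector points in the same direction as the association vector receives the highest probability, and every output lies in [0, 1] with the outputs summing to 1.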
Step S304, a possible log event set is formed by the log events with the occurrence probability larger than the second threshold value.
In this embodiment, the possible log event set is formed by the log events whose occurrence probability is greater than the second threshold; alternatively, the g log events with the highest occurrence probability are selected to form the possible log event set.
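Both selection rules of step S304 can be sketched in a few lines. The function names, the event labels and the example probabilities are hypothetical; only the two rules themselves (probability greater than the second threshold, or the g most probable events) come from the embodiment.

```python
def possible_events_by_threshold(probabilities, second_threshold):
    """Keep events whose occurrence probability exceeds the second threshold."""
    return {event for event, p in probabilities.items() if p > second_threshold}


def possible_events_top_g(probabilities, g):
    """Keep the g events with the highest occurrence probability."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return set(ranked[:g])


# Hypothetical output of the fully connected layer for four event categories.
probabilities = {"E1": 0.55, "E2": 0.30, "E3": 0.10, "E4": 0.05}
print(sorted(possible_events_by_threshold(probabilities, 0.2)))  # ['E1', 'E2']
print(sorted(possible_events_top_g(probabilities, 3)))           # ['E1', 'E2', 'E3']
```

A fixed threshold lets the size of the set vary with the model's confidence, while the top-g rule always yields at most g events.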
In the present disclosure, in order to reduce the computational complexity of self-attention and to prevent the inner product from unduly affecting the efficiency of training the initial detection model and of predicting the possible log event set with the anomaly detection model, the fused attention formula in the self-attention layer is redesigned. Because the data in the database takes complex forms, the activation function of the fully connected layer is also redesigned so that differences in text length between data sets affect the result as little as possible. This improves the efficiency of training the initial detection model and of predicting the possible log event set with the anomaly detection model, while ensuring the accuracy of the possible log event set.
Fig. 5 shows a flowchart of a database detection method according to an embodiment of the disclosure. As shown in Fig. 5, the database detection method includes:
Step S401, acquiring target log data of the index item to be detected in the database within a target time range.
Step S402, splitting, through a sliding time window, the target log vector obtained by vectorizing the target log data, to obtain a target log sequence.
Step S403, determining, by the anomaly detection model and based on the target log sequence, a set of possible log events for the next sliding time window of the target time range.
Step S404, if the coincidence ratio between the actual log event set and the possible log event set of the next sliding time window is smaller than the first threshold, determining that the target log data is abnormal.
The specific implementation details of step S401 to step S404 are similar to those of step S101 to step S104, and will not be repeated here.
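The splitting and comparison steps above can be sketched as follows. This is a simplified illustration, not the disclosed implementation: it assumes each log entry is a (timestamp, item) pair, that windows have a fixed length and step, and that the coincidence ratio is the share of actual events that also appear in the possible log event set. The function names, window parameters, and threshold value are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def split_into_windows(
    entries: Sequence[Tuple[float, str]], window: float, step: float
) -> List[List[str]]:
    """Group (timestamp, item) pairs into sliding time windows."""
    if not entries:
        return []
    start = min(t for t, _ in entries)
    end = max(t for t, _ in entries)
    windows = []
    while start <= end:
        # A window covers the half-open interval [start, start + window).
        windows.append([item for t, item in entries if start <= t < start + window])
        start += step
    return windows


def coincidence_ratio(actual: Set[str], possible: Set[str]) -> float:
    """Share of actual events that were predicted (assumed definition)."""
    if not actual:
        return 1.0  # nothing happened, so nothing contradicts the prediction
    return len(actual & possible) / len(actual)


def is_abnormal(actual: Set[str], possible: Set[str], first_threshold: float) -> bool:
    """Flag the target log data when the coincidence ratio is below the threshold."""
    return coincidence_ratio(actual, possible) < first_threshold


# Hypothetical log entries: window length 4, step 2.
entries = [(0, "E1"), (1, "E2"), (4, "E1"), (6, "E3")]
windows = split_into_windows(entries, window=4, step=2)

actual = {"E1", "E3", "E9"}
possible = {"E1", "E2", "E3"}
ratio = coincidence_ratio(actual, possible)
abnormal = is_abnormal(actual, possible, first_threshold=0.8)
```

With these example values, two of the three actual events were predicted, so the ratio falls below the hypothetical first threshold of 0.8 and the data is flagged.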
Step S405, if the target log data is abnormal, determining the index item corresponding to each abnormal actual log event, that is, each event in the actual log event set that does not coincide with the possible log event set.
In this embodiment, if the target log data is abnormal, an actual log event that does not overlap with a possible log event set in the actual log event set is determined as an abnormal actual log event, and an index item corresponding to the abnormal actual log event is identified.
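A minimal sketch of this step is a set difference followed by a lookup. The mapping from log events to index items, the function name, and the example identifiers are hypothetical; the disclosure does not specify how the correspondence is stored.

```python
from typing import Dict, List, Set, Tuple


def abnormal_events_with_index_items(
    actual: Set[str],
    possible: Set[str],
    event_to_index_item: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Pair each unpredicted actual event with its index item."""
    # Abnormal events are those that occurred but were not predicted.
    abnormal = sorted(actual - possible)
    return [(event, event_to_index_item.get(event, "unknown")) for event in abnormal]


# Hypothetical event identifiers and index item names.
event_to_index_item = {
    "E1": "cpu_usage",
    "E3": "active_sessions",
    "E9": "lock_wait_time",
}
report = abnormal_events_with_index_items(
    actual={"E1", "E3", "E9"},
    possible={"E1", "E2", "E3"},
    event_to_index_item=event_to_index_item,
)
```

The resulting list of (event, index item) pairs is the kind of record that could be returned to the detection system for display.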
Step S406, displaying the abnormal actual log event and its corresponding index item.
In this embodiment, the abnormal actual log event and its corresponding index item are returned to the database detection system for display. In one example, the displayed results may be as shown in Table 1 below:
Table 1
In an embodiment, all the actual log events in the actual log event set, the index items corresponding to the actual log events, and the states of the actual log events may also be returned to the database detection system for display.
In one embodiment, the database detection system is provided with a "result export" button for each index category, and clicking the button can output a status table for all index items of the index category, so as to display detailed information of each index item.
In this method, displaying the abnormal actual log events together with their corresponding index items allows operation and maintenance personnel to locate problems quickly and inspect the relevant abnormal index items. By examining those index items, they can further analyze the cause of the abnormality and take corrective measures, thereby ensuring the stability and normal operation of the database.
Fig. 6 is a schematic structural diagram of a database detection apparatus according to an embodiment of the present disclosure. As shown in Fig. 6, the database detection apparatus includes: an acquisition module 10, configured to acquire target log data of the index item to be detected in the database within a target time range; a splitting module 11, configured to split, through a sliding time window, the target log vector obtained by vectorizing the target log data, to obtain a target log sequence; a first determining module 12, configured to determine, by an anomaly detection model and based on the target log sequence, a set of possible log events for the next sliding time window of the target time range; and a second determining module 13, configured to determine that the target log data is abnormal if the coincidence ratio between the actual log event set and the possible log event set of the next sliding time window is smaller than a first threshold. The anomaly detection model is obtained by training an initial detection model based on historical log data of index items related to the running state of the database and the historical log events corresponding to the historical log data, and is used to predict the log events that may occur in the next sliding time window.
In one embodiment, the database detection apparatus further includes a training module, configured to: acquire historical log data of index items related to the running state of the database and the historical log events corresponding to the historical log data; split, through a sliding time window, the history log vector obtained by vectorizing the historical log data, to obtain a history log sequence; and train the initial detection model based on the history log sequence and the history log events corresponding to the history log vector to obtain the anomaly detection model, wherein the initial detection model comprises a Mogrifier bidirectional gated recurrent unit (Mogrifier BiGRU) layer, a self-attention layer, and a fully connected layer.
In one embodiment, the loss function of the initial detection model in the training module is:
wherein the symbols denote, respectively: the loss function; the total number of sliding time windows in the history log sequence; the total number of categories of history log events; the i-th history log event in the current sliding time window; and the occurrence probability of that history log event output by the fully connected layer.
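The loss formula is not reproduced in this text. The listed quantities (number of windows, number of categories, per-window event labels, and predicted occurrence probabilities) are consistent with a categorical cross-entropy, so the sketch below assumes that form, averaged over windows. The averaging, the one-hot label encoding, the `eps` guard, and the function name are all assumptions for illustration.

```python
import math
from typing import Sequence


def cross_entropy_loss(
    targets: Sequence[Sequence[float]],
    probs: Sequence[Sequence[float]],
    eps: float = 1e-12,  # guards against log(0)
) -> float:
    """Average categorical cross-entropy over sliding time windows.

    targets[t][i] is 1.0 if category i occurred in window t, else 0.0.
    probs[t][i] is the predicted occurrence probability of category i.
    """
    total = 0.0
    for y_row, p_row in zip(targets, probs):
        for y, p in zip(y_row, p_row):
            total -= y * math.log(max(p, eps))
    return total / len(targets)


# Hypothetical example: 2 windows, 3 log event categories.
targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = cross_entropy_loss(targets, probs)
```

The loss shrinks toward zero as the predicted probability of each observed event approaches 1.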
In an embodiment, the first determining module 12 is further configured to: input the target log sequence to the Mogrifier BiGRU layer of the anomaly detection model, the Mogrifier BiGRU layer outputting a hidden state corresponding to each sliding time window in the target log sequence, wherein the hidden state represents the context information of all target log vectors in the sliding time window; input the hidden state to the self-attention layer of the anomaly detection model, the self-attention layer outputting an association vector corresponding to each sliding time window in the target log sequence, wherein the association vector represents the degree of association between that sliding time window and the other sliding time windows; input the association vector to the fully connected layer of the anomaly detection model, the fully connected layer outputting the occurrence probability of each type of log event in the next sliding time window of the target time range; and form the possible log event set from the log events whose occurrence probability is greater than a second threshold.
In one embodiment, the calculation formula of the fused attention in the self-attention layer in the first determining module 12 is:
wherein the symbols denote, respectively: the fused attention; Q, K, and V, the query, key, and value vectors corresponding to each sliding time window; N, the total number of sliding time windows in the target log sequence; the n-th query vector in Q; the transpose symbol; the transposed vector so denoted; the n-th key vector in K; the n-th value vector in V; and a scaling factor.
In one embodiment, the activation function of the fully connected layer in the first determining module 12 is:
wherein the symbols denote, respectively: the activation function; the i-th log event; the weight vector corresponding to the i-th log event, located in the column of the weight matrix W that corresponds to that log event; the association vector corresponding to the i-th sliding time window in the target log sequence; the j-th log event; the total number of categories of log events; and the included angle between the association vector and the weight vector.
In one embodiment, a database detection apparatus further includes: the display module is used for determining index items corresponding to abnormal actual log events which are not overlapped with the possible log event sets in the actual log event sets if the target log data are abnormal; and displaying the index item corresponding to the abnormal actual log event and the abnormal actual log event.
Fig. 7 is a schematic structural diagram of a database detection system according to an embodiment of the present disclosure, and as shown in fig. 7, a database detection system includes: a basic function unit 20, an index management unit 21, an abnormality detection unit 22, and a result display unit 23;
The basic function unit 20 is used for managing basic functions in the database detection system;
the index management unit 21 is used for managing index categories and index items in the database detection system;
the anomaly detection unit 22 is used for detecting target log data of the index item to be detected in the database within a target time range to obtain a detection result, and the database detection method and the database detection device are applied to the anomaly detection unit;
the result display unit 23 is configured to display the detection result sent by the anomaly detection unit.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device and a readable storage medium.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above, such as the database detection method. For example, in some embodiments, the database detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the database detection method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the database detection method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
The foregoing describes merely specific embodiments of the disclosure, but the protection scope of the disclosure is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope of the disclosure shall fall within the protection scope of the disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (7)
1. A method of database detection, the method comprising:
acquiring target log data of an index item to be detected of a database within a target time range;
Splitting the target log vector after vectorization of the target log data through a sliding time window to obtain a target log sequence;
determining a set of possible log events for a next sliding time window of the target time range based on the target log sequence by an anomaly detection model;
If the coincidence ratio of the actual log event set and the possible log event set of the next sliding time window is smaller than a first threshold value, determining that the target log data is abnormal;
the anomaly detection model is obtained by training an initial detection model based on historical log data of index items related to the running state of the database and historical log events corresponding to the historical log data, and is used for predicting log events possibly occurring in the next sliding time window;
Wherein said determining a set of possible log events for a next sliding time window of said target time range comprises:
inputting the target log sequence to a Mogrifier BiGRU layer of the anomaly detection model, wherein the Mogrifier BiGRU layer outputs a hidden state corresponding to each sliding time window in the target log sequence, and the hidden state represents the context information of all target log vectors in the sliding time window;
Inputting the hidden state to a self-attention layer of the anomaly detection model, and outputting an association vector corresponding to each sliding time window in the target log sequence by the self-attention layer, wherein the association vector represents the association degree between the sliding time window and other sliding time windows;
Inputting the association vector to a full connection layer of the anomaly detection model, and outputting the occurrence probability of each type of log event in a next sliding time window of the target time range by the full connection layer;
the possible log event set is composed of the log events with the occurrence probability larger than a second threshold value;
The calculation formula of the fusion attention in the self-attention layer is as follows:
wherein the symbols denote, respectively: the fusion attention; Q, K, and V, the query, key, and value vectors corresponding to each sliding time window; N, the total number of sliding time windows in the target log sequence; the n-th query vector in Q; the transpose symbol; the transposed vector so denoted; the n-th key vector in K; the n-th value vector in V; and a scaling factor;
wherein, the activation function of the full connection layer is:
wherein the symbols denote, respectively: the activation function; the i-th log event; the weight vector corresponding to the i-th log event, located in the column of the weight matrix W that corresponds to that log event; the association vector corresponding to the i-th sliding time window in the target log sequence; the j-th log event; the total number of categories of log events; and the included angle between the association vector and the weight vector.
2. The method of claim 1, further comprising, prior to said determining the set of possible log events for the next sliding time window of the target time range:
Acquiring historical log data of index items related to the running state of a database and a historical log event corresponding to the historical log data;
splitting the history log vector subjected to the vectorization of the history log data through a sliding time window to obtain a history log sequence;
Training the initial detection model based on the history log sequence and the history log event corresponding to the history log vector to obtain the anomaly detection model, wherein the initial detection model comprises a Mogrifier bidirectional gated recurrent unit (Mogrifier BiGRU) layer, a self-attention layer and a full connection layer.
3. The method of claim 2, wherein the initial detection model has a loss function of:
wherein the symbols denote, respectively: the loss function; the total number of sliding time windows in the history log sequence; the total number of categories of history log events; the i-th history log event in the current sliding time window; and the occurrence probability of that history log event output by the full connection layer.
4. A method according to any one of claims 1-3, further comprising:
If the target log data is abnormal, determining index items corresponding to abnormal actual log events which do not coincide with the possible log event set in the actual log event set;
and displaying the index item corresponding to the abnormal actual log event and the abnormal actual log event.
5. A database detection apparatus, the apparatus comprising:
the acquisition module is used for acquiring target log data of the index items to be detected of the database within a target time range;
the segmentation module is used for segmenting the target log vector subjected to the vectorization of the target log data through a sliding time window to obtain a target log sequence;
A first determining module for determining a set of possible log events for a next sliding time window of the target time range based on the target log sequence by means of an anomaly detection model;
A second determining module, configured to determine that the target log data is abnormal if the coincidence ratio of the actual log event set and the possible log event set of the next sliding time window is smaller than a first threshold;
the anomaly detection model is obtained by training an initial detection model based on historical log data of index items related to the running state of the database and historical log events corresponding to the historical log data, and is used for predicting log events possibly occurring in the next sliding time window;
Wherein said determining a set of possible log events for a next sliding time window of said target time range comprises:
inputting the target log sequence to a Mogrifier BiGRU layer of the anomaly detection model, wherein the Mogrifier BiGRU layer outputs a hidden state corresponding to each sliding time window in the target log sequence, and the hidden state represents the context information of all target log vectors in the sliding time window;
Inputting the hidden state to a self-attention layer of the anomaly detection model, and outputting an association vector corresponding to each sliding time window in the target log sequence by the self-attention layer, wherein the association vector represents the association degree between the sliding time window and other sliding time windows;
Inputting the association vector to a full connection layer of the anomaly detection model, and outputting the occurrence probability of each type of log event in a next sliding time window of the target time range by the full connection layer;
the possible log event set is composed of the log events with the occurrence probability larger than a second threshold value;
The calculation formula of the fusion attention in the self-attention layer is as follows:
wherein the symbols denote, respectively: the fusion attention; Q, K, and V, the query, key, and value vectors corresponding to each sliding time window; N, the total number of sliding time windows in the target log sequence; the n-th query vector in Q; the transpose symbol; the transposed vector so denoted; the n-th key vector in K; the n-th value vector in V; and a scaling factor;
wherein, the activation function of the full connection layer is:
wherein the symbols denote, respectively: the activation function; the i-th log event; the weight vector corresponding to the i-th log event, located in the column of the weight matrix W that corresponds to that log event; the association vector corresponding to the i-th sliding time window in the target log sequence; the j-th log event; the total number of categories of log events; and the included angle between the association vector and the weight vector.
6. An electronic device, comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
7. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-4.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410933798.9A CN118467403B (en) | 2024-07-12 | 2024-07-12 | Database detection method, device, equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118467403A (en) | 2024-08-09 |
| CN118467403B (en) | 2024-10-08 |
Family
ID=92154460
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410933798.9A Active CN118467403B (en) | 2024-07-12 | 2024-07-12 | Database detection method, device, equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118467403B (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||