Disclosure of Invention
The invention aims to provide a chemical fault detection method based on particle swarm optimization and a denoising sparse autoencoder, addressing the defects of the prior art. The method applies the stacked denoising sparse autoencoder (SDSA) algorithm from deep learning to feature learning in a chemical process, trains a Softmax classifier in a supervised manner, fine-tunes the weight parameters of the whole network through the BP algorithm, and introduces a particle swarm optimization algorithm to automatically tune the key adjustable hyper-parameters. The method adopts the greedy layer-by-layer training of a deep neural network to adaptively and intelligently learn the implicit knowledge of the original data, so that fault information is extracted.
The purpose of the invention can be realized by the following technical scheme:
A chemical fault detection method based on particle swarm optimization and a denoising sparse autoencoder comprises the following steps:
step one, data acquisition:
historical time sequence data acquired from a simulation system or a DCS (distributed control system) is taken as the training sample set X_train, and real-time chemical process data from the DCS is taken as the test sample set X_test. The collected training sample set X_train comprises time sequence data under normal working conditions and under various fault working conditions, and is used to establish the intelligent fault detection model of the method; the test sample set X_test consists of real-time working condition data monitored online, likewise containing time sequence data under normal working conditions and various faults, and is used to verify the diagnostic accuracy of the method or to realize fault detection in actual industry with the established model;
step two, data preprocessing:
firstly, the mean X_mean and standard deviation X_std of the monitored variables are computed from the normal-working-condition data in the training sample set X_train; then both the training sample set X_train and the test sample set X_test are standardized using X_mean and X_std; the standardized training sample set X_trainstd and test sample set X_teststd are then whitened to obtain the whitened training sample set X_trainwhite and test sample set X_testwhite, completing the preprocessing of the training and test sample sets;
step three, off-line training:
the aim of offline training is to establish an intelligent fault detection model of the chemical process from the preprocessed training samples; the process can be divided into four major parts: unsupervised pre-training of the stacked denoising sparse autoencoder, supervised pre-training of the Softmax classifier, global fine-tuning of the network parameters by the BP algorithm, and particle swarm optimization of the adjustable hyper-parameters. Firstly, the stacked denoising sparse autoencoder is pre-trained without supervision: N denoising sparse autoencoders encode the preprocessed training set into a feature space layer by layer; with this layer-by-layer training method, the loss function of each layer is minimized during its training to obtain the model parameters of that layer, finally yielding the features h_N of the training samples learned through the N hidden layers. Secondly, the Softmax classifier is pre-trained with supervision: h_N is taken as the input of the Softmax classifier model, the corresponding working condition label y_i is attached to every learned training sample feature (y_i = 1 indicates the sample is normal, y_i = 2 indicates the sample is a fault), and the cost function of Softmax is optimized to obtain the pre-trained Softmax model parameters. Then the BP algorithm performs global fine-tuning of the network parameters, a supervised fine-tuning of all model parameters over the whole training process: with the unsupervised pre-trained SDSA model parameters and the supervised pre-trained Softmax model parameters as initial values and the preprocessed training samples as the input layer, features are first learned layer by layer through the SDSA parameters to obtain the final hidden-layer feature matrix, the loss function value of this feature matrix is computed through the Softmax classifier, and the global parameters are then optimized with the back-propagation (BP) algorithm so that the loss function converges to its minimum, giving a preliminarily trained fault detection model. Finally, the adjustable hyper-parameters are optimized by particle swarm optimization: because the manually adjustable hyper-parameters in the intelligent fault detection model are not optimal, and improper hyper-parameter settings would keep the SDSA from its best performance, a particle swarm optimization (PSO) algorithm is invoked to optimize the hyper-parameters; the optimized hyper-parameter values and all corresponding model parameters are finally determined, giving the finally trained fault detection model, which can be used for online monitoring of the process state;
step four, online monitoring:
for a chemical process in real-time continuous production, the fault detection model trained in step three can effectively predict whether the current process is in a fault state. With the preprocessed test sample X_testwhite as the input layer, the finally learned test features are obtained for the determined neural network structure by the layer-by-layer feature learning method according to the trained fault detection model, and the membership probability values of the Softmax prediction function are calculated from the learned final features. A membership class of 1 indicates the working condition is normal; a membership class of 2 indicates an abnormal working condition has occurred, whereupon the connected warning device immediately issues an early warning indicating a detected fault, and a technician or engineer is notified to check system safety and remove the fault in time, thereby realizing fault monitoring of the current working condition.
Further, in step two, the standardization preprocessing of the training sample set X_train and the test sample set X_test is realized through the following steps:
(1) the training sample set X_train is an n × m matrix, where n is the number of samples and m is the number of observed variables; the standardized training sample set X_trainstd and test sample set X_teststd are solved through the following formula:

X_ij^std = (X_ij - X_mean,j) / X_std,j (1)
wherein X_ij denotes the value of the jth variable of the ith sample in the training sample set X_train or the test sample set X_test, X_mean,j is the mean of the jth variable of the normal-working-condition data in the training sample set X_train, and X_std,j is the standard deviation of the jth variable of the normal-working-condition data in the training sample set X_train.
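As a minimal illustration of the standardization of equation (1), the following NumPy sketch z-scores both sample sets with statistics taken only from the normal-condition training rows; all function and variable names here are illustrative, not part of the invention:

```python
import numpy as np

def standardize(X_train, X_test, normal_idx):
    """Z-score both sample sets using statistics of the normal-condition
    training data only, as in equation (1)."""
    X_normal = X_train[normal_idx]           # normal-working-condition rows
    X_mean = X_normal.mean(axis=0)           # per-variable mean
    X_std = X_normal.std(axis=0, ddof=1)     # per-variable standard deviation
    X_trainstd = (X_train - X_mean) / X_std
    X_teststd = (X_test - X_mean) / X_std
    return X_trainstd, X_teststd
```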
Further, in step two, the whitening preprocessing of the standardized training sample set and test sample set is realized through the following steps:
(1) whitening is applied to the standardized training sample set X_trainstd and test sample set X_teststd; eigenvalue decomposition of the covariance matrix of the training samples through the following formulas gives an orthogonal matrix of the covariance eigenvectors and a diagonal matrix of its eigenvalues, from which the whitening matrix W_white is obtained:
Cov = V D V^T (2)
W_white = V D^(-1/2) V^T (3)
wherein Cov is the covariance matrix of the standardized training sample set X_trainstd, V is the orthogonal matrix of eigenvectors of the covariance matrix, and D is the diagonal matrix of eigenvalues of the covariance matrix;
(2) the whitened training sample set X_trainwhite and test sample set X_testwhite are then both computed with W_white:

X_trainwhite = X_trainstd · W_white, X_testwhite = X_teststd · W_white (4)
wherein W_white is the whitening matrix of the whitening preprocessing, X_trainwhite is the whitened training sample set, and X_testwhite is the whitened test sample set.
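A corresponding sketch of equations (2)-(4), again with illustrative names; the small eps guard against near-zero eigenvalues is an added assumption, not stated in the method:

```python
import numpy as np

def whiten(X_trainstd, X_teststd, eps=1e-8):
    """Eigendecompose the training covariance (equation (2)), build the
    whitening matrix (equation (3)), and apply it to both sets (equation (4))."""
    Cov = np.cov(X_trainstd, rowvar=False)    # m x m covariance matrix
    D, V = np.linalg.eigh(Cov)                # eigenvalues and eigenvectors
    W_white = V @ np.diag(1.0 / np.sqrt(D + eps)) @ V.T
    return X_trainstd @ W_white, X_teststd @ W_white, W_white
```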
Further, in step three, the unsupervised pre-training of the stacked denoising sparse autoencoder in the offline training is specifically performed through the following steps:
(1) for the preprocessed training sample X_trainwhite, an n × m matrix where n is the number of samples and m is the number of observed variables, a deep neural network structure is set; assuming the SDSA has N hidden layers, the numbers of nodes HL_1 to HL_N of each hidden layer network are mainly to be determined. The deep neural network thus has the following layers: an input layer (the preprocessed training samples), N hidden layers (learning N layers of features), and a classification layer (outputting prediction probabilities), so the global network can be regarded as a neural network with N + 2 layers. Firstly, the weight matrix parameters of the first hidden layer are initialized, and with X_trainwhite as the input layer of the SDSA, the first hidden layer's denoising sparse autoencoder (DSA_1) is trained by adding Gaussian noise obeying a normal distribution to the training sample X_trainwhite, changing the training set into the noise-corrupted data set Xc:
Xc = X + le·G (5)
wherein X is the training set matrix before noise is added (for the first hidden layer, X is the preprocessed training set X_trainwhite); le is the noise level, which can be set manually (a value of 0-1, generally 0.1); G is generated Gaussian noise of the same dimensions as the noise-free training set matrix; and Xc is the data set containing the artificial noise;
(2) after the interference noise is added, Xc is used as the input for encoding learning and decoding reconstruction; the encoding stage is essentially the feature learning stage, and the decoding is the reconstruction of the feature data. The encoding and decoding formulas are:

h = f(W·Xc + b1) (6)
Y = f(W^T·h + b2) (7)
wherein h is the learned feature and Y is the information reconstructed from the feature h; the closer Y is to the original noise-free data X, the better the model parameters are trained. W is the encoding weight parameter matrix, b1 and b2 are offset vectors, and W^T is the decoding weight parameter matrix. Through these formulas the feature h_1 of the first hidden layer and the model parameters W_1, b_11, b_21 of the first hidden layer can be learned;
(3) however, these model parameters are not yet optimal, so the learned first-hidden-layer features cannot deeply express the information contained in the original data; the model can reach its best capability only by defining a reasonable loss function and then optimizing it continuously down to its minimum. The loss function is:

L_total(W, W^T, b1, b2) = (1/n) Σ_{i=1..n} (1/2)‖Y_i - X_i‖² + (λ_r/2) Σ_l Σ_{i=1..sl} Σ_{j=1..s(l+1)} (W_ji^(l))² + β Σ_{j=1..s2} KL(ρ‖ρ̂_j) (8)
wherein L_total(W, W^T, b1, b2) is the loss function value of the denoising sparse autoencoder of the current hidden layer; n is the number of samples; Y_i is the reconstructed information vector of the ith sample; X_i is the raw data vector of the ith sample before noise is added; W_ji^(l) is the value in the jth row and ith column of the weight parameters of the lth layer; λ_r is the regularization weight attenuation hyper-parameter used to adjust the weight of that term; sl denotes the number of nodes of the lth layer of a single denoising autoencoder and s(l+1) the number of nodes of the (l+1)th layer; β is the weight hyper-parameter controlling the sparsity penalty term, which must be manually tuned to a proper value; s2 is the number of nodes of the current hidden layer's learned features, i.e. the number of features learned per sample. KL(ρ‖ρ̂_j) is the relative entropy, representing the difference between the current average activation and the sparsity constraint:

KL(ρ‖ρ̂_j) = ρ·log(ρ/ρ̂_j) + (1 - ρ)·log((1 - ρ)/(1 - ρ̂_j)) (9)

ρ is the sparsity parameter representing the sparsity constraint, generally a small manually set value, for example ρ = 0.1; ρ̂_j is the average activation of the jth variable in the feature matrix h:

ρ̂_j = (1/n) Σ_{i=1..n} h_j^(i) (10)

where h_j^(i) is the value of the jth variable of the ith sample in the feature matrix h of the current hidden layer;
during offline training, the MATLAB toolbox minFunc can be called to minimize the loss function of equation (8), yielding the optimized weight parameters W_1 and bias terms b_11, b_21 after pre-training of the first hidden layer; the trained model parameters are then used for feature learning of the first-hidden-layer training samples;
(4) the first three steps complete the offline training of the first hidden layer's model; since the stacked denoising sparse autoencoder of the method comprises N hidden layers, steps (1)-(3) must be called repeatedly to train the model parameters of the multi-hidden-layer network until the pre-training of the N-hidden-layer SDSA is complete, as sketched below.
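A compact sketch of one denoising sparse autoencoder's loss, equations (5)-(10), under the assumption of a sigmoid activation f and tied decoding weights W^T as described above; the parameter packing and the use of a generic optimizer in place of the MATLAB minFunc toolbox are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsa_loss(params, X, n_hidden, le=0.1, lambda_r=1e-3, beta=3.0, rho=0.1):
    """Loss of a single denoising sparse autoencoder (equation (8)).
    params flattens W (n_hidden x m), b1 (n_hidden) and b2 (m); X is clean."""
    n, m = X.shape
    W = params[:n_hidden * m].reshape(n_hidden, m)
    b1 = params[n_hidden * m:n_hidden * m + n_hidden]
    b2 = params[n_hidden * m + n_hidden:]
    Xc = X + le * np.random.randn(n, m)          # equation (5): corrupt the input
    H = sigmoid(Xc @ W.T + b1)                   # equation (6): encode
    Y = sigmoid(H @ W + b2)                      # equation (7): decode with W^T
    recon = 0.5 * np.sum((Y - X) ** 2) / n       # reconstruction error term
    decay = 0.5 * lambda_r * np.sum(W ** 2)      # weight-decay term
    rho_hat = H.mean(axis=0)                     # equation (10): average activations
    kl = np.sum(rho * np.log(rho / rho_hat)      # equation (9): relative entropy
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + decay + beta * kl             # equation (8)
```

In practice the noise realization would be fixed per optimization call (and analytic gradients supplied) before handing dsa_loss to a quasi-Newton routine such as scipy.optimize.minimize, which here stands in for minFunc.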
Further, the step (4) specifically includes:
first, after the first hidden layer's denoising sparse autoencoder (DSA_1) finishes training, its training sample X_trainwhite is encoded as h_1 by:
h_1 = f(W_1·X + b_11) (11)
wherein W_1, b_11 are the trained DSA_1 model parameters; X is the training set matrix before noise is added: for the first hidden layer it is the preprocessed training set X_trainwhite, while for hidden layers 2 to N it denotes the feature matrix learned by the previous layer. The learned first-hidden-layer feature h_1 is then taken as the input of the second hidden layer's denoising sparse autoencoder (DSA_2), and steps (1)-(3) are called repeatedly to train DSA_2, obtaining the second hidden layer's model weight parameters and bias terms W_2, b_12, b_22; the features of the second hidden layer are encoded into h_2 through formula (11) above. This process is repeated until DSA_N finishes training, giving the final features h_N learned through the N hidden layers; the pre-trained learned features are used for the subsequent classification layer, where the model parameters of the Softmax classifier can be trained.
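The greedy stacking itself can be sketched as follows; train_dsa is assumed to be any routine that minimizes equation (8) for one layer and returns (W, b1, b2):

```python
import numpy as np

def encode(X, W, b1):
    """Equation (11): feed-forward encoding through one hidden layer."""
    return 1.0 / (1.0 + np.exp(-(X @ W.T + b1)))

def pretrain_sdsa(X_trainwhite, layer_sizes, train_dsa):
    """Greedy layer-by-layer pre-training: the features learned by layer l
    become the training input of layer l+1."""
    params, H = [], X_trainwhite
    for n_hidden in layer_sizes:          # e.g. [130, 20] for the 41-130-20-2 net
        W, b1, _ = train_dsa(H, n_hidden) # train DSA_l on the current features
        params.append((W, b1))
        H = encode(H, W, b1)              # h_l, the input of the next layer
    return params, H                      # H is the final feature matrix h_N
```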
Further, in step three, the supervised pre-training of the Softmax classifier in the offline training is specifically performed through the following steps:
(1) after the SDSA containing N hidden layers is trained, the learned Nth-hidden-layer feature h_N is taken as the input of the Softmax classifier model, and the corresponding working condition labels y_i are then attached: y_i = 1 means the ith sample is normal, y_i = 2 means the ith sample is a fault. Firstly, the model parameter matrix θ is initialized, and the probability of each sample belonging to each class is predicted with the following prediction function of the Softmax classifier:

p(y_i = j | h_N^(i); θ) = exp(θ_j^T h_N^(i)) / Σ_{l=1..k} exp(θ_l^T h_N^(i)) (12)
wherein p(y_i = j | h_N^(i); θ) is the probability that the ith sample belongs to class j; θ is the model parameter matrix of the Softmax classifier, composed of the vectors θ_1, …, θ_k; k is the number of classes defined by the classifier, here k = 2; and h_N^(i) is the Nth-hidden-layer feature vector of the ith sample;
(2) because the membership probability values predicted by the prediction function are inaccurate at first, a classification loss function must be constructed to obtain optimized model parameters; the adopted Softmax classifier takes a regularization term into account to effectively avoid overfitting in classification model training, and the loss function is defined as:

J(θ) = -(1/n)[ Σ_{i=1..n} Σ_{j=1..k} 1{y_i = j}·log( exp(θ_j^T h_N^(i)) / Σ_{l=1..k} exp(θ_l^T h_N^(i)) ) ] + (λ_sm/2) Σ_{i=1..k} Σ_j θ_ij² (13)
wherein 1{·} is the indicator function: it returns 1 if the expression in brackets is true and 0 otherwise; λ_sm is the weight decay coefficient of Softmax, with λ_sm > 0, an artificially adjustable hyper-parameter. The MATLAB toolbox minFunc can be called to minimize this loss function, from which the optimized parameter matrix θ of the Softmax classifier model is obtained.
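A sketch of the Softmax prediction function (12) and regularized loss (13); the max-subtraction for numerical stability is an added assumption:

```python
import numpy as np

def softmax_predict(theta, H):
    """Equation (12): membership probabilities for features H (n x s),
    with theta of shape k x s."""
    scores = H @ theta.T
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def softmax_loss(theta_flat, H, y, k=2, lambda_sm=1e-4):
    """Equation (13): regularized cross-entropy; y holds labels 1..k."""
    n, s = H.shape
    theta = theta_flat.reshape(k, s)
    P = softmax_predict(theta, H)
    onehot = np.eye(k)[y - 1]                     # the 1{y_i = j} indicators
    return -np.sum(onehot * np.log(P)) / n + 0.5 * lambda_sm * np.sum(theta ** 2)
```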
Further, in step three, the global fine-tuning of the network parameters by the BP algorithm in the offline training is specifically performed through the following steps:
with all pre-trained model parameters {(W_1, b_11), (W_2, b_12), …, (W_N, b_1N), θ} as initial values and the preprocessed training sample X_trainwhite as the input layer, features are first learned layer by layer with equation (11) from the initial SDSA model parameters to obtain the final hidden-layer feature matrix; the loss value of equation (13) is computed from this feature matrix through the classification layer; the global parameters are then optimized with the back-propagation algorithm, all model parameters being updated once per iteration, so that the loss value of equation (13) converges to its minimum, which completes the fine-tuning process.
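A plain back-propagation sketch of this fine-tuning, assuming sigmoid hidden layers and a Softmax output; for brevity the decay and sparsity terms are dropped from the fine-tuning loss, and the fixed learning rate and epoch count are illustrative assumptions:

```python
import numpy as np

def finetune_bp(layers, theta, X, y, k=2, lr=0.1, epochs=200):
    """Gradient-descent fine-tuning of all parameters. layers is a list of
    pre-trained (W, b1) tuples; theta is the k x s Softmax matrix; y in 1..k."""
    n = X.shape[0]
    onehot = np.eye(k)[y - 1]
    for _ in range(epochs):
        acts = [X]                                # forward pass, caching activations
        for W, b in layers:
            acts.append(1.0 / (1.0 + np.exp(-(acts[-1] @ W.T + b))))
        scores = acts[-1] @ theta.T
        scores -= scores.max(axis=1, keepdims=True)
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)
        delta = (P - onehot) / n                  # gradient at the Softmax scores
        grad_theta = delta.T @ acts[-1]
        delta = (delta @ theta) * acts[-1] * (1 - acts[-1])
        for i in range(len(layers) - 1, -1, -1):  # back-propagate layer by layer
            W, b = layers[i]
            gW, gb = delta.T @ acts[i], delta.sum(axis=0)
            if i > 0:
                delta = (delta @ W) * acts[i] * (1 - acts[i])
            layers[i] = (W - lr * gW, b - lr * gb)
        theta = theta - lr * grad_theta
    return layers, theta
```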
Further, in step three, the particle swarm optimization of the adjustable hyper-parameters in the offline training is specifically performed through the following steps:
(1) the parameters of the method fall into two categories, model parameters and adjustable hyper-parameters; the method optimizes three key adjustable hyper-parameters: the regularization weight attenuation hyper-parameter λ_r, the weight hyper-parameter β controlling the sparsity penalty term, and the weight attenuation coefficient λ_sm of Softmax. These three form a single PSO particle (λ_r, β, λ_sm), so each particle is 3-dimensional. N_p particles are defined for simultaneous optimization with K iterations set, and the initial position and velocity of each particle are initialized. The fitness function value is the overall accuracy on the training samples: with the three key adjustable hyper-parameters as independent variables, the fine-tuned fault detection model is trained and the overall accuracy of the training samples is obtained, defined as:

Accuracy = (1/n) Σ_{i=1..n} 1{p_i = y_i} (14)
wherein p_i is the class of the ith training sample predicted by the fault detection model and y_i is the class of the ith training sample in the manually given labels; if the two are equal, 1 is returned, otherwise the model has misdiagnosed the current sample and 0 is returned. Through equation (14), the best fitness and position of each particle and the global best fitness and position of the particle swarm are calculated;
(2) since the global best fitness and best position after one iteration are not yet the most accurate, the velocity and position of each particle must be updated. The position of particle i at the tth iteration can be written x_i^t and its flight velocity v_i^t; the PSO algorithm updates the velocity and position of each particle by:

v_i^(t+1) = W_p·v_i^t + C_1·ξ·(pb_i^t - x_i^t) + C_2·η·(gb^t - x_i^t) (15)
x_i^(t+1) = x_i^t + v_i^(t+1) (16)
wherein W_p is the inertia coefficient; C_1 is the acceleration coefficient with which a particle tracks its own historical optimum, expressing the particle's self-cognition; C_2 is the acceleration coefficient with which a particle tracks the swarm optimum, expressing the particle's cognition of group (social) knowledge; generally C_1 = C_2 = 2. t is the current iteration number; ξ and η are random numbers uniformly distributed in the interval [0, 1]; pb_i^t is the position of the best fitness experienced by particle i up to the tth iteration; gb^t is the best position the particle swarm has experienced up to the tth iteration;
(3) the unsupervised pre-training of the SDSA, the supervised pre-training of the Softmax classifier, and the global BP fine-tuning of the network parameters of the fault detection model are retrained to obtain a new global best fitness value and position; the positions and velocities of the particles are continuously updated with equations (15)-(16), and optimization stops once the iteration count exceeds the defined K iterations, giving the particle position under the global best fitness, i.e. the optimized adjustable hyper-parameters under which the detection accuracy of the training samples is highest, thereby completing the automatic hyper-parameter optimization, as sketched below. The optimized values are set as the adjustable hyper-parameters of the SDSA, and all SDSA model parameters are retrained under the determined network structure and the optimized hyper-parameters, giving the trained fault detection model whose parameters can be used for feature learning and classification of the test samples.
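A minimal PSO sketch over the 3-dimensional particle (λ_r, β, λ_sm), equations (15)-(16); the fitness argument is assumed to retrain the fault detection model for one hyper-parameter triple and return the training accuracy of equation (14), and the bounds, inertia value and clipping of positions to the search range are illustrative assumptions:

```python
import numpy as np

def pso_tune(fitness, bounds, n_particles=10, K=20, Wp=0.7, C1=2.0, C2=2.0):
    """Particle swarm search for (lambda_r, beta, lambda_sm).
    bounds is a (3, 2) array of per-dimension search ranges."""
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = bounds.shape[0]
    x = lo + (hi - lo) * np.random.rand(n_particles, dim)  # initial positions
    v = np.zeros((n_particles, dim))                       # initial velocities
    pb = x.copy()                                          # personal best positions
    pb_fit = np.array([fitness(p) for p in x])
    g = pb[pb_fit.argmax()].copy()                         # global best position
    for _ in range(K):
        xi = np.random.rand(n_particles, dim)              # random factors xi, eta
        eta = np.random.rand(n_particles, dim)
        v = Wp * v + C1 * xi * (pb - x) + C2 * eta * (g - x)  # equation (15)
        x = np.clip(x + v, lo, hi)                            # equation (16)
        fit = np.array([fitness(p) for p in x])
        better = fit > pb_fit
        pb[better], pb_fit[better] = x[better], fit[better]
        g = pb[pb_fit.argmax()].copy()
    return g                            # optimized (lambda_r, beta, lambda_sm)
```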
Further, in step four, the feature learning and membership probability prediction of the online monitoring are performed through the following steps:
for a real-time, continuously producing chemical process, the fault detection model trained in step three effectively predicts whether the current process is in a fault state. With the preprocessed test sample X_testwhite as the input layer and the determined neural network structure, the layer-by-layer feature learning method is applied according to the trained fault detection model: with the trained weight parameters {(W_1, b_11), (W_2, b_12), …, (W_N, b_1N)} of each denoising sparse autoencoder, formula (11) is used for feed-forward learning to obtain the test samples' current hidden-layer features; the learned current-hidden-layer features are then taken as input and, combined with the DSA model parameters of the next hidden layer, the same formula learns the next features; by analogy all hidden layers are learned to obtain the finally learned features. Formula (12) then predicts each membership class. A single sample has only two classes, fault state and normal state, so its prediction probability vector has only two values; the class with the larger probability value is selected as the predicted membership class of the whole fault detection model. When the computed membership class is 1, the working condition is normal; when it is 2, an abnormal working condition has occurred, the connected warning device immediately issues an early warning indicating a detected fault, and a technician or engineer is notified to check system safety and remove the fault in time, so that the current working condition can be monitored for faults.
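The whole online step can be sketched as one feed-forward pass followed by an argmax over the two membership probabilities; the printed alarm stands in for the connected warning device and is illustrative:

```python
import numpy as np

def monitor(X_testwhite, layers, theta):
    """Online monitoring: encode layer by layer (equation (11)), predict
    membership probabilities (equation (12)), flag class 2 as a fault.
    layers and theta come from the trained fault detection model."""
    H = X_testwhite
    for W, b in layers:                          # layer-by-layer feature learning
        H = 1.0 / (1.0 + np.exp(-(H @ W.T + b)))
    scores = H @ theta.T
    scores -= scores.max(axis=1, keepdims=True)
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)
    labels = P.argmax(axis=1) + 1                # 1 = normal, 2 = fault
    for i in np.where(labels == 2)[0]:
        print(f"sample {i}: abnormal working condition, raise early warning")
    return labels, P
```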
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention adopts the stacked denoising sparse autoencoder algorithm from deep learning together with a particle swarm parameter optimization algorithm to develop a novel fault monitoring method suited to complex nonlinear chemical processes. No data labeling is needed during feature learning; compared with traditional shallow and artificial neural network methods, the deep neural network adaptively and intelligently learns the features of the original data, saving considerable energy and time relative to manual feature and knowledge extraction; and the learned features distinguish normal data from abnormal data more deeply, so the clustering of samples of each category is more pronounced and the algorithm is more intelligent.
2. Instead of manual selection of the adjustable hyper-parameters, the method avoids the randomness and time consumption of manual trial-and-error parameter selection by adopting automatic parameter tuning: a particle swarm optimization algorithm is fused into the method to automatically tune the key adjustable hyper-parameters, saving a great deal of time and making the method more automated.
3. Compared with traditional chemical process fault detection techniques (such as PCA, ICA, KPCA, MICA and the like), the method adopts the layer-by-layer training of a deep neural network and can adaptively learn the information of nonlinear process data during feature learning; a noise-reduction strategy is also added, strengthening the algorithm's robustness to high-noise data, so the established fault detection model is more accurate.
Example:
the embodiment provides a Chemical fault detection method based on a particle swarm optimization and noise reduction sparse coding machine, a flow chart of the method is shown in fig. 1, and the method provided by the embodiment is applied to a Tennessee-Isemann (TE) benchmark Chemical process to further explain the method of the embodiment, wherein the TE process is a computer simulation of an actual Chemical process published by downloads and Vogel in 1993 in the SCI journal of Computers & Chemical Engineering, and the process is mainly developed into a performance evaluation process monitoring method, and a process flow chart of the process is shown in fig. 2. The TE process mainly includes 5 operation units, namely: the device comprises a reactor, a condenser, a vapor-liquid separator, a circulating compressor and a stripping tower. In the simulated data, a total of 41 observed variables were monitored, 22 continuous process variables, 19 compositional variables, respectively. The TE process further includes 21 preset faults, and the embodiment uses the first 20 faults for monitoring, and the 20 preset faults are shown in table 1 below.
Table 1: The 20 preset faults of the TE process
The chemical process fault detection method comprises the following steps:
step one, data acquisition:
Data under normal working conditions and the 20 faults of the TE process are collected and divided into the training sample set X_train and the test sample set X_test. The training sample set comprises 13480 normal-working-condition samples and 480 samples under each fault. The test sample set contains 960 normal samples and 960 samples per fault, where the fault samples enter the fault state from the 161st sample onward. The process monitors 41 variables, so the training sample set forms a 23080 × 41 matrix and the test sample set a 20160 × 41 matrix.
Step two, data preprocessing:
Firstly, the mean X_mean and standard deviation X_std of each variable are obtained from the 13480 normal samples in the training data. The training sample set and the test sample set are then standardized with X_mean and X_std using formula (1); the whitening matrix W_white of the training sample set is obtained with formulas (2)-(3), and the whitened training sample set X_trainwhite and test sample set X_testwhite are obtained with formula (4), as illustrated below.
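Assuming the standardize and whiten sketches given earlier, the preprocessing of this embodiment can be exercised on synthetic stand-ins with the TE shapes (the random data here only illustrates dimensions, not the real TE sets):

```python
import numpy as np

X_train = np.random.randn(23080, 41)   # 13480 normal + 20 x 480 fault samples
X_test = np.random.randn(20160, 41)    # 960 normal + 20 x 960 fault samples
normal_idx = np.arange(13480)          # normal-condition rows supply the statistics

X_trainstd, X_teststd = standardize(X_train, X_test, normal_idx)
X_trainwhite, X_testwhite, W_white = whiten(X_trainstd, X_teststd)
```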
Step three, off-line training:
Firstly, the number of hidden layers is defined as 2 with network structure 41-130-20-2; the weight parameter matrices and bias terms of the SDSA are initialized, and with the whitened training sample set X_trainwhite as the SDSA input layer, the 2 hidden layers are greedily trained layer by layer: the loss function of the SDSA is optimized with the MATLAB toolbox minFunc to obtain the optimized model parameters, and feed-forward learning with formula (11) gives the SDSA training sample features h_2. The training sample features h_2 are then taken as the input of the Softmax classifier model; the corresponding working condition labels are attached to all learned training sample features (y_i = 1 means the sample is normal, y_i = 2 means the sample is a fault), and the loss function of Softmax is optimized with minFunc to obtain the pre-trained Softmax model parameters. The network parameters are then globally fine-tuned with the BP algorithm, a supervised fine-tuning of all model parameters over the whole training process: with the unsupervised pre-trained SDSA model parameters and the supervised pre-trained Softmax model parameters as initial values and the preprocessed training samples as the input layer, the global parameters are optimized with the back-propagation (BP) algorithm so that the loss function converges to its minimum, giving a preliminarily trained fault detection model. Finally, particle swarm optimization is applied to the three key adjustable hyper-parameters, and the optimized hyper-parameter values and all corresponding model parameters are determined, giving the finally trained fault detection model for online monitoring of the process state.
Step four, online monitoring:
For the real-time, continuously producing chemical process, the fault detection model trained in step three effectively predicts whether the current process is in a fault state. With the preprocessed test sample X_testwhite as the input layer and the determined neural network structure, the layer-by-layer feature learning method is applied according to the trained fault detection model: with the trained weight parameters {(W_1, b_11), (W_2, b_12), …, (W_N, b_1N)} of each denoising sparse autoencoder, feed-forward learning with formula (11) gives the test samples' current hidden-layer features; these are then taken as input and, combined with the DSA model parameters of the next hidden layer, the same formula learns the next features; repeating this over all hidden layers gives the finally learned features, and formula (12) predicts each membership class. A single sample has only two classes, fault state and normal state, so its prediction probability vector has only two values; the class with the larger probability is selected as the predicted membership class of the whole fault detection model. A computed membership class of 1 means the working condition is normal; a class of 2 indicates an abnormal working condition has occurred. Finally the detection rate of each fault is counted;
The test sample set is regarded as real-time working-condition data; through the above steps, fault diagnosis can likewise be performed on real-time data acquired from an actual chemical process.
Firstly, ten repeated tests were carried out with the selected 41-130-20-2 network structure and the optimized hyper-parameters. The optimal hyper-parameters obtained for the TE process are: regularization weight attenuation hyper-parameter λ_r = 0.002, weight hyper-parameter β controlling the sparsity penalty term = 2.628255, and weight attenuation coefficient of Softmax λ_sm = 0.0001, with an optimal fitness of 0.9998. The average fault detection rate and average false alarm rate of each test are shown in fig. 3: over the ten tests, the mean fault detection rate (FDR) of the test samples is 82.42% with a standard deviation of 0.857%, and the mean false alarm rate (FAR) is 0.64% with a standard deviation of 1.194%, indicating that the overall FDR of the method is high and the FAR low; by the usual criteria for evaluating fault detection performance, the diagnostic performance of the method is good. Moreover, the FDR and FAR differ little across the ten tests, showing that fusing PSO optimization improves the stability of the model: without optimization, improper parameters make training difficult and trap the model in local optima, whereas with PSO these problems are avoided and the model reaches better diagnostic performance under the optimal parameters. PSO is also a mature unconstrained optimization method with advantages such as fast convergence and ease of programming, so PSO parameter optimization is both necessary and easy to realize.
The best-performing result of the method (highest fault detection rate and lowest false alarm rate, the 9th test) is selected to show the detection rate of each fault on the test samples, compared with other methods in Table 2. As can be seen from Table 2, the proposed PSO-SDSA method has high diagnostic accuracy for faults 1, 2, 4, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19 and 20. Compared with the PCA, MICA and KPCA methods, the proposed method's mean fault detection rate is the best, with an average FDR of 83.48% over all faults and a FAR of only 0.21%. In previously developed methods, faults 3, 9 and 15 are the faults that are difficult to detect in the TE process: their data differ little from normal sample data, so many methods show very low diagnosis rates for them (mostly below 10%). The deep neural network differs from traditional modeling in that the modeling data comprise a large number of normal and fault samples; moreover, the greedy layer-by-layer deep training with added-noise modeling strictly limits the loss of information, and the learned fault features can deeply mine the differences between micro-disturbance faults and normal behavior, so fault features are better distinguished from normal features. The detection rates of faults 3, 9 and 15 are therefore greatly improved by the proposed method: fault 3 reaches 50.5%, fault 9 reaches 44.755%, and fault 15 reaches 44.875%, while other methods basically stay below 10%, so the proposed PSO-SDSA method effectively improves performance on fault points that are otherwise hard to detect. Although the PSO-SDSA method is somewhat lower than other methods on some faults (faults 1 and 14), the method models under consideration of the global error: since a smaller global loss is sought, the model may bias toward reducing the errors of the harder-to-detect faults while somewhat neglecting some easily detected fault samples, which can leave the FDR of some easy faults slightly below the traditional methods; the resulting detection rates (96.5% for fault 1, 90.375% for fault 14) remain quite acceptable. Overall, the method's performance is greatly improved compared with the other methods, making it one of the more advantageous FDD methods in current research. In addition, fig. 4 compares the fault detection rates for faults 3, 5, 9, 10, 15, 19 and 20, where the FDR is significantly improved over the other methods; each point of the PSO-SDSA curve lies above the curves of the other methods, showing that the detection rates of these 7 faults are greatly improved.
Table 2: Fault detection rates of various methods on the TE process
An early-warning speed analysis of the online monitoring was carried out on the test results. Nine faults with higher detection rates were selected for detection speed analysis (faults 1, 2, 4, 6, 7, 8, 12, 13 and 17), and each fault sample and its prediction probability are plotted in sequence in fig. 5; the 0.5 warning line in the figure is the classification control limit, and when a sample's predicted probability value exceeds the control limit the model identifies it as a fault state. As can be seen from fig. 5, for the selected 9 faults most samples after fault onset are detected, giving high fault detection rates (the FDR of these faults exceeds 90%). Faults 1, 4, 6, 7, 8 and 12 are detected at the 161st sample, with a delay of 0, indicating extremely fast detection. The detection points of faults 2 and 17 are at the 162nd sample; although delayed by 1 point, the detection speed is still high and the early-warning effect is completely acceptable in practical applications. Fault 13 is detected relatively more slowly, at the 167th sample, delaying the alarm by 6 points. In addition, as the online fault monitoring results show, after faults 2, 4, 6, 7 and 12 are detected very few samples fall below the control limit, so their fault detection rates all exceed 99%, indicating a strong capturing capability once these faults are detected. Faults 1, 8, 13 and 17 occasionally dip below the control limit after detection, so their detection rates are lower, but all still exceed 94%, and the detection performance remains good.
To further investigate why the fault detection rate and false alarm rate of the method are good, principal component analysis was performed on the features learned by the method. The first 3 PCs of the data of the normal state and faults 1, 2, 6 and 7 were selected for comparison: fig. 6(a) shows the first three principal components of the preprocessed test samples, and fig. 6(b) shows the first three principal components of the test features learned by the PSO-SDSA method. Comparing fig. 6(a) and fig. 6(b), in the preprocessed test data the samples of faults 1, 2, 7 and the normal samples overlap severely and are strongly linearly inseparable; the TE process is characterized by complex nonlinearity, so feeding this data directly into a classification model cannot achieve correct classification. After learning by the PSO-SDSA method, the principal component analysis of the features shows that faults 1, 2, 6 and 7 are well separable and the samples of each fault aggregate well; at the same time, the clustering of the normal samples is good and not very dispersed, so the faults can be completely distinguished from the normal samples, the fault diagnosis rate is high and the false alarm rate is very low. Under the feature learning of the stacked denoising sparse autoencoder, fault information and normal information are deeply distinguished, so the learned features have excellent clustering properties, improving the fault detection performance.
The above description covers only the preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any substitution or change that a person skilled in the art can make, within the technical scope disclosed by the present invention, according to the technical solution and the inventive concept of the present invention, or any equivalent thereof, belongs to the protection scope of the present invention.