Disclosure of Invention
The invention aims to provide a chemical fault detection method based on particle swarm optimization and a denoising sparse autoencoder, addressing the defects of the prior art. The method applies the stacked denoising sparse autoencoder (SDSA) algorithm from deep learning to feature learning in a chemical process, trains a Softmax classifier in a supervised manner, fine-tunes the weight parameters of the whole network through the BP algorithm, and introduces a particle swarm optimization algorithm to automatically tune the key adjustable hyper-parameters. The method adopts the greedy layer-by-layer training of a deep neural network to adaptively and intelligently learn the implicit knowledge of the original data, so that fault information is extracted.
The purpose of the invention can be realized by the following technical scheme:
A chemical fault detection method based on particle swarm optimization and a denoising sparse autoencoder comprises the following steps:
step one, data acquisition:
historical time sequence data acquired from a simulation system or a DCS (distributed control system) is taken as the training sample set X_train, and real-time chemical process data from the DCS is taken as the test sample set X_test. The collected training sample set X_train comprises time sequence data under normal working conditions and under various fault working conditions, and is used to establish the intelligent fault detection model of the method; the test sample set X_test consists of real-time working condition data monitored online, likewise containing time sequence data under normal working conditions and various faults, and is used to verify the diagnostic accuracy of the method or to realize fault detection in actual industry with the established model;
step two, data preprocessing:
firstly, the mean X_mean and standard deviation X_std of the monitored variables are computed from the normal-working-condition data in the training sample set X_train; then both the training sample set X_train and the test sample set X_test are standardized using X_mean and X_std; the standardized training sample set X_trainstd and test sample set X_teststd are then whitened to obtain the whitened training sample set X_trainwhite and test sample set X_testwhite, completing the preprocessing of the training and test sample sets;
step three, off-line training:
the aim of offline training is to establish an intelligent fault detection model of the chemical process from the preprocessed training samples; the process can be divided into four major parts: unsupervised pre-training of the stacked denoising sparse autoencoder, supervised pre-training of the Softmax classifier, global fine-tuning of the network parameters by the BP algorithm, and particle swarm optimization of the adjustable hyper-parameters. Firstly, the stacked denoising sparse autoencoder is pre-trained without supervision: N denoising sparse autoencoders encode the preprocessed training set into a feature space layer by layer; with this layer-by-layer training method, the loss function of each layer is minimized during its training to obtain the model parameters of that layer, finally yielding the features h_N of the training samples learned through the N hidden layers. Secondly, the Softmax classifier is pre-trained with supervision: h_N is taken as the input of the Softmax classifier model, the corresponding working condition label y_i is attached to every learned training sample feature (y_i = 1 indicates the sample is normal, y_i = 2 indicates the sample is a fault), and the cost function of Softmax is optimized to obtain the pre-trained Softmax model parameters. Then the BP algorithm performs global fine-tuning of the network parameters, a supervised fine-tuning of all model parameters over the whole training process: with the unsupervised pre-trained SDSA model parameters and the supervised pre-trained Softmax model parameters as initial values and the preprocessed training samples as the input layer, features are first learned layer by layer through the SDSA parameters to obtain the final hidden-layer feature matrix, the loss function value of this feature matrix is computed through the Softmax classifier, and the global parameters are then optimized with the back-propagation (BP) algorithm so that the loss function converges to its minimum, giving a preliminarily trained fault detection model. Finally, the adjustable hyper-parameters are optimized by particle swarm optimization: because the manually adjustable hyper-parameters in the intelligent fault detection model are not optimal, and improper hyper-parameter settings would keep the SDSA from its best performance, a particle swarm optimization (PSO) algorithm is invoked to optimize the hyper-parameters; the optimized hyper-parameter values and all corresponding model parameters are finally determined, giving the finally trained fault detection model, which can be used for online monitoring of the process state;
step four, online monitoring:
for a chemical process in real-time continuous production, the fault detection model trained in step three can effectively predict whether the current process is in a fault state. With the preprocessed test sample X_testwhite as the input layer, the finally learned test features are obtained for the determined neural network structure by the layer-by-layer feature learning method according to the trained fault detection model, and the membership probability values of the Softmax prediction function are calculated from the learned final features. A membership class of 1 indicates the working condition is normal; a membership class of 2 indicates an abnormal working condition has occurred, whereupon the connected warning device immediately issues an early warning indicating a detected fault, and a technician or engineer is notified to check system safety and remove the fault in time, thereby realizing fault monitoring of the current working condition.
Further, in step two, the standardization preprocessing of the training sample set X_train and the test sample set X_test is realized through the following steps:
(1) the training sample set X_train is an n × m matrix, where n is the number of samples and m is the number of observed variables; the standardized training sample set X_trainstd and test sample set X_teststd are solved through the following formula:

X_ij^std = (X_ij - X_mean,j) / X_std,j (1)
wherein X_ij denotes the value of the jth variable of the ith sample in the training sample set X_train or the test sample set X_test, X_mean,j is the mean of the jth variable of the normal-working-condition data in the training sample set X_train, and X_std,j is the standard deviation of the jth variable of the normal-working-condition data in the training sample set X_train.
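As a minimal illustration of the standardization of equation (1), the following NumPy sketch z-scores both sample sets with statistics taken only from the normal-condition training rows; all function and variable names here are illustrative, not part of the invention:

```python
import numpy as np

def standardize(X_train, X_test, normal_idx):
    """Z-score both sample sets using statistics of the normal-condition
    training data only, as in equation (1)."""
    X_normal = X_train[normal_idx]           # normal-working-condition rows
    X_mean = X_normal.mean(axis=0)           # per-variable mean
    X_std = X_normal.std(axis=0, ddof=1)     # per-variable standard deviation
    X_trainstd = (X_train - X_mean) / X_std
    X_teststd = (X_test - X_mean) / X_std
    return X_trainstd, X_teststd
```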
Further, in step two, the whitening preprocessing of the standardized training sample set and test sample set is realized through the following steps:
(1) whitening is applied to the standardized training sample set X_trainstd and test sample set X_teststd; eigenvalue decomposition of the covariance matrix of the training samples through the following formulas gives an orthogonal matrix of the covariance eigenvectors and a diagonal matrix of its eigenvalues, from which the whitening matrix W_white is obtained:
Cov = V D V^T (2)
W_white = V D^(-1/2) V^T (3)
wherein Cov is the covariance matrix of the standardized training sample set X_trainstd, V is the orthogonal matrix of eigenvectors of the covariance matrix, and D is the diagonal matrix of eigenvalues of the covariance matrix;
(2) the whitened training sample set X_trainwhite and test sample set X_testwhite are then both computed with W_white:

X_trainwhite = X_trainstd · W_white, X_testwhite = X_teststd · W_white (4)
wherein W_white is the whitening matrix of the whitening preprocessing, X_trainwhite is the whitened training sample set, and X_testwhite is the whitened test sample set.
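A corresponding sketch of equations (2)-(4), again with illustrative names; the small eps guard against near-zero eigenvalues is an added assumption, not stated in the method:

```python
import numpy as np

def whiten(X_trainstd, X_teststd, eps=1e-8):
    """Eigendecompose the training covariance (equation (2)), build the
    whitening matrix (equation (3)), and apply it to both sets (equation (4))."""
    Cov = np.cov(X_trainstd, rowvar=False)    # m x m covariance matrix
    D, V = np.linalg.eigh(Cov)                # eigenvalues and eigenvectors
    W_white = V @ np.diag(1.0 / np.sqrt(D + eps)) @ V.T
    return X_trainstd @ W_white, X_teststd @ W_white, W_white
```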
Further, in step three, the unsupervised pre-training of the stacked denoising sparse autoencoder in the offline training is specifically performed through the following steps:
(1) for the preprocessed training sample X_trainwhite, an n × m matrix where n is the number of samples and m is the number of observed variables, a deep neural network structure is set; assuming the SDSA has N hidden layers, the numbers of nodes HL_1 to HL_N of each hidden layer network are mainly to be determined. The deep neural network thus has the following layers: an input layer (the preprocessed training samples), N hidden layers (learning N layers of features), and a classification layer (outputting prediction probabilities), so the global network can be regarded as a neural network with N + 2 layers. Firstly, the weight matrix parameters of the first hidden layer are initialized, and with X_trainwhite as the input layer of the SDSA, the first hidden layer's denoising sparse autoencoder (DSA_1) is trained by adding Gaussian noise obeying a normal distribution to the training sample X_trainwhite, changing the training set into the noise-corrupted data set Xc:
Xc = X + le·G (5)
wherein X is the training set matrix before noise is added (for the first hidden layer, X is the preprocessed training set X_trainwhite); le is the noise level, which can be set manually (a value of 0-1, generally 0.1); G is generated Gaussian noise of the same dimensions as the noise-free training set matrix; and Xc is the data set containing the artificial noise;
(2) after the interference noise is added, Xc is used as the input for encoding learning and decoding reconstruction; the encoding stage is essentially the feature learning stage, and the decoding is the reconstruction of the feature data. The encoding and decoding formulas are:

h = f(W·Xc + b1) (6)
Y = f(W^T·h + b2) (7)
wherein h is the learned feature and Y is the information reconstructed from the feature h; the closer Y is to the original noise-free data X, the better the model parameters are trained. W is the encoding weight parameter matrix, b1 and b2 are offset vectors, and W^T is the decoding weight parameter matrix. Through these formulas the feature h_1 of the first hidden layer and the model parameters W_1, b_11, b_21 of the first hidden layer can be learned;
(3) however, these model parameters are not yet optimal, so the learned first-hidden-layer features cannot deeply express the information contained in the original data; the model can reach its best capability only by defining a reasonable loss function and then optimizing it continuously down to its minimum. The loss function is:

L_total(W, W^T, b1, b2) = (1/n) Σ_{i=1..n} (1/2)‖Y_i - X_i‖² + (λ_r/2) Σ_l Σ_{i=1..sl} Σ_{j=1..s(l+1)} (W_ji^(l))² + β Σ_{j=1..s2} KL(ρ‖ρ̂_j) (8)
wherein L_total(W, W^T, b1, b2) is the loss function value of the denoising sparse autoencoder of the current hidden layer; n is the number of samples; Y_i is the reconstructed information vector of the ith sample; X_i is the raw data vector of the ith sample before noise is added; W_ji^(l) is the value in the jth row and ith column of the weight parameters of the lth layer; λ_r is the regularization weight attenuation hyper-parameter used to adjust the weight of that term; sl denotes the number of nodes of the lth layer of a single denoising autoencoder and s(l+1) the number of nodes of the (l+1)th layer; β is the weight hyper-parameter controlling the sparsity penalty term, which must be manually tuned to a proper value; s2 is the number of nodes of the current hidden layer's learned features, i.e. the number of features learned per sample. KL(ρ‖ρ̂_j) is the relative entropy, representing the difference between the current average activation and the sparsity constraint:

KL(ρ‖ρ̂_j) = ρ·log(ρ/ρ̂_j) + (1 - ρ)·log((1 - ρ)/(1 - ρ̂_j)) (9)

ρ is the sparsity parameter representing the sparsity constraint, generally a small manually set value, for example ρ = 0.1; ρ̂_j is the average activation of the jth variable in the feature matrix h:

ρ̂_j = (1/n) Σ_{i=1..n} h_j^(i) (10)

where h_j^(i) is the value of the jth variable of the ith sample in the feature matrix h of the current hidden layer;
during offline training, the MATLAB toolbox minFunc can be called to minimize the loss function of equation (8), yielding the optimized weight parameters W_1 and bias terms b_11, b_21 after pre-training of the first hidden layer; the trained model parameters are then used for feature learning of the first-hidden-layer training samples;
(4) the first three steps complete the offline training of the first hidden layer's model; since the stacked denoising sparse autoencoder of the method comprises N hidden layers, steps (1)-(3) must be called repeatedly to train the model parameters of the multi-hidden-layer network until the pre-training of the N-hidden-layer SDSA is complete, as sketched below.
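A compact sketch of one denoising sparse autoencoder's loss, equations (5)-(10), under the assumption of a sigmoid activation f and tied decoding weights W^T as described above; the parameter packing and the use of a generic optimizer in place of the MATLAB minFunc toolbox are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsa_loss(params, X, n_hidden, le=0.1, lambda_r=1e-3, beta=3.0, rho=0.1):
    """Loss of a single denoising sparse autoencoder (equation (8)).
    params flattens W (n_hidden x m), b1 (n_hidden) and b2 (m); X is clean."""
    n, m = X.shape
    W = params[:n_hidden * m].reshape(n_hidden, m)
    b1 = params[n_hidden * m:n_hidden * m + n_hidden]
    b2 = params[n_hidden * m + n_hidden:]
    Xc = X + le * np.random.randn(n, m)          # equation (5): corrupt the input
    H = sigmoid(Xc @ W.T + b1)                   # equation (6): encode
    Y = sigmoid(H @ W + b2)                      # equation (7): decode with W^T
    recon = 0.5 * np.sum((Y - X) ** 2) / n       # reconstruction error term
    decay = 0.5 * lambda_r * np.sum(W ** 2)      # weight-decay term
    rho_hat = H.mean(axis=0)                     # equation (10): average activations
    kl = np.sum(rho * np.log(rho / rho_hat)      # equation (9): relative entropy
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + decay + beta * kl             # equation (8)
```

In practice the noise realization would be fixed per optimization call (and analytic gradients supplied) before handing dsa_loss to a quasi-Newton routine such as scipy.optimize.minimize, which here stands in for minFunc.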
Further, the step (4) specifically includes:
first, after the first hidden layer's denoising sparse autoencoder (DSA_1) finishes training, its training sample X_trainwhite is encoded as h_1 by:
h_1 = f(W_1·X + b_11) (11)
wherein W_1, b_11 are the trained DSA_1 model parameters; X is the training set matrix before noise is added: for the first hidden layer it is the preprocessed training set X_trainwhite, while for hidden layers 2 to N it denotes the feature matrix learned by the previous layer. The learned first-hidden-layer feature h_1 is then taken as the input of the second hidden layer's denoising sparse autoencoder (DSA_2), and steps (1)-(3) are called repeatedly to train DSA_2, obtaining the second hidden layer's model weight parameters and bias terms W_2, b_12, b_22; the features of the second hidden layer are encoded into h_2 through formula (11) above. This process is repeated until DSA_N finishes training, giving the final features h_N learned through the N hidden layers; the pre-trained learned features are used for the subsequent classification layer, where the model parameters of the Softmax classifier can be trained.
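The greedy stacking itself can be sketched as follows; train_dsa is assumed to be any routine that minimizes equation (8) for one layer and returns (W, b1, b2):

```python
import numpy as np

def encode(X, W, b1):
    """Equation (11): feed-forward encoding through one hidden layer."""
    return 1.0 / (1.0 + np.exp(-(X @ W.T + b1)))

def pretrain_sdsa(X_trainwhite, layer_sizes, train_dsa):
    """Greedy layer-by-layer pre-training: the features learned by layer l
    become the training input of layer l+1."""
    params, H = [], X_trainwhite
    for n_hidden in layer_sizes:          # e.g. [130, 20] for the 41-130-20-2 net
        W, b1, _ = train_dsa(H, n_hidden) # train DSA_l on the current features
        params.append((W, b1))
        H = encode(H, W, b1)              # h_l, the input of the next layer
    return params, H                      # H is the final feature matrix h_N
```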
Further, in step three, the supervised pre-training of the Softmax classifier in the offline training is specifically performed through the following steps:
(1) after the SDSA containing N hidden layers is trained, the learned Nth-hidden-layer feature h_N is taken as the input of the Softmax classifier model, and the corresponding working condition labels y_i are then attached: y_i = 1 means the ith sample is normal, y_i = 2 means the ith sample is a fault. Firstly, the model parameter matrix θ is initialized, and the probability of each sample belonging to each class is predicted with the following prediction function of the Softmax classifier:

p(y_i = j | h_N^(i); θ) = exp(θ_j^T h_N^(i)) / Σ_{l=1..k} exp(θ_l^T h_N^(i)) (12)
wherein p(y_i = j | h_N^(i); θ) is the probability that the ith sample belongs to class j; θ is the model parameter matrix of the Softmax classifier, composed of the vectors θ_1, …, θ_k; k is the number of classes defined by the classifier, here k = 2; and h_N^(i) is the Nth-hidden-layer feature vector of the ith sample;
(2) because the membership probability values predicted by the prediction function are inaccurate at first, a classification loss function must be constructed to obtain optimized model parameters; the adopted Softmax classifier takes a regularization term into account to effectively avoid overfitting in classification model training, and the loss function is defined as:

J(θ) = -(1/n)[ Σ_{i=1..n} Σ_{j=1..k} 1{y_i = j}·log( exp(θ_j^T h_N^(i)) / Σ_{l=1..k} exp(θ_l^T h_N^(i)) ) ] + (λ_sm/2) Σ_{i=1..k} Σ_j θ_ij² (13)
wherein 1{·} is the indicator function: it returns 1 if the expression in brackets is true and 0 otherwise; λ_sm is the weight decay coefficient of Softmax, with λ_sm > 0, an artificially adjustable hyper-parameter. The MATLAB toolbox minFunc can be called to minimize this loss function, from which the optimized parameter matrix θ of the Softmax classifier model is obtained.
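A sketch of the Softmax prediction function (12) and regularized loss (13); the max-subtraction for numerical stability is an added assumption:

```python
import numpy as np

def softmax_predict(theta, H):
    """Equation (12): membership probabilities for features H (n x s),
    with theta of shape k x s."""
    scores = H @ theta.T
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def softmax_loss(theta_flat, H, y, k=2, lambda_sm=1e-4):
    """Equation (13): regularized cross-entropy; y holds labels 1..k."""
    n, s = H.shape
    theta = theta_flat.reshape(k, s)
    P = softmax_predict(theta, H)
    onehot = np.eye(k)[y - 1]                     # the 1{y_i = j} indicators
    return -np.sum(onehot * np.log(P)) / n + 0.5 * lambda_sm * np.sum(theta ** 2)
```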
Further, in step three, the global fine-tuning of the network parameters by the BP algorithm in the offline training is specifically performed through the following steps:
with all pre-trained model parameters {(W_1, b_11), (W_2, b_12), …, (W_N, b_1N), θ} as initial values and the preprocessed training sample X_trainwhite as the input layer, features are first learned layer by layer with equation (11) from the initial SDSA model parameters to obtain the final hidden-layer feature matrix; the loss value of equation (13) is computed from this feature matrix through the classification layer; the global parameters are then optimized with the back-propagation algorithm, all model parameters being updated once per iteration, so that the loss value of equation (13) converges to its minimum, which completes the fine-tuning process.
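A plain back-propagation sketch of this fine-tuning, assuming sigmoid hidden layers and a Softmax output; for brevity the decay and sparsity terms are dropped from the fine-tuning loss, and the fixed learning rate and epoch count are illustrative assumptions:

```python
import numpy as np

def finetune_bp(layers, theta, X, y, k=2, lr=0.1, epochs=200):
    """Gradient-descent fine-tuning of all parameters. layers is a list of
    pre-trained (W, b1) tuples; theta is the k x s Softmax matrix; y in 1..k."""
    n = X.shape[0]
    onehot = np.eye(k)[y - 1]
    for _ in range(epochs):
        acts = [X]                                # forward pass, caching activations
        for W, b in layers:
            acts.append(1.0 / (1.0 + np.exp(-(acts[-1] @ W.T + b))))
        scores = acts[-1] @ theta.T
        scores -= scores.max(axis=1, keepdims=True)
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)
        delta = (P - onehot) / n                  # gradient at the Softmax scores
        grad_theta = delta.T @ acts[-1]
        delta = (delta @ theta) * acts[-1] * (1 - acts[-1])
        for i in range(len(layers) - 1, -1, -1):  # back-propagate layer by layer
            W, b = layers[i]
            gW, gb = delta.T @ acts[i], delta.sum(axis=0)
            if i > 0:
                delta = (delta @ W) * acts[i] * (1 - acts[i])
            layers[i] = (W - lr * gW, b - lr * gb)
        theta = theta - lr * grad_theta
    return layers, theta
```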
Further, in step three, the particle swarm optimization of the adjustable hyper-parameters in the offline training is specifically performed through the following steps:
(1) the parameters of the method fall into two categories, model parameters and adjustable hyper-parameters; the method optimizes three key adjustable hyper-parameters: the regularization weight attenuation hyper-parameter λ_r, the weight hyper-parameter β controlling the sparsity penalty term, and the weight attenuation coefficient λ_sm of Softmax. These three form a single PSO particle (λ_r, β, λ_sm), so each particle is 3-dimensional. N_p particles are defined for simultaneous optimization with K iterations set, and the initial position and velocity of each particle are initialized. The fitness function value is the overall accuracy on the training samples: with the three key adjustable hyper-parameters as independent variables, the fine-tuned fault detection model is trained and the overall accuracy of the training samples is obtained, defined as:

Accuracy = (1/n) Σ_{i=1..n} 1{p_i = y_i} (14)
wherein p_i is the class of the ith training sample predicted by the fault detection model and y_i is the class of the ith training sample in the manually given labels; if the two are equal, 1 is returned, otherwise the model has misdiagnosed the current sample and 0 is returned. Through equation (14), the best fitness and position of each particle and the global best fitness and position of the particle swarm are calculated;
(2) since the global best fitness and best position after one iteration are not yet the most accurate, the velocity and position of each particle must be updated. The position of particle i at the tth iteration can be written x_i^t and its flight velocity v_i^t; the PSO algorithm updates the velocity and position of each particle by:

v_i^(t+1) = W_p·v_i^t + C_1·ξ·(pb_i^t - x_i^t) + C_2·η·(gb^t - x_i^t) (15)
x_i^(t+1) = x_i^t + v_i^(t+1) (16)
wherein W_p is the inertia coefficient; C_1 is the acceleration coefficient with which a particle tracks its own historical optimum, expressing the particle's self-cognition; C_2 is the acceleration coefficient with which a particle tracks the swarm optimum, expressing the particle's cognition of group (social) knowledge; generally C_1 = C_2 = 2. t is the current iteration number; ξ and η are random numbers uniformly distributed in the interval [0, 1]; pb_i^t is the position of the best fitness experienced by particle i up to the tth iteration; gb^t is the best position the particle swarm has experienced up to the tth iteration;
(3) the unsupervised pre-training of the SDSA, the supervised pre-training of the Softmax classifier, and the global BP fine-tuning of the network parameters of the fault detection model are retrained to obtain a new global best fitness value and position; the positions and velocities of the particles are continuously updated with equations (15)-(16), and optimization stops once the iteration count exceeds the defined K iterations, giving the particle position under the global best fitness, i.e. the optimized adjustable hyper-parameters under which the detection accuracy of the training samples is highest, thereby completing the automatic hyper-parameter optimization, as sketched below. The optimized values are set as the adjustable hyper-parameters of the SDSA, and all SDSA model parameters are retrained under the determined network structure and the optimized hyper-parameters, giving the trained fault detection model whose parameters can be used for feature learning and classification of the test samples.
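A minimal PSO sketch over the 3-dimensional particle (λ_r, β, λ_sm), equations (15)-(16); the fitness argument is assumed to retrain the fault detection model for one hyper-parameter triple and return the training accuracy of equation (14), and the bounds, inertia value and clipping of positions to the search range are illustrative assumptions:

```python
import numpy as np

def pso_tune(fitness, bounds, n_particles=10, K=20, Wp=0.7, C1=2.0, C2=2.0):
    """Particle swarm search for (lambda_r, beta, lambda_sm).
    bounds is a (3, 2) array of per-dimension search ranges."""
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = bounds.shape[0]
    x = lo + (hi - lo) * np.random.rand(n_particles, dim)  # initial positions
    v = np.zeros((n_particles, dim))                       # initial velocities
    pb = x.copy()                                          # personal best positions
    pb_fit = np.array([fitness(p) for p in x])
    g = pb[pb_fit.argmax()].copy()                         # global best position
    for _ in range(K):
        xi = np.random.rand(n_particles, dim)              # random factors xi, eta
        eta = np.random.rand(n_particles, dim)
        v = Wp * v + C1 * xi * (pb - x) + C2 * eta * (g - x)  # equation (15)
        x = np.clip(x + v, lo, hi)                            # equation (16)
        fit = np.array([fitness(p) for p in x])
        better = fit > pb_fit
        pb[better], pb_fit[better] = x[better], fit[better]
        g = pb[pb_fit.argmax()].copy()
    return g                            # optimized (lambda_r, beta, lambda_sm)
```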
Further, in step four, the feature learning and membership probability prediction of the online monitoring are performed through the following steps:
for a real-time, continuously producing chemical process, the fault detection model trained in step three effectively predicts whether the current process is in a fault state. With the preprocessed test sample X_testwhite as the input layer and the determined neural network structure, the layer-by-layer feature learning method is applied according to the trained fault detection model: with the trained weight parameters {(W_1, b_11), (W_2, b_12), …, (W_N, b_1N)} of each denoising sparse autoencoder, formula (11) is used for feed-forward learning to obtain the test samples' current hidden-layer features; the learned current-hidden-layer features are then taken as input and, combined with the DSA model parameters of the next hidden layer, the same formula learns the next features; by analogy all hidden layers are learned to obtain the finally learned features. Formula (12) then predicts each membership class. A single sample has only two classes, fault state and normal state, so its prediction probability vector has only two values; the class with the larger probability value is selected as the predicted membership class of the whole fault detection model. When the computed membership class is 1, the working condition is normal; when it is 2, an abnormal working condition has occurred, the connected warning device immediately issues an early warning indicating a detected fault, and a technician or engineer is notified to check system safety and remove the fault in time, so that the current working condition can be monitored for faults.
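The whole online step can be sketched as one feed-forward pass followed by an argmax over the two membership probabilities; the printed alarm stands in for the connected warning device and is illustrative:

```python
import numpy as np

def monitor(X_testwhite, layers, theta):
    """Online monitoring: encode layer by layer (equation (11)), predict
    membership probabilities (equation (12)), flag class 2 as a fault.
    layers and theta come from the trained fault detection model."""
    H = X_testwhite
    for W, b in layers:                          # layer-by-layer feature learning
        H = 1.0 / (1.0 + np.exp(-(H @ W.T + b)))
    scores = H @ theta.T
    scores -= scores.max(axis=1, keepdims=True)
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)
    labels = P.argmax(axis=1) + 1                # 1 = normal, 2 = fault
    for i in np.where(labels == 2)[0]:
        print(f"sample {i}: abnormal working condition, raise early warning")
    return labels, P
```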
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention adopts the stacked denoising sparse autoencoder algorithm from deep learning together with a particle swarm parameter optimization algorithm to develop a novel fault monitoring method suited to complex nonlinear chemical processes. No data labeling is needed during feature learning; compared with traditional shallow and artificial neural network methods, the deep neural network adaptively and intelligently learns the features of the original data, saving considerable energy and time relative to manual feature and knowledge extraction; and the learned features distinguish normal data from abnormal data more deeply, so the clustering of samples of each category is more pronounced and the algorithm is more intelligent.
2. Instead of manual selection of the adjustable hyper-parameters, the method avoids the randomness and time consumption of manual trial-and-error parameter selection by adopting automatic parameter tuning: a particle swarm optimization algorithm is fused into the method to automatically tune the key adjustable hyper-parameters, saving a great deal of time and making the method more automated.
3. Compared with traditional chemical process fault detection techniques (such as PCA, ICA, KPCA, MICA and the like), the method adopts the layer-by-layer training of a deep neural network and can adaptively learn the information of nonlinear process data during feature learning; a noise-reduction strategy is also added, strengthening the algorithm's robustness to high-noise data, so the established fault detection model is more accurate.
Example:
the embodiment provides a Chemical fault detection method based on a particle swarm optimization and noise reduction sparse coding machine, a flow chart of the method is shown in fig. 1, and the method provided by the embodiment is applied to a Tennessee-Isemann (TE) benchmark Chemical process to further explain the method of the embodiment, wherein the TE process is a computer simulation of an actual Chemical process published by downloads and Vogel in 1993 in the SCI journal of Computers & Chemical Engineering, and the process is mainly developed into a performance evaluation process monitoring method, and a process flow chart of the process is shown in fig. 2. The TE process mainly includes 5 operation units, namely: the device comprises a reactor, a condenser, a vapor-liquid separator, a circulating compressor and a stripping tower. In the simulated data, a total of 41 observed variables were monitored, 22 continuous process variables, 19 compositional variables, respectively. The TE process further includes 21 preset faults, and the embodiment uses the first 20 faults for monitoring, and the 20 preset faults are shown in table 1 below.
Table 1: The 20 preset faults of the TE process
The chemical process fault detection method comprises the following steps:
step one, data acquisition:
Data under normal working conditions and the 20 faults of the TE process are collected and divided into the training sample set X_train and the test sample set X_test. The training sample set comprises 13480 normal-working-condition samples and 480 samples under each fault. The test sample set contains 960 normal samples and 960 samples per fault, where the fault samples enter the fault state from the 161st sample onward. The process monitors 41 variables, so the training sample set forms a 23080 × 41 matrix and the test sample set a 20160 × 41 matrix.
Step two, data preprocessing:
Firstly, the mean X_mean and standard deviation X_std of each variable are obtained from the 13480 normal samples in the training data. The training sample set and the test sample set are then standardized with X_mean and X_std using formula (1); the whitening matrix W_white of the training sample set is obtained with formulas (2)-(3), and the whitened training sample set X_trainwhite and test sample set X_testwhite are obtained with formula (4), as illustrated below.
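Assuming the standardize and whiten sketches given earlier, the preprocessing of this embodiment can be exercised on synthetic stand-ins with the TE shapes (the random data here only illustrates dimensions, not the real TE sets):

```python
import numpy as np

X_train = np.random.randn(23080, 41)   # 13480 normal + 20 x 480 fault samples
X_test = np.random.randn(20160, 41)    # 960 normal + 20 x 960 fault samples
normal_idx = np.arange(13480)          # normal-condition rows supply the statistics

X_trainstd, X_teststd = standardize(X_train, X_test, normal_idx)
X_trainwhite, X_testwhite, W_white = whiten(X_trainstd, X_teststd)
```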
Step three, off-line training:
Firstly, the number of hidden layers is defined as 2 with network structure 41-130-20-2; the weight parameter matrices and bias terms of the SDSA are initialized, and with the whitened training sample set X_trainwhite as the SDSA input layer, the 2 hidden layers are greedily trained layer by layer: the loss function of the SDSA is optimized with the MATLAB toolbox minFunc to obtain the optimized model parameters, and feed-forward learning with formula (11) gives the SDSA training sample features h_2. The training sample features h_2 are then taken as the input of the Softmax classifier model; the corresponding working condition labels are attached to all learned training sample features (y_i = 1 means the sample is normal, y_i = 2 means the sample is a fault), and the loss function of Softmax is optimized with minFunc to obtain the pre-trained Softmax model parameters. The network parameters are then globally fine-tuned with the BP algorithm, a supervised fine-tuning of all model parameters over the whole training process: with the unsupervised pre-trained SDSA model parameters and the supervised pre-trained Softmax model parameters as initial values and the preprocessed training samples as the input layer, the global parameters are optimized with the back-propagation (BP) algorithm so that the loss function converges to its minimum, giving a preliminarily trained fault detection model. Finally, particle swarm optimization is applied to the three key adjustable hyper-parameters, and the optimized hyper-parameter values and all corresponding model parameters are determined, giving the finally trained fault detection model for online monitoring of the process state.
Step four, online monitoring:
For the real-time, continuously producing chemical process, the fault detection model trained in step three effectively predicts whether the current process is in a fault state. With the preprocessed test sample X_testwhite as the input layer and the determined neural network structure, the layer-by-layer feature learning method is applied according to the trained fault detection model: with the trained weight parameters {(W_1, b_11), (W_2, b_12), …, (W_N, b_1N)} of each denoising sparse autoencoder, feed-forward learning with formula (11) gives the test samples' current hidden-layer features; these are then taken as input and, combined with the DSA model parameters of the next hidden layer, the same formula learns the next features; repeating this over all hidden layers gives the finally learned features, and formula (12) predicts each membership class. A single sample has only two classes, fault state and normal state, so its prediction probability vector has only two values; the class with the larger probability is selected as the predicted membership class of the whole fault detection model. A computed membership class of 1 means the working condition is normal; a class of 2 indicates an abnormal working condition has occurred. Finally the detection rate of each fault is counted;
The test sample set is regarded as real-time working-condition data; through the above steps, fault diagnosis can likewise be performed on real-time data acquired from an actual chemical process.
Firstly, ten repeated tests were carried out with the selected 41-130-20-2 network structure and the optimized hyper-parameters. The optimal hyper-parameters obtained for the TE process are: regularization weight attenuation hyper-parameter λ_r = 0.002, weight hyper-parameter β controlling the sparsity penalty term = 2.628255, and weight attenuation coefficient of Softmax λ_sm = 0.0001, with an optimal fitness of 0.9998. The average fault detection rate and average false alarm rate of each test are shown in fig. 3: over the ten tests, the mean fault detection rate (FDR) of the test samples is 82.42% with a standard deviation of 0.857%, and the mean false alarm rate (FAR) is 0.64% with a standard deviation of 1.194%, indicating that the overall FDR of the method is high and the FAR low; by the usual criteria for evaluating fault detection performance, the diagnostic performance of the method is good. Moreover, the FDR and FAR differ little across the ten tests, showing that fusing PSO optimization improves the stability of the model: without optimization, improper parameters make training difficult and trap the model in local optima, whereas with PSO these problems are avoided and the model reaches better diagnostic performance under the optimal parameters. PSO is also a mature unconstrained optimization method with advantages such as fast convergence and ease of programming, so PSO parameter optimization is both necessary and easy to realize.
The best-performing result of the method (highest fault detection rate and lowest false alarm rate, the 9th test) is selected to show the detection rate of each fault on the test samples, compared with other methods in Table 2. As can be seen from Table 2, the proposed PSO-SDSA method has high diagnostic accuracy for faults 1, 2, 4, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19 and 20. Compared with the PCA, MICA and KPCA methods, the proposed method's mean fault detection rate is the best, with an average FDR of 83.48% over all faults and a FAR of only 0.21%. In previously developed methods, faults 3, 9 and 15 are the faults that are difficult to detect in the TE process: their data differ little from normal sample data, so many methods show very low diagnosis rates for them (mostly below 10%). The deep neural network differs from traditional modeling in that the modeling data comprise a large number of normal and fault samples; moreover, the greedy layer-by-layer deep training with added-noise modeling strictly limits the loss of information, and the learned fault features can deeply mine the differences between micro-disturbance faults and normal behavior, so fault features are better distinguished from normal features. The detection rates of faults 3, 9 and 15 are therefore greatly improved by the proposed method: fault 3 reaches 50.5%, fault 9 reaches 44.755%, and fault 15 reaches 44.875%, while other methods basically stay below 10%, so the proposed PSO-SDSA method effectively improves performance on fault points that are otherwise hard to detect. Although the PSO-SDSA method is somewhat lower than other methods on some faults (faults 1 and 14), the method models under consideration of the global error: since a smaller global loss is sought, the model may bias toward reducing the errors of the harder-to-detect faults while somewhat neglecting some easily detected fault samples, which can leave the FDR of some easy faults slightly below the traditional methods; the resulting detection rates (96.5% for fault 1, 90.375% for fault 14) remain quite acceptable. Overall, the method's performance is greatly improved compared with the other methods, making it one of the more advantageous FDD methods in current research. In addition, fig. 4 compares the fault detection rates for faults 3, 5, 9, 10, 15, 19 and 20, where the FDR is significantly improved over the other methods; each point of the PSO-SDSA curve lies above the curves of the other methods, showing that the detection rates of these 7 faults are greatly improved.
Table 2: Fault detection rates of various methods on the TE process
An early-warning speed analysis of the online monitoring was carried out on the test results. Nine faults with higher detection rates were selected for detection speed analysis (faults 1, 2, 4, 6, 7, 8, 12, 13 and 17), and each fault sample and its prediction probability are plotted in sequence in fig. 5; the 0.5 warning line in the figure is the classification control limit, and when a sample's predicted probability value exceeds the control limit the model identifies it as a fault state. As can be seen from fig. 5, for the selected 9 faults most samples after fault onset are detected, giving high fault detection rates (the FDR of these faults exceeds 90%). Faults 1, 4, 6, 7, 8 and 12 are detected at the 161st sample, with a delay of 0, indicating extremely fast detection. The detection points of faults 2 and 17 are at the 162nd sample; although delayed by 1 point, the detection speed is still high and the early-warning effect is completely acceptable in practical applications. Fault 13 is detected relatively more slowly, at the 167th sample, delaying the alarm by 6 points. In addition, as the online fault monitoring results show, after faults 2, 4, 6, 7 and 12 are detected very few samples fall below the control limit, so their fault detection rates all exceed 99%, indicating a strong capturing capability once these faults are detected. Faults 1, 8, 13 and 17 occasionally dip below the control limit after detection, so their detection rates are lower, but all still exceed 94%, and the detection performance remains good.
To further investigate why the fault detection rate and false alarm rate of the method are good, principal component analysis was performed on the features learned by the method. The first 3 PCs of the data of the normal state and faults 1, 2, 6 and 7 were selected for comparison: fig. 6(a) shows the first three principal components of the preprocessed test samples, and fig. 6(b) shows the first three principal components of the test features learned by the PSO-SDSA method. Comparing fig. 6(a) and fig. 6(b), in the preprocessed test data the samples of faults 1, 2, 7 and the normal samples overlap severely and are strongly linearly inseparable; the TE process is characterized by complex nonlinearity, so feeding this data directly into a classification model cannot achieve correct classification. After learning by the PSO-SDSA method, the principal component analysis of the features shows that faults 1, 2, 6 and 7 are well separable and the samples of each fault aggregate well; at the same time, the clustering of the normal samples is good and not very dispersed, so the faults can be completely distinguished from the normal samples, the fault diagnosis rate is high and the false alarm rate is very low. Under the feature learning of the stacked denoising sparse autoencoder, fault information and normal information are deeply distinguished, so the learned features have excellent clustering properties, improving the fault detection performance.
The above description covers only the preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any substitution or change that a person skilled in the art can make, within the technical scope disclosed by the present invention, according to the technical solution and the inventive concept of the present invention, or any equivalent thereof, belongs to the protection scope of the present invention.