Abnormal electrocardiogram recognition method based on overcomplete features
Technical field
The present invention relates to a method in the field of signal processing technology, and specifically to an abnormal electrocardiogram recognition method based on overcomplete features.
Background technology
The electrocardiogram (ECG) is widely used in clinical medicine to assess cardiac activity and has great clinical value. A complete ECG waveform recorded on electrocardiograph chart paper mainly comprises: the P wave (the first wave to appear, representing the excitation of the left and right atria); the QRS complex (representing the potential changes as excitation spreads through the two ventricles; a typical QRS complex consists of three consecutive deflections: the first downward wave is the Q wave, the tall, narrow upward wave following the Q wave is the R wave, and the downward wave following the R wave is the S wave); the S-T segment (the horizontal line from the end of the QRS complex to the start of the T wave); and the T wave (a low-amplitude, broad wave following the QRS complex, reflecting ventricular repolarization after excitation). Medical staff analyze changes in the ECG waveform to diagnose heart disease. Manual diagnosis, however, is heavily constrained by the individual's professional knowledge and clinical experience, and it is slow and cannot be performed in real time. Since the late 1950s, with the development of computers, research on automatic ECG analysis has been under way. Waveform detection and recognition, together with parameter extraction, are the key to an automatic ECG analysis system; their accuracy and reliability determine the quality of the diagnosis and may even determine whether a patient's life can be saved. Arrhythmia is an important indicator of the stability of cardiac electrical activity. In particular, some ventricular arrhythmias are often regarded as a warning sign that life is threatened. The goal of automatic ECG analysis is to diagnose arrhythmia. The automatic ECG analysis systems currently in clinical use all record the monitored patient's ECG over a period of time on a recording medium such as tape (for example, a 24-hour three-lead ECG) and then perform analysis and diagnosis with a fast playback analysis system. In practice they combine rhythm analysis with QRS morphology analysis, classify each beat against templates, and then diagnose according to predetermined diagnostic criteria. Because arrhythmias take many forms, automatic ECG waveform detection is still imperfect, and there are no unified diagnostic criteria yet, arrhythmia analysis can be said to remain at an early stage.
A literature search of the prior art found the article "Characterisation of electrocardiogram signals based on blind source separation" by M.I. Owis et al., published in "Medical and Biological Engineering and Computing" (2003, No. 1, pp. 227-231). The article proposes extracting features with independent component analysis and classifying five classes of ECG heartbeats. Specifically, the ECG signal is decomposed by independent component analysis and 219 of the resulting components are selected as features; three classifiers (a Bayes minimum-error classifier, a nearest neighbor classifier and a minimum distance classifier) are then compared. The final results show that, using 219 components, the nearest neighbor classifier gives the best result on the 5 classes of ECG signals: classification accuracy reaches 100% for normal heartbeats and 68.8%, 68.8%, 84.4% and 87.5% for the other four classes. The shortcomings are as follows: good results are obtained only for normal heartbeats, while accuracy for the other four classes is low; only 5 classes of ECG signals are classified; and because so many components are used, the computational complexity of the training stage is very high. A feature vector of excessively high dimension also degrades performance at recognition time, so real-time automatic ECG recognition cannot be achieved.
In general, a real-valued signal can be represented as a linear combination of a set of basis functions, which is an effective way of encoding a high-dimensional data space. For example, Fourier and wavelet bases can both provide an effective representation of a signal, but they cannot represent underdetermined signals well. In that case, representation with a so-called overcomplete basis becomes a more general alternative: it allows the number of basis functions to exceed the dimension of the input signal. An overcomplete basis is formed by combining several complete bases (for example, Fourier and wavelet) or by adding basis functions to a complete basis. Under an overcomplete basis the decomposition of a signal is not unique, but this brings certain advantages: the representation is very flexible in capturing the structure of the data, and a sparser representation can be obtained than with a basis containing only a single class of basis functions. In addition, an overcomplete representation is more stable in the presence of noise in the signal. The present invention exploits precisely these properties of overcomplete bases to extract overcomplete features of the ECG signal.
Summary of the invention
The objective of the invention is to overcome the deficiencies of the prior art by providing an automatic ECG recognition method based on overcomplete feature extraction. The method extracts features rapidly from 14 classes of arrhythmia data and attains high classification accuracy, so that ECG signals can be classified and recognized in real time; it can be used in clinical medicine and in medical monitoring systems.
The present invention is achieved by the following technical solution. The process by which the invention automatically classifies and recognizes ECG signals comprises three main parts: data preprocessing, feature extraction, and pattern classification. Data preprocessing first uses the quadratic spline wavelet to detect the positions of the R points (the R-wave peaks of the QRS complexes) in the continuous ECG signal, and then segments and preprocesses the ECG data according to the R-point positions. Feature extraction is the part most critical to the overall recognition performance and is the main part of the invention: independent component analysis and the discrete wavelet transform are each applied to every heartbeat to extract features, which together form an overcomplete feature set, and a mutual information method is used to reduce the features. Finally, a support vector machine is trained on the extracted feature vectors to obtain a support vector machine model, which is used to automatically classify and recognize new heartbeat segments.
The data preprocessing part has two main steps: first, detection of the R-point position of the QRS complex; second, segmentation of the continuous ECG signal, which relies on accurately computed R-point positions. Because the ECG signal is a non-stationary random signal and is easily disturbed by various signals during measurement, and because wavelet analysis is a typical time-frequency analysis method, wavelet multi-scale analysis can detect R-point positions conveniently and with high accuracy. For R-point detection, the quadratic spline wavelet is used to decompose the continuous ECG signal over multiple scales. After decomposition, each R point of the source signal corresponds to a modulus maximum pair on every scale; the energy is greatest on scales 3 and 4, where the modulus maximum pairs are also most pronounced. Using this principle, all modulus maximum pairs exceeding a threshold (typically 2/3 of the maximum value) are first found on scale 4 of the wavelet transform. Modulus maxima occur in pairs, that is, each maximum corresponds to an adjacent minimum. If for some reason the number of maxima does not equal the number of minima, the redundant modulus maxima are eliminated by matching each maximum to its nearest minimum. Once the one-to-one modulus maximum pairs have been determined, the corresponding pairs are located on scales 1, 2 and 3. Finally, the zero crossing between each modulus maximum pair on scale 1 is computed; its position is the R-point position of the original heartbeat. After R-point detection is complete, and with a sampling rate of 360 Hz, each complete heartbeat waveform is obtained by taking the 120 points before and the 180 points after the R point as one heartbeat segment, centered on the R point; such a segment usually contains the complete heartbeat waveform (P wave, QRS complex, T wave). To reduce the influence of baseline drift on classification, the data are normalized by removing the mean so that every heartbeat has zero mean. This completes the segmentation and preprocessing of the ECG data.
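The thresholding, pairing and zero-crossing steps described above can be sketched for a single scale as follows. This is only a minimal illustration under stated assumptions: the wavelet coefficients of one scale are assumed to be already available as a list `d`, the tracing of pairs across scales 4, 3, 2 and 1 is omitted, and all function names are hypothetical rather than part of the invention.

```python
def local_extrema(d, threshold):
    """Indices of local maxima above +threshold and local minima below -threshold."""
    maxima, minima = [], []
    for i in range(1, len(d) - 1):
        if d[i] > threshold and d[i] >= d[i - 1] and d[i] > d[i + 1]:
            maxima.append(i)
        elif d[i] < -threshold and d[i] <= d[i - 1] and d[i] < d[i + 1]:
            minima.append(i)
    return maxima, minima


def pair_extrema(maxima, minima):
    """Match each maximum to its nearest minimum; if several maxima claim the
    same minimum, keep only the closest one (the others are treated as redundant)."""
    best = {}  # minimum index -> index of the closest maximum
    for mx in maxima:
        if not minima:
            break
        mn = min(minima, key=lambda m: abs(m - mx))
        if mn not in best or abs(mx - mn) < abs(best[mn] - mn):
            best[mn] = mx
    return sorted((min(mx, mn), max(mx, mn)) for mn, mx in best.items())


def zero_crossing(d, left, right):
    """Index between left and right at which d changes sign."""
    for i in range(left, right):
        if d[i] == 0 or d[i] * d[i + 1] < 0:
            return i if abs(d[i]) <= abs(d[i + 1]) else i + 1
    return (left + right) // 2  # fallback if no sign change is found


def detect_r_points(d, fraction=2 / 3):
    """Candidate R-point positions from one scale of wavelet coefficients."""
    threshold = fraction * max(abs(v) for v in d)  # 2/3 of the maximum, as in the text
    maxima, minima = local_extrema(d, threshold)
    return [zero_crossing(d, a, b) for a, b in pair_extrema(maxima, minima)]
```

Whether the maximum precedes or follows the minimum depends on the sign convention of the wavelet, so the sketch orders each pair by position before searching for the zero crossing.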
The feature extraction part is the key component of automatic ECG diagnosis and recognition and is the main content of the invention; the quality of feature extraction directly affects the accuracy and reliability of recognition. The feature extraction method of the invention uses independent component analysis and wavelet analysis to extract two feature sets, which together form an overcomplete feature set. A feature reduction algorithm based on mutual information is then used to find the features that matter most for classification, and these are passed to the subsequent classifier. The details are as follows:
(1) Independent component analysis: to extract ECG features, an independent component feature basis is first trained from a large number of samples. FastICA (the fast independent component analysis method) is a fixed-point algorithm that finds the independent components by maximizing negentropy; it converges quickly and is easy to implement. 10000 normal heartbeats, after preprocessing and segmentation, are chosen as training samples, each heartbeat being a 300-dimensional column vector. The FastICA method is used to compute 18 independent component basis vectors and the independent component coefficients of each heartbeat, so that the original heartbeat data can be represented by the transformed independent component coefficients, which achieves feature extraction. In essence, the original signal is projected onto the new independent component feature basis, and training yields an unmixing matrix W. The matrix W can then be used to compute the independent component decomposition of any heartbeat and thus obtain its independent component feature coefficients.
(2) Wavelet feature extraction: the segmented heartbeat is first decomposed over multiple scales with the discrete wavelet DB8 (a wavelet basis of the Daubechies family). The ECG energy is concentrated mainly between 0.5 Hz and 40 Hz. After wavelet multiresolution decomposition, the high-frequency part of the first level is mainly noise, while the low-frequency part of the fourth level carries most of the energy of the heartbeat. Because most of the noise and baseline drift has been removed, using the low-frequency part of the fourth scale as the feature representation of the original heartbeat suppresses the small differences between heartbeats of the same type and thereby produces a clustering effect. After the wavelet transform, the original 300-dimensional heartbeat vector becomes a 32-dimensional vector of wavelet coefficients, called the wavelet feature set.
(3) Feature reduction: the feature set extracted by independent component analysis and the feature set extracted by wavelet analysis are first combined into one overcomplete feature set. Some of these features contribute very little to classification; in practice the feature set contains irrelevant or redundant features. To improve classification performance and reduce computation, the features that are highly relevant and contribute most to classification must be selected. The principle of feature reduction is to compute the mutual information between each feature and the heartbeat type, then set a threshold and select the features with larger mutual information; the subset comprising 80% of the original feature set is retained as the final result of feature reduction.
The pattern classification process uses the results of the above feature extraction and feature reduction to perform pattern classification. Many pattern classification methods exist. A support vector machine constructs an optimal hyperplane in the sample space or feature space such that the distance between the hyperplane and the sample sets of different classes is maximized; it has a simple structure, global optimality and good generalization ability. A support vector machine with a Gaussian radial basis kernel is trained on the sample space obtained from the training samples after feature extraction, yielding a support vector machine model. The trained model can then be used to classify new heartbeat data and provide the recognition result.
The beneficial effect of the invention is a higher accuracy of automatic ECG rhythm recognition. The wavelet feature extraction method reduces the individual differences between heartbeats of the same type, which preserves the ability to discriminate heartbeat waveforms, while the independent component method extracts latent independent component features whose coefficients are mutually independent. The method of the invention was tested on the MIT-BIH arrhythmia database (the arrhythmia database provided by the Massachusetts Institute of Technology), where the recognition accuracy over 14 different heartbeat types reaches 98.65%.
Description of drawings
Fig. 1 is a schematic flowchart of the real-time ECG recognition process.
The recognition process shown in the figure comprises two main parts. First step: training on the standard data in the MIT-BIH arrhythmia database, which yields the independent component feature basis, the feature reduction scheme and the trained support vector machine model. Second step: the real-time continuous ECG signal is first segmented using the wavelet transform method, and the unknown heartbeats are then recognized using the independent component feature basis, the feature reduction scheme and the support vector machine model obtained in training.
Fig. 2 is a schematic diagram of the feature reduction method.
The figure shows the mutual information between each of the 50 features in the overcomplete feature set and the heartbeat type during training. It can be seen that the mutual information between some features and the heartbeat type is very small; these can be regarded as redundant or irrelevant features that contribute little to classification. A threshold is set according to the magnitude of the mutual information, the 80% of features with the larger mutual information are retained, and the positions of the retained features are recorded.
Specific embodiment
An embodiment of the invention is described in detail below with reference to the accompanying drawings. The embodiment is implemented on the basis of the technical solution of the invention, and a detailed implementation and process are given, but the scope of protection of the invention is not limited to the following embodiment.
This embodiment addresses the diagnosis and recognition of 14 kinds of arrhythmia heartbeat data, using data from the MIT-BIH arrhythmia database. The 14 types are: left bundle branch block, right bundle branch block, atrial premature beat, premature ventricular beat, aberrated atrial premature beat, paced beat, premature AV junctional beat, fusion of ventricular and normal beat, atrial flutter, AV junctional escape beat, ventricular escape beat, fusion of paced and normal beat, conducted atrial premature beat, and normal beat.
As shown in Fig. 1, the implementation proceeds as follows:
1. Data preprocessing. Heartbeat data of the 14 types are taken from the MIT-BIH arrhythmia database. Following the segmentation method, each heartbeat is centered on its R point, with 120 points taken before it and 180 points after it, giving 300 points in total as one heartbeat. The amount of data for some abnormal types is rather small, whereas normal heartbeats are by far the most numerous, so only part of the normal heartbeats are used. After extraction, the mean is removed from every heartbeat so that each heartbeat has zero mean.
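The segmentation and mean removal of step 1 can be sketched as below. This is a minimal illustration, assuming the ECG record is a plain list of samples and the R-point indices are already known; the function name and the decision to skip beats whose window runs past the record boundary are assumptions, not taken from the source.

```python
def segment_beats(signal, r_points, before=120, after=180):
    """Cut one zero-mean segment of before + after samples around each R point."""
    beats = []
    for r in r_points:
        start, end = r - before, r + after
        if start < 0 or end > len(signal):
            continue  # assumption: drop beats too close to the record edges
        beat = signal[start:end]
        mean = sum(beat) / len(beat)
        beats.append([v - mean for v in beat])  # remove the mean of this beat
    return beats
```

At the stated 360 Hz sampling rate, 120 points span about 0.33 s before the R point and 180 points span 0.5 s after it.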
2. Feature extraction:
10000 heartbeats are taken from the normal heartbeat data set to train the independent component basis. They form a 300*10000 matrix X in which each column is one normal heartbeat. The FastICA method is applied to this matrix to perform independent component decomposition, configured to keep 18 independent components, which finally yields an unmixing matrix W and an independent component matrix S. The inverse matrix A of W is the resulting independent component basis. All heartbeats to be classified are then projected onto this basis to obtain their independent component feature coefficients: the heartbeat data matrix X of all types is multiplied by the matrix W (S = WX) to obtain the independent component coefficients. Each column of the resulting matrix S holds the independent component feature coefficients of the corresponding column of the original heartbeat data matrix X, so each heartbeat is reduced from the original 300 dimensions to an 18-dimensional independent component feature.
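The projection step can be illustrated with a small sketch. It assumes that the unmixing matrix W has already been obtained from training, with one row per retained component (18 x 300 in the embodiment), and that heartbeats are stored as the columns of X; the helper name `project` is hypothetical, and the toy matrices are far smaller than the real ones.

```python
def project(W, X):
    """Return S = W X for W (k x d) and X (d x n), given as nested lists."""
    d = len(X)
    if any(len(row) != d for row in W):
        raise ValueError("W must have as many columns as X has rows")
    n = len(X[0])
    return [[sum(W[i][k] * X[k][j] for k in range(d)) for j in range(n)]
            for i in range(len(W))]


# Toy example: 2 components, 3-sample "heartbeats", 2 heartbeats (columns of X).
W = [[1, 0, -1],
     [0.5, 0.5, 0]]
X = [[2, 4],
     [0, 2],
     [1, 1]]
S = project(W, X)  # column j of S holds the coefficients of heartbeat j
```

With these shapes, a 300 x N data matrix yields an 18 x N coefficient matrix, which matches the reduction from 300 to 18 dimensions per heartbeat described above.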
Every column of the heartbeat data matrix X (each column representing one heartbeat) is transformed with the discrete wavelet DB8, and the low-frequency part of the fourth scale is kept. After the wavelet transform, the original 300-dimensional heartbeat vector becomes a 32-dimensional wavelet feature.
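The figure of 32 coefficients can be checked with a short calculation. The sketch below assumes a DB8 filter of 16 taps and a non-periodized transform with boundary extension, for which each level maps a length n to floor((n + 16 - 1) / 2); the source does not state its boundary handling, so this is an assumption that happens to reproduce the stated dimension.

```python
DB8_FILTER_LEN = 16  # a Daubechies wavelet of order N has 2N filter taps


def approx_length(n, filter_len):
    """Length of the approximation band after one decomposition level
    (assuming boundary extension rather than periodization)."""
    return (n + filter_len - 1) // 2


def lengths_per_level(n, filter_len, levels):
    """Approximation-band length after each of the given number of levels."""
    out = []
    for _ in range(levels):
        n = approx_length(n, filter_len)
        out.append(n)
    return out


print(lengths_per_level(300, DB8_FILTER_LEN, 4))  # prints [157, 86, 50, 32]
```

Under a periodized transform the lengths would roughly halve at each level instead, giving 19 coefficients at level 4, so the 32-dimensional feature is consistent with the extension assumption.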
The features extracted by independent component analysis and the features extracted by the wavelet transform are combined into one overcomplete feature set with a dimension of 50. The mutual information between each feature and the heartbeat type is computed, all features are sorted by mutual information in descending order, the top 80% are retained and the remaining 20% are discarded. This gives a new feature vector; the positions of the retained features are recorded, which completes the feature reduction process.
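The ranking by mutual information can be sketched as follows. The source does not say how the mutual information of a continuous feature is estimated; this sketch assumes a simple equal-width histogram with 10 bins, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(values, labels, bins=10):
    """Histogram estimate (in nats) of the mutual information between one
    feature and the class labels; equal-width binning is an assumption."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant features
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    joint = Counter(zip(binned, labels))
    px, py = Counter(binned), Counter(labels)
    return sum((c / n) * math.log(c * n / (px[b] * py[y]))
               for (b, y), c in joint.items())


def select_features(columns, labels, keep=0.8, bins=10):
    """Rank features by mutual information and keep the top fraction.
    Returns the retained feature positions (sorted) and all scores."""
    scores = [mutual_information(col, labels, bins) for col in columns]
    n_keep = max(1, round(keep * len(columns)))
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep]), scores
```

For the 50-dimensional feature set, `keep=0.8` retains 40 features. The returned positions play the role of the recorded feature locations, so the same columns can be picked out of a new heartbeat at recognition time.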
3. Training the support vector machine model
The feature set obtained from all types of heartbeat data after feature extraction and feature reduction serves as the input to the support vector machine. Before training, all input features are scaled to the range -1 to 1, and the scaling factors of each feature are recorded. The support vector machine uses a Gaussian radial basis kernel function. The choice of the parameters c and g strongly affects the classification performance of the support vector machine. To find the best parameters, the training data are divided into 5 equal parts; in rotation, four parts are used as training data and the remaining part as the test set, giving an average cross-validation accuracy. Cross-validation results are computed for c and g over different ranges of values, and the best parameters c and g are selected from them. The training data are then trained with the optimal c and g to obtain the support vector machine model.
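The scaling and parameter search of step 3 can be sketched as below. The sketch does not train a support vector machine itself: `evaluate` is a hypothetical callback, supplied by the caller, that would train a Gaussian-kernel support vector machine with parameters c and g on the training indices and return its accuracy on the test indices. The interleaved assignment of samples to folds is also an assumption; the source only says that the data are divided into five equal parts.

```python
def fit_scaling(rows):
    """Record the (min, max) of every feature column of the training set."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_scaling(rows, ranges):
    """Map every feature linearly to [-1, 1] using the recorded ranges."""
    return [[0.0 if hi == lo else 2.0 * (v - lo) / (hi - lo) - 1.0
             for v, (lo, hi) in zip(row, ranges)]
            for row in rows]


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def grid_search(evaluate, c_values, g_values, n_samples, k=5):
    """Return (best mean accuracy, c, g) over the grid of candidate values."""
    best = None
    for c in c_values:
        for g in g_values:
            accs = [evaluate(c, g, tr, te)
                    for tr, te in k_fold_indices(n_samples, k)]
            mean_acc = sum(accs) / len(accs)
            if best is None or mean_acc > best[0]:
                best = (mean_acc, c, g)
    return best
```

The ranges returned by `fit_scaling` correspond to the recorded scaling factors: a new heartbeat must be scaled with `apply_scaling` and the training ranges, not with ranges computed from the new data.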
4. ECG classification
Once the support vector machine model has been trained, new heartbeat data can be diagnosed and recognized automatically and a recognition result provided. First, the continuous ECG data collected are decomposed with the quadratic spline wavelet to obtain the wavelet analysis results on four scales. On scale 4 the maximum value is computed first, and 2/3 of this maximum is used as the threshold to find all modulus maximum pairs exceeding it. Modulus maxima should occur in pairs, that is, each maximum corresponds to an adjacent minimum. If for some reason the number of maxima does not equal the number of minima, the redundant modulus maxima are eliminated by matching each maximum to its nearest minimum. Once the one-to-one modulus maximum pairs have been determined, the corresponding pairs are located on scales 1, 2 and 3. Finally, the zero crossing between each modulus maximum pair on scale 1 is computed; its position is the R-point position of the original heartbeat. After the R point is found, the 120 points before it and the 180 points after it, centered on the R point, are taken as one heartbeat segment. After the heartbeat segment has been cut out, its mean is removed and the feature extraction process is applied to obtain the feature vector. The features at the positions recorded by the feature reduction process during training are selected to form the feature vector, which is then scaled with the per-feature scaling factors recorded during training and input to the support vector machine for classification, producing the recognition result.
As the above embodiment shows, in tests on the MIT-BIH arrhythmia database the recognition accuracy over the 14 different ECG types listed above reaches 98.65%.