Disclosure of Invention
The invention aims to solve at least the technical problems existing in the prior art, and in particular creatively provides a ReliefF feature selection method that applies random multi-subspaces to big data.
In order to achieve the above object, the present invention provides a random multi-subspace ReliefF feature selection method for big data, comprising:
S1, dividing an original feature space to generate a plurality of feature partitions, each of which comprises a plurality of equally sized, mutually disjoint random subspaces;
S2, using the ReliefF or Relief algorithm in each random subspace to obtain the local weights of its features, wherein the local weight vector reflects the ability of the random subspace to preserve local structure and the contribution of the samples to the features, and each random subspace contains a plurality of features;
combining the local weight vectors of the random subspaces in each feature partition to obtain a full weight vector;
And S3, integrating the full weight vectors of the feature partitions into the final weight vector of the features, namely, obtaining the final weight vector of all the features by averaging the weight scores over the feature partitions.
Further, the step S1 includes:
representing the i-th feature partition as:

P^(i) = {P^(i,1), P^(i,2), ..., P^(i,s)}

wherein P^(i) represents the i-th feature partition;
P^(i,j) represents the j-th random subspace of P^(i);
s represents the number of random subspaces in P^(i);
when each feature partition is generated, [d/s] features are randomly sampled each time without repetition until all features have been sampled, and when fewer than [d/s] features remain, the remaining features form a random subspace, wherein [d/s] represents the value of d/s truncated to an integer and d represents the total number of features;
the j-th random subspace of P^(i) may be written as:

P^(i,j) = {f_1^(i,j), f_2^(i,j), ..., f_{d_ij}^(i,j)}

wherein f_1^(i,j) represents the 1st feature in the random subspace P^(i,j);
f_2^(i,j) represents the 2nd feature in the random subspace P^(i,j);
f_{d_ij}^(i,j) represents the d_ij-th feature in the random subspace P^(i,j);
d_ij represents the number of features in the random subspace P^(i,j);
repeatedly and randomly generating feature partitions, so that the resulting set of feature partitions can be expressed as:

{P^(1), P^(2), ..., P^(M)}

wherein P^(1) represents the 1st generated feature partition;
P^(2) represents the 2nd generated feature partition;
P^(M) represents the M-th generated feature partition;
M represents the number of feature partitions.
Further, each feature partition comprises s random subspaces, each random subspace has the same number [d/s] of features, and if d is not divisible by s, the remaining features form an additional random subspace by themselves.
Further, obtaining the local weights of the features in S2 using the ReliefF or Relief algorithm in each random subspace includes:
calculating the local weight of each feature f_l^(i,j) (l = 1, 2, ..., d_ij) in the random subspace P^(i,j) using the ReliefF or Relief algorithm;
the local weight vector of all features in the random subspace P^(i,j) can then be expressed as:

w^(i,j) = (w(f_1^(i,j)), w(f_2^(i,j)), ..., w(f_{d_ij}^(i,j)))^T

wherein w(f_1^(i,j)) represents the local weight of the 1st feature in the random subspace P^(i,j);
w(f_2^(i,j)) represents the local weight of the 2nd feature in the random subspace P^(i,j);
w(f_{d_ij}^(i,j)) represents the local weight of the d_ij-th feature in the random subspace P^(i,j);
d_ij represents the number of features in the random subspace P^(i,j);
(·)^T represents the transpose operation.
Further, the ReliefF algorithm obtains the weight of the feature f_l^(i,j) (l = 1, 2, ..., d_ij) in the random subspace P^(i,j) according to the following formula:

w_m(f_l^(i,j)) = w_{m-1}(f_l^(i,j)) - (1/(m·k)) · Σ_{h=1..k} diff(f_l^(i,j), R, H_h) + (1/(m·k)) · Σ_{Q ≠ class(R)} [p(Q) / (1 - p(class(R)))] · Σ_{h=1..k} diff(f_l^(i,j), R, M_h)

wherein w_m(f_l^(i,j)) represents the local weight of the l-th feature in the random subspace P^(i,j) after m recursive updates (m being the number of sampled instances), and w_{m-1}(f_l^(i,j)) represents the local weight of the feature f_l^(i,j) in the random subspace P^(i,j) after m-1 recursive updates;
f_l^(i,j) denotes the l-th feature in the random subspace P^(i,j);
diff(f_l^(i,j), R, H_h) represents the difference between samples R and H_h on feature f_l^(i,j) in the random subspace P^(i,j);
diff(f_l^(i,j), R, M_h) represents the difference between samples R and M_h on feature f_l^(i,j) in the random subspace P^(i,j);
H_h denotes the h-th nearest neighbor of sample R among the samples having the same class as sample R;
M_h denotes the h-th nearest neighbor of sample R among the samples having a class Q different from that of sample R;
m represents the number of sampling iterations;
k represents the total number of nearest neighbors;
Q ≠ class(R) indicates that Q ranges over the classes different from the class of sample R;
Q represents a class in that set;
class(R) represents the class of sample R;
p(·) represents the prior probability of a class among all classes;
a nearest neighbor is a sample closest to sample R.
Further, obtaining the full weight vector in S2 includes:
each feature in the original feature space appears only once in the s random subspaces of a feature partition P^(i), so the full weight vector of P^(i) can be obtained by concatenating the local weight vectors of its s subspaces and is expressed as:

w^(i) = (w(f_1^(i)), w(f_2^(i)), ..., w(f_d^(i)))^T

wherein w^(i) represents the full weight vector of the feature partition P^(i);
w(f_1^(i)) represents the local weight of f_1^(i);
f_1^(i) denotes the 1st feature in the original feature space;
w(f_2^(i)) represents the local weight of f_2^(i);
f_2^(i) denotes the 2nd feature in the original feature space;
w(f_d^(i)) represents the local weight of f_d^(i);
f_d^(i) denotes the d-th feature in the original feature space;
(·)^T represents the transpose operation.
Further, the step S3 includes:
expressing the final weight vector as:

w = (1/M) · Σ_{i=1..M} w^(i)

wherein w represents the final weight vector;
w^(i) represents the full weight vector of the feature partition P^(i);
P^(i) represents the i-th feature partition;
M represents the number of feature partitions;
then the average w_avg of the final weight vector is calculated, and if the weight w(f_i) of a feature in the final weight vector is greater than w_avg, the feature is selected into the selected feature subset.
Further, the method further comprises taking the accuracy, precision and recall of a KNN classifier and/or a DT classifier as evaluation indexes of the method, wherein the classification accuracy of the KNN and DT classifiers is expressed as:

E(f; D) = (1/n) · Σ_{i=1..n} I(f(x_i) = c_i)

wherein E(f; D) represents the classification accuracy of the KNN or DT classifier;
D represents the training dataset;
f represents the KNN classifier or the DT classifier;
n represents the number of samples;
I(·) is the indicator function;
f(x_i) represents the prediction result of the classifier on the i-th sample;
c_i represents the true class label of the i-th sample.
In summary, by adopting the above technical scheme, the invention fully considers the diversity of subspaces and the contribution information of the samples to the features during feature selection, and has the capability of exploring each subspace of a high-dimensional space.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
1. Introduction to the invention
Here we use the framework of filter methods for feature selection, as they are superior to embedded and wrapper methods in terms of simplicity, scalability and independence.
In view of the two problems described in the background art, a new feature selection method, RBEFF, is presented herein. The main idea is to use a large number of random subspaces divided from the original feature space to select a feature subset that contains a large amount of information. During the learning process, it is important that each feature competes fairly in a large number of random subspaces. RBEFF consists of three steps. First, the original feature space is divided into a number of disjoint feature subspaces of equal size, which together are called a feature partition; the number of feature partitions is M, and the subspace size in each feature partition is d/s. This division is repeated M times, so in this step we obtain M feature partitions containing a large number of subspaces. Second, the local weight of each feature in each random subspace is calculated using the ReliefF algorithm, and the local weights of a subspace form its local weight vector. The local weight vector reflects the ability of the random subspace to preserve local structure and the contribution of the samples to the features: because the features of each random subspace differ, the k nearest neighbors of a sample within the same class and within the other classes also differ, so each feature is given a different weight in different subspaces. Since the ReliefF algorithm randomly samples m times, the information of different samples is used to measure the weight of a feature. The local weight vectors in each feature partition constitute a full weight vector, and the full weight vectors of the M feature partitions are integrated to obtain the final weight vector of the feature space. Finally, feature selection is performed adaptively according to the final weight vector, i.e., an optimal feature subset is selected to represent all features without setting the number of selected features in advance. A large number of experiments were performed on 26 standard datasets, and the results indicate the validity and effectiveness of the algorithm.
2. Background
2.1 Related work for feature selection
The problem of feature estimation has attracted a great deal of attention in the literature. There are several ways in which the quality of a feature can be assessed. If the target variable is discrete, the measures include information gain, the Gini index, distance measures, the J-measure, Relief, ReliefF, MDL, χ2 and the G statistic. When the target variable is a real-valued function (numerical class and regression problems), the mean squared error, the mean absolute error and RReliefF are used. Most measures assume that the features are conditionally independent given the target variable and are therefore unsuitable when there are many feature interactions. The Relief, ReliefF and RReliefF algorithms do not make this assumption and can correctly estimate the quality of a feature even in the presence of strong dependencies between attributes. Therefore, the Relief family of algorithms is among the best preprocessing algorithms to date.
The Relief family of algorithms is an efficient class of filter feature selection algorithms. Kira and Rendell first proposed the feature selection algorithm Relief for binary classification problems in 1992. The Relief algorithm is widely used because it is relatively simple yet highly efficient and gives satisfactory results. However, it is limited in that it can only process two-class data. Therefore, Kononenko analyzed and extended it in 1994 into the ReliefF algorithm, which can deal with noisy, incomplete and multi-class problems. To apply ReliefF to continuous-class (regression) problems, the RReliefF algorithm was introduced in 1997. After this, a number of feature selection algorithms based on the Relief family were proposed. Owing to its excellent performance, the Relief family of algorithms (Relief, ReliefF and RReliefF) has been widely used to solve practical problems such as signal recognition, transmission line fault localization, image processing and image classification, gene classification and pattern recognition.
2.2 Relief feature selection algorithm
The Relief algorithm is a feature weighting algorithm: it gives each feature a different weight according to the relevance between the feature and the class, and features whose weights fall below a certain threshold are removed. In the Relief algorithm, the correlation between a feature and the class is based on the ability of the feature to distinguish samples that are close to each other. D is divided into D+ = {positive samples} and D- = {negative samples}. The algorithm randomly selects one sample R = (f_1, f_2, ..., f_d) from the training dataset D, then searches for the nearest positive sample Z+ of R in D+ and the nearest negative sample Z- of R in D-. The weight of each feature is then updated according to the following rule: if R is a positive sample, then near_hit = Z+ and near_miss = Z-; otherwise near_hit = Z- and near_miss = Z+, where the nearest neighbors are found using the Euclidean distance. We then update each weight using equation (1), where diff is calculated by equation (2) when the two values x_k and y_k are nominal and by equation (3) when x_k and y_k are numerical. The above process is repeated m times to finally obtain the average weight of each feature. The pseudocode of the Relief algorithm is shown in Algorithm 1. The larger the weight of a feature, the stronger its classification capability; conversely, the smaller the weight, the weaker its classification capability. The running time of the Relief algorithm increases linearly with the number of samples m and the number of original features d, so its running efficiency is high.
W_i = W_i - diff(R_i, near_hit_i)^2 + diff(R_i, near_miss_i)^2, (i = 1, 2, ..., d),   (1)

wherein W_i represents the weight of the i-th feature, whose value is accumulated recursively over the m sampled instances and finally divided by m to obtain the average weight of the feature;
diff(R_i, near_hit_i) represents the difference, on feature i, between a randomly drawn sample R and its nearest neighbor near_hit of the same class;
diff(R_i, near_miss_i) represents the difference, on feature i, between a randomly drawn sample R and its nearest neighbor near_miss of a different class;
R_i represents the value of the randomly drawn sample R on feature i;
near_hit_i represents the value on feature i of near_hit, the nearest same-class neighbor of the randomly drawn sample R;
near_miss_i represents the value on feature i of near_miss, the nearest different-class neighbor of the randomly drawn sample R.

For nominal values, diff is computed as

diff(x_k, y_k) = 0 if x_k = y_k, and 1 otherwise,   (2)

and for numerical values as

diff(x_k, y_k) = |x_k - y_k| / v_k,   (3)

where v_k is a normalization unit that scales the value of diff into the [0, 1] interval, analogous to the min-max normalization (x - min) / (max - min) that maps x to [0, 1]; because x_k and v_k are vectors, max refers to the largest value in the vector and min refers to the smallest value in the vector.
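To make equations (1)-(3) concrete, the following Python sketch implements the binary-class Relief update described above. The function name, the NumPy-based implementation, and the min-max normalization choice are our own illustrative assumptions, not the authors' code.

```python
import numpy as np

def relief(X, y, m=10, rng=None):
    """Minimal Relief sketch for a binary problem (labels 0/1).

    X: (n, d) numeric data matrix, y: (n,) labels. Assumes each class has
    at least two samples. Returns the d feature weights averaged over
    m random draws, following equation (1).
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)       # v_k in equation (3)
    span[span == 0] = 1.0                      # avoid division by zero
    W = np.zeros(d)
    for _ in range(m):
        r = rng.integers(n)
        R = X[r]
        same = np.where(y == y[r])[0]
        same = same[same != r]
        other = np.where(y != y[r])[0]
        # nearest hit / nearest miss by Euclidean distance
        near_hit = X[same[np.argmin(np.linalg.norm(X[same] - R, axis=1))]]
        near_miss = X[other[np.argmin(np.linalg.norm(X[other] - R, axis=1))]]
        d_hit = np.abs(R - near_hit) / span    # equation (3), per feature
        d_miss = np.abs(R - near_miss) / span
        W += -d_hit ** 2 + d_miss ** 2         # equation (1)
    return W / m                               # average over the m draws
```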
2.3 ReliefF feature selection algorithm
When dealing with multi-class problems, the ReliefF algorithm randomly extracts one sample R = (f_1, f_2, ..., f_d) at a time from the training dataset D. It selects the k nearest neighbors H_h (h = 1, 2, ..., k) of R from the samples of the same class as R, and finds the k nearest neighbors M_h (h = 1, 2, ..., k) of R from the samples of each class different from that of R, where the nearest neighbors are found using the Euclidean distance. The above process is repeated m times. We then update the weight of each feature using equation (4), where diff is calculated by equation (5). Finally, we perform feature selection based on the weight of each feature and a given threshold. The pseudocode of the ReliefF algorithm is described in Algorithm 2.
w_m(f_l^(i,j)) = w_{m-1}(f_l^(i,j)) - (1/(m·k)) · Σ_{h=1..k} diff(f_l^(i,j), R, H_h) + (1/(m·k)) · Σ_{Q ≠ class(R)} [p(Q) / (1 - p(class(R)))] · Σ_{h=1..k} diff(f_l^(i,j), R, M_h),  (l = 1, 2, ..., d_ij),   (4)

wherein w_m(f_l^(i,j)) represents the local weight of the l-th feature in the random subspace P^(i,j) after m recursive updates (m being the number of sampled instances);
w_{m-1}(f_l^(i,j)) represents the local weight of the feature f_l^(i,j) in the random subspace P^(i,j) after m-1 recursive updates;
f_l^(i,j) denotes the l-th feature in the random subspace P^(i,j);
diff(f_l^(i,j), R, H_h) represents the difference between samples R and H_h on feature f_l^(i,j) in the random subspace P^(i,j);
diff(f_l^(i,j), R, M_h) represents the difference between samples R and M_h on feature f_l^(i,j) in the random subspace P^(i,j);
H_h denotes the h-th nearest neighbor of sample R among the samples of the same class as sample R;
M_h denotes the h-th nearest neighbor of sample R among the samples of a class Q different from that of sample R;
m represents the number of sampling iterations;
k represents the total number of nearest neighbors;
Q ≠ class(R) indicates that Q ranges over the classes different from the class of sample R;
class(R) represents the class of sample R;
p(·) represents the prior probability of a class among all classes;
a nearest neighbor is a sample closest to sample R.
diff(A, R_1, R_2) = |R_1[A] - R_2[A]| / (max_A - min_A),   (5)

where diff(A, R_1, R_2) represents the difference between samples R_1 and R_2 on feature A, R_1[A] and R_2[A] represent the values of samples R_1 and R_2 on feature A, respectively, and max_A and min_A represent the maximum and minimum values of all samples on feature A.
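A minimal sketch of the ReliefF update of equations (4) and (5) might look as follows; the function relieff and its parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def relieff(X, y, m=10, k=5, rng=None):
    """Minimal ReliefF sketch following equations (4) and (5).

    X: (n, d) numeric matrix, y: (n,) class labels. Returns d weights.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)       # max_A - min_A in equation (5)
    span[span == 0] = 1.0
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))     # p(.) in equation (4)
    W = np.zeros(d)
    for _ in range(m):
        r = rng.integers(n)
        R, cr = X[r], y[r]
        dist = np.linalg.norm(X - R, axis=1)
        dist[r] = np.inf                       # exclude the sample itself
        # k nearest hits H_h (same class as R)
        hits = np.where(y == cr)[0]
        hits = hits[np.argsort(dist[hits])][:k]
        W -= (np.abs(X[hits] - R) / span).sum(axis=0) / (m * k)
        # k nearest misses M_h for every other class Q
        for q in classes:
            if q == cr:
                continue
            miss = np.where(y == q)[0]
            miss = miss[np.argsort(dist[miss])][:k]
            coef = prior[q] / (1.0 - prior[cr])
            W += coef * (np.abs(X[miss] - R) / span).sum(axis=0) / (m * k)
    return W
```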
3. Proposed algorithm
3.1 Symbol description
Given a classification dataset D = (X, C), X ∈ R^{n×d} represents a data matrix of n samples and d features, where x_i ∈ R^d represents the i-th sample (the i-th row of the matrix X), i.e., x_i is d-dimensional (has d features), f_j ∈ R^n represents the j-th feature (the j-th column of the matrix X), and C ∈ R^n represents the class vector. Feature selection refers to selecting some of the most effective subsets of features from the original features by sufficiently estimating the underlying structure of the data, thereby preserving the structural information of the data. We express the matrix of this subset as T' = (f_1', f_2', ..., f_{d'}'), where T' is a submatrix of X^T representing the data matrix formed by the selected features, X^T represents the transpose of X, d' represents the number of selected features, and d represents the total number of features in the dataset.
3.2 Generation of random subspaces
The data structure is often hidden in different subspaces of the original feature space. However, it is difficult to estimate the correct feature subspace and the true structural information. Many existing feature selection methods tend to describe the hidden structure of data by taking into account global and local information or the minimum redundancy of samples. However, in a high-dimensional feature space, the diversity of random subspaces tends to be ignored and a large amount of structural information tends to remain hidden. Although some efforts have been made to learn structural information in subspaces, they implicitly assume that there is a single optimal subspace in the high-dimensional space and cannot go beyond a single subspace to explore multiple, or even a large number of, subspaces. Thus, the diversity of subspaces is ignored. To address this problem, the goal herein is to mine the information between features and class attributes from subspaces. Since it is difficult to find the small portion of appropriate subspaces, we employ a random subspace technique and verify that a large number of subspaces can significantly improve the accuracy of feature selection.
By using a large number of random subspaces, we hope that each feature competes fairly in the random subspaces; for this, each feature should appear the same number of times over all subspaces, and the generated subspaces should be of the same size. Therefore, in this section we propose a scheme to generate random subspaces. The main idea is to generate a plurality of feature partitions from the original feature space. Each feature partition is a completely random division of all the features and consists of a certain number of equally sized random subspaces, so that all features appear in the subspaces the same number of times. Thus, each feature participates fairly in the competition among the generated subspaces.
Let P^(i) represent the i-th feature partition and P^(i,j) represent the j-th random subspace of P^(i). The feature partition can then be expressed as

P^(i) = {P^(i,1), P^(i,2), ..., P^(i,s)},

wherein s represents the number of random subspaces in P^(i). Because the random subspaces in P^(i) should all be of the same size, we let the size of each random subspace be d/s herein. However, in most real cases the value of d/s is not an integer, so we truncate d/s to an integer. Thus, when each feature partition is generated, we randomly sample [d/s] features each time without repetition until all features have been sampled, and when fewer than [d/s] features remain, the remaining features constitute a random subspace, where [d/s] represents the value of d/s truncated to an integer. It should be noted that when d is not divisible by s, we actually generate s+1 random subspaces; for simplicity, the number of subspaces in each feature partition is still denoted s herein. Let f_k^(i,j) represent the k-th feature in the random subspace P^(i,j), where d_ij represents the number of features in P^(i,j). Thus, the j-th random subspace of P^(i) may be written as

P^(i,j) = {f_1^(i,j), f_2^(i,j), ..., f_{d_ij}^(i,j)}.
Thus, we can clearly conclude that, within one feature partition, each feature appears in exactly one of the random subspaces; that is, the different random subspaces of a feature partition do not intersect each other. In other words, for any two subspaces P^(i,j) and P^(i,j') with j ≠ j', we have P^(i,j) ∩ P^(i,j') = ∅.
We repeatedly and randomly generate feature partitions, so that the resulting set of feature partitions can be represented as

{P^(1), P^(2), ..., P^(M)},

wherein P^(i) represents the i-th generated feature partition and M represents the number of feature partitions. For one dataset, each feature partition contains s random subspaces, and each random subspace has the same number of features. Thus, there are M·s random subspaces of the same size in total. According to the generation mechanism of the random subspaces, each feature of the original dataset appears M times across the M·s random subspaces.
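The partition-generation scheme described above could be implemented, for example, as follows; the function name generate_feature_partitions and its interface are hypothetical.

```python
import numpy as np

def generate_feature_partitions(d, s, M, rng=None):
    """Generate M random feature partitions of d features.

    Each partition splits the feature indices 0..d-1 into s disjoint
    random subspaces of size [d/s]; if d is not divisible by s, the
    leftover features form one extra (smaller) subspace.
    """
    rng = np.random.default_rng(rng)
    size = d // s                              # [d/s], truncated rounding
    partitions = []
    for _ in range(M):
        order = rng.permutation(d)             # random sampling without repetition
        subspaces = [order[i * size:(i + 1) * size] for i in range(s)]
        if d % s:                              # remaining features form a subspace
            subspaces.append(order[s * size:])
        partitions.append(subspaces)
    return partitions

# e.g. d=34 features and s=6 give six subspaces of 5 features plus one of 4
parts = generate_feature_partitions(d=34, s=6, M=5, rng=0)
```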
3.3 Weight learning and feature selection for each feature
In this section we use the random subspaces to describe the underlying information structure of the data and choose the informative features that preserve the spatial structure well. This process is feature selection.
For each of the M·s random subspaces in total, the local weight of each feature is obtained according to the ReliefF algorithm (introduced in Section 2.3). Because the features contained in each random subspace differ, the k nearest neighbors of a randomly selected sample are not exactly the same across subspaces. In the sample space, a sample is randomly drawn under a given random subspace, and its k nearest neighbors are found to update the weights. In this way, the underlying structure of the data can be described as fully as possible by a large number of random subspaces.
Given a random subspace P^(i,j), for each feature f_l^(i,j) (l = 1, 2, ..., d_ij) in P^(i,j), its local weight can be calculated by equation (4) introduced in Section 2.3; that is, equation (4) gives the local weight of the l-th feature of the j-th random subspace of the i-th feature partition. The local weight vector of all d_ij features in the random subspace P^(i,j) can then be expressed as

w^(i,j) = (w(f_1^(i,j)), w(f_2^(i,j)), ..., w(f_{d_ij}^(i,j)))^T,

where w(f_l^(i,j)) represents the local weight of the l-th feature in P^(i,j). Each feature of the original feature space appears only once in the s random subspaces of a feature partition P^(i), so the full weight vector of P^(i) (d being the number of all features) can be obtained by concatenating the local weight vectors of its s subspaces and expressed as

w^(i) = (w(f_1^(i)), w(f_2^(i)), ..., w(f_d^(i)))^T,

where f_j^(i) (j = 1, 2, ..., d) is a feature of the original feature space.
The final weight vector is obtained by averaging the full weight vectors of the M feature partitions and can be expressed as

w = (1/M) · Σ_{i=1..M} w^(i).

Thus, the final weight vector of all features can be obtained simply by adjusting the number of feature partitions and random subspaces. The average of the final weight vector is then calculated and denoted w_avg. For each feature weight in the final weight vector, if w(f_i) is greater than w_avg, the feature is selected into the selected feature subset.
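A sketch of the aggregation and thresholding described in this section, assuming the hypothetical relieff and generate_feature_partitions helpers from the earlier sketches:

```python
import numpy as np

def rbeff_weights(X, y, partitions, m=10, k=5):
    """Full weight vector per partition, then the final (averaged) weights."""
    n, d = X.shape
    full_vectors = []
    for subspaces in partitions:
        w_full = np.zeros(d)
        for idx in subspaces:                  # idx: feature indices of one subspace
            # local weights of the subspace, placed back at the original positions
            w_full[idx] = relieff(X[:, idx], y, m=m, k=k)
        full_vectors.append(w_full)
    w_final = np.mean(full_vectors, axis=0)    # average over the M partitions
    selected = np.where(w_final > w_final.mean())[0]   # threshold w_avg
    return w_final, selected
```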
3.4 Overall algorithm
In this section we will summarize the whole procedure of the proposed algorithm, the detailed steps of which are shown in algorithm 3.
The main idea of the algorithm is to generate a large number of random subspaces and to obtain the weight of each feature in every subspace; the information of the multiple subspaces is then combined to obtain the weight scores of all features. To ensure that every feature of the original dataset competes and cooperates fairly in the generated subspaces while its weight is being learned, a number of equally sized, mutually disjoint random subspaces are generated in each feature partition. When generating the random subspaces, two parameters need to be set, namely the number of feature partitions M and the number of random subspaces s contained in each feature partition; these two parameters control how many feature partitions and random subspaces are generated. We calculate the weights of the features and select features as follows. After a large number of random subspaces have been generated, the class attribute is added to each random subspace. Then, in each random subspace, the ReliefF algorithm is used to obtain the local weight of each feature of the current subspace, yielding the local weight vector of the subspace, and the local weight vectors of the different random subspaces in a feature partition are concatenated to obtain the full weight vector of all features (i.e., the vector obtained from each feature partition). Finally, the final weight vector of all features is obtained by averaging the weight scores over the feature partitions (i.e., each feature is given a different weight in the different feature partitions, and the average is taken as its final weight). Here we use the average of the final weight vector for feature selection, i.e., if the final score of a feature is greater than the average, the feature is selected. The selected feature subset is ultimately returned (as shown in Algorithm 3). A sketch of this overall procedure is given below.
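Putting the previous sketches together, an end-to-end run of the procedure summarized in Algorithm 3 might look roughly like this; the toy data and all helper names are our own illustrative assumptions.

```python
import numpy as np

# generate_feature_partitions and rbeff_weights are the hypothetical
# sketches from Sections 3.2 and 3.3 above.

# toy data: 200 samples, 20 features, 3 classes (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.integers(0, 3, size=200)

d = X.shape[1]
s = max(1, d // 6)                 # subspaces of roughly 6 features (Table 3 setting)
M = 5                              # number of feature partitions

partitions = generate_feature_partitions(d, s, M, rng=0)
w_final, selected = rbeff_weights(X, y, partitions, m=10, k=5)
print("selected features:", selected)
X_reduced = X[:, selected]         # data matrix restricted to the selected subset
```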
4. Experiment
We have performed experiments on 26 standard datasets with different feature numbers to demonstrate that the proposed method is indeed superior to several existing feature selection methods.
4.1 Data sets
The 26 datasets were obtained from the UCI machine learning repository to verify the validity of the proposed method. It should be noted that the "Ecoli" and "processed.cleveland" datasets are applied to the algorithm after the small-class samples are deleted. Table 1 gives a brief description of the datasets used, including the number of features, the number of samples, and the number of classes.
Table 1 dataset
4.2 Contrast Algorithm
We compare our approach with feature selection methods that use all the features, or that use different metrics over the whole feature space or over subspaces. Details of the comparison algorithms are given below.
Baseline: classification using all features.
Laplacian score based feature selection (LS): feature selection using the Laplacian score over the whole feature space.
Feature selection based on multi-subspace randomization and collaboration (SRCFS): selects features by computing Laplacian scores over multiple subspaces.
Mutual information based feature selection (MI): feature selection using mutual information over the whole feature space.
ReliefF based feature selection (ReliefF): feature selection using the ReliefF algorithm over the whole feature space.
Fuzzy joint mutual information based feature selection (FJMI): selects features by computing fuzzy joint mutual information.
4.3 Evaluation index
The main purpose of the feature selection method is to select fewer features to achieve higher or similar accuracy. To verify the superiority of the algorithm, two well-known classifiers, K-nearest neighbor (KNN) and Decision Tree (DT), were used in the experiment. Three criteria of KNN and DT were used to measure the quality of the algorithm, namely accuracy, precision and recall.
4.3.1 KNN classifier
The KNN algorithm was originally proposed by Cover and Hart in 1968 and is one of the simplest and most commonly used classification algorithms; it is therefore widely used in machine learning. For a test sample, KNN finds its k nearest neighbors among the training samples and then assigns the test sample to the majority class among these k neighbors. In the KNN algorithm, the number k of neighbors needs to be specified, and the value of the parameter k has a great influence on the classification accuracy of the KNN classifier: depending on its value, the accuracy may vary considerably. The Euclidean distance is typically used in the KNN algorithm and is defined in equation (12).
ED(X_1, X_2) = sqrt( Σ_{i=1..d} (X_{1,i} - X_{2,i})^2 ),   (12)

wherein ED represents the Euclidean distance, X_1 and X_2 represent two d-dimensional samples (each sample having d features), and X_{1,i} and X_{2,i} represent the i-th dimension of samples X_1 and X_2, respectively.
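For illustration, equation (12) in code together with a KNN classifier configured as in the experiments (k = 5, Euclidean metric); the scikit-learn calls are standard, but the snippet is a sketch rather than the authors' setup.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def euclidean(x1, x2):
    """Equation (12): Euclidean distance between two d-dimensional samples."""
    return np.sqrt(np.sum((np.asarray(x1) - np.asarray(x2)) ** 2))

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
```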
4.3.2 DT classifier
A Decision Tree (DT) is a simple and common classifier in data mining. The purpose of a decision tree is to learn a model from the given training data and to predict the class of test samples. A decision tree is a tree structure in which each internal node represents a test on a feature, each branch represents a test outcome, and each leaf node represents a class. In the DT algorithm, the key to building the tree is which feature to select as the basis for splitting in the current state. According to the objective function used, there are mainly three tree-building algorithms, ID3, C4.5 and CART; their main difference is the objective function: ID3 uses the information gain, C4.5 uses the information gain ratio, and CART uses the Gini index. The choice of objective function has a great impact on the classification accuracy. Here we use the ID3 algorithm.
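scikit-learn does not provide a literal ID3 implementation; a common stand-in, assumed here for illustration, is a decision tree with the entropy (information-gain) splitting criterion.

```python
from sklearn.tree import DecisionTreeClassifier

# the entropy criterion approximates the information-gain splitting rule of ID3
dt = DecisionTreeClassifier(criterion="entropy", random_state=0)
```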
4.3.3 Classification performance
(1) Classification accuracy
The classification accuracy of the KNN and DT classifiers is expressed as:

E(f; D) = (1/n) · Σ_{i=1..n} I(f(x_i) = c_i),   (13)

wherein n represents the number of samples, I(·) is the indicator function, and f(x_i) and c_i (i = 1, 2, ..., n) represent the prediction result of the classifier and the true class label of the i-th sample, respectively. E(f; D) represents the classification accuracy of the KNN or DT classifier, D in E(f; D) represents the training dataset, and f represents the KNN classifier or the DT classifier.
(2) Precision and recall
For a classification problem, the samples can be divided, according to the combination of their true class and the class predicted by the learner, into four cases: true positive (TP), false positive (FP), true negative (TN) and false negative (FN). The binary case can be generalized to the multi-class problem: one of the classes is regarded as the positive class and the other classes as negative, and TP, FP, TN and FN denote the corresponding sample counts, so that obviously TP + FP + TN + FN = total number of samples. The "confusion matrix" of the classification results is shown in Table 2, and the precision and recall are calculated from the confusion matrix. Precision and recall are defined in formulas (14) and (15):

P = TP / (TP + FP),   (14)

R = TP / (TP + FN),   (15)

The precision of a multi-class problem can be divided into macro precision and micro precision, and the recall of a multi-class problem into macro recall and micro recall. In this paper we use the macro precision and macro recall of the multi-class datasets.
Table 2 Confusion matrix of the classification results

|  | Predicted positive | Predicted negative |
| True positive class | TP (true positive) | FN (false negative) |
| True negative class | FP (false positive) | TN (true negative) |
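As an illustration, equation (13) and the macro-averaged forms of equations (14) and (15) can be computed with scikit-learn as sketched below; this is not the authors' evaluation script.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate(y_true, y_pred):
    """Accuracy (eq. 13) plus macro precision (eq. 14) and macro recall (eq. 15)."""
    acc = accuracy_score(y_true, y_pred)   # mean of the indicator I(f(x_i) = c_i)
    prec = precision_score(y_true, y_pred, average="macro", zero_division=0)
    rec = recall_score(y_true, y_pred, average="macro", zero_division=0)
    return acc, prec, rec
```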
4.3.4 Parameter settings
Here we use 10-fold cross-validation for our proposed method and the other comparison methods; the distance measure of KNN is the Euclidean distance, and the KNN classifier uses 5 neighbors. Each algorithm is independently repeated 10 times. The number of random subspaces s is set so that each random subspace has size 6; note that if the total number of features is not divisible by 6, the remainder of the division by 6 is the size of the last subspace. For a fair comparison, the parameters of the compared algorithms are kept consistent with those of the proposed algorithm. The specific parameter values are shown in Table 3.
Table 3 parameter settings for experiments
| Parameter | Value |
| r, the number of runs | 10 |
| K for cross-validation | 10 |
| k for KNN | 5 |
| the distance metric | Euclidean |
| k for RBEFF | 5 |
| m, the sample size for RBEFF | 10 |
| M, the number of feature partitions | 5 |
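Under the settings of Table 3 (10 runs of 10-fold cross-validation, KNN with k = 5), the evaluation protocol could be reproduced roughly as follows; the helpers generate_feature_partitions and rbeff_weights are the hypothetical sketches given earlier, not the authors' code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

def evaluate_rbeff(X, y, runs=10, folds=10, s=None, M=5):
    """Average 10-fold CV accuracy of KNN on the RBEFF-selected features."""
    d = X.shape[1]
    s = s or max(1, d // 6)                    # subspaces of roughly 6 features
    scores = []
    for run in range(runs):
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=run)
        for train, test in cv.split(X, y):
            partitions = generate_feature_partitions(d, s, M, rng=run)
            _, selected = rbeff_weights(X[train], y[train], partitions)
            clf = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
            clf.fit(X[train][:, selected], y[train])
            scores.append(clf.score(X[test][:, selected], y[test]))
    return float(np.mean(scores))
```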
5. Experimental results
Experiments were performed here on 26 standard datasets with different feature numbers to compare our proposed method with several feature selection methods.
5.1 Classification precision
Tables 4 and 5 show the average classification accuracy of each feature selection method using the KNN and DT classifiers. As can be seen from Table 4, compared with the other methods, the proposed method RBEFF achieves the highest classification accuracy on 15 datasets under the KNN classifier, accounting for more than half of all datasets. As can be seen from Table 5, the RBEFF method achieves the highest classification accuracy on 12 datasets under the DT classifier. The average classification accuracy over the 26 datasets using the KNN and DT classifiers for the different algorithms is shown in the last row of Tables 4 and 5. It can be seen that our algorithm achieves the highest average accuracy whether the KNN or the DT classifier is used. It can be concluded that our algorithm RBEFF achieves good results in terms of classification accuracy.
Table 4 Average classification accuracy of the datasets under the KNN classifier (values in bold indicate the best result among the methods)
Table 5 Average classification accuracy of the datasets under the DT classifier (values in bold indicate the best result among the methods)
| Dataset | Baseline | ReliefF | MI | LS | SRCFS | FJMI | RBEFF |
| Data.user | 0.9016 | 0.8401 | 0.8381 | 0.9282 | 0.9256 | 0.9152 | 0.8435 |
| Cryotherapy | 0.8867 | 0.8256 | 0.8733 | 0.8667 | 0.8744 | 0.8567 | 0.8389 |
| Ecoli | 0.9786 | 0.9705 | 0.9745 | 0.9395 | 0.9745 | 0.9723 | 0.9866 |
| Seed1 | 0.9129 | 0.8971 | 0.8595 | 0.9219 | 0.9195 | 0.8314 | 0.9445 |
| Pima | 0.7238 | 0.5921 | 0.6780 | 0.6134 | 0.6916 | 0.6864 | 0.6674 |
| Biopsy | 0.9476 | 0.9558 | 0.9447 | 0.9376 | 0.9407 | 0.9307 | 0.9505 |
| Glass | 0.7830 | 0.7000 | 0.8381 | 0.8074 | 0.7888 | 0.8352 | 0.7961 |
| Breast-cancer | 0.9486 | 0.9490 | 0.9453 | 0.9404 | 0.9474 | 0.8486 | 0.9500 |
| Heart-failure | 0.7727 | 0.7708 | 0.7602 | 0.7707 | 0.7677 | 0.7910 | 0.7760 |
| Accent-mfcc | 0.7622 | 0.6972 | 0.759 | 0.732 | 0.7413 | 0.6222 | 0.7235 |
| Wine | 0.9323 | 0.926 | 0.9365 | 0.9381 | 0.9006 | 0.7124 | 0.9300 |
| Processed.cleveland | 0.5152 | 0.5664 | 0.4445 | 0.5003 | 0.4700 | 0.4248 | 0.5983 |
| Diabetes-data-upload | 0.9442 | 0.9115 | 0.9375 | 0.9537 | 0.9440 | 0.9494 | 0.9446 |
| Trial | 1 | 1 | 1 | 1 | 1 | 0.8258 | 1 |
| Hepatitis | 0.5225 | 0.6515 | 0.5425 | 0.5475 | 0.5977 | 0.4586 | 0.7075 |
| Brands | 0.632 | 0.6911 | 0.6433 | 0.6331 | 0.6315 | 0.6117 | 0.6873 |
| ForestTypes | 0.848 | 0.7463 | 0.8396 | 0.8384 | 0.8451 | 0.6111 | 0.7542 |
| Wdbc | 0.7075 | 0.7292 | 0.7317 | 0.7608 | 0.7492 | 0.7633 | 0.7633 |
| Lung-cancer | 0.9143 | 0.9086 | 0.9218 | 0.5763 | 0.9264 | 0.8868 | 0.9118 |
| Wpbc | 0.8761 | 0.882 | 0.8526 | 0.8484 | 0.8763 | 0.8654 | 0.8912 |
| Ionosphere | 0.6984 | 0.6909 | 0.6455 | 0.6761 | 0.6584 | 0.5941 | 0.7471 |
| Qsak | 0.8161 | 0.7767 | 0.7968 | 0.8181 | 0.8049 | 0.744 | 0.7679 |
| Sonar | 0.7515 | 0.7471 | 0.7722 | 0.781 | 0.7322 | 0.71 | 0.731 |
| Urban land cover | 0.7645 | 0.7584 | 0.7461 | 0.742 | 0.7175 | 0.5481 | 0.786 |
| Musk1 | 0.823 | 0.7784 | 0.7979 | 0.814 | 0.7707 | 0.81 | 0.7655 |
| Clean 1 | 0.7922 | 0.7807 | 0.8059 | 0.8191 | 0.7056 | 0.7788 | 0.7731 |
| Average | 0.73678 | 0.7897 | 0.7547 | 0.7022 | 0.7588 | 0.7171 | 0.8059 |
5.2 Precision
Tables 6 and 7 show the average precision of each feature selection method under the KNN and DT classifiers. As can be seen from Table 6, the proposed method RBEFF achieves the highest precision on 11 datasets under the KNN classifier compared with the other algorithms. It can also be seen from Table 7 that RBEFF achieves the highest precision on 10 datasets under the DT classifier. Thus, it can be concluded that the proposed method performs better on about half of the datasets. The last row of Tables 6 and 7 gives the overall average precision over the 26 datasets using the KNN and DT classifiers for the different algorithms. It can be seen that our algorithm RBEFF achieves the highest average precision under both the KNN and DT classifiers, indicating that the RBEFF method performs well in terms of precision.
Table 6 Average precision of the datasets under the KNN classifier (values in bold indicate the best result among the methods)
Table 7 Average precision of the datasets under the DT classifier (values in bold indicate the best result among the methods)
| Dataset | Baseline | ReliefF | MI | LS | SRCFS | FJMI | RBEFF |
| Data.user | 0.9154 | 0.8411 | 0.8412 | 0.9233 | 0.9232 | 0.9162 | 0.9387 |
| Cryotherapy | 0.8949 | 0.8515 | 0.8870 | 0.8664 | 0.8725 | 0.8523 | 0.8612 |
| Ecoli | 0.9606 | 0.9948 | 0.9672 | 0.9662 | 0.9889 | 0.9785 | 0.9989 |
| Seed1 | 0.9145 | 0.8979 | 0.8503 | 0.9213 | 0.9147 | 0.8343 | 0.9150 |
| Pima | 0.7943 | 0.6997 | 0.7613 | 0.7112 | 0.6538 | 0.6484 | 0.7273 |
| Biopsy | 0.9363 | 0.9452 | 0.9305 | 0.9149 | 0.9374 | 0.9293 | 0.9275 |
| Glass | 0.7718 | 0.6875 | 0.8310 | 0.8017 | 0.7903 | 0.8384 | 0.7801 |
| Breast-cancer | 0.9382 | 0.9470 | 0.9401 | 0.9373 | 0.9448 | 0.8347 | 0.9474 |
| Heart-failure | 0.6636 | 0.6579 | 0.6309 | 0.6582 | 0.7307 | 0.7625 | 0.6297 |
| Accent-mfcc | 0.7763 | 0.6975 | 0.7567 | 0.7283 | 0.7434 | 0.6264 | 0.7221 |
| Wine | 0.9400 | 0.9262 | 0.9438 | 0.9362 | 0.9015 | 0.7060 | 0.8600 |
| Processed.cleveland | 0.3635 | 0.3467 | 0.3168 | 0.3373 | 0.3191 | 0.2390 | 0.4421 |
| Diabetes-data-upload | 0.9771 | 0.9540 | 0.9674 | 0.9643 | 0.9469 | 0.9452 | 0.9774 |
| Trial | 1 | 1 | 1 | 1 | 1 | 0.8118 | 1 |
| Hepatitis | 0.5930 | 0.6588 | 0.6238 | 0.6158 | 0.5417 | 0.4433 | 0.7006 |
| Brands | 0.7099 | 0.7514 | 0.7144 | 0.7111 | 0.6026 | 0.5715 | 0.7400 |
| ForestTypes | 0.8426 | 0.7483 | 0.8333 | 0.8331 | 0.8371 | 0.5913 | 0.7500 |
| Wdbc | 0.6387 | 0.6704 | 0.6525 | 0.7192 | 0.6871 | 0.7033 | 0.7072 |
| Lung-cancer | 0.9351 | 0.9264 | 0.9371 | 0.6458 | 0.9215 | 0.8801 | 0.9375 |
| Wpbc | 0.8679 | 0.8762 | 0.8484 | 0.8417 | 0.8679 | 0.8525 | 0.8853 |
| Ionosphere | 0.3738 | 0.3733 | 0.2617 | 0.3375 | 0.5428 | 0.4534 | 0.4738 |
| Qsak | 0.8627 | 0.8366 | 0.8486 | 0.8640 | 0.7822 | 0.7173 | 0.8202 |
| Sonar | 0.7160 | 0.7332 | 0.7714 | 0.7653 | 0.7337 | 0.7104 | 0.7370 |
| Urban land cover | 0.7523 | 0.7618 | 0.7219 | 0.7238 | 0.688 | 0.5086 | 0.7860 |
| Musk1 | 0.803 | 0.7401 | 0.7666 | 0.7864 | 0.7691 | 0.8076 | 0.7319 |
| Clean 1 | 0.7867 | 0.7383 | 0.7782 | 0.7959 | 0.7921 | 0.7746 | 0.7352 |
| Average | 0.7972 | 0.7793 | 0.7839 | 0.7810 | 0.7859 | 0.7283 | 0.7974 |
5.3 Recall
Tables 8 and 9 show the average recall of each feature selection method under the KNN and DT classifiers. As can be seen from Table 8, the proposed method RBEFF achieves the highest recall on 10 datasets under the KNN classifier compared with the other algorithms. It can also be seen from Table 9 that RBEFF achieves the highest recall on 9 datasets under the DT classifier. Thus, it can be concluded that the proposed method performs better on about half of the datasets. The last row of Tables 8 and 9 gives the overall average recall over the 26 datasets using the KNN and DT classifiers for the different algorithms. It can be seen that our algorithm reaches the highest average recall whether the KNN or the DT classifier is used. Thus, it can be concluded that our algorithm RBEFF also achieves excellent performance in terms of recall.
Table 8 Average recall of the datasets under the KNN classifier (values in bold indicate the best result among the methods)
| Dataset | Baseline | ReliefF | MI | LS | SRCFS | FJMI | RBEFF |
| Data.user | 0.8480 | 0.8405 | 0.8426 | 0.9238 | 0.9200 | 0.8579 | 0.8448 |
| Cryotherapy | 0.7679 | 0.8118 | 0.7475 | 0.7593 | 0.8244 | 0.7281 | 0.8116 |
| Ecoli | 0.9703 | 0.9563 | 0.9490 | 0.8734 | 0.9773 | 0.9729 | 0.9726 |
| Seed1 | 0.8817 | 0.8958 | 0.8849 | 0.878 | 0.8806 | 0.8251 | 0.8958 |
| Pima | 0.8344 | 0.8357 | 0.8435 | 0.7743 | 0.6756 | 0.6783 | 0.8238 |
| Biopsy | 0.9650 | 0.9586 | 0.9595 | 0.9223 | 0.9400 | 0.9252 | 0.9494 |
| Glass | 0.8298 | 0.8277 | 0.8609 | 0.8254 | 0.7831 | 0.7682 | 0.8328 |
| Breast-cancer | 0.5474 | 0.9616 | 0.9628 | 0.5478 | 0.5386 | 0.5437 | 0.9775 |
| Heart-failure | 0.2060 | 0.6631 | 0.2000 | 0.2018 | 0.7033 | 0.7962 | 0.6567 |
| Accent-mfcc | 0.8657 | 0.642 | 0.8277 | 0.7489 | 0.5386 | 0.6263 | 0.7541 |
| Wine | 0.6762 | 0.907 | 0.6726 | 0.6784 | 0.686 | 0.6371 | 0.8700 |
| Processed.cleveland | 0.2600 | 0.3551 | 0.2524 | 0.2508 | 0.2483 | 0.2879 | 0.4041 |
| Diabetes-data-upload | 0.8472 | 0.8924 | 0.8624 | 0.8497 | 0.8851 | 0.9371 | 0.9297 |
| Trial | 0.8900 | 1 | 0.9556 | 0.8905 | 0.9012 | 0.8366 | 1 |
| Hepatitis | 0.7337 | 0.9167 | 0.7382 | 0.7504 | 0.5915 | 0.5430 | 0.9416 |
| Brands | 0.7985 | 0.8186 | 0.7961 | 0.7977 | 0.5732 | 0.5820 | 0.7612 |
| ForestTypes | 0.8753 | 0.8227 | 0.8748 | 0.8759 | 0.8713 | 0.6502 | 0.7727 |
| Wdbc | 0.6833 | 0.8021 | 0.7083 | 0.7683 | 0.7071 | 0.7022 | 0.8124 |
| Lung-cancer | 0.9625 | 0.8856 | 0.9563 | 0.7339 | 0.9228 | 0.8945 | 0.9355 |
| Wpbc | 0.7887 | 0.8028 | 0.7649 | 0.798 | 0.7905 | 0.8281 | 0.8312 |
| Ionosphere | 0.8424 | 0.8764 | 0.8584 | 0.8433 | 0.8015 | 0.6742 | 0.8315 |
| Sonar | 0.7359 | 0.7941 | 0.7445 | 0.683 | 0.7619 | 0.7497 | 0.7000 |
| Urban land cover | 0.3481 | 0.7569 | 0.3419 | 0.3484 | 0.4465 | 0.3089 | 0.7751 |
| Musk1 | 0.9326 | 0.8754 | 0.902 | 0.8567 | 0.8481 | 0.8167 | 0.9222 |
| Clean 1 | 0.9259 | 0.8738 | 0.8801 | 0.853 | 0.8477 | 0.8228 | 0.9115 |
| Average | 0.76066 | 0.8309 | 0.7755 | 0.7373 | 0.7466 | 0.7197 | 0.8367 |
Table 9 Average recall of the datasets under the DT classifier (values in bold indicate the best result among the methods)
From the above results we drew a radar chart of the average accuracy, recall and precision of the 26 datasets under the DT and KNN classifiers. As can be seen from Fig. 1, RBEFF performs better than the comparison methods on the three indices of the different classifiers. Thus, we conclude that our algorithm is competitive and advantageous in most cases across the different indices of the different classifiers.
Here we evaluate the stability of the proposed algorithm using box plots of the average accuracy on the datasets on which the RBEFF method performs well. The classifier uses 10-fold cross-validation and is run 10 times for the proposed and comparison algorithms to draw each box plot, from which the stability of the algorithms can be seen. The minimum, maximum, median, first quartile and third quartile of the 10 classification accuracies are shown in each box plot, and the line inside a box represents the median. Figs. 2 and 3 show the box plots of the average accuracy of the KNN and DT classifiers on different datasets, from which the average performance of the different methods can be compared intuitively. As shown in Figs. 2 and 3, the box plots of the proposed RBEFF method are higher than those of the other comparison methods on most datasets, and the proposed method is also more accurate than the other methods. It can also be seen that the boxes of the proposed algorithm are short on most datasets. Thus, we can conclude that the proposed algorithm achieves better stability.
5.5 Selection of the number of features
Tables 10 and 11 show the total average number of features selected on the 26 datasets by the different methods and the reduction ratio, which refers to the ratio of the number of removed features to the total number of features after the feature selection method is applied to a dataset. It can be seen that RBEFF selects the smallest number of features on 14 datasets compared with the other algorithms, and also obtains good results on the other datasets (ranking second or third). It can also be seen from Table 11 that our algorithm removes half or more of the features on 24 datasets. It can be concluded that RBEFF is an effective way to solve the feature selection problem.
We drew bar and line charts of the total average number of selected features and the average reduction ratio of the different methods on the 26 datasets, as shown in Figs. 4 and 5. As can be seen from Fig. 4, RBEFF achieves a total average of 13.0769 selected features, slightly above FJMI; however, our method has a higher average classification accuracy than FJMI. As is clear from Fig. 5, RBEFF achieves the highest average reduction rate in the feature selection process compared with the other methods.
Table 10 number of features selected by different methods on different data sets
Table 11 reduction rates for different method features on different data sets
5.6 Wilcoxon signed rank test
Finally, the significance of the difference between the algorithm RBEFF presented herein and the comparison algorithms was verified by the Wilcoxon signed rank test. Table 12 shows the results of the significance test between the proposed algorithm and the other comparison algorithms on the KNN classifier with the significance level α set to 0.05. From the statistics (Table 12) it can be seen that all p-values are less than 0.05, which indicates that there is a significant difference between the proposed algorithm and each comparison algorithm, i.e., the proposed algorithm RBEFF is significantly better than the other comparison algorithms. The Wilcoxon signed rank test results demonstrate the effectiveness and superiority of RBEFF.
Table 12 Wilcoxon signed rank test of the RBEFF method and the comparison methods on the KNN classifier over the 26 datasets ("yes" means that there is a significant difference between the performance of the two methods)
| Pairwise comparison | p-value | Significant difference |
| RBEFF vs. Baseline | 0.0092333 | yes |
| RBEFF vs. FJMI | 0.0000457 | yes |
| RBEFF vs. SRCFS | 0.0048130 | yes |
| RBEFF vs. LS | 0.0007648 | yes |
| RBEFF vs. MI | 0.0115009 | yes |
| RBEFF vs. ReliefF | 0.0004456 | yes |
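The paired test reported in Table 12 corresponds to SciPy's wilcoxon function; the per-dataset accuracy arrays below are placeholders, not the paper's numbers.

```python
from scipy.stats import wilcoxon

# per-dataset average accuracies of RBEFF and one comparison method (placeholder values)
rbeff_acc = [0.94, 0.83, 0.98, 0.91]
relieff_acc = [0.90, 0.80, 0.97, 0.88]

stat, p_value = wilcoxon(rbeff_acc, relieff_acc)
significant = p_value < 0.05          # alpha = 0.05 as in the paper
```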
6. Conclusion and future work
The proposed method mines the underlying information of a dataset by randomly sampling instances and searching for the k nearest neighbors of a sample in different subspaces, and obtains the feature weights from the sampled instances, so that the information contained in the samples is fully utilized. We use the K-nearest-neighbor and decision tree classifiers to verify the performance of our algorithm through 10-fold cross-validation. To investigate the reliability and quality of the results, we compared our algorithm with commonly used methods, namely the Baseline, ReliefF, LS, MI, FJMI and SRCFS algorithms, on 26 real datasets. The results show that, compared with the other methods, our algorithm achieves the highest classification accuracy with the smallest or an equal number of selected features. The stability analysis shows that our algorithm RBEFF also has an advantage in terms of stability. Finally, the Wilcoxon signed rank test results demonstrate the effectiveness and superiority of RBEFF.
We can apply our algorithm to ultra-high-dimensional or noisy datasets and analyze the impact of different parameters (such as the number of feature partitions M and the number of random subspaces s) on the experimental results, which can be a further extension of this work.
From this, it follows that the main contributions herein are summarized as follows:
(1) A RBEFF method is presented herein that fully considers the diversity of subspaces and the contribution information of samples to features during feature selection.
(2) The feature selection framework of the RBEFF algorithm is able to adaptively select valid features.
(3) A number of comparisons and statistical analyses were performed, including classification accuracy, recall, precision, bar charts, box plots, and the Wilcoxon signed rank test.
(4) In the feature selection process, RBEFF selects the fewest features on most of the 26 datasets, and its average reduction rate is higher than that of the other comparison methods.
(5) The RBEFF algorithm is compared with other algorithms on 26 datasets of different scales, and the experimental results show that RBEFF simultaneously achieves the highest average accuracy, recall and precision on both the KNN and DT classifiers. It can be concluded that RBEFF has advantages and effectiveness in solving the feature selection problem.
The specific embodiment is as follows:
In the first embodiment, for signal recognition, some features of the signal are severely polluted by noise. The ReliefF algorithm is adopted to evaluate the classification capability of the features and extract the features of the signal with stronger classification capability, and then a feature similarity algorithm is adopted to remove the redundant features with similar classification capability. The remaining features with strong classification capability form the feature vector used for classification. Simulation results show that this method can obtain a higher correct recognition rate using fewer features. In transmission line fault location, a set of candidate features is extracted from single voltage measurement data using the RReliefF algorithm, and a regression elimination algorithm is used to select useful and highly relevant candidate features from them in order to accurately obtain an estimate of the fault location; this approach is not affected by current signal measurement errors during fault location and does not face the problems and costs related to transmitting and synchronizing data from two line ends. As one of the most successful feature filtering algorithms in the field of machine learning, the ReliefF algorithm is widely used for feature selection of microarray gene expression data and for processing image data (image classification and management, etc.). The signals may be sound, communication signals, radar signals, etc.
In the second embodiment, for image processing and classification, image data are characterized by few samples and high dimensionality, and feature extraction and selection are the most effective ways to obtain a feature subspace that better reflects the essential structure of the data and yields a higher recognition rate. The RBEFF algorithm mines the underlying structural information of a large number of different random subspaces of the high-dimensional image data and extracts the most effective informative features describing the image, so that the image can be classified accurately. The RBEFF algorithm can thus effectively extract the useful features of images and improve the classification accuracy. Similarly, in gene classification, when faced with high-dimensional gene expression data with small sample sizes, RBEFF can automatically select a small number of disease-related features (genes) and make effective decisions (classification) on the gene data.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.