Disclosure of Invention
The invention aims to provide a non-invasive load cluster classification method and a non-invasive load cluster classification system, so as to classify load clusters with the same operation characteristics and unified electricity utilization rule.
To achieve the above object, the present invention provides a non-invasive load cluster classification method, the method comprising:
constructing a test set T and a training set S based on an electricity utilization data set D corresponding to a load cluster, wherein the training set S comprises M training subsets, and the test set T comprises M test subsets, wherein M is a positive integer greater than 1;
Determining a double-layer Gaussian process mixed model corresponding to each training subset, wherein parameters of the double-layer Gaussian process mixed model are to be solved;
Solving parameters of a double-layer Gaussian process mixed model corresponding to each training subset;
computing the mean of the M groups of solved parameters, and substituting the mean into the double-layer Gaussian process mixed model to obtain a double-layer Gaussian process mixed average model;
substituting each cluster in each test subset into the double-layer Gaussian process mixed average model for calculation, and outputting the probability values of each cluster in each test subset;
and taking the cluster category corresponding to the maximum output probability of each cluster in each test subset as the classification result corresponding to that test subset.
Optionally, the determining the two-layer gaussian process mixing model corresponding to each training subset specifically includes:
Determining the total class number K of clusters contained in each training subset;
and determining a double-layer Gaussian process mixing model corresponding to each training subset based on the input feature sets of all clusters in each training subset.
Optionally, the determining the total class number K of the clusters included in each training subset specifically includes:
Step S211, determining load cluster characteristics based on characteristics corresponding to various clusters;
Step S212, judging whether a cluster meeting the load cluster characteristics exists in the training subset S_l; if so, executing step S214; if not, rejecting the training subset S_l, letting l = l + 1, and executing step S213;
Step S213, judging whether l is larger than M; if l is larger than M, ending; if l is smaller than or equal to M, returning to step S212;
Step S214, determining the total class number K of the clusters and the input feature set of each cluster in each training subset according to the load cluster features.
Optionally, the determining the two-layer gaussian process mixing model corresponding to each training subset based on the input feature set of each cluster in each training subset specifically includes:
Step S221, constructing a Gaussian process model corresponding to each input feature based on each input feature in the i-th input feature set, wherein the input feature set comprises p input features;
step S222, constructing a Gaussian process mixture model corresponding to an ith cluster according to the Gaussian process models corresponding to p input features, wherein the Gaussian process mixture model is a lower model;
Step S223, judging whether i is smaller than K, if i is smaller than K, letting i=i+1, and executing step S221, if i is larger than or equal to K, constructing an upper model according to a Gaussian process mixture model corresponding to the K-class cluster;
and Step S224, determining a double-layer Gaussian process mixing model based on the upper-layer model and the lower-layer models corresponding to the various clusters.
Optionally, the load cluster comprises at least one of a computer cluster, an air conditioner cluster, a water heater cluster and an electric automobile cluster.
The invention also provides a non-invasive load cluster classification system, comprising:
a data set construction module, used for constructing a test set T and a training set S based on an electricity utilization data set D corresponding to a load cluster, wherein the training set S comprises M training subsets and the test set T comprises M test subsets, wherein M is a positive integer greater than 1;
The double-layer Gaussian process mixed model determining module is used for determining a double-layer Gaussian process mixed model corresponding to each training subset, wherein parameters of the double-layer Gaussian process mixed model are to be solved;
The parameter solving module is used for solving parameters of the double-layer Gaussian process mixed model corresponding to each training subset;
the double-layer Gaussian process mixed average model determining module is used for computing the mean of the M groups of solved parameters, and substituting the mean into the double-layer Gaussian process mixed model to obtain the double-layer Gaussian process mixed average model;
the probability value calculation module is used for substituting various clusters in each test subset into the double-layer Gaussian process mixed average model to calculate and outputting probability values of various clusters in each test subset;
and the classification result determining module is used for taking the cluster category corresponding to the maximum value of the output probability of each cluster in each test subset as the classification result corresponding to each test subset.
Optionally, the dual-layer gaussian process mixing model determining module specifically includes:
the total class number determining unit is used for determining the total class number K of the clusters contained in each training subset;
and the double-layer Gaussian process mixed model determining unit is used for determining the double-layer Gaussian process mixed model corresponding to each training subset based on the input characteristic sets of all clusters in each training subset.
Optionally, the total class number determining unit specifically includes:
The load cluster feature determining subunit is used for determining load cluster features based on features corresponding to various clusters;
A first judging subunit, configured to judge whether a cluster satisfying the load cluster feature exists in the training subset S_l; if such a cluster exists, execute the "total class number and input feature set determining subunit"; if no such cluster exists, reject the training subset S_l, let l = l + 1, and execute the "second judging subunit";
the second judging subunit is used for judging whether l is larger than M, ending if l is larger than M, and returning to the first judging subunit if l is smaller than or equal to M;
and the total class number and input feature set determining subunit is used for determining the total class number K of the clusters and the input feature set of each cluster contained in each training subset according to the load cluster features.
Optionally, the dual-layer gaussian process mixing model determining unit specifically includes:
the Gaussian process model construction subunit is used for constructing a Gaussian process model corresponding to each input feature based on each input feature in the ith input feature set, wherein the input feature set comprises p input features;
The lower layer model construction subunit is used for constructing a Gaussian process mixed model corresponding to the ith class cluster according to the Gaussian process models corresponding to the p input features, wherein the Gaussian process mixed model is a lower layer model;
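The third judging subunit is used for judging whether i is smaller than K; if i is smaller than K, letting i = i + 1 and executing the Gaussian process model construction subunit; if i is larger than or equal to K, constructing an upper model according to the Gaussian process mixture models corresponding to the K classes of clusters;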
And the double-layer Gaussian process mixing model determining subunit is used for determining a double-layer Gaussian process mixing model based on the upper-layer model and the lower-layer model corresponding to various clusters.
Optionally, the load cluster comprises at least one of a computer cluster, an air conditioner cluster, a water heater cluster and an electric automobile cluster.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
According to the scheme, user-side loads that share the same operation characteristics and a unified electricity utilization rule are modeled as one cluster, and the probability that an unknown cluster in a test subset belongs to the double-layer Gaussian process mixed average model is used to judge its category. Clusters are thereby classified effectively, which provides guidance for users to improve electricity utilization efficiency, supports the safe, reliable, stable and economic operation of the electric power system, and helps promote comprehensive intelligent electricity utilization.
Detailed Description
The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
The invention aims to provide a non-invasive load cluster classification method and a non-invasive load cluster classification system, so as to classify load clusters with the same operation characteristics and unified electricity utilization rule.
In order that the above objects, features and advantages of the present invention may be more readily understood, the invention is described in further detail below with reference to the accompanying drawings and the detailed description.
Example 1
As shown in fig. 1, the present invention discloses a non-invasive load cluster classification method, which includes:
Step S1, constructing a test set T and a training set S based on an electricity utilization data set D corresponding to a load cluster, wherein the training set S comprises M training subsets, the test set T comprises M test subsets, and M is a positive integer greater than 1.
Step S2, determining a double-layer Gaussian process mixed model corresponding to each training subset, wherein the parameters of the double-layer Gaussian process mixed model are to be solved.
Step S3, solving the parameters of the double-layer Gaussian process mixed model corresponding to each training subset.
Step S4, calculating the mean of the M groups of solved parameters, and substituting the mean into the double-layer Gaussian process mixed model to obtain the double-layer Gaussian process mixed average model.
Step S5, substituting each cluster in each test subset into the double-layer Gaussian process mixed average model for calculation, and outputting the probability values of each cluster in each test subset.
Step S6, taking the cluster category corresponding to the maximum output probability of each cluster in each test subset as the classification result corresponding to that test subset.
The steps are discussed in detail below:
Step S1, a test set T and a training set S are constructed based on an electricity utilization data set D corresponding to a load cluster, and the method specifically comprises the following steps:
Step S11, collecting electricity utilization data at the user's power inlet with a non-invasive collecting device to construct the electricity utilization data set D; specifically, the non-invasive collecting device collects total current data or total power data at the user's power inlet in a high-frequency collecting mode.
Step S12, dividing the electricity utilization data set D into a test set T and a training set S by an M-fold cross-validation method. Specifically, the data set D is divided into M subsets, namely D = [D_1, D_2, …, D_M]; each time, the l-th subset D_l (l = 1, 2, …, M) is taken as the test subset T_l and the remaining M−1 subsets are taken as the training subset S_l. The test set T is constructed from the M test subsets T_l, namely T = [T_1, T_2, …, T_M], and the training set S is constructed from the M training subsets S_l, namely S = [S_1, S_2, …, S_M]. In this embodiment, M may be chosen according to actual requirements; preferably, M is 10.
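The M-fold construction of step S12 can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: the function name m_fold_split and the use of NumPy are assumptions, and the data set D is represented by a plain array of sample indices.

```python
import numpy as np

def m_fold_split(D, M=10):
    """Return (T, S): M test subsets T_l and M training subsets S_l built from D."""
    folds = np.array_split(np.asarray(D), M)          # D_1, ..., D_M
    T, S = [], []
    for l in range(M):
        T.append(folds[l])                            # T_l is the l-th subset D_l
        rest = folds[:l] + folds[l + 1:]              # the remaining M-1 subsets
        S.append(np.concatenate(rest))                # S_l
    return T, S

# Example: 100 samples, ten-fold split as preferred in this embodiment.
T, S = m_fold_split(np.arange(100), M=10)
```

Each pair (T_l, S_l) is disjoint, and together the two subsets cover the whole of D.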
Step S2, determining a double-layer Gaussian process mixing model corresponding to each training subset, wherein the method specifically comprises the following steps:
step S21, determining total class number K of clusters contained in each training subset, wherein the step S21 specifically comprises the following steps:
Step S211, determining load cluster characteristics based on characteristics corresponding to various clusters, wherein the characteristics corresponding to various clusters comprise working characteristics, operation characteristics, random characteristics and the like. The load cluster comprises at least one of a computer cluster, an air conditioner cluster, a water heater cluster and an electric automobile cluster.
Step S212, judging whether a cluster meeting the load cluster characteristics exists in the training subset S_l; if so, executing step S214; if not, rejecting the training subset S_l, letting l = l + 1, and executing step S213.
Step S213, judging whether l is larger than M, if l is larger than M, ending, and if l is smaller than or equal to M, returning to step S212.
Step S214, determining the total class number K of the clusters and the input feature set of each cluster in each training subset according to the load cluster features, wherein the input feature set comprises p input features.
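The screening loop of steps S211 to S214 can be sketched as below. The representation is hypothetical: each training subset is assumed to be a dictionary whose "clusters" entry lists the cluster types detected in it, and the total class number K is taken to be the number of recognised cluster types. Neither assumption is specified by the method itself.

```python
# Hypothetical labels for the cluster types named in this embodiment.
KNOWN_CLUSTERS = {"computer", "air_conditioner", "water_heater", "electric_vehicle"}

def screen_subsets(subsets):
    """Return {l: (K, cluster names)} for the retained training subsets (l is 1-based)."""
    kept = {}
    for l, subset in enumerate(subsets, start=1):     # l runs from 1 to M (S213)
        found = KNOWN_CLUSTERS & set(subset["clusters"])
        if not found:
            continue                                  # S212: reject S_l, move to l + 1
        kept[l] = (len(found), sorted(found))         # S214: K and the cluster types
    return kept

subsets = [
    {"clusters": ["computer", "air_conditioner"]},
    {"clusters": []},
    {"clusters": ["water_heater"]},
]
kept = screen_subsets(subsets)
```

In this example the second subset contains no recognised cluster and is rejected, so only subsets 1 and 3 are retained.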
Step S22, determining a double-layer Gaussian process mixing model corresponding to each training subset based on input feature sets of various clusters in each training subset, wherein the method specifically comprises the following steps:
Step S221, constructing a Gaussian process model corresponding to each input feature based on each input feature in the i-th input feature set, wherein the specific formula is as follows:
y_ig(x) ~ GPFR(x_g; b_ig, θ_ig);
where x_g represents the g-th input feature in the input feature set x'_k of the i-th class of cluster, g = 1, …, p, and x'_k = [x_1, x_2, …, x_p]; b_ig and θ_ig represent the mean and standard deviation of the g-th input feature in the input feature set x'_k of the i-th class of cluster; GPFR() represents a Gaussian process function; and y_ig(x) represents the Gaussian process model corresponding to the g-th input feature in the input feature set x'_k of the i-th class of cluster.
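To illustrate how a single Gaussian process model of step S221 can be evaluated, the sketch below computes the log marginal likelihood of one feature curve. The modelling choices are assumptions made only for illustration, since the text does not fix them: b is treated as a constant mean, θ as the triple (signal variance, length scale, noise variance), and the kernel is squared-exponential.

```python
import numpy as np

def gp_log_likelihood(t, y, b, theta):
    """Log marginal likelihood of y observed at points t under a GP with constant mean b."""
    signal_var, length_scale, noise_var = theta
    diff = t[:, None] - t[None, :]
    K = signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)   # squared-exponential kernel
    K = K + noise_var * np.eye(len(t))                           # observation noise
    L = np.linalg.cholesky(K)                                    # K = L L^T
    r = y - b
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))          # K^{-1} r
    # -0.5 r^T K^{-1} r - 0.5 log|K| - (n/2) log(2 pi)
    return float(-0.5 * r @ alpha - np.log(np.diag(L)).sum()
                 - 0.5 * len(t) * np.log(2.0 * np.pi))

t = np.linspace(0.0, 1.0, 20)
y = np.sin(2.0 * np.pi * t)
ll_good = gp_log_likelihood(t, y, b=0.0, theta=(1.0, 0.3, 0.01))
ll_bad = gp_log_likelihood(t, y, b=100.0, theta=(1.0, 0.3, 0.01))
```

A mean that matches the data (ll_good) yields a higher likelihood than a badly mismatched one (ll_bad), which is the quantity a parameter solver would seek to increase.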
Step S222, constructing a Gaussian process mixed model corresponding to the ith class of clusters according to the Gaussian process models corresponding to the p input features, namely a lower model corresponding to the ith class of clusters.
Step S223, judging whether i is smaller than K, if i is smaller than K, letting i=i+1, executing step S221, and if i is larger than or equal to K, constructing an upper layer model according to the Gaussian process mixture model corresponding to the K-class cluster.
Step S224, determining a double-layer Gaussian process mixing model based on the upper-layer model and the lower-layer models corresponding to the various clusters, which specifically comprises the following steps:
defining an indicator variable (i.e., a hidden variable) in the upper model to describe the relevance of the s-th double-layer Gaussian process mixture model to the lower-layer Gaussian process models: if the i-th class of cluster belongs to the s-th Gaussian process mixture model, the indicator variable takes the value 1, and otherwise it takes the value 0. The indicator variable takes the value 1 with probability π_s.
In the lower model, the relevance of the s-th double-layer Gaussian process mixture model to the lower-layer Gaussian process models is described by another hidden variable: if the g-th input feature of the i-th class of cluster belongs to the q-th Gaussian process model and the i-th class of load cluster belongs to the s-th double-layer Gaussian process mixed model, this hidden variable takes the value 1, and otherwise it takes the value 0. It takes the value 1 with probability η_q|s, where q = 1, …, n and n is the total number of Gaussian process models.
Given that the lower-layer hidden variable takes the value 1, the input variable is assumed to obey a Gaussian distribution within each group, namely:
x_ig ~ N(h_qs, s_qs²)
where x_ig is the input variable, and h_qs and s_qs² represent the mean and covariance, respectively.
The specific formula for defining the double-layer Gaussian process mixture model is as follows:
Θ = {(π_s, η_q|s, Θ_qs) | s = 1, …, K; q = 1, …, n}
wherein Θ_qs = {h_qs, s_qs, b_qs, θ_qs}; h_qs and s_qs are the mean and standard deviation of the upper model; b_qs and θ_qs are the mean and standard deviation of the lower model; η_q|s is the probability that the indicator variable in the lower model equals 1; π_s is the probability that the indicator variable in the upper model equals 1; n is the total number of Gaussian process models; and K is the total number of cluster classes contained in the load clusters.
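The two layers of hidden variables can be illustrated by drawing one cluster from the model: first the upper-layer class s with probability π_s, then for each of the p input features a lower-layer component q with probability η_q|s, and finally the input value from a Gaussian with mean h_qs and standard deviation s_qs. The array layout (eta with shape K × n, h and s with shape n × K) and the function name are assumptions made for this sketch.

```python
import numpy as np

def sample_two_layer(pi, eta, h, s, p, rng):
    """Draw (class, per-feature components, per-feature inputs) for one cluster."""
    K, n = eta.shape
    cls = rng.choice(K, p=pi)                      # upper-layer indicator: class s
    comps = rng.choice(n, size=p, p=eta[cls])      # lower-layer indicators: component q
    x = rng.normal(h[comps, cls], s[comps, cls])   # inputs drawn from N(h_qs, s_qs^2)
    return cls, comps, x

rng = np.random.default_rng(0)
pi = np.array([1.0, 0.0])                          # K = 2 classes
eta = np.array([[1.0, 0.0], [0.0, 1.0]])           # each row sums to 1 (n = 2 components)
h = np.array([[0.0, 5.0], [10.0, 15.0]])           # h[q, s]
s = np.ones((2, 2))                                # s[q, s]
cls, comps, x = sample_two_layer(pi, eta, h, s, p=3, rng=rng)
```

With these degenerate probabilities the draw always selects class 0 and component 0, which makes the role of each indicator easy to check.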
Step S3, solving the parameters of the double-layer Gaussian process mixed model corresponding to each training subset, wherein h_qs and s_qs are the mean and standard deviation of the upper model, b_qs and θ_qs are the mean and standard deviation of the lower model, η_q|s is the probability that the indicator variable in the lower model equals 1, and π_s is the probability that the indicator variable in the upper model equals 1.
The step S3 specifically comprises the following steps:
Step S31, initializing the parameters of the double-layer Gaussian process mixture model.
Step S32, using a modified MCMC algorithm to compute, from the current parameters, all hidden variables Z in the upper model and all hidden variables A in the lower model.
And step S33, establishing likelihood functions of the load cluster double-layer Gaussian process mixed model according to all hidden variables Z in the upper-layer model and all hidden variables A in the lower-layer model.
Step S34, establishing, according to the likelihood function, the expectation function of the conditional probability distribution of the unknown data, wherein I represents the number of samples of the hidden variables Z, Z^(i) represents the i-th MCMC sample of Z, D represents the samples in the sample set, E_A[·] represents the expectation operator, and L(Θ, Z^(i), A) represents the likelihood function.
Step S35, updating the parameters by maximizing the expectation function.
Step S36, judging whether the convergence condition is met; if so, outputting the parameters of the double-layer Gaussian process mixed model corresponding to the M groups of training subsets; if not, returning to step S32. The convergence condition in this embodiment includes, but is not limited to, the parameters no longer changing.
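The iterate-until-convergence structure of steps S31 to S36 can be illustrated with a much simpler stand-in. The sketch below is not the modified MCMC algorithm of this embodiment: it is a toy Monte Carlo EM for a single-layer, two-component, one-dimensional Gaussian mixture in which the hidden indicators are sampled directly from their posterior. All names and default values are assumptions.

```python
import numpy as np

def mcem_two_gaussians(x, n_iter=50, n_samples=20, tol=1e-4, seed=0):
    """Toy Monte Carlo EM for a two-component 1-D Gaussian mixture."""
    rng = np.random.default_rng(seed)
    # Initialise the parameters (compare step S31).
    pi1, mu, sd = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
    for _ in range(n_iter):
        # Posterior probability that each point belongs to component 1.
        w0 = (1.0 - pi1) * np.exp(-0.5 * ((x - mu[0]) / sd[0]) ** 2) / sd[0]
        w1 = pi1 * np.exp(-0.5 * ((x - mu[1]) / sd[1]) ** 2) / sd[1]
        post1 = w1 / (w0 + w1)
        # Draw I = n_samples samples of the hidden indicators and average them.
        z = rng.random((n_samples, x.size)) < post1
        r1 = z.mean(axis=0)
        r0 = 1.0 - r1
        # Maximise the Monte Carlo expectation in closed form (compare step S35).
        new_mu = np.array([r0 @ x / r0.sum(), r1 @ x / r1.sum()])
        new_sd = np.sqrt(np.array([
            r0 @ (x - new_mu[0]) ** 2 / r0.sum(),
            r1 @ (x - new_mu[1]) ** 2 / r1.sum(),
        ]))
        # Stop when the means no longer change (compare step S36).
        converged = np.max(np.abs(new_mu - mu)) < tol
        pi1, mu, sd = r1.mean(), new_mu, new_sd
        if converged:
            break
    return pi1, mu, sd

data_rng = np.random.default_rng(1)
x = np.concatenate([data_rng.normal(-3.0, 1.0, 200), data_rng.normal(3.0, 1.0, 200)])
pi1, mu, sd = mcem_two_gaussians(x)
```

Because the expectation is estimated from random samples, the parameter change may never fall below the tolerance, so the iteration cap n_iter acts as a second stopping rule.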
Step S5, substituting each cluster in each test subset into the double-layer Gaussian process mixed average model for calculation, and outputting the probability values of each cluster in each test subset, wherein the specific calculation formula is as follows:
p(z|x) = p(zx) / p(x)
wherein p(z|x) represents the probability that the z-th cluster in the test subset T_l belongs to the x-th cluster class of the double-layer Gaussian process mixed average model, p(x) represents the probability that the x-th cluster class appears in the double-layer Gaussian process mixed average model, and p(zx) represents the joint probability of the z-th cluster in the test subset T_l and the x-th cluster class in the double-layer Gaussian process mixed average model.
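Steps S4 to S6 can be sketched together as follows, assuming that the conditional probability is the ratio p(zx)/p(x) of the quantities defined above. The parameter dictionaries, the joint probability table and the function names are hypothetical examples, not values from this embodiment.

```python
import numpy as np

def average_parameters(param_sets):
    """Step S4: element-wise mean of M parameter dictionaries with identical keys."""
    return {k: np.mean([p[k] for p in param_sets], axis=0) for k in param_sets[0]}

def classify(joint, p_x):
    """Steps S5 and S6: p(z|x) = p(zx) / p(x), then the most probable class per cluster."""
    cond = np.asarray(joint) / np.asarray(p_x)   # rows: test clusters z, columns: classes x
    return cond, cond.argmax(axis=1)

# Two hypothetical parameter sets, as if solved on M = 2 training subsets.
mean_params = average_parameters([
    {"pi": np.array([0.4, 0.6])},
    {"pi": np.array([0.6, 0.4])},
])

# Hypothetical joint probabilities p(zx) for two test clusters and two classes.
cond, labels = classify([[0.3, 0.1], [0.05, 0.2]], p_x=mean_params["pi"])
```

Here the first test cluster is assigned to class 0 and the second to class 1, each being the column with the largest value in its row.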
Example 2
As shown in fig. 2-3, the present invention discloses a non-invasive load cluster classification system, the system comprising:
The data set construction module 201 is configured to construct a test set T and a training set S based on an electricity data set D corresponding to the load cluster, where the training set S includes M training subsets, and the test set T includes M test subsets, where M is a positive integer greater than 1.
The two-layer gaussian process mixing model determining module 202 is configured to determine two-layer gaussian process mixing models corresponding to the training subsets, where parameters of the two-layer gaussian process mixing models are to be solved.
And the parameter solving module 203 is configured to solve parameters of the two-layer gaussian process mixing model corresponding to each training subset.
The two-layer gaussian process mixed average model determining module 204 is configured to determine an average value of M groups of parameters obtained by the solution, and obtain a two-layer gaussian process mixed average model by substituting the average value into the two-layer gaussian process mixed model.
The probability value calculation module 205 is configured to substitute each cluster in each test subset into the double-layer gaussian process hybrid average model for calculation, and output probability values of each cluster in each test subset.
The classification result determining module 206 is configured to use the cluster category corresponding to the maximum value of the output probabilities of the clusters in each test subset as the classification result corresponding to each test subset.
As an optional implementation manner, the dual-layer gaussian process mixing model determining module 202 of the present invention specifically includes:
and the total class number determining unit is used for determining the total class number K of the clusters contained in each training subset.
And the double-layer Gaussian process mixed model determining unit is used for determining the double-layer Gaussian process mixed model corresponding to each training subset based on the input characteristic sets of all clusters in each training subset.
As an optional implementation manner, the total class number determining unit of the present invention specifically includes:
And the load cluster characteristic determining subunit is used for determining the load cluster characteristic based on the characteristics corresponding to various clusters.
A first judging subunit, configured to judge whether a cluster satisfying the load cluster feature exists in the training subset S_l; if such a cluster exists, execute the "total class number and input feature set determining subunit"; if no such cluster exists, reject the training subset S_l, let l = l + 1, and execute the "second judging subunit".
And the second judging subunit is used for judging whether l is larger than M, ending if l is larger than M, and returning to the first judging subunit if l is smaller than or equal to M.
And the total class number and input feature set determining subunit is used for determining the total class number K of the clusters and the input feature set of each cluster contained in each training subset according to the load cluster features.
As an optional implementation manner, the dual-layer gaussian process mixing model determining unit of the present invention specifically includes:
and the Gaussian process model construction subunit is used for constructing a Gaussian process model corresponding to each input feature based on each input feature in the ith input feature set, wherein the input feature set comprises p input features.
The lower layer model construction subunit is used for constructing a Gaussian process mixed model corresponding to the ith class of clusters according to the Gaussian process models corresponding to the p input features, wherein the Gaussian process mixed model is a lower layer model.
And the third judging subunit is used for judging whether i is smaller than K, if i is smaller than K, letting i=i+1, executing a Gaussian process model construction subunit, and if i is larger than or equal to K, constructing an upper layer model according to a Gaussian process mixed model corresponding to the K-class cluster.
And the double-layer Gaussian process mixing model determining subunit is used for determining a double-layer Gaussian process mixing model based on the upper-layer model and the lower-layer model corresponding to various clusters.
Example 3
A non-invasive acquisition device is arranged at an electric inlet of a small office, voltage waveform and current waveform signal data are acquired, the sampling frequency is 5000Hz, and the acquisition duration is one week.
The method specifically comprises the following steps:
1) A non-invasive acquisition device is arranged at the electric inlet of the small office to acquire voltage waveform and current waveform signal data, and, as shown in fig. 4, the electricity data are divided into a test set and a training set by a ten-fold cross-validation method.
2) Detecting whether clusters meeting the load cluster characteristics exist in the training subset S_l, and determining an input feature x and an output feature y for classifying the load clusters.
3) Establishing a Gaussian process mixture model for the i-th class of cluster (such as a computer cluster) in the training subset S_l to form a lower model of the double-layer Gaussian process mixture model.
4) Establishing K Gaussian process mixture models for the K classes of clusters (such as a computer cluster and an air conditioner cluster) to form an upper model of the double-layer Gaussian process mixture model.
5) Defining the indicator variables (hidden variables) of the upper model and the lower model in the load cluster double-layer Gaussian process mixed model, respectively, to form the load cluster double-layer Gaussian process mixed model.
6) Solving the parameters of the load cluster double-layer Gaussian process mixed model.
7) Repeating steps 2) to 6) to obtain 10 groups of parameters of the load cluster double-layer Gaussian process mixed model, and averaging them to determine the final parameter values.
8) Calculating the probability that each load cluster in the test subset T_l belongs to the double-layer Gaussian process mixed average model, and judging the class of each cluster according to the maximum probability. The current waveforms of the computer cluster and the air conditioner cluster are shown in fig. 5 and fig. 6, respectively.
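For a sense of the data volume in this example, the short calculation below works out how many samples one channel produces at 5000 Hz over one week and how many fall into each of the ten folds. It assumes continuous recording and folds of equal size, neither of which is stated in the example.

```python
SAMPLING_RATE_HZ = 5000
SECONDS_PER_WEEK = 7 * 24 * 3600       # 604800 s
FOLDS = 10

samples_per_channel = SAMPLING_RATE_HZ * SECONDS_PER_WEEK   # 3,024,000,000
samples_per_fold = samples_per_channel // FOLDS             # 302,400,000
print(samples_per_channel, samples_per_fold)
```

The voltage and current channels each yield about 3.0 billion samples, so each test subset holds about 302 million samples per channel.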
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
Specific examples are used herein to explain the principles and embodiments of the present invention, and the above description is intended only to aid understanding of the method and core concepts of the invention. Persons of ordinary skill in the art may, based on the teachings herein, make changes to the specific embodiments and the scope of application. In view of the foregoing, this description should not be construed as limiting the invention.