CN118839176B

CN118839176B - A method for optimizing the collection and analysis of gas cabinet sensor data

Info

Publication number: CN118839176B
Application number: CN202411331092.1A
Authority: CN
Inventors: 郑乒乒; 谭翔友; 刘兵海
Original assignee: Dechuan Electric Co ltd
Current assignee: Dechuan Electric Co ltd
Priority date: 2024-09-24
Filing date: 2024-09-24
Publication date: 2025-01-21
Anticipated expiration: 2044-09-24
Also published as: CN118839176A

Abstract

The application relates to the technical field of digital data processing and provides a gas cabinet sensor data optimization acquisition analysis method which comprises the steps of confirming dimension data corresponding to at least two dimensions according to time sequence data corresponding to smell data, confirming importance degrees corresponding to important data segments of each dimension data based on amplitude and slope of each dimension data, clustering the importance degrees corresponding to the important data segments, confirming discrete degrees of the important data segments of each dimension data in corresponding clustering clusters, confirming convergence degrees corresponding to the important data segments of each dimension data according to the importance degrees and the discrete degrees corresponding to the important data segments of each dimension data, and carrying out dimension reduction processing on the time sequence data corresponding to the smell data based on the convergence degrees and a preset dimension reduction algorithm so as to confirm smell types corresponding to the smell data. The odor data are restrained through the convergence degree, so that the dimension reduction effect is improved, and the accuracy of judging the odor type is improved.

Description

Gas cabinet sensor data optimization acquisition analysis method

Technical Field

The application relates to the technical field of digital data processing, in particular to a gas cabinet sensor data optimization acquisition and analysis method.

Background

In modern industry and environmental monitoring, gas cabinet sensors are used for monitoring various parameters such as gas concentration, temperature, humidity and the like in real time, and are widely applied to the fields of petrochemical industry, environmental protection, intelligent home furnishing and the like.

The data obtained when detecting the odor by the optical gas sensor is high-dimensional data, contains various types of odor data, and each type of odor data contains data of various dimensions, so that the obtained data needs to be reduced in dimension, so that the odor type contained in the cabinet is conveniently analyzed according to the obtained odor data. The dimension of the data obtained when the optical gas sensor detects the smell is too high, so that the dimension reduction effect of the subsequent data is poor, and the accuracy of smell type judgment is further affected.

Disclosure of Invention

In view of the above, it is necessary to provide a gas cabinet sensor data optimization acquisition analysis method, which improves the accuracy of odor type judgment and further reduces the working cost compared with the traditional gas cabinet odor type judgment method.

The first aspect of the application provides a gas cabinet sensor data optimization acquisition analysis method, which is applied to the field of odor data analysis, and comprises the following steps:

Acquiring time sequence data corresponding to smell data in the gas cabinet, and confirming dimension data corresponding to at least two dimensions according to the time sequence data;

Confirming the importance degree corresponding to the important data segment of each dimension data based on the amplitude and the slope of each dimension data, wherein the important data segment refers to a data segment formed by the data points that lie on both sides of an amplitude maximum point of each dimension data and whose amplitudes are greater than the average amplitude;

Clustering the importance degrees corresponding to the important data segments of at least two dimension data, and confirming the discrete degree of the important data segments of each dimension data in the corresponding clustering clusters;

According to the importance degree and the discrete degree corresponding to the important data segment of each dimension data, confirming the convergence degree corresponding to the important data segment of each dimension data;

and performing dimension reduction processing on the time sequence data corresponding to the smell data based on the convergence degree corresponding to the important data segment of each dimension data and a preset dimension reduction algorithm so as to confirm the smell type corresponding to the smell data.

In one embodiment, the confirming of the importance degree corresponding to the important data segment of each dimension data based on the amplitude and the slope of each dimension data, wherein the important data segment refers to a data segment formed by the data points that lie on both sides of an amplitude maximum point of each dimension data and whose amplitudes are greater than the average amplitude, specifically includes:

Counting the amplitude of each data point in each dimension data, and calculating the average amplitude of each dimension data;

And calculating the importance degree corresponding to the important data segment of each dimension data based on the amplitude value and the average amplitude value of the maximum value point of each dimension data and the slope corresponding to each data point in the important data segment.

In one embodiment, the calculating the importance degree corresponding to the important data segment of each dimension data based on the amplitude, the average amplitude and the slope corresponding to each data point in the important data segment of each dimension data specifically includes:

Wherein the quantities in the formula are: the importance degree corresponding to the important data segment of a given maximum point of a given dimension data; the amplitude of that maximum point; the average amplitude of that dimension data; the number of data points in the important data segment corresponding to that maximum point; and the slope corresponding to each data point in that important data segment.

In one embodiment, the clustering the importance degrees corresponding to the importance data segments of the at least two dimension data, and determining the discrete degree of the importance data segment of each dimension data in the corresponding cluster specifically includes:

Clustering importance degrees corresponding to important data segments of at least two dimension data, and confirming a preset number of clustering clusters;

according to the clustering clusters, counting the number and average amplitude of important data segments in the clustering clusters corresponding to the target important data segments and the clustering distance between the target important data segments and the clustering centers of the corresponding clustering clusters;

inputting the number and average amplitude of important data segments in the cluster corresponding to the target important data segments and the clustering distance between the target important data segments and the clustering center of the corresponding cluster into a discrete degree calculation formula, and calculating the discrete degree of the target important data segments in the corresponding cluster.

In one embodiment, the inputting the number and the average amplitude of the important data segments in the cluster corresponding to the target important data segments and the clustering distance between the target important data segments and the clustering center of the corresponding cluster into the discrete degree calculation formula, and calculating the discrete degree of the target important data segments in the corresponding cluster specifically includes:

Wherein the quantities in the formula are: the discrete degree of the target important data segment in its cluster; the importance degree of the target important data segment in that cluster; the number of important data segments in the cluster; the amplitude of each data point in the target important data segment; the average amplitude of the important data segments in the cluster; the number of data points in the target important data segment; the clustering distance between the target important data segment and the cluster center of its cluster; and an exponential function with a specified base.

In one embodiment, the determining the convergence degree corresponding to the important data segment of each dimension data according to the importance degree and the discrete degree corresponding to the important data segment of each dimension data specifically includes:

Counting the sum of the amplitudes of the data points in the important data segment of the target and the sum of the amplitudes of the data points in the dimension data corresponding to the important data segment of the target;

calculating constraint weight corresponding to the target important data segment based on the importance degree and the discrete degree corresponding to the target important data segment, the sum of the amplitudes of the data points in the target important data segment and the sum of the amplitudes of the data points in the dimension data corresponding to the target important data segment;

and inputting the constraint weight corresponding to the target important data segment and the amplitude mean value of the dimension data corresponding to the target important data segment into a convergence degree calculation formula, and calculating the convergence degree corresponding to the target important data segment.

In one embodiment, the calculating the constraint weight corresponding to the target important data segment based on the importance degree, the discrete degree corresponding to the target important data segment, the sum of the magnitudes of the data points in the target important data segment, and the sum of the magnitudes of the data points in the dimension data corresponding to the target important data segment specifically includes:

Wherein the quantities in the formula are: the constraint weight of the target important data segment in its cluster; the discrete degree of the target important data segment in that cluster; the importance degree of the target important data segment in that cluster; the amplitude of each data point in the target important data segment of the given dimension data; the number of data points in the target important data segment; the amplitude of each data point in the dimension data corresponding to the target important data segment; and the number of data points in that dimension data.

In one embodiment, inputting the constraint weight corresponding to the target important data segment and the average value of the magnitudes of the dimension data corresponding to the target important data segment into a convergence degree calculation formula, and calculating the convergence degree corresponding to the target important data segment specifically includes:

Wherein the quantities in the formula are: the convergence degree of the target important data segment in its cluster; the constraint weight of the target important data segment in that cluster; and the amplitude mean of the dimension data corresponding to the target important data segment.

In one embodiment, the performing the dimension reduction processing on the time series data corresponding to the smell data based on the convergence degree corresponding to the important data segment of each dimension data and a preset dimension reduction algorithm to confirm the smell type corresponding to the smell data specifically includes:

Based on the convergence degree corresponding to the important data segment of each dimension data, carrying out convergence processing on the time sequence data, and confirming final time sequence data;

Performing dimension reduction processing on the final time sequence data through a preset dimension reduction algorithm, and confirming dimension reduction data;

And confirming the odor type corresponding to the odor data according to the dimension reduction data.

In one embodiment, the collecting the time sequence data corresponding to the smell data in the gas cabinet to confirm the dimension data corresponding to at least two dimensions according to the time sequence data specifically includes:

Collecting time sequence data corresponding to smell data in a gas cabinet, denoising the time sequence data according to a preset denoising algorithm, and confirming final time sequence data;

And confirming dimension data corresponding to at least two dimensions according to the final time sequence data.

According to the embodiment of the application, time sequence data corresponding to the smell data in the gas cabinet are first collected, and dimension data corresponding to at least two dimensions are confirmed according to the time sequence data. Then, based on the amplitude and the slope of each dimension data, the importance degree corresponding to the important data segment of each dimension data is confirmed, wherein the important data segment refers to a data segment formed by the data points that lie on both sides of an amplitude maximum point of each dimension data and whose amplitudes are greater than the average amplitude. Next, the importance degrees corresponding to the important data segments of the at least two dimension data are clustered, and the discrete degree of the important data segment of each dimension data in the corresponding cluster is confirmed. The convergence degree corresponding to the important data segment of each dimension data is then confirmed according to the corresponding importance degree and discrete degree. Finally, based on the convergence degree corresponding to the important data segment of each dimension data and a preset dimension reduction algorithm, dimension reduction processing is performed on the time sequence data corresponding to the smell data. This reduces the influence of irrelevant data and avoids the problem that the subsequent dimension reduction effect is poor because the dimension of the data obtained when the optical gas sensor detects smell is too high.

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the application, and that other drawings can be obtained according to these drawings without inventive faculty for a person skilled in the art.

FIG. 1 is a schematic flow chart of a gas cabinet sensor data optimization acquisition analysis method according to an embodiment of the application;

FIG. 2 is a schematic diagram of a first sub-process of a gas cabinet sensor data optimization acquisition and analysis method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a second sub-flow of a gas cabinet sensor data optimization acquisition analysis method according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a third sub-flow of a gas cabinet sensor data optimization acquisition analysis method according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a fourth sub-process of a gas cabinet sensor data optimization acquisition analysis method according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a fifth sub-flow of a gas cabinet sensor data optimization acquisition and analysis method according to an embodiment of the application.

Detailed Description

In describing embodiments of the present application, words such as "exemplary," "or," "such as," and the like are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary," "or," "such as," and the like are intended to present related concepts in a concrete fashion.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. It is to be understood that, unless otherwise indicated, a "/" means or. For example, A/B may represent A or B. The "and/or" in the present application is merely one association relationship describing the association object, indicating that three relationships may exist. For example, A and/or B may mean that A alone, both A and B, and B alone are present. "at least one" means one or more. "plurality" means two or more than two. For example, at least one of a, b or c may represent seven cases of a, b, c, a and b, a and c, b and c, a, b and c.

It should be further noted that the terms "first" and "second" in the description and drawings of the present application are used for distinguishing between similar objects and not for describing a particular sequential or chronological order. The method disclosed in the embodiments of the present application, or the method shown in the flowchart, includes one or more steps for implementing the method; the order of these steps may be interchanged, and some steps may be deleted, without departing from the scope of the specification.

Referring to fig. 1, a flowchart of a method for optimizing acquisition and analysis of gas cabinet sensor data according to an embodiment of the application is shown, the method includes the following steps:

S101, collecting time sequence data corresponding to smell data in a gas cabinet, and confirming dimension data corresponding to at least two dimensions according to the time sequence data.

The time sequence data corresponding to the smell data in the gas cabinet refers to a data sequence which is acquired by an optical gas sensor installed in the gas cabinet and is arranged according to time sequence, wherein each data point is associated with a specific time point or time period. The time series data is typically observations or measurements collected over a continuous time frame. The step of confirming the dimension data corresponding to at least two dimensions according to the time sequence data refers to expressing various data meanings in different dimensions based on a specific relation among data points in the time sequence data.

It should be noted that the data dimension is an important indicator of the complexity and information content of a data set. The higher the dimension, the more complex the data set and the more features are needed to describe each sample. The size of the data dimension has a significant impact on the performance and effectiveness of data analysis and machine learning algorithms. High-dimensional data generally face the following challenges:

1. Curse of dimensionality: as the data dimension increases, the sample space grows exponentially, so the data become sparser and the samples lie farther apart, which makes both computation and storage harder for the algorithm.

2. Feature selection and dimension reduction: high-dimensional data may contain redundant or irrelevant features that degrade the performance of the algorithm. Feature selection or dimension reduction is therefore needed to reduce the number of features and improve the efficiency and accuracy of the algorithm.

3. Difficulty of visualization: high-dimensional data cannot be visualized directly, because data can usually only be observed in two-dimensional or three-dimensional space. Dimension reduction techniques are therefore needed to map high-dimensional data into a low-dimensional space for visualization and understanding.

Selecting an appropriate data dimension is thus an important issue in data analysis. A dimension that is too high or too low may lead to redundancy or loss of information, so the appropriate data dimension needs to be determined according to the specific problem and the characteristics of the data set.

This scheme addresses the poor dimension reduction effect that high-dimensional smell data suffer when their dimension is too high, so as to improve the accuracy of smell type judgment.

Specifically, referring to fig. 2, the collecting the time sequence data corresponding to the smell data in the gas cabinet to confirm the dimension data corresponding to at least two dimensions according to the time sequence data specifically includes:

S201, collecting time sequence data corresponding to smell data in a gas cabinet, denoising the time sequence data according to a preset denoising algorithm, and confirming final time sequence data;

S202, confirming dimension data corresponding to at least two dimensions according to the final time sequence data.

After the time sequence data corresponding to the smell data in the collected gas cabinet are obtained, denoising processing is performed through a preset denoising algorithm to obtain final time sequence data, dimension analysis is performed on the final time sequence data, and dimension data corresponding to at least two dimensions are confirmed.

It should be noted that the preset denoising algorithm may be a median filtering algorithm, a commonly used nonlinear filtering algorithm for removing noise from an image or a signal. Its principle is to replace the value of the current sample (a pixel, in the case of an image) with the median of the values in a local neighborhood, thereby smoothing the data and removing noise. The advantage of the median filtering algorithm is that it effectively removes outliers such as salt-and-pepper noise and impulse noise without blurring the details of the data. The specific denoising process may follow the prior art and is not further limited in this scheme.
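As an illustration of the median filtering idea, the following is a minimal sketch in Python rather than the denoising procedure of the scheme itself. The function name `median_filter`, the window size, the edge handling (the window is truncated at the ends of the series) and the sample trace are all assumptions made for the example.

```python
from statistics import median


def median_filter(samples, window=3):
    """Replace each sample with the median of a window centred on it.

    `window` must be a positive odd integer. Near the edges the window is
    truncated to the samples that exist (one of several possible policies).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    filtered = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        filtered.append(median(samples[lo:hi]))
    return filtered


# Hypothetical sensor trace with one impulse-like spike at index 3.
trace = [1.0, 1.2, 1.1, 9.0, 1.3, 1.2, 1.4]
smoothed = median_filter(trace, window=3)
```

In this sketch the spike of 9.0 is replaced by a value taken from its neighbours, while the remaining samples stay close to their original levels.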

S102, confirming the importance degree corresponding to the important data segment of each dimension data based on the amplitude and the slope of each dimension data, wherein the important data segment is a data segment formed by data points which are positioned at two sides of the amplitude maximum value point corresponding to each dimension data and are larger than the average amplitude value.

The amplitude of the dimension data refers to the magnitude of fluctuation of its data points and describes the intensity or size of the data. The slope of each dimension data refers to the rate of change at a given data point in the dimension data, and can be used to describe the trend and speed of change of the data. The important data segment refers to a data segment formed by the data points that lie on both sides of an amplitude maximum point of each dimension data and whose amplitudes are greater than the average amplitude. When a dimension data includes a plurality of maximum points, it includes a plurality of important data segments. An important data segment may be a data segment or a single data point (namely, when the amplitude of the maximum point is equal to the average amplitude of the current dimension data). The amplitude maximum points of each dimension data can be obtained according to the prior art, and this scheme does not further limit how they are obtained. It should be noted that the importance degree of data refers to the degree of influence of the data on a problem or decision, and can be evaluated in terms of target relevance, credibility and accuracy, variability and volatility, scope of influence, information content, insight, and the like. The importance degree corresponding to the important data segment of each dimension data refers to the degree of influence of the current important data segment on the subsequent convergence degree calculation.
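The definition of an important data segment can be sketched as follows. This is only one reading of the definition, under stated assumptions: a maximum point is taken to be a sample strictly higher than both neighbours, the slope is a finite difference with a unit sampling interval, and the names `important_segments` and `slopes` are hypothetical. The sketch extracts only the segments and their slopes, which are inputs to the importance degree; it does not compute the importance degree itself.

```python
def important_segments(values):
    """Return inclusive (start, end) index pairs, one per local maximum.

    Each segment grows outward from a local maximum while the neighbouring
    amplitudes stay above the mean amplitude of the whole series. A maximum
    whose neighbours are not above the mean yields a single-point segment.
    """
    mean_amp = sum(values) / len(values)
    segments = []
    for i in range(1, len(values) - 1):
        # Assumption: a maximum point is strictly higher than both neighbours.
        if not (values[i] > values[i - 1] and values[i] > values[i + 1]):
            continue
        start = end = i
        while start > 0 and values[start - 1] > mean_amp:
            start -= 1
        while end < len(values) - 1 and values[end + 1] > mean_amp:
            end += 1
        segments.append((start, end))
    return segments


def slopes(values, start, end):
    """Finite-difference slope between consecutive points of a segment."""
    return [values[k + 1] - values[k] for k in range(start, end)]


# Hypothetical single-dimension trace (unit sampling interval assumed).
dimension = [1.0, 2.0, 6.0, 9.0, 7.0, 2.0, 1.0, 1.5, 1.0]
segs = important_segments(dimension)
seg_slopes = [slopes(dimension, s, e) for s, e in segs]
```

For this trace the mean amplitude is about 3.39, so the maximum at index 3 yields the segment covering indices 2 to 4, and the small maximum at index 7 yields a single-point segment. Two maxima inside the same above-average region would produce overlapping segments in this sketch; how such cases are handled is left open here.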

S103, clustering the importance degrees corresponding to the important data segments of at least two dimension data, and confirming the discrete degree of the important data segments of each dimension data in the corresponding clustering clusters.

After the importance degrees corresponding to the important data segments of the at least two dimension data are obtained, cluster analysis is performed on these importance degrees to obtain a preset number of clusters, where a cluster may include the importance degrees corresponding to the important data segments of a plurality of dimension data. A cluster is a set formed in the cluster analysis by grouping the importance degrees corresponding to the important data segments according to their similarity. It should be noted that, in the cluster analysis, the importance degrees corresponding to the important data segments are allocated to different clusters, and each cluster represents the importance degrees of a group of similar important data segments. The similarity between importance degrees within the same cluster is high, and the similarity between importance degrees in different clusters is low. The clusters are formed by calculating the distance or similarity between the importance degrees corresponding to the important data segments. The discrete degree of the important data segment of each dimension data in the corresponding cluster refers to the degree of distribution or variation of that important data segment within the cluster, and is used for measuring the spacing or difference between the important data segments of each dimension data.
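The clustering step can be illustrated with a small sketch. The text does not name a clustering algorithm, so plain one-dimensional k-means is used here purely as an assumption. The function name `kmeans_1d`, the deterministic initialisation and the sample importance values are hypothetical. The sketch produces only the inputs named for the discrete degree (cluster membership counts and distances to the cluster center), not the discrete degree itself.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's k-means on scalars; returns (centers, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old center if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return centers, labels


# Hypothetical importance degrees of six important data segments.
importance = [0.9, 1.1, 1.0, 5.0, 5.2, 4.8]
centers, labels = kmeans_1d(importance, k=2)

# Inputs for a later discrete-degree calculation: size of each cluster and
# the distance of every importance degree to its own cluster center.
cluster_sizes = [labels.count(c) for c in range(len(centers))]
distances = [abs(v - centers[lab]) for v, lab in zip(importance, labels)]
```

With these values the first three importance degrees fall into one cluster and the last three into the other, with centers near 1.0 and 5.0.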

S104, confirming the convergence degree corresponding to the important data segment of each dimension data according to the importance degree and the discrete degree corresponding to the important data segment of each dimension data.

The convergence degree corresponding to the important data segment of each dimension data refers to a degree to which the important data segment of each dimension data needs to reach a predetermined target before the dimension is reduced. In other words, the important data segment of each dimension data needs to converge to a corresponding convergence degree to avoid the high-dimension problem. And calculating the convergence degree corresponding to the important data segment after acquiring the importance degree corresponding to the important data segment of each dimension data and the discrete degree of the importance degree corresponding to the important data segment in the corresponding cluster.

S105, performing dimension reduction processing on the time sequence data corresponding to the smell data based on the convergence degree corresponding to the important data segment of each dimension data and a preset dimension reduction algorithm so as to confirm the smell type corresponding to the smell data.

After the convergence degree corresponding to the important data segment of each dimension data is obtained, convergence processing is carried out on the important data segment of each dimension data to obtain new time sequence data. Dimension reduction processing is then carried out on the new time sequence data through a preset dimension reduction algorithm to obtain dimension-reduced data, and finally the odor type corresponding to the odor data is confirmed according to the dimension-reduced data.

Specifically, referring to fig. 3, the step of performing dimension reduction processing on the time series data corresponding to the smell data based on the convergence degree corresponding to the important data segment of each dimension data and a preset dimension reduction algorithm to confirm the smell type corresponding to the smell data specifically includes:

S301, based on the convergence degree corresponding to the important data segment of each dimension data, carrying out convergence processing on the time sequence data, and confirming final time sequence data;

S302, performing dimension reduction processing on the final time sequence data through a preset dimension reduction algorithm, and confirming dimension reduction data;

S303, confirming the odor type corresponding to the odor data according to the dimension reduction data.

The final time sequence data refers to time sequence data after convergence processing based on the convergence degree corresponding to the important data segment, the dimension reduction algorithm may be a linear discriminant analysis algorithm, and after the final time sequence data is acquired, dimension reduction processing is performed on the final time sequence data through the linear discriminant analysis algorithm to obtain dimension reduction data, so that the odor type corresponding to the odor data is confirmed based on the dimension reduction data.

It should be noted that the general principle by which the linear discriminant analysis (Linear Discriminant Analysis, abbreviated as LDA) algorithm performs dimension reduction is as follows:

1. Calculate the within-class scatter matrix Sw: for each class, calculate the dispersion of the samples within that class. Specifically, calculate the covariance matrix of the samples in each class, and then take the weighted sum of the covariance matrices of all classes.

2. Calculate the between-class scatter matrix Sb: for each class, calculate the difference between that class and the other classes. Specifically, calculate the mean vector of each class, and then calculate the differences between the mean vectors.

3. Solve the generalized eigenvalue problem: find the optimal projection directions by solving for the eigenvalues and eigenvectors of Sw^(-1) Sb. The larger the eigenvalue corresponding to an eigenvector, the greater the contribution of that eigenvector to classification.

4. Select the most important eigenvectors: according to the magnitude of the eigenvalues, select the first k eigenvectors to form a projection matrix W. The larger the eigenvalues corresponding to these eigenvectors, the more class information they contain.

5. Reduce the dimension: multiply the original data set X by the projection matrix W to obtain the dimension-reduced data set Y, that is, Y = X × W.

Through the above steps, the LDA algorithm can map high-dimensional data into low-dimensional space while retaining the most important class information. Therefore, the purpose of dimension reduction can be realized, the dimension of the features is reduced, and a better classification effect is realized in the space after dimension reduction.
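The LDA steps above can be sketched for the simplest case of two classes and two features. In that case the leading eigenvector of Sw^(-1) Sb is proportional to Sw^(-1) (m_b - m_a), where m_a and m_b are the class mean vectors, so no general eigen-solver is needed. The sketch uses scatter matrices (sums of outer products of centred samples), which equal the class covariance matrices weighted by sample count. All function names and the sample data are hypothetical; this is an illustration under these assumptions, not the implementation of the scheme.

```python
def mean_vector(rows):
    """Component-wise mean of a list of 2-D samples."""
    n = len(rows)
    return [sum(r[d] for r in rows) / n for d in range(2)]


def scatter(rows, mean):
    """2x2 scatter matrix: sum of outer products of centred samples."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        dx = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += dx[a] * dx[b]
    return s


def fisher_direction(class_a, class_b):
    """Projection vector w proportional to Sw^(-1) (mean_b - mean_a)."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    # Within-class scatter Sw is the sum of the per-class scatter matrices.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(rows, w):
    """Y = X x W for a single projection vector: 2-D rows become scalars."""
    return [r[0] * w[0] + r[1] * w[1] for r in rows]


# Hypothetical two-feature samples for two odor types.
odor_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 2.5), (2.0, 1.0)]
odor_b = [(6.0, 5.0), (7.0, 7.0), (8.0, 6.0), (7.0, 5.5)]

w = fisher_direction(odor_a, odor_b)
ya = project(odor_a, w)
yb = project(odor_b, w)
```

After projection each sample is a single number, and in this example every projected value of the first odor type lies below every projected value of the second, so a simple threshold on the reduced data would separate the two types.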

According to the embodiment of the application, the time sequence data corresponding to the smell data in the gas cabinet is first collected, and dimension data corresponding to at least two dimensions is confirmed according to the time sequence data. The importance degree corresponding to the important data segment of each dimension data is then confirmed based on the amplitude and the slope of each dimension data, where the important data segment is a data segment formed by the data points that lie on both sides of the amplitude maximum point of each dimension data and are larger than the average amplitude. Next, the importance degrees corresponding to the important data segments of the at least two dimension data are clustered, and the discrete degree of the important data segment of each dimension data in its corresponding cluster is confirmed. The convergence degree corresponding to the important data segment of each dimension data is then confirmed according to the corresponding importance degree and discrete degree. Finally, the time sequence data corresponding to the smell data is subjected to dimension reduction processing based on the convergence degree corresponding to the important data segment of each dimension data and a preset dimension reduction algorithm, so as to confirm the smell type corresponding to the smell data.
Because the time sequence data corresponding to the smell data is first constrained in dimension by the convergence degree corresponding to the important data segment of each dimension data, and only then reduced in dimension to confirm the smell type, the poor dimension reduction effect caused by the excessively high dimension of the data obtained when an optical gas sensor detects a smell can be avoided. The accuracy of smell type judgment is thereby improved and the working cost is reduced.

In one embodiment of the present application, referring to fig. 4, step S102, confirming, based on the amplitude and the slope of each dimension data, the importance degree corresponding to the important data segment of each dimension data, where the important data segment is a data segment composed of the data points located on both sides of the amplitude maximum point of each dimension data and greater than the average amplitude, specifically includes:

S401, calculating the average amplitude of each dimension data by counting the amplitude of each data point in each dimension data.

Each dimension data comprises a preset number of data points, each data point corresponds to one amplitude, the amplitude of each data point in each dimension data is counted, and then the average amplitude of each dimension data is calculated based on the amplitude of each data point.
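As an illustration of the average amplitude and of the important data segment defined for step S102, the following minimal sketch extracts the run of above-average points around the maximum point of one dimension. The function name important_segment, the use of the single global maximum, and the requirement that the run be contiguous are assumptions made here for illustration; the application may treat several maximum points per dimension.

```python
def important_segment(amplitudes):
    """Return (start, end, mean) for the above-average run around the peak.

    amplitudes must be a non-empty list of numbers; start and end are
    inclusive indices into that list.
    """
    mean_amp = sum(amplitudes) / len(amplitudes)
    # Assumption: only the global maximum point is considered.
    peak = max(range(len(amplitudes)), key=lambda i: amplitudes[i])
    start = peak
    # Extend to the left while the neighbouring point is above the average.
    while start > 0 and amplitudes[start - 1] > mean_amp:
        start -= 1
    end = peak
    # Extend to the right in the same way.
    while end < len(amplitudes) - 1 and amplitudes[end + 1] > mean_amp:
        end += 1
    return start, end, mean_amp

# Hypothetical dimension data: the average is 3.75 and the peak is at index 3.
segment = important_segment([1, 2, 5, 9, 6, 2, 1, 4])  # -> (2, 4, 3.75)
```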

S402, calculating the importance degree corresponding to the important data segment of each dimension data based on the amplitude of the maximum point of each dimension data, the average amplitude, and the slope corresponding to each data point in the important data segment.

After the amplitude of the maximum point of each dimension data, the average amplitude, and the slope corresponding to each data point in the important data segment are acquired, they are used as the calculation parameters of the importance degree corresponding to the important data segment of the current dimension data.

Specifically, the importance degree corresponding to the important data segment of each dimension data is calculated from the amplitude of the maximum point of that dimension data, the average amplitude, and the slope corresponding to each data point in the important data segment. The quantities in the calculation formula are:

the importance degree corresponding to the important data segment of a given maximum point of a given item of dimension data; the amplitude of that maximum point; the average amplitude of that item of dimension data; the number of data points in the important data segment corresponding to that maximum point; and the slope corresponding to each data point in that important data segment.

When the importance degree corresponding to the important data segment of each dimension data is calculated, a data segment whose change trend or change degree is larger contains more useful information. The magnitude of the amplitude change and the magnitude of the slope change of the important data segment are therefore selected as parameters of the change trend, from which the importance degree is calculated. Further, the larger the amplitude span of the important data segment (taken from the amplitude of the maximum point and the average amplitude), the larger the corresponding importance degree should be; likewise, the larger the sum of the slopes of all data points in the important data segment, the larger the corresponding importance degree should be.

According to the embodiment of the application, the amplitude span of the important data segment and the sum of the slopes of the data points in the important data segment are obtained and used as calculation parameters to obtain the importance degree corresponding to the important data segment. The importance degree then serves as a calculation parameter of the subsequent dimension constraint, so that the subsequent dimension reduction is more accurate and the judgment accuracy of the smell type is improved.
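The two calculation parameters named above, the amplitude span and the sum of slopes of the important data segment, can be computed as in the following sketch. How the application combines them into the importance degree is not reproduced in this text, so the sketch deliberately stops at the two parameters. The function name importance_parameters, the definition of the span as peak amplitude minus average amplitude, and the slope as an absolute backward difference with unit sample spacing are all assumptions.

```python
def importance_parameters(amplitudes, start, end):
    """Return (amplitude_span, slope_sum) for amplitudes[start:end + 1]."""
    mean_amp = sum(amplitudes) / len(amplitudes)
    # Assumption: span = peak amplitude of the segment minus the series average.
    amplitude_span = max(amplitudes[start:end + 1]) - mean_amp
    slope_sum = 0
    for i in range(start, end + 1):
        # Assumption: slope = absolute backward difference with unit spacing;
        # the first sample of the series has no predecessor and gets slope 0.
        previous = amplitudes[i - 1] if i > 0 else amplitudes[i]
        slope_sum += abs(amplitudes[i] - previous)
    return amplitude_span, slope_sum

# Hypothetical dimension data with its important segment at indices 2..4.
span, slopes = importance_parameters([1, 2, 5, 9, 6, 2, 1, 4], 2, 4)
# span is 9 - 3.75 = 5.25 and slopes is 3 + 4 + 3 = 10
```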

In one embodiment of the present application, referring to fig. 5, step S103, clustering importance degrees corresponding to important data segments of at least two dimension data, and determining a discrete degree of the important data segments of each dimension data in a corresponding cluster specifically includes:

S501, clustering importance degrees corresponding to important data segments of at least two dimensional data, and confirming a preset number of clustering clusters.

After the importance degrees corresponding to the important data segments of the at least two dimensional data are obtained, they are clustered according to a preset clustering algorithm, which may be the DBSCAN density clustering algorithm, and a preset number of clusters is confirmed.

The DBSCAN density clustering algorithm determines cluster formation by defining density and neighborhood. A core point is a point whose neighborhood contains at least MinPts data points; a boundary point is a point whose neighborhood contains fewer than MinPts data points but includes a core point; a noise point is a point that is neither a core point nor a boundary point. DBSCAN forms clusters by expanding the neighborhoods of core points and assigns each boundary point to the cluster of a core point in its neighborhood. Noise points are marked as noise or discarded. The specific clustering method is not further limited and can be implemented with reference to the prior art.
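The core point, boundary point and noise point rules can be illustrated with a minimal DBSCAN over one-dimensional values such as importance degrees. This is a sketch of the general algorithm, not of the application's specific clustering; the function name dbscan_1d, the parameter names eps and min_pts, the convention that a point counts as its own neighbor, and the sample values are assumptions.

```python
def dbscan_1d(values, eps, min_pts):
    """Label each value with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(values)
    # Neighborhood of i: all points within eps, including i itself.
    neighbors = [[j for j in range(n) if abs(values[i] - values[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n              # None means not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1           # provisional noise, may become boundary
            continue
        cluster += 1                 # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable from a core point: boundary
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is also core: keep expanding
    return labels

# Hypothetical importance degrees: two dense groups and one isolated value.
labels = dbscan_1d([1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.0], eps=0.15, min_pts=2)
# -> [0, 0, 0, 1, 1, 1, -1]
```

Note that DBSCAN derives the number of clusters from eps and min_pts rather than taking it as an input, so obtaining a preset number of clusters would require tuning these two parameters.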

S502, according to the clusters, counting the number and the average amplitude of the important data segments in the cluster corresponding to the target important data segment, and the clustering distance between the target important data segment and the clustering center of the corresponding cluster.

After the preset number of clusters is obtained through clustering, the number of important data segments in the cluster in which the target important data segment is located and the average amplitude of all important data segments in that cluster are counted. At the same time, the clustering distance between the target important data segment and the clustering center of that cluster is calculated, for subsequent use in calculating the discrete degree of the target important data segment in the corresponding cluster.
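The three statistics of step S502 can be gathered as in the sketch below. Several choices are assumptions made here, since the text does not fix them: the clustering center is taken as the mean importance degree of the cluster members (DBSCAN itself does not define a center), the average amplitude is taken over all data points of all segments in the cluster, and the clustering distance is the absolute difference in importance degree. The function name cluster_statistics and the sample values are hypothetical.

```python
def cluster_statistics(importances, segment_amplitudes, labels, target):
    """Return (segment_count, mean_amplitude, distance_to_center) for one segment.

    importances[i], segment_amplitudes[i] and labels[i] describe segment i;
    target is the index of the target important data segment.
    """
    # All segments that share the target's cluster label.
    members = [i for i, lab in enumerate(labels) if lab == labels[target]]
    count = len(members)
    # Assumption: average over every data point of every member segment.
    all_amps = [a for i in members for a in segment_amplitudes[i]]
    mean_amp = sum(all_amps) / len(all_amps)
    # Assumption: cluster center = mean importance degree of the members.
    center = sum(importances[i] for i in members) / count
    distance = abs(importances[target] - center)
    return count, mean_amp, distance

# Hypothetical data: three segments in cluster 0 and one in cluster 1.
stats = cluster_statistics(
    importances=[10.0, 12.0, 14.0, 40.0],
    segment_amplitudes=[[5, 9, 6], [4, 8], [6, 10, 8], [20, 30]],
    labels=[0, 0, 0, 1],
    target=0,
)
# -> (3, 7.0, 2.0)
```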

S503, inputting the number and average amplitude of the important data segments in the cluster corresponding to the target important data segment, and the clustering distance between the target important data segment and the clustering center of the corresponding cluster, into a discrete degree calculation formula, and calculating the discrete degree of the target important data segment in the corresponding cluster.

After the number and average amplitude of the important data segments in the cluster corresponding to the target important data segment and the clustering distance between the target important data segment and the clustering center of the corresponding cluster are obtained, these quantities are used as calculation parameters and input into the discrete degree calculation formula, and the discrete degree of the target important data segment in the corresponding cluster is calculated.

Specifically, the discrete degree of the target important data segment in the corresponding cluster is calculated by inputting the number and average amplitude of the important data segments in the cluster corresponding to the target important data segment, and the clustering distance between the target important data segment and the clustering center of the corresponding cluster, into the discrete degree calculation formula. The terms of the formula are explained as follows:

The first term is the ratio of the importance degree of the target important data segment to the importance degree of the cluster in which it is located. The smaller the ratio, the more the change trend of the target important data segment differs from the change trends of the other data segments, and the greater the data discreteness.

The second term compares the variance of the target important data segment with that of the cluster in which it is located. The larger the variance, the more the data segment differs from the other data, and therefore the greater the discrete degree.

The third term is the clustering distance between the target important data segment and the clustering center of its cluster. The farther the target important data segment is from the clustering center, the more its variation differs from the variation of all data in the cluster, and therefore the greater the discrete degree.

In an embodiment of the present application, referring to fig. 6, in step S104, determining the convergence degree corresponding to the important data segment of each dimension data according to the importance degree and the discrete degree corresponding to the important data segment of each dimension data specifically includes:

S601, counting the sum of the amplitudes of the data points in the important data segment of the target and the sum of the amplitudes of the data points in the dimension data corresponding to the important data segment of the target;

S602, calculating constraint weights corresponding to the target important data segments based on the importance degrees and the discrete degrees corresponding to the target important data segments, the sum of the magnitudes of the data points in the target important data segments and the sum of the magnitudes of the data points in the dimension data corresponding to the target important data segments.

The constraint weight corresponding to the target important data segment refers to the constraint condition applied to the target important data segment in the optimization problem. Such constraints may be linear constraints, non-linear constraints, or constraints of other forms, and they limit the value range of the weights or parameters to meet the specific requirements of the problem.

Specifically, the constraint weight corresponding to the target important data segment is calculated based on the importance degree and the discrete degree corresponding to the target important data segment, the sum of the amplitudes of the data points in the target important data segment, and the sum of the amplitudes of the data points in the dimension data corresponding to the target important data segment. The quantities are explained as follows:

The calculation uses the sum of the amplitudes of all data points in the target important data segment of a given item of dimension data and the sum of the amplitudes of all data points of that dimension data. Their ratio is the share of the target important data segment in the sum of all data point amplitudes of the dimension data in which it is located. The greater this share, the greater the influence of the target important data segment on that dimension data, and the greater the constraint weight that should be assigned to it. Meanwhile, the importance degree and the discrete degree corresponding to the target important data segment are combined as calculation parameters of the constraint weight, so that the constraint weight is more accurate and the subsequent odor type judgment is more accurate.
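The amplitude share described above, that is, the sum of the segment amplitudes divided by the sum of the amplitudes of the whole dimension data, can be computed as in this minimal sketch. It covers only this one ingredient of the constraint weight; how the share is combined with the importance degree and the discrete degree is not reproduced in this text. The function name amplitude_share and the sample data are hypothetical.

```python
def amplitude_share(dimension_amplitudes, start, end):
    """Share of the segment [start, end] in the total amplitude of its dimension."""
    total = sum(dimension_amplitudes)
    if total == 0:
        # Guard added for this sketch: the share is undefined for a zero total.
        raise ValueError("total amplitude of the dimension data is zero")
    segment_sum = sum(dimension_amplitudes[start:end + 1])
    return segment_sum / total

# Hypothetical dimension data with its important segment at indices 2..4:
# (5 + 9 + 6) / 30, i.e. two thirds of the total amplitude.
share = amplitude_share([1, 2, 5, 9, 6, 2, 1, 4], 2, 4)
```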

S603, inputting the constraint weight corresponding to the target important data segment and the average amplitude of the dimension data corresponding to the target important data segment into a convergence degree calculation formula, and calculating the convergence degree corresponding to the target important data segment.

After the constraint weight corresponding to the target important data segment and the average amplitude of the dimension data corresponding to the target important data segment are obtained, the convergence degree corresponding to the target important data segment is calculated. The convergence degree corresponding to the target important data segment is the final state to which the target important data segment needs to be constrained.

Specifically, the convergence degree corresponding to the target important data segment is calculated by inputting the constraint weight corresponding to the target important data segment and the average amplitude of the dimension data corresponding to the target important data segment into the convergence degree calculation formula, whose quantities are the convergence degree of the target important data segment, its constraint weight, and the average amplitude of the corresponding dimension data.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. In the description corresponding to the flowcharts and block diagrams in the figures, operations or steps corresponding to different blocks may also occur in different orders than that disclosed in the description, and sometimes no specific order exists between different operations or steps. For example, two consecutive operations or steps may actually be performed substantially in parallel, they may sometimes be performed in reverse order, which may be dependent on the functions involved. Each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. The foregoing description of the preferred embodiments of the present application is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present application are intended to be included within the scope of the present application.

Claims

1. A method for optimizing the collection and analysis of gas cabinet sensor data, applied in the field of odor data analysis, characterized in that the method comprises:

Collecting time series data corresponding to the odor data in the gas cabinet to confirm dimensional data corresponding to at least two dimensions according to the time series data;

Determining, based on the amplitude and slope of each dimensional data, the importance of the important data segment of each dimensional data, wherein the important data segment refers to a data segment consisting of data points located on both sides of the amplitude maximum point corresponding to each dimensional data and greater than the average amplitude;

Clustering the importance of the important data segments of at least two dimensional data to determine the degree of dispersion of the important data segments of each dimensional data in the corresponding cluster;

According to the importance and discreteness corresponding to the important data segments of each dimensional data, the convergence degree corresponding to the important data segments of each dimensional data is determined;

Based on the convergence degree corresponding to the important data segments of each dimensional data and the preset dimensionality reduction algorithm, the time series data corresponding to the odor data is subjected to dimensionality reduction processing to confirm the odor type corresponding to the odor data;

Wherein performing dimensionality reduction processing on the time series data corresponding to the odor data, based on the convergence degree corresponding to the important data segments of each dimensional data and the preset dimensionality reduction algorithm, to confirm the odor type corresponding to the odor data specifically includes:

Based on the convergence degree corresponding to the important data segments of each dimensional data, the time series data is converged to confirm the final time series data;

Performing dimensionality reduction processing on the final time series data by using a preset dimensionality reduction algorithm to confirm the dimensionality reduction data;

The odor type corresponding to the odor data is confirmed based on the dimension reduction data.

2. A method for optimizing the collection and analysis of gas cabinet sensor data according to claim 1, characterized in that the importance of the important data segments of each dimensional data is confirmed based on the amplitude and slope of each dimensional data, wherein the important data segment refers to a data segment composed of data points located on both sides of the amplitude maximum point corresponding to each dimensional data and greater than the average amplitude, specifically including:

Counting the amplitude of each data point in each dimensional data and calculating the average amplitude of each dimensional data;

Based on the amplitude of the maximum point of each dimensional data, the average amplitude and the slope corresponding to each data point in the important data segment, the importance of the important data segment of each dimensional data is calculated.

3. A method for optimizing the collection and analysis of gas cabinet sensor data according to claim 2, characterized in that the importance of the important data segment of each dimensional data is calculated based on the amplitude of the maximum point of each dimensional data, the average amplitude and the slope corresponding to each data point in the important data segment, specifically by a calculation formula whose quantities are: the importance of the important data segment corresponding to a given maximum point of a given dimensional data; the amplitude of that maximum point; the average amplitude of that dimensional data; the number of data points in the important data segment corresponding to that maximum point; and the slope corresponding to each data point in that important data segment.

4. A method for optimizing the collection and analysis of gas cabinet sensor data according to claim 3, characterized in that clustering the importance of the important data segments of at least two dimensional data to confirm the degree of dispersion of the important data segments of each dimensional data in the corresponding cluster specifically includes:

Clustering the importance levels of important data segments of at least two dimensional data to determine a preset number of clusters;

According to the clusters, counting the number and average amplitude of important data segments in the cluster corresponding to the target important data segment, and the cluster distance between the target important data segment and the cluster center of the corresponding cluster;

Inputting the number and average amplitude of important data segments in the cluster corresponding to the target important data segment, as well as the cluster distance between the target important data segment and the cluster center of the corresponding cluster, into the discrete degree calculation formula to calculate the discrete degree of the target important data segment in the corresponding cluster.

5. A method for optimizing the collection and analysis of gas cabinet sensor data according to claim 4, characterized in that the number and average amplitude of the important data segments in the cluster corresponding to the target important data segment, and the cluster distance between the target important data segment and the cluster center of the corresponding cluster, are input into a discrete degree calculation formula to calculate the discrete degree of the target important data segment in the corresponding cluster, specifically by a calculation formula whose quantities are: the degree of discreteness of the target important data segment in its cluster; the importance of the target important data segment in its cluster; the number of important data segments in the cluster; the amplitude of each data point in the target important data segment; the average amplitude of the important data segments in the cluster; the number of data points in the target important data segment; the cluster distance between the target important data segment and the cluster center of its cluster; and an exponential function.

6. A method for optimizing the collection and analysis of gas cabinet sensor data according to claim 5, characterized in that the step of confirming the convergence degree corresponding to the important data segments of each dimensional data according to the importance and discreteness corresponding to the important data segments of each dimensional data specifically includes:

Counting the sum of the amplitudes of the data points in the target important data segment and the sum of the amplitudes of the data points in the dimensional data corresponding to the target important data segment;

Calculating the constraint weight corresponding to the target important data segment based on the importance and discreteness corresponding to the target important data segment, the sum of the amplitudes of the data points in the target important data segment, and the sum of the amplitudes of the data points in the dimensional data corresponding to the target important data segment;

The constraint weight corresponding to the target important data segment and the amplitude mean of the dimensional data corresponding to the target important data segment are input into the convergence degree calculation formula to calculate the convergence degree corresponding to the target important data segment.

7. A method for optimizing the collection and analysis of gas cabinet sensor data according to claim 6, characterized in that the constraint weight corresponding to the target important data segment is calculated based on the importance corresponding to the target important data segment, the discreteness, the sum of the amplitudes of the data points in the target important data segment, and the sum of the amplitudes of the data points in the dimensional data corresponding to the target important data segment, specifically by a calculation formula whose quantities are: the constraint weight of the target important data segment in its cluster; the degree of discreteness of the target important data segment in its cluster; the importance of the target important data segment in its cluster; the amplitude of each data point in the target important data segment of the dimensional data; the number of data points in the target important data segment; the amplitude of each data point of the dimensional data corresponding to the target important data segment; and the number of data points of that dimensional data.

8. A method for optimizing the collection and analysis of gas cabinet sensor data according to claim 7, characterized in that the constraint weight corresponding to the target important data segment and the amplitude mean of the dimensional data corresponding to the target important data segment are input into a convergence degree calculation formula to calculate the convergence degree corresponding to the target important data segment, specifically by a calculation formula whose quantities are: the degree of convergence of the target important data segment in its cluster; the constraint weight of the target important data segment in its cluster; and the amplitude mean of the dimensional data corresponding to the target important data segment.

9. A method for optimizing the collection and analysis of gas cabinet sensor data according to any one of claims 1 to 8, characterized in that the time series data corresponding to the odor data in the gas cabinet is collected to confirm the dimensional data corresponding to at least two dimensions according to the time series data, specifically including:

Collecting time series data corresponding to the odor data in the gas cabinet, denoising the time series data according to a preset denoising algorithm, and confirming the final time series data;

According to the final time series data, dimensional data corresponding to at least two dimensions are confirmed.