
CN119357642B - Embodied Agent Time Series Data Modeling Method Based on Frequency Domain Learning - Google Patents

Embodied Agent Time Series Data Modeling Method Based on Frequency Domain Learning

Info

Publication number
CN119357642B
CN119357642B (application CN202411899587.4A)
Authority
CN
China
Prior art keywords
data
feature
features
matrix
frequency domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411899587.4A
Other languages
Chinese (zh)
Other versions
CN119357642A (en)
Inventor
吴瀚鹏
张希
苏双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Original Assignee
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Nanjing Artificial Intelligence Innovation Research Institute filed Critical Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Priority to CN202411899587.4A priority Critical patent/CN119357642B/en
Publication of CN119357642A publication Critical patent/CN119357642A/en
Application granted granted Critical
Publication of CN119357642B publication Critical patent/CN119357642B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a modeling method for embodied agent time series data based on frequency domain learning, which comprises the steps of: adaptively partitioning the original time series into blocks, extracting features through multi-scale convolution and generating position codes; applying a Fourier transform to the feature blocks to extract frequency-domain information and performing nonlinear transformation and residual compensation; calculating the attention relations among the features and performing feature fusion; constructing a temporal dependency graph and extracting temporal aggregation features and dynamic system features; and constructing a probabilistic prediction model and calculating the uncertainty of the prediction results. The invention improves modeling accuracy through adaptive blocking and frequency-domain feature extraction, enhances feature expression through the attention mechanism and graph-structure modeling, improves prediction reliability through uncertainty quantification, and is suitable for time series data modeling of an embodied agent in complex environments.

Description

Embodied agent time series data modeling method based on frequency domain learning
Technical Field
The invention belongs to the field of data modeling, and particularly relates to an embodied agent time series data modeling method based on frequency domain learning.
Background
Autonomous decision-making and behavior control of an embodied agent in a complex environment require accurate modeling and prediction of time series data. As industrial production becomes increasingly intelligent and automated, the time series data that an embodied agent must process are high-dimensional, strongly coupled, multi-scale and non-stationary. Accurately capturing the dynamic characteristics, periodic patterns and long-term dependencies in such complex time series data is important for improving the agent's environment perception, optimizing decision efficiency and ensuring system stability. In fields such as industrial manufacturing, intelligent robotics and automatic control in particular, a time series modeling method based on frequency domain learning can effectively extract the frequency characteristics and periodic patterns in the data and provide a reliable basis for the agent's behavioral decisions.
Currently, time series data modeling mainly relies on deep learning methods such as recurrent neural networks (RNN), long short-term memory networks (LSTM) and gated recurrent units (GRU). These methods capture the dependencies of time series data by constructing complex network structures, but suffer from vanishing gradients and low computational efficiency when processing long sequences. Traditional frequency-domain analysis methods such as the Fourier transform and the wavelet transform can extract the frequency characteristics of a signal, but have difficulty handling non-stationary signals and capturing complex temporal dependencies. Although the attention-based Transformer model alleviates the long-range dependency problem to some extent, the computational complexity of self-attention grows quadratically with sequence length, which limits its application to long sequences.
However, existing time series modeling methods still face several technical problems. First, in the data preprocessing stage, a fixed blocking strategy cannot adaptively adjust the window size according to the local characteristics of the data, so important temporal features may be split or mixed. Second, existing frequency-domain feature extraction methods often ignore the importance weights of frequency components, and a uniform frequency selection criterion may discard key frequency information. Third, in feature fusion, simple feature concatenation or weighted averaging cannot fully account for the correlation and complementarity between different features, which reduces the efficiency of feature expression. Fourth, traditional graph-structure modeling methods usually build the temporal dependency graph with a fixed similarity metric, which is difficult to adapt to different types of time series data. Finally, in terms of reliability assessment of prediction results, existing methods lack a systematic quantification of prediction uncertainty and can hardly provide a reliability assessment for the embodied agent's decision-making. These problems severely restrict the performance and adaptability of an embodied agent in complex environments.
Disclosure of Invention
The invention aims to provide an embodied agent time series data modeling method based on frequency domain learning, so as to solve at least one technical problem existing in the prior art.
According to the technical scheme, the embodied agent time series data modeling method based on frequency domain learning comprises the following steps:
S1, acquiring an original time series data sequence, determining an adaptive block size sequence according to the temporal relation between adjacent data points, partitioning the original time series into data blocks based on the adaptive block size sequence, extracting features from the data blocks with a multi-scale convolution method, and generating corresponding position coding information to obtain a multi-scale feature block set and a position coding matrix;
S2, performing a Fourier transform on each feature block in the multi-scale feature block set to extract frequency-domain information, analyzing the signal amplitude in the frequency-domain information to determine frequency selection parameters, and performing nonlinear transformation and residual compensation on the frequency-domain information based on the frequency selection parameters to obtain a frequency-domain feature matrix and a residual feature matrix;
S3, calculating the attention relations among the features based on the frequency-domain feature matrix, the residual feature matrix and the position coding matrix, and fusing the features to obtain a fusion feature matrix;
S4, constructing a temporal dependency graph based on the fusion feature matrix, extracting temporal aggregation features from the graph structure of the temporal dependency graph, and calculating the dynamic characteristic parameters of the system to obtain dynamic system features;
S5, constructing a probabilistic prediction model based on the temporal aggregation features and the dynamic system features, and calculating the uncertainty of the prediction results to obtain a prediction distribution and confidence indices.
The method achieves efficient feature extraction from time series data while preserving the integrity and accuracy of the feature representation; it intelligently integrates different types of features and characterizes complex temporal dependencies in depth; it provides a reliability assessment of the prediction results; it improves the accuracy and efficiency of time series data modeling and enhances the reliability of the predictions, providing strong support for state prediction and decision optimization of complex systems; and through multi-level feature extraction and fusion it realizes an effective conversion from data to knowledge, offering reliable technical support for intelligent decision-making.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a flowchart of step S1 of the present invention.
Fig. 3 is a flowchart of step S2 of the present invention.
Fig. 4 is a flowchart of step S3 of the present invention.
Fig. 5 is a flowchart of step S4 of the present invention.
Fig. 6 is a flowchart of step S5 of the present invention.
Detailed Description
As shown in Fig. 1, the invention provides an embodied agent time series data modeling method based on frequency domain learning, comprising the following steps:
S1, acquiring an original time series data sequence, determining an adaptive block size sequence according to the temporal relation between adjacent data points, partitioning the original time series into data blocks based on the adaptive block size sequence, extracting features from the data blocks with a multi-scale convolution method, and generating corresponding position coding information to obtain a multi-scale feature block set and a position coding matrix;
S2, performing a Fourier transform on each feature block in the multi-scale feature block set to extract frequency-domain information, analyzing the signal amplitude in the frequency-domain information to determine frequency selection parameters, and performing nonlinear transformation and residual compensation on the frequency-domain information based on the frequency selection parameters to obtain a frequency-domain feature matrix and a residual feature matrix;
S3, calculating the attention relations among the features based on the frequency-domain feature matrix, the residual feature matrix and the position coding matrix, and fusing the features to obtain a fusion feature matrix;
S4, constructing a temporal dependency graph based on the fusion feature matrix, extracting temporal aggregation features from the graph structure of the temporal dependency graph, and calculating the dynamic characteristic parameters of the system to obtain dynamic system features;
S5, constructing a probabilistic prediction model based on the temporal aggregation features and the dynamic system features, and calculating the uncertainty of the prediction results to obtain a prediction distribution and confidence indices.
As shown in Fig. 2, according to an aspect of the present application, step S1 further comprises:
S11, reading an original time sequence from a database, performing inner product operation on data of adjacent time points to obtain an inner product result, adding 1 to the inner product result, calculating power operation to obtain an operation result, constructing similarity degree data of the adjacent time points based on the operation result, determining the sizes of data blocks according to the similarity degree data, and obtaining a self-adaptive block size sequence;
s12, constructing convolution kernel parameters with different sizes based on the self-adaptive block size sequence and the original time sequence data sequence, performing convolution operation on each data block based on the convolution kernel parameters to obtain a convolution result, and combining the convolution results in the channel dimension to obtain a multi-scale feature block set;
s13, obtaining time index information of data blocks in the multi-scale feature block set, calculating a sine function value to serve as an initial position code, constructing position query data, calculating a correlation score of the position query data and the initial position code, and carrying out normalization processing on the correlation score to generate a position code matrix.
In one embodiment of the application, an original time series data sequence X ∈ R^(N×D) is obtained (N is the sequence length and D is the feature dimension), a similarity matrix S ∈ R^(N×N) of adjacent time points is calculated based on the polynomial kernel function K(x, y) = (1 + x^T y)^d, and an adaptive block size sequence L = [l_1, l_2, ..., l_k] is output, where l_i denotes the size of the i-th block. A relative position code is generated for each block, P_i = sin(ω·t + φ), where t is the time index and ω and φ are learnable parameters; global position attention is computed as A = softmax(Q·K^T / sqrt(d))·V, where Q, K, V are the position query, key and value matrices, and the position coding matrix P is output.
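The following is a minimal NumPy sketch of this embodiment: a polynomial-kernel similarity between adjacent time points drives a greedy split into adaptive blocks, and sinusoidal codes P_i = sin(ω·t + φ) are generated per block. The similarity threshold, the greedy splitting rule and the fixed ω, φ values are illustrative assumptions; in the full method ω and φ are learnable and the split is derived from the similarity matrix S.

```python
import numpy as np

def adaptive_block_sizes(X, degree=2, threshold=0.5, max_block=64):
    """Split a series X of shape (N, D) into blocks using the polynomial kernel
    K(x, y) = (1 + x^T y)^d between adjacent time points; the threshold-based
    greedy split is an illustrative choice."""
    N = X.shape[0]
    sim = np.array([(1.0 + X[i] @ X[i + 1]) ** degree for i in range(N - 1)])
    sim = (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)   # normalize to [0, 1]
    sizes, current = [], 1
    for s in sim:
        # start a new block when adjacent similarity drops or the block is full
        if s < threshold or current >= max_block:
            sizes.append(current)
            current = 1
        else:
            current += 1
    sizes.append(current)
    return sizes                                               # L = [l_1, ..., l_k]

def positional_codes(block_sizes, d_model=16, omega=1.0, phi=0.0):
    """Relative position codes P_i = sin(omega * t + phi) per block; omega and
    phi would be learnable parameters in the full model."""
    codes = []
    for start, size in zip(np.cumsum([0] + block_sizes[:-1]), block_sizes):
        t = np.arange(start, start + size)[:, None]            # time indices of the block
        freq = omega / (10000 ** (np.arange(d_model)[None, :] / d_model))
        codes.append(np.sin(freq * t + phi))
    return np.concatenate(codes, axis=0)                       # (N, d_model)

X = np.random.randn(128, 8)
sizes = adaptive_block_sizes(X)
P = positional_codes(sizes)
print(len(sizes), P.shape)
```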
The embodiment realizes the efficient feature extraction of time sequence data through a combination method of multi-scale convolution and self-adaptive blocking. Firstly, the block size is adaptively determined based on the similarity relation of adjacent time points, the problem of information loss or redundancy possibly caused by the fixed window size is avoided, and the similarity degree is calculated by using inner product operation and power transformation, so that a nonlinear relation mode of data can be effectively captured. And secondly, a multi-scale convolution processing method is adopted, and feature information of a plurality of scales is simultaneously extracted through convolution kernels with different sizes, so that local fine features are reserved, and a global time sequence dependency relationship is captured. Finally, the problem of position information loss in the traditional time sequence modeling is solved by introducing a position coding matrix, so that the model can sense the relative position relation between different time points. The embodiment improves the recognition capability of the model to different time scale modes and reduces the calculation complexity.
According to one aspect of the present application, step S11 further comprises:
S111, acquiring an original time sequence data sequence from a database, segmenting the original time sequence data sequence by adopting a sliding window method, removing outliers in each segment of data to obtain segmented data, performing Z-score standardization processing on the segmented data to obtain standardized time sequence data, and applying wavelet transformation to the standardized time sequence data to remove high-frequency noise to obtain noise reduction standardized data and time sequence window data.
S112, calculating inner products of adjacent time window data based on noise reduction standardized data and time sequence window data to obtain inner product data, adding 1 to the inner product data to obtain offset inner product data, performing power operation on the offset inner product data to obtain power characteristic data, calculating a pearson correlation coefficient of the power characteristic data, mapping the pearson correlation coefficient to a high-dimensional characteristic space by adopting a kernel function method, and reducing characteristic dimensions by adopting a local sensitive hash method to obtain time sequence correlation characteristics;
S113, determining a feature clustering center by adopting a self-adaptive neighborhood method based on time sequence related features and power feature data, calculating the Mahalanobis distance between each time point and the feature clustering center, constructing a sparse adjacency matrix based on the Mahalanobis distance and the power feature data, and optimizing the adjacency relationship by adopting a spectral clustering method to obtain a similarity matrix;
S114, calculating the change trend of the similarity by adopting a dynamic programming method based on the similarity matrix, constructing an adaptive threshold based on the change trend and power characteristic data, dividing standardized time sequence data into initial data blocks by utilizing the adaptive threshold, and performing block size optimization on the initial data blocks by adopting a hierarchical clustering method to finally obtain an adaptive block size sequence.
In one embodiment of the application, the Z-score normalization algorithm is Z(x) = (x − μ)/σ + α·sign(x − μ)·log(1 + |x − μ|/β), where Z(x) is the normalized data, x is the raw data, μ is the sequence mean, σ is the sequence standard deviation, α is a nonlinear adjustment coefficient, β is a smoothing parameter, sign is the sign function, log is the natural logarithm, and |·| denotes the absolute value.
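A minimal NumPy sketch of this modified Z-score, assuming the statistics μ and σ are taken over the whole segment:

```python
import numpy as np

def modified_zscore(x, alpha=0.1, beta=1.0):
    """Z(x) = (x - mu)/sigma + alpha * sign(x - mu) * log(1 + |x - mu|/beta)."""
    mu, sigma = x.mean(), x.std() + 1e-8
    dev = x - mu
    return dev / sigma + alpha * np.sign(dev) * np.log1p(np.abs(dev) / beta)

x = np.random.randn(256) * 3 + 5
z = modified_zscore(x)
print(z.mean(), z.std())
```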
Wavelet-transform noise reduction is performed by W(a, b) = Σ_t [x(t)·ψ((t − b)/a)] + λ·soft(|W(a, b)|, τ(a)), where W(a, b) is the wavelet coefficient, x(t) is the input signal, ψ is the wavelet basis function, a is the scale parameter, b is the translation parameter, λ is a regularization coefficient, soft is the soft-threshold function, τ(a) is a scale-adaptive threshold, and t is the time index.
The kernel function is K(x_i, x_j) = exp(−||φ(x_i) − φ(x_j)||²/(2σ²)) + α·cos(ω·<x_i, x_j>), where K(x_i, x_j) is the kernel mapping result representing the similarity of two feature vectors in the high-dimensional space, φ(x) is the feature mapping function that maps input features to the high-dimensional space, x_i and x_j are the feature vectors of adjacent time windows, σ is the kernel bandwidth parameter obtained by cross-validation, α is the weight coefficient of the cosine kernel term, used to enhance the extraction of periodic features, ω is the frequency parameter of the cosine kernel, adaptively adjusted according to the periodic characteristics of the data, ||·|| is the Euclidean norm, and <·,·> is the vector inner product.
The locality-sensitive hashing algorithm is H(x) = sign(W·x + b)·exp(−||W·x + b||²/γ), where H(x) is the hash code, W is a random projection matrix, x is the input feature, b is a bias vector, γ is a Gaussian kernel parameter, sign is the sign function, ||·|| denotes the Euclidean norm, and · denotes matrix multiplication.
The hierarchical clustering algorithm is D(C_i, C_j) = θ·min_dist(C_i, C_j) + (1 − θ)·avg_dist(C_i, C_j), where D(C_i, C_j) is the inter-cluster distance, C_i and C_j are data clusters, min_dist is the minimum-distance function, avg_dist is the average-distance function, and θ is a weight coefficient; the cluster merge criterion is merge(C_i, C_j) = argmin{D(C_i, C_j) + ρ·|size(C_i) − size(C_j)|}, where ρ is a balance coefficient.
The adaptive blocking algorithm is S(t, τ) = β·exp(−||f(t) − f(t + τ)||²/λ²) + γ·|R(t, τ)|, where S(t, τ) is the similarity score between time points t and t + τ, f(t) is the feature vector at time t, τ is the time offset, R(t, τ) is the autocorrelation function of the time series, β is the distance-term weight coefficient, γ is the autocorrelation-term weight coefficient, λ is a distance scale parameter, ||·|| denotes the Euclidean norm, and |·| is the absolute value.
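A minimal NumPy sketch of the adaptive blocking similarity score S(t, τ); the toy feature vectors f(t) (the raw value and its gradient) are an illustrative assumption:

```python
import numpy as np

def autocorr(x, tau):
    """Sample autocorrelation R(t, tau) of a 1-D series at lag tau."""
    x = x - x.mean()
    denom = (x * x).sum() + 1e-12
    return (x[:-tau] * x[tau:]).sum() / denom if tau > 0 else 1.0

def similarity_score(F, x, t, tau, beta=1.0, gamma=0.5, lam=1.0):
    """S(t, tau) = beta*exp(-||f(t)-f(t+tau)||^2 / lambda^2) + gamma*|R(t, tau)|,
    with F the per-time-point feature vectors and x the raw series."""
    dist = np.sum((F[t] - F[t + tau]) ** 2)
    return beta * np.exp(-dist / lam ** 2) + gamma * abs(autocorr(x, tau))

x = np.sin(np.linspace(0, 20 * np.pi, 400)) + 0.1 * np.random.randn(400)
F = np.stack([x, np.gradient(x)], axis=1)   # toy feature vectors f(t)
print(similarity_score(F, x, t=10, tau=5))
```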
The embodiment realizes high-quality feature extraction of the original time sequence data through adaptive data preprocessing and similarity calculation. First, the combination of sliding window and Z-score normalization effectively eliminates the scale difference and outlier effects of the data. The noise reduction processing is carried out through wavelet transformation, so that the main characteristics of data are reserved, and the interference of random noise is restrained. Secondly, when calculating the similarity of adjacent time windows, the capturing capability of nonlinear correlation is enhanced through the combination of inner product operation and power transformation. Relevant features are projected to a high-dimensional space through kernel function mapping, and the expression capacity of the features is improved. And finally, performing feature dimension reduction by using a local sensitive hash method, and reducing the computational complexity while maintaining the data similarity. The embodiment not only improves the accuracy of subsequent analysis, but also realizes the optimization of calculation efficiency through dimension reduction and feature enhancement, and lays a solid foundation for modeling of time sequence data.
According to one aspect of the present application, step S12 further comprises:
S121, acquiring an adaptive block size sequence and an original time sequence in a database, applying stationarity test to each time sequence data block, generating data block statistical characteristics, adaptively adjusting the receptive field size of each convolution kernel according to the statistical characteristics, generating convolution kernel weights of different scales by adopting a Glorot initialization method, and sparsity constraint is carried out on the convolution kernels based on the data distribution characteristics to obtain multi-scale convolution kernel parameters.
S122, acquiring multi-scale convolution kernel parameters, a self-adaptive block size sequence and an original time sequence data sequence, performing one-dimensional convolution operation on each data block to obtain a basic feature map, performing nonlinear transformation by adopting an exponential linear unit to obtain an activated feature map, extracting remarkable features by using maximum pooling operation to obtain a pooled feature map, and performing batch normalization processing to obtain a normalized feature map.
S123, obtaining a normalized feature map, calculating feature weights by adopting a channel attention mechanism to obtain channel weighted features, extracting multi-scale space information by using a space pyramid pooling method to obtain pyramid features, carrying out self-adaptive weighted combination on the features of all scales in the channel dimension, and reserving original information by applying residual connection to obtain a multi-scale feature block set.
In one embodiment of the application, the convolution kernel parameters are adaptively adjusted by W(l, k) = η·tanh(V(l, k)) + ρ·sigmoid(U(l, k))·M(l), where W(l, k) is the parameter matrix of the k-th convolution kernel of layer l, V(l, k) is the basic parameter matrix, U(l, k) is the modulation parameter matrix, M(l) is a layer-wise mask matrix, η is the base weight coefficient, ρ is the modulation strength coefficient, tanh is the hyperbolic tangent function, and sigmoid is the S-shaped activation function.
The spatial pyramid pooling algorithm is P(x, l) = Σ_{i=1..4^l} [pool(x, R_i(l))·exp(−d(R_i(l), c)/η)], where P(x, l) is the pooling result of layer l, x is the input feature, pool is the pooling function, R_i(l) is the i-th region of layer l, d(R, c) is the distance from region R to the center c, η is a distance attenuation parameter, and 4^l is the number of regions of layer l.
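A minimal NumPy sketch of the distance-weighted pyramid pooling for a one-dimensional feature sequence; the equal-width regions and the use of max pooling are illustrative assumptions:

```python
import numpy as np

def pyramid_pool_1d(x, level, eta=4.0):
    """P(x, l) = sum_i pool(x, R_i(l)) * exp(-d(R_i(l), c)/eta) for a 1-D
    feature sequence x; regions are 4^l equal slices, pool is max-pooling."""
    n_regions = 4 ** level
    regions = np.array_split(x, n_regions)
    center = len(x) / 2.0
    out, start = 0.0, 0
    for r in regions:
        region_center = start + len(r) / 2.0
        weight = np.exp(-abs(region_center - center) / eta)   # decay with distance to sequence center
        out += r.max() * weight                                # distance-weighted pooled value
        start += len(r)
    return out

x = np.random.randn(256)
print([pyramid_pool_1d(x, l) for l in range(3)])   # one pooled value per pyramid level
```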
The embodiment realizes the comprehensive characterization of time sequence data through multi-scale convolution feature extraction. Firstly, the size of the receptive field of the convolution kernel is adaptively adjusted based on the stability test result of the data block, so that information loss possibly caused by a fixed scale is avoided. By means of Glorot initialization and sparsity constraint combination, optimal configuration of convolution kernel parameters is achieved, and feature extraction efficiency and robustness are improved. And secondly, nonlinear transformation is carried out by adopting an exponential linear unit, so that the expression capacity of the model on complex modes is enhanced. And significant features are extracted through the maximum pooling operation, so that the data redundancy is reduced. Finally, through the combination of a channel attention mechanism and space pyramid pooling, the self-adaptive fusion of the multi-scale features is realized, local detail information is reserved, and a global time sequence dependency relationship is captured. The embodiment improves the recognition capability of the model to different scale modes, and improves the expression efficiency of the features through feature fusion and attention mechanisms.
According to one aspect of the present application, step S13 further comprises:
s131, extracting time index information of the data blocks based on the multi-scale feature block set, adopting polynomial interpolation to calculate a continuous time function to obtain a continuous time sequence, decomposing the continuous time sequence by using a wavelet basis function to obtain a time feature component;
S132, calculating a periodic sine function value based on layered time characteristics and initial time codes to obtain initial position codes, applying adaptive frequency modulation based on the initial position codes to obtain modulation position codes, calculating dot products among codes based on the modulation position codes and the initial position codes to obtain initial correlation scores, and obtaining multi-scale position codes based on the initial correlation scores and prestored time scale information;
S133, constructing a position attention query matrix based on the multi-scale position codes and the initial correlation score to obtain position query characteristics, calculating the correlation between the position query characteristics and the position codes to obtain position correlation scores, carrying out normalization processing on the position correlation scores to obtain normalized correlation scores, and applying kernel function transformation based on the normalized correlation scores to obtain nucleated position characteristics;
S134, performing spectrum decomposition based on the nucleated position features and the normalized correlation score to obtain a characteristic spectrum matrix, calculating the importance of the characteristic values based on the characteristic spectrum matrix and performing dimension reduction processing to obtain dimension reduction position features, performing weighting processing by combining the dimension reduction position features and the normalized correlation score to obtain weighted position features, and obtaining a position coding matrix through orthogonalization processing based on the weighted position features.
In one embodiment of the application, the multi-scale time characterization is T(t, s) = μ·sin(2πt/s) + ν·cos(2πt/s) + θ·wavelet(t, s), where T(t, s) is the characterization vector of time point t at scale s, wavelet(t, s) is the wavelet transform result, μ is the sine-term weight, ν is the cosine-term weight, θ is the wavelet-term weight, t is the time index, s is the time scale parameter, and π is the circle constant.
The adaptive frequency modulation algorithm is F(t, ω) = α·sin(ω·t + φ(t)) + β·cos(ω·t + φ(t))·exp(−|dφ(t)/dt|/μ), where F(t, ω) is the modulated signal, ω is the fundamental frequency, φ(t) is the phase function, α and β are modulation coefficients, μ is a phase-change-rate attenuation parameter, dφ(t)/dt is the phase change rate, and t is the time variable.
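A minimal NumPy sketch of the adaptive frequency modulation, with the phase change rate dφ/dt approximated numerically and a toy phase function standing in for the learned φ(t):

```python
import numpy as np

def adaptive_frequency_modulation(t, omega, phi, alpha=1.0, beta=0.5, mu=1.0):
    """F(t, omega) = alpha*sin(omega*t + phi(t))
                   + beta*cos(omega*t + phi(t)) * exp(-|dphi/dt| / mu),
    where phi is a phase function sampled on the time grid t."""
    dphi_dt = np.gradient(phi, t)                      # numerical phase change rate
    damping = np.exp(-np.abs(dphi_dt) / mu)            # attenuate fast-varying phase
    return alpha * np.sin(omega * t + phi) + beta * np.cos(omega * t + phi) * damping

t = np.linspace(0, 10, 500)
phi = 0.3 * np.sin(0.5 * t)                            # toy phase function
signal = adaptive_frequency_modulation(t, omega=2 * np.pi, phi=phi)
print(signal.shape)
```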
The embodiment realizes accurate modeling of time-lapse position information through advanced position coding and adaptive correlation calculation. Firstly, a combination method of polynomial interpolation and wavelet decomposition is adopted to construct continuous time function representation, and the problem of information loss caused by discrete sampling is solved. Through multi-scale time characterization, the description of the position relationship on different time scales is realized. And secondly, the periodic sine function of the adaptive frequency modulation is used, so that the expression capacity of the position coding is enhanced, and the model can better sense the remote dependency relationship. Through spectrum decomposition and eigenvalue importance analysis, dimension reduction optimization of the position features is realized, key position information is reserved, and calculation complexity is reduced. Finally, through kernel function transformation and orthogonalization processing, the distinguishing capability of the position codes is improved. According to the embodiment, the understanding capability of the model on the time-sequence dependency relationship is improved, and the improvement of the calculation efficiency is realized through dimension optimization and feature enhancement.
As shown in Fig. 3, according to an aspect of the present application, step S2 further comprises:
s21, acquiring a multi-scale feature block set, performing Fourier transform operation on feature block data to obtain a Fourier transform result, calculating the amplitude of the transformed signal based on the Fourier transform result to obtain amplitude data, generating frequency selection parameters through multi-layer perceptron operation and sigmoid function based on the amplitude data, multiplying the frequency selection parameters with the Fourier transform result to obtain a frequency domain feature matrix;
s22, based on the multi-scale feature block set and the frequency domain feature matrix, performing polynomial expansion operation on the feature block data by using polynomial coefficients to obtain a polynomial expansion result, and performing difference operation on the polynomial expansion result and the frequency domain feature matrix to obtain a residual feature matrix.
In one embodiment of the application, a block-level Fourier transform F_i(ω) = ∫ B_i(t)·e^(−jωt) dt is calculated based on the multi-scale feature block set B, where B_i(t) is the i-th feature block in the set, representing a time-domain signal, e^(−jωt) is the complex exponential representing the frequency content in the Fourier transform, j is the imaginary unit, ω is the angular frequency, t is time, and dt is the integration variable. An adaptive frequency selection mechanism is introduced, α(ω) = sigmoid(MLP(|F(ω)|)), where sigmoid is the activation function, MLP(·) is a nonlinear transformation function and F(ω) is the frequency-domain feature; the frequency-domain feature matrix F is output, where each element F[i, j] represents the feature of the i-th block at frequency j. Nonlinear feature extraction uses a Chebyshev polynomial expansion, g(x) = Σ_i c_i·T_i(x), where g(x) is the nonlinear feature extraction function, c_i is a coefficient of the Chebyshev expansion and T_i(x) is the i-th Chebyshev polynomial serving as a basis function; the high-frequency compensation term is calculated as H = g(B) − F, and the residual feature matrix R is output.
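A minimal NumPy sketch of this embodiment: a block-level FFT, the adaptive gate α(ω) = sigmoid(MLP(|F(ω)|)) realized with a tiny untrained two-layer perceptron, and a Chebyshev-expansion residual. Forming the residual in the time domain via an inverse FFT, the random MLP weights and the fixed expansion coefficients are illustrative assumptions:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def frequency_domain_features(B, hidden=32, rng=np.random.default_rng(0)):
    """FFT per feature block, adaptive frequency gating, and a Chebyshev residual."""
    F = np.fft.rfft(B, axis=-1)                       # block-level Fourier transform
    amp = np.abs(F)
    # one-hidden-layer MLP on the amplitude spectrum (weights untrained here)
    W1 = 0.1 * rng.normal(size=(amp.shape[-1], hidden))
    W2 = 0.1 * rng.normal(size=(hidden, amp.shape[-1]))
    gate = 1.0 / (1.0 + np.exp(-(np.tanh(amp @ W1) @ W2)))    # sigmoid(MLP(|F|))
    F_sel = gate * F                                  # frequency-domain feature matrix
    # nonlinear expansion g(x) = sum_i c_i T_i(x) over the Chebyshev basis
    coeffs = np.array([0.0, 1.0, 0.5, 0.25, 0.125])
    g = C.chebval(np.clip(B, -1, 1), coeffs)
    residual = g - np.fft.irfft(F_sel, n=B.shape[-1], axis=-1)  # high-frequency compensation
    return F_sel, residual

B = np.random.randn(16, 64)                           # 16 feature blocks of length 64
F_sel, R = frequency_domain_features(B)
print(F_sel.shape, R.shape)
```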
The embodiment realizes the comprehensive characterization of time sequence data through a dual mechanism of frequency domain transformation and residual error compensation. In the frequency domain feature extraction process, a time domain signal is firstly converted into a frequency domain space through Fourier transformation, and then frequency selection parameters are adaptively determined based on signal amplitude, so that a periodic mode and frequency features in data can be effectively extracted. Through the combination of the multi-layer perceptron and the sigmoid function, the self-adaptive optimization of the frequency selection parameters is realized, and the importance weight of the frequency components can be dynamically adjusted by the model according to the data characteristics. Meanwhile, nonlinear characteristics and detail information possibly lost in frequency domain transformation are captured through polynomial expansion and residual error compensation mechanisms, and complete representation of original data is formed. The embodiment not only ensures the accurate capture of the periodic mode, but also can not lose important aperiodic features, thereby improving the generalization capability of the model.
According to one aspect of the present application, step S21 further comprises:
S211, acquiring a multi-scale feature block set, and windowing each feature block by applying a Hanning window function to obtain windowed feature data; calculating an autocorrelation function of the windowed feature data to obtain a correlation sequence; based on the related sequence, estimating the periodic characteristics of the signals to obtain periodic characteristic data;
S212, performing fast Fourier transform based on the windowed feature data and the periodic feature data to obtain an initial frequency spectrum, estimating the power spectrum density by adopting a Welch method based on the initial frequency spectrum to obtain power spectrum data, and performing self-adaptive smoothing on the initial frequency spectrum based on the power spectrum data and the periodic feature data to obtain a smooth frequency spectrum;
S213, calculating a frequency importance score through a multi-layer perceptron based on smooth frequency spectrum and power spectrum data to obtain a frequency weight, normalizing by applying a sigmoid function based on the frequency weight to obtain a normalized weight, evaluating uncertainty of the normalized weight by adopting a Bootstrap method to obtain a weight confidence interval, and carrying out weight correction based on the weight confidence interval to obtain a frequency selection parameter;
S214, performing frequency domain filtering based on the smooth frequency spectrum and the frequency selection parameters to obtain a filtered frequency spectrum, applying inverse Fourier transform based on the filtered frequency spectrum to obtain a reconstructed time domain signal, calculating a reconstruction error based on the reconstructed time domain signal, and updating the frequency selection parameters to finally obtain a frequency domain feature matrix.
In one embodiment of the application, the Welch method is P(f) = (1/K)·Σ_{k=1..K} [|FFT(x_k·w)|²]·exp(−λ·var(|FFT(x_k·w)|²)), where P(f) is the power spectral density estimate, x_k is the k-th data segment, w is the window function, K is the number of data segments, FFT is the fast Fourier transform, var is the variance function, λ is a smoothing parameter, and |·| denotes the modulus.
The frequency importance score is I(f, t) = δ·softmax(MLP(P(f, t))) + ε·exp(−|dP(f, t)/dt|/ζ), where I(f, t) is the importance score of frequency f at time t, P(f, t) is the time-varying spectral density, MLP is a multi-layer perceptron, dP(f, t)/dt is the time derivative of the spectral density, δ is the static importance weight, ε is the dynamic importance weight, ζ is a time-varying scaling parameter, softmax is the normalized exponential function, and |·| denotes the absolute value.
The Bootstrap method is U(θ) = sqrt((1/B)·Σ_{b=1..B} [θ_b − θ_mean]²)·(1 + γ·skew(θ_b)), where U(θ) is the uncertainty estimate of the parameter θ, θ_b is the b-th Bootstrap sample estimate, θ_mean is their mean, B is the number of Bootstrap samples, skew is the skewness function, and γ is a skewness adjustment coefficient.
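A minimal NumPy sketch of the skewness-adjusted Bootstrap uncertainty estimate; using the mean as the resampled statistic is an illustrative assumption:

```python
import numpy as np

def bootstrap_uncertainty(samples, estimator=np.mean, B=500, gamma=0.1,
                          rng=np.random.default_rng(0)):
    """U(theta) = sqrt(mean_b (theta_b - theta_mean)^2) * (1 + gamma * skew(theta_b))."""
    n = len(samples)
    theta_b = np.array([estimator(rng.choice(samples, size=n, replace=True))
                        for _ in range(B)])
    centered = theta_b - theta_b.mean()
    spread = np.sqrt(np.mean(centered ** 2))
    skewness = np.mean((centered / (theta_b.std() + 1e-12)) ** 3)  # sample skewness of the estimates
    return spread * (1.0 + gamma * skewness)

weights = np.random.rand(200)          # e.g. frequency weights to be assessed
print(bootstrap_uncertainty(weights))
```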
The embodiment realizes the effective capture of the time sequence data periodic mode through frequency domain feature extraction and self-adaptive frequency selection. First, a hanning window function is used for windowing, so that the influence of spectrum leakage effect is reduced. The periodic characteristics of the evaluation signal are calculated through the autocorrelation function, so that a reliable reference basis is provided for frequency selection. And secondly, the Welch method is used for estimating the power spectrum density, so that the stability and the reliability of spectrum estimation are improved. Through the combination of the multi-layer perceptron and the sigmoid function, the self-adaptive evaluation of the frequency importance is realized, and the model can dynamically adjust the weight of the frequency component according to the data characteristics. Finally, the uncertainty of the weight is evaluated by a Bootstrap method, so that the robustness of frequency selection is improved. According to the embodiment, not only is the periodic mode of the data accurately captured, but also the expression efficiency of the features is improved through frequency selection optimization, and high-quality frequency domain features are provided for subsequent time sequence modeling.
According to one aspect of the present application, step S22 further comprises:
S221, acquiring a multi-scale feature block set, calculating Chebyshev polynomial coefficients of the feature blocks to obtain a polynomial coefficient matrix, constructing an orthogonal basis function based on the coefficients to obtain a feature basis function, and performing feature expansion by using the basis function to obtain a polynomial expansion result.
S222, obtaining a polynomial expansion result and a frequency domain feature matrix, calculating a difference value operation of the polynomial expansion result and the frequency domain feature matrix to obtain a difference value feature matrix, extracting significant differences by sparse coding to obtain sparse difference features, and obtaining key difference features by self-adaptive threshold screening.
S223, acquiring key difference features and a difference feature matrix, constructing a nonlinear mapping network to obtain transformation features, applying a dynamic weighting method to obtain weighted features, and combining original features to obtain combined features.
S224, acquiring a combined feature and a difference feature matrix, performing feature selection to obtain a preferred feature, applying residual correction to obtain a correction feature, and integrating all the features to obtain a residual feature matrix.
In one embodiment of the application, the Chebyshev polynomial expansion is C(x, n) = Σ_{k=0..n} [ψ_k·T_k(x)]·exp(−k²/κ), where C(x, n) is the n-th order expansion result, T_k(x) is the k-th order Chebyshev polynomial, ψ_k is the k-th order expansion coefficient, x is the input feature, n is the expansion order, κ is an attenuation parameter, Σ denotes summation, and k is the polynomial order index.
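A minimal NumPy sketch of the damped Chebyshev expansion, with placeholder coefficients ψ_k standing in for learned values:

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval

def chebyshev_expansion(x, order=6, kappa=4.0, psi=None):
    """C(x, n) = sum_{k=0..n} psi_k * T_k(x) * exp(-k^2 / kappa)."""
    if psi is None:
        psi = np.ones(order + 1)                                    # placeholder expansion coefficients
    decayed = psi * np.exp(-np.arange(order + 1) ** 2 / kappa)      # damp high-order terms
    return chebval(np.clip(x, -1.0, 1.0), decayed)                  # Chebyshev basis needs x in [-1, 1]

x = np.random.randn(16, 64)
print(chebyshev_expansion(x).shape)
```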
The embodiment realizes the accurate depiction of the nonlinear time sequence mode through polynomial expansion and residual feature extraction. Firstly, a combination method of Chebyshev polynomial coefficient calculation and orthogonal basis function construction is adopted, a complete basis function set is provided for feature expansion, and the integrity and orthogonality of feature expression are ensured. The significant difference features are extracted through the sparse coding technology, so that the data redundancy is effectively reduced, and the key mode information is highlighted. And secondly, the transformation characteristics are processed by using a nonlinear mapping network, so that the expression capacity of the model on complex nonlinear relations is enhanced. The self-adaptive combination is carried out on the characteristics by a dynamic weighting method, so that the intelligent adjustment of the contribution degrees of different characteristics is realized. Finally, a dual mechanism of feature selection and residual error correction is adopted, so that the representativeness of the features is ensured, and possibly lost detail information is supplemented. The embodiment not only improves the capturing capability of the nonlinear mode, but also improves the accuracy and the integrity of the feature expression through feature selection and residual error compensation.
As shown in Fig. 4, according to an aspect of the present application, step S3 further comprises:
S31, carrying out linear transformation on the feature data based on the frequency domain feature matrix and the residual feature matrix to obtain a query vector, a key vector and a value vector, calculating a similarity score of the query vector and the key vector, carrying out normalization processing on the similarity score to obtain a normalized score, multiplying the normalized score by the value vector to obtain an attention feature matrix;
S32, connecting the frequency domain feature matrix, the residual feature matrix and the position coding matrix on the feature dimension based on the attention feature matrix, calculating a gating parameter through linear transformation and sigmoid function, and carrying out weighted combination on the frequency domain feature matrix and the residual feature matrix based on the gating parameter to obtain a fusion feature matrix.
In one embodiment of the application, based on the frequency-domain feature matrix F and the residual feature matrix R, multi-head attention is calculated as head_i = softmax((F·W_i^Q)(R·W_i^K)^T / sqrt(d))·(R·W_i^V), where W_i^Q is the transform matrix of the query vector, W_i^K is the transform matrix of the key vector, W_i^V is the transform matrix of the value vector, and d is the scaling factor of the feature dimension; the attention feature matrix A is output. A gating mechanism is used, g = sigmoid(W·[F; R; P]), where P is the position code, g is the gating parameter and W is a weight matrix; the fusion feature is calculated as H = g ⊙ F + (1 − g) ⊙ R, where ⊙ denotes element-wise multiplication, and the fusion feature matrix H is output.
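A minimal PyTorch sketch of the attention-plus-gating fusion: frequency-domain features F attend over residual features R, and a single linear gate over [F; R; P] blends the two streams as H = g ⊙ F + (1 − g) ⊙ R. The layer sizes and the single-layer gate are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Cross-attention from frequency features to residual features, then a
    sigmoid gate over the concatenated [F; R; P] that blends the two streams."""
    def __init__(self, d, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.gate = nn.Linear(3 * d, d)

    def forward(self, F, R, P):
        # F, R, P: (batch, seq_len, d)
        A, _ = self.attn(query=F, key=R, value=R)           # attention feature matrix A
        g = torch.sigmoid(self.gate(torch.cat([F, R, P], dim=-1)))
        H = g * F + (1 - g) * R                              # element-wise gated fusion
        return H, A

d = 32
F, R, P = (torch.randn(2, 50, d) for _ in range(3))
H, A = GatedFusion(d)(F, R, P)
print(H.shape, A.shape)
```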
The embodiment realizes intelligent integration of different types of features through the attention mechanism and the feature fusion strategy. First, by using the design of query vectors, key vectors and value vectors, a relevance metric between features is established, and important feature patterns can be automatically identified and highlighted through attention calculations. And secondly, the frequency domain characteristics and the residual characteristics are subjected to self-adaptive weighting by adopting a gating mechanism, so that the problem of information redundancy possibly caused by simple characteristic splicing is avoided. And the dynamic evaluation and selection of the feature importance are realized by calculating the gating parameters through the sigmoid function. Finally, the time sequence dependency relationship of the features can be fully considered by the model through the integration of the position codes. The embodiment not only improves the perception capability of the model on key features, but also enhances the modeling capability on long-term dependence, and simultaneously reduces the negative influence of feature redundancy on the performance of the model.
According to one aspect of the present application, step S31 further comprises:
S311, acquiring a frequency domain feature matrix and a residual feature matrix, generating a query matrix through linear mapping, generating a key matrix through a projection algorithm, processing by adopting a nonlinear activation function to obtain a value matrix, and performing batch normalization on the three matrices to obtain a standardized query matrix, a standardized key matrix and a standardized value matrix.
S312, a standardized query matrix, a standardized key matrix and a standardized value matrix are obtained, the dot product of the query and the key is calculated to obtain an attention score, the attention score is divided by a scaling factor to obtain a scaling score, a softmax operation is performed on the scaling score to obtain a normalized attention score, and the normalized attention score is multiplied by the standardized value matrix to obtain the multi-head attention feature.
S313, acquiring multi-head attention features, combining the outputs of a plurality of attention heads by adopting a feature aggregation network to obtain aggregation features, processing the aggregation features by using a feedforward neural network to obtain transformation features, and processing by using a layer normalization method to obtain an attention feature matrix.
In one embodiment of the application, the attention matrix is generated by A(Q, K) = softmax(Q·K^T / sqrt(d) + M)·V, where A(Q, K) is the attention matrix, Q is the query matrix, K is the key matrix, V is the value matrix, d is the feature dimension, M is the position bias matrix, softmax is the normalized exponential function, T denotes matrix transposition, · denotes matrix multiplication, and sqrt denotes the square root.
The embodiment realizes intelligent integration of different types of features through a multi-head attention mechanism and a feature fusion strategy. Firstly, a linear mapping method is adopted to generate a query, a key and a value matrix, and a relevance measurement standard between features is constructed. Through batch normalization processing, the numerical stability and convergence speed of attention calculation are improved. And secondly, calculating the correlation between the features by using a dot product concentration scaling mechanism, so that the gradient vanishing problem is avoided. Through softmax operation, the sparsity of attention weights and the generalization capability of the model are enhanced. And finally, adopting double processing of a characteristic aggregation network and a feedforward neural network to realize effective fusion of multi-head attention output. The embodiment not only improves the recognition capability of the model to the key features, but also enhances the robustness and reliability of feature extraction through a multi-head mechanism and a regularization strategy.
According to one aspect of the present application, step S32 further comprises:
S321, connecting a frequency domain feature matrix, a residual feature matrix and a position coding matrix on the feature dimension based on the attention feature matrix to obtain a connection feature matrix, performing feature dimension reduction on the connection feature matrix to obtain dimension reduction features, adopting nonlinear transformation processing to obtain transformation features based on the dimension reduction features, and calculating feature weights by applying an attention mechanism based on the transformation features to obtain weighted features;
S322, acquiring weighted characteristics, obtaining initial gating parameters through linear transformation and sigmoid function calculation, estimating parameter importance through characteristic entropy based on the initial gating parameters, obtaining parameter importance, updating the initial gating parameters based on the parameter importance, obtaining optimized gating parameters, and obtaining self-adaptive gating parameters by applying dynamic threshold adjustment based on the optimized gating parameters;
S323, weighting the frequency domain feature matrix and the residual feature matrix by utilizing the self-adaptive gating parameters to obtain weighted frequency domain features and weighted residual features, obtaining initial fusion features by adopting a self-adaptive fusion strategy based on the weighted frequency domain features and the weighted residual features, and obtaining a fusion feature matrix by applying residual connection based on the initial fusion features.
In one embodiment of the application, the dynamic gating parameter is G(H, t) = χ·sigmoid(L(H)) + ω·tanh(H(t))·exp(−E(H)/τ), where G(H, t) is the gating parameter of feature H at time t, L(H) is a linear transformation function, H(t) is a time coding function, E(H) is the feature entropy, χ is the static gating weight, ω is the dynamic gating weight, τ is an entropy scaling parameter, sigmoid is the S-shaped activation function, and tanh is the hyperbolic tangent function.
The embodiment realizes the efficient fusion of the multisource features through the self-adaptive gating mechanism and the feature combination strategy. First, feature redundancy is reduced and feature expression is enhanced by a combination of feature reduction and nonlinear transformation. The attention mechanism is adopted to calculate the feature weight, so that the automatic identification and the highlighting of important features are realized. And secondly, estimating the importance of the parameters based on the feature entropy, and providing a reliable reference basis for optimizing the gating parameters. By a dynamic threshold adjustment mechanism, the self-adaptive updating of the gating parameters is realized, so that the model can dynamically adjust the importance weight of the features according to the characteristics of the data. And finally, a combined strategy of residual connection and self-adaptive fusion is adopted, so that the original characteristic information is reserved, and the effective integration of the characteristics is realized. The embodiment not only improves the flexibility and adaptability of feature integration, but also improves the expression capacity and stability of the model through residual connection and self-adaptive fusion.
As shown in Fig. 5, according to an aspect of the present application, step S4 further comprises:
S41, constructing a graph structure reflecting the dependency relationship between time points based on the fusion feature matrix, carrying out message transmission and updating operation on node features in the graph structure to obtain updated node features, and summarizing the updated node features to obtain time sequence aggregation features;
S42, calculating the change rate of the features along with time based on the time sequence aggregation features and the fusion feature matrix to generate a system stability index, extracting the change rule of the features on different time periods through time scale analysis and time window division based on the time sequence aggregation features and the fusion feature matrix to generate a feature time scale, evaluating the diversity and the nonlinearity degree of the features through complexity analysis, entropy calculation and nonlinearity measurement based on the time sequence aggregation features and the fusion feature matrix to obtain a complexity index, and combining the system stability index, the feature time scale and the complexity index to generate a dynamic system feature.
In one embodiment of the application, a temporal graph G = (V, E) is constructed based on the fusion feature matrix H, where V is the set of time points and E is the set of temporal dependency edges. A graph attention network is applied, h'_i = σ(Σ_{j∈N_i} α_ij·W·h_j), where h'_i is the updated feature of node i, representing the temporal aggregation feature computed by the graph attention network, σ(·) is the activation function, Σ_{j∈N_i} sums over the neighbour nodes j of node i, α_ij is the attention weight between nodes i and j, W is a weight matrix, and h_j is the feature vector of node j; the temporal aggregation feature T is output. System stability is calculated from the Lyapunov exponent, λ = lim_{t→∞} (1/t)·ln||df/dx||, where λ is the Lyapunov exponent; the dynamic characteristics of the system are extracted as D = [λ, τ, μ], where τ is the characteristic time scale and μ is the system complexity index, and the dynamic system feature D is output.
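A minimal NumPy sketch of this embodiment: one graph-attention aggregation pass over the temporal dependency graph and a crude largest-Lyapunov-style growth-rate estimate standing in for λ; the attention scoring function, the random adjacency and the placeholder τ, μ values are illustrative assumptions:

```python
import numpy as np

def graph_attention_aggregate(H, adj, W, a, leaky=0.2):
    """One graph-attention pass: alpha_ij = softmax_j(LeakyReLU(a^T [Wh_i || Wh_j]))
    over neighbours given by adj, then h'_i = sigma(sum_j alpha_ij * W h_j)."""
    Z = H @ W                                             # (N, d')
    N = Z.shape[0]
    scores = np.full((N, N), -np.inf)
    for i in range(N):
        for j in np.nonzero(adj[i])[0]:
            e = a @ np.concatenate([Z[i], Z[j]])
            scores[i, j] = e if e > 0 else leaky * e      # LeakyReLU
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True) + 1e-12     # softmax over neighbours
    return np.tanh(alpha @ Z)                             # sigma = tanh here

def largest_lyapunov(x, eps=1e-8):
    """Crude largest-Lyapunov-style estimate: mean log growth rate of the
    distance between successive points (illustrative only)."""
    d = np.abs(np.diff(x)) + eps
    return np.mean(np.log(d[1:] / d[:-1]))

H = np.random.randn(20, 8)                                # fused features per time node
adj = (np.random.rand(20, 20) < 0.2).astype(float)
np.fill_diagonal(adj, 1.0)
W = np.random.randn(8, 8); a = np.random.randn(16)
T_agg = graph_attention_aggregate(H, adj, W, a)
x = np.cumsum(np.random.randn(300))
D = np.array([largest_lyapunov(x), 10.0, 0.5])            # [lambda, tau, mu], tau and mu are placeholders
print(T_agg.shape, D)
```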
The embodiment realizes deep characterization of complex time sequence relations through graph structure modeling and dynamic feature extraction. Firstly, a time sequence dependency graph is constructed based on fusion characteristics, and the dependency of time sequence data is converted into a connection relation among nodes in a graph structure, so that the time sequence of the data is reserved, and potential association among discontinuous time points can be captured. Secondly, through message transmission and node characteristic updating on the graph structure, the effective aggregation and propagation of time sequence information are realized. And finally, obtaining the comprehensive depiction of the dynamic characteristics of the system by calculating the change rate of the characteristics and the system stability index. The embodiment improves the understanding capability of the model to the complex time sequence mode, and simultaneously enhances the prediction capability of the model to the system state change through the extraction of the dynamic system characteristics.
According to one aspect of the present application, step S41 further comprises:
S411, acquiring a fusion feature matrix, calculating Euclidean distance between feature vectors to obtain a distance matrix, calculating a time sequence alignment score based on a dynamic time warping algorithm to obtain an alignment score, and constructing time sequence dependency features by combining the distance matrix and the alignment score.
S412, acquiring time sequence dependency characteristics, constructing initial graph connection by using a K nearest neighbor algorithm to obtain an initial graph structure, optimizing a connection relation by using a graph sparsification algorithm to obtain a sparse graph structure, and calculating edge weights to obtain a weighted graph structure.
S413, acquiring a weighted graph structure and a fusion feature matrix, performing graph convolution operation to update node features to obtain updated node features, reserving original information by using jump connection to obtain residual node features, and selectively updating by a gating mechanism to obtain selective node features.
S414, acquiring selective node characteristics, aggregating node information by adopting an attention pooling method to obtain pooled characteristics, extracting key information by using a sequence compression network to obtain compressed characteristics, and combining the characteristics of different scales to obtain time sequence aggregation characteristics.
In one embodiment of the application, the dynamic time warping algorithm is DTW(X, Y) = min{Σ_{i=1..n, j=1..m} [w_k·d(x_i, y_j)]}, where DTW(X, Y) is the alignment distance of the time series X and Y, d(x_i, y_j) is the local distance between sequence points x_i and y_j, w_k is a path weight coefficient, and n, m are the sequence lengths; the local distance is d(x_i, y_j) = ||x_i − y_j||² + λ·|i − j|, and the path update is D(i, j) = min{D(i−1, j−1), D(i−1, j), D(i, j−1)} + d(x_i, y_j), where D(i, j) is the accumulated distance matrix and λ is a time penalty coefficient.
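A minimal NumPy sketch of the DTW recurrence with the time-penalized local distance d(x_i, y_j) = ||x_i − y_j||² + λ·|i − j|; the path weights w_k are omitted here for simplicity:

```python
import numpy as np

def dtw_distance(X, Y, lam=0.1):
    """DTW alignment distance with local cost ||x_i - y_j||^2 + lam*|i - j| and
    the recurrence D(i,j) = min(D(i-1,j-1), D(i-1,j), D(i,j-1)) + d(x_i, y_j)."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.sum((X[i - 1] - Y[j - 1]) ** 2) + lam * abs(i - j)
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

X = np.sin(np.linspace(0, 6, 80))[:, None]
Y = np.sin(np.linspace(0.3, 6.3, 90))[:, None]
print(dtw_distance(X, Y))
```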
The graph structure optimization comprises calculating the relation strength R(i, j) = φ·exp(−D(i, j)/σ) + ζ·cos(π·S(i, j)), wherein R(i, j) is the relation strength between nodes i and j, D(i, j) is the feature distance between the nodes, S(i, j) is the timing similarity, φ is the distance term weight, ζ is the similarity term weight, σ is a distance scaling parameter, π is the circumference ratio, exp is the exponential function, and cos is the cosine function.
The graph sparsification algorithm comprises E(i, j) = w_ij·exp(−D_ij/σ)·I(R_ij > τ(d_i, d_j)), wherein E(i, j) is the edge weight, w_ij is the initial weight, D_ij is the node distance, R_ij is the correlation score, σ is a distance scale parameter, I is the indicator function, τ(d_i, d_j) is an adaptive threshold function, and d_i, d_j are the node degrees.
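A minimal sketch of the two formulas above is given below. The degree-dependent form of the adaptive threshold τ(d_i, d_j) is not stated in the embodiment, so a threshold that relaxes with higher node degrees is assumed here purely for illustration.

```python
import numpy as np

def sparsify_graph(W0, D, S, deg, phi=0.6, zeta=0.4, sigma=1.0, tau0=0.3):
    """Relation strength R_ij = phi*exp(-D_ij/sigma) + zeta*cos(pi*S_ij);
    pruned edge weight E_ij = w_ij * exp(-D_ij/sigma) * 1[R_ij > tau(d_i, d_j)].

    W0, D, S : (N, N) initial weights, node feature distances, timing similarities
    deg      : (N,)  node degrees
    """
    R = phi * np.exp(-D / sigma) + zeta * np.cos(np.pi * S)
    tau = tau0 / np.sqrt(np.outer(deg, deg) + 1.0)   # assumed adaptive threshold form
    return W0 * np.exp(-D / sigma) * (R > tau)
```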
The embodiment realizes deep characterization of complex time sequence relations through graph structure modeling and node feature updating. Firstly, the combination of Euclidean distance and the dynamic time warping algorithm accurately describes the similarity between time series data points, and initial graph connections constructed through the K-nearest-neighbour algorithm provide a good starting point for subsequent graph structure optimization. Secondly, the connection relations are optimized with a graph sparsification algorithm, which reduces the complexity of the graph structure and highlights important dependencies, while graph convolution operations update the node features and realize information propagation and aggregation on the graph. Finally, the combination of jump connections and a gating mechanism both retains the original node information and realizes selective updating of the features. The embodiment not only improves the modeling capability for complex timing dependencies, but also improves the expressive efficiency and accuracy of the model through its feature selection and information propagation mechanisms.
As shown in fig. 6, according to an aspect of the present application, step S5 is further:
s51, calculating parameters of mixed Gaussian distribution based on time sequence aggregation characteristics and dynamic system characteristics, and generating conditional probability distribution according to the parameters of the mixed Gaussian distribution to obtain prediction distribution;
s52, calculating entropy values of the prediction distribution, calculating stability scores based on dynamic system features, and combining the entropy values and the stability scores to obtain confidence indexes.
In one embodiment of the application, a mixed density network is constructed based on the time sequence aggregation feature T and the dynamic system feature D: P(y|x) = Σ_i π_i(x)·N(μ_i(x), σ_i²(x)), wherein P(y|x) is the conditional probability distribution representing the probability of output y given input x, π_i(x) is the weight of the i-th mixture component, and N(μ_i(x), σ_i²(x)) is the normal distribution of the i-th mixture component with mean μ_i(x) and variance σ_i²(x); the prediction distribution P(y|x) is output. The entropy of the prediction is calculated as H = −∫ p(y|x)·log p(y|x) dy, the system stability index is calculated from the dynamic features as S = exp(−‖D‖²), and the confidence index C = [H, S] is output.
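The sketch below evaluates such a mixture prediction P(y|x) on a grid, approximates its differential entropy H by quadrature, and computes the stability score S = exp(−‖D‖²); grid-based quadrature and a one-dimensional output are simplifying assumptions of this illustration.

```python
import numpy as np

def mdn_predictive(pi, mu, sigma, y_grid):
    """P(y|x) = sum_i pi_i * N(y; mu_i, sigma_i^2) on a grid, plus its entropy
    H = -integral p log p dy (trapezoid-free Riemann sum for brevity)."""
    pi, mu, sigma = map(np.asarray, (pi, mu, sigma))
    comps = np.exp(-0.5 * ((y_grid[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    p = comps @ pi                               # mixture density on the grid
    dy = y_grid[1] - y_grid[0]
    H = -np.sum(p * np.log(p + 1e-12)) * dy      # differential entropy estimate
    return p, H

def stability_index(D_feat):
    """System stability score S = exp(-||D||^2) from the dynamic feature vector."""
    return float(np.exp(-np.dot(D_feat, D_feat)))

# confidence index as in the text: C = [H, S]
```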
The embodiment realizes reliability evaluation of the prediction result. Firstly, a Gaussian mixture distribution model is constructed based on the time sequence aggregation features and the dynamic system features, and an accurate conditional probability distribution is obtained through parameter optimization, so that the model can make probabilistic predictions of future states. Secondly, quantitative evaluation of the uncertainty of the prediction result is realized by calculating the entropy of the prediction distribution and the system stability index. The embodiment can provide not only a point estimate but also a confidence interval for the prediction, offering more comprehensive reference information for decision-making.
According to one aspect of the present application, step S51 is further:
s511, acquiring time sequence aggregation characteristics and dynamic system characteristics, calculating characteristic distribution by using kernel density estimation to obtain distribution parameter estimation, optimizing parameters by using a maximum likelihood method to obtain optimized distribution parameters, and obtaining parameter confidence intervals by evaluating the uncertainty of the parameters.
S512, obtaining optimized distribution parameters, calculating a mixing weight through variation inference to obtain an initial mixing weight, optimizing the mixing weight by using an EM algorithm to obtain an optimized mixing weight, and performing non-parametric modeling by using a Dirichlet process to obtain an adaptive mixing weight.
S513, obtaining optimized distribution parameters and self-adaptive mixing weights, constructing a Gaussian mixture model to obtain a mixed probability model, calculating conditional probability distribution to obtain a conditional probability matrix, and integrating all components to obtain prediction distribution.
In one embodiment of the application, the variational inference optimization is Q(z|x) = N(μ(x), Σ(x))·exp(KL(P‖Q)/η), wherein Q(z|x) is the variational posterior distribution, N(μ, Σ) is a Gaussian distribution, μ(x) is the mean function, Σ(x) is the covariance function, P is the prior distribution, KL is the KL divergence, η is a temperature parameter, exp is the exponential function, z is the hidden variable, and x is the observed variable.
The EM algorithm is Q(θ|θ_t) = E_Z[log P(X, Z|θ)] + λ·R(θ), wherein Q(θ|θ_t) is the Q function, θ is the model parameter, θ_t is the parameter at the t-th iteration, X is the observed data, Z is the hidden variable, P(X, Z|θ) is the complete-data likelihood function, E_Z is the expectation with respect to Z, λ is a regularization coefficient, and R(θ) is the regularization term. The parameter update adopts θ_{t+1} = argmax_θ { Q(θ|θ_t)·exp(−μ·‖θ − θ_t‖²) }.
The Dirichlet process is DP(G_0, η) = lim_{K→∞} Σ_{k=1}^{K} π_k·δ(θ_k), wherein G_0 is the base distribution, η is the concentration parameter, π_k is the mixing weight, θ_k is the component parameter, and δ is the Dirac function. The weights are generated as π_k = v_k·Π_{i=1}^{k−1}(1 − v_i), where v_k ~ Beta(1, η).
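The weight generation above is the stick-breaking construction; a truncated sketch of it is shown below, where the finite truncation level K is an assumption made only so the sketch terminates.

```python
import numpy as np

def stick_breaking_weights(eta, K, rng=None):
    """Truncated stick-breaking weights: v_k ~ Beta(1, eta),
    pi_k = v_k * prod_{i<k} (1 - v_i)."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(1.0, eta, size=K)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])  # prod_{i<k}(1 - v_i)
    return v * remaining
```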
The embodiment realizes accurate estimation of the prediction distribution through a mixture probability model and variational inference. Firstly, the feature distribution is modeled with a combination of kernel density estimation and the maximum likelihood method, avoiding the model bias that parametric distribution assumptions may introduce; by evaluating the uncertainty of the parameters, the reliability and stability of the distribution estimate are improved. Secondly, the mixing weights are optimized with a variational inference method, realizing efficient learning of the model parameters, and iterative optimization of the mixing weights with the expectation-maximization (EM) algorithm ensures the convergence and accuracy of the parameter estimation. Finally, non-parametric modeling with a Dirichlet process enhances the model's ability to express complex distributions. The embodiment not only describes the distribution characteristics of the data accurately, but also improves the flexibility and computational efficiency of the model through non-parametric modeling and variational inference. By taking parameter uncertainty into account, the model can provide more reliable prediction results and confidence intervals, giving an important reference basis for decision-making in practical applications.
According to one aspect of the present application, step S52 is further:
s521, obtaining prediction distribution, calculating an entropy value of the distribution to obtain an initial entropy value, estimating a confidence interval of entropy by adopting a bootstrap method to obtain an entropy confidence interval, generating an entropy uncertainty index based on interval information, and obtaining a multi-scale entropy value by combining multi-scale analysis.
S522, acquiring dynamic system features and multi-scale entropy values, calculating norms of the features to obtain feature norms, obtaining initial stability scores by using exponential function transformation, obtaining smooth stability scores by applying self-adaptive smoothing, and obtaining system stability indexes by combining the dynamic features of the system.
S523, acquiring an entropy uncertainty index and a system stability index, constructing a combined evaluation model to obtain a combined evaluation score, applying a weighted average strategy to obtain weighted confidence coefficient, combining historical evaluation information to obtain time sequence confidence coefficient, and finally integrating to obtain a confidence coefficient index.
In one embodiment of the application, the adaptive smoothing algorithm is S(x, h) = Σ_i [x_i·K((x − x_i)/h(x))] / Σ_i [K((x − x_i)/h(x))], wherein S(x, h) is the smoothing result, x_i is a data point, K is the kernel function, and h(x) is the local bandwidth function. The bandwidth update is h(x) = h_0·(1 + γ·|dx/dt|)^α, wherein h_0 is the base bandwidth, γ is the adjustment coefficient, and α is the attenuation exponent.
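A small illustration of this kernel smoother follows. Applying it over the time index, approximating |dx/dt| by finite differences, and using a Gaussian kernel K are all assumptions of the sketch; the embodiment states the formula only in terms of the data points.

```python
import numpy as np

def adaptive_smooth(series, h0=1.0, gamma=0.5, alpha=1.0):
    """S(t) = sum_i series_i * K((t - t_i)/h(t)) / sum_i K((t - t_i)/h(t)),
    with local bandwidth h(t) = h0 * (1 + gamma*|dx/dt|)^alpha."""
    series = np.asarray(series, dtype=float)
    t = np.arange(len(series))
    rate = np.abs(np.gradient(series))             # local |dx/dt| (finite differences)
    smoothed = np.empty_like(series)
    for k in range(len(series)):
        h = h0 * (1.0 + gamma * rate[k]) ** alpha  # adaptive local bandwidth
        w = np.exp(-0.5 * ((t[k] - t) / h) ** 2)   # Gaussian kernel weights
        smoothed[k] = np.sum(w * series) / np.sum(w)
    return smoothed
```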
The combined evaluation model is E(x) = w_1·f_1(x) + w_2·f_2(x) + … + w_n·f_n(x), wherein E(x) is the combined evaluation score, f_i is a single evaluation index, and w_i is an adaptive weight. The weight update is w_i(t+1) = w_i(t)·exp(η·r_i(t)) / Σ_j [w_j(t)·exp(η·r_j(t))], wherein r_i is the performance score of index i and η is the learning rate.
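The weight update above is an exponential (Hedge-style) reweighting; a minimal sketch is given below, with the index performance scores r_i assumed to be supplied by the caller.

```python
import numpy as np

def combined_score(weights, scores):
    """E(x) = sum_i w_i * f_i(x) for one evaluation round."""
    return float(np.dot(weights, scores))

def update_weights(weights, rewards, eta=0.1):
    """w_i(t+1) = w_i(t)*exp(eta*r_i(t)) / sum_j w_j(t)*exp(eta*r_j(t))."""
    w = np.asarray(weights) * np.exp(eta * np.asarray(rewards))
    return w / w.sum()
```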
The embodiment realizes comprehensive evaluation of the prediction result through entropy calculation and stability analysis. Firstly, the entropy of the prediction distribution is calculated and its confidence interval is estimated with a bootstrap method, so that the prediction uncertainty is reliably quantified; evaluating how the entropy varies over different time scales with a multi-scale analysis method provides a more comprehensive uncertainty measure. Secondly, a quantitative index of system stability is constructed by calculating feature norms and applying an exponential transformation, and adaptive smoothing reduces the influence of noise on the stability evaluation. Finally, the combined evaluation model and the weighted averaging strategy integrate the entropy value and the stability index. The embodiment not only quantifies the credibility of the prediction result, but also enhances the reliability of the evaluation through multi-dimensional stability analysis. By taking timing characteristics into account, the evaluation result can reflect dynamic changes of the system state, providing more comprehensive decision support for practical applications of the model. Meanwhile, the adaptive weight updating mechanism allows the evaluation indices to be adjusted dynamically as the system state changes, improving the timeliness and accuracy of the evaluation.
In another embodiment of the present application, step S11 may further include reading an original time sequence from the database, performing a min-max normalization process on the data to obtain normalized data, calculating a sliding average value of the normalized data to obtain smoothed data, performing cosine similarity calculation on adjacent time points in the smoothed data to obtain a similarity sequence, performing gaussian kernel function transformation on the similarity sequence to obtain kernel transformation data, and calculating an optimal partition by combining the kernel transformation data with an original sampling interval to obtain an adaptive partition size sequence by a dynamic programming algorithm.
Step S12 may also comprise: acquiring the adaptive block size sequence and the original time sequence data sequence, calculating the autocorrelation function of each data block to obtain a correlation coefficient sequence, estimating the optimal convolution kernel size from the correlation coefficient sequence through a variational Bayes method to obtain a kernel size parameter, inputting the kernel size parameter into a Gaussian-Xavier initialization algorithm to generate an initial convolution kernel, applying an L1 regularization constraint to the initial convolution kernel to obtain a sparse convolution kernel, performing a one-dimensional convolution operation on the data block with the sparse convolution kernel to obtain a convolution feature map, applying a LeakyReLU activation function to the convolution feature map to obtain an activation feature map, and processing the activation feature map through a spatial pyramid pooling network to obtain a multi-scale feature block set.
The Gaussian-Xavier initialization algorithm is specifically W_l = ζ·N(0, sqrt(2/(n_in + n_out)))·M_l, wherein W_l is the layer-l weight matrix, N(0, σ) is a Gaussian distribution with mean 0 and standard deviation σ, n_in is the input dimension, n_out is the output dimension, ζ is a scaling factor, and M_l is a mask matrix.
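A short sketch of this initialization is shown below; drawing the mask M_l as a Bernoulli keep-mask is an assumption of the sketch, since the text only states that a mask matrix is applied element-wise.

```python
import numpy as np

def gaussian_xavier_init(n_in, n_out, zeta=1.0, sparsity=0.0, rng=None):
    """W_l = zeta * N(0, sqrt(2/(n_in + n_out))) elementwise-multiplied by a mask M_l."""
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / (n_in + n_out))
    W = zeta * rng.normal(0.0, std, size=(n_in, n_out))
    M = (rng.random((n_in, n_out)) >= sparsity).astype(float)  # assumed Bernoulli mask
    return W * M
```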
In another embodiment of the present application, step S3 may further be:
S3a, acquiring the frequency domain feature matrix and the residual feature matrix, generating query, key and value matrices through linear projection to obtain an initial attention tensor, performing scaled dot-product attention calculation on the initial attention tensor to obtain attention scores, regularizing the attention scores through dropout to obtain regularized scores, calculating a weighted sum based on the regularized scores and the value matrix to obtain weighted features, and applying layer normalization to the weighted features to obtain the attention feature matrix.
S3b, obtaining the frequency domain feature matrix, the residual feature matrix and the position coding matrix, mapping the three matrices to the same feature space through a nonlinear projection layer to obtain unified features, calculating the mutual information of the unified features to obtain feature correlations, generating adaptive gating parameters from the feature correlations through a soft-shrinkage algorithm to obtain gating coefficients, multiplying the gating coefficients with the original feature matrices to obtain weighted features, and performing residual connection on the weighted features to obtain the fusion feature matrix.
The soft-shrinkage algorithm is specifically S(x, λ) = sign(x)·max(|x| − λ·(1 + α·|dx/dt|), 0), wherein S(x, λ) is the shrunk value, x is the input value, λ is the base threshold parameter, α is a dynamic adjustment coefficient, dx/dt is the rate of change, sign is the sign function, max is the maximum operation, and |·| denotes the absolute value.
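The formula above is an ordinary soft-threshold whose threshold grows with the local rate of change; a direct, illustrative implementation follows, with dx/dt assumed to be supplied alongside the input.

```python
import numpy as np

def dynamic_soft_shrink(x, dxdt, lam=0.1, alpha=0.5):
    """S(x, lam) = sign(x) * max(|x| - lam*(1 + alpha*|dx/dt|), 0)."""
    thresh = lam * (1.0 + alpha * np.abs(dxdt))
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)
```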
In another embodiment of the present application, step S5 may further be:
S5a, acquiring a time sequence aggregation feature and a dynamic system feature, generating an initial Gaussian mixture distribution parameter through random sampling to obtain an initial distribution parameter, optimizing the initial distribution parameter by using a variation expectation maximization algorithm to obtain an optimized distribution parameter, constructing a conditional Gaussian mixture model based on the optimized distribution parameter to obtain a mixture probability model, and applying Monte Carlo sampling to the mixture probability model to obtain a prediction distribution.
The variational expectation maximization algorithm is specifically L(q) = E_q[log P(X, Z)] − E_q[log q(Z)] + α·H(q), wherein L(q) is the variational lower bound, q(Z) is the variational posterior distribution, P(X, Z) is the joint distribution, E_q is the expectation with respect to q, H(q) is an entropy term, and α is an entropy weight coefficient. The variational distribution update is q(Z_i) ∝ exp(E_{q(Z_¬i)}[log P(X, Z)]) + β·KL(q_old‖q_new), wherein Z_i is a latent variable, E_{q(Z_¬i)} is the expectation with respect to q(Z_¬i), q_old and q_new are the variational posteriors before and after the update, and β is a weighting coefficient.
S5b, obtaining prediction distribution and dynamic system characteristics, calculating differential entropy of the prediction distribution to obtain an entropy value sequence, estimating a confidence interval of the entropy value sequence by using a block bootstrapping method to obtain entropy confidence, calculating Lyapunov indexes of the dynamic system characteristics to obtain stability indexes, and combining the entropy confidence and the stability indexes through an additive model to obtain confidence indexes.
The block bootstrap algorithm specifically comprises B(x, l) = Σ_{i=1}^{n/l} [x_{(b_i : b_i+l)}·w_i], wherein B(x, l) is the block bootstrap sample, x is the original sequence, l is the block length, b_i is the starting position of the i-th block, w_i is the block weight, and n is the sequence length. The block length is selected adaptively as l* = argmin_l { MSE(B(x, l)) + ρ·Var(l) }, wherein MSE is the mean square error, l* is the optimal block length, ρ is a penalty coefficient, and Var(l) is the variance of the block length.
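A minimal moving-block bootstrap sketch is given below; equal block weights w_i = 1 are assumed, and the sample mean stands in for the statistic of interest (in the embodiment this would be the entropy of each resampled series).

```python
import numpy as np

def block_bootstrap_ci(x, block_len, n_boot=200, rng=None):
    """Moving-block bootstrap: concatenate random length-l blocks up to the series
    length and return the 95% percentile interval of the resampled statistic."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    stats = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        sample = np.concatenate([x[s:s + block_len] for s in starts])[:n]
        stats[b] = sample.mean()          # placeholder statistic (e.g. entropy in the text)
    return np.percentile(stats, [2.5, 97.5])
```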
According to one aspect of the present application, a modeling method of embodied agent time series data based on frequency domain learning includes the steps of:
Step 1, input processing and feature extraction.
Step 1.1, multi-scale convolution module. The input data x is fed into a multi-scale convolution module, which captures local features in the data through multi-scale convolution operations. Specifically, for input x ∈ R^{B×Cin×D}, B is the batch size, Cin is the number of input channels, and D = (D1, D2, …, Dn) gives the sizes of the spatial dimensions; the module processes the input data with multiple convolution layers Conv(x), yielding the multi-level feature x_multiscale = MultiScaleConv(x), where MultiScaleConv is the multi-scale convolution module.
Step 1.2, blocking module and position coding. To further enhance the representation of spatial information, the input data is partitioned by a specified block size (M1, M2, …, Mn) using a blocking module. Local feature extraction is performed independently in each block, and the partitioned data are expressed as x_blocks = Block(x, (M1, M2, …, Mn)), where Block is the blocking module. Thereafter, a position code (positional encoding) is added to the features at each position to ensure that the model effectively utilizes the position information of the input data. The position code is generated by a function f(q), where q represents the coordinates of each position in the input data: x_encoded = x_blocks + f(q).
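As an illustration of the position code f(q), the sketch below generates the standard sinusoidal encoding for a set of block positions; the sin/cos construction is an assumption of the sketch, as the text only requires a deterministic function of the coordinates.

```python
import numpy as np

def positional_encoding(num_positions, dim):
    """Sinusoidal position code f(q) for block positions q = 0..num_positions-1."""
    q = np.arange(num_positions)[:, None]
    i = np.arange(dim)[None, :]
    angle = q / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

# usage: x_encoded = x_blocks + positional_encoding(x_blocks.shape[0], x_blocks.shape[1])
```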
Step 1.3, convolution residual Fourier layer. On top of the convolution operation, the frequency-domain information is modeled with a convolution residual Fourier layer. The output y_out of the convolution residual Fourier layer fuses the Fourier-layer output y_fft with the high-frequency feature y_conv extracted by the convolution layer through a residual connection: y_out = y_fft + y_conv + b, where b is a bias term.
Step 2, feature fusion.
Step 2.1, self-attention. The processed multi-scale features and Fourier-domain convolution features are fed into a multi-head self-attention module. The module establishes dependencies among different features through the multi-head attention mechanism and learns the interactions among the features. Specifically, given the feature matrix X_features ∈ R^{B×N×D}, the multi-head attention is calculated as Y_attn = MultiHeadAttention(X_features), where MultiHeadAttention is the multi-head self-attention mechanism.
Step 2.2, cross attention. On this basis, the present embodiment also incorporates a cross-attention mechanism; interaction with other latent variables further enhances the capability of the feature representation. The cross-attention output is calculated as Y_cross = CrossAttention(X_features, Z_latent), where Z_latent is the representation of the latent variables and CrossAttention is the cross-attention mechanism.
Step 3, time sequence aggregation and output generation. After feature extraction and attention processing, the time series information of the input data is aggregated by a time sequence aggregation module. The module aggregates the processed features along the time dimension and extracts the timing dependencies in the sequence. Assuming the processed feature sequence is Y_features = [y1, y2, …, yT], the time sequence aggregation operation may be denoted as Y_aggregated = TimeAggregation(Y_features), where TimeAggregation is the time sequence aggregation module.
Conventional dynamic system modeling generally relies on discretization and numerical solution of partial differential equations, which, while effective in describing the system variation to some extent, are computationally expensive and time inefficient when dealing with high dimensional space. According to the method, the time sequence data are converted into the frequency domain space, the periodicity and the frequency characteristics in the data can be effectively identified and utilized through frequency domain representation learning, the capability of the model for processing space-time change is improved, and the redundant calculation requirement in the high-dimensional space is reduced. In the face of high noise or lack of sufficient data, the frequency domain-based characterization learning method has stronger adaptability and practicability than the traditional reinforcement learning. The frequency domain representation captures the regularity of the data through the frequency characteristics, so that the key characteristics of the system can be effectively identified even under the condition that the data is scarce or the noise is large, and the robustness of the model is improved.
In one embodiment of the application, multi-scale convolution is introduced to expand the receptive field of the convolution kernel, and features of different levels are extracted while the data is partitioned, so that the convolution kernel can cover a wider area of the input data and capture more context information while keeping the number of parameters low. In the multi-scale convolution, the elements of the convolution kernel are spaced apart (dilated): given input data x(t), a convolution kernel w(k) of size K and a dilation rate r, the operation is y(t) = (x*w)(t) = Σ_{k=0}^{K−1} x(t − r·k)·w(k). The receptive field differs from that of the standard convolution: with input size N, kernel size K, stride S and dilation rate r, the receptive field R' can be expressed as R' = (K−1)·r + 1. To ensure that the output feature map after the convolution operation keeps the same size as the input data (or at least a certain size), the input data needs to be padded, and the padding amount must be adjusted according to the dilation rate. For different kernel sizes K, the padding P of the dilated convolution can be calculated as P = ⌊(K−1)·r/2⌋, where ⌊·⌋ denotes rounding down. To extract features on multiple scales simultaneously, assume there are N different dilation rates r1, r2, …, rN. Finally, the outputs of all convolution layers are concatenated along the channel dimension, thereby combining features of different scales. For example, if four rates [r1, r2, r3, r4] are chosen, the output of the convolution operation can be expressed as y(t) = [(x*w_{r1})(t), (x*w_{r2})(t), (x*w_{r3})(t), (x*w_{r4})(t)]; the concatenation is performed along the channel dimension, and the outputs of the different scales are combined into a richer feature representation.
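A small sketch of the dilated multi-scale convolution follows. It uses the causal form y(t) = Σ_k x(t − r·k)·w(k) with left padding so the output keeps the input length; the symmetric padding P = ⌊(K−1)·r/2⌋ mentioned above corresponds to the centred ("same") variant, and the choice of causal padding here is an assumption of the sketch.

```python
import numpy as np

def dilated_conv1d(x, w, rate):
    """Causal dilated convolution y(t) = sum_{k=0}^{K-1} x(t - r*k) * w(k)."""
    K = len(w)
    xp = np.pad(np.asarray(x, dtype=float), ((K - 1) * rate, 0))   # left padding
    return np.array([sum(xp[t + (K - 1) * rate - rate * k] * w[k] for k in range(K))
                     for t in range(len(x))])

def multi_scale_conv(x, kernels, rates):
    """Stack the outputs of several dilation rates along the channel axis:
    y(t) = [(x * w_r1)(t), ..., (x * w_rN)(t)]."""
    return np.stack([dilated_conv1d(x, w, r) for w, r in zip(kernels, rates)])
```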
In order to fully utilize features of different frequencies, the present embodiment combines the Fourier transform, convolution operation and residual connection, with the goal of optimizing the feature extraction process of the Fourier neural operator. For a d-dimensional input signal x, the Fourier transform gives x'(l) = ∫_{R^d} x(q)·e^{−i2πl·q} dq, where x'(l) is the frequency-domain representation of the signal, x(q) is the original signal, l is a vector in frequency space, and q is a position in the space-time domain. After the Fourier transform, only the low-frequency part of the spectrum is retained; with l_trunc the cut-off frequency threshold, the low-frequency truncation is x''(l) = x'(l) if |l| ≤ l_trunc, and x''(l) = 0 if |l| > l_trunc. In the frequency domain, the convolution operation can be implemented by element-wise multiplication: for the Fourier transform x' of the input signal and the Fourier transform w' of the convolution kernel, the frequency-domain convolution is y'(l) = x'(l)·w'(l). Once the convolution is completed in the frequency domain, the result is converted back to the space-time domain for subsequent processing: y(q) = F^{−1}(y'(l)) = ∫_{R^d} y'(l)·e^{i2πl·q} dl, where y(q) is the signal in the space-time domain, F^{−1} denotes the inverse Fourier transform, and y'(l) is the signal in the frequency domain. Besides processing the low-frequency part in the frequency domain, the high-frequency part is still handled by a convolution layer, y_conv = Conv(x). To improve the expressive power of the model and speed up training, a residual connection is used, and the output is obtained by directly summing the branches of the input signal: y_out = y_fft + y_conv + b. In summary, the loss function includes a reconstruction loss and a feature matching loss. In the reconstruction loss L_reconstruction = (1/(B·S))·Σ_{b=1}^{B} Σ_{s=1}^{S} ‖y'_{b,s} − y_{target,b,s}‖²_2, y' is the predicted output feature block (patches), y_target is the target feature block, B is the batch size, S = Π_{i=1}^{N} G_i is the total spatial dimension with G_i a spatial dimension, y'_{b,s} is the model-predicted feature block, and y_{target,b,s} is the target feature block. In the feature matching loss, features of different scales are matched to the target features, namely L_Fourier = (1/(B·N_f))·Σ_{b=1}^{B} Σ_{f=1}^{N_f} ‖F'_{b,f} − F_{target,b,f}‖²_2, where F'_{b,f} is the frequency-domain spatial feature output by the model, N_f is the number of features, and F_{target,b,f} is the target frequency-domain spatial feature. The complete loss can be expressed as L = λ·L_reconstruction + L_Fourier, with λ a weight coefficient.
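The sketch below strings these steps together for a one-dimensional signal: real FFT, low-frequency truncation, element-wise multiplication by a frequency-domain kernel, inverse FFT, and a plain convolution branch added through the residual connection. The one-dimensional setting, real FFT usage and kernel shapes are assumptions of this illustration, not the dimensionality used by the embodiment.

```python
import numpy as np

def conv_residual_fourier_layer(x, w_freq, w_conv, b, l_trunc):
    """y_out = y_fft + y_conv + b for a 1-D signal x.

    w_freq : complex frequency-domain kernel w'(l), length >= len(x)//2 + 1 (assumed)
    w_conv : real convolution kernel for the high-frequency branch
    l_trunc: number of low-frequency modes to keep (l_trunc <= len(x)//2 + 1 assumed)
    """
    X = np.fft.rfft(x)                      # x'(l): frequency-domain representation
    X_trunc = np.zeros_like(X)
    X_trunc[:l_trunc] = X[:l_trunc]         # low-frequency truncation x''(l)
    Y = X_trunc * w_freq[:len(X)]           # frequency-domain convolution y'(l)
    y_fft = np.fft.irfft(Y, n=len(x))       # back to the space-time domain
    y_conv = np.convolve(x, w_conv, mode="same")   # high-frequency branch
    return y_fft + y_conv + b
```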
The invention realizes intelligent modeling and prediction of complex time sequence data through frequency domain learning and the integration of an embodied agent. Firstly, the combination of adaptive blocking and multi-scale convolution realizes efficient feature extraction from time sequence data, while frequency-domain transformation and the residual compensation mechanism ensure the completeness and accuracy of the feature representation. Secondly, based on the attention mechanism and the feature fusion strategy, intelligent integration of different types of features is realized, and complex timing dependencies are deeply characterized through graph structure modeling and dynamic feature extraction. Finally, the mixture probability model and the uncertainty quantification method realize reliability evaluation of the prediction result. The method not only improves the accuracy and efficiency of time sequence data modeling, but also enhances the reliability of the prediction result through uncertainty quantification. It has broad application value in fields such as industrial production, financial prediction and intelligent manufacturing, and can provide strong support for state prediction and decision optimization of complex systems. By introducing the embodied agent, the model gains the ability to perceive and adapt to the environment, further improving the robustness and generalization of the system in practical applications. Through multi-level feature extraction and fusion, the invention realizes an effective transformation from data to knowledge and provides reliable technical support for intelligent decision-making.
The preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to the specific details of the above embodiments, and various equivalent changes can be made to the technical solution of the present invention within the scope of the technical concept of the present invention, and all the equivalent changes belong to the protection scope of the present invention.

Claims (10)

1. A modeling method of embodied agent time sequence data based on frequency domain learning, characterized by comprising the following steps:
S1, acquiring an original time sequence data sequence, determining an adaptive block size sequence according to the time relation of adjacent data points, performing block processing on the original time sequence data sequence based on the adaptive block size sequence to obtain data blocks, extracting features from the data blocks by adopting a multi-scale convolution method, and generating corresponding position coding information to obtain a multi-scale feature block set and a position coding matrix, wherein the method is used in the field of industrial production, and the original time sequence data comprises positions, dynamic features, frequency features, a periodic mode and long-term dependence relations in the time-space domain;
S2, performing Fourier transform on each feature block in the multi-scale feature block set to extract frequency domain information, analyzing signal amplitude in the frequency domain information to determine frequency selection parameters, and performing nonlinear transform and residual error compensation on the frequency domain information based on the frequency selection parameters to obtain a frequency domain feature matrix and a residual error feature matrix;
S3, calculating attention relations among the features based on the frequency domain feature matrix, the residual feature matrix and the position coding matrix, and performing feature fusion to obtain a fusion feature matrix;
S4, constructing a time sequence dependency graph based on the fusion feature matrix, extracting time sequence aggregation features based on a graph structure of the time sequence dependency graph, and calculating dynamic feature parameters of the system to obtain dynamic system features;
s5, based on the time sequence aggregation characteristics and the dynamic system characteristics, constructing a probability prediction model, and calculating uncertainty of a prediction result to obtain prediction distribution and confidence indexes.
2. The method for modeling embodied agent time series data based on frequency domain learning as claimed in claim 1, wherein the step S1 is further:
S11, reading an original time sequence from a database, performing inner product operation on data of adjacent time points to obtain an inner product result, adding 1 to the inner product result, calculating power operation to obtain an operation result, constructing similarity degree data of the adjacent time points based on the operation result, determining the sizes of data blocks according to the similarity degree data, and obtaining a self-adaptive block size sequence;
s12, constructing convolution kernel parameters with different sizes based on the self-adaptive block size sequence and the original time sequence data sequence, performing convolution operation on each data block based on the convolution kernel parameters to obtain a convolution result, and combining the convolution results in the channel dimension to obtain a multi-scale feature block set;
s13, obtaining time index information of data blocks in the multi-scale feature block set, calculating a sine function value to serve as an initial position code, constructing position query data, calculating a correlation score of the position query data and the initial position code, and carrying out normalization processing on the correlation score to generate a position code matrix.
3. The method for modeling embodied agent time series data based on frequency domain learning as claimed in claim 2, wherein the step S2 is further:
s21, acquiring a multi-scale feature block set, performing Fourier transform operation on feature block data to obtain a Fourier transform result, calculating the amplitude of the transformed signal based on the Fourier transform result to obtain amplitude data, generating frequency selection parameters through multi-layer perceptron operation and sigmoid function based on the amplitude data, multiplying the frequency selection parameters with the Fourier transform result to obtain a frequency domain feature matrix;
s22, based on the multi-scale feature block set and the frequency domain feature matrix, performing polynomial expansion operation on the feature block data by using polynomial coefficients to obtain a polynomial expansion result, and performing difference operation on the polynomial expansion result and the frequency domain feature matrix to obtain a residual feature matrix.
4. The method for modeling embodied agent time series data based on frequency domain learning as claimed in claim 3, wherein the step S3 is further:
S31, carrying out linear transformation on the feature data based on the frequency domain feature matrix and the residual feature matrix to obtain a query vector, a key vector and a value vector, calculating a similarity score of the query vector and the key vector, carrying out normalization processing on the similarity score to obtain a normalized score, multiplying the normalized score by the value vector to obtain an attention feature matrix;
S32, connecting the frequency domain feature matrix, the residual feature matrix and the position coding matrix on the feature dimension based on the attention feature matrix, calculating a gating parameter through linear transformation and sigmoid function, and carrying out weighted combination on the frequency domain feature matrix and the residual feature matrix based on the gating parameter to obtain a fusion feature matrix.
5. The method for modeling embodied agent time series data based on frequency domain learning as claimed in claim 4, wherein the step S4 is further:
S41, constructing a graph structure reflecting the dependency relationship between time points based on the fusion feature matrix, carrying out message transmission and updating operation on node features in the graph structure to obtain updated node features, and summarizing the updated node features to obtain time sequence aggregation features;
S42, calculating the change rate of the features along with time based on the time sequence aggregation features and the fusion feature matrix to generate a system stability index, extracting the change rule of the features on different time periods through time scale analysis and time window division based on the time sequence aggregation features and the fusion feature matrix to generate a feature time scale, evaluating the diversity and the nonlinearity degree of the features through complexity analysis, entropy calculation and nonlinearity measurement based on the time sequence aggregation features and the fusion feature matrix to obtain a complexity index, and combining the system stability index, the feature time scale and the complexity index to generate a dynamic system feature.
6. The method for modeling embodied agent time series data based on frequency domain learning as claimed in claim 5, wherein the step S5 is further:
s51, calculating parameters of mixed Gaussian distribution based on time sequence aggregation characteristics and dynamic system characteristics, and generating conditional probability distribution according to the parameters of the mixed Gaussian distribution to obtain prediction distribution;
s52, calculating entropy values of the prediction distribution, calculating stability scores based on dynamic system features, and combining the entropy values and the stability scores to obtain confidence indexes.
7. The method for modeling embodied agent time series data based on frequency domain learning as claimed in claim 6, wherein step S11 is further:
S111, acquiring an original time sequence data sequence from a database, segmenting the original time sequence data sequence by adopting a sliding window method, removing outliers in each segment of data to obtain segmented data, performing Z-score standardization processing on the segmented data to obtain standardized time sequence data, and applying wavelet transformation to the standardized time sequence data to remove high-frequency noise to obtain noise reduction standardized data and time sequence window data;
S112, calculating inner products of adjacent time window data based on noise reduction standardized data and time sequence window data to obtain inner product data, adding 1 to the inner product data to obtain offset inner product data, performing power operation on the offset inner product data to obtain power characteristic data, calculating a pearson correlation coefficient of the power characteristic data, mapping the pearson correlation coefficient to a high-dimensional characteristic space by adopting a kernel function method, and reducing characteristic dimensions by adopting a local sensitive hash method to obtain time sequence correlation characteristics;
S113, determining a feature clustering center by adopting a self-adaptive neighborhood method based on time sequence related features and power feature data, calculating the Mahalanobis distance between each time point and the feature clustering center, constructing a sparse adjacency matrix based on the Mahalanobis distance and the power feature data, and optimizing the adjacency relationship by adopting a spectral clustering method to obtain a similarity matrix;
S114, calculating the change trend of the similarity by adopting a dynamic programming method based on the similarity matrix, constructing an adaptive threshold based on the change trend and power characteristic data, dividing standardized time sequence data into initial data blocks by utilizing the adaptive threshold, and performing block size optimization on the initial data blocks by adopting a hierarchical clustering method to finally obtain an adaptive block size sequence.
8. The method of modeling embodied agent time series data based on frequency domain learning as claimed in claim 6, wherein the step S13 is further:
s131, extracting time index information of the data blocks based on the multi-scale feature block set, adopting polynomial interpolation to calculate a continuous time function to obtain a continuous time sequence, decomposing the continuous time sequence by using a wavelet basis function to obtain a time feature component;
S132, calculating a periodic sine function value based on layered time characteristics and initial time codes to obtain initial position codes, applying adaptive frequency modulation based on the initial position codes to obtain modulation position codes, calculating dot products among codes based on the modulation position codes and the initial position codes to obtain initial correlation scores, and obtaining multi-scale position codes based on the initial correlation scores and prestored time scale information;
S133, constructing a position attention query matrix based on the multi-scale position codes and the initial correlation score to obtain position query characteristics, calculating the correlation between the position query characteristics and the position codes to obtain position correlation scores, carrying out normalization processing on the position correlation scores to obtain normalized correlation scores, and applying kernel function transformation based on the normalized correlation scores to obtain nucleated position characteristics;
S134, performing spectrum decomposition based on the nucleated position features and the normalized correlation score to obtain a characteristic spectrum matrix, calculating the importance of the characteristic values based on the characteristic spectrum matrix and performing dimension reduction processing to obtain dimension reduction position features, performing weighting processing by combining the dimension reduction position features and the normalized correlation score to obtain weighted position features, and obtaining a position coding matrix through orthogonalization processing based on the weighted position features.
9. The method of modeling embodied agent time series data based on frequency domain learning as claimed in claim 6, wherein step S21 is further:
S211, acquiring a multi-scale feature block set, and windowing each feature block by applying a Hanning window function to obtain windowed feature data; calculating an autocorrelation function of the windowed feature data to obtain a correlation sequence; based on the related sequence, estimating the periodic characteristics of the signals to obtain periodic characteristic data;
S212, performing fast Fourier transform based on the windowed feature data and the periodic feature data to obtain an initial frequency spectrum, estimating the power spectrum density by adopting a Welch method based on the initial frequency spectrum to obtain power spectrum data, and performing self-adaptive smoothing on the initial frequency spectrum based on the power spectrum data and the periodic feature data to obtain a smooth frequency spectrum;
S213, calculating a frequency importance score through a multi-layer perceptron based on smooth frequency spectrum and power spectrum data to obtain a frequency weight, normalizing by applying a sigmoid function based on the frequency weight to obtain a normalized weight, evaluating uncertainty of the normalized weight by adopting a Bootstrap method to obtain a weight confidence interval, and carrying out weight correction based on the weight confidence interval to obtain a frequency selection parameter;
S214, performing frequency domain filtering based on the smooth frequency spectrum and the frequency selection parameters to obtain a filtered frequency spectrum, applying inverse Fourier transform based on the filtered frequency spectrum to obtain a reconstructed time domain signal, calculating a reconstruction error based on the reconstructed time domain signal, and updating the frequency selection parameters to finally obtain a frequency domain feature matrix.
10. The method of modeling embodied agent time series data based on frequency domain learning as claimed in claim 6, wherein the step S32 is further:
S321, connecting a frequency domain feature matrix, a residual feature matrix and a position coding matrix on the feature dimension based on the attention feature matrix to obtain a connection feature matrix, performing feature dimension reduction on the connection feature matrix to obtain dimension reduction features, adopting nonlinear transformation processing to obtain transformation features based on the dimension reduction features, and calculating feature weights by applying an attention mechanism based on the transformation features to obtain weighted features;
S322, acquiring weighted characteristics, obtaining initial gating parameters through linear transformation and sigmoid function calculation, estimating parameter importance through characteristic entropy based on the initial gating parameters, obtaining parameter importance, updating the initial gating parameters based on the parameter importance, obtaining optimized gating parameters, and obtaining self-adaptive gating parameters by applying dynamic threshold adjustment based on the optimized gating parameters;
S323, weighting the frequency domain feature matrix and the residual feature matrix by utilizing the self-adaptive gating parameters to obtain weighted frequency domain features and weighted residual features, obtaining initial fusion features by adopting a self-adaptive fusion strategy based on the weighted frequency domain features and the weighted residual features, and obtaining a fusion feature matrix by applying residual connection based on the initial fusion features.
CN202411899587.4A 2024-12-23 2024-12-23 Embodied Agent Time Series Data Modeling Method Based on Frequency Domain Learning Active CN119357642B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411899587.4A CN119357642B (en) 2024-12-23 2024-12-23 Embodied Agent Time Series Data Modeling Method Based on Frequency Domain Learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411899587.4A CN119357642B (en) 2024-12-23 2024-12-23 Embodied Agent Time Series Data Modeling Method Based on Frequency Domain Learning

Publications (2)

Publication Number Publication Date
CN119357642A CN119357642A (en) 2025-01-24
CN119357642B true CN119357642B (en) 2025-10-21

Family

ID=94319099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411899587.4A Active CN119357642B (en) 2024-12-23 2024-12-23 Embodied Agent Time Series Data Modeling Method Based on Frequency Domain Learning

Country Status (1)

Country Link
CN (1) CN119357642B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119622319B (en) * 2025-02-13 2025-06-03 鲁东大学 Time sequence prediction method and device based on periodic embedding and multi-scale features
CN119811586B (en) * 2025-03-14 2025-05-30 中科南京人工智能创新研究院 Gait rehabilitation body robot time sequence data modeling optimization method based on frequency domain learning
CN119966544B (en) * 2025-04-10 2025-06-10 国网四川省电力公司信息通信公司 Wireless communication link quality confidence interval prediction method and system based on transducer and multi-head diagram attention network
CN120030284B (en) * 2025-04-21 2025-07-04 鲁东大学 Time series prediction method based on FFT and interactive depthwise separable convolution
CN120074539B (en) * 2025-04-29 2025-07-22 湖南科技大学 A data compression method and system based on hybrid compression algorithm
CN120511680B (en) * 2025-07-21 2025-09-26 昀诺能源科技(江苏)有限公司 Intelligent control method of three-phase load based on electric meter data
CN120541737B (en) * 2025-07-28 2025-09-19 齐鲁工业大学(山东省科学院) Multi-variable time sequence data anomaly detection method and device based on time-frequency space diagram network
CN120806189A (en) * 2025-09-05 2025-10-17 中移(苏州)软件技术有限公司 Data processing method, device, equipment, storage medium and program product
CN120762294B (en) * 2025-09-11 2025-11-07 中交四航工程研究院有限公司 An intelligent control method for automatic leveling of underwater screed machines

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118779804A (en) * 2024-07-12 2024-10-15 西南交通大学 A time series data anomaly detection method based on joint graph learning and dual attention mechanism
CN119166795A (en) * 2024-11-21 2024-12-20 中科南京人工智能创新研究院 Layered comparison counterfactual learning method and system for visual question-answering model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20220017380A (en) * 2020-08-04 2022-02-11 현대자동차주식회사 Arbitrary Block Split Based Video Encoding and Decoding
CN119170281A (en) * 2024-11-20 2024-12-20 浙江师范大学 An embodied reasoning method for teacher anxiety based on brain-visual multimodal large model representation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118779804A (en) * 2024-07-12 2024-10-15 西南交通大学 A time series data anomaly detection method based on joint graph learning and dual attention mechanism
CN119166795A (en) * 2024-11-21 2024-12-20 中科南京人工智能创新研究院 Layered comparison counterfactual learning method and system for visual question-answering model

Also Published As

Publication number Publication date
CN119357642A (en) 2025-01-24

Similar Documents

Publication Publication Date Title
CN119357642B (en) Embodied Agent Time Series Data Modeling Method Based on Frequency Domain Learning
US8046200B2 (en) Nonlinear function approximation over high-dimensional domains
Yang et al. Underwater acoustic research trends with machine learning: general background
CN118657200B (en) Distributed decision method and system based on large model
CN113392961A (en) Method for extracting mesoscale eddy track stable sequence and predicting cyclic neural network
CN119202656A (en) Time series feature analysis method and system based on multidimensional data
Stepchenko et al. Nonlinear, non-stationary and seasonal time series forecasting using different methods coupled with data preprocessing
CN111401565A (en) A DOA Estimation Method Based on Machine Learning Algorithm XGBoost
CN114500325A (en) SDN controller fault self-adaptive intelligent detection method based on unsupervised transfer learning
CN119150057A (en) Marine environment time sequence prediction method and system based on recurrent neural network
US20250035741A1 (en) Generative model for generating synthetic radar data
CN119312181A (en) A method, device and electronic device for identifying engine group fault patterns based on density clustering-support vector machine and multi-time prediction weighting
CN120221125A (en) Infectious disease early warning and monitoring method and system based on big data and deep learning
CN120494794A (en) Industrial equipment energy efficiency evaluation and maintenance decision method and system based on reinforcement learning
Liu et al. Soil water content forecasting by ANN and SVM hybrid architecture
CN119200711A (en) Intelligent temperature control system and method for solder paste stirring based on temperature sensor
CN117560046A (en) Beam tracking method and device, equipment and storage medium
Moosavi et al. Temporal cluster-based local deep learning or signal processing-temporal convolutional transformer for daily runoff prediction?
Henze et al. Representation learning in power time series forecasting
CN119418224B (en) Satellite remote sensing target data identification method based on artificial intelligence
Narigina et al. Machine learning-based forecasting of sensor data for enhanced environmental sensing
CN120449972A (en) Training method and device, spatiotemporal data prediction method and device
CN120012561A (en) De-noising and optimized DOA estimation method under non-ideal channel conditions
Jamshidi et al. Towards a black box algorithm for nonlinear function approximation over high-dimensional domains
CN119673325A (en) A water quality prediction method and related device based on double decomposition and hybrid model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant