Disclosure of Invention
The embodiments of this specification provide a model training method, a risk control execution method, corresponding apparatuses, and an electronic device, which can accurately identify the nodes in a service platform that are potentially affected by an activity, so that corresponding risk control measures can be formulated and executed in advance, avoiding losses to the platform and its users caused by sudden problems in an activity link.
In order to solve the above technical problems, the embodiments of the present specification are implemented as follows:
In a first aspect, a model training method is provided, including:
acquiring time sequence data of a first type of monitoring items corresponding to a first historical sample set of a target activity, wherein the first type of monitoring items are determined to be influenced by the target activity;
performing similarity clustering on the time sequence data of the first type of monitoring items corresponding to the first historical sample set to obtain a plurality of clusters;
acquiring time sequence data of a second type of monitoring items corresponding to a second historical sample set of the target activity, wherein the second type of monitoring items are different from the first type of monitoring items, and the samples of the second historical sample set are marked with training labels that indicate whether the corresponding monitoring item is affected by the target activity;
determining, among the plurality of clusters, the cluster to which the time sequence data of the second type of monitoring items corresponding to the second historical sample set belongs, and constructing training features of the second historical sample set based on the cluster identifier of that cluster and the corresponding similarity distance;
and training a prediction model based on the training labels and the training features of the second historical sample set, wherein the prediction model is used for predicting whether a monitoring item of an activity link is influenced by the target activity.
In a second aspect, a risk control execution method is provided, including:
Acquiring time sequence data of a target monitoring item corresponding to a historical sample set to be predicted of a target activity;
Determining a cluster of time sequence data of the target monitoring item corresponding to the historical sample set to be predicted in a plurality of clusters, and constructing an input characteristic of the historical sample set to be predicted based on a cluster identifier of the cluster and a corresponding similarity distance, wherein the clusters are obtained by performing similarity clustering on the time sequence data of a first type of monitoring item corresponding to a first historical sample set of the target activity, and the first type of monitoring item is determined to be influenced by the target activity;
Inputting the input characteristics of the history sample set to be predicted into a prediction model to obtain a prediction result of whether the target monitoring item is influenced by the target activity, wherein the prediction model is obtained by training based on training labels and training characteristics of a second history sample set of the target activity, the training characteristics of the second history sample set are obtained by constructing cluster identifiers and corresponding similarity distances of time sequence data of a second type of monitoring item corresponding to the second history sample set in clusters, and the target monitoring item and the second type of monitoring item are different from the first type of monitoring item;
And if the prediction result indicates that the target monitoring item is influenced by the target activity, executing a risk control decision related to the target activity based on the monitoring data of the target monitoring item corresponding to the target activity.
In a third aspect, a model training apparatus is provided, including:
The first acquisition module is used for acquiring time sequence data of a first type of monitoring items corresponding to a first historical sample set of a target activity, wherein the first type of monitoring items are determined to be influenced by the target activity;
The first clustering calculation module is used for carrying out similarity clustering on the time sequence data of the first historical sample set corresponding to the first type of monitoring items to obtain a plurality of clusters;
The second acquisition module is used for acquiring time sequence data of a second type of monitoring items corresponding to a second historical sample set of the target activity, wherein the second type of monitoring items are different from the first type of monitoring items, and samples of the second historical sample set are marked with training labels that indicate whether the corresponding monitoring item is affected by the target activity;
The characteristic construction module is used for determining the cluster of the time sequence data of the second type monitoring item corresponding to the second historical sample set in the clusters, and constructing training characteristics of the second historical sample set based on the cluster identification of the cluster and the corresponding similarity distance;
And the model training module is used for training a prediction model based on the training labels and the training features of the second historical sample set, wherein the prediction model is used for predicting whether a monitoring item of an activity link is influenced by the target activity.
In a fourth aspect, an electronic device is provided, comprising a processor, and a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
acquire time sequence data of a first type of monitoring items corresponding to a first historical sample set of a target activity, wherein the first type of monitoring items are determined to be influenced by the target activity;
perform similarity clustering on the time sequence data of the first type of monitoring items corresponding to the first historical sample set to obtain a plurality of clusters;
acquire time sequence data of a second type of monitoring items corresponding to a second historical sample set of the target activity, wherein the second type of monitoring items are different from the first type of monitoring items, and the samples of the second historical sample set are marked with training labels that indicate whether the corresponding monitoring item is affected by the target activity;
determine, among the plurality of clusters, the cluster to which the time sequence data of the second type of monitoring items corresponding to the second historical sample set belongs, and construct training features of the second historical sample set based on the cluster identifier of that cluster and the corresponding similarity distance;
and train a prediction model based on the training labels and the training features of the second historical sample set, wherein the prediction model is used for predicting whether a monitoring item of an activity link is influenced by the target activity.
In a fifth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing one or more programs that, when executed by an electronic device that includes a plurality of application programs, cause the electronic device to:
acquire time sequence data of a first type of monitoring items corresponding to a first historical sample set of a target activity, wherein the first type of monitoring items are determined to be influenced by the target activity;
perform similarity clustering on the time sequence data of the first type of monitoring items corresponding to the first historical sample set to obtain a plurality of clusters;
acquire time sequence data of a second type of monitoring items corresponding to a second historical sample set of the target activity, wherein the second type of monitoring items are different from the first type of monitoring items, and the samples of the second historical sample set are marked with training labels that indicate whether the corresponding monitoring item is affected by the target activity;
determine, among the plurality of clusters, the cluster to which the time sequence data of the second type of monitoring items corresponding to the second historical sample set belongs, and construct training features of the second historical sample set based on the cluster identifier of that cluster and the corresponding similarity distance;
and train a prediction model based on the training labels and the training features of the second historical sample set, wherein the prediction model is used for predicting whether a monitoring item of an activity link is influenced by the target activity.
In a sixth aspect, a risk control execution apparatus is provided, including:
the third acquisition module is used for acquiring time sequence data of a target monitoring item corresponding to a historical sample set to be predicted of the target activity;
The second clustering calculation module is used for determining clusters of time sequence data of the target monitoring items corresponding to the historical sample set to be predicted in a plurality of clusters, and constructing input features of the historical sample set to be predicted based on cluster identifiers of the clusters and corresponding similarity distances, wherein the clusters are obtained by performing similarity clustering on the time sequence data of the first type monitoring items corresponding to the first historical sample set of the target activity, and the first type monitoring items are determined to be influenced by the target activity;
The model prediction module is used for inputting the input characteristics of the historical sample set to be predicted into a prediction model to obtain a prediction result of whether the target monitoring item is influenced by the target activity, wherein the prediction model is obtained by training based on training labels and training characteristics of a second historical sample set of the target activity, the training characteristics of the second historical sample set are obtained by constructing cluster identifiers and corresponding similarity distances of time sequence data of a second type of monitoring item corresponding to the second historical sample set in clusters, and the target monitoring item and the second type of monitoring item are different from the first type of monitoring item;
And the risk control execution module is used for executing a risk control decision related to the target activity based on the monitoring data of the target monitoring item corresponding to the target activity if the prediction result indicates that the target monitoring item is influenced by the target activity.
In a seventh aspect, an electronic device is presented comprising a processor, and a memory arranged to store computer executable instructions that, when executed, cause the processor to:
Acquiring time sequence data of a target monitoring item corresponding to a historical sample set to be predicted of a target activity;
Determining a cluster of time sequence data of the target monitoring item corresponding to the historical sample set to be predicted in a plurality of clusters, and constructing an input characteristic of the historical sample set to be predicted based on a cluster identifier of the cluster and a corresponding similarity distance, wherein the clusters are obtained by performing similarity clustering on the time sequence data of a first type of monitoring item corresponding to a first historical sample set of the target activity, and the first type of monitoring item is determined to be influenced by the target activity;
Inputting the input characteristics of the history sample set to be predicted into a prediction model to obtain a prediction result of whether the target monitoring item is influenced by the target activity, wherein the prediction model is obtained by training based on training labels and training characteristics of a second history sample set of the target activity, the training characteristics of the second history sample set are obtained by constructing cluster identifiers and corresponding similarity distances of time sequence data of a second type of monitoring item corresponding to the second history sample set in clusters, and the target monitoring item and the second type of monitoring item are different from the first type of monitoring item;
And if the prediction result indicates that the target monitoring item is influenced by the target activity, executing a risk control decision related to the target activity based on the monitoring data of the target monitoring item corresponding to the target activity.
In an eighth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing one or more programs that, when executed by an electronic device that includes a plurality of application programs, cause the electronic device to:
Acquiring time sequence data of a target monitoring item corresponding to a historical sample set to be predicted of a target activity;
Determining a cluster of time sequence data of the target monitoring item corresponding to the historical sample set to be predicted in a plurality of clusters, and constructing an input characteristic of the historical sample set to be predicted based on a cluster identifier of the cluster and a corresponding similarity distance, wherein the clusters are obtained by performing similarity clustering on the time sequence data of a first type of monitoring item corresponding to a first historical sample set of the target activity, and the first type of monitoring item is determined to be influenced by the target activity;
Inputting the input characteristics of the history sample set to be predicted into a prediction model to obtain a prediction result of whether the target monitoring item is influenced by the target activity, wherein the prediction model is obtained by training based on training labels and training characteristics of a second history sample set of the target activity, the training characteristics of the second history sample set are obtained by constructing cluster identifiers and corresponding similarity distances of time sequence data of a second type of monitoring item corresponding to the second history sample set in clusters, and the target monitoring item and the second type of monitoring item are different from the first type of monitoring item;
And if the prediction result indicates that the target monitoring item is influenced by the target activity, executing a risk control decision related to the target activity based on the monitoring data of the target monitoring item corresponding to the target activity.
Based on the scheme of the embodiments of this specification, a prediction model can be trained that judges whether a given monitoring item in the service platform is affected by an activity. For training, a small first historical sample set and a small second historical sample set are prepared in advance. First, similarity clustering is performed on the time sequence data of the first type of monitoring items (those already determined to be affected by the activity) corresponding to the first historical sample set, yielding a plurality of clusters that serve as the reference standard. Next, the time sequence data of the second type of monitoring items corresponding to the second historical sample set is taken as the training set, the cluster to which each piece of time sequence data in the training set belongs is determined, and training features are built from the identifier of that cluster and the corresponding similarity distance. Finally, the prediction model is trained in a supervised manner on these training features and the pre-labeled training labels. As a result, when the prediction model predicts whether a target monitoring item is affected by the activity, it can do so based on the cluster identifier and the corresponding similarity distance of the time sequence data of that target monitoring item in the activity. In addition, in a specific application, after the prediction model determines that the target monitoring item is affected by the activity, a risk control decision related to the activity is executed based on the monitoring data of the target monitoring item corresponding to the activity, so that losses to the service platform and its users caused by a problem in the target monitoring item are avoided in time.
In the whole scheme, the training features of the prediction model consist of cluster identifiers and similarity distances of time sequence data relative to the reference standard, so the potential scope of impact of an activity on the service platform is classified and analyzed from a time sequence perspective. Because the scheme does not rely on service features of any specific service scenario, it can be applied to different service scenarios and has good generality and portability.
Detailed Description
To make the purposes, technical solutions and advantages of this document clearer, the technical solutions of this specification will be clearly and completely described below with reference to specific embodiments of this specification and the corresponding drawings. It will be apparent that the embodiments described are only some, but not all, of the embodiments of this document. All other embodiments obtained by one of ordinary skill in the art based on the present disclosure without creative effort are intended to be within the scope of the present disclosure.
As described above, e-commerce platforms currently run activities related to their own business frequently. With the development of information technology, the architecture of these platforms is increasingly complex, and the applications, interfaces, remote call programs and the like involved in each activity are increasingly hard to trace, so it is difficult to analyze manually which of them are affected. If the platform lacks measures for evaluating the scope of impact of an activity link, a significant problem, once it occurs, can bring great losses to the platform and its users.
Therefore, this specification aims to provide an artificial intelligence scheme, based on deep learning technology, that uses a model to analyze the scope of impact of an activity. Nodes in the service platform that are potentially affected by the activity can be accurately mined, so that risk control measures related to the activity are formulated for the affected nodes and automatic early warning is achieved.
Specifically, fig. 1 is a flowchart of a model training method provided in an embodiment of the present disclosure. The method shown in fig. 1 may be executed by a corresponding apparatus, and specifically includes the following steps:
S102, time sequence data of a first type of monitoring items corresponding to a first historical sample set of a target activity is obtained, wherein the first type of monitoring items are determined to be influenced by the target activity.
In this embodiment, a buried point (that is, an instrumentation or tracking point) may be set for some high-value nodes in the service platform, such as an application, an interface, a remote call program, or a data packet, so that each such node serves as a monitoring item. Summarizing the time sequence data monitored by the buried points before and after the target activity is carried out yields the historical samples of the target activity.
It should be understood that the time sequence data monitored at a buried point reflects the change brought about by the target activity before and after it is carried out; if the change is obvious, the monitoring item corresponding to the buried point is affected by the target activity. It should be noted that which time sequence data is collected for different monitoring items depends on the specific risk control requirements of the service platform.
Here, a certain target interface in the service platform is taken as an example. Assuming that the target activity is an activity newly launched by the service platform, a buried point for monitoring traffic-related time sequence data can be set at the target interface in order to detect whether the target interface can cope with the traffic pressure of the target activity. Correspondingly, if the traffic of the target interface fluctuates noticeably before and after the target activity is carried out, the target interface is a relatively important node in the target activity. For the service platform, the risk control decision related to traffic load can be embodied in the bandwidth resources allocated to the target interface. For example, if the target interface is overloaded during the target activity, its traffic resources can be increased.
Next, a certain target application in the service platform is taken as an example. Assuming that the target activity launched by the service platform is released in the target application, a buried point for monitoring evaluation-related time sequence data can be set for the target application in order to detect how users of the target application react to the target activity. Correspondingly, if the evaluation score of the target application changes noticeably before and after the target activity is carried out, the target application is a relatively important release platform in the target activity. For the service platform, the risk control decision related to activity quality can be made according to the user evaluation of the target application. For example, if the user evaluation corresponding to the target application drops by more than a preset threshold during the target activity, the service quality of the target activity can be actively improved.
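The threshold rule in the evaluation example can be sketched as follows. This is a minimal illustration only: the function name `evaluation_drop_exceeds`, the use of mean scores, the sample values and the decision strings are all assumptions introduced for the sketch and do not come from the embodiment.

```python
def evaluation_drop_exceeds(scores_before, scores_during, threshold):
    """Return True when the mean score during the activity fell by more than threshold."""
    mean_before = sum(scores_before) / len(scores_before)
    mean_during = sum(scores_during) / len(scores_during)
    return (mean_before - mean_during) > threshold


# Hypothetical evaluation scores collected by the buried point.
before = [4.6, 4.7, 4.5, 4.6]
during = [4.1, 3.9, 4.0, 4.0]

# Hypothetical risk control decision: react only when the drop exceeds the preset threshold.
if evaluation_drop_exceeds(before, during, threshold=0.5):
    decision = "improve_service_quality"
else:
    decision = "no_action"
```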
Specifically, the first type of monitoring item is a monitoring item in the service platform that has already been determined to be affected by the target activity. In this embodiment, the first type of monitoring item serves as the reference standard for monitoring items affected by the target activity. It belongs to prior knowledge and can be obtained by manual analysis or mined using other artificial intelligence technologies, which is not particularly limited herein.
S104, performing similarity clustering on time sequence data corresponding to the first type of monitoring items in the first historical sample set to obtain a plurality of clusters.
As described above, the present embodiment uses the first type of monitoring item as a reference that is affected by the target activity. After clustering the time series data of the first type monitoring items corresponding to the historical sample set, each obtained cluster is equivalent to classifying the monitoring items which are known to be affected by the target activity.
Typically, there is some correlation in data fluctuation between the nodes in an activity link. For a monitoring item for which it is unknown whether it is influenced by the target activity, the similarity distance between its time sequence data and the clusters can be used as a reference basis for analyzing whether it is affected by the target activity.
The method of similarity clustering is described below.
Specifically, this embodiment presets a plurality of feature dimensions related to the risk control decision for the target activity. For example, in a risk control decision scenario of resource allocation, different types of resources such as a processor, a memory, a loan and the like can be used as feature dimensions, and sub-time sequence data under each subdivision can be extracted according to each feature dimension.
If different feature dimensions correspond to different time sequence length requirements, the time sequence data of the first type of monitoring items corresponding to the first historical sample set is split according to the time sequence length requirements of the plurality of feature dimensions, so as to obtain sub-time sequence data whose time sequence length matches each feature dimension. Feature extraction is then performed, based on the plurality of feature dimensions, on the sub-time sequence data of matching time sequence length.
After feature extraction is completed, a feature matrix composed of feature extraction results is constructed for time sequence data of the first type of monitoring items corresponding to the first historical sample set.
And finally, carrying out similarity clustering on the constructed feature matrix based on the similarity measurement function.
For ease of understanding, the exemplary description is presented in connection with the illustration of FIG. 2.
In this embodiment, the n pieces of initial time sequence data of the first type of monitoring item corresponding to the first historical sample set may be regarded as a time sequence matrix of size n×1×s, where n represents the number of samples in the first historical sample set, 1 represents the current number of feature dimensions, and s represents the time sequence length.
It should be appreciated that the initial composition of the time series data is derived from the superposition of multidimensional features such as activity features, daily fluctuation features, application habits, etc., and does not represent feature information in a particular feature dimension.
For this reason, an empirical mode decomposition (EMD) algorithm is further adopted: according to the set time sequence lengths corresponding to the m feature dimensions, each piece of initial time sequence data is split to obtain the corresponding sub-time sequences. As in fig. 2, the initial time sequence data 1 is split into sub-timing feature 1.1, sub-timing feature 1.2, ..., sub-timing feature 1.m, and the initial time sequence data n is split into sub-timing feature n.1, sub-timing feature n.2, ..., sub-timing feature n.m.
It should be appreciated that after the multidimensional features are obtained based on the EMD algorithm, the time sequence matrix changes from n×1×s to n×m×s, where m represents the number of feature dimensions.
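The change of shape from n×1×s to n×m×s can be illustrated with a small sketch. Note that the code below is not EMD: it uses a repeated moving-average split as a simple stand-in that shares only the additive structure (the components plus the residual reproduce the original series). The function names, window sizes and sample values are assumptions made for illustration.

```python
def moving_average(series, window):
    """Centered moving average; the window is shrunk at both ends of the series."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def split_into_components(series, windows):
    """Split one series into len(windows) detail components plus a residual.

    Stand-in for EMD (assumption): each pass removes the detail that a
    moving average of the given window smooths away, so that
    sum(components) + residual == original series.
    """
    components, current = [], list(series)
    for w in windows:
        smooth = moving_average(current, w)
        components.append([c - s for c, s in zip(current, smooth)])
        current = smooth
    return components, current


# n = 2 samples, each an initial time sequence of length s = 8 (hypothetical values).
samples = [
    [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0],
    [2.0, 2.0, 3.0, 3.0, 7.0, 3.0, 3.0, 2.0],
]
windows = [3, 5]  # one window per feature dimension, so m = 2

# tensor[i][j] is the sub-time sequence of sample i in feature dimension j.
tensor = [split_into_components(x, windows)[0] for x in samples]
n, m, s = len(tensor), len(tensor[0]), len(tensor[0][0])
```

In practice a dedicated EMD implementation would replace `split_into_components`; the resulting n×m×s layout stays the same.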
Here, $I(n)$ is defined as the input signal of the initial time sequence data, and may be expressed by the following formula:
$$I(n) = \sum_{i=1}^{m} \mathrm{IMF}_i(n) + \mathrm{Res}_m(n)$$
where $n$ denotes the nth time sequence data in the first historical sample set, $i$ denotes the sequence number of the feature dimension corresponding to the time sequence data ($i = 1, \dots, m$), $\mathrm{IMF}_i(n)$ denotes the intrinsic mode function of the i-th feature, and $\mathrm{Res}_m(n)$ denotes the residual.
Corresponding feature extraction is then performed on the split sub-time sequence data, thereby obtaining local feature information at different time scales. For example, feature extraction is performed on sub-timing feature 1.1 based on feature dimension 1, on sub-timing feature 1.2 based on feature dimension 2, and on sub-timing feature 1.m based on feature dimension m; likewise, feature extraction is performed on sub-timing feature n.1 based on feature dimension 1, on sub-timing feature n.2 based on feature dimension 2, and on sub-timing feature n.m based on feature dimension m.
After the time sequence matrix n×m×s of the intrinsic mode functions (IMFs) is obtained by the EMD algorithm, filtering and screening may be further performed to extract the activity features that are actually required.
Screening is based on the assumption that an activity, being a sudden event, does not form periodic fluctuations. Specifically, the IMF timing features that fluctuate periodically can be filtered out by a Fourier transform algorithm, and the remaining timing feature matrix is used for subsequent analysis and decision-making. Meanwhile, to keep the matrix dimensions consistent, it is necessary to determine the proportion of each feature dimension in the first historical sample set and to filter out the feature extraction results corresponding to feature dimensions whose proportion is lower than a preset threshold.
For example, the filter constraints of the Fourier transform are:
Based on the above formula, the feature extraction results corresponding to feature dimensions whose proportion is lower than 0.5 can be removed. The time sequence matrix after removal is n×h×s, where h < m.
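The two-stage screening can be sketched as follows. The periodicity test used here (the share of non-DC spectral energy held by the strongest frequency bin, with a hypothetical cutoff `periodic_share`) is an assumption chosen for illustration and is not the filter constraint of this embodiment; only the 0.5 proportion threshold is taken from the text. The sketch uses a plain discrete Fourier transform from the standard library.

```python
import cmath
import math


def dominant_energy_share(series):
    """Share of non-DC spectral energy held by the strongest frequency bin."""
    s = len(series)
    mean = sum(series) / s
    centered = [v - mean for v in series]
    energies = []
    for k in range(1, s // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / s)
                    for t, v in enumerate(centered))
        energies.append(abs(coeff) ** 2)
    total = sum(energies)
    return max(energies) / total if total > 0 else 0.0


def keep_dimensions(tensor, periodic_share=0.8, min_ratio=0.5):
    """Indices of feature dimensions that are non-periodic in at least min_ratio of the samples."""
    n, m = len(tensor), len(tensor[0])
    kept = []
    for j in range(m):
        non_periodic = sum(
            1 for i in range(n)
            if dominant_energy_share(tensor[i][j]) < periodic_share
        )
        if non_periodic / n >= min_ratio:
            kept.append(j)
    return kept


# Hypothetical n x m x s tensor with n = 2, m = 2, s = 16:
# dimension 0 is a pure periodic wave, dimension 1 is a one-off spike.
s = 16
wave_a = [math.sin(2 * math.pi * t / 4) for t in range(s)]
wave_b = [2.0 * math.sin(2 * math.pi * t / 4) for t in range(s)]
spike_a = [5.0 if t == 3 else 0.0 for t in range(s)]
spike_b = [3.0 if t == 9 else 0.0 for t in range(s)]
tensor = [[wave_a, spike_a], [wave_b, spike_b]]

kept = keep_dimensions(tensor)  # dimensions that survive, giving n x h x s with h < m
```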
After feature extraction, more accurate activity feature information is obtained, and uncorrelated noise data is effectively filtered out. Then, to further reduce the computational difficulty, this embodiment may use a k-means clustering model and perform the clustering calculation based on the L2-norm similarity metric function.
The L2-norm similarity metric function is $\mathrm{dist} = \lVert x - y \rVert_2$, where x and y respectively represent the two points between which the similarity distance needs to be calculated.
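A minimal sketch of k-means clustering under the L2 distance is given below. It assumes that each sample's feature matrix has been flattened into one row, and it initializes the centers with the first k rows purely to keep the example deterministic; both choices, together with the sample values, are assumptions for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means using the L2 distance (math.dist)."""
    # Assumption: the first k points serve as initial centers.
    centers = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignments


# Hypothetical flattened feature rows for four first-type monitoring items.
points = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
centers, assignments = kmeans(points, k=2)
```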
After the clustering model is obtained, the central time sequence data of each cluster can be readily calculated, where the central time sequence data is the time sequence data with the shortest average similarity distance to the time sequence data of the other monitoring items in the cluster. Based on this, a monitoring item that has not yet been analyzed to determine whether it is influenced by the target activity can be compared against the clustering model by similarity distance; that is, the similarity between its time sequence data and the central time sequence data of each cluster is calculated to judge whether the unanalyzed monitoring item is strongly related to the target activity. If the association is strong, the unanalyzed monitoring item is affected by the target activity.
S106, time sequence data of a second type of monitoring items corresponding to a second historical sample set of the target activity is obtained, wherein the second type of monitoring items are different from the first type of monitoring items, samples of the second historical sample set are marked with training labels, and the training labels are used for indicating whether the corresponding monitoring item is affected by the target activity.
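The central time sequence data described above is the cluster member with the smallest average distance to the other members. A minimal sketch, with a hypothetical function name and hypothetical member values:

```python
import math


def cluster_center(members):
    """Return the member with the smallest average L2 distance to the other members."""
    def average_distance(i):
        others = [m for j, m in enumerate(members) if j != i]
        return sum(math.dist(members[i], o) for o in others) / max(len(others), 1)

    best = min(range(len(members)), key=average_distance)
    return members[best]


# Hypothetical time sequence data of three monitoring items in one cluster.
members = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
center = cluster_center(members)  # average distances are 2.0, 1.5 and 2.5
```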
In this embodiment, the second set of historical samples is used as a training set for the predictive model.
Here, the prediction model is trained in a supervised manner. The final purpose of the training is to enable the prediction model to predict whether the time sequence features of a given monitoring item are affected by the target activity, so the task is a binary classification problem.
For binary classification, the training labels only need to divide the time sequence data of the second type of monitoring items corresponding to the training set into positive samples and negative samples. A positive sample is time sequence data influenced by the target activity and serves as a positive example for training the prediction model; a negative sample is time sequence data not influenced by the target activity and serves as a negative example.
It should be noted that, the second history sample set belongs to data prepared in advance, and the corresponding training label may be labeled manually, which is not specifically limited herein.
S108, determining the cluster of the time sequence data of the second type monitoring item corresponding to the second history sample set in the plurality of clusters, and constructing training features of the second history sample set based on the cluster identification of the cluster and the corresponding similarity distance.
As described above, this embodiment uses the similarity distance between a monitoring item whose relationship to the target activity is not yet known and the clusters obtained in S104 as the reference basis for analyzing whether that monitoring item is affected by the target activity.
To this end, a cluster identifier may be configured for each cluster obtained in S104; the identifier only distinguishes the clusters from one another and has no other meaning. Then, based on the similarity distance between the time sequence data of the second historical sample set corresponding to the second type of monitoring items and each cluster, the cluster with the smallest similarity distance is taken as the cluster to which the data belongs.
Here, the embodiment may use the cluster identifier of the cluster to which the data belongs and the corresponding similarity distance as two basic features for training the prediction model, and combine them with features of the time sequence data of the second historical sample set in other dimensions, such as statistical features including the mean, standard deviation, maximum, minimum, kurtosis and skewness, to construct the training features of the second historical sample set. The statistical features can be extracted automatically using the tsfresh package, which is dedicated to feature engineering for time sequence data; this is not specifically limited herein.
It should be understood that, referring to fig. 3, the training features of this embodiment include at least two dimensions: one is the cluster identifier of the cluster to which the data belongs, and the other is the similarity distance to that cluster. In this case the training features form an n×2 matrix, where n is the number of samples. If features in other dimensions are introduced on top of this, such as k statistical features extracted by the tsfresh package, the training features form an n×(k+2) matrix.
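As a rough illustration of the n×(k+2) layout, the sketch below builds one feature row per sample from the cluster identifier, the similarity distance, and k = 4 statistical features. It uses the Python standard library instead of tsfresh, and the Euclidean distance, the choice of statistics, and all names are assumptions made for the example.

```python
import math
import statistics

def euclidean(a, b):
    """Assumed similarity distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_features(samples, centers, dist=euclidean):
    """Return one row per sample: [cluster_id, distance, mean, std, max, min]."""
    rows = []
    for series in samples:
        # The cluster with the smallest similarity distance is the one the sample belongs to
        cid = min(centers, key=lambda c: dist(series, centers[c]))
        rows.append([
            cid,
            dist(series, centers[cid]),
            statistics.mean(series),
            statistics.pstdev(series),
            max(series),
            min(series),
        ])
    return rows

# Hypothetical cluster centers (id -> central time sequence) and training samples
centers = {0: [1.0, 2.0, 4.0], 1: [9.0, 9.0, 9.0]}
samples = [[1.0, 2.0, 3.0], [8.0, 9.0, 10.0]]
features = build_features(samples, centers)  # n = 2 rows, k + 2 = 6 columns
```

Because the cluster identifier is only a label, a real model would usually treat the first column as categorical rather than numeric; that encoding step is omitted here.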
S110, training a prediction model based on training labels and training features of the second historical sample set, wherein the prediction model is used for predicting whether a monitoring item of an active link is influenced by target activity.
In this embodiment, the training labels serve as the expected output of the prediction model. During training, the n×(k+2) training features of the second historical sample set are input into the prediction model, and the prediction model outputs a prediction of whether each sample is affected by the target activity. In the training stage, because the prediction capability of the model is still limited, its prediction is not necessarily accurate, and there is an error between the prediction and the true value marked by the training label. This error is the training loss of the prediction model. This step adjusts the parameters of the prediction model along the gradient direction that reduces the training loss, so that the prediction output by the model gradually approaches the true value marked by the training label, thereby achieving the purpose of training.
As an exemplary introduction, the nature of the predictive model is to calculate the following formula:
wherein x_i represents the training features of the i-th time sequence data, w_i represents the weight corresponding to the i-th time sequence data, and b_i represents the bias corresponding to the i-th time sequence data.
The objective function for calculating training loss is:
Training solves for the optimal y through the objective function and thereby determines w_i.
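The training procedure described above (predict, measure the loss against the label, adjust the weights along the gradient) can be illustrated with a small binary classifier. The sketch assumes a linear score passed through a sigmoid and a cross-entropy loss, which is a common choice for binary classification but is not confirmed by this specification; the function names, learning rate and toy data are likewise hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss (assumed model form)."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            # Error between the predicted probability and the training label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(m):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Move the parameters in the direction that reduces the training loss
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return 1 (affected) or 0 (not affected)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Toy features [similarity distance, mean]; label 1 means affected by the activity
X = [[0.1, 1.0], [0.2, 1.2], [2.0, 1.1], [2.5, 0.9]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
```

In this toy data the samples close to a reference cluster are the affected ones, so the learned weight on the distance feature ends up negative.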
Based on the scheme of the embodiments of this specification, a prediction model can be trained that determines whether a given monitoring item in the service platform is affected by an activity. For training, a small first historical sample set and a small second historical sample set are prepared in advance. First, similarity clustering is performed on the time sequence data of the first historical sample set corresponding to the first type of monitoring items, which are already determined to be affected by the activity, to obtain a plurality of clusters that serve as the reference standard. Next, the time sequence data of the second historical sample set corresponding to the second type of monitoring items is used as the training set: the cluster to which each piece of time sequence data in the training set belongs is determined, and training features are constructed from the identifier of that cluster and the corresponding similarity distance. Finally, the prediction model is trained in a supervised manner on the training features and the pre-annotated training labels of the training set. As a result, when predicting whether a target monitoring item is affected by the activity, the prediction model can make the analysis based on the cluster identifier and the corresponding similarity distance of the time sequence data of the target monitoring item during the activity.
In the scheme as a whole, the training features of the prediction model consist of the cluster identifier and the similarity distance of the time sequence data relative to the reference standard, so the potential impact surface of an activity on the service platform is classified and analyzed from a time sequence perspective. Because this does not rely on business features specific to a particular business scenario, the scheme can be applied to different business scenarios and has good generality and transferability.
Fig. 4 is a flowchart of a wind control (that is, risk control) execution method according to an embodiment of the present disclosure. The method shown in fig. 4 may be executed by the corresponding device described below, and specifically includes the following steps:
S402, time sequence data of a target monitoring item corresponding to a historical sample set to be predicted of target activity is obtained.
In this embodiment, the target monitoring item refers to a monitoring item in the service platform for which it is yet to be determined whether it is affected by the target activity, such as an application, an interface, a remote calling program, or a data packet of the service platform.
S404, determining the cluster of the time sequence data of the target monitoring item corresponding to the history sample set to be predicted in a plurality of clusters, and constructing the input characteristic of the history sample set to be predicted based on the cluster identification of the cluster and the corresponding similarity distance.
The multiple clusters are obtained by carrying out similarity clustering on time sequence data of a first historical sample set of target activities, wherein the time sequence data corresponds to a first type of monitoring items, and the first type of monitoring items are monitoring items which are determined to be influenced by the target activities in the service platform.
It should be understood that the clusters described in this step are obtained based on the similarity clustering in S104, which is not described herein in detail. It should be noted that the input features constructed in this step should have the same feature dimensions as the training features used in the second set of history samples in the training phase of the predictive model.
For example, if the training features of the second historical sample set are constructed from the cluster identifier of the cluster to which the set belongs, the similarity distance to that cluster, and the statistical mean and standard deviation, then the input features of the historical sample set to be predicted should likewise be constructed from the cluster identifier of the cluster to which that set belongs, the similarity distance to that cluster, and the statistical mean and standard deviation.
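One simple way to keep the input features aligned with the training features is to fix the feature order once and validate every row against it. The sketch below is only an illustration of that idea; the feature names and the `to_vector` helper are hypothetical.

```python
# Feature order fixed at training time (hypothetical names)
FEATURE_ORDER = ["cluster_id", "distance", "mean", "std"]

def to_vector(feature_dict, order=FEATURE_ORDER):
    """Convert named features to a vector, rejecting any mismatch with the training schema."""
    missing = [name for name in order if name not in feature_dict]
    extra = [name for name in feature_dict if name not in order]
    if missing or extra:
        raise ValueError(f"feature mismatch: missing={missing}, extra={extra}")
    return [feature_dict[name] for name in order]

# The same helper is used for a training row and for a row to be predicted
train_row = to_vector({"cluster_id": 0, "distance": 1.0, "mean": 2.0, "std": 0.8})
predict_row = to_vector({"std": 0.5, "mean": 2.0, "distance": 0.4, "cluster_id": 1})
```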
S406, inputting the input characteristics of the history sample set to be predicted into a prediction model to obtain a prediction result of whether the target monitoring item is influenced by the target activity.
The prediction model is obtained by training based on training labels and training features of a second historical sample set of target activities, the training features of the second historical sample set are obtained by constructing based on cluster identifications and corresponding similarity distances of time sequence data of second type monitoring items corresponding to the second historical sample set in clusters, wherein the target monitoring items and the second type monitoring items are different from the first type monitoring items.
And S408, if the prediction result indicates that the target monitoring item is influenced by the target activity, executing a wind control decision related to the target activity based on the monitoring data of the target activity corresponding to the target monitoring item.
Here, an application of the wind control execution method of the embodiment of the present specification will be described by way of example.
Assume that there is some target activity in the business platform that transitions from a first version to a second version. The second version adds new functionality relative to the first version.
Wherein the active links of the first version are determined, and therefore the monitoring items in the current active links of the first version are defined as first type monitoring items. For wind control, during the period that the target activity is the first version, the service platform can acquire monitoring data from monitoring items in the current active link of the first version, so that corresponding wind control decisions are executed according to the monitoring data.
When the target activity iterates to the second version, because the second version adds a new function, monitoring items corresponding to the new function need to be added to the current active link, so that the wind control of the service platform also covers the new function.
Here, the first and second history sample sets may be extracted from a history sample of the first version of the target activity recorded from the business system. And then, clustering calculation is carried out according to the time sequence data of the first historical sample set corresponding to the first type monitoring item (which is determined to be influenced by the first version of target activity) in the current active link to obtain a plurality of clusters.
Then a small number of historical samples of the second version of the target activity recorded by the service system are manually extracted as the second historical sample set and annotated with training labels. After annotation is complete, based on the time sequence data of the second historical sample set corresponding to second type monitoring items outside the current active link, the cluster to which each second type monitoring item belongs and the similarity distance to that cluster are calculated, and the training features are constructed from them. Thereafter, a prediction model is trained based on the training labels and training features of the second historical sample set.
After the prediction model is trained, suppose it needs to be analyzed whether a certain target monitoring item (not belonging to the current active link) is affected by the second version of the target activity. The time sequence data of the target monitoring item during the second version of the target activity can be extracted from the service platform, and the similarity distance between the target monitoring item and each cluster can be calculated from that data. Input features are then constructed from the cluster identifier of the cluster to which the target monitoring item belongs and the corresponding similarity distance. Finally, the input features are fed into the prediction model to obtain the corresponding prediction result.
If the prediction result indicates that the target monitoring item is affected by the second version of the target activity, the target monitoring item can be added to the current active link, and the service platform can subsequently acquire monitoring data from the target monitoring item for making wind control decisions on the target activity.
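The prediction flow in this example can be sketched as below. The trained prediction model is replaced by a stand-in callable, the Euclidean distance is assumed as the similarity metric, and the monitoring item names and cluster centers are invented for illustration.

```python
import math

def euclidean(a, b):
    """Assumed similarity distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def evaluate_item(name, series, centers, model, active_link):
    """Build [cluster_id, distance] for one monitoring item and update the active link."""
    cid = min(centers, key=lambda c: euclidean(series, centers[c]))
    features = [cid, euclidean(series, centers[cid])]
    affected = bool(model(features))
    if affected and name not in active_link:
        # Affected items join the active link so later wind control decisions cover them
        active_link.append(name)
    return affected

# Hypothetical cluster centers from the first-version samples
centers = {0: [1.0, 2.0, 4.0], 1: [9.0, 9.0, 9.0]}

def toy_model(features):
    """Stand-in for the trained prediction model: close to a cluster means affected."""
    return features[1] < 1.0

active_link = ["order-service"]
first = evaluate_item("coupon-api", [1.0, 2.0, 3.5], centers, toy_model, active_link)
second = evaluate_item("report-job", [50.0, 0.0, 50.0], centers, toy_model, active_link)
```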
As shown in fig. 5, when the above scheme is brought into the service platform implementation, the embodiment of the present specification may construct an activity impact surface analysis system, which is specifically divided into an offline analysis framework and an online analysis framework.
Here, all historical samples of the target activity may be hosted in database tables of the Open Data Processing Service (ODPS).
For the offline analysis framework, in the training stage the time sequence data of the first historical sample set and the second historical sample set is extracted from the ODPS database and input into the clustering model and the prediction model prepared in advance, respectively, to complete training. The trained clustering model and prediction model can be deployed to a system of the service platform, namely the operation support system (Operation Support Systems, OSS), to provide offline prediction capability. In addition, the outputs of the clustering model and the prediction model can also be stored in ODPS data tables, so that technicians can observe differences in model accuracy across iterations.
For the online analysis framework, the time sequence data of the historical samples corresponding to unanalyzed monitoring items is likewise obtained from the ODPS database and passed to the corresponding algorithm services provided by the OSS. That is, the clustering model first calculates the cluster to which the monitored time sequence data belongs and the similarity distance to that cluster. Input features are then constructed from the cluster identifier of that cluster and the similarity distance, and passed to the prediction model to determine all monitoring items affected by the target activity; the results are summarized to generate the range of the impact surface.
After the impact surface range is obtained, the system compares, for each monitoring item within that range, how similar or different its monitored time sequence is between the current activity and historical activities, gives an evaluation result, and stores the result in an ODPS data table for technicians to screen. The technicians' screening results can in turn be written to an ODPS data table as feedback annotations and used as sample data for the next round of model training, thereby closing the loop from manual feedback to model training.
For the system of the embodiment of the specification, the prediction model does not need to rely on manual parameter adjustment of a developer, algorithm parameters can be adaptively optimized according to manual feedback, and the characteristics of various activity scenes can be automatically acquired to make decisions.
Corresponding to the method shown in fig. 1, the embodiment of the present disclosure further provides a model training device. Fig. 6 is a schematic structural diagram of a model training apparatus 600 according to an embodiment of the present disclosure, including:
A first obtaining module 610, configured to obtain time sequence data of a first historical sample set of target activities corresponding to a first type of monitoring item, where the first type of monitoring item is determined to be affected by the target activities;
the first cluster calculation module 620 performs similarity clustering on the time series data of the first historical sample set corresponding to the first type of monitoring items to obtain a plurality of clusters.
The second obtaining module 630 obtains time sequence data of a second historical sample set of the target activity corresponding to a second type of monitoring item, wherein the second type of monitoring item is different from the first type of monitoring item, and samples of the second historical sample set are marked with training labels, and the training labels are used for indicating whether the target activity is affected or not.
The feature construction module 640 determines clusters to which the time sequence data of the second type of monitoring items corresponding to the second historical sample set belongs in the plurality of clusters, and constructs training features of the second historical sample set based on cluster identifiers of the clusters to which the time sequence data belongs and the corresponding similarity distance.
Model training module 650 trains a predictive model based on training labels and training features of the second set of historical samples, wherein the predictive model is used to predict whether a monitored item of an active link is affected by the target activity.
The device of the embodiments of this specification can train a prediction model that determines whether a given monitoring item in the service platform is affected by an activity. For training, a small first historical sample set and a small second historical sample set are prepared in advance. First, similarity clustering is performed on the time sequence data of the first historical sample set corresponding to the first type of monitoring items, which are already determined to be affected by the activity, to obtain a plurality of clusters that serve as the reference standard. Next, the time sequence data of the second historical sample set corresponding to the second type of monitoring items is used as the training set: the cluster to which each piece of time sequence data in the training set belongs is determined, and training features are constructed from the identifier of that cluster and the corresponding similarity distance. Finally, the prediction model is trained in a supervised manner on the training features and the pre-annotated training labels of the training set. As a result, when predicting whether a target monitoring item is affected by the activity, the prediction model can make the analysis based on the cluster identifier and the corresponding similarity distance of the time sequence data of the target monitoring item during the activity.
Optionally, when performing similarity clustering on the time sequence data of the first historical sample set corresponding to the first type of monitoring items, the first clustering calculation module 620 specifically: splits the time sequence data of the first historical sample set corresponding to the first type of monitoring items according to the time sequence length requirements of a plurality of preset feature dimensions, to obtain sub time sequence data whose length matches each feature dimension, wherein the feature dimensions are associated with the target activity; performs feature extraction on the length-matched sub time sequence data for each feature dimension; constructs, from the feature extraction results, a feature matrix for the time sequence data of the first historical sample set corresponding to the first type of monitoring items; and performs similarity clustering on the constructed feature matrix based on a similarity metric function.
Optionally, before the feature matrix of the first historical sample set corresponding to the first type of monitoring items is constructed from the feature extraction results, the feature construction module 640 further determines, from the obtained feature extraction results, the proportion in which each feature dimension appears in the first historical sample set, and filters out the feature extraction results of any feature dimension whose proportion is lower than a preset threshold.
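The split, extract and cluster sequence described in this option can be sketched as follows. Everything concrete here is an assumption for illustration: the two feature dimensions and their window lengths, the use of trailing windows for the split, the Euclidean distance as similarity metric, and a simple greedy threshold clustering standing in for the clustering algorithm.

```python
import math

# Hypothetical feature dimensions: name -> (required window length, extractor)
FEATURE_DIMS = {
    "recent_mean": (3, lambda window: sum(window) / len(window)),
    "peak": (5, max),
}

def extract_row(series):
    """Cut the trailing window each dimension requires and extract one value per dimension."""
    return [extract(series[-length:]) for length, extract in FEATURE_DIMS.values()]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_cluster(rows, max_dist, dist=euclidean):
    """Greedy clustering: a row joins the first cluster whose seed row is within max_dist."""
    seeds, labels = [], []
    for row in rows:
        for cid, seed in enumerate(seeds):
            if dist(row, seed) <= max_dist:
                labels.append(cid)
                break
        else:
            seeds.append(row)
            labels.append(len(seeds) - 1)
    return labels

# Time sequence data of three first type monitoring items (invented values)
series_set = [
    [1, 1, 2, 2, 3, 3],
    [1, 2, 2, 2, 3, 3],
    [10, 20, 30, 40, 50, 60],
]
matrix = [extract_row(series) for series in series_set]  # feature matrix, one row per item
labels = threshold_cluster(matrix, max_dist=1.0)
```

The first two series have the same recent mean and peak and therefore share a cluster, while the third forms its own.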
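The filtering step can be illustrated with a short sketch, assuming each feature extraction result is stored as a mapping from feature dimension name to value and that a dimension "appears" in a sample when it has an entry there. The dimension names and the threshold are hypothetical.

```python
def filter_dimensions(extracted, threshold):
    """Drop feature dimensions that appear in less than `threshold` of the samples."""
    n = len(extracted)
    counts = {}
    for row in extracted:
        for name in row:
            counts[name] = counts.get(name, 0) + 1
    # Keep only dimensions whose share of samples is not lower than the threshold
    keep = {name for name, count in counts.items() if count / n >= threshold}
    return [{k: v for k, v in row.items() if k in keep} for row in extracted]

# One dict per sample of the first historical sample set (invented values)
extracted = [
    {"peak": 3.0, "recent_mean": 2.7},
    {"peak": 3.0},
    {"peak": 60.0, "weekly_cycle": 1.0},
    {"peak": 5.0, "recent_mean": 4.0},
]
filtered = filter_dimensions(extracted, threshold=0.5)
```

Here "weekly_cycle" appears in one of four samples (25%) and is filtered out, while "recent_mean" appears in exactly half and is kept.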
Optionally, the first type of monitoring item and/or the second type of monitoring item includes at least one of an application, an interface, a remote caller, and a data packet of the service platform.
Optionally, when constructing the training features of the second historical sample set based on the cluster identifier of the cluster and the corresponding similarity distance, the feature construction module 640 specifically constructs the training features of the second historical sample set based on the cluster identifier of the cluster, the similarity distance corresponding to the cluster, and statistical indicators of the time sequence data of the second historical sample set corresponding to the second type of monitoring items.
Obviously, the model training apparatus of the embodiment of the present disclosure can implement the steps and functions in the embodiment shown in fig. 1, which are not described in detail herein.
Corresponding to the method shown in fig. 4, the embodiment of the present disclosure further provides a wind control executing device. Fig. 7 is a schematic structural diagram of a wind control executing device 700 according to an embodiment of the present disclosure, including:
The third obtaining module 710 obtains time sequence data of the target monitoring item corresponding to the historical sample set to be predicted of the target activity.
The second clustering calculation module 720 determines clusters of the time sequence data of the target monitoring item corresponding to the to-be-predicted historical sample set in a plurality of clusters, and constructs input features of the to-be-predicted historical sample set based on cluster identifiers of the clusters and corresponding similarity distances, wherein the clusters are obtained by performing similarity clustering on the time sequence data of the first type monitoring item corresponding to the first historical sample set of the target activity, and the first type monitoring item is determined to be influenced by the target activity.
The model prediction module 730 inputs the input features of the to-be-predicted historical sample set to a prediction model to obtain a prediction result of whether the target monitoring item is affected by the target activity, where the prediction model is obtained by training based on a training tag and a training feature of a second historical sample set of the target activity, and the training feature of the second historical sample set is obtained by constructing based on a cluster identifier of a cluster to which time series data of a second type of monitoring item corresponding to the second historical sample set belong in the plurality of clusters and a corresponding similarity distance, and the target monitoring item and the second type of monitoring item are different from the first type of monitoring item.
And the wind control executing module 740 is configured to execute a wind control decision related to the target activity based on the monitoring data of the target activity corresponding to the target monitoring item if the prediction result indicates that the target monitoring item is affected by the target activity.
Optionally, the samples in the first historical sample set correspond to a first version of the target activity, the samples in the historical sample set to be predicted correspond to a second version of the target activity, the second version is later than the first version, and the target monitoring item is a monitoring item to be evaluated for whether the target activity is affected after the target activity is iterated from the first version to the second version.
Obviously, the wind control executing device in the embodiment of the present disclosure can implement the steps and functions in the embodiment shown in fig. 4, which are not described in detail herein.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. Referring to fig. 8, at the hardware level, the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory. The memory may include internal memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be interconnected by the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on. For ease of illustration, only one bidirectional arrow is shown in fig. 8, but this does not mean that there is only one bus or only one type of bus.
The memory is used for storing programs. Specifically, a program may include program code, and the program code includes computer operating instructions. The memory may include internal memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the nonvolatile memory to the memory and then runs the computer program to form the model training device on a logic level. The processor is used for executing the programs stored in the memory and is specifically used for executing the following operations:
time sequence data of a first historical sample set of target activities corresponding to a first type of monitoring items are obtained, wherein the first type of monitoring items are determined to be influenced by the target activities.
And clustering the similarity of the time sequence data of the first historical sample set corresponding to the first type of monitoring items to obtain a plurality of clusters.
And acquiring time sequence data of a second historical sample set of the target activity corresponding to a second type of monitoring items, wherein the second type of monitoring items are different from the first type of monitoring items, and the samples of the second historical sample set are marked with training labels which are used for indicating whether the target activity is affected or not.
Determining the cluster of the time sequence data of the second type monitoring item corresponding to the second historical sample set in the clusters, and constructing training features of the second historical sample set based on the cluster identification of the cluster and the corresponding similarity distance.
And training a prediction model based on the training labels and training features of the second historical sample set, wherein the prediction model is used for predicting whether a monitoring item of an active link is influenced by the target activity.
Or the processor reads the corresponding computer program from the nonvolatile memory to the memory and then runs the computer program to form the wind control executing device on a logic level. The processor is used for executing the programs stored in the memory and is specifically used for executing the following operations:
Acquiring time sequence data of a target monitoring item corresponding to a historical sample set to be predicted of a target activity;
Determining a cluster of time sequence data of the target monitoring item corresponding to the historical sample set to be predicted in a plurality of clusters, and constructing an input characteristic of the historical sample set to be predicted based on a cluster identifier of the cluster and a corresponding similarity distance, wherein the clusters are obtained by performing similarity clustering on the time sequence data of a first type of monitoring item corresponding to a first historical sample set of the target activity, and the first type of monitoring item is determined to be influenced by the target activity;
Inputting the input characteristics of the history sample set to be predicted into a prediction model to obtain a prediction result of whether the target monitoring item is influenced by the target activity, wherein the prediction model is obtained by training based on training labels and training characteristics of a second history sample set of the target activity, the training characteristics of the second history sample set are obtained by constructing cluster identifiers and corresponding similarity distances of time sequence data of a second type of monitoring item corresponding to the second history sample set in clusters, and the target monitoring item and the second type of monitoring item are different from the first type of monitoring item;
And if the prediction result indicates that the target monitoring item is influenced by the target activity, executing a wind control decision related to the target activity based on the monitoring data of the target monitoring item corresponding to the target activity.
The method disclosed in the embodiment shown in fig. 1 or fig. 4 of the present specification may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in one or more embodiments of the present description. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with one or more embodiments of the present disclosure may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
Of course, in addition to the software implementation, the electronic device in this specification does not exclude other implementations, such as a logic device or a combination of software and hardware, that is, the execution subject of the following process is not limited to each logic unit, but may also be hardware or a logic device.
The present specification embodiment also proposes a computer-readable storage medium storing one or more programs.
Wherein the one or more programs include instructions, which when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the method of the embodiment of fig. 1, and in particular to:
Acquiring time sequence data of a first historical sample set of a target activity corresponding to a first type of monitoring items, wherein the first type of monitoring items are determined to be influenced by the target activity;
Performing similarity clustering on the time sequence data of the first historical sample set corresponding to the first type of monitoring items to obtain a plurality of clusters;
Acquiring time sequence data of a second historical sample set of the target activity corresponding to a second type of monitoring items, wherein the second type of monitoring items are different from the first type of monitoring items, and the samples of the second historical sample set are marked with training labels indicating whether the corresponding monitoring item is affected by the target activity;
Determining, among the plurality of clusters, the cluster to which the time sequence data of the second type of monitoring items corresponding to the second historical sample set belongs, and constructing training features of the second historical sample set based on the cluster identifier of that cluster and the corresponding similarity distance;
Training a prediction model based on the training labels and training features of the second historical sample set, wherein the prediction model is used for predicting whether a monitoring item of an active link is influenced by the target activity.
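The training steps above can be illustrated with a minimal sketch. This section of the specification does not name a clustering algorithm, a distance measure, or a model type, so everything concrete here is an assumption made for illustration: k-means with Euclidean distance stands in for the similarity clustering, a small logistic regression stands in for the prediction model, and the toy series, labels, and function names (`kmeans`, `build_features`, `train_logistic`, `predict_affected`) are hypothetical.

```python
import math

def euclidean(a, b):
    """Similarity distance between two equal-length time series (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(series, k, iters=20):
    """Cluster time series and return k centroids (deterministic farthest-first init)."""
    centroids = [list(series[0])]
    while len(centroids) < k:
        far = max(series, key=lambda s: min(euclidean(s, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for s in series:
            nearest = min(range(k), key=lambda i: euclidean(s, centroids[i]))
            groups[nearest].append(s)
        for i, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster received no members
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def build_features(s, centroids):
    """Training feature: one-hot cluster identifier plus distance to that cluster."""
    dists = [euclidean(s, c) for c in centroids]
    cid = min(range(len(centroids)), key=lambda i: dists[i])
    one_hot = [1.0 if i == cid else 0.0 for i in range(len(centroids))]
    return one_hot + [dists[cid]]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.05, epochs=1000):
    """Per-sample gradient descent; the last weight is the bias term."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(w[-1] + sum(a * b for a, b in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w

def predict_affected(w, x):
    return sigmoid(w[-1] + sum(a * b for a, b in zip(w, x))) >= 0.5

# First historical sample set: monitoring items known to be affected (toy data).
first_set = [[1, 5, 9, 5, 1], [1, 6, 9, 6, 1], [2, 5, 8, 5, 2],
             [1, 3, 5, 7, 9], [2, 3, 5, 7, 8], [1, 3, 6, 7, 9]]
# Second historical sample set: other monitoring items with training labels.
second_set = [[1, 5, 8, 5, 1], [1, 3, 5, 7, 8], [2, 6, 9, 5, 1],
              [5, 5, 5, 5, 5], [9, 7, 5, 3, 1], [4, 4, 4, 4, 4]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = affected by the target activity

centroids = kmeans(first_set, k=2)
X = [build_features(s, centroids) for s in second_set]
weights = train_logistic(X, labels)
predictions = [predict_affected(weights, x) for x in X]
```

In this sketch the distance to the nearest cluster carries most of the signal: series that resemble a known affected pattern sit close to a centroid, while unrelated series sit far from every centroid. A real implementation might use a time-series-specific distance and a different model.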
Alternatively, the one or more programs include instructions which, when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the method of the embodiment of fig. 4, and in particular to:
Acquiring time sequence data of a target monitoring item corresponding to a historical sample set to be predicted of a target activity;
Determining, among a plurality of clusters, the cluster to which the time sequence data of the target monitoring item corresponding to the historical sample set to be predicted belongs, and constructing an input feature of the historical sample set to be predicted based on the cluster identifier of that cluster and the corresponding similarity distance, wherein the clusters are obtained by performing similarity clustering on the time sequence data of a first type of monitoring items corresponding to a first historical sample set of the target activity, and the first type of monitoring items are determined to be influenced by the target activity;
Inputting the input feature of the historical sample set to be predicted into a prediction model to obtain a prediction result of whether the target monitoring item is influenced by the target activity, wherein the prediction model is trained based on training labels and training features of a second historical sample set of the target activity, the training features of the second historical sample set are constructed from the cluster identifier and the corresponding similarity distance of the cluster to which the time sequence data of a second type of monitoring items corresponding to the second historical sample set belongs, and the target monitoring item and the second type of monitoring items are different from the first type of monitoring items;
If the prediction result indicates that the target monitoring item is influenced by the target activity, executing a wind control decision related to the target activity based on the monitoring data of the target monitoring item corresponding to the target activity.
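The execution flow above can be sketched as follows. This is an illustrative outline only: the fixed centroids, the stand-in `toy_model` (a simple distance threshold in place of a trained prediction model), the `limit` parameter, and the decision strings "throttle" and "monitor" are all hypothetical and are not taken from the specification.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity distance between two equal-length time series (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_input_feature(series: Sequence[float],
                        centroids: List[Sequence[float]]) -> Tuple[int, float]:
    """Return (cluster identifier, similarity distance) for the nearest cluster."""
    dists = [euclidean(series, c) for c in centroids]
    cid = min(range(len(centroids)), key=lambda i: dists[i])
    return cid, dists[cid]

def execute_wind_control(series: Sequence[float],
                         centroids: List[Sequence[float]],
                         model: Callable[[int, float], bool],
                         live_value: float,
                         limit: float) -> Optional[str]:
    """Run an activity-related decision only if the item is predicted affected."""
    cid, dist = build_input_feature(series, centroids)
    if not model(cid, dist):
        return None  # predicted unaffected: no activity-related decision
    # Hypothetical decision rule based on the item's current monitoring data.
    return "throttle" if live_value > limit else "monitor"

# Hypothetical clusters obtained earlier from the first historical sample set.
centroids = [[1, 5, 9, 5, 1], [1, 3, 5, 7, 9]]

# Stand-in for the trained prediction model: affected if close to any cluster.
def toy_model(cluster_id: int, distance: float) -> bool:
    return distance < 3.0

affected = execute_wind_control([1, 5, 8, 5, 1], centroids, toy_model,
                                live_value=120.0, limit=100.0)
unaffected = execute_wind_control([5, 5, 5, 5, 5], centroids, toy_model,
                                  live_value=120.0, limit=100.0)
```

Keeping the prediction and the decision rule separate means the risk control policy can change without retraining the model, which is one reasonable way to organize the two steps.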
In summary, the foregoing description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of one or more embodiments of the present disclosure, is intended to be included within the scope of one or more embodiments of the present disclosure.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.