WO2025010581A1 - Attention-based model for wireless digital twin - Google Patents
- Publication number: WO2025010581A1 (application PCT/CN2023/106518)
- Authority: WO (WIPO PCT)
- Prior art keywords: performance data, attention, network, state, action
- Legal status: Pending (assumed; Google has not performed a legal analysis)
Classifications
- H04W24/02—Arrangements for optimising operational condition
- G06N20/00—Machine learning
- G06N3/00—Computing arrangements based on biological models
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- H04L41/147—Network analysis or design for predicting network behaviour
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks using machine learning or artificial intelligence
- H04L43/022—Capturing of monitoring data by sampling
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
- H04W24/08—Testing, supervising or monitoring using real traffic
Detailed description
- Each piece of performance data in a set of performance data may represent an hourly pattern of the network. For example, each piece may be denoted s_i^j, where i indexes the hour and j indexes the day.
- In the following, historical data of 7 days is considered. It should be understood that this is just one example, and the present application is not limited to a particular number of days.
- The state can be considered a vector, and the sizes of a daily state and an hourly state are different. For example, for the daily state of one day, the shape of the state vector is 24*1, where 24 represents the 24 hours in one day; for an hourly state, the shape of the state vector can be, for example, 12*1, considering a sampling interval of 5 minutes (12*5 minutes is one hour).
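As an illustration of this data layout (a minimal sketch; the array names and the use of numpy are assumptions, not part of the disclosure), the multi-scale data could be organized as follows, with the action dimension of 38 taken from the experiment described later:

```python
import numpy as np

DAYS = 7                # length of the state-action history
HOURS_PER_DAY = 24      # daily state: one KPI value per hour -> shape 24*1
SAMPLES_PER_HOUR = 12   # hourly state: 5-minute sampling -> shape 12*1
ACTION_DIM = 38         # illustrative, matching the experiment below

# Daily branch: state-action pairs {(s_1, a_1), ..., (s_7, a_7)}.
daily_states = np.zeros((DAYS, HOURS_PER_DAY))   # s_j, one row per day
daily_actions = np.zeros((DAYS, ACTION_DIM))     # a_j, one row per day

# Hourly branch: s_i^j, the fine-grained state in hour i of day j.
hourly_states = np.zeros((DAYS, HOURS_PER_DAY, SAMPLES_PER_HOUR))

# The latest action a_8, for which the next daily state s_8 is predicted.
next_action = np.zeros(ACTION_DIM)
```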
- FIG. 4 shows an overall model architecture according to an embodiment of this disclosure.
- In particular, a model with an attention mechanism is designed, and two different branches are used to capture patterns at different time scales (e.g., daily and hourly).
- the predicting step 302 as shown in FIG. 3 may comprise applying at least one machine learning model to the plurality of sets of performance data (i.e., daily and hourly patterns of the network) .
- a conventional transformer encoder (branch 1, shown in the upper part of FIG. 4) is utilized to represent the hourly patterns of the KPIs.
- A novel AttenInAtten block (branch 2, shown in the lower part of FIG. 4) is used to learn the behavior of daily patterns.
- The FCN (Fully Connected Network) in FIG. 4 represents the fully connected layer, which implements an affine transform from input to output, i.e., y = Wx + b.
- An example of the conventional transformer encoder is shown in FIG. 5; its details are not discussed in this application.
- The Attention mechanism is a popular artificial neural network block, which contains three types of components: query (Q), key (K), and value (V).
- the Attention mechanism is the base block of the Transformer model.
- This disclosure further proposes a novel module, “AttenInAtten”, which may have two attention layers.
- the at least one machine learning model comprises a first attention-based model and/or a second attention-based model, each comprising the following input components: query, key, and value.
- In particular, this disclosure proposes to utilize the attention mechanism by using the latest action in the state-action pairs as the query, in order to obtain action sensitivity.
- The similarity between the query and the keys will be computed, and then a weighted summation of the values will be calculated based on this similarity.
- the output of the attention mechanism will be heavily influenced by the action (query) .
- Such a high level of sensitivity toward the parameters can be very useful for network optimization.
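The computation just described matches standard scaled dot-product attention; a minimal numpy sketch (illustrative, not code from the patent) is:

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: query-key similarities become the
    weights of a weighted summation of the values."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V

# Because the latest action (after projection) is the query, changing the
# action changes the weights, and hence the output: action sensitivity.
```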
- FIG. 6 shows a detailed schematic of the AttenInAtten block according to an embodiment of this disclosure.
- The idea of this block is that solving the sub-forecasting tasks may be helpful for solving the final forecasting task.
- Each sub-forecasting task may be represented by a function f_{L+1}(sa_L) → s_{L+1}, where L is a positive integer and sa_L denotes the state-action pairs up to day L.
- For example, if it is known how to predict s_2 from {(s_1, a_1), a_2} (i.e., f_2(s_1, a_1, a_2) → s_2), the model can take advantage of information from the previous sub-forecasting tasks to predict s_8 from {(s_1, a_1), (s_2, a_2), ..., (s_7, a_7), a_8}.
- To this end, this disclosure proposes to use the attention mechanism to represent the sub-forecasting task; this constitutes the first attention layer.
- In each sub-forecasting task, the query is a linear projection of the state predicted in the previous sub-forecasting task, the key is a linear projection of the state-action pairs that are used to predict the next state, and the value is another linear projection of those state-action pairs.
- The outputs (for example, A_2 to A_7 in FIG. 6) of the first attention layer are the representations of the sub-forecasting tasks. These outputs become the inputs for the second attention layer.
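A sketch of one sub-forecasting task represented with attention, following the description above (all shapes, initializations, and the helper's structure are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, EMBED_DIM = 24, 38, 64  # dims from the experiment below

# Three distinct learned projections (random here; trained in practice).
W_query = rng.normal(size=(STATE_DIM, EMBED_DIM), scale=0.02)
W_key = rng.normal(size=(STATE_DIM + ACTION_DIM, EMBED_DIM), scale=0.02)
W_value = rng.normal(size=(STATE_DIM + ACTION_DIM, EMBED_DIM), scale=0.02)

def sub_task_output(sa_history, prev_predicted_state):
    """First attention layer: produces A_{L+1} for one sub-forecasting task.

    sa_history:           (L, STATE_DIM + ACTION_DIM) state-action pairs
    prev_predicted_state: (STATE_DIM,) state predicted by the previous sub-task
    """
    q = prev_predicted_state @ W_query   # query: projection of predicted state
    K = sa_history @ W_key               # key: one projection of the pairs
    V = sa_history @ W_value             # value: a different projection
    scores = K @ q / np.sqrt(EMBED_DIM)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V                         # A_{L+1}, input to the second layer
```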
- The method 300 shown in FIG. 3 further comprises applying the second attention-based model to at least a first set of performance data (e.g., A_1) in the plurality of sets of performance data to predict a second set of performance data (e.g., A_2).
- The method 300 may further comprise applying the second attention-based model to at least the first set of performance data and the second set of performance data to predict a third set of performance data (e.g., A_3).
- the step of applying the second attention-based model to at least the first set of performance data and the second set of performance data to predict the third set of performance data comprises obtaining an action of the third set of performance data; and predicting a state of the third set of performance data by applying the second attention-based model to the at least the first set of performance data and the second set of performance data, using a linear projection of a predicted state of the second set of performance data as the query.
- the step of predicting the state of the third set of performance data, by applying the second attention-based model to the at least the first set of performance data and the second set of performance data further comprises using a first linear projection of the state-action pairs of the first set of performance data and the second set of performance data as the key and using a second linear projection of the state-action pairs of the first set of performance data and the second set of performance data as the value.
- In the second attention layer, another attention mechanism is adopted: the similarity between the last action (a_{L+1} in FIG. 6, used as the query) and all outputs of the first attention layer (used as keys and values) is computed, and these similarities are then used as the weights to obtain a linear combination of all sub-forecasting task representations.
- The output of the second attention layer (A_8 in FIG. 6) is the feature extracted from all sub-forecasting tasks, which is further used to predict the state of the next day. That is, the final forecasting task may be represented as w_2 A_2 + w_3 A_3 + ... + w_7 A_7 → A_8. It must be noted that the last action (a_{L+1} in FIG. 6) is used as the query for the prediction model, so that the action sensitivity will be substantial.
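Continuing the sketch (same illustrative assumptions), the second attention layer would weight the sub-task outputs A_2..A_7 by their similarity to the projected last action and return their linear combination A_8:

```python
import numpy as np

rng = np.random.default_rng(1)
ACTION_DIM, EMBED_DIM = 38, 64
W_query2 = rng.normal(size=(ACTION_DIM, EMBED_DIM), scale=0.02)

def final_feature(sub_task_outputs, last_action):
    """Second attention layer: A_8 = w_2 A_2 + w_3 A_3 + ... + w_7 A_7.

    sub_task_outputs: (6, EMBED_DIM) stacked A_2..A_7 from the first layer
    last_action:      (ACTION_DIM,)  a_8, the action to evaluate
    """
    q = last_action @ W_query2                        # action as the query
    scores = sub_task_outputs @ q / np.sqrt(EMBED_DIM)
    w = np.exp(scores - scores.max())
    w /= w.sum()                                      # weights w_2..w_7
    return w @ sub_task_outputs                       # A_8, fed to a final FCN
```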
- According to an embodiment of the disclosure, the at least one piece of future performance data comprises a state-action pair, and the method 300 comprises: obtaining an action of the at least one piece of future performance data; and predicting a state of the at least one piece of future performance data by applying the first attention-based model to the plurality of sets of performance data, using a linear projection of the action of the at least one piece of future performance data as the query.
- The state-action pairs from the historical data and the latest action of the network may be used to forecast the future state of the network, e.g., the value of s_8.
- the output of the attention mechanism will be heavily influenced by the action (query) .
- Actions of the network are controlled by, for example, network maintainers, so it is known to the network whether an action has been taken or is to be taken. The models designed according to this disclosure further exhibit a high level of sensitivity toward the parameters, which can be useful for network optimization.
- the step of predicting the state of the at least one piece of future performance data, by applying the first attention-based model to the plurality of sets of performance data further comprises: using a first linear projection of the state-action pairs of each set of performance data as the key, and using a second linear projection of the state-action pairs of each set of performance data as the value.
- The method 300 as shown in FIG. 3 further comprises a step of applying the second attention-based model to the second set of performance data and the third set of performance data to identify a correlation (e.g., a weight) between the second set of performance data and the third set of performance data.
- the method 300 as shown in FIG. 3 further comprises predicting the at least one piece of future performance data by applying the first attention-based model to the plurality of sets of performance data based on the correlation.
- The KPIs discussed in this disclosure include traffic and throughput of the network; however, the present disclosure can also be applied to other KPIs of the network.
- FIG. 7 depicts an application scenario according to an embodiment of this disclosure.
- The upper part depicts a wireless network system, including a Node B and a Mobile Network Automatic Engine (MAE), i.e., a wireless network management node.
- The lower part depicts the wireless network digital twin 700 according to an embodiment of this disclosure, which may include a fundamental model library and an optimization library, as well as applications such as traffic prediction models, throughput prediction models, and wireless parameter optimization.
- The proposed attention-based modeling algorithm, which serves as a key capability of the fundamental model library in the wireless network digital twin 700, is provided inside the digital twin.
- the wireless network provides data for the wireless network digital twin 700, and the wireless network digital twin 700 provides modeling and optimization capabilities for the wireless network.
- the attention-based model can be considered a part of the wireless digital twin 700 (the product will be a software package) .
- wireless digital twins contain a base model/optimization part and an application part, as shown in FIG. 7.
- The attention-based method will be one of the core capabilities of the base model module; it is used not only for modeling applications but also by the optimization library/applications.
- In a first example, the observed data is a time series with a time scale of one hour, which means that the model gets one throughput measurement each hour. The hourly action data is also obtained.
- The throughput depends on the previous throughput and the action that has been taken or is to be taken; both are therefore used when predicting the throughput.
- the output of the model is the prediction of throughput of the next day, i.e., the eighth day.
- the data of 12 days may be used, and the model can be tested on data from the next 3 days.
- The model used is the one shown in FIG. 4, which includes two different branches: 1) the conventional transformer encoder, and 2) the AttenInAtten model.
- the dimension of throughput/traffic is 24, and the dimension of action is 38.
- the input sequence length is 7
- the embedding_dim of the attention encoder is 64
- the number of heads of multi-head attention in the attention encoder is 6.
- APE represents the Absolute Percentage Error, and APE@0.25 denotes the ratio of predictions whose APE is less than 0.25.
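As a concrete reading of these metrics (a sketch; the patent does not spell out the exact aggregation), APE and APE@0.25 could be computed as:

```python
import numpy as np

def ape(y_pred, y_true):
    """Absolute Percentage Error per prediction: |pred - true| / |true|."""
    return np.abs(y_pred - y_true) / np.abs(y_true)

def ape_at(y_pred, y_true, threshold=0.25):
    """Ratio of predictions whose APE is below the threshold, e.g. APE@0.25."""
    return float(np.mean(ape(y_pred, y_true) < threshold))
```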
- FIG. 8 shows the forecasting result of throughput and the autocorrelation function of the prediction error. It can be seen from FIG. 8 that most of the prediction curves match the ground truth quite well.
- The prediction residual diagnostics show that the residuals are uncorrelated, which means that no more information is left in the residuals that could be used for forecasting. This is further evidence of the good modeling capability of the proposed model.
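A whiteness check of this kind could be performed as follows (a plain numpy sketch; the exact diagnostic used in the patent is not specified):

```python
import numpy as np

def residual_autocorrelation(residuals, max_lag=24):
    """Sample autocorrelation of the forecast residuals at lags 1..max_lag."""
    r = residuals - residuals.mean()
    var = r @ r
    return np.array([(r[:-k] @ r[k:]) / var for k in range(1, max_lag + 1)])

def residuals_look_white(residuals, max_lag=24):
    """True if all autocorrelations stay inside the approximate 95% band
    +/- 1.96/sqrt(N), i.e. no forecastable information is left."""
    band = 1.96 / np.sqrt(len(residuals))
    acf = residual_autocorrelation(residuals, max_lag)
    return bool(np.all(np.abs(acf) < band))
```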
- In a second example, the observed data is again a time series with a time scale of one hour, which means that the model gets one traffic measurement each hour.
- the hourly action data is also obtained.
- The traffic likewise depends on the previous traffic and the action that has been taken or is to be taken; the previous traffic and the action are used for the prediction.
- the output of the model is the traffic prediction of the next day.
- the data of 12 days may be used, and the model can be tested on data from the next 3 days.
- The model is the same as the one shown in FIG. 4.
- FIG. 9 shows the forecasting result of traffic and the autocorrelation function of the prediction error. It can be seen from FIG. 9 that most of the prediction curves match the ground truth quite well. Some traffic curves in the prediction horizon show very different behavior compared with the historical period, and the forecasting result is able to follow the ground truth, which shows that the proposed attention-based model has good parameter sensitivity. When using the model for downstream optimization tasks, parameter sensitivity is a critical feature. Furthermore, the prediction residual diagnostics show that the residuals are uncorrelated, which means that no more information is left in the residuals that could be used for forecasting. This is further evidence of the good modeling capability of the model.
- In summary, this disclosure proposes to build the wireless digital twin using an attention mechanism and a multi-time-scale modeling method. This enables modeling the long-term trend and the short-term fluctuations at the same time. Further, in the wireless digital twin scenario, this disclosure proposes to build a model that also takes the parameter sensitivity into account. By considering the parameter sensitivity when building the model, it becomes more suitable for downstream tasks, like optimizing the parameters of the network.
- any method according to embodiments of the disclosure may be implemented in a computer program, having code means, which when run by processing means causes the processing means to execute the steps of the method.
- the computer program is included in a computer-readable medium of a computer program product.
- the computer-readable medium may comprise essentially any memory, such as a ROM (Read-Only Memory) , a PROM (Programmable Read-Only Memory) , an EPROM (Erasable PROM) , a Flash memory, an EEPROM (Electrically Erasable PROM) , or a hard disk drive.
- Moreover, embodiments of the proposed wireless digital twin 700 and the corresponding computer program product comprise the necessary communication capabilities in the form of, e.g., functions, means, units, elements, etc., for performing the solution.
- Examples of such means, units, elements, and functions are: processors, memory, buffers, control logic, encoders, decoders, rate matchers, de-rate matchers, mapping units, multipliers, decision units, selecting units, switches, interleavers, de-interleavers, modulators, demodulators, inputs, outputs, antennas, amplifiers, receiver units, transmitter units, DSPs, trellis-coded modulation (TCM) encoders, TCM decoders, power supply units, power feeders, communication interfaces, communication protocols, etc., which are suitably arranged together for performing the solution.
- The processor(s) of the wireless digital twin 700 and the corresponding computer program product may comprise, e.g., one or more instances of a Central Processing Unit (CPU), a processing unit, a processing circuit, a processor, an Application Specific Integrated Circuit (ASIC), a microprocessor, or other processing logic that may interpret and execute instructions.
- the expression “processor” may thus represent a processing circuitry comprising a plurality of processing circuits, such as, e.g., any, some or all of the ones mentioned above.
- the processing circuitry may further perform data processing functions for inputting, outputting, and processing of data comprising data buffering and device control functions, such as call processing control, user interface control, or the like.
Abstract
The present disclosure relates to performance prediction in a network. The disclosure proposes a computer-implemented method for performance prediction in a wireless communication network, the method comprising: obtaining historical performance data of the network, wherein the historical performance data comprises a plurality of sets of performance data associated with at least one network performance indicator, wherein each set of performance data is collected during a time period with a first sampling granularity, and wherein each piece of performance data included in the set of performance data is collected within the time period with a second sampling granularity; and predicting at least one piece of future performance data based on the plurality of sets of performance data. This disclosure further proposes a wireless digital twin of a wireless network system.
Description
The present disclosure relates to wireless networks, particularly to network optimization and testing in a self-driving L4-L5 communication network. In order to improve the prediction of the performance of communication networks and thus optimize network designs and configurations, the disclosure proposes a computer-implemented method and a wireless digital twin of a wireless network system.
Wireless network digital twins can be used for network optimization and testing in a self-driving L4-L5 communication network. FIG. 1 illustrates an autonomous driving network-level schematic. A wireless network digital twin simulates different scenarios and configurations and evaluates their impact on network performance and key performance indicators (KPIs) by creating a virtual replica of the physical network.
The use of wireless network digital twins allows for proactive maintenance of communication networks, detecting potential faults and performance degradations before they occur, and minimizing downtime and service disruptions. This is especially important for safety-critical applications like banking and self-driving cars, where a brief interruption of communication can have dire consequences.
In real-world scenarios, wireless network digital twins can predict the performance of communication networks for self-driving vehicles through simulations and models. To have a highly accurate model, there are two challenges: 1) models need to have high parameter (e.g., action) sensitivity, and 2) data from the real network contains a high level of noise.
To build the digital twin of a wireless network system, there are two existing solutions. The first solution proposes a method for throughput prediction for cellular networks. FIG. 2 shows a process of throughput prediction for cellular networks. In this work, multiple machine learning models are trained (e.g., Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP)), and then all the trained models are evaluated based on some criteria (e.g., Absolute Percentage Error (APE) and R-squared). One model will be selected, and the history length and prediction horizon will be decided based on the evaluation result. However, this solution only considers short-term prediction (at most it can predict 12 s ahead), and it fails to take the parameter sensitivity into account.
The second solution proposes a method for network performance forecasting. This method will evaluate how a KPI changes with traffic (e.g., a user may be interested in the impact of rising traffic on the dropped call rate) and then build a regression model to model the relation between one KPI and the others. However, the second solution is limited in that it is based on an assumption and an estimation of traffic growth rate for forecasting in the future, and also no parameter sensitivity is taken into account.
Therefore, an improved solution for building the digital twin for a wireless communication network is desired.
In view of the above-mentioned limitations, the present disclosure aims to build an advanced wireless digital twin for network optimization and testing. In particular, an objective is to have a highly accurate model for the wireless digital twin. Another objective is to enable long-term network performance prediction. Another objective is to build the model with a high level of sensitivity toward the parameters.
These and other objectives are achieved by the solutions of this disclosure as provided in the independent claims. Advantageous implementations are further defined in the dependent claims.
A first aspect of the disclosure provides a computer-implemented method for performance prediction in a wireless communication network. The method comprises: obtaining historical performance data of the network, wherein the historical performance data comprises a plurality of sets of performance data associated with at least one network performance indicator, wherein each set of performance data is collected during
a time period with a first sampling granularity, and wherein each piece of performance data included in the set of performance data is collected within the time period with a second sampling granularity; and predicting at least one piece of future performance data based on the plurality of sets of performance data.
This disclosure proposes an advanced solution for building the wireless digital twin for a wireless communication network. The disclosure enables a long-term prediction, particularly the prediction of long-term KPIs, such as traffic and throughput of the wireless network. In particular, this disclosure proposes to capture patterns of the network in different time scales, i.e., the performance data collected in different sampling granularity. The prediction is performed using data captured from different time scales.
For instance, it can be considered that each set of performance data, which is collected with the first sampling granularity, represents a daily pattern of the network. In this example, the time period is considered a day. The behavior of daily patterns of the network may be learned and modeled. Additionally, hourly patterns of the KPIs are also collected. Each piece of performance data included in the set of performance data, which is collected with the second sampling granularity, then represents an hourly pattern of the network.
In an implementation form of the first aspect, the step of predicting comprises applying at least one machine learning model to the plurality of sets of performance data.
Optionally, the prediction is performed based on machine learning model (s) .
In an implementation form of the first aspect, the at least one machine learning model comprises a first attention-based model and/or a second attention-based model, each comprising the following input components: query, key, and value.
Notably, a popular artificial neural network block, which is based on the Attention mechanism, contains three types of components: query, key, and value. With the Attention mechanism, the similarity between queries and keys will be computed and used as the weight to generate a weighted sum of values. The Attention mechanism is the base block of the Transformer model. This disclosure further proposes a novel module to model the collected data patterns, which may be referred to as “AttenInAtten”. Such an AttenInAtten (Attention in Attention) neural network architecture may have two attention layers. Possibly, each attention layer may be based on one of the first attention-based model and the second attention-based model.
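For reference, the attention computation described above is commonly written as follows (the textbook scaled dot-product formulation, not an equation reproduced from the patent):

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```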
In an implementation form of the first aspect, each set of performance data comprises state-action pairs indicating the pieces of performance data, and each state is associated with the at least one network performance indicator, and each action is associated with at least one network parameter.
For example, the collected data may be in a state-action pair format, i.e., {(s1, a1), (s2, a2), …, (s7, a7)}, where si is the ith state, and ai is the ith action.
In an implementation form of the first aspect, the at least one piece of future performance data comprises a state-action pair, and the method further comprises: obtaining an action of the at least one piece of future performance data; and predicting a state of the at least one piece of future performance data by applying the first attention-based model to the plurality of sets of performance data, using a linear projection of the action of the at least one piece of future performance data as the query.
In order to be useful for network optimization, the models designed according to this disclosure further exhibit a high level of sensitivity toward the parameters. It may be understood that the state-action pairs from the historical data and the latest action of the network, e.g., {(s1, a1), (s2, a2), …, (s7, a7), a8}, may be used to forecast the future state of the network, e.g., the value of s8. Due to the use of the action as the input “query”, the output of the attention mechanism will be heavily influenced by the action (query). Notably, actions of the network are controlled by, for example, network maintainers. It is considered known to the network whether an action has been taken or is to be taken.
In an implementation form of the first aspect, the step of predicting the state of the at least one piece of future performance data, by applying the first attention-based model to the plurality of sets of performance data, further comprises: using a first linear projection of the state-action pairs of each set of performance data as the key, and using a second linear projection of the state-action pairs of each set of performance data as the value.
Notably, linear projections of the state-action pairs are used as input to the model to predict the next state of the network. Possibly, the linear functions used for generating the linear projections for the input “key” and the input “value” are different.
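Under this implementation form, one consistent reading (the symbols are my own; SA denotes the matrix whose rows are the historical state-action pairs) is:

```latex
q = W_Q\, a_8, \qquad K = SA\, W_K, \qquad V = SA\, W_V, \qquad
w = \mathrm{softmax}\!\left(\frac{K q}{\sqrt{d_k}}\right), \qquad
\hat{s}_8 = \mathrm{FCN}\!\left(w^{\top} V\right)
```

Here W_K and W_V being different matrices reflects that the linear functions generating the key and the value may differ.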
In an implementation form of the first aspect, the method further comprises applying the second attention-based model to at least a first set of performance data in the plurality of sets of performance data to predict a second set of performance data.
Optionally, an embodiment of this disclosure solves the long-term prediction, i.e., the final forecasting task, by first solving several short-term forecasting tasks, which may be named sub-forecasting tasks. For example, with the information about the network patterns on the first day, the network patterns on the second day can be forecasted or predicted. Each sub-forecasting task may be represented using the attention mechanism. This may be considered the first attention layer in the AttenInAtten architecture.
In an implementation form of the first aspect, the method further comprises applying the second attention-based model to at least the first set of performance data and the second set of performance data to predict a third set of performance data.
Following the previous example, once the network patterns on the second day are known (after the sub-forecasting task) , the network patterns on the third day can be forecasted or predicted using the information from the two previous days.
In an implementation form of the first aspect, the step of applying the second attention-based model to at least the first set of performance data and the second set of performance data to predict the third set of performance data comprises: obtaining an action of the third set of performance data; and predicting a state of the third set of performance data by applying the second attention-based model to the at least the first set of performance data and the second set of performance data, using a linear projection of a predicted state of the second set of performance data as the query.
If how to predict s2 from {(s1, a1), a2} is known, s3 can be further forecasted from {(s1, a1), (s2, a2), a3}, and so on. Specifically, the query that is to be input to the second
attention-based model for predicting the state of the third set of performance data is the linear projection of the state to be predicted in the previous sub-forecasting task, i.e., the predicted state of the second set of performance data.
In an implementation form of the first aspect, the step of predicting the state of the third set of performance data, by applying the second attention-based model to the at least the first set of performance data and the second set of performance data, further comprises using a first linear projection of the state-action pairs of the first set of performance data and the second set of performance data as the key, and using a second linear projection of the state-action pairs of the first set of performance data and the second set of performance data as the value.
Similar to the first attention-based model, linear projections of the state-action pairs are also used as input to the second attention-based model to predict states for the sub-forecasting task. Possibly, the linear functions used for generating the linear projections for the input “key” and the input “value” are different.
In an implementation form of the first aspect, the method further comprises applying the second attention-based model to the second set of performance data and the third set of performance data to identify a correlation between the second set of performance data and the third set of performance data.
Possibly, the sub-forecasting tasks may be similar or share some knowledge about how to forecast future states. Such similarity may be determined and used for forecasting the final forecasting task.
In an implementation form of the first aspect, the method further comprises predicting the at least one piece of future performance data by applying the first attention-based model to the plurality of sets of performance data based on the correlation.
This disclosure further proposes to consider the similarity of the sub-forecasting tasks when predicting the final forecasting task. In particular, the output from the sub-forecasting tasks, i.e., the first attention layer, will be input to the first attention-based model, which can be considered as the second attention layer in the AttenInAtten
architecture. It may be understood that the second attention layer receives the outputs from the first attention layer and sums them up with corresponding weights.
In an implementation form of the first aspect, the at least one performance indicator comprises traffic and/or throughput.
For instance, the KPI discussed in this disclosure is traffic or throughput of the network. However, the present disclosure can also be applied to other KPIs of the network.
A second aspect of the disclosure provides a wireless digital twin of a wireless network system, comprising a model configured to implement the method according to the first aspect or any implementation forms of the first aspect.
Accordingly, an embodiment of this disclosure further proposes a wireless digital twin, which is built for predicting the performance of a wireless communication network.
Implementation forms of the wireless digital twin of the second aspect may correspond to the implementation forms of the computer-implemented method of the first aspect described above. The wireless digital twin of the second aspect and its implementation forms achieve the same advantages and effects as described above for the computer-implemented method of the first aspect and its implementation forms.
A third aspect of the disclosure provides a computer program product comprising a program code for carrying out, when implemented on a processor, the method according to the first aspect or any implementation form of the first aspect.
Implementation forms of the computer program product of the third aspect may correspond to the implementation forms of the computer-implemented method of the first aspect described above. The computer program product of the third aspect and its implementation forms achieve the same advantages and effects as described above for the computer-implemented method of the first aspect and its implementation forms.
It has to be noted that all devices, elements, units, and means described in the present application could be implemented in software or hardware elements or any kind of
combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity that performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements or any kind of combination thereof.
The above-described aspects and implementation forms of the present disclosure will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which:
FIG. 1 shows an autonomous driving network level schematic;
FIG. 2 shows an exemplary process for throughput prediction for cellular networks;
FIG. 3 shows a computer-implemented method according to an embodiment of the disclosure;
FIG. 4 shows an overall model architecture according to an embodiment of the disclosure;
FIG. 5 shows an exemplary transformer encoder;
FIG. 6 shows a schematic of the AttenInAtten block according to an embodiment of the disclosure;
FIG. 7 shows a wireless network and wireless digital twin diagram according to an embodiment of the disclosure;
FIG. 8 shows a forecasting result according to an embodiment of the disclosure; and
FIG. 9 shows a forecasting result according to an embodiment of the disclosure.
Illustrative embodiments of a computer-implemented method, a wireless digital twin, and a corresponding computer program product for performance prediction in a wireless communication network are described with reference to the figures. Although this description provides a detailed example of possible implementations, it should be noted that the details are intended to be exemplary and in no way limit the scope of the application.
Moreover, an embodiment/example may refer to other embodiments/examples. For example, any description including but not limited to terminology, element, process, explanation and/or technical advantage mentioned in one embodiment/example is applicative to the other embodiments/examples.
In real-world scenarios, wireless network digital twins can predict the performance of communication networks for self-driving vehicles through simulations and models. By optimizing network design, configuration, and maintenance, network operators can ensure reliable and secure communication between vehicles and infrastructure.
The ability to predict KPIs (e.g., traffic and throughput) is key to achieving a wireless digital twin. A comprehensive and realistic model of the physical network is needed to accurately predict KPIs. Base stations, antennas, user equipment, and the propagation environment should be included in this model, as well as their interactions and dependencies.
Typically, various data sources and modeling techniques can be used to build such a model, including network measurement data, traffic models, and machine learning techniques. In particular, network operators can collect data from various sources such as
network probes, drive tests, and user feedback, to characterize network performance and behavior under different conditions. Based on user behavior, application requirements, and network topology, traffic models can predict the amount and type of traffic in the network. With machine learning techniques, complex relationships and dependencies between network components and variables can be analyzed and modeled, and accurate predictions can be made based on historical data.
Wireless digital twins can simulate the physical network using these data sources and modeling techniques. This will enable accurate prediction of key performance indicators such as traffic and throughput. As a result, network operators can optimize network design, configuration, and maintenance. Furthermore, it can ensure reliable and efficient communication across a wide range of applications.
FIG. 3 shows a computer-implemented method 300 for performance prediction in a wireless communication network, according to an embodiment of the disclosure. The method 300 comprises a step 301 of obtaining historical performance data of the network. In particular, the historical performance data comprises a plurality of sets of performance data associated with at least one network performance indicator. Each set of performance data is collected during a time period with a first sampling granularity. Each piece of performance data included in the set of performance data is collected within the time period with a second sampling granularity. The method 300 further comprises a step 302 of predicting at least one piece of future performance data based on the plurality of sets of performance data.
This disclosure proposes an advanced solution for building the wireless digital twin of a wireless communication network. Embodiments of the disclosure enable long-term prediction, particularly the prediction of long-term KPIs, such as the traffic and throughput of the wireless network. A main idea of this disclosure is to predict future data using data captured at different time scales.
For instance, it can be considered that each set of performance data, which is collected with the first sampling granularity, represents a daily pattern of the network. In this example, the time period is considered a day. The behavior of daily patterns of the network may be learned and modeled. Additionally, hourly patterns of the KPIs are also
collected. Each piece of performance data included in the set of performance data, which is collected with the second sampling granularity, represents an hourly pattern of the network.
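Purely for illustration, the two sampling granularities described above might be organized as a nested array. The following sketch uses assumed names and sizes (a 7-day history of hourly pieces) and is not the claimed data format:

```python
import numpy as np

# Hypothetical layout of historical performance data at two time scales;
# the array name daily_sets and the 7-day history are assumptions.
num_days, hours_per_day = 7, 24
rng = np.random.default_rng(0)

# daily_sets[j, i]: the KPI piece for hour i of day j (second granularity);
# daily_sets[j]:    the whole set for day j (first granularity).
daily_sets = rng.random((num_days, hours_per_day))
print(daily_sets.shape, daily_sets[0].shape)  # (7, 24) (24,)
```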
Optionally, according to an embodiment of the disclosure, each set of performance data comprises state-action pairs indicating the pieces of performance data, and each state is associated with the at least one network performance indicator, and each action is associated with at least one network parameter.
For example, the data in a state-action pair format may be represented as (si, ai), where si is the ith state and ai is the ith action. For instance, the plurality of sets of performance data may be represented as {(s1, a1), (s2, a2), …, (si, ai)}. If each set of performance data represents a daily pattern of the network, the state-action pair (s1, a1) represents the data of the first day.
In one example, each piece of performance data of a set of performance data (i.e., daily data) may represent an hourly pattern of the network. Thus, each piece of performance data may be represented as s_i^j, where i represents the hour and j represents the day. In this example, historical data of 7 days is considered. It should be understood that this is just one example and the present application does not limit itself to a particular number of days.
It may be understood that the state can be considered a vector. The sizes of a daily state and an hourly state are different. One example is: for the daily state of one day, the shape of the state vector is 24*1, where 24 represents the 24 hours in one day. For the hourly state of one hour, the shape of the state vector can be, for example, 12*1, if the sampling interval is 5 minutes (12*5 minutes is one hour).
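A minimal sketch of the two state shapes mentioned above (variable names are illustrative only):

```python
import numpy as np

daily_state = np.zeros((24, 1))   # 24 hourly values of one day
hourly_state = np.zeros((12, 1))  # 12 samples at a 5-minute interval
print(daily_state.shape, hourly_state.shape)  # (24, 1) (12, 1)
```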
FIG. 4 shows an overall model architecture according to an embodiment of this disclosure. To solve the long-term prediction problem, a model with an attention mechanism is designed, and two different branches are used to capture patterns at different time scales (e.g., daily and hourly).
Optionally, according to an embodiment of the disclosure, the predicting step 302 as
shown in FIG. 3 may comprise applying at least one machine learning model to the plurality of sets of performance data (i.e., daily and hourly patterns of the network) .
In this particular example, a conventional transformer encoder (branch 1, shown in the upper part of FIG. 4) is utilized to represent the hourly patterns of the KPIs. A novel AttenInAtten block (branch 2, shown in the lower part of FIG. 4) is used to learn the behavior of daily patterns.
FCN (Fully Connected Network) shown in FIG. 4 represents the fully connected layer, which implements an affine transform from input to output, i.e., y = Wx + b, where W is a weight matrix and b is a bias vector.
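For illustration, the affine transform of the FCN could be sketched as follows; the 64-dimensional output is an assumption of this sketch, not the layer size of FIG. 4:

```python
import numpy as np

def fcn(x, W, b):
    """Fully connected layer: affine transform y = W x + b."""
    return W @ x + b

rng = np.random.default_rng(0)
x = rng.random((24, 1))    # e.g. a daily state vector
W = rng.random((64, 24))   # weight matrix (output dimension 64 assumed)
b = rng.random((64, 1))    # bias vector
print(fcn(x, W, b).shape)  # (64, 1)
```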
An example of the conventional transformer encoder is shown in FIG. 5. Details are not discussed in this application.
As previously discussed, the Attention mechanism is a popular artificial neural network block, which contains three types of components: query (Q), key (K), and value (V). With the Attention mechanism, the similarity between queries and keys is computed and used as the weight to generate a weighted sum of values. The Attention mechanism is the base block of the Transformer model. This disclosure further proposes a novel module, “AttenInAtten”, which may have two attention layers.
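A minimal sketch of the scaled dot-product attention computation described here; this is the standard formulation, and the scaling by the square root of the key dimension follows the usual Transformer convention rather than anything specific to this disclosure:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Query-key similarity weights a sum of values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity between queries and keys
    weights = softmax(scores, axis=-1)  # normalized similarity weights
    return weights @ V, weights
```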
Optionally, according to an embodiment of the disclosure, the at least one machine learning model comprises a first attention-based model and/or a second attention-based model, each comprising the following input components: query, key, and value.
To strengthen the impact of actions, this disclosure proposes to utilize the attention mechanism by using the latest action in the state-action pairs as the query, for action sensitivity. Generally, in the attention mechanism, the similarity between the query and the key is computed, and a weighted summation of the values is then calculated based on this similarity. Because the action is used as the query, the output of the attention mechanism will be heavily influenced by the action. Such a high level of sensitivity toward the parameters can be very useful for network optimization.
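Using the attention sketch above, action sensitivity would amount to feeding a projection of the latest action in as the query. In the following hypothetical usage, all projections are random stand-ins for learned linear projections:

```python
import numpy as np  # reuses attention() from the sketch above

rng = np.random.default_rng(1)
d = 64                        # embedding size (assumed)
q_action = rng.random((1, d)) # query: projection of the latest action
k_pairs = rng.random((7, d))  # keys: projections of (s1,a1)..(s7,a7)
v_pairs = rng.random((7, d))  # values: projections of (s1,a1)..(s7,a7)

out, w = attention(q_action, k_pairs, v_pairs)
print(out.shape, w.shape)     # (1, 64) (1, 7)
```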
FIG. 6 shows a detailed schematic of the AttenInAtten block according to an embodiment of this disclosure. It is based on the observation that the long-term prediction problem contains a number of short-term forecasting tasks (which can be called sub-forecasting tasks). The idea of this block is that solving the sub-forecasting tasks may help solve the final forecasting task. A sub-forecasting task may be represented by the function fL+1((s1, a1), …, (sL, aL), aL+1) → sL+1, where L is a positive integer. For example, it is known how to predict s2 from {(s1, a1), a2} (i.e., f2(s1, a1, a2) → s2), how to forecast s3 from {(s1, a1), (s2, a2), a3} (i.e., f3(s1, a1, s2, a2, a3) → s3), and so on. If the eighth day’s data (i.e., s8) needs to be predicted, the model can take advantage of information from the previous sub-forecasting tasks to predict s8 from {(s1, a1), (s2, a2), …, (s7, a7), a8}.
Notably, this disclosure proposes to use the attention mechanism to represent the sub-forecasting task, which constitutes the first attention layer. Specifically, the value is a projection of the state-action pairs that are used to predict the next state, the query is a linear projection of the state to be predicted in the sub-forecasting task, and the key is another linear projection of the state-action pairs that are used to predict the next state. The outputs (for example, A2 ~ A7 in FIG. 6) of the first attention layer are the representations of the sub-forecasting tasks. These outputs become the inputs of the second attention layer.
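The first attention layer might be sketched as follows, reusing the attention function from the earlier sketch; the loop bounds assume the 7-day example of FIG. 6, and the projection inputs are placeholders for learned projections:

```python
import numpy as np  # reuses attention() from the sketch above

def first_attention_layer(sa_k, sa_v, state_q):
    """Compute one representation A_{L+1} per sub-forecasting task.

    sa_k, sa_v: key/value projections of (s1,a1)..(s7,a7), shape (7, d)
    state_q:    query projections of the states s2..s7, shape (6, d)
    """
    outputs = []
    for L in range(1, state_q.shape[0] + 1):
        q = state_q[L - 1:L]                     # state sL+1 to predict
        a, _ = attention(q, sa_k[:L], sa_v[:L])  # pairs (s1,a1)..(sL,aL)
        outputs.append(a)
    return np.concatenate(outputs, axis=0)       # A2..A7, shape (6, d)
```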
Therefore, according to an embodiment of the disclosure, the method 300 shown in FIG. 3 further comprises applying the second attention-based model to at least a first set of performance data (e.g., A1) in the plurality of sets of performance data to predict a second set of performance data (e.g., A2) .
Optionally, the method 300 may further comprise applying the second attention-based model to at least the first set of performance data and the second set of performance data to predict a third set of performance data (e.g., A3) .
Optionally, according to an embodiment of the disclosure, the step of applying the second attention-based model to at least the first set of performance data and the second set of performance data to predict the third set of performance data comprises obtaining an action of the third set of performance data; and predicting a state of the third set of
performance data by applying the second attention-based model to the at least the first set of performance data and the second set of performance data, using a linear projection of a predicted state of the second set of performance data as the query.
Optionally, according to an embodiment of the disclosure, the step of predicting the state of the third set of performance data, by applying the second attention-based model to the at least the first set of performance data and the second set of performance data, further comprises using a first linear projection of the state-action pairs of the first set of performance data and the second set of performance data as the key and using a second linear projection of the state-action pairs of the first set of performance data and the second set of performance data as the value.
In the second attention layer, another attention mechanism is adopted. The similarity between the last action (aL+1 in FIG. 6, used as the query) and all outputs (used as keys and values) of the first attention layer is computed, and these similarities are used as weights to obtain a linear combination of all sub-forecasting tasks. The output of the second attention layer (A8 in FIG. 6) is the feature extracted from all sub-forecasting tasks, which is further used to predict the state of the next day. That is, the final forecasting task may be represented as w2·A2 + w3·A3 + … + w7·A7 → A8. It must be noted that the last action (aL+1 in FIG. 6) is used as the query of the prediction model, so that action sensitivity will be substantial.
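Both layers together can then be sketched as follows, with the last action a8 querying the sub-task representations A2 ~ A7; this again reuses the functions sketched above, and all projections are random stand-ins:

```python
import numpy as np  # reuses attention() and first_attention_layer()

rng = np.random.default_rng(2)
d = 64
sa_k = rng.random((7, d))     # projected state-action pairs (keys)
sa_v = rng.random((7, d))     # projected state-action pairs (values)
state_q = rng.random((6, d))  # projected states s2..s7 (queries)

A = first_attention_layer(sa_k, sa_v, state_q)  # A2..A7

q_a8 = rng.random((1, d))      # query: projection of the last action a8
A8, w = attention(q_a8, A, A)  # weighted sum w2*A2 + ... + w7*A7 -> A8
print(A8.shape, w.ravel())     # (1, 64) and the six weights
```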
Optionally, according to an embodiment of the disclosure, the at least one piece of future performance data comprises a state-action pair, and the method 300 comprises obtaining an action of the at least one piece of future performance data; and predicting a state of the at least one piece of future performance data by applying the first attention-based model to the plurality of sets of performance data, using a linear projection of the action of the at least one piece of future performance data as the query.
That is, the state-action pairs from the historical data and the latest action of the network, e.g., {(s1, a1), (s2, a2), …, (s7, a7), a8}, may be used to forecast the future state of the network, e.g., the value of s8. Due to the use of the action as the input query, the output of the attention mechanism will be heavily influenced by the action (query). Notably, actions of the network are controlled by, for example, network maintainers. It is considered known to the network whether an action has been taken or is to be taken. The models designed according to this disclosure thus exhibit a high level of sensitivity toward the parameters, which can be useful for network optimization.
Optionally, according to an embodiment of the disclosure, the step of predicting the state of the at least one piece of future performance data, by applying the first attention-based model to the plurality of sets of performance data, further comprises: using a first linear projection of the state-action pairs of each set of performance data as the key, and using a second linear projection of the state-action pairs of each set of performance data as the value.
Optionally, according to an embodiment of the disclosure, the method 300 as shown in FIG. 3 further comprises a step of applying the second attention-based model to the second set of performance data and the third set of performance data to identify a correlation (e.g., the weight) between the second set of performance data and the third set of performance data.
Optionally, according to an embodiment of the disclosure, the method 300 as shown in FIG. 3 further comprises predicting the at least one piece of future performance data by applying the first attention-based model to the plurality of sets of performance data based on the correlation.
It may be worth further mentioning that the KPIs discussed in this disclosure include traffic and throughput of the network. However, the present disclosure can also be applied to other KPIs of the network.
FIG. 7 depicts an application scenario according to an embodiment of this disclosure. The upper part depicts a wireless network system, including a Node B and a Mobile Network Automatic Engine (MAE) (a wireless network management node). The lower part depicts the wireless network digital twin 700 according to an embodiment of this disclosure, which may include a fundamental model library, an optimization library, as well as applications such as traffic prediction models, throughput prediction models, and wireless parameter optimization. The proposed attention-based modeling algorithm, which serves as a key capability of the fundamental model library in the wireless network digital twin 700, is provided inside the digital twin. The wireless network provides data for the wireless network digital twin 700, and the wireless network digital twin 700 provides modeling and optimization capabilities for the wireless network.
The attention-based model can be considered a part of the wireless digital twin 700 (the product may be a software package). Basically, a wireless digital twin contains a base model/optimization part and an application part, as shown in FIG. 7. The attention-based methods are one of the core capabilities of the base model module, which is used not only for modeling applications but also for the optimization library/applications.
In the following, a particular embodiment of how to use the proposed attention-based model to predict the throughput of wireless communication networks is discussed. In this example, the observed data is a time series, and the time scale is an hour, which means that the model obtains the throughput data each hour. Besides the throughput, the hourly action data is also obtained. The throughput will depend on the previous throughput and the action that has been taken or is to be taken. When predicting the throughput, the previous throughput and the action will be used. The formula of the forecasting model may be represented as throughput_{t+1, …, t+N} = f(throughput_{t−l, …, t}, action_{t−l, …, t}; w), where w is the parameter of the model that needs to be estimated.
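As an illustrative sketch only, the interface of this forecasting model might look as follows; the function name, shapes, and the naive placeholder body are assumptions, not the claimed model:

```python
import numpy as np

def forecast_throughput(throughput_hist, action_hist, w):
    """Sketch of throughput_{t+1..t+N} = f(throughput_{t-l..t}, action_{t-l..t}; w).

    throughput_hist: shape (l+1,), hourly throughput up to time t
    action_hist:     shape (l+1, action_dim), hourly actions up to time t
    w:               trained model parameters (unused placeholder here)
    """
    N = 24  # predict the next day, hour by hour
    # Placeholder body: a real implementation would evaluate the trained
    # two-branch model of FIG. 4; here the last value is simply repeated.
    return np.repeat(throughput_hist[-1], N)
```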
If one week of data is used as input to the model, the output of the model is the prediction of the throughput of the next day, i.e., the eighth day. In one example, to train the model, the data of 12 days may be used, and the model can be tested on the data of the next 3 days. The model used is the one shown in FIG. 4, which includes two different branches: 1) the conventional transformer encoder and 2) the AttenInAtten model. In this case, the dimension of throughput/traffic is 24, and the dimension of action is 38. The input sequence length is 7, the embedding_dim of the attention encoder is 64, and the number of heads of the multi-head attention in the attention encoder is 6.
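The hyperparameters reported above could be collected in a configuration such as the following (a plain restatement of the stated numbers; the key names are assumptions):

```python
# Hyperparameters of this embodiment, gathered into a config dict.
config = {
    "throughput_dim": 24,        # 24 hourly values per day
    "action_dim": 38,
    "input_sequence_length": 7,  # one week of daily sets as input
    "embedding_dim": 64,         # attention encoder embedding size
    "num_heads": 6,              # heads of the multi-head attention
    "train_days": 12,
    "test_days": 3,
}
```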
The result of the proposed attention-based model is shown in Table 1 as follows:
Table 1
Notably, APE represents the Absolute Percentage Error, and APE@0.25 means the ratio of predictions whose APE is less than 0.25. To further analyze the proposed method, some visualization results are shown in FIG. 8, which shows the forecasting result of the throughput and the autocorrelation function of the prediction error. It can be seen from FIG. 8 that most of the prediction curves match the ground truth quite well. The prediction residual diagnostics show that the residuals are uncorrelated, which means that no more information is left in the residuals that could be used for forecasting. This is further evidence of the good modeling capability of the proposed model.
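For reference, the APE metric and the APE@0.25 ratio described here can be sketched as follows (the ground truth is assumed nonzero; the sample values are made up for the worked example):

```python
import numpy as np

def ape(y_true, y_pred):
    """Absolute Percentage Error per prediction."""
    return np.abs(y_pred - y_true) / np.abs(y_true)

def ape_at(y_true, y_pred, threshold=0.25):
    """APE@threshold: ratio of predictions whose APE is below the threshold."""
    return float(np.mean(ape(y_true, y_pred) < threshold))

y_true = np.array([100.0, 120.0, 80.0, 95.0])
y_pred = np.array([110.0, 118.0, 60.0, 97.0])
print(ape_at(y_true, y_pred))  # 0.75: three of four predictions below 0.25
```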
In another particular embodiment, how to use the proposed attention-based model to forecast the traffic of wireless communication networks is discussed. Similarly, in this example, the observed data is in time series format, and the time scale is an hour, which means that the model obtains one traffic measurement value each hour. Besides the traffic, the hourly action data is also obtained. As with the throughput, the traffic will also depend on the previous traffic and the action that has been taken or is to be taken. When predicting the traffic, the previous traffic and the action will be used. The formula of the forecasting model may be represented as traffic_{t+1, …, t+N} = f(traffic_{t−l, …, t}, action_{t−l, …, t}; w), where w is the parameter of the model that needs to be estimated.
If one week of data is used as input to the model, the output of the model is the traffic prediction of the next day. In one example, to train the model, the data of 12 days may be used, and the model can be tested on the data of the next 3 days. The model is the same as in FIG. 4.
The result of forecasting traffic with the proposed attention-based model is shown in Table 2 as follows:
Table 2
To further analyze the proposed method, some visualization results are shown in FIG. 9, which shows the forecasting result of the traffic and the autocorrelation function of the prediction error. It can be seen from FIG. 9 that most of the prediction curves match the ground truth quite well. Some traffic curves in the prediction horizon show very different behavior compared with the historical period, and the forecasting result is still able to follow the ground truth, which shows that the proposed attention-based model has good parameter sensitivity. When the model is used for downstream optimization tasks, parameter sensitivity is a critical feature. Furthermore, the prediction residual diagnostics show that the residuals are uncorrelated, which means that no more information is left in the residuals that could be used for forecasting. This is further evidence of the good modeling capability of the model.
To summarize, this disclosure proposes to build the wireless digital twin using an attention mechanism and a multi-time-scale modeling method. This enables modeling the long-term trend and the short-term fluctuations at the same time. Further, in the wireless digital twin scenario, this disclosure proposes to build a model that also takes parameter sensitivity into account. By considering parameter sensitivity when building the model, the model becomes more suitable for downstream tasks, such as optimizing the parameters of the network.
The present disclosure has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those skilled in the art in practicing the claimed embodiments of the disclosure, from a study of the drawings, this disclosure, and the independent claims. In the claims as well as in the description, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.
Furthermore, any method according to embodiments of the disclosure may be implemented in a computer program, having code means which, when run by processing means, causes the processing means to execute the steps of the method. The computer program is included in a computer-readable medium of a computer program product. The computer-readable medium may comprise essentially any memory, such as a ROM (Read-Only Memory), a PROM (Programmable Read-Only Memory), an EPROM (Erasable PROM), a Flash memory, an EEPROM (Electrically Erasable PROM), or a hard disk drive.
Moreover, it is realized by the skilled person that embodiments of the proposed wireless digital twin 700 and the corresponding computer program product comprise the necessary communication capabilities in the form of, e.g., functions, means, units, elements, etc., for performing the solution. Examples of other such means, units, elements, and functions are: processors, memory, buffers, control logic, encoders, decoders, rate matchers, de-rate matchers, mapping units, multipliers, decision units, selecting units, switches, interleavers, de-interleavers, modulators, demodulators, inputs, outputs, antennas, amplifiers, receiver units, transmitter units, DSPs, trellis-coded modulation (TCM) encoders, TCM decoders, power supply units, power feeders, communication interfaces, communication protocols, etc., which are suitably arranged together for performing the solution.
Especially, the processor(s) of the wireless digital twin 700 and the corresponding computer program product may comprise, e.g., one or more instances of a Central Processing Unit (CPU), a processing unit, a processing circuit, a processor, an Application Specific Integrated Circuit (ASIC), a microprocessor, or other processing logic that may interpret and execute instructions. The expression “processor” may thus represent processing circuitry comprising a plurality of processing circuits, such as, e.g., any, some, or all of the ones mentioned above. The processing circuitry may further perform data processing functions for inputting, outputting, and processing of data, comprising data buffering and device control functions, such as call processing control, user interface control, or the like.
Claims (15)
- A computer-implemented method (300) for performance prediction in a wireless communication network, the method (300) comprising:
obtaining (301) historical performance data of the network, wherein the historical performance data comprises a plurality of sets of performance data associated with at least one network performance indicator, wherein each set of performance data is collected during a time period with a first sampling granularity, and wherein each piece of performance data included in the set of performance data is collected within the time period with a second sampling granularity; and
predicting (302) at least one piece of future performance data based on the plurality of sets of performance data.
- The method (300) according to claim 1, wherein the predicting (302) comprises applying at least one machine learning model to the plurality of sets of performance data.
- The method (300) according to claim 2, wherein the at least one machine learning model comprises a first attention-based model and/or a second attention-based model, each comprising the following input components: query, key and value.
- The method (300) according to one of the claims 1 to 3, wherein each set of performance data comprises state-action pairs indicating the pieces of performance data, and each state is associated with the at least one network performance indicator, and each action is associated with at least one network parameter.
- The method (300) according to claims 3 and 4, wherein the at least one piece of future performance data comprises a state-action pair, and the method (300) comprises:
obtaining an action of the at least one piece of future performance data; and
predicting a state of the at least one piece of future performance data by applying the first attention-based model to the plurality of sets of performance data, using a linear projection of the action of the at least one piece of future performance data as the query.
- The method (300) according to claim 5, wherein predicting the state of the at least one piece of future performance data, by applying the first attention-based model to the plurality of sets of performance data, further comprises:
using a first linear projection of the state-action pairs of each set of performance data as the key, and using a second linear projection of the state-action pairs of each set of performance data as the value.
- The method (300) according to claim 3 or one of the claims 4 to 6 when depending on claim 3, wherein the method (300) comprises:
applying the second attention-based model to at least a first set of performance data in the plurality of sets of performance data to predict a second set of performance data.
- The method (300) according to claim 7, wherein the method (300) comprises:
applying the second attention-based model to at least the first set of performance data and the second set of performance data to predict a third set of performance data.
- The method (300) according to claim 8, wherein applying the second attention-based model to at least the first set of performance data and the second set of performance data to predict the third set of performance data comprises:
obtaining an action of the third set of performance data; and
predicting a state of the third set of performance data by applying the second attention-based model to the at least the first set of performance data and the second set of performance data, using a linear projection of a predicted state of the second set of performance data as the query.
- The method (300) according to claim 9, wherein predicting the state of the third set of performance data, by applying the second attention-based model to the at least the first set of performance data and the second set of performance data, further comprises:
using a first linear projection of the state-action pairs of the first set of performance data and the second set of performance data as the key, and using a second linear projection of the state-action pairs of the first set of performance data and the second set of performance data as the value.
- The method (300) according to one of the claims 8 to 10, wherein the method (300) comprises:
applying the second attention-based model to the second set of performance data and the third set of performance data to identify a correlation between the second set of performance data and the third set of performance data.
- The method (300) according to claim 11, wherein the method (300) comprises:
predicting the at least one piece of future performance data by applying the first attention-based model to the plurality of sets of performance data based on the correlation.
- The method (300) according to any one of the claims 1 to 12, wherein the at least one performance indicator comprises traffic and/or throughput.
- A wireless digital twin (700) of a wireless network system, comprising a model configured to implement the method (300) according to any one of the claims 1 to 13.
- A computer program product comprising a program code for carrying out, when implemented on a processor, the method (300) according to any one of the claims 1 to 13.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/106518 WO2025010581A1 (en) | 2023-07-10 | 2023-07-10 | Attention-based model for wireless digital twin |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025010581A1 true WO2025010581A1 (en) | 2025-01-16 |
Family
ID=94214610
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2023/106518 Pending WO2025010581A1 (en) | 2023-07-10 | 2023-07-10 | Attention-based model for wireless digital twin |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025010581A1 (en) |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200076520A1 (en) * | 2018-08-31 | 2020-03-05 | At&T Intellectual Property I, L.P. | System and Method for Throughput Prediction for Cellular Networks |
| WO2022074015A1 (en) * | 2020-10-06 | 2022-04-14 | Telefonaktiebolaget Lm Ericsson (Publ) | Conditional generative model recommendation for radio network |
| US20220210682A1 (en) * | 2020-12-30 | 2022-06-30 | Samsung Electronics Co., Ltd. | SYSTEM AND METHOD FOR ARTIFICIAL INTELLIGENCE (AI) DRIVEN VOICE OVER LONG-TERM EVOLUTION (VoLTE) ANALYTICS |
| CN115134816A (en) * | 2021-03-18 | 2022-09-30 | 中国电信股份有限公司 | Base station flow prediction method based on space-time convolution and multiple time scales |
Non-Patent Citations (2)
| Title |
|---|
| FUTUREWEI: "Functional Framework for RAN Intelligence to support different learning problems", 3GPP DRAFT; R3-211615, 3RD GENERATION PARTNERSHIP PROJECT (3GPP), MOBILE COMPETENCE CENTRE ; 650, ROUTE DES LUCIOLES ; F-06921 SOPHIA-ANTIPOLIS CEDEX ; FRANCE, vol. RAN WG3, no. Online; 20210517 - 20210528, 6 May 2021 (2021-05-06), Mobile Competence Centre ; 650, route des Lucioles ; F-06921 Sophia-Antipolis Cedex ; France , XP052001361 * |
| ZTE, LENOVO, MOTOLORA MOBILITY, CHINA UNICOM: "AI based Energy Saving", 3GPP DRAFT; R3-206720, 3RD GENERATION PARTNERSHIP PROJECT (3GPP), MOBILE COMPETENCE CENTRE ; 650, ROUTE DES LUCIOLES ; F-06921 SOPHIA-ANTIPOLIS CEDEX ; FRANCE, vol. RAN WG3, no. e-meeting ;20201102 - 20201112, 23 October 2020 (2020-10-23), Mobile Competence Centre ; 650, route des Lucioles ; F-06921 Sophia-Antipolis Cedex ; France, XP052399738 * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23944604 Country of ref document: EP Kind code of ref document: A1 |