WO2025178522A1

WO2025178522A1 - Updating datasets

Info

Publication number: WO2025178522A1
Application number: PCT/SE2024/050656
Authority: WO
Inventors: Juan CANTIZANI ESTEPA; Javier VILLEGAS CARRASCO; Sergio FORTES RODRÍGUEZ; Raquel BARCO MORENO; Javier ALBERT SMET; Raúl MARTÍN CUERDO
Original assignee: Telefonaktiebolaget LM Ericsson AB
Current assignee: Telefonaktiebolaget LM Ericsson AB
Priority date: 2024-02-20
Filing date: 2024-07-02
Publication date: 2025-08-28
Anticipated expiration: 2026-08-20

Abstract

There is provided a computer-implemented method for facilitating the imputation of data into a first dataset. The first dataset comprises a plurality of data entries. The data entries in the first dataset (100a) comprises measurements (110a) of a plurality of first metrics associated with a communication network. The labels (112a) are associated with respective sets of data entries in the first dataset (100a), and the labels (112a) identify network information associated with the set. A second dataset (100b) comprises a respective plurality of data entries, with the data entries in the second dataset (100b) comprising measurements (110b) of the first metrics and measurements (110b) of one or more second metrics. The labels are further associated with respective sets of data entries in the second dataset (100b), and the first dataset (100a) does not include measurements of the one or more second metrics. The method comprises estimating (602), using a machine learning model, the second dataset (100b), and the labels associated with the respective sets of data entries in the second dataset (100b), a first relationship between measurements of the one or more first metrics in a data entry and measurements of one or more second metrics in the data entry. The method also comprises applying (604), using the labels (112a) associated with the respective sets of data entries in the first dataset (100a), the first relationship to the data entries in the first dataset (100a) to obtain estimated measurements of the one or more second metrics for said data entries.

Description

UPDATING DATASETS

Technical Field

The present disclosure relates to communications networks, and particularly to methods, apparatus and computer program products for updating datasets.

Background

The complexity of communications networks is continuously growing, increasing the costs of network infrastructure and its operation, administration and management (OAM). The large number of metrics such as indicators, counters, alarms and configuration parameters employed in communications networks can make the task of monitoring such networks difficult.

To help address this issue, automatic methods based on Artificial Intelligence (Al) and/or Machine Learning (ML) can be used to automatically monitor and analyze datasets comprising measurements of a wide number and variety of metrics associated with communication networks, such as Key Performance Indicators (KPIs). This helps simplify the work of experts who aim to detect network issues and their diagnosis. However, missing data in these datasets (e.g., measurements of metrics that have only recently been defined) can significantly affect the accuracy and reliability of the performed monitoring and analysis.

This is a particular issue for ML/AI applications which are fed rich and frequently-updated datasets. These datasets can be time-consuming to gather and tedious to label. For this reason, life cycle management (LCM) of existing datasets should be carried out in order to help save time, energy, and/or resources.

Known data imputation methods mostly focus on the imputation of individual measurements for already measured or known metrics. For example, several data imputation techniques aim to complete time series-based datasets. These techniques range from simple approaches, such as mean imputation, to complex machine learningbased methods. The earlier methods (aside from interpolation) comprise autoregressive methods, such as Auto-Regressive Integrated Moving Average (ARIMA) and its variations (“On the estimation of ARIMA models with missing values”, C. F. Ansley and R. Kohn in Time series analysis of irregularly observed data, pages 9-37, Springer, 1984). However, these are essentially linear models (after differencing). Later, more complex methods appeared, such as:

- matrix completion (“Temporal regularized matrix factorization for high-dimensional time series prediction”, H.-F. Yu, N. Rao, and I. S. Dhillon in Advances in neural information processing systems pages 847-855, 2016);

- Multivariate Imputation by Chained Equations (MICE), which generate a series of hypothetical models for data (“Multiple imputation by chained equations: what is it and how does it work?”, M. J. Azur, E. A. Stuart, C. Frangakis, and P. J. Leaf in International journal of methods in psychiatric research, 20(1):40-49, 2011); and

- non-linear dynamical systems for time series prediction (“Time series prediction by chaotic modeling of nonlinear dynamical systems”, A. Basharat and M. Shah in Computer Vision, 2009 IEEE 12th International Conference on, pages 1941-1948, 2009).

Each of the above have different issues. For example, matrix completion techniques typically apply to almost static data and require strong assumptions, while MICE assumes that data follows a distribution of hypothetical models etc.

In order to try overcome the above limitations, researchers have attempted to impute missing values in datasets using Recurrent Neural Networks (RNNs), which are better suited for time series data imputation compared to typical Neural Networks (NNs) (see the disclosure of: “Recurrent neural networks for multivariate time series with missing values”, Z. Che et al., in Scientific reports, 8(1):6085, 2018; Doctor ai: Predicting clinical events via recurrent neural networks. E. Choi et al., in Machine Learning for Healthcare Conference, pages 301-318, 2016; “Directly modeling missing data in sequences with rnns: Improved classification of clinical time series”, Z. C. Lipton et al., in Machine Learning for Healthcare Conference, pages 253-270, 2016; “BRITS: bidirectional recurrent imputation for time series”, Wei Cao et al., in Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18). Curran Associates Inc., Red Hook, NY, USA, 6776-6786. 2018).

Some previous ML models (e.g., see Z. C. Lipton et al (2016) and Wei Cao et al (2018) referred to above) and newer methods use Long-Short Term Memory (LSTM) cells in their architectures. These RNNs, along with Variational AutoEncoders (VAEs), have proven to be powerful tools for this task (“An introduction to variational autoencoders”, Kingma, Diederik P., and Max Welling in Foundations and Trends® in Machine Learning 12.4 (2019): 307-392, 2019). In some examples, a regular VAE is used (rather than LSTMs) for data imputation, and issues with previous methods (such as MICE and other ML models like K-Nearest Neighbours (KNN)) can be overcome (see “VAE-BRIDGE: Variational Autoencoder Filter for Bayesian Ridge Imputation of Missing Data”, R. C. Pereira et al., in 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, pp. 1-7, 2020 and “A Two-Stage Missing Value Imputation Method Based on Autoencoder Neural Network”, Yu, Jiayin et al., in IEEE International Conference on Big Data (Big Data) (2021): 6064-6066, 2021).

One mechanism has been proposed which aims to estimate long periods of missing data in time series by looking for a similar measurements in the dataset, via correlation, and training a stacked LSTMs-based model to predict the measurements missing in the damaged time series (“Transfer learning for long-interval consecutive missing values imputation without external features in air pollution time series”, J. Ma. et al., in Advanced Engineering Informatics 44 (2020): 101092, 2020).

A system has been proposed which comprises a VAE that uses LSTM layers to recover data and can recover “block” type missing values (“Smoothed LSTM-AE: A spatiotemporal deep model for multiple time-series missing imputation”, Dong Li, et al., in Neurocomputing, volume 411 , pages 351-363, 2020). However, VAEs can suffer from several issues if the element dimensions are not adjusted properly (“Variational Lossy Autoencoder”, Chen. Xi et al., in arXiv preprint arXiv: 1611.02731 , 2016). The systems of J. Ma et al (2020) and Chen. Xi et al (2016) start from the situation where there is at least some data in the time series to estimate from.

Methods disclosed in US 20210049428 A1 , US20190391968 A1 , and US20230076149 A1 relate to data imputation, but again focus only on individual measurements or portions of already present metrics in the datasets. For example, US 20210049428 A1 proposes a system to identify missing data, get confirmation by an expert, and automatically propose a method for imputation of the missing value and the missing reason. US20190391968 A1 proposes a system to impute data in time series that lack a certain percentage of measurements via Expectation-Maximization algorithms. US20230076149 A1 proposes an internet-connected system which, by generating a secondary extremeness time series, is capable of proposing values for the missing measurements in time series datasets where extreme events appear. Summary

As discussed above, multiple AI/ML techniques have been developed to analyze and monitor large datasets comprising measurements of metrics (also referred to herein as metric datasets or reference datasets) to help experts detect and classify network issues.

One problem associated with these techniques is that the reference datasets are expensive and time-consuming to generate/obtain. For example, in some cases, reference datasets are labelled manually by experts (which may require a large amount of expertise and time). However, these reference datasets can quickly become obsolete. For example, they may lack measurements of a number of relevant metrics (e.g., ones defined after the reference dataset was established).

Another problem associated with these techniques is that ML/AI models are often developed using unique training datasets. This can decrease the efficacy/applicability of the ML/AI models when fed reference datasets that lack measurements of metrics that are present in the unique training datasets. Therefore, it would be beneficial to provide methods of domain and feature space adaptation for those models by, for example, providing methods for updating/extending previously obtained reference datasets. This can help improve the results of inputting the reference datasets into the models.

Embodiments of the present disclosure aim to address the above and other issues. For example, embodiments of the present disclosure allow for the estimation of measurements of certain metrics, where there are no measurements of these metrics in a reference dataset. These metrics may be referred to herein as novel metrics. The estimation is performed by extracting information from a dataset (e.g., one that has been more recently obtained) that does include measurements of the novel metrics, as well as measurements of metrics that have corresponding measurements in the reference dataset. The dataset from which information is extracted may be referred to herein as an extended dataset, and the metrics which have corresponding measurements in both the reference and extended datasets may be referred to herein as common metrics. The estimated measurements can then be included in the reference dataset in order to improve its LCM.

For example, methods according to embodiments of the present disclosure may comprise steps of: - extracting, from one or more extended datasets, a relationship between novel and common metrics;

- using this relationship to perform an estimation of measurements of the novel metrics; and

- updating the reference dataset to include the estimated measurements.

As such, embodiments of the present disclosure enable reference datasets to be updated by having them include estimations of measurements of metrics that otherwise do not have corresponding measurements in the reference dataset. As a result, the intense and tedious work of generating, normalizing and labelling new reference datasets can be avoided by instead updating existing reference datasets, facilitating the LCM of reference datasets.

According to a first aspect, there is provided a computer-implemented method for facilitating the imputation of data into a first dataset. The first dataset comprises a plurality of data entries. The data entries in the first dataset comprising measurements of a plurality of first metrics associated with a communication network. Labels are associated with respective sets of data entries in the first dataset. The labels identify network information associated with the set. A second dataset comprises a respective plurality of data entries. The data entries in the second dataset comprise measurements of the first metrics and measurements of one or more second metrics. The labels are further associated with respective sets of data entries in the second dataset. The first dataset does not include measurements of the one or more second metrics. The method comprises estimating, using a machine learning model, the second dataset, and the labels associated with the respective sets of data entries in the second dataset, a first relationship between measurements of the one or more first metrics in a data entry and measurements of one or more second metrics in the data entry. The method further comprises applying, using the labels associated with the respective sets of data entries in the first dataset, the first relationship to the data entries in the first dataset to obtain estimated measurements of the one or more second metrics for said data entries.

According to a second aspect, there is provided a computer-implemented method for facilitating the imputation of data into a first dataset. The first dataset comprises a plurality of data entries. The data entries in the first dataset comprising measurements of a plurality of first metrics associated with a communication network. Labels are associated with respective sets of data entries in the first dataset. The labels identify network information associated with the set. A second dataset comprises a respective plurality of data entries. The data entries in the second dataset comprise measurements of the first metrics and measurements of one or more second metrics. The labels are further associated with respective sets of data entries in the second dataset. The first dataset does not include measurements of the one or more second metrics. The apparatus is configured to cause the apparatus to perform the method according to the first aspect or any embodiment thereof.

According to a third aspect, there is provided a computer-implemented method for facilitating the imputation of data into a first dataset. The first dataset comprises a plurality of data entries. The data entries in the first dataset comprising measurements of a plurality of first metrics associated with a communication network. Labels are associated with respective sets of data entries in the first dataset. The labels identify network information associated with the set. A second dataset comprises a respective plurality of data entries. The data entries in the second dataset comprise measurements of the first metrics and measurements of one or more second metrics. The labels are further associated with respective sets of data entries in the second dataset. The first dataset does not include measurements of the one or more second metrics. The apparatus comprises a processor and a memory, said memory containing instructions executable by said processor whereby said apparatus is operative to perform the method according to the first aspect or any embodiment thereof.

According to a fourth aspect, there is provided a computer program product comprising a computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform the method according to the first aspect or any embodiment thereof.

Certain embodiments may provide one or more of the following technical advantage(s):

- Simplified LCM of datasets: Reference datasets can be extended to adapt them to the feature space and domain of a particular learning task.

- Usable to estimate measurements of metrics where there are otherwise no available samples/measurements in the reference dataset: embodiments of the present disclosure can be used to adapt reference datasets to the domain and feature space of a learning task. In contrast, previous techniques based on data imputation simply focus on same reference and target domains (e.g., estimating measurements due to missing samples of measurements and/or periods of missing samples of measurements). Since embodiments of the present disclosure can estimate measurements of novel metrics which do not have corresponding measurements in a reference dataset, these embodiments do not need to take advantage of any existing measurements in the reference dataset of the novel metrics for their estimation.

- Expert or automated labelling is utilized when estimating information/measurements that are lacking from the reference dataset: Since embodiments of the disclosure can be used to update reference datasets used for detecting issues in a communication network, the labels generated for the reference dataset for that purpose can be reused (e.g., to define special cases that the AI/ML models can take advantage of for improved measurement estimation).

- Ease of re-trainability. Embodiments of the present disclosure function with a constrained number of parameters. This facilitates/eases the re-trainability of the AI/ML models so they can be used to estimate additional measurements of metrics (also referred to herein as features).

Furthermore, as seen from Table 1 and Figure 7 below, embodiments of the present disclosure have been tested on real (i.e., actually obtained) labelled cellular network data. This shows the efficacy of the disclosed embodiments when applied to the task of estimating measurements of metrics associated with a communication network.

Brief description of the drawings

For a better understanding of examples of the present disclosure, and to show more clearly how the examples may be carried into effect, reference will now be made, by way of example only, to the following drawings in which:

Figure 1 is a schematic diagram illustrating a system according to embodiments of the disclosure;

Figure 2 is a schematic diagram illustrating the steps/functions performed by the system of Figure 1 in more detail;

Figures 3, 4 and 5 are schematic diagrams of system architecture according to embodiments of the disclosure;

Figure 6 is a is a schematic flowchart showing a method in accordance with embodiments of the disclosure; Figure 7 is a table showing example measurements of novel KPIs that are included in a data entry of the validation subset;

Figure 8 provides graphs comparing estimated measurements of metrics against target measurements of the metrics, wherein the estimated measurements were obtained via implementing embodiments of the disclosure;

Figure 9 is a schematic diagram of an apparatus according to embodiments of the disclosure; and

Figure 10 is a schematic diagram of a virtualization environment in which embodiments of this disclosure can be implemented.

Detailed Description

Some of the embodiments contemplated herein will now be described more fully with reference to the accompanying drawings. Other embodiments, however, are contained within the scope of the subject matter disclosed herein. The disclosed subject matter should not be construed as limited to only the embodiments set forth herein; rather, these embodiments are provided by way of example to convey the scope of the subject matter to those skilled in the art.

Formally, the problem being addressed by embodiments of the present disclosure may be defined as the adaptation of a domain (£)) and a feature space (X) of an AI/ML model, where:

D_s = (X_s, P_S(X_S } and

D_T = {X_T, P_T(X_T)}, where D_s D_T, X_s c x_r, and P(X) is a feature (i.e., metric) distribution.

As discussed above, known data imputation methods often focus on time series imputation (i.e., scenarios where a number of measurements in a time series are lost when generating a dataset). For example, this may be due to random failures or measurement errors. On the other hand, embodiments of the present disclosure focus on estimating measurements of metrics for datasets that lack any measurements of said metrics. That is, embodiments of the present disclosure impute estimated measurements of one or more novel metrics into datasets. Previous systems are not suited to this task; they tend to decrease in efficacy when dealing with complex errors, such as “block” type errors, where several metrics are lacking measurements at the same time for a certain period.

For example, some imputation methods can deal with block failures but are often limited due to lacking the capability of using extra information, such as previously generated labels, and assume that the evolution of values in a time series is smooth (due to the nature of the data that they were tested with) (“Smoothed LSTM-AE: A spatio-temporal deep model for multiple time-series missing imputation”, Dong Li et al., in Neurocomputing, volume 411 , pages 351-363, 2020). This is not the case for every time series. For example, cellular metrics often have erratic measurements/behaviour.

Some imputation methods utilize transfer learning techniques to simplify the estimation of measurements/values, but they still require some data in a damaged time series in order to look for correlated examples to train from (“Transfer learning for long-interval consecutive missing values imputation without external features in air pollution time series”, J. Ma et al., in Advanced Engineering Informatics 44 (2020): 101092, 2020).

Techniques like data augmentation aim to generate artificial samples from knowledge extracted from existing datasets, like in the case of variational autoencoders (VAEs) or generative adversarial networks (GANs), but the outcome of is totally artificial (“Data augmentation techniques in time series domain: A survey and taxonomy”, G. Iglesias et al., in Neural Computing & Applications 35, 10123-10145, 2023). In contrast, embodiments of the present disclosure use data imputation processes to enable real measurements of metrics in a dataset to be maintained when updating a reference dataset to include estimated measurements of novel metrics.

The methods of US 20210049428 A1 , US 20190391968 A1 , and US 20230076149 A1 (discussed above) tackle the issue of imputing several lacking values for a feature in a time series dataset, but does not teach the imputation of values in a time series dataset that lacks any values of said feature. Some methods, like the ones disclosed in US 20190391968 A1 , require direct input from experts, meaning they cannot be completely automated.

In summary of the above, none of the discussed work aims to impute completely missing measurements of metrics in a dataset. That is, none of the discussed techniques aim at generating/estimating measurements of novel metrics that are missing from X_s, where corresponding measurements of the novel metrics are present in X_T. Rather, most of the above discussed work aims at completing portions of data which lack measurements of metrics (e.g., where a number of measurements have been lost or altered).

Embodiments of the present disclosure aim to expand the lifetime of reference datasets for which large amounts of time, energy, and/or resources have been invested. This is achieved by enabling the reference dataset to be updated so that the reference dataset may include measurements of novel metrics based on measurements of those novel metrics in an extended dataset). Therefore, in some embodiments, the above defined domain and feature space adaptation may be performed for the reference dataset. This allows the updated reference dataset to be used with an increased number of ML/AI mechanisms, as the ML/AI mechanisms can be applied to datasets that contain measurements of the same features (i.e., metrics) as the datasets they were trained with (facilitating what is known as transductive transfer learning). Embodiments of the present disclosure differ from known data augmentation techniques, which as discussed above, simply generate artificial samples. Instead, embodiments of the present disclosure employ various data imputation type processes to update reference datasets.

To address the above discussed issues, embodiments of the present disclosure apply a multi-stage method, where, in each stage, knowledge obtained by generated models is applied in the next step to produce a modified and improved result. This is referred as transfer learning, as the training of the different models benefits from parameters that have been previously adjusted in earlier stages, where different objectives were in place.

Embodiments of the present disclosure do not require the measurements in the reference and extended datasets to have been obtained in the same network. Furthermore, information on the characteristics of the network or the measurement behaviour are included via encoded labels so that it can used by the ML/AI models to better estimate measurements of specific metrics with a more complex behaviour.

For example, some embodiments of the present disclosure aim to update labelled datasets (e.g., datasets labelled with indications of different network statuses or failures). The labelled datasets may comprise measurements of common network metrics but lack measurements of some novel metrics of interest (e.g. newly defined KPIs that were not defined at the time the labelled dataset was gathered). The labelled data sets may be updated by imputing, into the labelled dataset for a network cell, sets of estimated measurements of the novel metrics of interests. The estimated measurements may be based on measurements of other, common metrics (e.g. KPIs) measured for the network cell. In this way, the labelled datasets can be extended and updated so that they contain measurements for metrics used by an AI/ML model. As such, the LCM of the labelled dataset is improved.

Embodiments of the present disclosure may receive, as an input:

- a first dataset (also referred herein as a reference dataset). The first dataset may comprise measurements of common metrics but lack measurements of a certain number of novel metrics;

- a second dataset (also referred herein as an extended dataset), that comprises measurements of common metrics as well as the novel metrics; and

- corresponding labels (e.g., an identified network status) associated with both the reference and extended datasets.

Through a series of processing steps (discussed in detail below), measurements of the novel metrics are estimated for the reference dataset. This may be achieved by: applying, in a data imputation fashion, transfer learning techniques, and facilitating the domain and feature space adaptation for posterior ML/AI mechanisms that will use those reference and extended datasets.

Embodiments of the present disclosure include a system that is capable of updating labelled reference datasets. The system can be used to impute complete sets of KPI or metric measurements for different network cells based on other metrics or KPIs measured for those or other cells.

Figure 1 illustrates a schematic diagram of a system according to embodiments of the disclosure. The system is for facilitating the imputation of data into a reference dataset 100a.

The system of Figure 1 takes advantage of extended datasets 100b including measurements 110b of common and novel metrics in order to estimate measurements of said novel metrics to be included in a reference dataset 100a that otherwise lacks measurements of said novel metrics. The system is capable of modelling the relationships between common and novel metrics based on their corresponding measurements 110b and takes into account labels 112a, 112b assigned to the datasets 100a, 100b in order to provide better outputs.

The system of Figure 1 receives one or more of inputs (e.g., the inputs discussed above). The inputs may be obtained from information included in reference and extended datasets. The datasets 100a, 100b comprise data entries (which may also be referred to herein as time series or samples) which comprise measurements 110a, 110b of metrics (e.g., network metrics). In particular, the inputs comprise:

- the first dataset 100a (reference dataset). The reference dataset 100a comprises a plurality of data entries. The data entries comprise measurements 110a of a plurality of common metrics associated with a communication network. The reference data 100a set lacks measurements of novel metrics that are present in the extended dataset 100b; and

- the second dataset 100b (extended dataset). The extended dataset 100b is similar to the reference dataset 100a in terms of measured metrics; it comprises a respective plurality of data entries, and the data entries in the extended dataset 100b comprise measurements 110b of the common metrics and measurements 110b of one or more novel metrics. For example, the extended dataset 100b may be more recently collected or comprise measurements of metrics obtained from cellular networks for which novel metrics have been defined.

While embodiments of the present disclosure are capable of working with non-labelled data entries, the inputs to the system preferably further comprise inputs labels 112a, 112b. Along with the measurements 110a, 110b of common/novel metrics in each dataset, each data entry in the dataset (e.g., each individual sample or time series) may have an associated label 112a, 112b. For example, the associated label 112a, 112b may be assigned by experts or an automatic labelling tool. The associated label 112a, 112b may group different measurements 110a, 110b by a common (i.e., shared) characteristic. For example, the shared characteristic may be related to a network issue or behaviour of a cell in which the measurements were obtained. The labels 112a, 112b are the same for both datasets. That is, the available pool of labels 112a, 112b is the same for both datasets, meaning that groups of measurements in the reference dataset 100a and the extended dataset 100b having the same/similar characteristics have the same label.

In some embodiments, at least some of the measurements 110a in the first dataset 100a and measurements 110b in the second dataset 100b may be obtained in the same network. In other embodiments, they may be obtained in different networks. If the measurements in the first dataset 100a and the second dataset 100b are obtained in the same network, the quality and/or accuracy of the outputs obtained may be improved.

In some embodiments, corresponding measurements in the reference dataset 100a and extended dataset 100b may have the same granularity.

The system of Figure 1 may return a third dataset 105 (extended reference dataset) as output. The extended reference dataset 105 may correspond to an updated version of the reference dataset 100a (i.e., the reference dataset 100a updated to include estimated measurements of novel metrics which have corresponding measurements in the extended dataset 100b) and its corresponding labels 112c) The measurements of the common metrics and estimated measurements of novel metrics in the extended reference dataset 105 are collectively labelled measurements 110c.

Aside from the above discussed inputs and outputs, several other external variables may be defined. For example, the following variables may be identified and input into embodiment of the present disclosure:

• A cellular network, system, and/or the measuring elements from which the measurements included in the datasets were obtained; and

• Labelling tools and/or the experts that provided the labels for the datasets.

After receiving the above discussed inputs, the system performs a number of steps/functions in order to output the extended reference dataset.

Figure 2 is a schematic diagram illustrating the steps/functions performed by the system of Figure 1 in more detail. In Figures 1 and 2, the system is illustrated as comprising four main blocks of steps/functions: a metric characteristics modelling block 101 , a modelled metrics regularization block 102, a label-based model specialization block 103; and a novel metrics estimation block 104. The skilled person would understand that one or more of these blocks may be omitted from the embodiments of the present disclosure.

Block 101 : Metric characteristics modelling

Block 101 aims to extract a relationship between the common metrics and the novel metrics included in the extended dataset 100b. First, the system may begin by identifying metrics that do not have corresponding measurements in the reference dataset 100a (i.e., identifying the novel metrics). In some embodiments, these novel metrics may be identified by comparing the metrics in the reference dataset 100a to the metrics in the extended dataset 100b (block 101a).

The system may then perform an iterative process in which relationships between novel and common metrics in the extended dataset 100b are modelled. This may be achieved by training and applying an AI/ML model that can estimate measurements of the novel metrics, and obtaining an estimation error by comparing the estimated measurements to the real measurements 110b included in the extended dataset 100b (block 101 b). The model parameters may be adjusted iteratively until the estimation error meets a threshold (e.g., it is minimized in block 101c).

The model used in block 101 does not need to be able to estimate measurements of the novel metrics to a high degree of accuracy, as its aim is to extract an estimation of the relationships between the novel and common metrics, and not to obtain highly accurate estimations of the novel metrics. Therefore, once the model utilized in step 101 b can start to approximate measurements of the novel metrics to an appropriate degree of accuracy, the system can proceed to block 102.

Embodiments of block 101 may utilize any suitable AI/ML model to estimate the relationships between novel and common metrics. As an example, the model may comprise an autoencoder. Autoencoders may comprise dense connected layers, unidimensional convolutional layers, RNNs, etc.

An example of an AI/ML model according to embodiments of the present disclosure is illustrated in Figure 3. The model comprises an autoencoder 300. The autoencoder 300 is trained to compute initial weights for its elements and to learn to extract metrics characteristics and encode them in a latent space. Metric characteristics may comprise one or more measurements (e.g., combinations of measurements) of the novel and common metrics from which the autoencoder 300 can extract a suitable amount of information (e.g., they may comprise one or more measurements that contain information of signal values and their evolution in time).

The autoencoder comprises an encoder and a decoder. The encoder comprises a series of layers whose objective is to compress the information of an input into a smaller number of variables. The decoder is in charge of returning the compressed information into the original input. Hence, the encoder is learning to compress (or extract) the relevant characteristics of the input into a smaller number of variables.

As the extended dataset 100b contains measurements 110b of the novel metrics (e.g., a KPI or metric of interest), the autoencoder 300 can compute an error of its estimation of its measurements of the novel metrics by comparing them to the measurements 110b of the novel metrics included in the extended dataset 100b. As can be seen in Figure 3, the autoencoder 300 may be composed of four layers: three layers composing the encoding element 302 and one layer composing the decoding element 304. The dense layers in encoding element 302 may be classical fully connected layers. LSTM layers in encoding element 302 may be a type of recurrent layer capable of considering the temporal information preceding a measurement from a time series. For example, if a time series X comprises n consecutive measurements of a metric, x (e.g., X = [x₁,x₂,x₃, ... ,x_n]), where each x_t is obtained at a respective time, t (e.g., at t , t₃, t₃, ..., t_n), the recurrent layer, when analyzing sample x takes into account all the previous measurements of such as [x₁,x₂,x₃, ... ,x ).

In the dimensioning of the model, three time series characteristics are defined:

1 . NUMBER OF SAMPLES (e.g., a number of data entries, also referred to herein as samples or time series, available in the dataset).

2. TIMESTEPS (e.g., a number of measurements of each metric in each time series).

3. FEATURES (e.g., a number of metrics which have corresponding measurements in a sample (i.e., the number of metrics measured)).

The autoencoder 300 takes as input the time series dataset 301 in batches (e.g., as portions of the NUMBER OF SAMPLES used). Each sample in a batch of samples has the form of a matrix of dimensions: [TIMESTEPS x FEATURES - 1], where the novel metric being estimated (e.g., a KPI or metric of interest) is excluded. A secondary reference is input into the autoencoder 300, which is the novel metric being estimated (having the form of a matrix of dimensions: [TIMESTEPS x 1]). The output of the autoencoder 300 is the estimated measurement of the novel metric 305 for the input data and has the form of a matrix of dimensions: [TIMESTEPS x 1], For the model training and evaluation, the scaled mean squared error (SMSE) of the estimated measurements versus their real values in the extended dataset 100b is computed. The model aims to reduce this error as much as possible, thus learning to recover measurements of the novel metric from the measurements of the common metrics in the extended dataset 100b, using the novel metrics in the extended dataset 100b as the ground truth. To achieve this, in some examples, the model may utilize equation 1 presented below, where n is the total amount of measurements to estimate in a time series (n = TIMESTEPS ■ 1), X is the value of an individual measurement value in the time series and X is the estimated value for X which is output by the autoencoder 300. In other examples (e.g., if measurements of more than one features/novel metric are to be estimated), n is an integer result of (TIMESTEPS ■ FEATU REStoESTI MATE), while the output time series would have the shape (TIMESTEPS x FEATUREStoESTIMATE).

Equation 1 : SMSE error used in the autoencoder training.

Block 102: Modelled metrics regularization

As a result of block 101 , the system is able to encode a relationship between the common metrics and novel metrics (e.g., a metric or KPI of interest). To improve the accuracy of the encoded relationship, block 102 can be employed, which expands and retrains the model obtained by the system in block 101.

Block 102 aims to regularize the relationships obtained in block 101. This regularization may comprise organizing, in a logical manner, the different models of the characteristics of the novel and common metrics created in block 101 (block 102a) (i.e., the encoded relationships obtained from block 101). For example, the different models may be related via a similarity metric. Once that similarity metric has been defined, it can be optimized (e.g., such that it is minimized at block 102c) via an iterative process that regularizes the models (block 102b) whilst maintaining their representative capacities. To achieve this, a watchdog process may be implemented that is in charge of annealing the strength of the iterative regularization process (block 102d). For example, the watchdog process can modify the weight of the regularization versus the accuracy of the output of the system. The outcome of block 102 may be an improved generalization of the relationships obtained in block 101.

In order to illustrate details of the functioning of block 102, the example of Figure 3 is continued. That is, in order to employ block 102, the autoencoder 300 of figure 3 may be expanded to include dense layers 406 in the latent space, as well as a sampling layer 408, as illustrated in Figure 4. Furthermore, the training errorof block 102 (e.g., Equation 3 below) can be calculated by adding a regularization term (e.g., Equation 2 below) to the training error of block 101 (e.g., Equation 1). By modifying the autoencoder 300 of Figure 3 in this way, the autoencoder 300 can be turned into a VAE. In other words, the neural network (NN) architecture used by the system in Figure 3 to estimate measurements of the novel metrics can be adapted to become a VAE by adding internal dense layers and sampling layers, as shown in Figure 4.

Equation 2 provides a regularization term, KL, that focuses on shaping the latent space distributions of the VAE such that proximity and similarity are correlated. To compute this regularization term, the sampling layer 408 may be used since it can provide sampled values of the latent space distributions (indicated in Equation 2 by ’" ). These sampled values are the result of sampling the latent space dense layers 406, namely z_mean and ^z;_oasi_ama that represent gaussian distributions modelling the input-output behaviour. The regularization term also includes a p coefficient (which is the one modified by the Kullback-Leibler (KL) annealing technique). Based on the training iterations, the value of this coefficient may range from 0 to 1 to avoid issues like the ones discussed by CHEN, Xi, et al. (2016), “Variational Lossy Autoencoder”, In: arXiv preprint arXiv: 1611.02731 , 2016.

Equation 2 (calculates the Kullback-Leibler regularization term):

The error of this phase (i.e., the error computed in block 102) can be computed using Equation 3:

TRAINING ERROR = SMSE + KL

Block 103: Label-based model specialization

Block 103 aims to modify the model used by the system of Figure 2 (and/or, for example, Figure 3) so that it includes information derived from labels associated with the time series measurements in each dataset, if said labels are available (Block 103a). This can be achieved by employing a variety techniques which may depend on the methods employed in blocks 101 and 102. Regardless of the technique used, the iterative process used in the block 102b may be repeated (block 103b) until an optimum adjustment of the models is reached (block 103c).

The information derived from the associated labels may be entered into the system in order to help it understand which metrics are more relevant for the prediction of the output relationship(s). The more independent the labels are from the measurements of the metrics, the more improvements can be achieved with this modification.

Returning to the example of Figure 3, the information related to the associated labels can be input into the AI/ML model. As shown in Figure 5, this can be achieved by: concatenating (e.g., via a concatenation element 510) a series of bits 512 that encode the class of samples (e.g., in a one-hot-encoding fashion); and providing the concatenated series of bits to the model as an additional input. The concatenated series of bits can be added to the output of the sampling layer 408, before the decoder 304, as shown in Figure 5. This modification allows the decoder 304 to take into account the information provided by the labels while training to improve its estimations. In some examples, block 103 has been shown to reduce the root-mean-square-error (RMSE) between the measurement estimations and the real measurements values by up to 30% for certain KPIs.

The error computed in this phase is the same as the error computer using Equation 3.

Block 104: Novel metrics estimation

Block 104 aims to apply the models generated and adjusted in previous steps (e.g., in blocks 101-103) so that it can estimate measurements of the novel metrics and add them to the reference dataset 100a (block 104a). For example, by applying the model trained by the system of Figure 2 (e.g., the models of any one of Figures 3-5) to the measurements included in the reference dataset 100a, it can produce estimations of the novel metrics for the timesteps for which the model has been trained (block 104b).

The system of Figures 1 and 2 can therefore generate and output an extended reference dataset 105 which comprise measurements 110c of common metrics as well as estimated measurements 110c of novel metrics. That is, the extended reference dataset 105 may be considered to be “completed” in the sense that it can include measurements for all metrics that have corresponding measurements in the extended dataset 100b. For example, the extended reference dataset 105 may have the same number (and same type) of measurements for the novel metrics as the extended dataset 100b. This allows the application of general ML and Al techniques to the extended reference dataset 105 for its analysis. It also means that bigger training datasets can be made available for these general ML and Al techniques.

Figure 6 illustrates a computer-implemented method according to embodiments of the present disclosure. The method is for facilitating the imputation of data into a first dataset 100a (i.e., a reference dataset). The method of Figure 6 may be implemented by any of the systems discussed in relation to Figures 1-5 and 7-9.

The first dataset 100a comprises a plurality of data entries. The data entries in the first dataset 100a comprise measurements 110a of a plurality of first metrics (i.e., common metrics) associated with a communication network (e.g., KPIs, key quality indicators, and/or alarms). Labels 112a are associated with respective sets of data entries in the first dataset 100a. The labels 112a identify network information associated with the set. For example, the network information may indicate any one or more of: a network issue (e.g., a radio link failure, reduced user throughput, delayed access to network resources resulting from network congestion, and/or user cell mobility); a type of network; an expected behaviour; and a network status.

A second dataset 100b (i.e., an extended dataset) comprises a respective plurality of data entries. The data entries in the second dataset 100b comprise measurements 110b of the first metrics (i.e., the common metrics) and measurements 110b of one or more second metrics (i.e., novel metrics). For example, the one or more second metrics may comprise KPIs, key quality indicators, and/or alarms. The labels 112b are further associated with respective sets of data entries in the second dataset 100b. At least some of the measurements 110b in the second dataset 100b may be associated with the same communication network as at least some of the measurements 110a in the first dataset 100a, or they may be associated with a different communication network to the measurements 110a in the first dataset 100a. The first dataset 100a does not include measurements of the one or more second metrics (i.e., the novel metrics). In some embodiments, respective measurements of the first metrics in the first dataset 100a may have the same granularity as corresponding measurements of the first metrics in the second dataset 100b.

The method may begin at step 602. In this step, using a machine learning model, the second dataset 100b, and the labels 112b, a first relationship between measurements of the one or more first metrics in a data entry and measurements of one or more second metrics in the data entry is estimated. The machine learning model may comprise an autoencoder, such as autoencoder 300 discussed above in relation to Figures 3-5.

To obtain the first relationship, step 602 may comprise implementing one or more of the method steps discussed above in relation to blocks 101 , 102, and 103 of Figures 1 and 2.

For example, step 602 may comprise providing, to the machine learning model, information derivable from the labels 112b to enable the machine learning model to determine which measurements 110b in the second dataset 100b are relevant for the estimation of the first relationship. The information derived from the labels 112b may comprise a concatenated series of bits (e.g., obtained using the concatenation element 510). The concatenated series of bits may encode (e.g., using one-hot-encoding) the labels. The concatenated series of bits can be input to a sampling layer 408 of the autoencoder 300.

Step 602 may comprise training the machine learning model using the second dataset 100b and the labels 112b. For example, the machine learning model may be trained to estimate, for respective sets of data entries in the second dataset 100b, a relationship between measurements 110b of the one or more first metrics in the set of data entries and measurements 110b of the one or more second metrics in the set of data entries. For example, this may be achieved by implementing any one or more of the method steps discussed in relation to block 101 of Figures 1 and 2.

Step 602 may comprise, for respective sets of data entries in the second dataset 100b, estimating a relationship between measurements of the one or more first metrics in the set of data entries and measurements of the one or more second metrics in the set of data entries. The method of Figure 6 may then further comprise a step of regularising the relationships estimated for the respective sets of data entries in the second dataset 100b to estimate the first relationship. For example, this may be achieved by implementing any one or more of the method steps discussed in relation to block 102 of Figure 2.

At step 604, the method may comprise applying, using the labels 112a, the first relationship to the data entries in the first dataset 100a to obtain estimated measurements of the one or more second metrics for said data entries.

In some embodiments, the method of Figure 6 may further comprise imputing the first dataset 100a using the estimated measurements of the one or more second metrics. For example, this may comprise implementing any one or more of the method steps discussed above in relation to block 104 of Figures 1 and 2.

To illustrate the above discussed embodiments, an example of their implementation is discussed below. In this example, the ML model of the implemented system corresponds to the model discussed in relation to Figure 5.

The implemented system utilized real (i.e., actually obtained) measurements of novel metrics included in data entries of an extended dataset to estimate measurements of four novel metrics: KPI 1 , KPI 2, KPI 3, and KPI 4. The data entries in the extended dataset each comprised: twenty-four measurements of common metrics (i.e., KPIs that were not KPIs 1 to 4) that were collected at one hour intervals; and twenty-four measurements of novel metrics (i.e., KPIs 1 to 4) that were collected at one hour intervals.

KPI 1 is related to handover, KPI 2 is related to Multiple Input Multiple Output (MIMO) usage, KPI 3 is related to paging, and KPI 4 is related to Control Channel Element (CCE) requests.

Before being input into the implemented system, the extended dataset was separated into two subsets: a training subset including 90% of the data entries; and a validation subset including 10% of the data entries. The model of the implemented system was trained according to embodiments of the disclosure using the data entries in the training subset. The real measurements of the novel KPIs in the training subset were used to compute the error on the different stages. Example measurements of the novel KPIs included in a data entry of the validation subset are shown in the table in Figure 7 (see the normalized measurements of the novel KPIs in the row marked as KPI Target).

As can be seen from Figure 7, the novel KPIs have different patterns in their corresponding measurements. For example, measurements of KPI 4 have an almost constantly low value, whilst the measurements of KPI 3 range from being relatively small to relatively large.

Figure 7 also shows the estimated measurements of the novel KPIs obtained from the implemented system when measurements of common KPIs included in the validation subsets were input into the implemented system (in the rows marked KPI estimation).

Figure 7 also shows the error associated with the estimated measurements of the novel KPIs (in the rows marked KPI error). The errors were calculated by comparing the estimated measurements of the novel KPIs obtained for the validation subset to the actual measurements of the novel KPIs included in the validation subset.

Figure 8 illustrates the results of the table in Figure 7 graphically so that the differences between the target (i.e., actual) measurements of the novel KPIs and the estimations thereof can be observed. As shown, the implemented system was capable of generating estimations of novel KPIs for time intervals where no previous measurements of said KPIs were provided. This demonstrates the efficacy of the above discussed embodiments, which utilize knowledge extracted from relationships between metrics in an extended dataset to update a reference dataset with estimated measurements of novel metrics.

Figure 9 is a schematic diagram of an apparatus 800 for facilitating the imputation of data into a first dataset 100a according to embodiments of the disclosure. The apparatus comprises memory 802, processing circuitry 804, and interface(s) 806. The processing circuitry 804 may be configured such that the apparatus 800 may be operable to perform the methods embodiments of the disclosure discussed in relation to Figures 1-8. It will be appreciated that the apparatus 800 may comprise one or more virtual machines running different software and/or processes. The apparatus 800 may therefore comprise, or be implemented in or as one or more servers, switches and/or storage devices and/or may comprise cloud computing infrastructure that runs the software and/or processes.

The processing circuitry 804 controls the operation of the apparatus 800 to implement the relevant part of the methods described herein. The processing circuitry 804 can comprise one or more processors, processing units, multi-core processors or modules that are configured or programmed to control the apparatus 800 in the manner described herein. In particular implementations, the processing circuitry 804 can comprise a plurality of software and/or hardware modules that are each configured to perform, or are for performing, individual or multiple steps of the method described herein in relation to the apparatus 800.

The interface(s)/communications interface(s) 806 is for use in enabling communications with other apparatus, network nodes, computers, servers, etc. For example, the communications interface 806 can be configured to transmit to and/or receive from other apparatus or nodes requests, acknowledgements, information, data, signals, or similar. The communications interface 806 can use any suitable communication technology.

The processing circuitry 804 may be configured to control the communications interface 806 to transmit to and/or receive from other nodes, etc. requests, acknowledgements, information, data, signals, or similar, according to the methods described herein.

In some embodiments, the memory 802 can be configured to store program code that can be executed by the processing circuitry 804 to perform the methods described herein. Alternatively or in addition, the memory 802 can be configured to store any requests, acknowledgements, information, data, signals, or similar that are described herein. The processing circuitry 804 may be configured to control the memory 802 to store such information therein.

Figure 10 is a block diagram illustrating a virtualization environment 900 in which functions implemented by some embodiments may be virtualized. For example, the apparatus 800 described herein can be implemented in virtualization environment 900. In the present context, virtualizing means creating virtual versions of apparatuses or devices which may include virtualizing hardware platforms, storage devices and networking resources. As used herein, virtualization can be applied to any device described herein, or components thereof, and relates to an implementation in which at least a portion of the functionality is implemented as one or more virtual components. Some or all of the functions described herein may be implemented as virtual components executed by one or more virtual machines (VMs) implemented in one or more virtual environments 900 hosted by one or more hardware nodes, such as a hardware computing device. Alternatively, the apparatus 900 may be entirely virtualized.

Applications 902 (which may alternatively be called software instances, virtual appliances, network functions, virtual nodes, virtual network functions, etc.) are run in the virtualization environment 900 to implement some of the features, functions, and/or benefits of some of the embodiments disclosed herein.

Hardware 904 includes processing circuitry, memory that stores software and/or instructions executable by hardware processing circuitry, and/or other hardware devices as described herein, such as a network interface, input/output interface, and so forth. Software may be executed by the processing circuitry to instantiate one or more virtualization layers 906 (also referred to as hypervisors or virtual machine monitors (VMMs)), provide VMs 908a and 908b (one or more of which may be generally referred to as VMs 908), and/or perform any of the functions, features and/or benefits described in relation with some embodiments described herein. The virtualization layer 906 may present a virtual operating platform that appears like networking hardware to the VMs 908.

The VMs 908 comprise virtual processing, virtual memory, virtual networking or interface and virtual storage, and may be run by a corresponding virtualization layer 906. Different embodiments of the instance of a virtual appliance 902 may be implemented on one or more of VMs 908, and the implementations may be made in different ways. Virtualization of the hardware is in some contexts referred to as network function virtualization (NFV). NFV may be used to consolidate many network equipment types onto industry standard high volume server hardware, physical switches, and physical storage, which can be located in data centers, and customer premise equipment.

In the context of NFV, a VM 908 may be a software implementation of a physical machine that runs programs as if they were executing on a physical, non-virtualized machine. Each of the VMs 908, and that part of hardware 904 that executes that VM, be it hardware dedicated to that VM and/or hardware shared by that VM with others of the VMs, forms separate virtual network elements. Still in the context of NFV, a virtual network function is responsible for handling specific network functions that run in one or more VMs 908 on top of the hardware 904 and corresponds to the application 902.

Hardware 904 may be implemented in a standalone network node with generic or specific components. Hardware 904 may implement some functions via virtualization. Alternatively, hardware 904 may be part of a larger cluster of hardware (e.g. such as in a data center or CPE) where many hardware nodes work together and are managed via management and orchestration 910, which, among others, oversees lifecycle management of applications 902. In some embodiments, hardware 904 is coupled to one or more radio units that each include one or more transmitters and one or more receivers that may be coupled to one or more antennas. Radio units may communicate directly with other hardware nodes via one or more appropriate network interfaces and may be used in combination with the virtual components to provide a virtual node with radio capabilities, such as a radio access node or a base station. In some embodiments, some signalling can be provided with the use of a control system 912 which may alternatively be used for communication between hardware nodes and radio units.

Although the computing devices described herein (e.g. the apparatus 800) may include the illustrated combination of hardware components, other embodiments may comprise computing devices with different combinations of components. It is to be understood that these computing devices may comprise any suitable combination of hardware and/or software needed to perform the tasks, features, functions and methods disclosed herein. Determining, calculating, obtaining or similar operations described herein may be performed by processing circuitry, which may process information by, for example, converting the obtained information into other information, comparing the obtained information or converted information to information stored in the network node, and/or performing one or more operations based on the obtained information or converted information, and as a result of said processing making a determination. Moreover, while components are depicted as single boxes located within a larger box, or nested within multiple boxes, in practice, computing devices may comprise multiple different physical components that make up a single illustrated component, and functionality may be partitioned between separate components. For example, a communication interface may be configured to include any of the components described herein, and/or the functionality of the components may be partitioned between the processing circuitry and the communication interface. In another example, non-computationally intensive functions of any of such components may be implemented in software or firmware and computationally intensive functions may be implemented in hardware. It should be noted that the above-mentioned examples illustrate rather than limit the disclosure, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended embodiments. The word “comprising” does not exclude the presence of elements or steps other than those listed in an embodiment or claim, “a” or “an” does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the embodiments. Any reference signs in the claims shall not be construed so as to limit their scope.

Claims

1 . A computer-implemented method for facilitating the imputation of data into a first dataset (100a), wherein the first dataset (100a) comprises a plurality of data entries, the data entries in the first dataset (100a) comprising measurements (110a) of a plurality of first metrics associated with a communication network, wherein labels (112a) are associated with respective sets of data entries in the first dataset (100a), the labels (112a) identifying network information associated with the set, and wherein a second dataset (100b) comprises a respective plurality of data entries, the data entries in the second dataset (100b) comprising measurements (110b) of the first metrics and measurements (110b) of one or more second metrics, wherein the labels are further associated with respective sets of data entries in the second dataset (100b), wherein the first dataset (100a) does not include measurements of the one or more second metrics, the method comprising:

- estimating (602), using a machine learning model, the second dataset (100b), and the labels associated with the respective sets of data entries in the second dataset (100b), a first relationship between measurements of the one or more first metrics in a data entry and measurements of one or more second metrics in the data entry; and

- applying (604), using the labels (112a) associated with the respective sets of data entries in the first dataset (100a), the first relationship to the data entries in the first dataset (100a) to obtain estimated measurements of the one or more second metrics for said data entries.

2. The method of claim 1 , wherein the estimation step comprises: providing, to the machine learning model, information derivable from the labels to enable the machine learning model to determine which measurements (110b) in the second dataset (100b) are relevant for the estimation of the first relationship.

3. The method of claim 2, wherein the information derived from the labels comprises a concatenated series of bits, wherein the concatenated series of bits encodes the labels.

4. The method of claim 3, wherein the concatenated series of bits encodes the labels using one-hot-encoding.

5. The method of claim 3 or 4, wherein the machine learning model comprises an autoencoder (300) and the concatenated series of bits is input to a sampling layer (408) of the autoencoder (300).

6. The method of any one of claims 1 to 5, wherein the network information may indicate any one or more of: a network issue;

- a type of network;

- an expected behaviour; and

- a network status.

7. The method of claim 6, wherein the network issue may relate to any one or more of: radio link failure; reduced user throughput;

- delayed access to network resources resulting from network congestion; and user cell mobility.

8. The method of any one of claims 1 to 7, wherein the estimation step comprises:

- training (103b) the machine learning model using the second dataset (100b) and the labels.

9. The method of claim 8, wherein the machine learning model is trained to estimate, for respective sets of data entries in the second dataset (100b), a relationship between measurements of the one or more first metrics in the set of data entries and measurements of the one or more second metrics in the set of data entries.

10. The method of any one of claims 1-9, wherein the estimation step comprises:

- for respective sets of data entries in the second dataset (100b), estimating a relationship between measurements of the one or more first metrics in the set of data entries and measurements of the one or more second metrics in the set of data entries; and regularising the relationships estimated for the respective sets of data entries in the second dataset (100b) to estimate the first relationship.

11 . The method of any one of claims 1 to 10, the method further comprising: imputing the first dataset (100a) using the estimated measurements of the one or more second metrics.

12. The method of any one of claims 1 to 11 , wherein at least some of the measurements in the second dataset (100b) are associated with the same communication network as at least some of the measurements in the first dataset (100a).

13. The method of any one of claims 1 to 12, wherein the measurements in the second dataset (100b) are associated with a different communication network to the measurements in the first dataset (100a).

14. The method of any one of claims 1 to 13, wherein respective measurements of the first metrics in the first dataset (100a) have the same granularity as corresponding measurements of the first metrics in the second dataset (100b).

15. The method of any one of claims 1 to 14, wherein the first plurality of metrics and/or the second plurality of metrics comprise any one or more of: key performance indicators, KPIs; key quality indicators; and

- alarms.

16. The method of any one of claims 1 to 15, wherein the machine learning model comprises an autoencoder.

17. An apparatus (800) for facilitating the imputation of data into a first dataset (100a), wherein the first dataset (100a) comprises a plurality of data entries, the data entries in the first dataset (100a) comprising measurements of a plurality of first metrics associated with a communication network, wherein labels are associated with respective sets of data entries in the first dataset (100a), the labels identifying network information associated with the set, and wherein a second dataset (100b) comprises a respective plurality of data entries, the data entries in the second dataset (100b) comprising measurements of the first metrics and measurements of one or more second metrics, wherein the labels are further associated with respective sets of data entries in the second dataset (100b), wherein the first dataset (100a) does not include measurements of the one or more second metrics, the apparatus configured to cause the apparatus (800) to perform the method according to any one of claims 1-16.

18. The apparatus (800) of claim 17, wherein the apparatus is:

- a network node of the communications network; or

- a node external to the communications network which receives the first dataset (100a) and the second dataset (100b).

19. An apparatus (800) for facilitating the imputation of data into a first dataset (100a), wherein the first dataset (100a) comprises a plurality of data entries, the data entries in the first dataset (100a) comprising measurements of a plurality of first metrics associated with a communication network, wherein labels are associated with respective sets of data entries in the first dataset (100a), the labels identifying network information associated with the set, and wherein a second dataset (100b) comprises a respective plurality of data entries, the data entries in the second dataset (100b) comprising measurements of the first metrics and measurements of one or more second metrics, wherein the labels are further associated with respective sets of data entries in the second dataset (100b), wherein the first dataset (100a) does not include measurements of the one or more second metrics, the apparatus (800) comprises a processor (804) and a memory (802), said memory (802) containing instructions executable by said processor (804) whereby said apparatus (800) is operative to perform the method according to any one of claims 1-16.

20. The apparatus (800) of claim 19, wherein the apparatus (800) is:

- a network node of the communications network; or

21. A computer program product comprising a computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform the method of any of claims 1-16.