WO2023097370A1

WO2023097370A1 - Systems and methods for received signal strength prediction using a distributed federated learning framework

Info

Publication number: WO2023097370A1
Application number: PCT/AU2022/051437
Authority: WO
Inventors: Yifan GU; Wanli OUYANG; Yonghui Li; Branka Vucetic; Zhanwei HOU; Haiyao YU
Original assignee: University of Sydney
Current assignee: University of Sydney
Priority date: 2021-12-01
Filing date: 2022-12-01
Publication date: 2023-06-08
Anticipated expiration: 2024-06-01

Abstract

Systems and methods for received signal strength (RSS) prediction using a distributed federated learning framework are disclosed herein. In an example, a system for predicting RSS includes a plurality of user devices. Each of the user devices is configured to generate private information including at least one of user device movement trajectories, user device velocities, or RSS measurements at a plurality of locations over time. The system also includes a server configured to store, retrieve, or access public information that is related to a serving base station for each user device of the plurality of user devices. The server uses a distributed machine learning process in conjunction with the public and private information to accurately predict the RSS of the plurality of the user devices at locations that each of the user devices has not yet reached without revealing the private information of each user device.

Description

TITLE

SYSTEMS AND METHODS FOR RECEIVED SIGNAL STRENGTH PREDICTION USING A DISTRIBUTED FEDERATED LEARNING FRAMEWORK

BACKGROUND

[0001] In the field of wireless communication networks, received signal strength (RSS) prediction accuracy may impact coverage optimization and interference management processes for network planning, as well as proactive resource allocation and anticipated network management processes. Traditional methods for RSS prediction include methods based on ray tracing and stochastic radio propagation models.

[0002] Ray tracing methods are used for the prediction of radio propagation environment geographically. Although ray tracing can provide an accurate prediction of RSS, it requires detailed 3D geometric information and dielectric properties of the reflective surfaces of the obstacles in the radio environment, which may not be available in practice for some applications. Moreover, ray tracing methods may lead to high computational complexity when applied over large areas.

[0003] Stochastic radio propagation models such as the Urban Macro (UMa) path loss model are associated with relatively less computational complexity. However, these models typically only consider some rough classification of radio environment types such as urban or rural areas, and thus mainly depend on the relative parameters between the transmitter and the receiver. As such, stochastic radio propagation models usually fail to provide accurate RSS predictions for different locations in a specific radio environment.

[0004] Recently, deep learning has been leveraged for RSS prediction by using satellite maps, where the image of the satellite map around a user is added as the input feature of a convolutional neural network (CNN). Such deep learning models may provide better accuracy than the stochastic radio propagation model and may have less computational complexity than raytracing. This is because a satellite map can include rich information in terms of the terrain, reflection and blockage of a specific radio environment, which can be captured by CNN. However, the data sets used for such CNN models is constructed offline by field survey, which requires the manual measurement of a large number of locations in order to build a large enough data set for the deep neural network. Moreover, the offline data set cannot adapt to environmental changes such as seasonal or landscape changes, which may not be accurate due to these changes.

[0005] Any discussion of the prior art throughout the specification should in no way be considered as an admission that such prior art is widely known or forms part of common general knowledge in the field.

SUMMARY

[0006] Systems and methods for received signal strength prediction using a distributed federated learning framework are disclosed herein. In an example, a system for implementing a distributed federated learning (FL) framework for received signal strength (RSS) prediction includes a plurality of user devices. Each of the user devices is configured to generate private information including at least one of user device movement trajectories, user device velocities, or RSS measurements at a plurality of locations over time. The system also includes a server configured to wirelessly communicate with the plurality of user devices over a wireless communication network. The server is configured to at least one of store, retrieve, or access public information that is related to a serving base station (BS) for each user device of the plurality of user devices, and accurately predict using a distributed machine learning process the RSS of the plurality of the user devices at locations that each of the user devices has not yet reached without revealing the private information of each user device.

[0007] In some embodiments, which may be combined with other embodiments disclosed herein, the public information includes at least one of a base station location LBS, a base station height HBS, a base station transmitter power PT X, a base station identifier BID, or public satellite map data M .

[0008] In some embodiments, which may be combined with other embodiments disclosed herein, the distributed machine learning process includes a distributed federated learning (FL) framework that utilizes a deep vision attention mechanism for RSS predictions.

[0009] In some embodiments, which may be combined with other embodiments disclosed herein, the distributed FL framework includes a preparing stage, a training stage, and a prediction stage. [0010] In some embodiments, which may be combined with other embodiments disclosed herein, for the preparing stage the server is configured to select a set of the plurality of user devices to participate in the RSS prediction process, and broadcast public information and information about an example attention neural network architecture for the RSS prediction process to all the selected user devices of the set. The broadcasting of the public information and information about an example attention neural network architecture for the RSS prediction process causes the selected used devices to record private information including RSS measurements and velocity measurements at their respective current locations and subsequent locations over time, use the private information and the received public information to determine distances from the user device of the selected set of user devices to base stations serving the user device at the times the private information was generated, select or construct satellite images corresponding to locations of the user device at times using at least one of the private information or the public information, and convert the private information and public information into input features of a neural network for RSS prediction that includes a data augmentation block that uses the selected or constructed satellite images, a path loss estimation model block, two fully connected neural network blocks including the determined distances, and a deep vision transformer (DeepVIT) block.

[0011] In some embodiments, which may be combined with other embodiments disclosed herein, the RSS measurements are obtained from reference signal received power (RSRP) measurements obtained by measuring cell specific reference signal in a long-term-evolution (LTE) network.

[0012] In some embodiments, which may be combined with other embodiments disclosed herein, the RSS measurements are obtained by measuring synchronization signal (SS) and a channel state information reference signal (CSI-RS) in a 5G wireless network.

[0013] In some embodiments, which may be combined with other embodiments disclosed herein, the velocity is measured using inertial measurement units (IMU) sensors of the user devices.

[0014] In some embodiments, which may be combined with other embodiments disclosed herein, after generating the input features described above, each user device of the selected set performed the training stage by training the neural network for RSS prediction using the input features, and uploading computed neural network weights of the trained neural network for RSS prediction.

[0015] In some embodiments, which may be combined with other embodiments disclosed herein, the server is configured to calculate average weights for the trained neural network for RSS prediction based on the received weights from the user devices of the selected set, and broadcast the average weights to the plurality of user devices.

[0016] In some embodiments, which may be combined with other embodiments disclosed herein, for the prediction stage, the server is configured to use the average weights for the trained neural network to predict RSS at a location of interest or a given velocity.

[0017] In some embodiments, which may be combined with other embodiments disclosed herein, the predicted RSS at the location of interest or the given velocity is used to optimize coverage, interference, and/or cell association in network planning.

[0018] In some embodiments, which may be combined with other embodiments disclosed herein, the predicted RSS at the location of interest or the given velocity, is used to proactively allocate network resources and network management processes.

[0019] In some embodiments, which may be combined with other embodiments disclosed herein, a server for implementing a distributed federated learning (FL) framework for received signal strength (RSS) prediction is configured to wirelessly communicate with a plurality of user devices over a wireless communication network. Each of the plurality of user devices is configured to generate private information including at least one of user device movement trajectories, user device velocities, or RSS measurements at a plurality of locations over time. The server is also configured to at least one of store, retrieve, or access public information that is related to a serving base station (BS) for each user device of the plurality of user devices, select a set of the plurality of user devices to participate in the RSS prediction process, broadcast public information and information about an example attention neural network architecture for the RSS prediction process to all the selected user devices of the set, receive from each user device of the set of user devices computed neural network weights of a trained neural network for RSS prediction, and use the computed neural network weights of the trained neural network to predict RSS at a location of interest or at a given velocity. [0020] In some embodiments, which may be combined with other embodiments disclosed herein, broadcasting the public information and the information about the example attention neural network architecture for the RSS prediction process to all the selected user devices of the set causes each of the user devices of the set to record private information including RSS measurements and velocity measurements at their respective current locations and subsequent locations over time, use the private information and the received public information to determine distances from the user device of the selected set of user devices to base stations serving the user device at the times the private information was generated, select or construct satellite images corresponding to locations of the user device at times using at least one of the private information or the public information, convert the private information and public information into input features of a neural network for RSS prediction that includes a data augmentation block that uses the selected or constructed satellite images, a path loss estimation model block, two fully connected neural network blocks including the determined distances, and a deep vision transformer (DeepVIT) block, train the neural network for RSS prediction using the input features, and upload computed neural network weights of the trained neural network for RSS prediction to the server.

[0021] In some embodiments, which may be combined with other embodiments disclosed herein, the public information includes at least one of a base station location LBS, a base station height HBS, a base station transmitter power PT X, a base station identifier BID, or public satellite map data M .

[0022] In some embodiments, which may be combined with other embodiments disclosed herein, the predicted RSS at the location of interest or at the given velocity is used to optimize coverage, interference, and/or cell association in network planning.

[0023] In some embodiments, which may be combined with other embodiments disclosed herein, the predicted RSS at the location of interest or at the given velocity, is used to proactively allocate network resources and network management processes.

[0024] It is an object of the present invention to overcome or ameliorate at least one of the disadvantages of the prior art, or to provide a useful alternative.

[0025] Additional features and advantages are described in, and will be apparent from, the following Detailed Description and the Figures. The features and advantages described herein are not all-inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the figures and description. Also, any particular embodiment does not have to have all of the advantages listed herein and it is expressly contemplated to claim individual advantageous embodiments separately. Moreover, it should be noted that the language used in the specification has been selected principally for readability and instructional purposes, and not to limit the scope of the inventive subject matter.

BRIEF DESCRIPTION OF THE FIGURES

[0026] Fig. 1 is a simplified block diagram of an example computing system used to implement an example distributed federated learning (FL) framework for received signal strength (RSS) prediction, according to an example embodiment of the present disclosure.

[0027] Fig. 2 illustrates an example image processing operation for generating an image of an environment of a user using satellite map data, according to an example embodiment of the present disclosure.

[0028] Fig. 3 illustrates an example user environment that includes reflectors affecting a user device in the environment, according to an example embodiment of the present disclosure.

[0029] Fig. 4 is a simplified block diagram of an example deep vision transformer (DeepVIT) model used for processing satellite image data, according to an example embodiment of the present disclosure.

[0030] Fig. 5 is a simplified flow chart of a process for transformer encoding with a reattention mechanism, according to an example embodiment of the present disclosure.

[0031] Fig. 6 is a diagram that depicts route information collected from a plurality of user devices and superimposed on a satellite image, according to an example embodiment of the present disclosure.

[0032] Fig. 7 is a graph illustration of example test errors associated with RSS prediction results computed using an example machine learning process in a centralized configuration, according to an example embodiment of the present disclosure.

[0033] Fig. 8 is a graph illustration of example test errors associated with RSS prediction results computed using an example machine learning process in a distributed configuration, according to an example embodiment of the present disclosure. [0034] Fig. 9 is a graph illustration of example average computing costs per epoch in centralized and distributed configurations, according to an example embodiment of the present disclosure.

DETAILED DESCRIPTION

[0035] Methods, systems, and apparatus are disclosed herein for an RSS prediction framework that leverages accessible satellite map data to capture features of a radio environment. In an example, a distributed federated learning (FL) architecture is disclosed that utilizes user device generated real-time (raw) data while preserving user privacy. Examples herein also include RSS prediction processes and systems that further improve prediction accuracy by utilizing deep vision transformer (DeepVIT) image processing techniques to process satellite image data to identify and/or pay attention to specific portions of an image associated with, for example, reflective surfaces that may impact RSS. The example methods and systems of the present disclosure provide improved prediction accuracy that outperforms traditional RSS prediction methods, such as ray tracing, Urban Macro (UMa) model, and convolutional neural network (CNN) model-based methods. Furthermore, the example methods and systems of the present disclosure provide improved computational performance and may require up to five times less computational time costs as compared with traditional CNN based methods.

[0036] Additional examples, features, and advantages are possible as well and will be described in greater detail below in connection with example embodiments herein.

[0037] Referring now to the Figures, Fig. 1 is a simplified block diagram of an example computing system 100 used to implement an example distributed federated learning (FL) framework for received signal strength (RSS) prediction, according to an example embodiment of the present disclosure. The system 100 includes a server 110 configured to wirelessly communicate with a plurality of user devices 120 (e.g., N users) over a wireless communication network.

[0038] The server 110 may store, retrieve, and/or access public information (i.e., information that is not deemed as private user information, etc.) about a serving base station (BS) for each user device n of the plurality of user devices 120. The public information may include base station location LBS, base station height HBS, base station transmitter power PTX, BS identifier BID, and/or public satellite map data M (e.g., Google maps data, etc.). Each user device 120 generates private information, which may be collected in real-time by the user device, including user device movement trajectories L_u.t.n, t = 1, 2, ..., T , velocities Vt,n , and/or RSS measurements by the user device n at a plurality of locations R_t,_n, where t is the index of time slot.

[0039] For example, the system 100, as well as other systems and methods of the present disclosure, enable the server 110 and a user device 120 to accurately predict (e.g., using a distributed machine learning process) the RSS of the user device 120 at locations that the user device 120 has not yet reached (and/or has never reached), without revealing the private information of the user device 120 to the server 110.

[0040] To facilitate this, in some embodiments, the system 100 employs a distributed FL framework that utilizes a deep vision attention mechanism for RSS predictions. In this embodiment, the framework/processs, as described below, includes three main stages: a preparing stage, a training stage, and a prediction stage.

[0041] In the preparing stage, the server 110 may select A/ user devices 120 to participate in the RSS prediction process. Next, the server 110 may broadcast public information (e.g., LBS, HBS, PTX, BID, M, etc.) and information about an example attention neural network architecture for the RSS prediction process to all the selected user devices 120.

[0042] The selected user devices 120 may start to record RSS measurements and velocity measurements at their respective current locations. The RSS measurements can be obtained from reference signal received power (RSRP) measurements, for example, obtained by measuring cellspecific reference signal in a long-term-evolution (LTE) network, or by the measuring synchronization signal (SS) and the channel state information reference signal (CSI-RS) in a 5G wireless network. The velocity can be measured using inertial measurement units (IMU) sensors, which are generally included in many types of user devices such as mobile phones and the like. For locations, although 6G networks are expected to support high accuracy location information from the mobile network to the user devices, some existing mobile network architectures do not provide accurate user location information. Such user devices 120 may be configured to use information from a global navigation satellite systems (GNSSs), such as the Global Positioning System (GPS), which is private information that is typically collected using sensors in the user device 120 (e.g., GPS sensors, etc.). [0043] During the training stage, each of the selected user devices 120 may be configured to collect its respective RSS and velocity measurements at multiple locations, which may be collected in real-time (i.e., while training the machine learning model, etc.). For example, at a specific time to, a user device n may be configured to accumulate a data set including its movement trajectories L_u.t.n, velocities vt,_n, and RSS at each location R_t,_n and time slot t = 1, 2, ..., to. During prepossessing, raw data such as the public information and private information collected by the user devices 120 (and/or broadcast by the server 110) are converted into input features of an example neural network for RSS prediction, which includes: a data augmentation block, a path loss estimation model block, two fully connected neural network blocks and a deep vision transformer (DeepVIT) block. Among these example input features, as illustrated in Fig. 1, {PTX, dt,_n }are example input features processed by the path loss model block, { vt,_n, BID, di._n\^‘ & example input features processed by the first neural network (NN) block, and {m_t,_n} is an example input feature processed by the data augmentation block.

[0044] In the illustrated example, dt,_n denotes a distance between the nth user and the BS serving the nth user at the time slot t, which could be calculated using the LBS, HBS and L_u,t,n- information described above.

[0045] Further, a satellite image around the user device 120 may be identified and utilized to capture characteristics of the radio environment of the user device 120. In particular, m_t,_n denotes a corresponding satellite image of an environment of or near the user device 120, which could be constructed using M (public information) and L_u,t,_n (private information). In an example, each image m_t,_n has a rectangular shape with side lengths of about 185 meters (m) x 185m and a pixel size of about 256 x 256 x 3 pixels centered at a location of the user device 120 such that a surrounding environment of the user is sufficiently clear for training the example attention neural network. Further, in an example, the user device 120 is configured to rotate the image to ensure that a bottom side of the rotated image points toward the BS serving the user device 120 at the time slot t associated with the image.

[0046] By way of example, Fig. 2 illustrates an example satellite image 200 of an environment around a user device (labelled as “User 1” in Fig. 2). In this example, an extracted portion 210 of the image 200 has specific dimensions (e.g., 185m x 185m) and pixel size (e.g., 256 x 256 x 3 pixels) and in which the user device 120 is approximately at the center of the extracted image 210. Further, the extracted image portion 210 is rotated so that a bottom side of the image 210 faces (e.g., aligned perpendicular to) a direction at which the base station 220 is located.

[0047] Returning now to Fig. 1, after generating the input features described above, each user device 120 may be configured to train a local attention neural network 130 using its local data set prepared as described above. Then, each user device 120 is configured to upload its computed neural network weights of its local model 130 to the server 110. Next, the server is configured to calculate average weights for the neural network model based on the received weights from all the user devices 120, and to broadcast the average weights to all the user devices 120. This process may then be repeated until the local neural network 130 of each user device 120 converges.

[0048] Compared with existing centralized schemes, the proposed framework of the example system 100 advantageously allows avoiding and/or reducing the time and labor consuming process of manually measuring an initial data set in a field survey repeatedly to update a neural network machine learning model. For example, the proposed framework of the example system 100 can leverage user device generated and/or compiled data sets in real-time and thus quickly adapt to changes in the radio environment of the user devices better than existing offline schemes.

[0049] Finally, in the prediction stage, a neural network (e.g., a recent or latest instance of the deep attention neural network 130) associated with the latest weights may be used for predicting the RSS at a location of interest and/or at a given velocity. In some examples, the accurate RSS prediction computed using the example system 100 can also be used in a variety of network management processes, such as to optimize the coverage, interference and cell association in network planning, and/or proactive resource allocation and network management processes. To that end, in various examples, the functions of the prediction stage can be performed at a given user device 120, at the server 110, a base station, and/or at any other computing system authorized to run an instance of the deep attention neural network 130 using the latest, most recent, or any other specific set of weights computed by the server 110 and the plurality of user devices 120 in accordance with the example distributed federated learning architecture described above so as to accurately perform RSS predictions.

[0050] The various blocks of the example deep attention neural network 130 will now be described in greater detail. As shown in Fig. 1, the deep attention neural network 130 includes a data augmentation block, a path loss estimation model block, two fully connected neural network blocks (labeled “1st NN” and “2nd NN”), and a DeepVIT block.

[0051] The data augmentation block represents a data augmentation process that is applied to each input image (m_f,_n ) extracted from satellite map data before providing it as an input to the DeepVIT block. During the data augmentation process, a method of the example system 100 may include rotating each image by a random angle less than 20° and/or randomly shearing the input image. Introducing random rotation and/or shearing may improve the robustness of the deep attention neural network model 130 by accounting for imperfectly aligned images.

[0052] The path loss model block represents a path loss model that is trained using the input parameters PTX and dt,_n. Thus, the system 100 may integrate domain knowledge (i.e., a path loss model) with a deep neural network (e.g., the two complete neural network layers) to further improve the prediction accuracy of the deep attention neural network model 130. In an example, the path loss model is implemented according to the equation [ 1 ] below.

PL = PLo + ylogw(d_t,n) + X_n. [ 1 ]

[0053] In equation [1], PLo and y may be calculated based on previous measurements, such as dt,_n and PTX- X_n is a normal random variable with zero means. Different path loss models could be implemented for different scenarios or environments, such as rural, suburban, or urban. In an example, the path loss model implemented by the path loss model block of the deep attention neural network 130 uses use the Urban Macro (UMa) model to measure the path loss.

[0054] The first and second neural network (NN) blocks (respectively labeled “ 1 st NN” and “2nd NN” in Fig 1) represent two fully connected neural network layers. The first neural network layer receives the input parameters {vt.n, BID, dt,_n}as inputs. The second neural network layer receives, as inputs, the outputs of the first neural network layer and the outputs of the DeepVIT block. In an example, the first and second neural network relays use a ReLu activation function.

[0055] The DeepVIT block represents a deep vision transformer machine learning model that is adapted and trained to use the output images from the data augmentation block as input. In general, DeepVIT machine learning models use the mechanisms of attention to differentially weight the significance of each part of an input image. For the purposes of the deep attention neural network 130, the DeepVIT block implements an adapted DeepVIT model to improve the performance and efficiency of RSS prediction. In particular, the DeepVIT model of the present disclosure is configured to pay attention to important parts of an image that may be relevant for RSS prediction, such as walls and other reflective surfaces in the environment of a wireless user device. As an analog example involving human eyes watching an image, a human observer may focus on important parts of the image and blur other less relevant parts in his mind. Similarly, a transformer model such as DeepVIT can learn to focus on important parts of an image during the training stage and ignore or deem as less significant other portions of the image that are less relevant or less important for RSS prediction. More generally, DeepVIT is a deep learning version of transformer learning models that are used in the field of computer vision.

[0056] In the proposed RSS prediction method of the system 100, as noted above, satellite images around each user device are used (after being augmented, rotated, sheered, etc.) as an input feature for the DeepVIT block. The individual pixels of an input image are not equally important with respect to the application of RSS prediction. For example, a particular RSS may be associated with multiple paths in an environment due to reflections and/or blockage in the environment. Such reflection surfaces may be advantageously identified and/or assigned relatively more significant weights by efficiently using the adapted DeepVIT learning model of the system 100.

[0057] Fig. 3 illustrates an example environment 300 in which a user device receives a signal from a base station 320 from multiple paths, including paths reflected off reflective surfaces (e.g., building walls) in the environment 300.

[0058] Fig. 4 is a simplified block diagram of an example DeepVIT process 400 for processing input images in an example deep attention neural network machine learning system for RSS prediction, according to an example embodiment of the present disclosure. For example, the DeepVIT process 400 can be used to implement the DeepVIT block of Fig. 1.

[0059] At block 410, the DeepVIT process 400 includes receiving an input image (e.g., from the data augmentation block of the system 100) and splitting the input image into miniimages. The input image may be represented by an H x W x C matrix, where H represents each pixel’s height position, I represents its width position, and C represents its color values. In an example, the input image is split into z x z mini-images. Thus, the output of the splitting process at block 410 may be represented by a Z x P² x C matrix, where Z = z² and P = H/z. [0060] At block 420 (“Flatten layer”), the DeepVIT process 400 involves compressing each mini-image into a patch represented by a 1 x D matrix using a linear neural network. The process for compressing each mini-image is formulated by the equation [2] :

I => [IiE, I₂E... IzE [2]

[0061] In equation [

[0062] At block 430 (“Position Embedding”), the method 400 includes prepending an extra learnable embedding at a beginning of the output patch from the flatten layer 420. Additionally, position embedding may be added to the patch, with a (Z + 1) x D dimension into each patch. For example, the input of the transformer encoder (block 440) may be represented by the following equation [3]:

Tinput ⁼G> [fcZass, E, hE... IzE] + Epos [3]

[0063] In equation

[0064] FIG. 5 is a simplified block diagram of example processes performed by the example transformer encoder 440 of Fig. 4, according to an example embodiment of the present disclosure.

[0065] As shown, the transformer encoder 440 receives and processes the embedded patches 430 using a LayerNorm block (LN) 442. In general, layer normalization (LayerNorm) may include any technique for normalizing distributions of intermediate layers in a multi-layer neural network learning model. The transformer encoder 440 may then multiply each patch output from the LN 442 by three different matrices to generate different vectors q, k, v, which denote, respectively, a query vector, a key vector, and an information vector. In an example, each of the three vectors can be used to learn a different aspect or knowledge. For instance, the transformer encoder 440 may be configured to multiply the q, k, v vectors by different matrices to generate six vectors: q , q , ki, ki, vi, V2 respectively. This process may also be referred to as two-head attention.

[0066] Next, at the re-attention block 444, the transformer encoder 440 is configured to process the q, k, v vectors of each patch according to the following equation [4] : l?e atention

[0067] In equation [4], Q, K, V are matrices composed of the q, k, v vectors of each patch. 0 is the learnable matrix (trained by the transformer encoder 440) to ensure each head could learn different features, and DM represents the dimension of the query or key matrices.

[0068] The output of the re-attention block 444 may optionally be processed using a second normalization layer 446 (similar to LN 442), and then input into an MLP block 448 for training the MLP 448. In an example, the MLP block 448 may be implemented as a multilayer perceptron (e.g., a feedforward artificial neural network, etc.). In an example, the MLP 448 includes two linear neural networks, configured with GeLu activation and a dropout layer.

[0069] In an example, the output of the MLP 448 (Tout) is reloaded as new input Tinput to the transformer encoder 440, and the process of blocks 442-448 may then be repeated I times to generate the final output Tout of the transformer encoder 440.

[0070] Returning now to Fig. 4, the DeepVIT process 400 next includes processing the final output (Tout) of the transformer encoder 440 using an MLP head block (MH) 450. In an example, the MH 450 is a neural network that includes contains one normalization layer and two neural network layers with a ReLu activation configuration. Finally, the DeepVIT process 400 provides the output (y) of the MH 450 block as the output of the DeepVIT process 400. The process described above is represented by the following mathematical equations [5], [6], and [7]:

T’input ⁼ RA(LN (Tinput)) + Tinput,^ [5]

Tout = MLP (LN (T’input)) + T input [6] y = MH(Tout). [7]

[0071] Equation [5] represents the functions performed by the re-attention block 444. Equation [6] the functions performed by the MLP block 448. Equation [7] represents the functions performed by the MH block 450 (after equations [5] and [6] are repeated I times).

[0072] To verify the performance of the example distributed RSS prediction method of Figs. 1-5, simulations were conducted using a real RSS data set. The data set includes around 60,000 samples, of which 52,500 samples were used for training the example deep attention neural network 130 of Fig. 1 and around 7,000 samples were used for testing. For the purposes of the simulations, it is assumed that the data set was collected using N = 105 user devices 120. For each selected user device, its movement is in a local region, which is composed of 500 samples.

14

SUBSTITUTE SHEET (RULE 26) [0073] Fig. 6 illustrates example movement routes of the user devices associated with the data set.

[0074] In a first example simulation for a fully connected neural network, the input and output layer sizes used for the first NN block (labeled as “1st NN” in Fig. 1) are 9, 200 and 200, 200, respectively. For the second NN block (labeled as “2nd NN” in Fig. 1), the input and output layer sizes are 200, 16 and 16, 1, respectively. During the simulation, a global epoch value used is N_g = 25, and the local epoch value used for each edge device is Ni = 3. For each epoch, the fraction of users value used is Cy = 0.1, which means we randomly chose L x NJ = 10 users to train the model. The learning rate used is Ir = 5 x 10^-4 and the loss function is a mean square error (MSE) function. For training the DeepVIT model, the parameters used are listed below in Table I.

Table I. Simulation Parameters

[0075] Fig. 7 is a graph illustration 700 of test error results (e.g., Normalized MSE) computed in accordance with the simulation parameters described above for the first example simulation of the DeepVIT learning model of the present disclosure executed in a centralized computing configuration. The test errors indicated in Fig. 7 are based on a centralized configuration (i.e., without using a distributed computer architecture such as the one shown for the system 100 in Fig. 1). Test errors were also computed for a traditional CNN based learning model for the sake of comparison in the graph 700 of Fig. 7. [0076] Fig. 8 is a graph illustration 800 of test error results (e.g., Normalized MSE) of a second example simulation performed using similar parameters as the simulation of Fig. 7 but in a distributed computing configuration (e.g., similar to the configuration described for the system 100 of Fig. 1). Similarly to the graph 700 of Fig. 7, the graph 800 of Fig. 8 also includes test error results computed using a CNN based learning model for the sake of comparison.

[0077] As noted above, the normalized MSE results of the proposed attention neural network architecture 100 for RSS prediction are compared with MSE results of a CNN based architecture executed in the centralized configuration in Fig. 7 and in a distributed configuration in Fig. 8.

[0078] As shown in Fig. 7, the normalized test error of the proposed model (0.072) is lower than that of CNN based method (0.086) in the centralized scenario. As shown in Fig. 8, in the distributed scenario, the normalized test error of the proposed DeepVIT based learning model is around 0.1 with a corresponding root mean square error (RMSE) of 5.746 dB, whereas the test error of the CNN based model is around 0.15 with a corresponding RMSE of 7.037dB. These results indicate that the proposed DeepVIT based learning model of the present disclosure could perform better in centralized and distributed computing configurations. The reason for this is that the DeepVIT based learning model of the present disclosure may advantageously pay attention to areas in an image that are more important for the purposes of RSS prediction (e.g., reflection surfaces) than other areas of the image that are relatively less important for the purposes of RSS prediction.

[0079] Table II below shows a comparison of RMSE errors associated with the proposed method of the present disclosure and various other baseline methods, including ray tracing, stochastic propagation model UMa 38.901, and CNN based method. Moreover, we have evaluated multiple types of transformers including VET, DeepVIT and pooling based vision transformer (PIT) as shown in Table II.

Table II. Test Errors and RMSE of Different Approaches

[0080] As shown in Table II, the proposed DeepVIT learning model of the present disclosure outperforms the other baseline methods in both centralized and distributed scenarios where much lower RMSE errors are obesrved.

[0081] Further, we have evaluated multiple types of geographical landscapes (including central business areas, metropolitan suburbs, semi-rural areas, and rural areas covering multiple signal propagation scenarios) using data sets from those landscaps for signals at a frequency of 3544MHz (used in 5G networks) as shown in Table III.

Table III. Test Errors and RMSE of Different Geographical Landscapes

[0082] In Table III, the existing UMa model uses models defined in standard ITU-R M. 2412 [2] and 3GPP TR 38.901 [3]. Table III shows a comparison for different geographical landscapes of RMSE errors associated with the proposed method of the present disclosure using the proposed DeepVIT learning model and the existing UMa model. As can be clearly seen, the proposed DeepVIT learning model of the present disclosure outperforms the existing UMa model in all signal propagation scenarios where much lower RMSE errors are obesrved.

[0083] In a distributed framework configuration, computation times may be important as a user device typically has limited computational capacity. To that end, Fig. 9 is a graph illustration of a comparison of the average computation times for computing the same batch size using the DeepVIT based learning model and CNN based learning model simulations (per epoch). For the sake of reference, the hardware configuration used for the simulations of Fig. 9 is a laptop computer operating under a Windows 10 Home operating system, and having an Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz, with an NVIDIA GeForce RTX 2070 Super, 16 GB of Memory and a IT Solid State Drive (SSD). As shown in Fig. 9 for the centralized approach, time costs associated with the DeepVIT based learning model of the present disclosure is 1.23 minutes to train the model per epoch, whereas the time cost for training a CNN based model is higher than 6 minutes. In the distributed scenario, the time consumed for the DeepVIT based model of the present disclosure is also much lower than the CNN based method at the server and user device, which is only approximately one fifth of the time consumed when using the CNN based learning model. One reason for the significant improvement in the performance of the DeepVIT based learning model of the present disclosure as compared with the CNN based learning model is that the DeepVIT based learning model can process data in parallel.

[0084] In accordance with the present disclosure, examples herein include a novel high accurate distributed federated learning (FL) framework for satellite map based RSS prediction, which leverages real-time user-specific data set including a location of a user device, satellite map images of the environment around the user device, and corresponding RSS measurements collected by the user device, while also advantageously preserving user privacy such as user movement trajectory information, and images of a user’s surrounding environment. Compared with previous and existing methods, the present method may advantageously enable avoiding time and laborconsuming field survey processes associated with other RSS prediction methods, and may advantageously adapt to environmental changes by repetitively performing the example online training processes of the present disclosure using the latest and/or relatively recent generated user data. To further improve the prediction accuracy in the proposed framework, some examples herein involve the use of a vision attention model (e.g., a Vision Transformer learning model). Advantageously, example Vision Transformer based implementations of the present disclosures may be configured to learn to “pay attention to” important parts of an image during the example training process of the present disclosure, including key factors for the purposes of RSS prediction such as the presence of reflection surfaces and/or blockages in the surrounding environment of a user device. As such, example attention-based machine learning methods of the present disclosure can achieve higher accuracy than CNN based methods. Furthermore, example Vision Transformer based models of the present disclosure may be associated with a lower computational complexity than CNN based RSS prediction models with respect to both user device computations and server computations.

[0085] The terms “a,” “an,” “the” and similar referents used in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.

[0086] It should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, Figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.

[0087] Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.

[0088] Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as "processing," "computing," "calculating," "determining", analysing" or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulate and/or transform data represented as physical, such as electronic, quantities into other data similarly represented as physical quantities.

[0089] In a similar manner, the term "processor" may refer to any device or portion of a device that processes electronic data, for example, from registers and/or memory to transform that electronic data into other electronic data that, for example, may be stored in registers and/or memory. A "computer" or a "computing machine" or a "computing platform" may include one or more processors.

[0090] Some methodologies or portions of methodologies described herein are, in one embodiment, performable by one or more processors that accept computer-readable (also called machine-readable) code containing a set of instructions that when executed by one or more of the processors carry out at least one of the methods described herein. A memory subsystem of a processing system includes a computer-readable carrier medium that carries computer-readable code (for example, software) including a set of instructions to cause performing, when executed by one or more processors, one of more of the methods described herein. Note that when the method includes several elements, for example, several steps, no ordering of such elements is implied, unless specifically stated. The software may reside in the storage medium, or may also reside, completely or at least partially, within the RAM and/or within the processor during execution thereof by the computer system. Thus, the memory and the processor also constitute computer-readable carrier medium carrying computer-readable code.

[0091] The terms "comprise", "comprises", "comprised" or "comprising", "including" or "having" and the like in the present specification and claims are used in an inclusive sense, that is to specify the presence of the stated features but not preclude the presence of additional or further features.

[0092] It should be understood that various changes and modifications to the presently preferred embodiments described herein will be apparent to those skilled in the art. Such changes and modifications can be made without departing from the spirit and scope of the present subject matter and without diminishing its intended advantages. It is therefore intended that such changes and modifications be covered by the appended claims.

Claims

CLAIMS The invention is claimed as follows:

1. A system for implementing a distributed federated learning (FL) framework for received signal strength (RSS) prediction, the system comprising: a plurality of user devices, each of the user devices configured to generate private information including at least one of user device movement trajectories, user device velocities, or RSS measurements at a plurality of locations over time; and a server configured to wirelessly communicate with the plurality of user devices over a wireless communication network, the server configured to: at least one of store, retrieve, or access public information that is related to a serving base station (BS) for each user device of the plurality of user devices, and accurately predict using a distributed machine learning process the RSS of the plurality of the user devices at locations that each of the user devices has not yet reached without revealing the private information of each user device.

2. The system of Claim 1 , wherein the public information includes at least one of a base station location LBS, a base station height HBS, a base station transmitter power PT X, a base station identifier BID, or public satellite map data M.

3. The system of Claim 1, wherein the distributed machine learning process includes a distributed federated learning (FL) framework that utilizes a deep vision attention mechanism for RSS predictions.

4. The system of Claim 3, wherein the distributed FL framework includes a preparing stage, a training stage, and a prediction stage.

5. The system of Claim 4, wherein for the preparing stage the server is configured to: select a set of the plurality of user devices to participate in the RSS prediction process; and broadcast public information and information about an example attention neural network architecture for the RSS prediction process to all the selected user devices of the set, causing the selected used devices to: record private information including RSS measurements and velocity measurements at their respective current locations and subsequent locations over time, use the private information and the received public information to determine distances from the user device of the selected set of user devices to base stations serving the user device at the times the private information was generated, select or construct satellite images corresponding to locations of the user device at times using at least one of the private information or the public information, and convert the private information and public information into input features of a neural network for RSS prediction that includes a data augmentation block that uses the selected or constructed satellite images, a path loss estimation model block, two fully connected neural network blocks including the determined distances, and a deep vision transformer (DeepVIT) block.

6. The system of Claim 5, wherein the RSS measurements are obtained from reference signal received power (RSRP) measurements obtained by measuring cell specific reference signal in a long-term-evolution (LTE) network.

7. The system of Claim 5, wherein the RSS measurements are obtained by measuring synchronization signal (SS) and a channel state information reference signal (CSI-RS) in a 5G wireless network.

8. The system of Claim 5, wherein the velocity is measured using inertial measurement units (IMU) sensors of the user devices.

9. The system of Claim 5, wherein after generating the input features described above, each user device of the selected set performed the training stage by: training the neural network for RSS prediction using the input features; and uploading computed neural network weights of the trained neural network for RSS prediction.

10. The system of Claim 9, wherein the server is configured to: calculate average weights for the trained neural network for RSS prediction based on the received weights from the user devices of the selected set; and broadcast the average weights to the plurality of user devices.

11. The system of Claim 10, wherein for the prediction stage, the server is configured to use the average weights for the trained neural network to predict RSS at a location of interest or a given velocity.

12. The system of Claim 11, wherein the predicted RSS at the location of interest or the given velocity is used to optimize coverage, interference, and/or cell association in network planning.

13. The system of Claim 11, wherein the predicted RSS at the location of interest or the given velocity, is used to proactively allocate network resources and network management processes.

14. A server for implementing a distributed federated learning (FL) framework for received signal strength (RSS) prediction, the server configured to: wirelessly communicate with a plurality of user devices over a wireless communication network, each of the plurality of user devices configured to generate private information including at least one of user device movement trajectories, user device velocities, or RSS measurements at a plurality of locations over time; at least one of store, retrieve, or access public information that is related to a serving base station (BS) for each user device of the plurality of user devices; select a set of the plurality of user devices to participate in the RSS prediction process; broadcast public information and information about an example attention neural network architecture for the RSS prediction process to all the selected user devices of the set; receive from each user device of the set of user devices computed neural network weights of a trained neural network for RSS prediction; and use the computed neural network weights of the trained neural network to predict RSS at a location of interest or at a given velocity.

15. The server of Claim 14, wherein broadcasting the public information and the information about the example attention neural network architecture for the RSS prediction process to all the selected user devices of the set causes each of the user devices of the set to: record private information including RSS measurements and velocity measurements at their respective current locations and subsequent locations over time; use the private information and the received public information to determine distances from the user device of the selected set of user devices to base stations serving the user device at the times the private information was generated; select or construct satellite images corresponding to locations of the user device at times using at least one of the private information or the public information; convert the private information and public information into input features of a neural network for RSS prediction that includes a data augmentation block that uses the selected or constructed satellite images, a path loss estimation model block, two fully connected neural network blocks including the determined distances, and a deep vision transformer (DeepVIT) block; train the neural network for RSS prediction using the input features; and upload computed neural network weights of the trained neural network for RSS prediction to the server.

16. The server of Claim 14, wherein the public information includes at least one of a base station location LBS, a base station height HBS, a base station transmitter power PT X, a base station identifier BID, or public satellite map data M.

17. The server of Claim 14, wherein the predicted RSS at the location of interest or at the given velocity is used to optimize coverage, interference, and/or cell association in network planning.

18. The server of Claim 14, wherein the predicted RSS at the location of interest or at the given velocity, is used to proactively allocate network resources and network management processes.