WO2025162612A1

WO2025162612A1 - Collaborative learning including logit exchange between nodes of wireless communications network

Info

Publication number: WO2025162612A1
Application number: PCT/EP2024/081800
Authority: WO
Inventors: Maryam SABZEVARI; Sankaran BALASUBRAMANIAM; Mohammed Saad ELBAMBY; Muhammad Ikram ASHRAF; Anna Pantelidou
Original assignee: Nokia Solutions and Networks Oy
Current assignee: Nokia Solutions and Networks Oy
Priority date: 2024-01-31
Filing date: 2024-11-11
Publication date: 2025-08-07
Anticipated expiration: 2026-07-31

Abstract

A method includes generating, by a first node, a first set of logits for a radio function by applying a set of inputs or radio measurements to a machine learning model of the first node; receiving, by the first node from a second node in a wireless network, a second set of logits for the radio function; aggregating respective logits of the first set and the second set of logits to obtain a set of aggregated logits; determining a set of classes based on the aggregated logits, wherein a class of the set of classes is provided for each of the aggregated logits, and wherein a probability is assigned to each class; and determining a predicted label that is a class, of the set of classes, having a highest assigned probability.

Description

COLLABORATIVE LEARNING INCLUDING LOGIT EXCHANGE BETWEEN NODES OF WIRELESS COMMUNICATIONS NETWORK

TECHNICAL FIELD

[0001] This description relates to wireless communications.

BACKGROUND

[0002] A communication system may be a facility that enables communication between two or more nodes or devices, such as fixed or mobile communication devices. Signals can be carried on wired or wireless carriers.

[0003] An example of a cellular communication system is an architecture that is being standardized by the 3rd Generation Partnership Project (3GPP). A recent development in this field is often referred to as the long-term evolution (LTE) of the Universal Mobile Telecommunications System (UMTS) radio-access technology. E-UTRA (evolved UMTS Terrestrial Radio Access) is the air interface of 3GPP's Long Term Evolution (LTE) upgrade path for mobile networks. In LTE, base stations or access points (APs), which are referred to as enhanced Node AP (eNBs), provide wireless access within a coverage area or cell. In LTE, mobile devices, or mobile stations are referred to as user equipments (UE). LTE has included a number of improvements or developments. Aspects of LTE are also continuing to improve.

[0004] 5G New Radio (NR) development is part of a continued mobile broadband evolution process to meet the requirements of 5G, similar to earlier evolution of 3G and 4G wireless networks. In addition, 5G is also targeted at the new emerging use cases in addition to mobile broadband. A goal of 5 G is to provide significant improvement in wireless performance, which may include new levels of data rate, latency, reliability, and security. 5G NR may also scale to efficiently connect the massive Internet of Things (loT) and may offer new types of mission-critical services. For example, ultra-reliable and low-latency communications (URLLC) devices may require high reliability and very low latency. 6G and other networks are also being developed.

SUMMARY

[0005] A method may include receiving, by a first node from a second node within a wireless communications network, information indicating a capability of the second node to share logits with other nodes; transmitting, by the first node to the second node, a data collection request to request a reporting of logits generated by the second node, wherein the data collection request comprises a function identifier identifying a radio function for which logits reporting is requested; receiving, by the first node from the second node, a data collection response accepting the data collection request; receiving, by the first node from the second node, a report including a first set of logits for the identified radio function; generating, by the first node using the machine learning model for the identified radio function based on a set of radio measurements, a second set of logits; and, training, by the first node, a machine learning model of the first node based at least in part on a difference between the first set of logits received from the second node and the second set of logits generated by the first node to minimize or at least decrease a mimicry loss term.

[0006] An apparatus may include at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: receive, by a first node from a second node within a wireless communications network, information indicating a capability of the second node to share logits with other nodes; transmit, by the first node to the second node, a data collection request to request a reporting of logits generated by the second node, wherein the data collection request comprises a function identifier identifying a radio function for which logits reporting is requested; receive, by the first node from the second node, a data collection response accepting the data collection request; receive, by the first node from the second node, a report including a first set of logits for the identified radio function; generate, by the first node using the machine learning model for the identified radio function based on a set of radio measurements, a second set of logits; and train, by the first node, a machine learning model of the first node based at least in part on a difference between the first set of logits received from the second node and the second set of logits generated by the first node to minimize or at least decrease a mimicry loss term. [0007] A method may include generating, by a first node, a first set of logits for a radio function by applying a set of inputs or radio measurements to a machine learning model of the first node; receiving, by the first node from a second node in a wireless network, a second set of logits for the radio function; aggregating respective logits of the first set and the second set of logits to obtain a set of aggregated logits; determining a set of classes based on the aggregated logits, wherein a class of the set of classes is provided for each of the aggregated logits, and wherein a probability is assigned to each class; and determining a predicted label that is a class, of the set of classes, having a highest assigned probability.

[0008] An apparatus may include at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: generate, by a first node, a first set of logits for a radio function by applying a set of inputs or radio measurements to a machine learning model of the first node; receive, by the first node from a second node in a wireless network, a second set of logits for the radio function; aggregate respective logits of the first set and the second set of logits to obtain a set of aggregated logits; determine a set of classes based on the aggregated logits, wherein a class of the set of classes is provided for each of the aggregated logits, and wherein a probability is assigned to each class; and determine a predicted label that is a class, of the set of classes, having a highest assigned probability.

[0009] A method may include receiving, by a second node from a first node within a wireless communications network, a data collection request to request a reporting of logits generated by the second node, wherein the data collection request comprises a function identifier identifying a radio function for which logits reporting is requested; transmitting, by the second node to the first node, a data collection response accepting the data collection request; and transmitting, by the second node to the first node, a report including logits for the identified function to enable the first node to train a machine learning model based on the reported logits.

[0010] An apparatus may include at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: receive, by a second node from a first node within a wireless communications network, a data collection request to request a reporting of logits generated by the second node, wherein the data collection request comprises a function identifier identifying a radio function for which logits reporting is requested; transmit, by the second node to the first node, a data collection response accepting the data collection request; transmit, by the second node to the first node, a report including logits for the identified function to enable the first node to train a machine learning model based on the reported logits.

[0011] Other example embodiments are provided or described for each of the example methods, including: means for performing any of the example methods; a non-transitory computer-readable storage medium comprising instructions stored thereon that, when executed by at least one processor, are configured to cause a computing system to perform any of the example methods; and an apparatus including at least one processor, and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform any of the example methods.

[0012] The details of one or more examples of embodiments are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] FIG. 1 is a block diagram of a wireless network.

[0014] FIG. 2 is a flow chart illustrating operation of a node.

[0015] FIG. 3 is a flow chart illustrating operation of a node.

[0016] FIG. 4 is a diagram illustrating operation of a node.

[0017] FIG. 5 is a diagram illustrating workflow for ML model training for an example radio function of UE positioning.

[0018] FIG. 6 is a diagram illustrating collaborative or mutual learning for two machine learning (ML) models within a wireless network.

[0019] FIG. 7 is a signaling chart that illustrates an example message exchange and operations that may be performed by two nodes to perform mutual learning or collaborative learning of ML models for the same radio function.

[0020] FIG. 8 is a diagram illustrating workflow for ML model inference for an example radio function of UE positioning, e.g., using the trained ML model to predict one or more classes for an indicated radio function.

[0021] FIG. 9 is a diagram illustrating collaborative or mutual inference for two machine learning (ML) models through sharing of logits within a wireless network.

[0022] FIG. 10 is a block diagram of a wireless station or node (e.g., UE, user device, AP, BS, eNB, gNB, RAN node, network node, relay node, TRP, or other node) 1300.

DETAILED DESCRIPTION

[0023] FIG. 1 is a block diagram of a wireless network 130. In the wireless network 130 of FIG. 1, user devices 131, 132, 133 and 135, which may also be referred to as mobile stations (MSs) or user equipment (UEs), may be connected (and in communication) with a base station (BS) 134, which may also be referred to as an access point (AP), an enhanced Node B (eNB), a gNB or a network node. The terms user device and user equipment (UE) may be used interchangeably. A BS may also include or may be referred to as a RAN (radio access network) node, and may include a portion of a BS or a portion of a RAN node, such as (e.g., such as a centralized unit (CU) and/or a distributed unit (DU) in the case of a split BS or split gNB). At least part of the functionalities of a BS (e.g., access point (AP), base station (BS) or (e)Node B (eNB), gNB, RAN node) may also be carried out by any node, server or host which may be operably coupled to a transceiver, such as a remote radio head. BS (or AP) 134 provides wireless coverage within a cell 136, including to user devices (or UEs) 131, 132, 133 and 135. Although only four user devices (or UEs) are shown as being connected or attached to BS 134, any number of user devices may be provided. BS 134 is also connected to a core network 150 via a S I interface 151. This is merely one simple example of a wireless network, and others may be used.

[0024] A base station (e.g., such as BS 134) is an example of a radio access network (RAN) node within a wireless network. A BS (or a RAN node) may be or may include (or may alternatively be referred to as), e.g., an access point (AP), a gNB, an eNB, or portion thereof (such as a central/centralized unit (CU) and/or a distributed unit (DU) in the case of a split BS or split gNB), or other network node.

[0025] Some functionalities of the communication network may be carried out, at least partly, in a central/centralized unit, CU, (e.g., server, host or node) operationally coupled to distributed unit, DU, (e.g., a radio head/node). Thus, 5G networks architecture may be based on a so-called CU-DU split. The gNB-CU (central node) may control a plurality of spatially separated gNB-DUs, acting at least as transmit/receive (Tx/Rx) nodes. In some examples, the gNB-DUs (also referred to as a DU) may comprise e.g., a radio link control (RLC), medium access control (MAC) layer and a physical (PHY) layer, whereas the gNB- CU (also referred to as a CU) may comprise the layers above RLC layer, such as a packet data convergence protocol (PDCP) layer, a radio resource control (RR.C) and an internet protocol (IP) layers. Other functional splits are possible too.

[0026] According to an illustrative example, a BS node (e.g., BS, eNB, gNB, CU/DU, . . . ) or a radio access network (RAN) may be part of a mobile telecommunication system. A RAN (radio access network) may include one or more BSs or RAN nodes that implement a radio access technology, e.g., to allow one or more UEs to have access to a network or core network. Thus, for example, the RAN (RAN nodes, such as BSs or gNBs) may reside between one or more user devices or UEs and a core network. According to an example embodiment, each RAN node (e.g., BS, eNB, gNB, CU/DU, . . .) or BS may provide one or more wireless communication services for one or more UEs or user devices, e.g., to allow the UEs to have wireless access to a network, via the RAN node. Each RAN node or BS may perform or provide wireless communication services, e.g., such as allowing UEs or user devices to establish a wireless connection to the RAN node, and sending data to and/or receiving data from one or more of the UEs. For example, after establishing a connection to a UE, a RAN node or network node (e.g., BS, eNB, gNB, CU/DU, . . .) may forward data to the UE that is received from a network or the core network, and/or forward data received from the UE to the network or core network. RAN nodes or network nodes (e.g., BS, eNB, gNB, CU/DU, . . .) may perform a wide variety of other wireless functions or services, e.g., such as broadcasting control information (e.g., such as system information or on-demand system information) to UEs, paging UEs when there is data to be delivered to the UE, assisting in handover of a UE between cells, scheduling of resources for uplink data transmission from the UE(s) and downlink data transmission to UE(s), sending control information to configure one or more UEs, and the like. These are a few examples of one or more functions that a RAN node or BS may perform.

[0027] A user device or user node (user terminal, user equipment (UE), mobile terminal, handheld wireless device, etc.) may refer to a portable computing device that includes wireless mobile communication devices operating either with or without a subscriber identification module (SIM), including, but not limited to, the following types of devices: a mobile station (MS), a mobile phone, a cell phone, a smartphone, a personal digital assistant (PDA), a handset, a device using a wireless modem (alarm or measurement device, etc.), a laptop and/or touch screen computer, a tablet, a phablet, a game console, a notebook, a vehicle, a sensor, and a multimedia device, as examples, or any other wireless device. It should be appreciated that a user device may also be (or may include) a nearly exclusive uplink only device, of which an example is a camera or video camera loading images or video clips to a network. Also, a user node may include a user equipment (UE), a user device, a user terminal, a mobile terminal, a mobile station, a mobile node, a subscriber device, a subscriber node, a subscriber terminal, or other user node. For example, a user node may be used for wireless communications with one or more network nodes (e.g., gNB, eNB, BS, AP, CU, DU, CU/DU) and/or with one or more other user nodes, regardless of the technology or radio access technology (RAT). In LTE (as an illustrative example), core network 150 may be referred to as Evolved Packet Core (EPC), which may include a mobility management entity (MME) which may handle or assist with mobility/handover of user devices between BSs, one or more gateways that may forward data and control signals between the BSs and packet data networks or the Internet, and other control functions or blocks. Other types of wireless networks, such as 5G (which may be referred to as New Radio (NR)) may also include a core network.

[0028] In addition, the techniques described herein may be applied to various types of user devices or data service types, or may apply to user devices that may have multiple applications running thereon that may be of different data service types. New Radio (5G) development may support a number of different applications or a number of different data service types, such as for example: machine type communications (MTC), enhanced machine type communication (eMTC), Internet of Things (loT), and/or narrowband loT user devices, enhanced mobile broadband (eMBB), and ultra-reliable and low-latency communications (URLLC). Many of these new 5G (NR) - related applications may require generally higher performance than previous wireless networks.

[0029] loT may refer to an ever-growing group of objects that may have Internet or network connectivity, so that these objects may send information to and receive information from other network devices. For example, many sensor type applications or devices may monitor a physical condition or a status, and may send a report to a server or other network device, e.g., when an event occurs. Machine Type Communications (MTC, or Machine to Machine communications) may, for example, be characterized by fully automatic data generation, exchange, processing and actuation among intelligent machines, with or without intervention of humans. Enhanced mobile broadband (eMBB) may support much higher data rates than currently available in LTE.

[0030] Ultra-reliable and low-latency communications (URLLC) is a new data service type, or new usage scenario, which may be supported for New Radio (5G) systems. This enables emerging new applications and services, such as industrial automations, autonomous driving, vehicular safety, e-health services, and so on. 3GPP targets in providing connectivity with reliability corresponding to block error rate (BLER) of 10-5 and up to 1 ms U-Plane (user/data plane) latency, by way of illustrative example. Thus, for example, URLLC user devices/UEs may require a significantly lower block error rate than other types of user devices/UEs as well as low latency (with or without requirement for simultaneous high reliability). Thus, for example, a URLLC UE (or URLLC application on a UE) may require much shorter latency, as compared to an eMBB UE (or an eMBB application running on a UE).

[0031] The techniques described herein may be applied to a wide variety of wireless technologies or wireless networks, such as 5G (New Radio (NR)), cmWave, and/or mmWave band networks, loT, MTC, eMTC, eMBB, URLLC, 6G, etc., or any other wireless network or wireless technology. These example networks, technologies or data service types are provided only as illustrative examples.

[0032] A machine learning (ML) model may be used within a wireless network to perform (or assist with performing) one or more tasks. In general, one or more nodes (e.g., BS, gNB, eNB, RAN node, user node, UE, user device, relay node, or other wireless node) within a wireless network may use or employ a ML model, e.g., such as, for example a neural network model (e.g., which may be referred to as a neural network, an artificial intelligence (Al) neural network, an Al neural network model, an Al model, a machine learning (ML) model or algorithm, a model, or other term) to perform, or assist in performing, one or more ML-enabled tasks. Other types of models may also be used. A ML-enabled task may include tasks that may be performed (or assisted in performing) by a ML model, or a task for which a ML model has been trained to perform or assist in performing).

[0033] ML-based algorithms or ML models may be used to perform and/or assist with performing a variety of wireless and/or radio resource management (RRM) and/or RAN- related functions or tasks to improve network performance, such as, e.g., in the UE for beam prediction (e.g., predicting a best beam or best beam pair based on measured reference signals), antenna panel or beam control, RRM (radio resource measurement) measurements and feedback (channel state information (CSI) feedback), link monitoring, Transmit Power Control (TPC), etc. In some cases, ML models may be used to improve performance of a wireless network in one or more aspects or as measured by one or more performance indicators or performance criteria.

[0034] Models (e.g., neural networks or ML models) may be or may include, for example, computational models used in machine learning made up of nodes organized in layers. The nodes are also referred to as artificial neurons, or simply neurons, and perform a function on provided input to produce some output value. A neural network or ML model may typically require a training period to learn the parameters, i.e., weights, used to map the input to a desired output. The mapping may occur via the function that is learned from a given data for the problem in question. Thus, the weights are weights for the mapping function of the neural network. Each neural network model or ML model may be trained for a particular task.

[0035] To provide the output given the input, the ML functionality of a neural network model or ML model should be trained, which may involve learning the proper value for a large number of parameters (e.g., weights and/or biases) for the mapping function (or of the ML functionality of the ML model). For example, the parameters may be used to weight and/or adjust terms in the mapping function. This training may be an iterative process, with the values of the weights and/or biases being tweaked over many (e.g., tens, hundreds and/or thousands) of rounds of training episodes or training iterations until arriving at the optimal, or most accurate, values (or weights and/or biases). In the context of neural networks (neural network models) or ML models, the parameters may be initialized, often with random values, and a training optimizer iteratively updates the parameters (e.g., weights) of the neural network to minimize error in the mapping function. In other words, during each round, or step, of iterative training the network updates the values of the parameters so that the values of the parameters eventually converge to the optimal values.

[0036] ML models may be trained in either a supervised or unsupervised manner, as examples. In supervised learning, training examples are provided to the ML model or other machine learning algorithm. A training example includes the inputs and a desired or previously observed output. Training examples are also referred to as labeled data because the input is labeled with the desired or observed output. In the case of a neural network (which may be a specific case of ML model), the network (or ML model) learns the values for the weights used in the mapping function or ML functionality of the ML model that most often result in the desired output when given the training inputs. In unsupervised training, the ML model learns to identify a structure or pattern in the provided input. In other words, the model identifies implicit relationships in the data. Unsupervised learning is used in many machine learning problems and typically requires a large set of unlabeled data.

[0037] According to an example embodiment, a ML model may be classified into (or may include) two broad categories (supervised and unsupervised), depending on whether there is a learning “signal” or “feedback” available to a model. Thus, for example, within the field of machine learning, there may be two main types of learning or training of a model: supervised, and unsupervised. The main difference between the two types is that supervised learning is done using known or prior knowledge of what the output values for certain samples of data should be. Therefore, a goal of supervised learning may be to learn a function that, given a sample of data and desired outputs, best approximates the relationship between input and output observable in the data. Unsupervised learning, on the other hand, does not have labeled outputs, so its goal is to infer the natural structure present within a set of data points.

[0038] Supervised learning: The computer is presented with example inputs and their desired outputs, and the goal may be to learn a general rule that maps inputs to outputs. Supervised learning may, for example, be performed in the context of classification, where a computer or learning algorithm attempts to map input to output labels, or regression, where the computer or algorithm may map input(s) to a continuous output(s). Common algorithms in supervised learning may include, e.g., logistic regression, naive Bayes, support vector machines, artificial neural networks, and random forests. In both regression and classification, a goal may include finding specific relationships or structure in the input data that allow us to effectively produce correct output data. In some example cases, the input signal may be only partially available, or restricted to special feedback. Semi-supervised learning: the computer may be given only an incomplete training signal; a training set with some (often many) of the target outputs missing. Active learning: the computer can only obtain training labels for a limited set of instances (based on a budget), and also may optimize its choice of objects for which to acquire labels. When used interactively, these can be presented to the user for labeling.

[0039] Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Some example tasks within unsupervised learning may include clustering, representation learning, and density estimation. In these cases, the computer or learning algorithm is attempting to learn the inherent structure of the data without using explicitly -provided labels. Some common algorithms include k-means clustering, principal component analysis, and auto-encoders. Since no labels are provided, there may be no specific way to compare model performance in most unsupervised learning methods. [0040] Continual Learning (CL) may refer to or may include a capability of the ML model to adapt to ever-changing (or continuously changing, or periodically changing) surrounding environment or data by learning or adapting the ML model continually based on incoming data (or new or updated data), e.g., without forgetting original or previous knowledge or ML model settings, and, e.g., which may be based on less than a full or complete set of data. For example, given a (e.g., potentially unlimited or continuous) stream of data (e.g., data reflecting changing or updated conditions or environment upon which the ML model should be updated), a continual learning (CL) algorithm may (or should) learn, e.g., by updating or adapting weights or other parameters of the ML model, based on a sequence of partial experiences or partial data (e.g., a most recent set of data) where all data may not be available at once, since new or updated data will be received later (thus, the new data potentially renders the weights or parameter settings of the ML model obsolete or inaccurate). Thus, a full or complete set of data may not be considered available at that time of ML model updating or adaptation, since the data or environment may be continuously or continually changing over time. Thus, at any given point or moment in time, data (upon which the ML model may be updated or adapted) may be considered incomplete because there may be a continuous stream of data. Thus, a CL algorithm may include or may refer to iteratively updating or adapting weights or other parameters of the ML model based on an updated set of data, and then repeating the learning or adaptation process for the ML model when a second (or later) set of updated data is received subsequently.

[0041] ML models may be used in a node (e.g., UE or network node/gNB) to perform or assist in performing a variety of radio or wireless-related functions. However, in some cases, and/or for some radio functions, a problem that may arise is there is not sufficient data or unavailability of enough data samples to train powerful ML model that reach a desired model accuracy. As an illustrative example, in a case where a ML model is used to perform the radio function of UE positioning, accuracy enhancement of the ML model is inhibited may stem from not having enough ground truth data. Many ML models are not powerful enough to learn from limited datasets. Therefore, collaboration across different (or multiple) nodes can be beneficial to overcome this limitation, e.g., where different ML models are trained at different nodes, and collaborate. Mutual learning or collaboration across or between multiple nodes (e.g., gNBs, UEs) may improve performance and/or accuracy of ML models that are running or provided on the nodes.

[0042] FIG. 2 is a flow chart illustrating operation of a node. Operation 210 includes receiving, by a first node from a second node within a wireless communications network, information indicating a capability of the second node to share logits with other nodes. Operation 220 includes transmitting, by the first node to the second node, a data collection request to request a reporting of logits generated by the second node, wherein the data collection request comprises a function identifier identifying a radio function for which logits reporting is requested. Operation 230 includes receiving, by the first node from the second node, a data collection response accepting the data collection request. Operation 240 includes receiving, by the first node from the second node, a report including a first set of logits for the identified radio function. Operation 250 includes generating, by the first node using the machine learning model for the identified radio function based on a set of radio measurements, a second set of logits. Operation 260 includes training, by the first node, a machine learning model of the first node based at least in part on a difference between the first set of logits received from the second node and the second set of logits generated by the first node to minimize or at least decrease a mimicry loss term.

[0043] With respect to the method of FIG. 2, the method may further include applying the trained machine learning model in inference mode to perform the radio function.

[0044] With respect to the method of FIG. 2, the method may further include performing at least one of the following: transmitting, by the first node to the second node, a first capability information indicating a capability of the first node for collaborative learning for machine learning models based on a sharing of logits; and receiving, by the first node from the second node, a second capability information indicating a capability of the second node for collaborative learning for machine learning models based on a sharing of logits.

[0045] With respect to the method of FIG. 2, the method may further include wherein: the first capability information comprises at least one of the following: a set of one or more function identifiers identifying one or more radio functions for which the first node may request and/or provide logits; and a set of classes for each function identifier for which the first node may request and/or provide logits.

[0046] With respect to the method of FIG. 2, the second capability information may include at least one of the following: a set of one or more function identifiers identifying one or more radio functions for which the second node may request and/or provide logits; and a set of classes for each function identifier for which the second node may request and/or provide logits.

[0047] With respect to the method of FIG. 2, the data collection request may include at least one of: a periodicity of logits reporting; and/or an indication of one or more classes for which logits are requested to be reported.

[0048] With respect to the method of FIG. 2, the received logits may include perturbed logits.

[0049] With respect to the method of FIG. 2, the receiving logits may include receiving, by the first node from the second node, the report including perturbed logits for the identified radio function, wherein one perturbed logit is provided for each of a plurality of classes, and wherein the perturbed logits comprise logits generated by a machine learning model of the second node which are perturbed and then transmitted to the first node.

[0050] With respect to the method of FIG. 2, the method may further include transmitting, by the first node to the second node, an indication of a privacy level or level of perturbation that is requested by the first node for logits.

[0051] With respect to the method of FIG. 2, the indication of a privacy level or level of perturbation is provided by the first node within either the data collection request or the first capability information, and includes at least one of the following parameters: a differential privacy parameter indicating the privacy level or level of perturbation of the logits; or a parameter related to the privacy level or level of perturbation, including at least one of the following: a mean; a standard deviation; or an epsilon value.

[0052] With respect to the method of FIG. 2, the method may further include receiving, by the first node from the second node, an indication of a privacy level or level of perturbation that can be provided by the second node for logits. [0053] With respect to the method of FIG. 2, the indication of a privacy level or level of perturbation is provided by the second node within either the data collection response or the second capability information, and includes at least one of the following parameters: a differential privacy parameter indicating the privacy level or level of perturbation of the logits; or a parameter related to the privacy level or level of perturbation, including at least one of the following: a mean; a standard deviation; or an epsilon value.

[0054] With respect to the method of FIG. 2, the method may further include determining, by the first node, a first set of logits generated by the machine learning model of the first node with respect to a first radio function; receiving, by the first node from the second node, a second set of logits generated by a machine learning model of the second node with respect to the first function; determining, by the first node, a mimicry loss term based on the first set of logits and the received second set of logits; wherein the training comprises training or adapting weights of the machine learning model of the first node based at least in part on the mimicry loss term.

[0055] With respect to the method of FIG. 2, the first node and second node each may include one of a gNB, Centralized Unit (CU), a Distributed Unit (DU) or other network node, a user terminal a user equipment or other user device, or a relay node.

[0056] With respect to the method of FIG. 2, the applying the trained machine learning model in inference mode may include receiving, by the first node from the second node, a third set of logits for the identified radio function; generating, by the first node, a fourth set of logits for the identified radio function; aggregating, by the first node, respective logits of the third set and fourth set of logits to obtain a set of aggregated logits; determining a set of classes based on the set of aggregated logits, wherein a class of the set of classes is provided for each of the aggregated logits, wherein a probability is assigned to each class; and determine a predicted label that is a class having a highest assigned probability.

[0057] With respect to the method of FIG. 2, the method may further include performing at least one of the following: using the predicted class to perform or assist in performing the identified radio function; or forwarding the predicted class to another node to be used to perform or assist in performing the identified radio function.

[0058] With respect to the method of FIG. 2, the generating a fourth set of logits may include: receiving or determining, by the first node, a set of inputs for the trained machine learning model; and applying the set of inputs to the trained machine learning model to generate, as outputs of the trained machine learning model, a fourth set of logits for the identified radio function. [0059] With respect to the method of FIG. 2, the third set of logits may be generated by another machine learning model at the second node for the identified radio function.

[0060] With respect to the method of FIG. 2, the determining a set of classes based on the set of aggregated logits may include applying each aggregated logit, of the set of aggregated logits, to a softmax function to obtain the set of classes for the identified radio function. [0061] With respect to the method of FIG. 2, the predicted label is associated with a set of one or more intermediate features or measurements that are used, by the first node or another node, to perform the identified radio function.

[0062] With respect to the method of FIG. 2, the method may further include the identified radio function may include at least one of the following radio functions: user device positioning; beam prediction; antenna panel or beam control; channel state information feedback; link monitoring; transmit power control; and/or performing handovers or assisting with handovers.

[0063] FIG. 3 is a flow chart illustrating operation of a node. Operation 310 includes generating, by a first node, a first set of logits for a radio function by applying a set of inputs or radio measurements to a machine learning model of the first node. Operation 320 includes receiving, by the first node from a second node in a wireless network, a second set of logits for the radio function. Operation 330 includes aggregating respective logits of the first set and the second set of logits to obtain a set of aggregated logits. Operation 340 includes determining a set of classes based on the aggregated logits, wherein a class of the set of classes is provided for each of the aggregated logits, and wherein a probability is assigned to each class. And, operation 350 includes determining a predicted label that is a class, of the set of classes, having a highest assigned probability.

[0064] With respect to the method of FIG. 3, the method may further include performing at least one of the following: using the predicted class to perform or assist in performing the identified radio function; or forwarding the predicted class to another node to be used to perform or assist in performing the identified radio function.

[0065] With respect to the method of FIG. 3, the generating the first set of logits may include: receiving or determining, by the first node, a set of inputs for the first machine learning model; and applying the set of inputs to the trained machine learning model to generate, as outputs of the trained machine learning model, the first set of logits for the identified radio function.

[0066] With respect to the method of FIG. 3, the second set of logits may be generated by another machine learning model at the second node for the identified radio function. [0067] With respect to the method of FIG. 3, the determining the set of classes based on the set of aggregated logits may include: applying each aggregated logit, of the set of aggregated logits, to a softmax function to obtain the set of classes for the identified radio function.

[0068] With respect to the method of FIG. 3, the predicted label may be associated with a set of one or more intermediate features or measurements that are used, by the first node or another node, to perform or assist in performing the identified radio function.

[0069] With respect to the method of FIG. 3, the identified radio function may include at least one of the following radio functions: user device positioning; beam prediction; antenna panel or beam control; channel state information feedback; link monitoring; transmit power control; and/or performing handovers or assisting with handovers.

[0070] FIG. 4 is a flow chart illustrating operation of a node. Operation 410 includes receiving, by a second node from a first node within a wireless communications network, a data collection request to request a reporting of logits generated by the second node, wherein the data collection request comprises a function identifier identifying a radio function for which logits reporting is requested. Operation 420 includes transmitting, by the second node to the first node, a data collection response accepting the data collection request. And, operation 430 includes transmitting, by the second node to the first node, a report including logits for the identified function to enable the first node to train a machine learning model based on the reported logits.

[0071] The text hereinbelow and FIGs. 5-10 describe or provide further details, operations or examples with respect to the methods of FIGs. 2-4.

[0072] Deep neural networks are commonly used to solve Machine Learning (ML) problems - they are often very large in depth and/or width and contain large numbers of parameters. This may lead to large memory requirements and latency in execution, limiting their usage in use cases which have low memory or low latency requirements. This has led to the design and development of ML modes that are lighter - one of these mechanisms is knows as model distillation. Distillation-based model compression is where a powerful (deep and/or wide) teacher network or teacher ML model (or network ensemble) trains a smaller student network (one or more student ML models) to mimic the teacher. Mimicking the teacher’s class probabilities and/or feature representation may be the main targets. At least in some cases, the optimization problem of learning to mimic the teacher turns out to be easier than learning the target function directly, and in some cases, the student (student ML model) may match or even outperform the much larger teacher (teacher ML model). [0073] A slightly modified version of the distillation concept is called Mutual Learning. In this case, the problem solved is similar; however, multiple untrained student ML models may be used that simultaneously learn from each other to solve the problem together (e.g., work collaboratively to optimize their ML models to perform a specific function or task). Specifically, each student ML model is trained with (or based on) two losses: a conventional supervised learning loss, and a mimicry loss that aligns each student’s class posterior probabilities with the class posterior probabilities of other students. Trained in this way, it turns out that each student ML model in such a peer-teaching based scenario learns significantly better than when learning alone in a conventional supervised learning scenario (or) students (student ML models) trained by conventional distillation from a larger pretrained teacher (pre-trained teacher ML model). While distillation may involve a teacher larger and more powerful than the intended student, the mutual learning concept may include student ML models (e.g., which may be of comparable size, or may be lighter, in some cases). This makes the deep mutual learning strategy generally applicable, e.g., it can also be used in application scenarios where there is no constraint on the model size and the recognition accuracy is the only concern.

[0074] Logits may be or may include the raw scores or probabilities assigned to each class by a neural network before applying a softmax function. These scores are typically the unnormalized outputs of the penultimate layer of the network, before the softmax transformation (which is the last layer of the network) is applied to obtain probabilities that sum to 1.

[0075] Softmax is a mathematical function often used in machine learning and deep learning, particularly in multiclass classification problems, to convert a set of real -valued scores (logits) into probabilities. It's a way to transform raw scores into a probability distribution over multiple classes, ensuring that the probabilities sum to 1. From the output of the softmax function (probabilities for each class), hard labels can be computed by assigning the class with the highest probability (argmax) as the predicted class for a given input. [0076] Differential Privacy (DP) may be or may include a mechanism for privacypreserving in data analysis which achieves this by adding carefully calibrated noise or perturbation to the data before generating the final output and sharing them with other entities. The key in this mechanism is to carefully calibrate the noise to balance privacy and utility, allowing effective collaboration without compromising individual data privacy. Differential Privacy (DP) parameters that may be used or may be needed for calibrated noise generation may include privacy strength (measures how much privacy protection is needed), risk control (controls the probability of a privacy breach), sensitivity (measures how much a single data point can influence the analysis) and noise type (using for example Laplace or Gaussian distributions for noise generation). Furthermore, to enhance the safeguard against sensitive information sharing, Differential Privacy (DP) may be used as described hereinbelow to perturb the original Logits before sharing them with other peers (entities in the DML setup).

[0077] As described herein, a collaborative learning framework may be set up, namely mutual learning among the ML models of different nodes (across gNBs, UEs, or LMF/RAN, in positioning scenarios, or for other radio functions). For example, using Deep Mutual Learning (DML), train two (or more) ML (neural) models may be trained, e.g., in parallel, to find a stable solution for the underlying optimization problem, e.g., for a specific radio function. Limiting factors for training may include the available scarce, noise prone and non- iid (non-independent and identically distributed) training data and not being aware of the most suitable ML model may hinder the training of ML models and the ability of the trained models at the inference time. Hence, as described herein a collaborative learning framework may be used in which the ML models not only learn from the available data but also from each other's views (e.g., based on shared logits) is beneficial. To this end, in mutual learning setup, each model includes or generates a mimicry loss, to align its estimated class posterior distribution with the class probabilities captured by the other models. Hence, in DML mismatches in the probability estimates from different models will be penalized, which may lead to more robust predictions.

[0078] As described herein, mutual learning or collaborative learning techniques may be employed in which logits may be shared between nodes of a wireless network, e.g., to improve training or learning of the ML models during training mode.

[0079] During a training mode: First and second nodes (e.g., which each node may be a UE or gNB) may each employ or use a ML model to perform a radio function. The first and second nodes may exchange capabilities information to determine that the other node is capable of sharing logits (e.g., for mutual or collaborative learning). The first node may transmit to the second node a data collection request to request a reporting of logits generated by the second node. The data collection request may include a function identifier identifying a radio function for which logits reporting is requested. The first node may receive a report from the second node including a first set of logits for the identified radio function. The first node may also generate, using a ML model for the identified radio function, a second set of logits. The first node may train the ML model of the first node based at least in part on a difference between the first set of logits received from the second node and the second set of logits generated by the first node to decrease a mimicry loss term. Logits may be or may include unnormalized raw scores or values or probabilities assigned to a class based on inputs, before a softmax transformation. To provide privacy, perturbed logits may be shared between nodes.

[0080] During an inference mode: A first node and a second node receive the same test example or query (e.g., including the same set of inputs) for prediction. And ML models of the first node and the second node are provided to perform or assist in performing the same radio function. The first node may receive from the second node a third set of logits for the identified radio function. The second node may generate a fourth set of logits. The first node may aggregate of (e.g., perform a sum, an average or other mathematical function based on) respective logits of the third and fourth sets of logits to obtain an aggregated set of logits. A set of classes is determined based on the set of aggregated logits, e.g., by applying each aggregated logit to a softmax function. A class is obtained or determined for each aggregated logit, and probability is assigned to each class. A predicted label is determined that is the class having the highest assigned probability. Thus, the predicted label may be the class (of the set of classes) having a highest probability after applying the softmax function to the aggregated logits.

[0081] For example, to perform a radio function of UE positioning, the set of classes may be or may represent different UE positions (or each class may be or represent a range of positions) of the UE, where the predicted label may be the class (or UE position) having a highest probability of being correct. Or, for the radio function of UE positioning (as an example), the different classes may be or may represent intermediate features that may be used in determining UE positioning. For example, intermediate features for UE positioning may include, e.g., uplink (UL) angle of arrival, UL relative time of arrival, gNB Receivetransmit (Rx-Tx) time difference, etc. In this case, for example, each class may be or represent a different UL angle of arrival (or range of values for UL angle of arrival), and the predicted class may be or may represent the UL angle of arrival (or a range of values for UL angle of arrival) that has a highest probability of being the correct UL angle of arrival with respect to a UE that is being positioned. The node may directly use the predicted class (e.g., for one or more intermediate features) to perform the radio function (e.g., the first node may determine the UE position based on the predicted class (e.g., for one or more intermediate features, such as UL angle of arrival), or the node may forward the predicted class (e.g., for one or more intermediate features) to another node, e.g., a location management function (LMF) which may determine or estimate the UE position based on the predicted class(es).

[0082] Training Mode:

[0083] FIG. 5 is a diagram illustrating workflow for ML model training for an example radio function of UE positioning. In this example, ML model inputs may include sounding reference signal (SRS) configuration and layer 1 (LI) measurements, which are input to both the ML model and ML model training function 510, and to legacy NG-RAN functionality 520 (to generate assistance information and UL measurements).

[0084] Model Inputs: 1) SRS Configuration, including Number of Ports, Comb Size, Start Position, Number of Symbols, Repetition Factor, Resource Type - Periodic/Aperiodic; and 2) LI Measurements, including and SRS Channel Impulse Response, Propagation Delay UL-SRS RSRP.

[0085] Model Outputs: Measured Results or intermediate features: UL Angle of Arrival, UL Relative Time of Arrival, gNB Rx-Tx Time Difference, Time of Measurement, Current Time, Measurement Quality or reference signal received quality (RSRQ), Beam Information, SRS Resource Type, and/or LoS/NLoS Information.

[0086] For example, for the radio function, different classes may have different UL angles of arrival; and other intermediate features (UL relative time of arrival, beam information, LoS/NLoS information, etc. ) or combination of these. These primary measurements (which may also be referred to as intermediate features) may be converted to position, e.g., by the node or by a location management function LMF (530). The ML model for UE positioning may be used to predict primary measurement(s) or intermediate feature(s), e.g., the class having the highest probability. Or class(es) can include values for a combination of multiple predicted primary measurements intermediate features. Output is a set of values for one or more of these.

[0087] FIG. 6 is a diagram illustrating collaborative or mutual learning for two machine learning (ML) models within a wireless network. As shown in FIG. 6, a UE 610 may provide (or may be used for the gNBs to obtain) one or more ML model inputs (such as LI measurements and/or SRS configuration), which are input to ML model 0 at gNBO and ML model 1 at gNBl. The gNBs (gNBO and gNBl) may be already aware of the SRS configuration. The operation for training the ML model will be briefly described.

[0088] Step 1 : A training example (e.g., a set of ML model inputs and its related output or ground truth, from which the ML model is to learn from during training and adjust its weights of the ML model) is received by model 0 and model 1.

[0089] Step 2: Both model 0 and model 1 perform a conventional pass and compute the (unnormalized yet) probability (score) of belonging the input example to each target class, which are called logits (PO and Pl in Figure 2). The logits later can be converted to normalized probabilities (LO and LI) using a softmax layer (more details below). During the forward pass, a neural network takes input data (e.g., an image, a signal, etc., or the set of input listed or described above for radio function of UE positioning) and passes it through a series of layers, such as convolutional, fully connected, and activation layers. These layers apply various transformations to the input data, extracting features and learning patterns relevant to the task at hand. In a classification task, the final layer of the neural network is typically a dense (fully connected) layer, which is often referred to as the "logits layer." The output of this logits layer is a set of real-valued numbers, known as logits (PO and P 1 in FIG. 6), one logit for each class in the classification problem. For multi-class classification, if there are N classes, there would be N logits (e.g., P0= {zO, zl, ..., zN} and Pl= {zO, zl, ... , zN}). The logits represent the evidence or the degree of belief that the input data belongs to each class. They are not yet probabilities; they can be any real number, positive or negative. [0090] Step 3 : Differential Privacy (DP) may be used to perturb the original logits before sharing them with other peer nodes. Each ML model (model 0 and model 1) may apply differential privacy (DP, 420-0, or 420-1 in FIG. 6 for each gNB) on their obtained or generated logits, to obtain perturbed logits (PgO, Pgl), and then shares the perturbed logits (PgO and Pgl) with the other model. For example, model 0 at gNBO generates a logit P0, while model 1 at gNBl generates a logit PL Logit P0 is perturbed to obtain perturbed logit PgO, while logit Pl is perturbed to obtain perturbed logit Pgl. gNBO shares perturbed logit PgO with gNBl, while gNBl may share perturbed logit Pgl with gNBO. Each node (e.g., gNBO and gNBl) determines or calculates a mimicry loss term (Lm) which may be the difference between the (locally) generated perturbed logit and the perturbed logit received from the other node. For example, gNBl may determine a mimicry loss term (Lm) as a difference between the perturbed logit Pgl generated or determined by gNBl and the perturbed logit PgO received from gNBO. As described below, each gNB may also determine or calculate a conventional loss term (Lc). Each ML model may be trained or adapted (e.g., weights of the ML models may be adjusted or trained) based on the mimicry loss term (Lm) and/or the conventional loss term (Lc). At step 3, in the event differential privacy is not used, then logits P0 or P 1 may be shared with the other node(s) (without first perturbing the logits), and a mimicry loss term (Lm) may be determined based on these unperturbed logits P0 and PL For example, in the case where perturbation is not used, gNBl may determine a mimicry loss term (Lm) as a difference between the logit Pl generated or determined by gNBl and the logit PO received from gNBO (where logits PO and P 1 are unperturbed logits).

[0091] Step 4: the original (generated) logits can be converted using a softmax layer (with temperature value equal to 1) to hard targets or so called labels (or hard labels), L0 and LI in this example (more details below). For example, in applying softmax to the logits, the temperature parameter, denoted as T, may be introduced. The softmax function with zz temperature is defined as: max ( where the result is the probability of j=_Oe ^T class i, zi is the logit for class i, and the sum in the denominator is taken over all classes j . The temperature parameter T serves as a scaling factor. It can have different effects on the output probabilities: When T is set to 1, the softmax behaves as usual and produces a standard probability distribution. The class with the highest probability is the hard label assignment. When T is greater than 1, it softens the distribution, making the probabilities for all classes more equal. This means the model becomes less certain and more exploratory in its predictions. When T is less than 1, it sharpens the distribution, emphasizing the logits of the most confident classes. The model becomes more deterministic in its predictions. In a multiclass classification problem, after applying softmax with temperature, the hard label (L0 and LI in this example) can be derived by selecting the class with the maximum probability, like: L0 = argmaxi(softmax (zi)). A conventional loss term (Lc) may be obtained by subtracting the hard label from the correct label or ground truth. This conventional loss term may be used to adjust or train the weights of the ML model at each gNB. However, as noted, to improve the training accuracy and robustness of the ML models, each ML model may also be trained based on the mimicry loss term (Lm) (or based on a combination of both Lc and Lm). [0092] Step 5: The loss function of both model 0 and model 1 may include two terms: the conventional loss term (Lc, via supervised learning) concerns with prediction of the labels correctly (through conventional loss Lc), and the mimicry loss term (Lm) penalizes the (Kullback Leibler) divergence of the obtained posterior probability distributions from that particular model (e.g., model 0) and the received logits from the model in the other node (e.g., model 1). The conventional loss term Lc for gNBl may be determined, e.g., as a difference between the correct label and the hard label L 1 (or ground truth) output by softmax function 622-1. Similarly, the conventional loss term Lc for gNBO may be determined, e.g., as a difference between the correct label (or ground truth, known by the gNB during training) and the hard label L0 output by softmax function 622-0. Also, as noted above, the mimicry loss term (Lm) (determined by each of the nodes) may be determined, e.g., as a difference between the locally generated perturbed logit, and the received logit from the other node. Thus, mimicry loss term (Lm) for gNB 1 may be determined as a difference between perturbed logits Pgl and PgO. Each gNB may adapt or train the weights of its ML model based on either conventional loss term (Lc) or mimicry loss term (Lm), or based on both the conventional loss term (Lc) and the mimicry loss term (Lc). Hence, in a collaborative manner, each ML model learns to not only predict the correct labels for the input examples (example inputs received for training) but also to mimic the peer model probability distribution which has richer information than only the labels.

[0093] Note that the soft targets (logits, such as perturbed logits received from the other node(s)) have usually higher entropy than hard labels and therefore provide more information per training example. Additionally, training based at least on the mimicry loss term (Lm, based on the logits received from the other node), the gradient between training examples will have less variance (as compared to training just based on conventional loss term, Lm), which means ML model training based on other node’s logits (and thus based on mimicry loss term, Lm) will cause a higher learning rate, and will thus cause a faster convergence for ML model training. Hence, using logits received from one or more other ML models for training may result in faster learning, with a smaller number of examples needed and the obtained solution will be more stable, because the mismatches in the probability distribution obtained from two models will be penalized (via the mimicry loss term, Lm). Thus, nodes (e.g., UEs or gNBs or other nodes) exchanging logits with other nodes and using these logits received from other nodes to generate a mimicry loss term (Lm), and using this mimicry loss term (Lm) to adapt or train a local ML model has significant technical advantages over using only a conventional loss term (Lc) for ML model training.

[0094] In a multi-vendor scenario (e.g., where models and/or gNBs that are collaborating are provided by different vendors or manufacturers), by injecting carefully calibrated noise into these shared logit values, DP (differential privacy) ensures that no party can determine the specific impact of another party's detailed data, thus preserving individual privacy. This noise addition fosters trust and encourages collaboration between nodes, allowing models to improve collectively without revealing sensitive details about any party's data.

[0095] FIG. 7 is a signaling chart that illustrates an example message exchange and operations that may be performed by two nodes to perform mutual learning or collaborative learning of ML models for the same radio function. Although the message exchange in FIG> 7 is shown as being between two network nodes or gNBs, such message exchange (and sharing of logits) may be between two UEs, between two gNBs, between a UE and gNB, or between other types of nodes. Also, each node may receive logits from multiple other nodes (and thus may determine multiple mimicry loss terms, based on logits received from each of the other nodes), not just one other node as shown in FIG. 7. Each of gNBO and gNBl may have a ML model (ML model 0 and ML model 1, respectively) to perform a same radio function, e.g., UE positioning.

[0096] At step 0 of FIG. 7, as part of a capability exchange or other communication, gNBO may transmit to gNBl, and gNBl may transmit to gNBO, information indicating a capability of the node to share logits, e.g., as part of mutual or collaborative learning, and/or a capability to perform logits perturbation (to provide perturbed logits), including possibly a parameter (or range of values) indicating a level(s) of perturbation or level(s) of differential privacy that are supported by the node. Also, within capability exchange (step 0), or within other message(s) (such as messages 1-4), the node may indicate one or more functions for which logits sharing is enabled, and may list a set of classes for each function for which logits may be shared or provided to other nodes. As described hereinbelow, the messages 1-4 shown in FIG. 7 may include a data collection (or data exchange) request (requesting logits reporting for an indicated radio function), a data collection (or data exchange) response confirming that the receiving node can provide the requested logits, and then a data collection or data exchange report in which a node provides or reports the requested logits to the other node.

[0097] As part of step 0 (capability exchange) of FIG. 7, or as part of the messages at steps 1-4 of FIG. 7, or within other communications or messages exchanged between the nodes, a perturbation of logits capability, and/or a level of perturbation and/or level of privacy may be indicated by each node (as being supported by the node) and/or may be requested by each node, so that the nodes may agree on a level of differential privacy or perturbation to be applied to the logits before they are transmitted to (shared with) the other node/gNB. Thus, for example, as part of a capability exchange between the nodes (not shown), or as part of messages 1-4 of FIG. 7, or via other communication, the gNBs (e.g., gNBO and gNBl) that plan to collaborate (e.g., via exchange of logits) for ML model training or mutual learning, may each provide the other node(s) with an indication that perturbation of logits will (or can) be performed. Also, within messages 1-4, or via other communication or messages, an indication of a privacy level or level or perturbation may be requested (a requested privacy level or level of perturbation) and/or indicated (an indicated level of privacy or level of perturbation that can be provided by the node) by each node. Thus, each node, for example: 1) may indicate to the other node an indication of a privacy level or level of perturbation that is requested, and/or 2) may indicate to the other node an indication of a privacy level or level of perturbation that is supported or can be provided. For example, the indication of privacy level or level of perturbation that is requested and/or can be provided or is supported by a node, may be indicated via one or more values or parameters, e.g., such as one or more of: 1) a differential privacy parameter indicating the privacy level or level of perturbation of the logits; or 2) a parameter related to the privacy level or level of perturbation, including at least one of the following: a mean; a standard deviation; or an epsilon value.

[0098] At step 1 of FIG. 7, gNBl transmits a data exchange request (or data collection request) to request a reporting of logits generated by gNBO, which may include a function identifier that identifies a specific radio function(s) for which logits reporting is requested, e.g., UE positioning. At step 2, gNBO transmits a data exchange response (or data collection response) to gNBl confirming that gNBO can provide the requested logits. Likewise, at step 3, gNBO transmits a data exchange request (or data collection request) to request a reporting of logits generated by gNBl, which may include a function identifier that identifies a specific radio function for which logits reporting is requested, e.g., UE positioning. At step 4, gNBl transmits a data exchange response to gNBO (or data collection response) confirming that gNBO can provide the requested logits.

[0099] At step 5, gNBl trains its ML model 1 and obtains logits as an output of the ML model 1. At step 6, gNBl transmits to gNBO the requested logits (which may be perturbed, or not perturbed, depending on if the nodes agreed to exchange perturbed logits) that are generated by ML model 1. At step 7, gNBO may use the received logits from gNBl to calculate or determine a mimicry loss term (Lm), which is used to adapt or train the weights of ML model 0 at step 8. Also, at step 8, gNBO trains its ML model 0 to generate its logits, which are then transmitted or reported to gNB 1 (as either logits or perturbed logits, if the nodes agreed to exchange perturbed logits) via message at step 9. At step 10, gNBl may use the received logits from gNBO (along with its own locally generated logits) to determine a mimicry loss term, which is used to adapt or train weights of ML model 1.

[0100] Therefore, messages or signals of FIG. 7 may include, for example, some different messages, such as: AIML Data Exchange Request - this message may be used by a requesting gNB to share its own prediction classes and request for logits from a neighboring gNB. AIML Data Exchange Response - this message may be used by a reporting gNB to respond to a AIML Data Exchange Request sharing its own list of prediction classes to a requesting gNB. AIML Data Exchange Report - this message may be used by the reporting gNB to report its logits values associated with its own prediction classes to the requesting gNB. Example Message Structure:

AIML Data Exchange Request

{

List of Prediction Classes in Requesting gNB

{

Enum values for Prediction Classes

}

AIML Data Exchange Response

{

List of Prediction Classes in Reporting gNB

{

Enum (enumerated) values for Prediction Classes (values for c 1lasses x) 20

}

AIML Data Exchange Report

{

List of Logits per Prediction Class in Reporting gNB

{

Logit value

Confidence value for each Logit

}

[0101] Examples for Prediction Classes: 1) Inferences made in LOS (line of sight) and NLOS (non-line of sight) environments may be considered as different prediction types with their own logits values; and 2) Inferences made for different Positioning methods

[0102] Sharing Prediction Classes (sharing classes with another node(s)): The logit shared by a gNB or node should be usable appropriately in the neighbor gNB or node. For this purpose, the prediction class may or should be available along with the logit. For example, each prediction class may have a different ML model in a gNB. Thus, the right ML model may be chosen based on the prediction class and then the corresponding logit may be consumed by that ML model (e.g., the corresponding logit may be used to determine the mimicry loss term for ML model training). Also, note that UE positioning is used as an illustrative example use case or example radio function. The same mutual (or collaborative) training concept and associated signaling and/or principles described herein may could be used in a variety of other use cases (or various radio functions) that may have such collaborative ML models. In FIG. 7, the call flow or signaling diagram shows the signaling for XnAP protocol (e.g., gNB to gNB signaling), but similar messages or signaling may be provided or implemented between other types of nodes (e.g., UE to UE, UE to gNB, . . .) and/or may be used or implemented in other protocols such as NRPPa, RRC etc.

[0103] Inference Mode:

[0104] FIG. 8 is a diagram illustrating workflow for ML model inference for an example radio function of UE positioning, e.g., using the trained ML model to predict one or more classes for an indicated radio function based on a set of inputs or query. In this example, ML model inputs may include sounding reference signal (SRS) configuration and layer 1 (LI) measurements, which are input to both the ML model and ML model inference function 810, and to legacy NG-RAN functionality 520 (to generate assistance information and UL measurements).

[0105] Model Inputs: 1) SRS Configuration, including Number of Ports, Comb Size, Start Position, Number of Symbols, Repetition Factor, Resource Type - Periodic/Aperiodic; and 2) LI Measurements, including and SRS Channel Impulse Response, Propagation Delay UL-SRS RSRP.

[0106] Model Outputs: Primary measurements or intermediate features: UL Angle of Arrival, UL Relative Time of Arrival, gNB Rx-Tx Time Difference, Time of Measurement, Current Time, Measurement Quality or reference signal received quality (RSRQ), Beam Information, SRS Resource Type, and/or LoS/NLoS Information. A predicted class (output by the ML model and ML inference function 810 operating in inference mode) for one or more primary measurements or intermediate features may be used for UE positioning, or may be forwarded to LMF 530 to be used for UE positioning.

[0107] For example, for the radio function, different classes may have different UL angles of arrival; and other intermediate features (UL relative time of arrival, beam information, LoS/NLoS information, etc.) or combination of these. These primary measurements (which may also be referred to as intermediate features) may be converted to position, e.g., by the node or by a location management function LMF (530). The ML model for UE positioning may be used to predict primary measurement(s) or intermediate feature(s), e.g., the class having the highest probability. Or class(es) can include values for a combination of multiple predicted primary measurements intermediate features.

[0108] FIG. 9 is a diagram illustrating collaborative or mutual inference for two machine learning (ML) models through sharing of logits within a wireless network. In the inference example of FIG. 9, each gNB or node may determine a set of aggregated logits (based on its own generated logits and logits received from the other node). A predicted class may be determined as the aggregated logit having a highest probability (e.g., a highest probability of being correct). As shown in FIG. 9, two nodes are shown, as gNBO and gNBl. gNBO may have ML model 0 to perform a radio function (or assist in performing a radio function), such as UE positioning. gNBl may have ML model 1 to perform or assist in performing the same radio function, e.g., UE positioning. Inputs (or radio measurements) or query xO may be input to ML model 0, and, inputs (or radio measurements) or query xl may be input to ML model 1 (operating in inference mode). gNBl generates a first (locally generated) set of logits (including logit Pl) by applying the set of inputs or radio measurements to ML model 1. gNBO generates a second (locally generated) set of logits (including logit P0 as an example) by applying the set of inputs or radio measurements to ML model 1. The logits generated by model 1 are output via line 912 (including logit Pl), and the logits generated by model 0 are output via line 920. gNB 1 may share its locally generated second set of logits with gNBO via line 914. Similarly, gNBO may share its locally generated first set of logits with gNBO via line 914. The shared logits may be non-perturbed logits, or perturbed logits (see above for description of perturbed logits). Thus, each node may apply differential privacy or perturbation to the logits before sharing them with the other node. The aggregation may be aggregation of non-perturbed logits or perturbed logits. The rest of the description of this example (as shown in FIG. 9) is shown and described with non-perturbed logits, but it is understood that these techniques or operations described with reference to FIG. 9 may be applied to (or performed with respect to) perturbed logits, e.g., to generate an aggregated perturbed logit(s) (as an aggregation of respective perturbed logits).

[0109] Thus, as shown in FIG. 9, gNB 1 may receive the second set of logits (generated by gNBO) from gNBO via line 918. gNBl may aggregate respective logits of the first set of logits (generated locally by ML model 1 at gNBl) and the second set of logits (generated at gNBO and set to gNBl) to obtain a set of aggregated logits. For example, gNBl may aggregate locally generated logit Pl and received logit P0, e.g., as respective logits, to obtain aggregated logit 916, and this aggregation may be repeated for respective logits of the first set of logits and the received second set of logits, to obtain a set of aggregated logits. gNB 1 may apply each aggregated logit to a softmax function 922-1 to obtain a set of classes for the radio function of UE positioning. In the set of classes, there may be one class per aggregated logit, to obtain a set of classes based on the aggregated logits. A probability is assigned to or associated with each class. gNBl may determine a predicted label (LI) that is a class (of the set of classes) having a highest probability (e.g., a highest probability of being correct). For example, for the example radio function of UE positioning, the predicted label (LI) may be the class (associated with an UL angle of arrival), based on both a local logit and a received logit from the other node, that has a highest probability of being correct. In this manner, gNB 1 may determine a predicted class that is based on both locally generated first set of logits and a second set of logits that were generated by a ML model of gNBO, and then shared with gNBl, e.g., to improve the accuracy or robustness of ML model prediction while operating in inference mode. This predicted label (LI), or predicted UL angle of arrival for the UE, may be used by the gNB 1 to perform UE positioning for the UE, or may be forwarded to the LMF to be used for performing UE positioning.

[0110] Also, as shown in FIG. 9, gNBO may perform the same or similar process to determine a predicted label L0 based on the locally generated second set of logits and the first set of logits received from gNBl. gNBO may aggregate locally generated logit P0 and received logit P0, e.g., as respective logits, to obtain aggregated logit, and this aggregation may be repeated for respective logits of the second set of logits and the received first set of logits, to obtain a set of aggregated logits. gNBO may apply each aggregated logit to a softmax function 922-0 to obtain a set of classes for the radio function of UE positioning. In the set of classes, there may be one class per aggregated logit, to obtain a set of classes based on the aggregated logits. A probability is assigned to or associated with each class. gNBO may determine a predicted label (L0) that is a class (of the set of classes) having a highest probability (e.g., a highest probability of being correct). The predicted class, or the intermediate measurement(s) or set of intermediate measurements associated with the predicted class, may be used for UE positioning or may be forwarded to another node, such as LMF 530, to perform UE positioning.

[0111] In inference mode, aggregating logits from another node with locally generated logits to generate a label (based on the aggregated logit(s)) may provide a technical advantage of producing more accurate or more robust predictions based on more diverse data set and/or based on multiple ML models that have been developed (e.g., based on different data sets) for the same function. [0112] In order to aggregate logits of two (or multiple) different ML models, the inputs to the ML models may be different, but the output classes should typically be the same. For example, a first ML model receives as an input, a channel impulse response, and outputs LOS/NLOS logit or class. And a second ML model receives as inputs time of arrival input outputs LOS/NLOS logit or class. Each LOS/NLOS logit may be converted to a class by applying the logit to the softmax function. In this example, both ML models output predicted logits for LOS/NLOS classes, but may have different inputs. A node or gNB may aggregate its logits with logits of other model that are similar to same to its own logits (e.g., for same classes), and the node may receive many different logits from different other nodes.

[0113] With reference to FIG. 9, the following steps or operations may be performed, for example (as scenario 1).

[0114] Step 1: ML model 0 and ML model 1 perform forward pass:

[0115] Model 0 forward pass: ML model 0 may be, for example, a sequence model such as Recurrent Neural Network (RNN), which as input receives the input sequence represents network signal data over a period of time, with each time step denoted as [s_tl, s_t2, ..., s_tn]. The signal sequence is passed through an embedding layer to convert the raw signal values into continuous representations. The embedded sequence is then processed through, for example, recurrent layers (e.g., LSTM (long short term memory) to capture temporal dependencies in the data. The final hidden state from the recurrent layers is used as input to a for example, fully connected layer, producing logits [logit_a, logit_b, logit c] representing the unnormalized scores. The logits are passed through the softmax activation function to convert logits to classes.

[0116] ML model 1 forward pass: The same input sequence [s_tl, s_t2, ..., s_tn] is used as input features for the feedforward neural network. The input features are fed into the input layer of the feedforward network. The input is processed through one or more hidden layers with activation functions (e.g., ReLU) to capture non-linear relationships. The final hidden layer output is used as input to a fully connected layer, producing logits [logit a, logit b, logit c] representing the unnormalized scores.. Similar to Model 1, the logits are passed through the softmax activation function, resulting in a probability distribution over multiple gNBs.

[0117] Step 2: ML model 0 shares the computed logits (or perturbed logits using differential privacy) with ML model 1 for example as: [logit al , logit b 1, logit c 1] . Also ML model 1 shares its computed logits with ML model 0 (or perturbed logits using differential privacy). For example, as [logit_a2, logit_b2, logit_c2]. [0118] Step 3 : ML model 0 and 1 each one computes an aggregated logit computation from their own and received logits, for example as: [logit al + logit_a2, logit_b 1+ logit_b2, logit_cl + logit_c2]. Thus, aggregation of two logits may include a mathematical function being applied to the two logits, e.g., addition subtraction, weighted addition, averaging, etc. Note that more sophisticated techniques can be used for logits aggregation such as assigning weights to the logits based on model confidence on this data.

[0119] Step 4: The aggregated logits in each model (0 and 1) can be used for the prediction.

[0120] A scenario 2: The data that arrives for the two models (0 and 1) can be different (e.g., RAN or LMF interactions).

[0121] As in this case the data that arrives for inference in each model can belong to a different class label, a direct aggregation of model's logits may not be performed for the two models as these logits might not be both representing data from a particular class label. In this case, for example, a buffer may be used for each model which may store every logit (or multiple logits over a time period) that the model receives from the other model. Then, after initial logit collecting for a time period, whenever an inference is needed, after computing its logits (in the same way that is explained above) each model needs to find the M closest (using a clustering approach such K-means) logits to the current computed ones and aggregate them. In this way each model posterior distribution computed by its parameters will be aligned with what the other model posterior distribution (belief) for the similar classes.

[0122] A number of examples will now be described.

[0123] Example 1. An apparatus (e.g., 1300, FIG. 10;) comprising: at least one processor (e.g., processor 1304, FIG. 10); and at least one memory (e.g., memory 1306, FIG. 10) storing instructions that, when executed by the at least one processor (1304), cause the apparatus at least to: receive (e.g., 210, FIG. 2; step 0, FIG. 7), by a first node from a second node within a wireless communications network, information indicating a capability of the second node to share logits with other nodes; transmit (e.g., 220, FIG. 2; step 1, FIG. 7), by the first node to the second node, a data collection request to request a reporting of logits generated by the second node, wherein the data collection request comprises a function identifier identifying a radio function for which logits reporting is requested; receive (e.g., 230, FIG. 2; step 3, FIG. 7), by the first node from the second node, a data collection response accepting the data collection request; receive (e.g., 240, FIG. 2; step 9, FIG. 7), by the first node from the second node, a report including a first set of logits (e.g., gNBl receives logits or perturbed logits) from gNBO, FIG. 6) for the identified radio function; generate (e.g., 250, FIG. 2; locally generated logits, such as locally generated logit Pl by model 1, FIG. 6), by the first node using the machine learning model for the identified radio function based on a set of radio measurements, a second set of logits; and train, by the first node, a machine learning model of the first node based at least in part on a difference between the first set of logits received from the second node and the second set of logits generated by the first node to minimize or at least decrease a mimicry loss term (e.g., gNBl generates a mimicry loss term, Lm, based on difference between received logit(s) and locally generated logit(s), FIG. 6, and weights of ML model 1 are adjusted or trained based at least on the mimicry loss term, Lm, FIG. 6).

[0124] Example 2. The apparatus of example 1, wherein the apparatus is further caused to: apply the trained machine learning model in inference mode to perform the radio function. [0125] Example 3. The apparatus of any of examples 1-2, wherein the apparatus is further caused to perform at least one of the following: transmit, by the first node to the second node, a first capability information indicating a capability of the first node for collaborative learning for machine learning models based on a sharing of logits; and receive, by the first node from the second node, a second capability information indicating a capability of the second node for collaborative learning for machine learning models based on a sharing of logits.

[0126] Example 4. The apparatus of example 3, wherein: the first capability information comprises at least one of the following: a set of one or more function identifiers identifying one or more radio functions for which the first node may request and/or provide logits; and a set of classes for each function identifier for which the first node may request and/or provide logits.

[0127] Example 5. The apparatus of claim 3, wherein: the second capability information comprises at least one of the following: a set of one or more function identifiers identifying one or more radio functions for which the second node may request and/or provide logits; and a set of classes for each function identifier for which the second node may request and/or provide logits.

[0128] Example 6. The apparatus of any of examples 1-5, wherein the data collection request comprises at least one of: a periodicity of logits reporting; and/or an indication of one or more classes for which logits are requested to be reported.

[0129] Example 7. The apparatus of any of examples 1-6, wherein the received logits comprise perturbed logits. [0130] Example 8. The apparatus of any of examples 1-7, wherein the apparatus caused to receive logits comprises the apparatus caused to: receive, by the first node from the second node, the report including perturbed logits for the identified radio function, wherein one perturbed logit is provided for each of a plurality of classes, and wherein the perturbed logits comprise logits generated by a machine learning model of the second node which are perturbed and then transmitted to the first node.

[0131] Example 9. The apparatus of any of examples 1-8, wherein the apparatus is further caused to: transmit, by the first node to the second node, an indication of a privacy level or level of perturbation that is requested by the first node for logits.

[0132] Example 10. The apparatus of example 9, wherein the indication of a privacy level or level of perturbation is provided by the first node within either the data collection request or the first capability information, and includes at least one of the following parameters: a differential privacy parameter indicating the privacy level or level of perturbation of the logits; or a parameter related to the privacy level or level of perturbation, including at least one of the following: a mean; a standard deviation; or an epsilon value.

[0133] Example 11. The apparatus of any of claims 1-10, wherein the apparatus is further caused to: receive, by the first node from the second node, an indication of a privacy level or level of perturbation that can be provided by the second node for logits.

[0134] Example 12. The apparatus of example 11, wherein the indication of a privacy level or level of perturbation is provided by the second node within either the data collection response or the second capability information, and includes at least one of the following parameters: a differential privacy parameter indicating the privacy level or level of perturbation of the logits; or a parameter related to the privacy level or level of perturbation, including at least one of the following: a mean; a standard deviation; or an epsilon value.

[0135] Example 13. The apparatus of any of examples 1-12, wherein the apparatus is further caused to: determine, by the first node, a first set of logits generated by the machine learning model of the first node with respect to a first radio function; receive, by the first node from the second node, a second set of logits generated by a machine learning model of the second node with respect to the first function; determine, by the first node, a mimicry loss term based on the first set of logits and the received second set of logits; wherein the apparatus caused to train comprises the apparatus caused to train or adapt weights of the machine learning model of the first node based at least in part on the mimicry loss term.

[0136] Example 14. The apparatus of any of examples 1-13, wherein the first node and second node each comprises one of a gNB, Centralized Unit (CU), a Distributed Unit (DU) or other network node, a user terminal a user equipment or other user device, or a relay node.

[0137] Example 15. The apparatus of example 2, wherein the apparatus caused to apply the trained machine learning model in inference mode comprises the apparatus caused to: receive, by the first node from the second node, a third set of logits for the identified radio function; generate, by the first node, a fourth set of logits for the identified radio function; aggregate, by the first node, respective logits of the third set and fourth set of logits to obtain a set of aggregated logits; determine a set of classes based on the set of aggregated logits, wherein a class of the set of classes is provided for each of the aggregated logits, wherein a probability is assigned to each class; and determine a predicted label that is a class having a highest assigned probability.

[0138] Example 16. The apparatus of example 15, wherein the apparatus is further caused to perform at least one of the following: use the predicted class to perform or assist in performing the identified radio function; or forward the predicted class to another node to be used to perform or assist in performing the identified radio function.

[0139] Example 17. The apparatus of any of examples 15-16, wherein the apparatus caused to generate a fourth set of logits comprises the apparatus caused to: receive or determine, by the first node, a set of inputs for the trained machine learning model; and apply the set of inputs to the trained machine learning model to generate, as outputs of the trained machine learning model, a fourth set of logits for the identified radio function. [0140] Example 18. The apparatus of any of examples 15-17, wherein the third set of logits are generated by another machine learning model at the second node for the identified radio function.

[0141] Example 19. The apparatus of any of examples 15-18, wherein the apparatus caused to determine a set of classes based on the set of aggregated logits comprises the apparatus caused to: apply each aggregated logit, of the set of aggregated logits, to a softmax function to obtain the set of classes for the identified radio function.

[0142] Example 20. The apparatus of any of examples 15-19, wherein the predicted label is associated with a set of one or more intermediate features or measurements that are used, by the first node or another node, to perform the identified radio function.

[0143] Example 21. The apparatus of any of examples 1-20, wherein the identified radio function comprises at least one of the following radio functions: user device positioning; beam prediction; antenna panel or beam control; channel state information feedback; link monitoring; transmit power control; and/or performing handovers or assisting with handovers. [0144] Example 22. The apparatus of any of claims 1-21, wherein the identified radio function comprises user device positioning.

[0145] Example 23. An apparatus (e.g., 1300, FIG. 10;) comprising: at least one processor (e.g., processor 1304, FIG. 10); and at least one memory (e.g., memory 1306, FIG. 10) storing instructions that, when executed by the at least one processor (1304), cause the apparatus at least to: generate (e.g., 310, FIG. 3), by a first node, a first set of logits for a radio function by applying a set of inputs or radio measurements to a machine learning model of the first node; receive (e.g., 320, FIG. 3), by the first node from a second node in a wireless network, a second set of logits for the radio function; aggregate (e.g., 330, FIG. 3) respective logits of the first set and the second set of logits to obtain a set of aggregated logits; determine (e.g., 340, FIG. 3) a set of classes based on the aggregated logits, wherein a class of the set of classes is provided for each of the aggregated logits, and wherein a probability is assigned to each class; and determine (e.g., 350, FIG. 3) a predicted label that is a class, of the set of classes, having a highest assigned probability.

[0146] Example 24. The apparatus of example 23, wherein the apparatus is further caused to perform at least one of the following: use the predicted class to perform or assist in performing the identified radio function; or forward the predicted class to another node to be used to perform or assist in performing the identified radio function.

[0147] Example 25. The apparatus of any of examples 23-24, wherein the apparatus caused to generate a first set of logits comprises the apparatus caused to: receive or determine, by the first node, a set of inputs for the first machine learning model; and apply the set of inputs to the trained machine learning model to generate, as outputs of the trained machine learning model, the first set of logits for the identified radio function.

[0148] Example 26. The apparatus of any of examples 23-25, wherein the second set of logits are generated by another machine learning model at the second node for the identified radio function.

[0149] Example 1. The apparatus of any of examples 23-26, wherein the apparatus caused to determine a set of classes based on the set of aggregated logits comprises the apparatus caused to: apply each aggregated logit, of the set of aggregated logits, to a softmax function to obtain the set of classes for the identified radio function.

[0150] Example 28. The apparatus of any of examples 23-27, wherein the predicted label is associated with a set of one or more intermediate features or measurements that are used, by the first node or another node, to perform or assist in performing the identified radio function. [0151] Example 29. The apparatus of any of examples 23-28, wherein the identified radio function comprises at least one of the following radio functions: user device positioning; beam prediction; antenna panel or beam control; channel state information feedback; link monitoring; transmit power control; and/or performing handovers or assisting with handovers.

[0152] Example 30. Example 1. An apparatus (e.g., 1300, FIG. 10;) comprising: at least one processor (e.g., processor 1304, FIG. 10); and at least one memory (e.g., memory 1306, FIG. 10) storing instructions that, when executed by the at least one processor (1304), cause the apparatus at least to: receive (e.g., 410, FIG. 4), by a second node from a first node within a wireless communications network, a data collection request to request a reporting of logits generated by the second node, wherein the data collection request comprises a function identifier identifying a radio function for which logits reporting is requested; transmit (e.g., 420, FIG. 4), by the second node to the first node, a data collection response accepting the data collection request; and, transmit (e.g., 440, FIG. 4), by the second node to the first node, a report including logits for the identified function to enable the first node to train a machine learning model based on the reported logits.

[0153] Example 31. A method comprising receiving, by a first node from a second node within a wireless communications network, information indicating a capability of the second node to share logits with other nodes; transmitting, by the first node to the second node, a data collection request to request a reporting of logits generated by the second node, wherein the data collection request comprises a function identifier identifying a radio function for which logits reporting is requested; receiving, by the first node from the second node, a data collection response accepting the data collection request; receiving, by the first node from the second node, a report including a first set of logits for the identified radio function; generating, by the first node using the machine learning model for the identified radio function based on a set of radio measurements, a second set of logits; and training, by the first node, a machine learning model of the first node based at least in part on a difference between the first set of logits received from the second node and the second set of logits generated by the first node to minimize or at least decrease a mimicry loss term.

[0154] Example 32. The method of example 31, further comprising: applying the trained machine learning model in inference mode to perform the radio function.

[0155] Example 33. The method of any of examples 31-32, further comprising performing at least one of the following: transmitting, by the first node to the second node, a first capability information indicating a capability of the first node for collaborative learning for machine learning models based on a sharing of logits; and receiving, by the first node from the second node, a second capability information indicating a capability of the second node for collaborative learning for machine learning models based on a sharing of logits. [0156] Example 34. The method of example 33, wherein: the first capability information comprises at least one of the following: a set of one or more function identifiers identifying one or more radio functions for which the first node may request and/or provide logits; and a set of classes for each function identifier for which the first node may request and/or provide logits.

[0157] Example 35. The method of example 33, wherein: the second capability information comprises at least one of the following: a set of one or more function identifiers identifying one or more radio functions for which the second node may request and/or provide logits; and a set of classes for each function identifier for which the second node may request and/or provide logits.

[0158] Example 36. The method of any of examples 31-35, wherein the data collection request comprises at least one of: a periodicity of logits reporting; and/or an indication of one or more classes for which logits are requested to be reported.

[0159] Example 37. The method of any of examples 31-36, wherein the received logits comprise perturbed logits.

[0160] Example 38. The method of any of examples 31-37, wherein the receiving logits comprises receiving, by the first node from the second node, the report including perturbed logits for the identified radio function, wherein one perturbed logit is provided for each of a plurality of classes, and wherein the perturbed logits comprise logits generated by a machine learning model of the second node which are perturbed and then transmitted to the first node. [0161] Example 39. The method of any of examples 31-38, further comprising: transmitting, by the first node to the second node, an indication of a privacy level or level of perturbation that is requested by the first node for logits.

[0162] Example 40. The method of example 39, wherein the indication of a privacy level or level of perturbation is provided by the first node within either the data collection request or the first capability information, and includes at least one of the following parameters: a differential privacy parameter indicating the privacy level or level of perturbation of the logits; or a parameter related to the privacy level or level of perturbation, including at least one of the following: a mean; a standard deviation; or an epsilon value.

[0163] Example 41. The method of any of examples 31-40, further comprising receiving, by the first node from the second node, an indication of a privacy level or level of perturbation that can be provided by the second node for logits.

[0164] Example 42. The method of example 41, wherein the indication of a privacy level or level of perturbation is provided by the second node within either the data collection response or the second capability information, and includes at least one of the following parameters: a differential privacy parameter indicating the privacy level or level of perturbation of the logits; or a parameter related to the privacy level or level of perturbation, including at least one of the following: a mean; a standard deviation; or an epsilon value.

[0165] Example 43. The method of any of examples 31-42, further comprising: determining, by the first node, a first set of logits generated by the machine learning model of the first node with respect to a first radio function; receiving, by the first node from the second node, a second set of logits generated by a machine learning model of the second node with respect to the first function; determining, by the first node, a mimicry loss term based on the first set of logits and the received second set of logits; wherein the training comprises training or adapting weights of the machine learning model of the first node based at least in part on the mimicry loss term.

[0166] Example 44. The method of any of examples 31-43, wherein the first node and second node each comprises one of a gNB, Centralized Unit (CU), a Distributed Unit (DU) or other network node, a user terminal a user equipment or other user device, or a relay node. [0167] Example 45. The method of example 32, wherein the applying the trained machine learning model in inference mode comprises receiving, by the first node from the second node, a third set of logits for the identified radio function; generating, by the first node, a fourth set of logits for the identified radio function; aggregating, by the first node, respective logits of the third set and fourth set of logits to obtain a set of aggregated logits; determining a set of classes based on the set of aggregated logits, wherein a class of the set of classes is provided for each of the aggregated logits, wherein a probability is assigned to each class; and determine a predicted label that is a class having a highest assigned probability.

[0168] Example 46. The method of example 45, further comprising performing at least one of the following: using the predicted class to perform or assist in performing the identified radio function; or forwarding the predicted class to another node to be used to perform or assist in performing the identified radio function.

[0169] Example 47. The method of any of examples 45-56, wherein the generating a fourth set of logits comprises: receiving or determining, by the first node, a set of inputs for the trained machine learning model; and applying the set of inputs to the trained machine learning model to generate, as outputs of the trained machine learning model, a fourth set of logits for the identified radio function.

[0170] Example 48. The method of any of examples 45-47, wherein the third set of logits are generated by another machine learning model at the second node for the identified radio function.

[0171] Example 49. The method of any of examples 45-48, wherein the determining a set of classes based on the set of aggregated logits comprises applying each aggregated logit, of the set of aggregated logits, to a softmax function to obtain the set of classes for the identified radio function.

[0172] Example 50. The method of any of examples 45-49, wherein the predicted label is associated with a set of one or more intermediate features or measurements that are used, by the first node or another node, to perform the identified radio function.

[0173] Example 51. The method of any of examples 31-50, wherein the identified radio function comprises at least one of the following radio functions: user device positioning; beam prediction; antenna panel or beam control; channel state information feedback; link monitoring; transmit power control; and/or performing handovers or assisting with handovers.

[0174] Example 52. The method of any of examples 31-51, wherein the identified radio function comprises user device positioning.

[0175] Example 53. A method comprising: generating, by a first node, a first set of logits for a radio function by applying a set of inputs or radio measurements to a machine learning model of the first node; receiving, by the first node from a second node in a wireless network, a second set of logits for the radio function; aggregating respective logits of the first set and the second set of logits to obtain a set of aggregated logits; determining a set of classes based on the aggregated logits, wherein a class of the set of classes is provided for each of the aggregated logits, and wherein a probability is assigned to each class; and determining a predicted label that is a class, of the set of classes, having a highest assigned probability.

[0176] Example 54. The method of example 53, further comprising performing at least one of the following: using the predicted class to perform or assist in performing the identified radio function; or forwarding the predicted class to another node to be used to perform or assist in performing the identified radio function.

[0177] Example 55. The method of any of examples 53-54, wherein the generating the first set of logits comprises: receiving or determining, by the first node, a set of inputs for the first machine learning model; and applying the set of inputs to the trained machine learning model to generate, as outputs of the trained machine learning model, the first set of logits for the identified radio function.

[0178] Example 56. The method of any of examples 53-55, wherein the second set of logits are generated by another machine learning model at the second node for the identified radio function.

[0179] Example 57. The method of any of examples 53-56, wherein the determining the set of classes based on the set of aggregated logits comprises: applying each aggregated logit, of the set of aggregated logits, to a softmax function to obtain the set of classes for the identified radio function.

[0180] Example 58. The method of any of examples 53-57, wherein the predicted label is associated with a set of one or more intermediate features or measurements that are used, by the first node or another node, to perform or assist in performing the identified radio function. [0181] Example 59. The method of any of examples 53-58, wherein the identified radio function comprises at least one of the following radio functions: user device positioning; beam prediction; antenna panel or beam control; channel state information feedback; link monitoring; transmit power control; and/or performing handovers or assisting with handovers.

[0182] Example 60. A method comprising: receiving, by a second node from a first node within a wireless communications network, a data collection request to request a reporting of logits generated by the second node, wherein the data collection request comprises a function identifier identifying a radio function for which logits reporting is requested; transmitting, by the second node to the first node, a data collection response accepting the data collection request; and transmitting, by the second node to the first node, a report including logits for the identified function to enable the first node to train a machine learning model based on the reported logits.

[0183] FIG. 10 is a block diagram of a wireless station or node (e.g., UE, user device, AP, BS, eNB, gNB, RAN node, network node, TRP, or other node) 1300 according to an example embodiment. The wireless station 1300 may include, for example, one or more (e.g., two as shown in FIG. 10) RF (radio frequency) or wireless transceivers 1302A, 1302B, where each wireless transceiver includes a transmitter to transmit signals and a receiver to receive signals. The wireless station also includes a processor or control unit/entity (controller) 1304 to execute instructions or software and control transmission and receptions of signals, and a memory 1306 to store data and/or instructions.

[0184] Processor 1304 may also make decisions or determinations, generate frames, packets or messages for transmission, decode received frames or messages for further processing, and other tasks or functions described herein. Processor 1304, which may be a baseband processor, for example, may generate messages, packets, frames or other signals for transmission via wireless transceiver 1302 (1302A or 1302B). Processor 1304 may control transmission of signals or messages over a wireless network, and may control the reception of signals or messages, etc., via a wireless network (e.g., after being down-converted by wireless transceiver 1302, for example). Processor 1304 may be programmable and capable of executing software or other instructions stored in memory or on other computer media to perform the various tasks and functions described above, such as one or more of the tasks or methods described above. Processor 1304 may be (or may include), for example, hardware, programmable logic, a programmable processor that executes software or firmware, and/or any combination of these. Using other terminology, processor 1304 and transceiver 1302 together may be considered as a wireless transmitter/receiver system, for example.

[0185] In addition, referring to FIG. 10, a controller (or processor) 1308 may execute software and instructions, and may provide overall control for the station 1300, and may provide control for other systems not shown in FIG. 10, such as controlling input/output devices (e.g., display, keypad), and/or may execute software for one or more applications that may be provided on wireless station 1300, such as, for example, an email program, audio/video applications, a word processor, a Voice over IP application, or other application or software.

[0186] In addition, a storage medium may be provided that includes stored instructions, which when executed by a controller or processor may result in the processor 1304, or other controller or processor, performing one or more of the functions or tasks described above. [0187] According to another example embodiment, RF or wireless transceiver(s) 1302A/1302B may receive signals or data and/or transmit or send signals or data. Processor 1304 (and possibly transceivers 1302A/1302B) may control the RF or wireless transceiver 1302A or 1302B to receive, send, broadcast or transmit signals or data.

[0188] Example embodiments are provided or described for each of the example methods, including: An apparatus (e.g., 1300, FIG. 10) including means (e.g., processor 1304, RF transceivers 1302A and/or 1302B, and/or memory 1306, in FIG. 10) for carrying out any of the methods; a non-transitory computer-readable storage medium (e.g., memory 1306, FIG. 10) comprising instructions stored thereon that, when executed by at least one processor (processor 1304, FIG. 10), are configured to cause a computing system (e.g., 1300, FIG. 10) to perform any of the example methods; and an apparatus (e.g., 1300, FIG. 10) including at least one processor (e.g., processor 1304, FIG. 10), and at least one memory (e.g., memory 1306, FIG. 10) including computer program code, the at least one memory (1306) and the computer program code configured to, with the at least one processor (1304), cause the apparatus (e.g., 1300) at least to perform any of the example methods.

[0189] Embodiments of the various techniques described herein may be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. Embodiments may be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, a data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. Embodiments may also be provided on a computer readable medium or computer readable storage medium, which may be a non-transitory medium. Embodiments of the various techniques may also include embodiments provided via transitory signals or media, and/or programs and/or software embodiments that are downloadable via the Internet or other network(s), either wired networks and/or wireless networks. In addition, embodiments may be provided via machine type communications (MTC), and also via an Internet of Things (IOT).

[0190] As used in this application, the term ‘circuitry’ or “circuit” refers to all of the following: (a) hardware-only circuit implementations, such as implementations in only analog and/or digital circuitry, and (b) combinations of circuits and soft-ware (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software including digital signal processor(s), software, and memory(ies) that work together to cause an apparatus to perform various functions, and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present. This definition of ‘circuitry’ applies to all uses of this term in this application. As a further example, as used in this application, the term ‘circuitry’ would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware. The term ‘circuitry’ would also cover, for example and if applicable to the particular element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or another network device.

[0191] The computer program may be in source code form, object code form, or in some intermediate form, and it may be stored in some sort of carrier, distribution medium, or computer readable medium, which may be any entity or device capable of carrying the program. Such carriers include a record medium, computer memory, read-only memory, photoelectrical and/or electrical carrier signal, telecommunications signal, and software distribution package, for example. Depending on the processing power needed, the computer program may be executed in a single electronic digital computer, or it may be distributed amongst a number of computers.

[0192] Furthermore, embodiments of the various techniques described herein may use a cyber-physical system (CPS) (a system of collaborating computational elements controlling physical entities). CPS may enable the embodiment and exploitation of massive amounts of interconnected ICT devices (sensors, actuators, processors microcontrollers,...) embedded in physical objects at different locations. Mobile cyber physical systems, in which the physical system in question has inherent mobility, are a subcategory of cyber-physical systems. Examples of mobile physical systems include mobile robotics and electronics transported by humans or animals. The rise in popularity of smartphones has increased interest in the area of mobile cyber-physical systems. Therefore, various embodiments of techniques described herein may be provided via one or more of these technologies.

[0193] A computer program, such as the computer program(s) described above, can be written in any form of programming language, including compiled or interpreted languages, and can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit or part of it suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.

[0194] Method steps may be performed by one or more programmable processors executing a computer program or computer program portions to perform functions by operating on input data and generating output. Method steps also may be performed by, and an apparatus may be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).

[0195] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer, chip or chipset. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both.

Elements of a computer may include at least one processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also may include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special purpose logic circuitry.

[0196] To provide for interaction with a user, embodiments may be implemented on a computer having a display device, e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor, for displaying information to the user and a user interface, such as a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.

[0197] Embodiments may be implemented in a computing system that includes a backend component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a frontend component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an embodiment, or any combination of such backend, middleware, or frontend components. Components may be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.

[0198] While certain features of the described embodiments have been illustrated as described herein, many modifications, substitutions, changes and equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the various embodiments.

Claims

WHAT IS CLAIMED IS:

1. An apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive, by a first node from a second node within a wireless communications network, information indicating a capability of the second node to share logits with other nodes; transmit, by the first node to the second node, a data collection request to request a reporting of logits generated by the second node, wherein the data collection request comprises a function identifier identifying a radio function for which logits reporting is requested; receive, by the first node from the second node, a data collection response accepting the data collection request; receive, by the first node from the second node, a report including a first set of logits for the identified radio function; generate, by the first node using the machine learning model for the identified radio function based on a set of radio measurements, a second set of logits; and train, by the first node, a machine learning model of the first node based at least in part on a difference between the first set of logits received from the second node and the second set of logits generated by the first node to minimize or at least decrease a mimicry loss term.

2. The apparatus of claim 1, wherein the apparatus is further caused to: apply the trained machine learning model in inference mode to perform the radio fiinction.

3. The apparatus of any of claims 1-2, wherein the apparatus is fiirther caused to perform at least one of the following: transmit, by the first node to the second node, a first capability information indicating a capability of the first node for collaborative learning for machine learning models based on a sharing of logits; and receive, by the first node from the second node, a second capability information indicating a capability of the second node for collaborative learning for machine learning models based on a sharing of logits.

4. The apparatus of claim 3, wherein: the first capability information comprises at least one of the following: a set of one or more function identifiers identifying one or more radio functions for which the first node may request and/or provide logits; and a set of classes for each function identifier for which the first node may request and/or provide logits.

5. The apparatus of claim 3, wherein: the second capability information comprises at least one of the following: a set of one or more function identifiers identifying one or more radio functions for which the second node may request and/or provide logits; and a set of classes for each function identifier for which the second node may request and/or provide logits.

6. The apparatus of any of claims 1-5, wherein the data collection request comprises at least one of: a periodicity of logits reporting; and/or an indication of one or more classes for which logits are requested to be reported.

7. The apparatus of any of claims 1-6, wherein the received logits comprise perturbed logits.

8. The apparatus of any of claims 1-7, wherein the apparatus caused to receive logits comprises the apparatus caused to: receive, by the first node from the second node, the report including perturbed logits for the identified radio function, wherein one perturbed logit is provided for each of a plurality of classes, and wherein the perturbed logits comprise logits generated by a machine learning model of the second node which are perturbed and then transmitted to the first node.

9. The apparatus of any of claims 1-8, wherein the apparatus is further caused to: transmit, by the first node to the second node, an indication of a privacy level or level of perturbation that is requested by the first node for logits.

10. The apparatus of claim 9, wherein the indication of a privacy level or level of perturbation is provided by the first node within either the data collection request or the first capability information, and includes at least one of the following parameters: a differential privacy parameter indicating the privacy level or level of perturbation of the logits; or a parameter related to the privacy level or level of perturbation, including at least one of the following: a mean; a standard deviation; or an epsilon value.

11. The apparatus of any of claims 1-10, wherein the apparatus is further caused to: receive, by the first node from the second node, an indication of a privacy level or level of perturbation that can be provided by the second node for logits.

12. The apparatus of claim 11, wherein the indication of a privacy level or level of perturbation is provided by the second node within either the data collection response or the second capability information, and includes at least one of the following parameters: a differential privacy parameter indicating the privacy level or level of perturbation of the logits; or a parameter related to the privacy level or level of perturbation, including at least one of the following: a mean; a standard deviation; or an epsilon value.

13. The apparatus of any of claims 1-12, wherein the apparatus is further caused to: determine, by the first node, a first set of logits generated by the machine learning model of the first node with respect to a first radio Junction; receive, by the first node from the second node, a second set of logits generated by a machine learning model of the second node with respect to the first function; determine, by the first node, a mimicry loss term based on the first set of logits and the received second set of logits; wherein the apparatus caused to train comprises the apparatus caused to train or adapt weights of the machine learning model of the first node based at least in part on the mimicry loss term.

14. The apparatus of any of claims 1-13, wherein the first node and second node each comprises one of a gNB, Centralized Unit (CU), a Distributed Unit (DU) or other network node, a user terminal a user equipment or other user device, or a relay node.

15. The apparatus of claim 2, wherein the apparatus caused to apply the trained machine learning model in inference mode comprises the apparatus caused to: receive, by the first node from the second node, a third set of logits for the identified radio function; generate, by the first node, a fourth set of logits for the identified radio function; aggregate, by the first node, respective logits of the third set and fourth set of logits to obtain a set of aggregated logits; determine a set of classes based on the set of aggregated logits, wherein a class of the set of classes is provided for each of the aggregated logits, wherein a probability is assigned to each class; and determine a predicted label that is a class having a highest assigned probability.

16. The apparatus of claim 15, wherein the apparatus is further caused to perform at least one of the following: use the predicted class to perform or assist in performing the identified radio function; or forward the predicted class to another node to be used to perform or assist in performing the identified radio function.

17. The apparatus of any of claims 15-16, wherein the apparatus caused to generate a fourth set of logits comprises the apparatus caused to: receive or determine, by the first node, a set of inputs for the trained machine learning model; and apply the set of inputs to the trained machine learning model to generate, as outputs of the trained machine learning model, a fourth set of logits for the identified radio function.

18. The apparatus of any of claims 15-17, wherein the third set of logits are generated by another machine learning model at the second node for the identified radio function.

19. The apparatus of any of claims 15-18, wherein the apparatus caused to determine a set of classes based on the set of aggregated logits comprises the apparatus caused to: apply each aggregated logit, of the set of aggregated logits, to a softmax function to obtain the set of classes for the identified radio function.

20. The apparatus of any of claims 15-19, wherein the predicted label is associated with a set of one or more intermediate features or measurements that are used, by the first node or another node, to perform the identified radio function.

21. The apparatus of any of claims 1-20, wherein the identified radio function comprises at least one of the following radio functions: user device positioning; beam prediction; antenna panel or beam control; channel state information feedback; link monitoring; transmit power control; and/or performing handovers or assisting with handovers.

22. The apparatus of any of claims 1-21, wherein the identified radio function comprises user device positioning.

23. An apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: generate, by a first node, a first set of logits for a radio function by applying a set of inputs or radio measurements to a machine learning model of the first node; receive, by the first node from a second node in a wireless network, a second set of logits for the radio function; aggregate respective logits of the first set and the second set of logits to obtain a set of aggregated logits; determine a set of classes based on the aggregated logits, wherein a class of the set of classes is provided for each of the aggregated logits, and wherein a probability is assigned to each class; and determine a predicted label that is a class, of the set of classes, having a highest assigned probability.

24. The apparatus of claim 23, wherein the apparatus is further caused to perform at least one of the following: use the predicted class to perform or assist in performing the identified radio function; or forward the predicted class to another node to be used to perform or assist in performing the identified radio Junction.

25. The apparatus of any of claims 23-24, wherein the apparatus caused to generate a first set of logits comprises the apparatus caused to: receive or determine, by the first node, a set of inputs for the first machine learning model; and apply the set of inputs to the trained machine learning model to generate, as outputs of the trained machine learning model, the first set of logits for the identified radio function.

26. The apparatus of any of claims 23-25, wherein the second set of logits are generated by another machine learning model at the second node for the identified radio function.

27. The apparatus of any of claims 23-26, wherein the apparatus caused to determine a set of classes based on the set of aggregated logits comprises the apparatus caused to: apply each aggregated logit, of the set of aggregated logits, to a softmax function to obtain the set of classes for the identified radio function.

28. The apparatus of any of claims 23-27, wherein the predicted label is associated with a set of one or more intermediate features or measurements that are used, by the first node or another node, to perform or assist in performing the identified radio function.

29. The apparatus of any of claims 23-28, wherein the identified radio function comprises at least one of the following radio functions: user device positioning; beam prediction; antenna panel or beam control; channel state information feedback; link monitoring; transmit power control; and/or performing handovers or assisting with handovers.

30. An apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive, by a second node from a first node within a wireless communications network, a data collection request to request a reporting of logits generated by the second node, wherein the data collection request comprises a function identifier identifying a radio function for which logits reporting is requested; transmit, by the second node to the first node, a data collection response accepting the data collection request; transmit, by the second node to the first node, a report including logits for the identified function to enable the first node to train a machine learning model based on the reported logits.