
WO2023224533A1 - Nodes and methods for machine learning (ML)-based CSI reporting - Google Patents


Info

Publication number
WO2023224533A1
Authority
WO
WIPO (PCT)
Prior art keywords
common
decoder
encoder
training
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/SE2023/050474
Other languages
English (en)
Inventor
Athanasios KARAPANTELAKIS
Roy TIMO
Konstantinos Vandikas
Maxim TESLENKO
Hossein SHOKRI GHADIKOLAEI
Abdulrahman ALABBASI
Lackis ELEFTHERIADIS
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB filed Critical Telefonaktiebolaget LM Ericsson AB
Publication of WO2023224533A1


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B17/00Monitoring; Testing
    • H04B17/30Monitoring; Testing of propagation channels
    • H04B17/309Measuring or estimating channel quality parameters
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B17/00Monitoring; Testing
    • H04B17/30Monitoring; Testing of propagation channels
    • H04B17/391Modelling the propagation channel
    • H04B17/3913Predictive models, e.g. based on neural network models

Definitions

  • the embodiments herein relate to nodes and methods for ML-based CSI reporting.
  • a corresponding computer program and a computer program carrier are also disclosed.
  • wireless devices also known as wireless communication devices, mobile stations, stations (STA) and/or User Equipments (UE), communicate via a Local Area Network such as a Wi-Fi network or a Radio Access Network (RAN) to one or more core networks (CN).
  • the RAN covers a geographical area which is divided into service areas or cell areas. Each service area or cell area may provide radio coverage via a beam or a beam group.
  • Each service area or cell area is typically served by a radio access node, e.g., a Wi-Fi access point or a radio base station (RBS), which in some networks may also be denoted, for example, a NodeB, eNodeB (eNB), or gNB as denoted in 5G.
  • a service area or cell area is a geographical area where radio coverage is provided by the radio access node.
  • the radio access node communicates over an air interface operating on radio frequencies with the wireless device within range of the radio access node.
  • the Evolved Packet System (EPS), also called a Fourth Generation (4G) network, comprises the Evolved Universal Terrestrial Radio Access Network (E-UTRAN), also known as the Long Term Evolution (LTE) radio access network
  • EPC Evolved Packet Core
  • SAE System Architecture Evolution
  • E- UTRAN/LTE is a variant of a 3GPP radio access network wherein the radio access nodes are directly connected to the EPC core network rather than to RNCs used in 3G networks.
  • the functions of a 3G RNC are distributed between the radio access nodes, e.g. eNodeBs in LTE, and the core network.
  • the RAN of an EPS has an essentially “flat” architecture comprising radio access nodes connected directly to one or more core networks, i.e. they are not connected to RNCs.
  • the E-UTRAN specification defines a direct interface between the radio access nodes, this interface being denoted the X2 interface.
  • Figure 1 illustrates a simplified wireless communication system.
  • It shows a UE 12, which communicates with one or multiple access nodes 103-104, which in turn are connected to a network node 106.
  • the access nodes 103-104 are part of the radio access network 10.
  • the access nodes 103-104 correspond typically to Evolved NodeBs (eNBs) and the network node 106 corresponds typically to a Mobility Management Entity (MME) and/or a Serving Gateway (SGW).
  • MME Mobility Management Entity
  • SGW Serving Gateway
  • the eNB is part of the radio access network 10, which in this case is the E-UTRAN (Evolved Universal Terrestrial Radio Access Network), while the MME and SGW are both part of the EPC (Evolved Packet Core network).
  • the eNBs are inter-connected via the X2 interface, and connected to EPC via the S1 interface, more specifically via S1-C to the MME and S1-U to the SGW.
  • the access nodes 103-104 correspond typically to a 5G NodeB (gNB) and the network node 106 corresponds typically to an Access and Mobility Management Function (AMF) and/or a User Plane Function (UPF).
  • the gNB is part of the radio access network 10, which in this case is the NG-RAN (Next Generation Radio Access Network), while the AMF and UPF are both part of the 5G Core Network (5GC).
  • the gNBs are inter-connected via the Xn interface, and connected to 5GC via the NG interface, more specifically via NG-C to the AMF and NG-U to the UPF.
  • LTE eNBs may also be connected to the 5G-CN via NG-U/NG-C and support the Xn interface.
  • An eNB connected to 5GC is called a next generation eNB (ng-eNB) and is considered part of the NG-RAN.
  • LTE connected to 5GC will not be discussed further in this document; however, it should be noted that most of the solutions/features described for LTE and NR in this document also apply to LTE connected to 5GC. In this document, when the term LTE is used without further specification it refers to LTE-EPC.
  • NR uses Orthogonal Frequency Division Multiplexing (OFDM) with configurable bandwidths and subcarrier spacing to efficiently support a diverse set of use-cases and deployment scenarios.
  • OFDM Orthogonal Frequency Division Multiplexing
  • NR improves deployment flexibility, user throughputs, latency, and reliability.
  • the throughput performance gains are enabled, in part, by enhanced support for Multi-User Multiple-Input Multiple-Output (MU-MIMO) transmission strategies, where two or more UEs receive data on the same time-frequency resources, i.e., by spatially separated transmissions.
  • MU-MIMO Multi-User Multiple-Input Multiple-Output
  • FIG. 2 illustrates an example transmission and reception chain for MU-MIMO operations. Note that the order of modulation and precoding, or demodulation and combining respectively, may differ depending on the implementation of MU-MIMO transmission.
  • a multi-antenna base station with NTX antenna ports is simultaneously, e.g., on the same OFDM time-frequency resources, transmitting information to several UEs: the sequence S^(1) is transmitted to UE(1), the sequence S^(2) is transmitted to UE(2), and so on.
  • An antenna port may be a logical unit which may comprise one or more antenna elements. Before modulation and transmission, precoding is applied to each sequence to mitigate multiplexing interference - the transmissions are spatially separated.
  • Each UE demodulates its received signal and combines receiver antenna signals to obtain an estimate S^(i) of the transmitted sequence.
  • This estimate S^(i) for UE(i) may be expressed as (neglecting other interference and noise sources except the MU-MIMO interference)
    S^(i) = C^(i) H^(i) W^(i) S^(i) + C^(i) Σ_{j≠i} H^(i) W^(j) S^(j),
    where C^(i) denotes UE(i)'s receive combining, H^(i) its downlink channel, and W^(j) the precoder used for UE(j).
  • the second term represents the spatial multiplexing interference, due to MU-MIMO transmission, seen by UE(i).
  • a goal for a wireless communication network may be to construct a set of precoders to meet a given target.
  • One such target may be to make the norm ||H^(j) W^(i)||, j ≠ i, small (this norm represents the interference of user i's transmission received by user j).
  • In other words, the precoder W^(j) shall correlate well with the channel H^(j) observed by UE(j), whereas it shall correlate poorly with the channels observed by other UEs.
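The correlation target above can be illustrated numerically. The following sketch (pure Python; the 1x2 channel and precoders are hypothetical toy values) computes the received norm ||H^(j) w||: a precoder matched to UE(j)'s channel yields a large norm (useful signal), while a precoder chosen orthogonal to it, as one might choose for another user i, yields near-zero interference at UE(j).

```python
def matvec(H, w):
    """Multiply an Rx-by-Tx channel matrix H by a Tx-dim precoding vector w."""
    return [sum(h * x for h, x in zip(row, w)) for row in H]

def received_norm(H_j, w):
    """Euclidean norm of H^(j) w: the power of a precoded transmission
    as observed at UE(j)'s receive antennas."""
    return sum(abs(v) ** 2 for v in matvec(H_j, w)) ** 0.5

# Toy 1x2 downlink channel of UE(j) (hypothetical unit-norm values).
H_j = [[2 ** -0.5, 1j * 2 ** -0.5]]
w_for_j = [2 ** -0.5, -1j * 2 ** -0.5]  # matched to UE(j): strong useful signal
w_for_i = [2 ** -0.5, 1j * 2 ** -0.5]   # orthogonal choice for user i: no leakage
```

Here received_norm(H_j, w_for_j) evaluates to 1, while received_norm(H_j, w_for_i) evaluates to 0, i.e., user i's transmission causes no interference at UE(j).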
  • SRS Sounding Reference Signals
  • the wireless communication network may directly estimate the uplink channel from SRS and, therefore (by reciprocity), the downlink channel H^(i).
  • the wireless communication network cannot always accurately estimate the downlink channel from uplink reference signals.
  • the uplink and downlink channels use different carriers and, therefore, the uplink channel may not provide enough information about the downlink channel to enable MU-MIMO precoding.
  • FDD frequency division duplex
  • the wireless communication network may only be able to estimate part of the uplink channel using SRS because UEs typically have fewer TX branches than RX branches (in which case only certain columns of the precoding matrix may be estimated using SRS). This situation is known as partial channel knowledge.
  • CSI-RS Channel State Information reference signals
  • the UE estimates the downlink channel, for each of the N antenna ports, from the transmitted CSI-RS. Instead of the full channel, the UE may estimate important features thereof, such as: eigenvectors of the channel or of the Gram matrix of the channel; one or more eigenvectors that correspond to the largest eigenvalues of an estimated channel covariance matrix; one or more Discrete Fourier Transform (DFT) basis vectors, or orthogonal vectors from any other suitable and defined vector space, that best correlate with an estimated channel matrix or an estimated channel covariance matrix; or the channel delay profile.
  • DFT Discrete Fourier Transform
  • the UE reports CSI (e.g., channel quality index (CQI), precoding matrix indicator (PMI), rank indicator (RI)) to the wireless communication network over an uplink control channel and/or over a data channel.
  • CSI e.g., channel quality index (CQI), precoding matrix indicator (PMI), rank indicator (RI)
  • the wireless communication network uses the UE’s feedback, e.g., the CSI reported from the UE, for downlink user scheduling and MIMO precoding.
  • both Type I and Type II reporting are configurable, where the CSI Type II reporting protocol has been specifically designed to enable MU-MIMO operations from uplink UE reports, such as the CSI reports.
  • the CSI Type II normal reporting mode is based on the specification of sets of Discrete Fourier Transform (DFT) basis functions in a precoder codebook.
  • the UE selects and reports L DFT vectors from the codebook that best match its channel conditions (like the classical codebook precoding matrix indicator (PMI) from earlier 3GPP releases).
  • the number of DFT vectors L is typically 2 or 4 and it is configurable by the wireless communication network.
  • the UE reports how the L DFT vectors should be combined in terms of relative amplitude scaling and co-phasing.
  • Algorithms to select L, the L DFT vectors, and co-phasing coefficients are outside the specification scope - left to UE and network implementation. Or, put another way, the 3GPP Rel. 16 specification only defines signaling protocols to enable the above message exchanges.
  • DFT beams will be used interchangeably with DFT vectors. This slight shift of terminology is appropriate whenever the base station has a uniform planar array with antenna elements separated by half of the carrier wavelength.
  • the CSI Type II normal reporting mode is illustrated in Figure 3, and described in 3GPP TS 38.214, "Physical layer procedures for data" (Release 16). The selection and reporting of the L DFT vectors b_n and their relative amplitudes a_n is done in a wideband manner; that is, the same beams are used for both polarizations over the entire transmission frequency band.
  • the selection and reporting of the DFT vector co-phasing coefficients are done in a subband manner; that is, DFT vector co-phasing parameters are determined for each of multiple subsets of contiguous subcarriers.
  • the co-phasing parameters are quantized such that e^(jθ_n) is taken from either a Quadrature Phase-Shift Keying (QPSK) or 8-Phase Shift Keying (8PSK) signal constellation.
  • QPSK Quadrature phase-shift keying
  • 8PSK 8-Phase Shift Keying
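The co-phasing quantization can be sketched as follows; the round-to-nearest-constellation-point rule is an illustrative assumption (the exact codebook mapping and constellation offsets in the 3GPP specification are omitted).

```python
import cmath
import math

def quantize_cophasing(theta, constellation="QPSK"):
    """Snap a co-phasing angle theta (radians) to the nearest point of a
    QPSK (4-point) or 8PSK (8-point) constellation; returns e^(j*theta_q)."""
    m = 4 if constellation == "QPSK" else 8
    step = 2 * math.pi / m          # angular spacing between constellation points
    theta_q = round(theta / step) * step
    return cmath.exp(1j * theta_q)
```

For example, a small angle quantizes to 1 under QPSK, while an angle near pi/2 quantizes to j; 8PSK halves the quantization step and thus the maximum phase error.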
  • the precoder W reported by the UE to the network can be expressed as follows: for each subband and polarization, the reported precoding vector is the combination
    w = a_1 e^(jθ_1) b_1 + ... + a_L e^(jθ_L) b_L
    of the L selected DFT beams b_n, their wideband relative amplitudes a_n, and their subband co-phasing factors e^(jθ_n).
  • the Type II CSI report can be used by the network to co-schedule multiple UEs on the same OFDM time-frequency resources. For example, the network can select UEs that have reported different sets of DFT vectors with weak correlations.
  • the CSI Type II report enables the UE to report a precoder hypothesis that trades CSI resolution against uplink transmission overhead.
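The Type II combination of L DFT beams with relative amplitude scaling and co-phasing can be sketched as below (pure Python; the beam indices, amplitudes, and phases are illustrative values, not codebook entries).

```python
import cmath
import math

def dft_beam(n, num_ports):
    """The n-th DFT vector over num_ports antenna ports (unit norm)."""
    return [cmath.exp(2j * math.pi * n * p / num_ports) / math.sqrt(num_ports)
            for p in range(num_ports)]

def type2_precoder(beam_indices, amplitudes, cophases, num_ports):
    """Linear combination w = sum_n a_n * e^(j*theta_n) * b_n of L DFT beams,
    mirroring the Type II reporting structure described in the text."""
    w = [0j] * num_ports
    for n, a, theta in zip(beam_indices, amplitudes, cophases):
        b = dft_beam(n, num_ports)
        for p in range(num_ports):
            w[p] += a * cmath.exp(1j * theta) * b[p]
    return w

# Hypothetical report: L=2 beams, relative amplitudes, and co-phasing angles.
w = type2_precoder([0, 1], [1.0, 0.5], [0.0, math.pi / 2], num_ports=4)
```

With a single beam of index 0, amplitude 1, and zero co-phasing over 4 ports, the result is simply the flat DFT beam with every entry 0.5.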
  • NR 3GPP Release 15 supports Type II CSI feedback using port selection mode, in addition to the above normal reporting mode. In this case,
  • the base station transmits a CSI-RS port in each one of the beam directions.
  • the UE does not use a codebook to select a DFT vector (a beam), instead the UE selects one or multiple antenna ports from the CSI-RS resource of multiple ports.
  • Type II CSI feedback using port selection gives the base station some flexibility to use non-standardized precoders that are transparent to the UE.
  • the precoder reported by the UE can be described as follows:
    w = a_1 e^(jθ_1) e_1 + ... + a_L e^(jθ_L) e_L,
    where each e_n is a port-selection vector.
  • the vector e is a unit vector with only one non-zero element, which can be viewed as a selection vector that selects a port from the set of ports in the measured CSI- RS resource.
  • the UE thus feeds back which ports it has selected, the amplitude factors and the co-phasing factors.
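The port-selection variant replaces the DFT beams with one-hot selection vectors; a minimal sketch follows (the selected port indices and coefficients are hypothetical).

```python
import cmath

def selection_vector(port, num_ports):
    """Unit vector e with a single non-zero element: selects one CSI-RS port."""
    e = [0.0] * num_ports
    e[port] = 1.0
    return e

def port_selection_precoder(selected_ports, amplitudes, cophases, num_ports):
    """w = sum_n a_n * e^(j*theta_n) * e_{p_n}: amplitude-scaled, co-phased
    combination of the selected ports, as described above."""
    w = [0j] * num_ports
    for p, a, theta in zip(selected_ports, amplitudes, cophases):
        e = selection_vector(p, num_ports)
        for k in range(num_ports):
            w[k] += a * cmath.exp(1j * theta) * e[k]
    return w
```

Selecting only port 1 with amplitude 0.7 and zero co-phasing over 4 ports produces a precoder that is zero everywhere except entry 1.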
  • An AE is a type of Neural Network (NN) that may be used to compress and decompress data in an unsupervised manner.
  • NN Neural Network
  • Unsupervised learning is a type of machine learning in which the algorithm is not provided with any pre-assigned labels or scores for the training data.
  • unsupervised learning algorithms may first self-discover any naturally occurring patterns in that training data set. Common examples include clustering, where the algorithm automatically groups its training examples into categories with similar features, and principal component analysis, where the algorithm finds ways to compress the training data set by identifying which features are most useful for discriminating between different training examples and discarding the rest. This contrasts with supervised learning in which the training data include pre-assigned category labels, often by a human, or from the output of non-learning classification algorithm.
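As a toy illustration of unsupervised pattern discovery, the following 1-D k-means sketch groups unlabeled samples into clusters with similar features, with no pre-assigned labels involved (all values are illustrative).

```python
def kmeans_1d(xs, centers, iters=20):
    """Tiny 1-D k-means: repeatedly assign each sample to its nearest center,
    then move each center to the mean of its assigned samples."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in xs:
            i = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[i].append(x)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers
```

On samples that fall into two well-separated groups (e.g., around 0.15 and around 5.0), the centers converge to the two group means without any labels being provided.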
  • Figure 4a illustrates a fully connected (dense) AE.
  • the AE may be divided into two parts: an encoder (used to compress the input data ), and a decoder (used to recover important features of the input data).
  • the encoder and decoder are separated by a bottleneck layer that holds a compressed representation, Y in Figure 4a, of the input data X.
  • the variable Y is sometimes called the latent representation of the input X. More specifically,
  • the size of the bottleneck (latent representation) Y is smaller than the size of the input data X.
  • the AE encoder thus compresses the input features X to Y.
  • the decoder part of the AE tries to invert the encoder’s compression and reconstruct X with minimal error, according to some predefined loss function.
  • AEs may have different architectures. For example, AEs may be based on dense NNs (like Figure 4a), multi-dimensional convolution NNs, recurrent NNs, transformer NNs, or any combination thereof. However, all AEs architectures possess an encoder- bottleneck-decoder structure.
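The common encoder-bottleneck-decoder structure can be sketched generically as below; the weights are illustrative placeholders, and the point is only that the latent Y is smaller than the input X and that the decoder's reconstruction may be lossy.

```python
def dense(x, W, b, act=None):
    """One fully connected layer: y = act(W x + b)."""
    y = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [act(v) for v in y] if act else y

def autoencoder_forward(x, enc_layers, dec_layers):
    """Encoder compresses x to a latent y (the bottleneck); the decoder
    tries to reconstruct x from y. Layers are (W, b, activation) triples."""
    y = x
    for W, b, act in enc_layers:
        y = dense(y, W, b, act)
    latent = y                       # compressed representation Y
    for W, b, act in dec_layers:
        y = dense(y, W, b, act)
    return latent, y
```

With a toy encoder that keeps the first two of four input features and a decoder that places them back, a 4-dim input is compressed to a 2-dim latent and reconstructed with the discarded features lost.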
  • Figure 4b illustrates how an AE may be used for Al-based CSI reporting in NR during an inference phase (that is, during live network operation).
  • the UE estimates the downlink channel (or important features thereof) using configured downlink reference signal(s), e.g., CSI-RS.
  • configured downlink reference signal(s) e.g., CSI-RS.
  • the UE estimates the downlink channel as a 3D complex-valued tensor, with dimensions defined by the gNB’s Tx-antenna ports, the UE’s Rx antenna ports, and frequency units (the granularity of which is configurable, e.g., SubCarrier (SC) or subband).
  • SC SubCarrier
  • the 3D complex-valued tensor is illustrated as a rectangular hexahedron with lengths of the sides defined by the gNB’s Tx-antenna ports, the UE’s Rx antenna ports, and frequency (SC).
  • the UE uses a trained AE encoder to compress the estimated channel or important features thereof down to a binary codeword.
  • the binary codeword is reported to the network over an uplink control channel and/or data channel.
  • this codeword will likely form one part of a channel state information (CSI) report that may also include rank, channel quality, and interference information.
  • CSI may be used for MU-MIMO precoding to shape an “energy pattern” of a wireless signal transmitted by the gNB.
  • the network uses a trained AE decoder to reconstruct the estimated channel or the important features thereof.
  • the decompressed output of the AE decoder is used by the network in, for example, MIMO precoding, scheduling, and link adaption.
  • the architecture of an AE may need to be tailored for each particular use case, e.g., for CSI reporting.
  • the tailoring may be achieved via a process called hyperparameter tuning.
  • properties of the data e.g., CSI-RS channel estimates
  • the channel size e.g., uplink feedback rate, and hardware limitations of the encoder and decoder may all need to be considered when designing the AE’s architecture.
  • the training datasets need to be representative of the actual data the AE will encounter during live operation in a network.
  • the training process involves numerically tuning the AE’s trainable parameters (e.g., the weights and biases of the underlying NN) to minimize a loss function on the training datasets.
  • the loss function may be, for example, the Mean Squared Error (MSE) loss calculated as the average of the squared error between the UE’s downlink channel estimate H and the network’s reconstruction Ĥ, i.e., the average of ||H - Ĥ||².
  • MSE Mean Squared Error
  • the training process is typically based on some variant of the gradient descent algorithm, which, at its core, comprises three components: a feedforward step, a back propagation step, and a parameter optimization step.
  • For concreteness, consider a dense AE, i.e., a dense NN with a bottleneck layer, see Figure 4a.
  • Feedforward: A batch of training data, such as a mini-batch (e.g., several downlink-channel estimates), is pushed through the AE, from the input to the output.
  • the loss function is used to compute the reconstruction loss for all training samples in the batch.
  • the reconstruction loss may be an average reconstruction loss for all training samples in the batch.
  • BP Back propagation
  • the gradients (partial derivatives of the loss function, L, with respect to each trainable parameter in the AE) are computed.
  • the back propagation algorithm sequentially works backwards from the AE output, layer-by-layer, back through the AE to the input.
  • the back propagation algorithm is built around the chain rule for differentiation: when computing the gradients for layer n in the AE, it uses the gradients for layer n + 1. For a dense AE with N layers, the back propagation calculations for layer n may be expressed with the following equations:
    δ^(N) = ∇_a L * σ′(z^(N)),
    δ^(n) = ((W^(n+1))^T δ^(n+1)) * σ′(z^(n)),
    ∂L/∂W^(n) = δ^(n) (a^(n-1))^T,  ∂L/∂b^(n) = δ^(n),
    where * here denotes the Hadamard multiplication of two vectors, z^(n) and a^(n) are the pre- and post-activation outputs of layer n, and σ′ is the derivative of the activation function.
  • A core idea here is to make small adjustments to each parameter with the aim of reducing the loss over the (mini) batch. It is common to use special optimizers to update the AE’s trainable parameters using gradient information. The following optimizers are widely used to reduce training time and improve overall performance: adaptive subgradient methods (AdaGrad), RMSProp, and adaptive moment estimation (ADAM).
  • AdaGrad adaptive subgradient methods
  • RMSProp RMSProp
  • ADAM adaptive moment estimation
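The three components above (feedforward, back propagation, parameter update) can be sketched end-to-end for a toy linear AE with a 1-dimensional bottleneck. Plain gradient descent is used instead of the named optimizers, and the sizes, initial weights, and learning rate are all illustrative.

```python
def train_linear_ae(data, lr=0.02, epochs=500):
    """Train a tiny linear AE (2-dim input -> 1-dim bottleneck -> 2-dim
    reconstruction) with feedforward, back propagation, and gradient descent."""
    w_enc = [0.3, 0.3]   # encoder weights (deterministic init for clarity)
    w_dec = [0.3, 0.3]   # decoder weights
    for _ in range(epochs):
        for x in data:
            # 1. Feedforward: push the sample through encoder and decoder.
            y = w_enc[0] * x[0] + w_enc[1] * x[1]        # bottleneck (latent)
            xhat = [w_dec[0] * y, w_dec[1] * y]          # reconstruction
            # 2. Back propagation: chain rule from output back towards input.
            d_xhat = [2 * (xhat[k] - x[k]) for k in range(2)]
            d_wdec = [d_xhat[k] * y for k in range(2)]
            d_y = d_xhat[0] * w_dec[0] + d_xhat[1] * w_dec[1]
            d_wenc = [d_y * x[k] for k in range(2)]
            # 3. Parameter update: plain gradient descent (no optimizer).
            for k in range(2):
                w_dec[k] -= lr * d_wdec[k]
                w_enc[k] -= lr * d_wenc[k]
    return w_enc, w_dec

def reconstruction_mse(data, w_enc, w_dec):
    """Average squared reconstruction error over a batch (the loss above)."""
    total = 0.0
    for x in data:
        y = w_enc[0] * x[0] + w_enc[1] * x[1]
        total += (x[0] - w_dec[0] * y) ** 2 + (x[1] - w_dec[1] * y) ** 2
    return total / len(data)
```

On training samples lying on a line (so a rank-1 latent suffices), the loss drops toward zero, illustrating how the numerical tuning of weights minimizes the loss over the training set.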
  • An acceptable level of performance may refer to the AE achieving a pre-defined average reconstruction error over the training dataset (e.g., normalized MSE of the reconstruction error over the training dataset is less than, say, 0.1).
  • it may refer to the AE achieving a pre-defined user data throughput gain with respect to a baseline CSI reporting method (e.g., a MIMO precoding method is selected, and user throughputs are separately estimated for the baseline and the AE CSI reporting methods).
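A pre-defined acceptance criterion such as the normalized-MSE threshold mentioned above might be checked as follows; the 0.1 threshold is the illustrative figure from the text, and the flattened real-valued channel representation is an assumption for simplicity.

```python
def nmse(h_true, h_rec):
    """Normalized MSE: ||H - H_hat||^2 / ||H||^2 for one (flattened) sample."""
    num = sum((a - b) ** 2 for a, b in zip(h_true, h_rec))
    den = sum(a ** 2 for a in h_true)
    return num / den

def acceptable(h_true, h_rec, threshold=0.1):
    """Pre-defined acceptance criterion: reconstruction NMSE below a threshold."""
    return nmse(h_true, h_rec) < threshold
```

A near-perfect reconstruction passes the check, while a trivial all-zero reconstruction (NMSE = 1) fails it.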
  • the above steps use numerical methods (e.g., gradient descent) to optimize the AE’s trainable parameters (e.g., weights and biases).
  • the training process typically involves optimizing many other parameters (e.g., higher-level hyperparameters that define the model or the training process).
  • Some example hyperparameters are as follows:
  • the architecture of the AE e.g., convolutional, transformer
  • types of layers e.g., dense
  • Architecture-specific parameters e.g., the number of nodes per layer in a dense network, or the kernel sizes of a convolutional network.
  • the depth or size of the AE e.g., number of layers.
  • the mini-batch size (e.g., the number of channel samples fed into each iteration of the above training steps).
  • the regularization method e.g., weight regularization or dropout
  • Additional validation datasets may be used to tune such hyperparameters.
  • the process of designing or creating an AE may be expensive - consuming significant time, compute, memory, and power resources.
  • AE-based CSI reporting is of interest for 3GPP Rel 18 “AI/ML on PHY” study item for example because of the following reasons:
  • AEs may include non-linear transformations (e.g., activation functions) that help improve compression performance and, therefore, help improve MU-MIMO performance for the same uplink overhead.
  • non-linear transformations e.g., activation functions
  • the normal Type II CSI codebooks in 3GPP Rel 16 are based on linear DFT transformations and Singular Value Decomposition (SVD), which cannot fully exploit redundancies in the channel for compression.
  • AEs may be trained to exploit long-term redundancies in the propagation environment and/or site (e.g., antenna configuration) for compression purposes. For example, a particular AE does not need to work well for all possible deployments. Improved compression performance is obtained by learning which channel inputs it needs to (and doesn’t need to) reliably reconstruct at the base-station. AEs may be trained to compensate for antenna array irregularities, including, for example, non-uniformly spaced antenna elements and non-half wavelength element spacing.
  • the Type II CSI codebooks in Rel 15 and 16 use a two- dimensional DFT codebook designed for a regular planar array with perfect half wavelength element spacing.
  • AEs may be trained to be robust against, or updated (e.g., via transfer learning and training) to compensate for, partially failing hardware as the massive MIMO product ages. For example, over time one or more of the multiple Tx and Rx radio chains in the massive MIMO antenna arrays at the base station may fail, compromising the effectiveness of Type II CSI feedback.
  • Transfer learning implies that parts of a previous neural network that has learned a different but often related task is transferred to the current network in order to speed up the learning process of the current network.
  • the AE training process may be a highly iterative process that may be expensive - consuming significant time, compute, memory, and power resources.
  • AE architecture design and training will largely be performed offline, e.g., in a development environment, using appropriate compute infrastructure, training data, validation data, and test data.
  • Data for training, validation, and testing may be collected from one or more of the following examples: real measurements recorded in live networks, synthetic radio channel data from, e.g., 3GPP channel models or ray tracing models and/or digital twins, and mobile drive tests.
  • Validation data may be part of the development and tuning of the NN, whereas the test data may be applied to the final NN.
  • a “validation dataset” may be used to optimize AE hyperparameters (like its architecture).
  • two different AE architectures may be trained on the same training dataset. Then the performance of the two trained AE architectures may be validated on the validation dataset. The architecture with the best performance on the validation dataset may be kept for the inference phase.
  • validation may be performed on the same data set as the training, but on “unseen” data samples (e.g. taken from the same source). Test may be performed on a new data set, usually from another source and it tests the NN ability to generalize.
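The validation-based selection between architectures described above can be sketched as below; the (train_fn, eval_fn) candidate interface is a hypothetical stand-in for a real training pipeline, not an existing API.

```python
def select_architecture(candidates, train, validate):
    """Train each candidate architecture on the same training set, then keep
    the one with the lowest validation loss, as described in the text.
    `candidates` maps a name to a (train_fn, eval_fn) pair (hypothetical)."""
    best_name, best_loss, best_model = None, float("inf"), None
    for name, (train_fn, eval_fn) in candidates.items():
        model = train_fn(train)               # fit on the training dataset
        loss = eval_fn(model, validate)       # score on the validation dataset
        if loss < best_loss:
            best_name, best_loss, best_model = name, loss, model
    return best_name, best_model

# Dummy candidates with fixed validation losses, purely for illustration.
cands = {
    "deep":    (lambda tr: "deep-model",    lambda m, v: 0.5),
    "shallow": (lambda tr: "shallow-model", lambda m, v: 0.2),
}
```

With these dummy losses, the "shallow" candidate would be kept for the inference phase; a final test set (from another source) would then probe its ability to generalize.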
  • the training of the AE in Figure 4c has some similarities with split NNs, where an NN is split into two or more sections and where each section consists of one or several consecutive layers of the NN.
  • These sections of the NN may be in different entities/nodes and each entity may perform both feedforward and back propagations. For example, in the case of splitting the NN into two sections, the feedforward outputs of a first section are pushed to a second section. Conversely, in the back propagation step, the gradients of the first layer of the second section are pushed into the last layer of the first section.
  • the split NN (a.k.a. split learning) was introduced primarily to address privacy issues with user data.
  • the privacy (proprietary) aspects of the sections are of interest, and training channel data may need to be shared to calculate reconstruction errors.
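The feedforward/back-propagation exchange between two sections of a split NN can be sketched with scalar "sections": each entity holds its own parameters, and only activations (forward) and gradients (backward) cross the cut between them (all values are illustrative).

```python
class Section:
    """One section of a split NN (here a single scalar weight). It caches its
    input during feedforward and computes local gradients during backprop."""
    def __init__(self, w, lr=0.1):
        self.w, self.lr, self.x = w, lr, None

    def forward(self, x):
        self.x = x                  # cache input for the backward pass
        return self.w * x

    def backward(self, grad_out):
        grad_in = grad_out * self.w              # gradient pushed to previous section
        self.w -= self.lr * grad_out * self.x    # local parameter update
        return grad_in

# Two sections, conceptually in different entities/nodes.
first, second = Section(0.5), Section(0.5)
x, target = 1.0, 1.0
for _ in range(100):
    y = first.forward(x)           # feedforward output of the first section...
    z = second.forward(y)          # ...pushed into the second section
    grad_z = 2 * (z - target)      # loss gradient at the output, L = (z - t)^2
    grad_y = second.backward(grad_z)   # gradient at the second section's input...
    first.backward(grad_y)             # ...pushed into the first section's last layer
```

After training, the product of the two sections' weights approaches 1, i.e., the split model fits the target even though neither entity ever sees the other's parameters.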
  • the AE encoder is in the UE and the AE decoder is in the wireless communications network, usually in the radio access network.
  • the UE and the wireless communications network are typically represented by different vendors (manufacturers), and, therefore, the AE solution needs to be viewed from a multi-vendor perspective with potential standardization (e.g., 3GPP standardization) impacts.
  • the UE performs channel encoding and the network performs channel decoding.
  • the channel encoders have been specified in 3GPP, which ensures that the UE’s behaviour is understood by the network and may be tested.
  • the channel decoders are left for implementation (vendor proprietary).
  • AE-based CSI encoders for use in the UEs
  • the corresponding AE decoders in the network may be left for implementation (e.g., constructed in a proprietary manner by training the decoders against specified AE encoders).
  • Figure 4d illustrates a network vendor training of an AE decoder with a specified (untrainable) AE encoder.
  • a training method for the decoder may comprise comparing a loss function of the channel and the decoded channel, or some features thereof, computing the gradients (partial derivatives of the loss function, L, with respect to each trainable parameter in the AE) by back propagation, and updating the decoder weights and biases.
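The decoder-only training described above (Figure 4d) can be sketched as follows; the "specified" encoder here is a hypothetical stand-in for a standardized, untrainable mapping, and only the decoder's parameter is updated by back propagation.

```python
def specified_encoder(x):
    """Stands in for a standardized, untrainable AE encoder (hypothetical:
    here it simply halves its input)."""
    return 0.5 * x

def train_decoder(samples, lr=0.05, epochs=200):
    """Only the decoder weight is updated; gradients stop at the encoder
    output because the encoder's parameters are fixed by specification."""
    w_dec = 0.0
    for _ in range(epochs):
        for x in samples:
            y = specified_encoder(x)       # feedforward through frozen encoder
            xhat = w_dec * y               # trainable decoder
            grad = 2 * (xhat - x) * y      # dL/dw_dec for L = (x - xhat)^2
            w_dec -= lr * grad             # update decoder weights only
    return w_dec
```

Since the frozen encoder halves its input, the trained decoder weight converges to 2, i.e., the decoder learns to invert the specified encoder from data alone.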
  • Channel coding has a long and well-developed academic literature that enabled 3GPP to pre-select a few candidate architectures (or types); namely, turbo codes, linear parity check codes, and polar codes.
  • Channel codes may all be mathematically described as linear mappings that, in turn, may be written into a standard. Therefore, synthetic channel models may be sufficient to design, study, compare, and specify channel codes for 5G.
  • AEs for CSI feedback have more architectural options and require many tuneable parameters (possibly hundreds of thousands). It is preferred that the AEs are trained, at least in part, on real field data that accurately represents live, in-network, conditions.
  • In a first scenario, the AE encoder, the AE decoder, or both may be standardized:
    o Training within 3GPP (e.g., NN architectures, weights and biases are specified),
    o Training outside 3GPP (e.g., NN architectures are specified),
    o Signalling for AE-based CSI reporting/configuration is specified.
  • In a second scenario, the AE encoder and AE decoder may be implementation specific (vendor proprietary):
    o Interfaces to the AE encoder and AE decoder are specified,
    o Signalling for AE-based CSI reporting/configuration is specified.
  • AE-based CSI reporting has at least the following implementation/standardization challenges and issues to solve:
  • the AE encoder and the AE decoder may be complicated NNs with thousands of tuneable parameters (e.g., weights and biases) that potentially need to be open and shared, e.g., through signalling, between the network and UE vendors.
  • tuneable parameters e.g., weights and biases
  • the UE’s compute and/or power resources are limited, so the AE encoder will likely need to be known in advance to the UE such that the UE implementation may be optimized for its task.
    o The AE encoder’s architecture will most likely need to match chipset vendors’ hardware, and the model (with weights and biases possibly fixed) will need to be compiled with appropriate optimizations.
  • the process of compiling the AE encoder may be costly in time, compute, power, and memory resources. Moreover, the compilation process requires specialized software tool chains to be installed and maintained on each UE.
  • the AE may depend on the UE’s, and/or network’s, antenna layout and RF chains, meaning that many different trained AEs (NNs) may be required to support all types of base station and UE designs.
  • the AE design is data driven meaning that the AE performance will depend on the training data.
  • a specified AE (either encoder or decoder or both) developed using synthetic training data (e.g., specified 3GPP channel models) may not generalize well to radio channels observed in real deployments.
  • synthetic training data e.g., specified 3GPP channel models
  • overfitting means that the AE generalizes poorly to real data, or data observed in the field, e.g., the AE achieves good performance on the training dataset, but when used in the real world, e.g., on the test set, it has poor performance.
  • the joint training procedure may protect proprietary implementations of the AE encoder and decoder; that is, it may not expose details of the encoder and/or decoder trained weights and loss function to the other party.
  • UEs potentially need to implement different encoders for different NW vendors’ decoders.
  • NW vendors may need to implement different decoders for different UE/chipset vendors’ encoders.
  • a UE will need to handover and/or roam between gNBs of different NW vendors (it is commonplace for a mobile network operator to use different NW vendors).
  • a gNB will serve UEs that have different radio chipset vendors.
  • the CSI encoders and decoders may be built on proprietary architectures and thus be different among radio chipset vendors. The same is true for NW vendors.
  • both the UE and the NW need to either maintain all models in memory or switch between models when needed. Both approaches are computationally expensive.
  • Maintaining models in memory is a power-consuming task, as multiple models of different complexity will need to be stored in a form of volatile memory, in most cases Random-Access Memory (RAM).
  • the NW constructs a training dataset for each UE AE encoder by logging the UE’s CSI report received over the air interface (the AE encoder output) together with the NW’s SRS-based estimate of the UL channel.
  • the resulting dataset may then be used to train the NW’s AE decoder without having to know the UE’s AE encoder since the NW knows, from the dataset, both the input and the output of the encoder.
  • This solution assumes that the CSI-RS based estimated downlink channel measured by the UE, i.e., the input to the AE encoder, can be well approximated by the uplink channel measured by the NW using the SRSs.
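As a concrete illustration of this dataset-based approach, the sketch below fits a deliberately simplistic, linear decoder purely from logged (codeword, SRS channel estimate) pairs, without any access to the encoder that produced the codewords. All dimensions, the synthetic data, and the linear decoder itself are illustrative assumptions, not part of the embodiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged data: for every CSI report, the NW stores the
# received codeword (the UE encoder's output) together with its own
# SRS-based estimate of the uplink channel, here assumed to match the
# downlink channel that the UE encoded.
n_samples, codeword_dim, channel_dim = 1000, 8, 32
codewords = rng.normal(size=(n_samples, codeword_dim))
true_map = rng.normal(size=(codeword_dim, channel_dim))
srs_channels = codewords @ true_map + 0.01 * rng.normal(size=(n_samples, channel_dim))

# Fit a linear decoder to the logged pairs by least squares. The NW never
# touches the UE encoder itself -- only the logged inputs and outputs.
W_dec, *_ = np.linalg.lstsq(codewords, srs_channels, rcond=None)

mse = float(np.mean((codewords @ W_dec - srs_channels) ** 2))
```

In practice the decoder would be a trained NN rather than a least-squares fit, but the data-collection principle is the same.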
  • the UE vendor may implement a proprietary mapping (e.g., an NN) from the channel measurements on its receive antenna ports (e.g. the CSI-RS-based channel estimate) to a standardized channel feature space.
  • the standardized channel feature space may be a latent representation of the channel designed using, for example, DFT basis vectors.
  • One solution to address the challenge of operating multiple encoders on the UE side, or decoders on the NW side, as a result of their proprietary parts, may be to use a model aggregation approach.
  • In model aggregation, individually trained models are used in combination to provide better predictions.
  • An example of a model aggregation approach is ensemble learning techniques such as boosting and bagging, which use multiple algorithms to obtain a better prediction than when using one of such algorithms in isolation.
  • One of their main disadvantages is that they are computationally expensive, as multiple iterations or parallel executions of different algorithms are required.
  • the NW may have to deploy and maintain many different decoders - potentially one for each UE encoder.
  • a UE may have to train and maintain multiple encoders, to be able to provide compression of CSI in a NW-vendor compatible way, for the potentially many vendors a UE will encounter throughout its lifetime.
  • supporting many UE encoder or NW decoder models may result in excessive training and model management costs (e.g., computational and power consumption-related costs), especially in a non-stationary network where the distribution (second order statistics) of the channel changes and therefore each of those models may need to be retrained.
  • an object of embodiments herein may be to obviate some of the above-mentioned problems. Specifically, an object of embodiments herein may be to train CSI AE-encoders for a multi-vendor environment.
  • the object is achieved by a method, performed by a first node for training a first Neural Network, NN, -based encoder or decoder of a system of two or more encoders or decoders.
  • the method comprises:
  • the method comprises receiving, from a second node configured for training a second encoder or decoder, a proposal for a common loss calculation method for training the two or more encoders or decoders and a proposal for a set of common NN architecture parameters.
  • the method further comprises determining a common loss calculation method for training the two or more encoders or decoders based on the received proposal for the common loss calculation method.
  • the method further comprises determining a set of common NN architecture parameters for training the two or more encoders or decoders based on the received proposal for the set of common NN architecture parameters.
  • the method further comprises training the first encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and first encoded channel data.
  • the method further comprises providing a first set of trained values of common trainable decoder parameters to a third node.
  • the method further comprises receiving, from the third node, common updated values of the common trainable decoder parameters.
  • the object is achieved by a first node.
  • the first node is configured to perform the method according to the first aspect above.
  • the object is achieved by a method, performed by a second node for training a second NN-based encoder or decoder of a system of two or more NN-based encoders or decoders to encode Channel State Information CSI or to decode the encoded CSI associated with a wireless channel between a wireless communications device and a radio access node in a wireless communications network.
  • the method comprises transmitting, to a first node configured for training a first encoder or decoder of the two or more encoders or decoders, a proposal for a common loss calculation method to be used for training each of the two or more encoders or decoders and a proposal for a set of common NN architecture parameters for training each of the two or more encoders or decoders.
  • the method further comprises receiving, from the first node, the common loss calculation method to be used for training each of the two or more encoders or decoders and the set of common NN architecture parameters for training each of the two or more encoders or decoders.
  • the method further comprises training the second encoder or decoder based on the common loss calculation method, and the set of common NN architecture parameters, second channel data and second encoded channel data.
  • the method further comprises obtaining, based on training the second encoder or decoder, a second set of trained values of the common trainable decoder parameters.
  • the method further comprises providing the second set of trained values of the common trainable encoder or decoder parameters to a third node.
  • the method further comprises receiving, from the third node, common updated values of the common trainable encoder or decoder parameters.
  • the object is achieved by a second node.
  • the second node is configured to perform the method according to the third aspect above.
  • the object is achieved by a method, performed by a third node for training a system of two or more NN-based decoders or encoders to encode Channel State Information CSI or to decode the encoded CSI associated with a wireless channel between a wireless communications device and a radio access node in a wireless communications network.
  • the method comprises receiving a first set of trained values of common trainable encoder or decoder parameters from a first node configured for training a first encoder or decoder of the system of two or more decoders or encoders.
  • the method further comprises receiving a second set of trained values of common trainable encoder or decoder parameters from a second node configured for training a second encoder or decoder of the system of two or more decoders or encoders.
  • the method further comprises computing common updated values of the common trainable encoder or decoder parameters based on a distributed optimization algorithm and further based on the received first set of trained values and second set of trained values as input to the distributed optimization algorithm.
  • the method further comprises transmitting, to the first and second nodes, the computed common updated values of the common trainable decoder parameters.
  • the object is achieved by a third node.
  • the third node is configured to perform the method according to the fifth aspect above.
  • the object is achieved by a computer program comprising instructions, which when executed by a processor, causes the processor to perform actions according to any of the aspects above.
  • the object is achieved by a carrier comprising the computer program of the aspect above, wherein the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.
  • embodiments herein allow for creation of any one or more of: a global encoder model that may compress CSI signals for a UE regardless of the NW decoder, e.g., regardless of the mobile network the UE is attached to; a global decoder model that may decompress CSI signals for a NW regardless of the UE encoder, e.g., regardless of the vendor of the radio chipset of the UE that is attached to a specific NW cell.
  • an additional advantage is that chipset and/or network vendors do not share any sensitive data while training.
  • Figure 1 is a schematic block diagram illustrating embodiments of a wireless communications network.
  • Figure 2 illustrates an example transmission and reception chain for MU-MIMO operations.
  • Figure 3 is a schematic block diagram illustrating CSI type II normal reporting mode.
  • Figure 4a is a schematic block diagram illustrating a fully connected (dense) AE.
  • Figure 4b is a schematic block diagram illustrating how an AE may be used for Al- based CSI reporting in NR during an inference phase.
  • Figure 4c is a schematic block diagram illustrating training of an AE.
  • Figure 4d is a schematic block diagram illustrating a network vendor training of an AE decoder.
  • Figure 5 is a schematic block diagram illustrating embodiments of a wireless communications network.
  • Figure 6 is a schematic block diagram illustrating a reference solution that addresses issues of proprietary decoder/encoder during training.
  • Figure 7a is a schematic block diagram illustrating a system of nodes for decoder training according to embodiments herein.
  • Figure 7b is a schematic block diagram illustrating a system of nodes for decoder training according to embodiments herein.
  • Figure 7c is a schematic block diagram illustrating a system of nodes for encoder training according to embodiments herein.
  • Figure 7d is a schematic block diagram illustrating a system of nodes for encoder training according to embodiments herein.
  • Figure 8a is a flow chart illustrating a method performed by a first node for encoder training according to embodiments herein.
  • Figure 8b is a flow chart illustrating a further method performed by a first node for encoder or decoder training according to embodiments herein.
  • Figure 8c is a flow chart illustrating a method performed by a second node for encoder or decoder training according to embodiments herein.
  • Figure 8d is a flow chart illustrating a further method performed by a third node for encoder or decoder training according to embodiments herein.
  • Figure 9a is a signaling diagram illustrating a method for decoder training according to embodiments herein.
  • Figure 9b is a signaling diagram illustrating a first part of a method for decoder training according to embodiments herein.
  • Figure 9c is a signaling diagram illustrating a second part of a method for decoder training according to embodiments herein.
  • Figure 9d is a signaling diagram illustrating a method for decoder training according to embodiments herein.
  • Figure 10 is a schematic block diagram illustrating a first node according to embodiments herein.
  • Figure 11 is a schematic block diagram illustrating a second node according to embodiments herein.
  • Figure 12 is a schematic block diagram illustrating a third node according to embodiments herein.
  • Figure 13 schematically illustrates a telecommunication network connected via an intermediate network to a host computer.
  • Figure 14 is a generalized block diagram of a host computer communicating via a base station with a user equipment over a partially wireless connection.
  • Figures 15-18 are flowcharts illustrating methods implemented in a communication system including a host computer, a base station and a user equipment.
  • Al-based CSI reporting in wireless communication networks may be improved in several ways.
  • An object of embodiments herein is therefore to improve Al-based CSI reporting in wireless communication networks.
  • Embodiments herein relate to wireless communication networks in general.
  • Figure 5 is a schematic overview depicting a wireless communications network 100 wherein embodiments herein may be implemented.
  • the wireless communications network 100 comprises one or more RANs and one or more CNs.
  • the wireless communications network 100 may use a number of different technologies, such as Wi-Fi, Long Term Evolution (LTE), LTE-Advanced, 5G, New Radio (NR), Wideband Code Division Multiple Access (WCDMA), Global System for Mobile communications/enhanced Data rate for GSM Evolution (GSM/EDGE), Worldwide Interoperability for Microwave Access (WiMax), or Ultra Mobile Broadband (UMB), just to mention a few possible implementations.
  • Embodiments herein relate to
  • Network nodes such as radio access nodes, operate in the wireless communications network 100.
  • Figure 5 illustrates a radio access node 111.
  • the radio access node 111 provides radio coverage over a geographical area, a service area referred to as a cell 115, which may also be referred to as a beam or a beam group of a first radio access technology (RAT), such as 5G, LTE, Wi-Fi or similar.
  • the radio access node 111 may be a NR-RAN node, transmission and reception point e.g. a base station, a radio access node such as a Wireless Local Area Network (WLAN) access point or an Access Point Station (AP STA), an access controller, a base station, e.g.
  • a radio base station such as a NodeB, an evolved Node B (eNB, eNode B), a gNB, a base transceiver station, a radio remote unit, an Access Point Base Station, a base station router, a transmission arrangement of a radio base station, a stand-alone access point or any other network unit capable of communicating with a wireless device within the service area depending e.g. on the radio access technology and terminology used.
  • the respective radio access node 111 may be referred to as a serving radio access node and communicates with a UE via Downlink (DL) transmissions on a DL channel 123-DL to the UE and Uplink (UL) transmissions on an UL channel 123-UL from the UE.
  • a number of wireless communications devices operate in the wireless communication network 100, such as a UE 121.
  • the UE 121 may be a mobile station, a non-access point (non-AP) STA, a STA, a user equipment and/or a wireless terminal, which communicates via one or more Access Networks (AN), e.g. RAN, e.g. via the radio access node 111, to one or more core networks (CN), e.g. comprising a CN node 130, for example comprising an Access Management Function (AMF).
  • UE is a non-limiting term which means any terminal, wireless communication terminal, user equipment, Machine Type Communication (MTC) device, Device to Device (D2D) terminal, or node e.g. smart phone, laptop, mobile phone, sensor, relay, mobile tablets or even a small base station communicating within a cell.
  • FIG. 6 illustrates a first node 601 comprising a Neural Network, NN, -based Auto Encoder, AE, -encoder 601-1.
  • the first node 601 may also be referred to as a training apparatus.
  • the first node 601 is configured for training the AE-encoder 601-1 in a training phase of the AE-encoder 601-1.
  • the AE-encoder 601-1 is trained to provide encoded CSI from a first communications node, such as the UE 121, to a second communications node, such as the radio access node 111, over a communications channel, such as the UL channel 123-UL, in a communications network, such as the wireless communications network 100.
  • the CSI is provided in an operational phase of the AE-encoder wherein the AE-encoder 601-1 is comprised in the first communications node 121.
  • Figure 6 further illustrates a second node 602 comprising an NN-based AE- decoder 602-1 and having access to the channel data.
  • the second node 602 may provide a network-controlled training service for AE-encoders to be deployed in the first communications node 121, such as a UE.
  • the NN-based AE-decoder 602-1 may comprise the same number of input nodes as the number of output nodes of the AE-encoder 601-1.
  • the first node 601 may have access to one or more trained NN-based AE-encoder models for encoding the CSI.
  • the second node 602 may have access to one or more trained NN-based AE-decoder models for decoding the encoded CSI provided by the first node 601.
  • the implementation of the AE-decoder 602-1 may not be fully known to the first node 601.
  • the implementation of the AE-decoder 602-1 may be proprietary to the vendor of a certain base station.
  • the implementation of the AE-decoder excluding the encoder-decoder interface may not be known to the first node 601.
  • Figure 6 further illustrates a further node 603 comprising a channel database 603-1.
  • the channel database 603-1 may be a channel data source.
  • each node 601, 602, 603 may be implemented as a Distributed Node (DN) and functionality, e.g. comprised in a cloud 140 as shown in Figure 6, may be used for performing or partly performing the methods. There may be a respective cloud for each node.
  • Figure 6 may also be seen as an illustration of an embodiment of a training interface between the second node 602 providing the network-controlled training service and the UE/chipset-vendor training apparatus 601. In other words, Figure 6 illustrates a standardized development domain training interface that enables UE/chipset vendors and NW vendors to jointly train a UE encoder together with a NW decoder, without exposing proprietary aspects of the encoders and decoders.
  • the reference solution of Figure 6 is designed to allow the UE side to be able to train the encoder without knowledge of the NW side’s decoder loss function, output, or internal gradients of the decoder.
  • the decoder may therefore be proprietary, without having to disclose the loss function or the architecture of the network, except for the input layer, that may be standardized, e.g., by 3GPP.
  • the reference solution allows for fully proprietary development of encoders, i.e., they do not need to be standardized or their architecture shared with the NW. However, there may still be room for improvements with respect to compute, storage and power concerns on the NW and UE side as a result of training multiple encoders or multiple decoders.
  • a multi-vendor training setup may consist of a channel data service and a NW- decoder training service:
  • the channel data service provides training, validation, and test channel data.
  • the NW-vendor controlled training service provides a solution for UE/chipset vendors (e.g., research and/or development labs) to train candidate UE encoders against the NW’s pre-trained decoders.
  • Details of the second node 602 and/or the network-controlled training service such as decoder architecture, trainable parameters, a reconstructed channel H, a loss function, and a method to compute gradients may be transparent to the UE/chipset-vendor training apparatus 601. Instead, UE/chipset-vendor training apparatus 601 is provided with the gradients of the input to the decoder.
  • Embodiments herein solve the above problems by introducing solutions for training universal encoders and/or decoders. More specifically, embodiments herein are directed to
  • the UE/chipset and NW vendors may agree on
  • the one or more common loss functions may be approximately the same.
  • different regularizers such as L1 and L2 may be used to generate additional loss so that loss functions produce approximately the same output, given a set of training data.
  • the approximation is defined here by a scalar: a threshold which, when comparing the outputs of the loss functions on the same dataset, should not be crossed. For example, assuming two loss functions, the polynomials fitting the loss curves, indicating the loss reduction rate, may be computed. Comparing those polynomials using, e.g., a vector similarity approach on the coefficients of the indeterminates could be one approach to identify how similar they are.
  • a result of the comparison may be compared to the threshold, and if the result satisfies a threshold condition, such as being equal to or below the threshold, that means that the regularizers were set to correct values.
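A minimal sketch of such a comparison, under the assumption that each loss curve is fitted with a polynomial and the fits are compared via the distance between their normalized coefficient vectors; the function name, polynomial degree, and threshold value are all illustrative choices:

```python
import numpy as np

def loss_curves_similar(loss_a, loss_b, degree=3, threshold=0.5):
    """Fit a polynomial to each loss curve and compare the fits.

    Returns True when the distance between the normalized coefficient
    vectors stays at or below the threshold, i.e. both losses fall at
    approximately the same rate (the regularizers are tuned correctly).
    """
    epochs = np.arange(len(loss_a))
    ca = np.polyfit(epochs, loss_a, degree)
    cb = np.polyfit(epochs, loss_b, degree)
    dist = np.linalg.norm(ca / np.linalg.norm(ca) - cb / np.linalg.norm(cb))
    return bool(dist <= threshold)

epochs = np.arange(20)
decaying_a = np.exp(-0.30 * epochs)          # worker 1: loss decays quickly
decaying_b = np.exp(-0.32 * epochs) + 0.01   # worker 2: nearly the same rate
diverging = 0.05 * epochs                    # worker 3: loss grows instead
```

Other similarity measures over the coefficients (e.g., cosine similarity) would fit the same pattern.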
  • the second action performs federated learning between encoder and/or decoder models (possibly from different vendors) to train universal encoders and/or decoders.
  • the federated learning algorithms need not be applied to the whole encoder / decoder models - they can be applied to identified “common parts”, leaving room for UE/chipset and NW vendor differentiation.
  • vendors may first align their individual model architectures and loss functions, as well as other hyperparameters of the training process.
  • Embodiments herein allow for communication between UE/chipset vendors and NW vendors to prepare (offline or online) training infrastructure for distributed training of CSI encoders and decoders via federated learning.
  • the embodiments may be implemented or illustrated, at least in part, with an interface which may be standardized.
  • Embodiments herein disclose a method for federated learning of a global encoder or decoder model for CSI compression or reconstruction, from multiple UE or NW vendors, wherein the encoder or decoder is trained in a distributed manner.
  • Federated learning techniques such as federated averaging (FedAvg) train models in a distributed manner, across different nodes known as workers. Training is done in an iterative way, wherein in every global iteration, known as an “epoch”, workers train a local model with their own data, then submit the trained model weights to a server. The server aggregates the weights to generate a global model, which is sent to the workers for another round of training (i.e., another global iteration, or epoch). Over time, the global model will incorporate learnings from the different worker models.
  • Privacy-preserving extensions such as secure aggregation may also be used to protect the transfer of model weights between workers and the server.
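One aggregation round of federated averaging may be sketched as follows; the function signature and the dataset-size weighting of each worker's contribution are illustrative assumptions:

```python
import numpy as np

def federated_average(worker_weights, worker_sizes):
    """One FedAvg aggregation round: average each layer's weights across
    workers, weighted by the size of each worker's local dataset.

    `worker_weights` is a list (one entry per worker) of lists of numpy
    arrays, aligned layer by layer across workers.
    """
    total = sum(worker_sizes)
    return [
        sum(w * (n / total) for w, n in zip(layer_versions, worker_sizes))
        for layer_versions in zip(*worker_weights)
    ]

# Two workers, each contributing one layer of locally trained weights.
worker_1 = [np.array([1.0, 3.0])]
worker_2 = [np.array([3.0, 5.0])]
global_model = federated_average([worker_1, worker_2], worker_sizes=[100, 100])
```

With equal dataset sizes this reduces to a plain mean; with unequal sizes, workers with more data pull the global model toward their local weights.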
  • Embodiments herein allow for creation of either or both of: a global encoder model that may compress CSI signals for a UE regardless of the NW decoder, e.g., regardless of the mobile network the UE is attached to; a global decoder model that may decompress CSI signals for a NW regardless of the UE encoder, e.g., regardless of the vendor of the radio chipset of the UE that is attached to a specific NW cell.
  • an additional advantage is that chipset and/or network vendors do not share any sensitive data while training.
  • Embodiments herein may comprise one or more systems of nodes according to Figure 6 above.
  • a system of nodes according to embodiments herein may comprise the following components: - A Channel Data Service (CDS), which provides input to the encoder 601-1.
  • the CDS corresponds to the channel data base 603-1 of the further node 603.
  • this input contains features indicating an estimated channel quality.
  • the estimated channel quality may, as previously discussed, be a 3-dimensional tensor where the dimensions correspond to Tx antenna ports of a gNB, Rx antenna ports of a UE, and frequency (either divided in subcarriers or subbands).
  • the encoder 601-1, which compresses the 3-dimensional tensor from the CDS to a compressed representation known as a codeword.
  • the CDS 603-1 and encoder 601-1 may coexist on the UE-chipset vendor side.
  • the decoder 602-1 which reconstructs the estimated channel quality from the codeword.
  • the decoder 602-1 exists on the NW vendor side.
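The interplay of the three components may be sketched with an illustrative, untrained linear encoder/decoder pair; the tensor dimensions (32 Tx ports, 4 Rx ports, 13 subbands) and the codeword size are hypothetical examples, and real encoders/decoders are trained NNs rather than random linear maps:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: 32 gNB Tx ports, 4 UE Rx ports, 13 subbands.
tx, rx, subbands, codeword_dim = 32, 4, 13, 64

# CDS output: the estimated channel quality as a 3-dimensional tensor.
channel = rng.normal(size=(tx, rx, subbands))

# Encoder (UE/chipset-vendor side): flatten the tensor and project it
# down to a short codeword. A random linear map stands in for the NN.
W_enc = rng.normal(size=(codeword_dim, tx * rx * subbands))
codeword = W_enc @ channel.reshape(-1)

# Decoder (NW-vendor side): reconstruct the tensor from the codeword.
W_dec = rng.normal(size=(tx * rx * subbands, codeword_dim))
reconstruction = (W_dec @ codeword).reshape(tx, rx, subbands)
```

The codeword (64 values) is what crosses the air interface, in place of the full tensor (32 × 4 × 13 = 1664 values).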
  • Figure 7a illustrates a system of nodes 700 in which embodiments herein may be implemented.
  • Figure 7a illustrates a first node 701 configured for training a first encoder 701-1 and a second node 702 configured for training a second encoder 702-1.
  • Figure 7a further illustrates a further first node 711 configured for training a first decoder 711-1 and a further second node 712 configured for training a second decoder 712-1.
  • the first node 701 and the further first node 711 will assume the role of a driver node, which may initiate and/or coordinate the training of multiple decoders or encoders.
  • Figure 7a further illustrates latent spaces Y1, ..., YK, channel data H1, ..., HK, reconstructed channel data Ĥ1, ..., ĤK, loss functions L1, ..., LK, and gradients G1, ..., GK of the loss with respect to the trainable parameters P1, ..., PK of the encoders and θ1, ..., θK of the decoders.
  • multiple NW vendors 1, ... K train their decoders 711-1, 712-1 using data from different encoders 701-1, 702-1 from different multiple UE vendors 1, ... K.
  • Every NW vendor training a decoder using data provided from a local CDS 701-2, 702-2 and encoder output, may be considered a “worker”.
  • the server aggregates all worker-provided model weights to a global model, in this case a global decoder.
  • the role of the “server” may be assumed by a NW vendor, as illustrated in Figure 7a, but may also be external to the NW vendor, e.g., a third party cloud service.
  • the server periodically collects the weights of locally trained decoders and aggregates them, e.g., using a federated learning algorithm to a global model. This embodiment may specifically apply in cases where the decoder is to be deployed to all participating network vendors.
  • a single network vendor operates all the participating decoders.
  • This network vendor establishes training sessions with multiple UE vendors and also assumes the role of the “server”.
  • This embodiment may be suited for cases where the decoders are exclusively deployed by a single network vendor.
  • the network vendor may need to negotiate with the different chipset vendors about the loss function and model.
  • multiple UE chipset vendors train their encoders using output data from different NW decoders from multiple network vendors.
  • this embodiment may be suited in cases where the encoder is to be deployed at all participating UE vendors.
  • a single UE chipset vendor trains its multiple encoders using output data from different respective NW decoders.
  • this approach may be suited in cases where the encoder is to be exclusively deployed by the single UE chipset vendor.
  • encoders receive gradients, which in turn may be used to update (e.g., train) their weights; alternatively, the encoders may remain static, i.e., not updated during the process. It may be up to chipset vendors to apply and accept the changes to their encoders.
  • decoders may be updated or not during training of encoders.
  • a chipset vendor may train two separate encoders for two network vendors.
  • an aggregation of local training results may happen at multiple levels.
  • an encoder 2 for the chipset vendor is determined, based on federation of training results from two encoders 2.1, 2.2 trained with decoders 2.1, 2.2 of different network vendors.
  • a global encoder for different chipset vendors is established based on the training of the decoders 1, 2 at the first level.
  • In a fifth embodiment, instead of training one common neural network, such as an encoder or decoder, that works with all corresponding neural networks, such as decoders or encoders, different encoders for different NW decoders, or different decoders for different encoders, may be created, but they share parts of their neural networks.
  • layers of the neural network may be marked either as a common or a dedicated layer.
  • Workers send trained weights of local models that belong only to common layers to the server, keeping the weights of dedicated layers to themselves.
  • the server may perform federated averaging of the weights and then send the results back to the workers, which update their common layers accordingly.
  • the storage in memory of such encoders may be minimized by keeping only one copy of the weights of the common layers, while only the weights of the dedicated layers need to be stored for each individual encoder.
  • the encoder information may be stored for example in a volatile memory of the UE (e.g., RAM) or in a non-volatile memory (e.g., EEPROM).
  • a UE also needs less time to switch between different encoders, because only the weights of the dedicated layers need to be switched.
  • the choice of which layers to make dedicated versus common may be made empirically, based on the ability of the layers to provide maximum customization of the encoder or decoder network relative to the amount of data associated with storing the weights of the layers.
  • Good candidates for such layers may be convolutional layers, since the weights of convolutional layers are generally used multiple times (depending on the network configuration, they may be used thousands of times) during inference.
  • The weights of fully connected layers, in contrast, are used only once during inference. Convolutional layers therefore have a greater potential than fully connected layers to customize the performance of the encoder or the decoder, relative to the number of weights in the layers.
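A sketch of aggregating only the common layers, under the assumption that each worker's model is represented as a simple name-to-weights mapping; the layer names ("conv1", "dense_out") are purely illustrative:

```python
import numpy as np

def aggregate_common_layers(worker_models, common_layers):
    """Federated averaging restricted to common layers: each worker's
    dedicated layers are left untouched, while the layers named in
    `common_layers` are replaced by their average across all workers."""
    averaged = {
        name: sum(model[name] for model in worker_models) / len(worker_models)
        for name in common_layers
    }
    # Dict merge: dedicated layers kept per worker, commons overwritten.
    return [{**model, **averaged} for model in worker_models]

# Two workers sharing a convolutional layer but keeping their own output layer.
worker_a = {"conv1": np.array([1.0, 1.0]), "dense_out": np.array([9.0])}
worker_b = {"conv1": np.array([3.0, 5.0]), "dense_out": np.array([-9.0])}
updated = aggregate_common_layers([worker_a, worker_b], common_layers=["conv1"])
```

After the round, both workers hold identical "conv1" weights (one copy suffices in memory), while their dedicated "dense_out" layers remain vendor-specific.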
  • the flow charts illustrate a method, performed by a first node 701, 711, for training a first Neural Network, NN, -based encoder 701-1 or decoder 711-1 of a system of two or more NN-based encoders 701-1, 702-1 or decoders 711-1, 712-1 to encode Channel State Information CSI or decode the encoded CSI associated with the wireless channel between the wireless communications device 121 and the radio access node 111 in the wireless communications network 100.
  • the method is either for training the first NN-based encoder 701-1 to encode CSI, or for training the first NN-based decoder 711-1 to decode the encoded CSI.
  • the CSI is associated with the wireless channel between the wireless communications device 121 and the radio access node 111 in the wireless communications network 100.
  • the first node initializes the training process by sending a suggested-parameters-request message to all worker nodes.
  • the suggested-parameters- request message is a message comprising or indicating a request to respond with suggested NN architecture parameters.
  • the first node 701, 711 may receive, from the second node, a proposal for a common type of encoder or decoder training, wherein the type of encoder or decoder training comprises: a) a first type in which the two or more encoders train together with respective two or more trainable decoders for which the decoder trainable parameters are to be trained or in which the two or more decoders train together with respective two or more trainable encoders for which the encoder trainable parameters are to be trained, and b) a second type in which the two or more encoders train together with respective two or more fixed decoders for which the decoder trainable parameters are not to be trained or in which the two or more decoders train together with respective two or more fixed encoders for which the encoder trainable parameters are not to be trained.
  • the first node 701, 711 may determine the common type of encoder or decoder training based on at least the proposal for the common type of encoder or decoder training.
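  • as a minimal illustration of the two training types above (a sketch, not part of the source; the function name, the "first"/"second" labels and the dict layout are assumptions), the set of parameter groups that a gradient update may touch can be selected as follows:

```python
def trainable_parameters(params, training_type, side="encoder"):
    """params: {'encoder': [...], 'decoder': [...]}; side: the model being trained.
    Returns the parameter groups a gradient update may touch."""
    other = "decoder" if side == "encoder" else "encoder"
    if training_type == "first":
        # First type: the counterpart models are trainable as well.
        return {side: params[side], other: params[other]}
    if training_type == "second":
        # Second type: the counterpart is fixed, so only this side is trained.
        return {side: params[side]}
    raise ValueError("unknown training type: " + str(training_type))
```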
  • the first node 701, 711 receives, from the second node 702, 712 configured for training the second encoder or decoder of the two or more encoders or decoders, a proposal for a common loss calculation method to be used for training each of the two or more encoders or decoders and a proposal for a set of common NN architecture parameters for training each of the two or more encoders or decoders.
  • NN architecture parameters may for example be NN weights and biases.
  • the common NN architecture parameters may be associated with one or more common NN layers of the first encoder and the second encoder, or of the first decoder and the second decoder.
  • the common NN layers are a subset of the NN layers of the first encoder or the first decoder and/or a subset of the NN layers of the second encoder or the second decoder.
  • the common NN layers may be a subset of the NN layers of the first decoder and the second decoder.
  • the common NN layers may be a subset of the NN layers of the first encoder and the second encoder.
  • the at least one common NN layer may be a convolutional layer.
  • the first node may receive from all worker nodes a suggested-parameters-response message with proposals on training process and NN model-architecture parameters.
  • the suggested-parameters-response message may contain a description about the loss function used during the training at the worker node.
  • the model architecture may contain a description of the number, size and type of layers used, including input, output, and hidden layers.
  • the first node 701, 711 determines a common loss calculation method to be used for training each of the two or more encoders or decoders based on the received proposal for the common loss calculation method.
  • the first node 701, 711 determines a set of common NN architecture parameters for training each of the two or more encoders or decoders based on the received proposal for the set of common NN architecture parameters.
  • the first node may select a set of training process parameters and a NN model architecture using a selection process and may send them as a suggested-parameters-selection to all worker nodes (in action 804 below). The selection may be done from the received suggestions.
  • the training process parameters may include learning rate, size of step in gradient and batch size.
  • the suggested-parameters-selection message may contain a selection of the loss function.
  • the loss function may be determined by means of majority from the suggested-parameters-response messages, or by relative vendor importance, as stored in the first node a priori to the training process.
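  • a minimal sketch of such a selection rule (assumed, not from the source: loss functions encoded as natural numbers, vendor importance as a-priori weights), picking the majority proposal and breaking ties by relative vendor importance:

```python
from collections import Counter

def select_loss_function(proposals, vendor_importance):
    """proposals: {vendor_id: loss_code}; vendor_importance: {vendor_id: weight}.
    Majority wins; ties are broken by the summed importance of the backers."""
    counts = Counter(proposals.values())
    best = max(counts.values())
    candidates = [code for code, n in counts.items() if n == best]
    if len(candidates) == 1:
        return candidates[0]
    # Tie-break by the total importance of the vendors backing each candidate.
    def backing(code):
        return sum(vendor_importance[v] for v, c in proposals.items() if c == code)
    return max(candidates, key=backing)
```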
  • the suggested-parameters-selection message may contain a mapping function relating a worker node to a weight, which weight is applied to the other parameters, the gradients or the updated NN weights.
  • the mapping function may be determined by means of majority from suggested-parameters-response, or relative vendor importance.
  • the suggested-parameters-selection message may contain a description of the selected model architecture.
  • the description may contain either the complete original suggested-parameters-response message or a part of it, in terms of number of NN layers and NN layer position, which may be either “front” or “back” of the model.
  • by “front” of the model is meant one or more layers at the beginning of the model and by “back” is meant one or more layers at the end of the model.
  • the first node 701, 711 may transmit, to the second node, the common loss calculation method to be used for training each of the two or more encoders or decoders and the set of common NN architecture parameters for training each of the two or more NN-based encoders or decoders.
  • the worker nodes may respond to the first node in a suggested-parameters-selection-response message, either with a full or partial acknowledgement of the training parameters and NN model architecture.
  • the first node may initialize a federated learning session between the worker nodes and the first node acting as server executing a federated learning algorithm.
  • the first node 701, 711 trains the first encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and encoded first channel data. In some embodiments the first node 701, 711 trains the first encoder or the first decoder further based on the determined common type of encoder or decoder training.
  • a first payload of the first encoded channel data may be of the same size as a second payload of a second encoded channel data used for training of the second encoder or the second decoder.
  • a size of the payload may refer to a format or shape of the payload.
  • the first payload of the first encoded channel data may be of the same format or shape as the second payload of the second encoded channel data.
  • the first payload of the first encoded channel data may be of the same data type as the second payload of the second encoded channel data.
  • the first node 701 may train the first encoder 701-1 by making the first encoder 701-1 encode the first channel data based on the set of common NN architecture parameters.
  • the encoded first channel data is sent to the first decoder 711-1 in the further first node 711 which decodes the encoded first channel data.
  • a loss is calculated based on the common loss calculation method and the first channel data and decoded first channel data.
  • the loss function may take as inputs the first channel data and decoded channel data.
  • the decoded channel data corresponds to decoded encoded channel data.
  • the first decoder 711-1 may decode the encoded first channel data.
  • the loss function may compare the original channel data (ground truth) to the decoded channel data to compute a loss.
  • Gradients of the loss with respect to the common NN architecture parameters may also be calculated.
  • the loss and the gradients may be used to update the common NN architecture parameters for the first encoder 701-1.
  • the loss and the gradients may be used to update the common NN architecture parameters for the first decoder 711-1.
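  • the encode, decode, loss, gradient and update steps above can be illustrated with a deliberately tiny example (a scalar "encoder" weight and "decoder" weight with a squared-error loss and hand-derived gradients; this toy model is an assumption of the sketch, not the patent's NN):

```python
# Toy model: encoder output z = w_e * x, decoder reconstruction x_hat = w_d * z.
def train_step(w_e, w_d, x, lr=0.01):
    z = w_e * x                # encoder: compress the channel data
    x_hat = w_d * z            # decoder: reconstruct the channel data
    loss = (x - x_hat) ** 2    # common loss: squared reconstruction error
    # Gradients of the loss with respect to the trainable parameters.
    g_wd = -2.0 * (x - x_hat) * z
    g_we = -2.0 * (x - x_hat) * w_d * x
    # The loss gradients are used to update both parameters.
    return w_e - lr * g_we, w_d - lr * g_wd, loss

w_e, w_d = 0.5, 0.5
for _ in range(200):
    w_e, w_d, loss = train_step(w_e, w_d, x=1.0)
# After training, the reconstruction error is small (w_e * w_d approaches 1).
```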
  • the common loss calculation method may be based on any of the following: a common loss function, or different loss functions together with L1 and/or L2 regularizers, or methods for maintaining a custom decoder per encoder, such as personalized federated learning, which will be described below in relation to Figure 9d.
  • the first node 701, 711 may obtain a first initial set of values of common trainable encoder or decoder parameters.
  • the first node 701, 711 may further obtain the first (uncompressed) channel data.
  • the channel data may be obtained as input for calculating a first loss value based on the common loss calculation method.
  • the first node 701, 711 may further obtain encoded first channel data, e.g., as output from the first encoder 701-1.
  • the encoded first channel data may also be referred to as compressed first channel data.
  • the encoded first channel data may be used as input to the first decoder 711-1.
  • the first node 701, 711 obtains, based on training the first encoder or decoder, a first set of trained values of common trainable encoder or decoder parameters.
  • the first node 701, 711 provides the first set of trained values of the common trainable encoder or decoder parameters to the third node.
  • the first node 701, 711 receives, from the third node, common updated values of the common trainable encoder or decoder parameters.
  • the first node 701, 711 may then re-train the first decoder based on the received common updated values of the common trainable decoder parameters, the common loss calculation method and the set of common NN architecture parameters.
  • the re-training may be repeated, e.g., until a required level of performance has been achieved.
  • the performance may be measured by the loss.
  • Each training loop may be called an "epoch". There may be many epochs. The number of epochs may either be preconfigured or based on the performance of the global model (e.g., using an accuracy metric). An epoch may be determined (negotiated) as an NN parameter (or hyper-parameter).
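  • a sketch of such an epoch loop (the function name and the stop criterion are assumptions; the required loss level stands in for the "required level of performance"):

```python
def train_until(required_loss, max_epochs, run_epoch):
    """Repeat training epochs until the loss meets the requirement or the
    preconfigured epoch budget is exhausted; returns (epochs_run, last_loss)."""
    loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        loss = run_epoch()  # one full pass over the training data
        if loss <= required_loss:
            return epoch, loss
    return max_epochs, loss
```

  here `run_epoch` is whatever callable performs one training pass and returns the measured loss.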
  • the first node is configured to perform the method of Figure 8a or Figure 8b based on an availability of the first node in terms of current load.
  • the role of the first node 701, 711 may be assigned at random, pre-agreed between NW vendors, or may be assigned based on a number of objective factors, such as the availability of each gNB, e.g., in terms of current load or resources (compute, store, or network resources).
  • the current load may be expressed as a combination or as one of available throughput on the uplink and downlink interface, number of attached UEs in active and idle state, voltage ripples in power supply unit etc.
  • the driver may alternatively be selected based on level of authority (e.g., “tier-1” operator). In case of a single vendor, selection defaults to the single vendor itself.
  • the driver may be an entity that coordinates the federation and performs the aggregation.
  • the driver may be external to the nodes comprising the encoders or decoders. However, the driver may also be an internal driver.
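  • one way to sketch the load-based assignment of the driver role (the metrics, weights, nominal capacity and function name are illustrative assumptions; a real selection could equally use the pre-agreement or authority-level criteria above):

```python
def pick_driver(gnbs, w_ues=0.5, w_tput=0.5):
    """gnbs: {name: {'attached_ues': int, 'used_throughput': float in [0, 1]}}.
    Returns the gNB with the lowest weighted load."""
    def load(stats):
        # Normalize attached UEs to [0, 1] assuming a nominal capacity of 100.
        return w_ues * stats["attached_ues"] / 100.0 + w_tput * stats["used_throughput"]
    return min(gnbs, key=lambda name: load(gnbs[name]))
```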
  • the above method will now be described from the network vendor side, or in other words from the decoder side.
  • the method is performed by the first node 711.
  • the method is for training the first NN-based decoder 711-1 of the system of two or more NN-based decoders 711-1, 712-1 to decode encoded Channel State Information CSI associated with the wireless channel between the wireless communications device 121 and the radio access node 111 in the wireless communications network 100.
  • the method comprises:
  • the above method will now be described from the UE vendor side, or in other words from the encoder side.
  • the method is performed by the first node 701.
  • the method is for training the first NN-based encoder 701-1 of the system of two or more NN-based encoders 701-1, 702-1 to encode Channel State Information CSI associated with the wireless channel between the wireless communications device 121 and the radio access node 111 in the wireless communications network 100.
  • the method comprises: receiving, from the second node 702 configured for training the second encoder of the two or more NN-based encoders, the proposal for the common loss calculation method to be used for training each of the two or more NN-based encoders and the proposal for the set of common NN architecture parameters for training each of the two or more NN-based encoders; determining the common loss calculation method to be used for training each of the two or more NN-based encoders based on the received proposal for the common loss calculation method; determining the set of common NN architecture parameters for training each of the two or more NN-based encoders based on the received proposal for the set of common NN architecture parameters; training the first encoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and first encoded channel data; obtaining, based on training the first encoder, the first set of trained values of common trainable encoder parameters; providing the first set of trained values of the common trainable encoder parameters to the third node; and receiving, from the third node, common updated values of the common trainable encoder parameters.
  • the flow chart illustrates a method, performed by the second node 702, 712, for training the second NN-based encoder 702-1 or decoder 712-1 of the system 700 of two or more NN-based encoders 701-1, 702-1 or decoders 711-1, 712-1 to encode Channel State Information CSI or to decode the encoded CSI associated with the wireless channel between the wireless communications device 121 and the radio access node 111 in the wireless communications network 100.
  • the second node 702, 712 transmits, to the first node 701, 711 configured for training the first encoder or decoder of the two or more encoders or decoders, the proposal for the common loss calculation method to be used for training each of the two or more encoders or decoders and the proposal for the set of common NN architecture parameters for training each of the two or more encoders or decoders.
  • the second node 702, 712 receives, from the first node, the common loss calculation method to be used for training each of the two or more encoders or decoders and the set of common NN architecture parameters for training each of the two or more encoders or decoders.
  • the second node 702, 712 trains the second encoder or decoder based on the common loss calculation method, and the set of common NN architecture parameters, second channel data and encoded second channel data.
  • the second channel data may differ from first channel data used for training of the first decoder.
  • the second node 702 may obtain a second initial set of values of common trainable encoder or decoder parameters.
  • the second node 702, 712 may further obtain the second (uncompressed) channel data.
  • the channel data may be obtained as input for calculating a second loss value based on the common loss calculation method.
  • the second node 702, 712 may further obtain encoded second channel data, e.g., as output from the second encoder 702-1.
  • the second node 702 may train the second encoder 702-1 by making the second encoder 702-1 encode the second channel data based on the set of common NN architecture parameters.
  • the encoded second channel data is sent to the second decoder 712-1 in the further second node 712 which decodes the encoded second channel data.
  • a loss is calculated based on the common loss calculation method and the second channel data and decoded second channel data.
  • the loss function may take as inputs the second channel data and decoded channel data.
  • the decoded channel data corresponds to decoded encoded channel data.
  • the second decoder 712-1 may decode the encoded second channel data.
  • the loss function may compare the original channel data (ground truth) to the decoded channel data to compute a loss.
  • Gradients of the loss with respect to the common NN architecture parameters may also be calculated.
  • the loss and the gradients may be used to update the common NN architecture parameters for the second encoder 702-1.
  • the loss and the gradients may be used to update the common NN architecture parameters for the second decoder 712-1.
  • the second node 702, 712 obtains, based on training the second encoder or decoder, a second set of trained values of the common trainable encoder or decoder parameters.
  • the second node 702, 712 provides the second set of trained values of the common trainable encoder or decoder parameters to the third node 703, 713.
  • the second node 702, 712 receives, from the third node 713, common updated values of the common trainable encoder or decoder parameters.
  • the second node 702, 712 may then re-train the second decoder based on the received common updated values of the common trainable decoder parameters, the common loss calculation method and the set of common NN architecture parameters.
  • the re-training may be repeated, e.g., until a required level of performance has been achieved.
  • the performance may be measured by the loss.
  • Each training loop may be called an "epoch". There may be many epochs. The number of epochs may either be preconfigured or based on the performance of the global model (e.g., using an accuracy metric).
  • the flow chart illustrates a method, performed by the third node 703, 713, for training the system 700 of two or more NN-based decoders 711-1, 712-1 or encoders 701-1, 702-1.
  • the third node 703, 713 receives the first set of trained values of common trainable encoder or decoder parameters from the first node 701, 711 configured for training the first encoder or decoder of the system 700 of two or more decoders 711-1, 712-1 or encoders 701-1, 702-1.
  • the third node 703, 713 receives the second set of trained values of common trainable encoder or decoder parameters from the second node 702, 712 configured for training the second encoder 702-1 or decoder 712-1 of the system 700 of two or more decoders 711-1, 712-1 or encoders 701-1 , 702-1.
  • the third node 703, 713 computes common updated values of the common trainable encoder or decoder parameters based on a distributed optimization algorithm and further based on the received first set of trained values and second set of trained values as input to the distributed optimization algorithm.
  • the distributed optimization algorithms may be one of the following: federated averaging, federated weighed averaging, federated stochastic gradient descent, or federated learning with dynamic regularization.
  • the federated learning algorithm may be replaced with any other distributed optimization algorithms that may solve a global optimization problem using a set of computational Workers without the need of collecting their private datasets, for example distributed stochastic gradient descent.
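  • plain federated averaging, one of the algorithms listed above, reduces to averaging the per-node trained parameter sets; with per-node weights it becomes federated weighted averaging (representing each node's parameters as a flat list of floats is a simplification of this sketch):

```python
def federated_average(param_sets, weights=None):
    """param_sets: equal-length lists of trained parameter values, one per node.
    Returns the common updated values computed by the third node."""
    if weights is None:
        weights = [1.0] * len(param_sets)   # unweighted: plain federated averaging
    total = sum(weights)
    return [sum(w * ps[i] for w, ps in zip(weights, param_sets)) / total
            for i in range(len(param_sets[0]))]
```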
  • the third node 703, 713 transmits, to the first and second nodes 701, 702, 711, 712, the computed common updated values of the common trainable encoder or decoder parameters.
  • Figures 9a and 9b illustrate the flow for a global decoder creation in a single NW vendor and a multi NW vendor embodiment respectively. Similar processes exist for creation of the global encoder.
  • the process may be split in a training phase and in an operational phase.
  • in the training phase the federated autoencoder is generated, while in the operational phase the decoder in the gNB is used to reproduce an original input from the latent representation (or latent space) that is sent from the UE.
  • UE1 and UE2 of Figure 9a each represents a node hosted on the chipset vendors administrative domain/cloud infrastructure and gNB represents a corresponding further node on the vendors’ side.
  • the UEs of Figure 9a are responsible for training an encoder while the gNB of Figure 9a is responsible for training a decoder.
  • the encoder on the UE side is frozen and as such it is not retrained.
  • the decoder on the other side is retrained in a federated way to produce a global decoder that may work with all participating encoders originating from the different UE chipset vendors.
  • in Figure 9a, CDS denotes the Channel Data Source, UE1 and UE2 denote the UEs, and gNB denotes the gNB.
  • Actions 4-16 of Figure 9a operate in a loop which is repeated for several rounds or iterations in a federated learning context.
  • the gNB produces specific decoders which specialize for each UE chipset vendor.
  • Actions 6-15 relate to the training of each decoder. In this example, two decoders are trained sequentially. However, the training of these decoders may also take place in parallel.
  • the UE sends its latent space (latent_space_1 and latent_space_2) respectively for each encoder, and the gNB's decoder computes a reproduction Ĥ which is compared with a ground truth H as provided by each batch.
  • based on a given loss function (e.g., MSE), the gNB node computes a first reconstruction loss l1 and a second reconstruction loss l2, which are then used in a backpropagation process to retrain the decoders.
  • in action 16, at the end of each round, the specialized decoders are averaged into one by computing the average of the weights of each layer of the decoders.
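  • the layer-wise averaging of action 16 can be sketched as follows (representing each specialized decoder as a mapping from layer name to a flat weight list is a simplification assumed here):

```python
def average_decoders(decoders):
    """decoders: list of {layer_name: [weights]} with identical layer shapes.
    Returns the global decoder obtained by averaging each layer's weights."""
    n = len(decoders)
    return {name: [sum(d[name][i] for d in decoders) / n
                   for i in range(len(decoders[0][name]))]
            for name in decoders[0]}
```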
  • the global decoder produced previously is used to reproduce CSI data in action 18 from the input it receives in action 17 from each UE.
  • the reproduced CSI data is used to set up a physical channel with the UE in action 19. Instead of receiving CSI data directly, the gNB receives (action 17) an encoded representation of such data, which is decoded (action 18) in the gNB using the decoder that was averaged in action 16.
  • Figure 9b and 9c illustrate an embodiment where a global decoder is trained from multiple NW vendors.
  • a preparation phase illustrated in Figure 9b takes place where:
  • One of the gNBs of the NW vendors assumes the role of the “driver”.
  • the role may be assigned at random, pre-agreed between NW vendors, or may be assigned based on a number of objective factors, such as the availability of each gNB (in terms of current load, said load expressed as a combination of, or as one of, available throughput on the uplink and downlink interfaces, number of attached UEs in active and idle state, voltage ripples in the power supply unit, etc.).
  • the driver may start the method by sending a suggested-parameters-request message to other participating NW vendors.
  • the other participating NW vendors respond with a set of network parameters that may include:
  • the loss function, which may be encoded as a natural number, e.g., 1 corresponding to mean squared error loss, 2 to mean squared logarithmic error loss, 3 to mean absolute error loss, etc.
  • An identifier of the vendor may also be a natural number, e.g., 1 corresponding to a first network vendor, 2 to a second network vendor, 3 to a third network vendor, etc.
  • a description of the architecture of the model that may include:
  • a description of the input layer may include a description of the input in terms of structure, e.g., 1-dimensional list, 2-dimensional matrix, etc., but also in terms of datatype of input, e.g., float32, float64, int32, etc.
  • a description of the hidden layers that may include the number of neurons, the type of layer, e.g., dense/fully connected, convolutional, etc., as well as the activation function and connections of these neurons to the next layer.
  • a description of the output layer that may include a structure of output and datatype (similar to input layer description).
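  • a hypothetical shape of such a model-architecture description (all field names, sizes and numeric codes below are assumptions of the sketch, following the encodings suggested above):

```python
# Loss function and vendor identifier encoded as natural numbers, as above.
decoder_description = {
    "loss_function": 1,   # e.g., 1 = mean squared error
    "vendor_id": 2,       # e.g., 2 = a second network vendor
    "input_layer": {"structure": "1d_list", "size": 64, "dtype": "float32"},
    "hidden_layers": [
        {"type": "dense", "neurons": 256, "activation": "relu"},
        {"type": "convolutional", "filters": 8, "activation": "relu"},
    ],
    "output_layer": {"structure": "2d_matrix", "shape": [32, 32], "dtype": "float32"},
}
```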
  • the driver selects a type of architecture to use for the decoder and a loss function.
  • selection may be based on majority, e.g., which loss function most of the network vendors prefer, or on relative priority between vendors (e.g., this priority may be calculated from the number of total cells each vendor has, or its coverage, etc.).
  • adjustment of individual lambda parameter for regularization coefficients added to loss functions such as L1 and L2 may be considered.
  • Regularization adds a penalty to an existing loss function, the significance of the penalty may be indicated by a lambda coefficient (L1 and L2). This means that in a way it is possible to control/shape the behaviour of the loss function.
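  • the penalty can be sketched as follows (a generic L1/L2 term added to an already computed base loss; the function name is an assumption):

```python
def regularized_loss(base_loss, weights, l1_lambda=0.0, l2_lambda=0.0):
    """Add L1 and/or L2 penalties, scaled by their lambda coefficients,
    to an existing loss value; the lambdas shape the loss behaviour."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return base_loss + l1_lambda * l1 + l2_lambda * l2
```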
  • the loss functions may be aligned to produce similar probability value distributions.
  • the loss curves produced by the loss function during training may have similar probability value distributions (in terms of type and parameter).
  • Such regularization terms may be signalled to different vendors.
  • the driver may provide recommendations, e.g., using dropout layers to model architectures.
  • Dropout layers indicate which neurons’ activation functions may be zeroed out, therefore changing the architecture of the network. It is conceivable that in some cases, use of different dropout layers on different architectures of decoders in individual NW vendors may lead the architectures of the decoders to eventually converge.
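  • a toy sketch of dropout: each activation is zeroed out with probability p and the survivors are rescaled (the inverted-dropout variant used here is an assumption of the sketch):

```python
import random

def dropout(activations, p, rng):
    """Zero each activation with probability p; scale survivors by 1/(1-p)."""
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]
```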
  • Figure 9d illustrates an embodiment for personalized federated learning.
  • Figure 9d describes a variation of the process described in Figure 9a where a personalized version of each decoder is maintained for each chipset.
  • the personalized version is obtained in actions 8 and 12 accordingly for the two different UE vendors in the example embodiment.
  • the goal of the solver may be to estimate a function u which is UE-specific and which when applied to the decoder output produces a result that is close to the specialized decoder (decoder_1 and decoder_2) and not as close to the global model (or decoder).
  • the function u personalizes the model. It acts as a filter to the model weights which when applied reduces the loss to the reconstruction loss of the input for the specific combination between encoder and decoder. In essence it acts against the global decoder but since it is only a parameter it is used in case the global decoder fails, e.g., when there is high Block Error Rate for the channel that is established using the global decoder.
  • in Figure 9d, loops 8 and 12 determine a function that personalizes the decoder to the UE instead of using the global decoder.
  • the gNB then has access to a global decoder and also to one or more functions that may be triggered in step 24 of Figure 9d if the global model underperforms.
  • the function is applied to the global model and personalizes it.
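  • a sketch of this fallback (element-wise scaling as the form of u, and a block-error-rate threshold as the trigger, are both assumptions of the sketch):

```python
def personalize(global_weights, u):
    """Apply the UE-specific function u as a filter on the global decoder weights."""
    return [w * f for w, f in zip(global_weights, u)]

def decoder_weights(global_weights, u, bler, bler_threshold=0.1):
    """Use the personalized weights only when the global decoder underperforms,
    here indicated by a high block error rate on the established channel."""
    if bler > bler_threshold:
        return personalize(global_weights, u)
    return list(global_weights)
```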
  • the same approach may be used to establish specialized models for specific environmental conditions such as location, line-of-sight etc.
  • the main prerequisite in both approaches is for the network vendor (denoted as gNB in the sequence diagram) to receive input from the UE when such a model is trained.
  • for a given UE/NW vendor pair, it is possible to train for a specific UE but for different conditions, e.g., location, line of sight and other conditions that are known to affect the physical channel.
  • Figure 10 shows an example of the first node 701, 711 and Figure 11 shows an example of the second node 702, 712.
  • Figure 12 shows an example of the third node 703, 713.
  • the first node 701, 711 may be configured to perform the method actions of Figure 8a and Figure 8b above.
  • the second node 702, 712 may be configured to perform the method actions of Figure 8c above.
  • the third node 703, 713 may be configured to perform the method actions of Figure 8d above.
  • the first node 701, 711, the second node 702, 712 and the third node 703, 713 may comprise the units illustrated in Figures 10 to 12 to perform the method actions above.
  • the first node 701, 711, the second node 702, 712 and the third node 703, 713 may each comprise a respective input and output interface, I/O, 1006, 1106, 1206 configured to communicate with each other.
  • the input and output interface may comprise a wireless receiver (not shown) and a wireless transmitter (not shown).
  • the first node 701, 711, the second node 702, 712 and the third node 703, 713 may each comprise a respective processing unit 1001, 1101, 1201 for performing the above method actions.
  • the respective processing unit 1001, 1101, 1201 may comprise further sub-units which will be described below.
  • the first node 701, 711, the second node 702, 712 and the third node 703, 713 may further comprise a respective receiving unit 1010, 1120, 1210 and a transmitting unit 1060, 1110, 1230 which may receive and transmit messages and/or signals.
  • the first node 701, 711 is further configured to receive, from the second node 702, 712, a proposal for a common loss calculation method to be used for training each of the two or more encoders 701-1, 702-1 or decoders 711-1, 712-1 and a proposal for a set of common NN architecture parameters for training each of the two or more encoders 701-1, 702-1 or decoders 711-1, 712-1.
  • the second node 702, 712 is configured to, e.g. by the transmitting unit 1110 being configured to, transmit a proposal for a common loss calculation method to be used for training each of the two or more encoders 701-1, 702-1 or decoders 711-1, 712-1 and a proposal for a set of common NN architecture parameters for training each of the two or more encoders 701-1, 702-1 or decoders 711-1, 712-1.
  • the second node 702, 712 is further configured to, e.g. by the receiving unit 1120 being configured to, receive, from the first node 701, 711, the common loss calculation method and the set of common NN architecture parameters.
  • the third node 703, 713 is configured to, e.g. by the receiving unit 1210 being configured to, receive, the first set of trained values of common trainable encoder or decoder parameters from the first node 701, 711 and receive a second set of trained values of common trainable encoder or decoder parameters from the second node 702, 712.
  • the first node 701, 711 is further configured to determine the common loss calculation method based on the received proposal for the common loss calculation method, and configured to determine the set of common NN architecture parameters.
  • the respective receiving unit 1010, 1120 of the first node 701, 711 and the second node 702, 712 may be configured to receive, from the third node 703, 713, common updated values of the common trainable decoder parameters.
  • the first node 701, 711 and the second node 702, 712 are each configured to receive, from the third node 703, 713, common updated values of the common trainable decoder parameters.
  • the first node 701, 711 may further comprise a determining unit 1020 which for example may determine the common loss calculation method to be used for training each of the two or more encoders or decoders based on the received proposal for the common loss calculation method, and/or determine the set of common NN architecture parameters for training each of the two or more encoders or decoders based on the received proposal for the set of common NN architecture parameters.
  • the first node 701, 711 is further configured to train the first encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and first encoded channel data which is based on the first channel data.
  • the second node 702, 712 is further configured to train the second encoder or decoder based on the common loss calculation method, and the set of common NN architecture parameters, second channel data and second encoded channel data.
  • the first node 701, 711 may further comprise a training unit 1030, to train the first encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and first encoded channel data which is based on the first channel data.
  • the second node 702, 712 may further comprise a training unit 1130 to train the second encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, second channel data and second encoded channel data which is based on the second channel data.
  • the first node 701, 711 may further comprise an obtaining unit 1040 configured to obtain, based on training the first encoder or decoder, the first set of trained values of the common trainable encoder or decoder parameters.
  • the second node 702, 712 may further comprise an obtaining unit 1140 configured to obtain, based on training the second encoder or decoder, the second set of trained values of the common trainable encoder or decoder parameters.
  • the first node 701, 711 and the second node 702, 712 may further comprise a respective providing unit 1050, 1150, e.g., to provide the respective first and second sets of trained values of the common trainable decoder parameters to the third node.
  • the first node 701, 711 is further configured to receive, from the second node 702, 712, the proposal for the common type of encoder or decoder training.
  • the type of encoder or decoder training comprises: a) a first type in which the two or more encoders 701-1, 702-1 train together with respective two or more trainable decoders 711-1, 712-1 for which the decoder trainable parameters are to be trained, or in which the two or more decoders 711-1, 712-1 train together with respective two or more trainable encoders 701-1, 702-1 for which the encoder trainable parameters are to be trained, and b) a second type in which the two or more encoders 701-1, 702-1 train together with respective two or more fixed decoders 711-1, 712-1 for which the decoder trainable parameters are not to be trained, or in which the two or more decoders 711-1, 712-1 train together with respective two or more fixed encoders 701-1, 702-1 for which the encoder trainable parameters are not to be trained.
  • the first node 701, 711 may be further configured to train the first encoder 701-1 or the first decoder 711-1 further based on the determined common type of encoder or decoder training.
  • the first node 701, 711 may further be configured to transmit, to the second node 702, 712, the common loss calculation method to be used for training each of the two or more encoders 701-1, 702-1 or decoders 711-1, 712-1 and the set of common NN architecture parameters for training each of the two or more NN-based encoders or decoders 711-1, 712-1.
  • the first node 701, 711 is further configured to obtain a first initial set of values of common trainable encoder or decoder parameters. Then the first node 701, 711 may further be configured to obtain first channel data and to obtain first compressed channel data.
  • the second node 702, 712 is configured to obtain a second initial set of values of common trainable encoder or decoder parameters. Then the second node 702, 712 may be configured to obtain the encoded second channel data for calculating a second loss value based on the common loss calculation method and to obtain second compressed channel data.
  • the embodiments herein may be implemented through a respective processor or one or more processors, such as the respective processor 1004, 1104 and 1204, of a processing circuitry in the first node 701, 711, the second node 702, 712 and the third node 703, 713 together with computer program code for performing the functions and actions of the embodiments herein.
  • the program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the respective first node 701, 711, the second node 702, 712 and the third node 703, 713.
  • One such carrier may be in the form of a CD-ROM disc. It is however also feasible to use other data carriers, such as a memory stick.
  • the computer program code may furthermore be provided as pure program code on a server and downloaded to the respective first node 701, 711, the second node 702, 712 and the third node 703, 713.
  • the first node 701, 711, the second node 702, 712 and the third node 703, 713 may further comprise a respective memory 1002, 1102 and 1202 comprising one or more memory units.
  • the memory comprises instructions executable by the processor in the first node 701 , 711 , the second node 702, 712 and the third node 703, 713.
  • Each respective memory 1002, 1102 and 1202 is arranged to be used to store e.g. information, data, configurations, and applications to perform the methods herein when being executed in the respective first node 701, 711, the second node 702, 712 and the third node 703, 713.
  • a respective computer program 1003, 1103 and 1203 comprises instructions, which when executed by the at least one processor, cause the at least one processor of the respective first node 701 , 711, the second node 702, 712 and the third node 703, 713 to perform the actions above.
  • a respective carrier 1005, 1105 and 1205 comprises the respective computer program, wherein the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.
  • the units described above may refer to a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g. stored in the respective first node 701, 711, second node 702, 712 and third node 703, 713, that, when executed by the respective one or more processors such as the processors described above, perform as described above.
  • The processors, as well as the other digital hardware, may be included in a single Application-Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC).
  • a communication system includes a telecommunication network 3210, such as a 3GPP-type cellular network, which comprises an access network 3211, such as a radio access network, and a core network 3214.
  • the access network 3211 comprises a plurality of base stations 3212a, 3212b, 3212c, such as the source and target access node 111, 112, AP STAs NBs, eNBs, gNBs or other types of wireless access points, each defining a corresponding coverage area 3213a, 3213b, 3213c.
  • Each base station 3212a, 3212b, 3212c is connectable to the core network 3214 over a wired or wireless connection 3215.
  • a first user equipment (UE) such as a Non-AP STA 3291 located in coverage area 3213c is configured to wirelessly connect to, or be paged by, the corresponding base station 3212c.
  • a second UE 3292 such as a Non-AP STA in coverage area 3213a is wirelessly connectable to the corresponding base station 3212a. While a plurality of UEs 3291, 3292 are illustrated in this example, the disclosed embodiments are equally applicable to a situation where a sole UE is in the coverage area or where a sole UE is connecting to the corresponding base station 3212.
  • the telecommunication network 3210 is itself connected to a host computer 3230, which may be embodied in the hardware and/or software of a standalone server, a cloud-implemented server, a distributed server or as processing resources in a server farm.
  • the host computer 3230 may be under the ownership or control of a service provider, or may be operated by the service provider or on behalf of the service provider.
  • the connections 3221 , 3222 between the telecommunication network 3210 and the host computer 3230 may extend directly from the core network 3214 to the host computer 3230 or may go via an optional intermediate network 3220.
  • the intermediate network 3220 may be one of, or a combination of more than one of, a public, private or hosted network; the intermediate network 3220, if any, may be a backbone network or the Internet; in particular, the intermediate network 3220 may comprise two or more subnetworks (not shown).
  • the communication system of Figure 13 as a whole enables connectivity between one of the connected UEs 3291, 3292, such as the UE 121, and the host computer 3230.
  • the connectivity may be described as an over-the-top (OTT) connection 3250.
  • the host computer 3230 and the connected UEs 3291 , 3292 are configured to communicate data and/or signaling via the OTT connection 3250, using the access network 3211 , the core network 3214, any intermediate network 3220 and possible further infrastructure (not shown) as intermediaries.
  • the OTT connection 3250 may be transparent in the sense that the participating communication devices through which the OTT connection 3250 passes are unaware of routing of uplink and downlink communications.
  • a base station 3212 need not be informed about the past routing of an incoming downlink communication with data originating from a host computer 3230 to be forwarded (e.g., handed over) to a connected UE 3291. Similarly, the base station 3212 need not be aware of the future routing of an outgoing uplink communication originating from the UE 3291 towards the host computer 3230.
  • Example implementations, in accordance with an embodiment, of the UE, base station and host computer discussed in the preceding paragraphs will now be described with reference to Figure 14.
  • a host computer 3310 comprises hardware 3315 including a communication interface 3316 configured to set up and maintain a wired or wireless connection with an interface of a different communication device of the communication system 3300.
  • the host computer 3310 further comprises processing circuitry 3318, which may have storage and/or processing capabilities.
  • the processing circuitry 3318 may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions.
  • the host computer 3310 further comprises software 3311, which is stored in or accessible by the host computer 3310 and executable by the processing circuitry 3318.
  • the software 3311 includes a host application 3312.
  • the host application 3312 may be operable to provide a service to a remote user, such as a UE 3330 connecting via an OTT connection 3350 terminating at the UE 3330 and the host computer 3310. In providing the service to the remote user, the host application 3312 may provide user data which is transmitted using the OTT connection 3350.
  • the communication system 3300 further includes a base station 3320 provided in a telecommunication system and comprising hardware 3325 enabling it to communicate with the host computer 3310 and with the UE 3330.
  • the hardware 3325 may include a communication interface 3326 for setting up and maintaining a wired or wireless connection with an interface of a different communication device of the communication system 3300, as well as a radio interface 3327 for setting up and maintaining at least a wireless connection 3370 with a UE 3330 located in a coverage area (not shown in Figure 14) served by the base station 3320.
  • the communication interface 3326 may be configured to facilitate a connection 3360 to the host computer 3310.
  • connection 3360 may be direct or it may pass through a core network (not shown in Figure 14) of the telecommunication system and/or through one or more intermediate networks outside the telecommunication system.
  • the hardware 3325 of the base station 3320 further includes processing circuitry 3328, which may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions.
  • the base station 3320 further has software 3321 stored internally or accessible via an external connection.
  • the communication system 3300 further includes the UE 3330 already referred to.
  • Its hardware 3335 may include a radio interface 3337 configured to set up and maintain a wireless connection 3370 with a base station serving a coverage area in which the UE 3330 is currently located.
  • the hardware 3335 of the UE 3330 further includes processing circuitry 3338, which may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions.
  • the UE 3330 further comprises software 3331 , which is stored in or accessible by the UE 3330 and executable by the processing circuitry 3338.
  • the software 3331 includes a client application 3332.
  • the client application 3332 may be operable to provide a service to a human or non-human user via the UE 3330, with the support of the host computer 3310.
  • an executing host application 3312 may communicate with the executing client application 3332 via the OTT connection 3350 terminating at the UE 3330 and the host computer 3310.
  • the client application 3332 may receive request data from the host application 3312 and provide user data in response to the request data.
  • the OTT connection 3350 may transfer both the request data and the user data.
  • the client application 3332 may interact with the user to generate the user data that it provides.
  • the host computer 3310, base station 3320 and UE 3330 illustrated in Figure 14 may be identical to the host computer 3230, one of the base stations 3212a, 3212b, 3212c and one of the UEs 3291, 3292 of Figure 13, respectively.
  • the inner workings of these entities may be as shown in Figure 14 and independently, the surrounding network topology may be that of Figure 13.
  • the OTT connection 3350 has been drawn abstractly to illustrate the communication between the host computer 3310 and the user equipment 3330 via the base station 3320, without explicit reference to any intermediary devices and the precise routing of messages via these devices.
  • Network infrastructure may determine the routing, which it may be configured to hide from the UE 3330 or from the service provider operating the host computer 3310, or both. While the OTT connection 3350 is active, the network infrastructure may further take decisions by which it dynamically changes the routing (e.g., on the basis of load balancing consideration or reconfiguration of the network).
  • the wireless connection 3370 between the UE 3330 and the base station 3320 is in accordance with the teachings of the embodiments described throughout this disclosure.
  • One or more of the various embodiments improve the performance of OTT services provided to the UE 3330 using the OTT connection 3350, in which the wireless connection 3370 forms the last segment. More precisely, the teachings of these embodiments may improve the data rate, latency and power consumption, and thereby provide benefits such as reduced user waiting time, relaxed restrictions on file size, better responsiveness and extended battery lifetime.
  • a measurement procedure may be provided for the purpose of monitoring data rate, latency and other factors on which the one or more embodiments improve.
  • the measurement procedure and/or the network functionality for reconfiguring the OTT connection 3350 may be implemented in the software 3311 of the host computer 3310 or in the software 3331 of the UE 3330, or both.
  • sensors (not shown) may be deployed in or in association with communication devices through which the OTT connection 3350 passes; the sensors may participate in the measurement procedure by supplying values of the monitored quantities exemplified above, or supplying values of other physical quantities from which software 3311 , 3331 may compute or estimate the monitored quantities.
  • the reconfiguring of the OTT connection 3350 may include message format, retransmission settings, preferred routing etc.; the reconfiguring need not affect the base station 3320, and it may be unknown or imperceptible to the base station 3320. Such procedures and functionalities may be known and practiced in the art.
  • measurements may involve proprietary UE signaling facilitating the host computer’s 3310 measurements of throughput, propagation times, latency and the like.
  • the measurements may be implemented in that the software 3311, 3331 causes messages to be transmitted, in particular empty or ‘dummy’ messages, using the OTT connection 3350 while it monitors propagation times, errors etc.
  • FIGURE 15 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment.
  • the communication system includes a host computer, a base station such as an AP STA, and a UE such as a Non-AP STA, which may be those described with reference to Figure 13 and Figure 14. For simplicity of the present disclosure, only drawing references to Figure 15 will be included in this section.
  • In a first action 3410 of the method, the host computer provides user data.
  • the host computer provides the user data by executing a host application.
  • the host computer initiates a transmission carrying the user data to the UE.
  • the base station transmits to the UE the user data which was carried in the transmission that the host computer initiated, in accordance with the teachings of the embodiments described throughout this disclosure.
  • the UE executes a client application associated with the host application executed by the host computer.
  • FIGURE 16 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment.
  • the communication system includes a host computer, a base station such as an AP STA, and a UE such as a Non-AP STA, which may be those described with reference to Figure 13 and Figure 14. For simplicity of the present disclosure, only drawing references to Figure 16 will be included in this section.
  • the host computer provides user data.
  • the host computer provides the user data by executing a host application.
  • the host computer initiates a transmission carrying the user data to the UE. The transmission may pass via the base station, in accordance with the teachings of the embodiments described throughout this disclosure.
  • the UE receives the user data carried in the transmission.
  • FIGURE 17 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment.
  • the communication system includes a host computer, a base station such as an AP STA, and a UE such as a Non-AP STA, which may be those described with reference to Figure 13 and Figure 14. For simplicity of the present disclosure, only drawing references to Figure 17 will be included in this section.
  • the UE receives input data provided by the host computer.
  • the UE provides user data.
  • the UE provides the user data by executing a client application.
  • the UE executes a client application which provides the user data in reaction to the received input data provided by the host computer.
  • the executed client application may further consider user input received from the user.
  • the UE initiates, in an optional third subaction 3630, transmission of the user data to the host computer.
  • the host computer receives the user data transmitted from the UE, in accordance with the teachings of the embodiments described throughout this disclosure.
  • FIGURE 18 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment.
  • the communication system includes a host computer, a base station such as an AP STA, and a UE such as a Non-AP STA, which may be those described with reference to Figure 13 and Figure 14.
  • the base station receives user data from the UE.
  • the base station initiates transmission of the received user data to the host computer.
  • the host computer receives the user data carried in the transmission initiated by the base station.
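The training and aggregation flow summarized in the bullets above — each node training its local NN part against an agreed common loss calculation method and a set of common NN architecture parameters, then a third node merging the resulting trained values into common updated values — can be sketched as follows. This is a minimal illustration only: a single linear decoder layer stands in for the NN, mean squared error stands in for the agreed common loss, and simple federated averaging is assumed as one possible way the third node could produce the common updated values. All names, dimensions and learning-rate settings are assumptions made for the example, not details taken from the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed common NN architecture parameters agreed between the nodes:
# a single linear layer mapping a 4-dim code to 8-dim channel data.
CODE_DIM, CHANNEL_DIM = 4, 8

def common_loss(reconstructed, channel_data):
    """Common loss calculation method agreed by the nodes (here: MSE)."""
    return float(np.mean((reconstructed - channel_data) ** 2))

def train_local_decoder(weights, encoded, channel_data, lr=0.1, steps=300):
    """Locally train a linear decoder W so that `encoded @ W` approximates
    the node's own channel data, by gradient descent on the common loss."""
    for _ in range(steps):
        err = encoded @ weights - channel_data   # reconstruction error
        grad = encoded.T @ err / len(encoded)    # proportional to the MSE gradient
        weights = weights - lr * grad
    return weights

def aggregate(param_sets):
    """Third node: merge the received trained parameter sets into common
    updated values (here: elementwise federated averaging)."""
    return np.mean(param_sets, axis=0)

# Both nodes start from the same initial set of values of the common
# trainable decoder parameters.
w_init = rng.normal(scale=0.1, size=(CODE_DIM, CHANNEL_DIM))

# Synthetic per-node data: each node holds its own channel data and the
# corresponding encoded (compressed) channel data.
true_w = rng.normal(size=(CODE_DIM, CHANNEL_DIM))
enc1 = rng.normal(size=(64, CODE_DIM))
enc2 = rng.normal(size=(64, CODE_DIM))
ch1, ch2 = enc1 @ true_w, enc2 @ true_w

# First and second node train locally with the common loss and common
# architecture, then provide their trained values to the third node,
# which returns the common updated values.
w1 = train_local_decoder(w_init, enc1, ch1)
w2 = train_local_decoder(w_init, enc2, ch2)
w_common = aggregate([w1, w2])
```

In the disclosure's terms, `w1` and `w2` play the role of the first and second sets of trained values of the common trainable decoder parameters, and `aggregate` plays the role of the third node producing the common updated values; averaging is only one conceivable merge rule.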

Abstract

A method, performed by a first node, for training a first neural network (NN)-based encoder or decoder of a system of two or more encoders or decoders. The method comprises: receiving (801), from a second node configured to train a second encoder or decoder, a proposal for a common loss calculation method for training the two or more encoders or decoders and a proposal for a set of common NN architecture parameters; determining (802) a common loss calculation method for training the two or more encoders or decoders based on the received proposal for the common loss calculation method; determining (803) a set of common NN architecture parameters for training the two or more encoders or decoders based on the received proposal for the set of common NN architecture parameters; training (805) the first encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and encoded first channel data; providing (807) a first set of trained values of common trainable decoder parameters to a third node; and receiving (808), from the third node, common updated values of the common trainable decoder parameters.
PCT/SE2023/050474 2022-05-16 2023-05-15 Nodes and methods for machine learning (ML)-based CSI reporting Ceased WO2023224533A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GR20220100395 2022-05-16
GR20220100395 2022-05-16

Publications (1)

Publication Number Publication Date
WO2023224533A1 true WO2023224533A1 (fr) 2023-11-23

Family

ID=88835604

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2023/050474 2022-05-16 2023-05-15 Nodes and methods for machine learning (ML)-based CSI reporting

Country Status (1)

Country Link
WO (1) WO2023224533A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025134100A1 * 2024-02-19 2025-06-26 Lenovo (Singapore) Pte. Ltd. Neural network decoder adaptation in a wireless communication system involving data augmentation
WO2025170422A1 * 2024-02-08 2025-08-14 Samsung Electronics Co., Ltd. Method and apparatus for transmitting and receiving an uplink signal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210266763A1 (en) * 2020-02-24 2021-08-26 Qualcomm Incorporated Channel state information (csi) learning
WO2021208061A1 * 2020-04-17 2021-10-21 Qualcomm Incorporated Configurable neural network for channel state feedback (CSF) learning
WO2022040655A1 * 2020-08-18 2022-02-24 Qualcomm Incorporated Federated learning of autoencoder pairs for wireless communication
WO2022040678A1 * 2020-08-18 2022-02-24 Qualcomm Incorporated Federated learning for classifiers and autoencoders for wireless communication
WO2022086949A1 * 2020-10-21 2022-04-28 Idac Holdings, Inc Methods for training artificial intelligence components in wireless systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ERICSSON: "Discussions on AI-CSI", 3GPP DRAFT; R1-2203282, 3RD GENERATION PARTNERSHIP PROJECT (3GPP), MOBILE COMPETENCE CENTRE ; 650, ROUTE DES LUCIOLES ; F-06921 SOPHIA-ANTIPOLIS CEDEX ; FRANCE, vol. RAN WG1, no. Online; 20220516 - 20220527, 29 April 2022 (2022-04-29), Mobile Competence Centre ; 650, route des Lucioles ; F-06921 Sophia-Antipolis Cedex ; France, XP052152910 *

Similar Documents

Publication Publication Date Title
US20240129008A1 (en) Neural network based channel state information feedback
US20250047346A1 (en) Communications nodes and methods for proprietary machine learning-based csi reporting
US20220149904A1 (en) Compression and Decompression of Downlink Channel Estimates
WO2020080989A1 Managing machine learning for improving the performance of a wireless communication network
WO2023211346A1 Node and methods for training a neural network encoder for machine learning-based CSI
EP4466799A1 Hybrid model training solution for CSI reporting
WO2023224533A1 Nodes and methods for machine learning (ML)-based CSI reporting
US20250220482A1 (en) Methods for dynamic channel state information feedback reconfiguration
US20250031066A1 (en) Server and Agent for Reporting of Computational Results during an Iterative Learning Process
EP4480102A1 Performance evaluation of an AE encoder
WO2023208781A1 User equipment and method in a wireless communications network
EP4150777A1 Uplink SU-MIMO adaptive precoding in wireless cellular systems based on reception quality measurements
CN117811627A Communication method and communication apparatus
EP4479892A1 Nodes and methods for evaluating the performance of an AE encoder
WO2023158354A1 Nodes and methods for handling a performance evaluation of an AE encoder
US20250286590A1 (en) Nodes and methods for enhanced ml-based csi reporting
WO2023208474A1 First wireless node, operator node and methods in a wireless communication network
US20250016065A1 (en) Server and agent for reporting of computational results during an iterative learning process
CN118509014A Communication method and communication apparatus
WO2023113677A1 Nodes and methods for proprietary machine learning-based CSI reporting
WO2022214935A1 Downlink channel covariance matrix approximation in frequency division duplex systems
WO2023158363A1 Performance evaluation of an AE encoder
WO2024172732A1 Target channel quality indicator (CQI) based on channel state information (CSI)
EP4666461A1 Method and apparatus for reporting channel state information using an autoencoder
WO2024172744A1 Method and apparatus for reporting channel state information using an autoencoder

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23808002

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 23808002

Country of ref document: EP

Kind code of ref document: A1