
WO2025172978A1 - Obtaining learning model parameters at a device - Google Patents

Obtaining learning model parameters at a device

Info

Publication number
WO2025172978A1
Authority
WO
WIPO (PCT)
Prior art keywords
learning model
parameters
data samples
encoder
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/IB2025/052898
Other languages
French (fr)
Inventor
Venkata Srinivas Kothapalli
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Singapore Pte Ltd
Original Assignee
Lenovo Singapore Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Singapore Pte Ltd filed Critical Lenovo Singapore Pte Ltd
Publication of WO2025172978A1 publication Critical patent/WO2025172978A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning

Definitions

  • a wireless communications system may include one or multiple network communication devices, which may be otherwise known as network equipment (NE), supporting wireless communications for one or multiple user communication devices, which may be otherwise known as user equipment (UE), or other suitable terminology.
  • the wireless communications system may support wireless communications with one or multiple user communication devices by utilizing resources of the wireless communication system (e.g., time resources (e.g., symbols, slots, subframes, frames, or the like)) or frequency resources (e.g., subcarriers, carriers, or the like).
  • the wireless communications system may support wireless communications across various radio access technologies including third generation (3G) radio access technology, fourth generation (4G) radio access technology, fifth generation (5G) radio access technology, among other suitable radio access technologies beyond 5G (e.g., sixth generation (6G)).
  • 3G third generation
  • 4G fourth generation
  • 5G fifth generation
  • 6G sixth generation
  • An article “a” before an element is unrestricted and understood to refer to “at least one” of those elements or “one or more” of those elements.
  • the terms “a,” “at least one,” “one or more,” and “at least one of one or more” may be interchangeable.
  • the first device may be configured to, capable of, or operable to select a subset of data samples from a set of data samples associated with an encoder learning model, where the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, update one or more parameters corresponding to at least one normalization layer of the encoder learning model based on providing the subset of data samples as input to the encoder learning model, and transmit, to a second device, a message including an output from the encoder learning model, where the output is associated with the subset of data samples.
  • a processor (e.g., a standalone processor chipset, or a component of a first device) for wireless communication is described.
  • the processor may be configured to, capable of, or operable to perform one or more operations as described herein.
  • the processor may be configured to, capable of, or operable to select a subset of data samples from a set of data samples associated with an encoder learning model, where the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, update one or more parameters corresponding to at least one normalization layer of the encoder learning model based on providing the subset of data samples as input to the encoder learning model, and transmit, to a second device, a message including an output from the encoder learning model, where the output is associated with the subset of data samples.
  • the method may include selecting a subset of data samples from a set of data samples associated with an encoder learning model, where the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, updating one or more parameters corresponding to at least one normalization layer of the encoder learning model based on providing the subset of data samples as input to the encoder learning model, and transmitting, to a second device, a message including an output from the encoder learning model, where the output is associated with the subset of data samples.
  • the first device, the processor, and the method described herein to update the one or more parameters, may further be configured to, capable of, or operable to update the one or more parameters based on a periodicity. In some implementations of the first device, the processor, and the method described herein, to update the one or more parameters, the first device, the processor, and the method may further be configured to, capable of, or operable to receive, from at least one of the second device or a third device, an additional message that indicates for the first device to update the one or more parameters.
  • the first device, the processor, and the method described herein to update the one or more parameters, the first device, the processor, and the method may further be configured to, capable of, or operable to determine a quality of output from the encoder learning model fails to satisfy a threshold value.
  • the first device, the processor, and the method described herein may further be configured to, capable of, or operable to obtain the set of data samples based on estimating CSI.
  • the first device, the processor, and the method may further be configured to, capable of, or operable to receive an additional message that indicates the set of data samples.
  • the first device, the processor, and the method described herein, to select the subset of data samples, may further be configured to, capable of, or operable to determine the subset of data samples satisfy one or more conditions, where the one or more conditions include a distance metric between respective data samples in the subset of data samples being greater than a threshold dissimilarity value and an entropy value of respective data samples in the subset of data samples satisfying a threshold entropy value.
  • the first device, the processor, and the method described herein may further be configured to, capable of, or operable to determine one or more of the threshold dissimilarity value or the threshold entropy value. In some implementations of the first device, the processor, and the method described herein, the first device, the processor, and the method may further be configured to, capable of, or operable to receive an additional message that indicates one or more of the threshold dissimilarity value or the threshold entropy value.
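  • As an illustration of the subset-selection conditions above, the following sketch greedily keeps samples that are mutually dissimilar and sufficiently high in entropy. It is a minimal example rather than the claimed procedure: the Euclidean distance metric, the histogram-based entropy estimate, the helper names, and the threshold values are all assumptions chosen for the illustration.
```python
import numpy as np

def sample_entropy(x, num_bins=16):
    """Rough entropy estimate of one data sample via a normalized histogram (illustrative choice)."""
    hist, _ = np.histogram(x, bins=num_bins)
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_subset(samples, dissimilarity_threshold, entropy_threshold):
    """Greedily keep samples that are dissimilar to the already-selected ones and sufficiently informative."""
    selected = []
    for x in samples:
        if sample_entropy(x) < entropy_threshold:
            continue  # skip low-entropy (uninformative) samples
        far_enough = all(np.linalg.norm(x - s) > dissimilarity_threshold for s in selected)
        if far_enough:
            selected.append(x)
    return selected

# Example: 100 CSI-like data samples, each a flattened vector of 64 values.
rng = np.random.default_rng(0)
samples = [rng.standard_normal(64) for _ in range(100)]
subset = select_subset(samples, dissimilarity_threshold=9.0, entropy_threshold=2.0)
print(f"selected {len(subset)} of {len(samples)} samples")
```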
  • the first device, the processor, and the method described herein to update the one or more parameters, may further be configured to, capable of, or operable to propagate the subset of data samples from a first layer associated with the encoder learning model to a last layer associated with the encoder learning model, and compute one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter.
  • the one or more parameters include one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter.
  • the at least one normalization layer is a batch normalization layer.
  • the first device may be configured to, capable of, or operable to receive, from a second device, a message including an output from an encoder learning model, where the output is associated with a subset of data samples of a set of data samples, and the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, and update one or more parameters corresponding to at least one normalization layer of the decoder learning model based on providing the output as input to the decoder learning model.
  • a processor (e.g., a standalone processor chipset, or a component of a first device) for wireless communication is described.
  • the processor may be configured to, capable of, or operable to perform one or more operations as described herein.
  • the processor may be configured to, capable of, or operable to receive, from a second device, a message including an output from an encoder learning model, where the output is associated with a subset of data samples of a set of data samples, and the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, and update one or more parameters corresponding to at least one normalization layer of the decoder learning model based on providing the output as input to the decoder learning model.
  • a method performed or performable by a first device for wireless communication is described.
  • the first device, the processor, and the method described herein to update the one or more parameters, may further be configured to, capable of, or operable to receive, from at least one of the second device or a third device, an additional message that indicates for the first device to update the one or more parameters.
  • the first device, the processor, and the method described herein to update the one or more parameters, may further be configured to, capable of, or operable to detect a change in one or more channel conditions associated with a channel between the first device and the second device.
  • the first device, the processor, and the method described herein, to update the one or more parameters, may further be configured to, capable of, or operable to determine a quality of output from the decoder learning model fails to satisfy a threshold value.
  • the first device, the processor, and the method described herein, to update the one or more parameters, may further be configured to, capable of, or operable to propagate the subset of data samples from a first layer associated with the decoder learning model to a last layer associated with the decoder learning model and compute one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter.
  • the output includes a set of information samples corresponding to feature vectors associated with the encoder learning model.
  • the one or more parameters include one or more normalization statistics associated with the at least one normalization layer, and where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter.
  • the at least one normalization layer is a batch normalization layer.
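  • A rough sketch of the receiving-device behavior described above is given below: the decoder's normalization statistics are refreshed simply by forwarding the received encoder outputs through the decoder with its batch normalization layers in training mode and gradients disabled. The use of PyTorch, the layer sizes, and the helper name are assumptions for illustration; the disclosure does not prescribe a particular framework.
```python
import torch
import torch.nn as nn

# Hypothetical decoder: maps a 16-dimensional latent (encoder output) back to a 64-dimensional CSI vector.
decoder = nn.Sequential(
    nn.Linear(16, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, 64),
)

def update_decoder_norm_stats(decoder, received_latents):
    """Update only the running mean/variance of the decoder's normalization layers.

    received_latents: tensor of shape (num_samples, 16), i.e., the outputs of the
    encoder learning model carried in the message from the second device.
    """
    decoder.train()          # batch norm uses batch statistics and refreshes its running statistics
    with torch.no_grad():    # no loss function, no gradients, no back propagation
        decoder(received_latents)
    decoder.eval()           # subsequent inference uses the refreshed running statistics

# Example with a batch of 32 received latent vectors.
update_decoder_norm_stats(decoder, torch.randn(32, 16))
```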
  • Figure 3 illustrates an example of a wireless communications system, in accordance with aspects of the present disclosure.
  • Figure 4 illustrates an example of a signaling diagram, in accordance with aspects of the present disclosure.
  • Figure 5 illustrates an example of a UE in accordance with aspects of the present disclosure.
  • Figure 6 illustrates an example of a processor in accordance with aspects of the present disclosure.
  • Figure 7 illustrates an example of an NE in accordance with aspects of the present disclosure.
  • Figure 8 illustrates a flowchart of a method performed by a UE in accordance with aspects of the present disclosure.
  • a wireless communications system may include one or more devices, such as UEs and NEs, among other devices that transmit and receive signaling.
  • the devices can transmit and receive the signaling via one or more channels between the devices, where a channel refers to a medium through which information is transmitted between the devices.
  • the devices can exchange information, such as CSI, related to the channels over time, frequency, and space.
  • the devices use the information for coherent reception and/or transmission of control information and/or data between the devices.
  • Coherent reception and/or transmission refers to maintaining phase synchronization between the transmitted and received signals, which enables more accurate signal detection and demodulation, among other advantages.
  • a device can acquire downlink CSI using uplink reference signals (e.g., transmitted from a UE to an NE), where there is uplink-downlink channel reciprocity.
  • Uplink-downlink channel reciprocity occurs when channel characteristics (e.g., channel gains, phase shifts, and fading conditions) observed for an uplink transmission from a UE to an NE are approximately equivalent to those observed for a downlink transmission from the NE to the UE.
  • the device can measure one or more downlink reference signals (e.g., transmitted from an NE to a UE) to obtain CSI, where a UE can transmit a message indicating the CSI to the NE.
  • the UE can compress CSI for transmission to the NE. That is, the UE can compress (e.g., minimize a complexity of and/or encode to a more compact representation) a CSI report, including CSI information provided in the CSI report.
  • one or more devices in the wireless communications systems can implement one or more learning models, which can be examples of machine learning (ML) models and/or artificial intelligence (AI) models, to perform CSI compression.
  • the devices can implement a two-sided CSI compression model that includes an encoder component that generates compressed CSI data at a UE and a decoder component that generates estimated CSI data from signaling indicating the compressed CSI data at the NE.
  • the devices can input data to the learning models that has different statistical characteristics than the dataset over which the learning models are trained, which leads to loss of performance (e.g., erroneous inferences and predictions) by the learning models.
  • the loss of performance can lead to increased use of processing resources at the devices executing the model, as well as an increase in signaling overhead, due to correction of errors in output from the learning models that result from the distribution mismatch.
  • the NE can train multiple different learning models for respective domains over which the learning models make inferences or predictions, where a domain represents data for a scenario, a configuration, or a parameter of a network, a physical propagation medium, or a device behavior.
  • the NE can store parameters for the different learning models for transmission to one or more other devices (e.g., UEs or other devices with reduced memory storage capabilities when compared with the NE). Additionally, or alternatively, the one or more other devices can receive and store the parameters of different learning models. The other devices can use respective learning models for the different domains. However, transmitting and/or storing parameters of learning models for different domains results in increased signaling overhead and memory storage usage, as well as inefficient use of time-frequency resources due to the learning models having a relatively large (e.g., greater than a threshold value) numerical quantity of parameters.
  • the transmitting device can update the parameters for normalization layers of the encoder component by providing a subset of data samples obtained by the transmitting device as input to the encoder component.
  • the transmitting device can transmit an output from the encoder component to the receiving device.
  • the receiving device can use the output from the encoder component to update parameters for normalization layers of the decoder component.
  • the receiving device can provide the output from the encoder component as input to the decoder learning model to update parameters for the normalization layers.
  • the transmitting device and/or the receiving device can execute the encoder component and the decoder component for CSI compression for remaining data samples obtained by the transmitting device, which reduces loss of performance for the encoder component and the decoder component by adapting parameters to the statistical characteristics of the subset of data samples without increasing processing and signaling overhead by transmitting parameters for multiple learning models.
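  • A minimal sketch of the transmitting-device side of this flow, under the assumption of a PyTorch encoder with a batch normalization layer, is shown below; the model shape, latent size, and use of a plain forward pass (no loss, gradients, or back propagation) to refresh the normalization statistics are illustrative choices.
```python
import torch
import torch.nn as nn

# Hypothetical encoder: compresses a 64-dimensional CSI vector into a 16-dimensional feature vector.
encoder = nn.Sequential(
    nn.Linear(64, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, 16),
)

def adapt_and_encode(encoder, subset):
    """Refresh the encoder's normalization statistics on a selected subset of data samples,
    then return the encoder output to be transmitted to the receiving device."""
    encoder.train()                # normalization layers recompute/refresh their statistics
    with torch.no_grad():          # no loss function, gradients, or back propagation needed
        latent = encoder(subset)   # forward pass from the first layer to the last layer
    encoder.eval()
    return latent                  # carried in a message to the device hosting the decoder

subset = torch.randn(32, 64)       # e.g., a subset of estimated-CSI data samples
message_payload = adapt_and_encode(encoder, subset)
print(message_payload.shape)       # torch.Size([32, 16])
```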
  • the wireless communications system 100 may support various radio access technologies.
  • the wireless communications system 100 may be a 4G network, such as an LTE network or an LTE-Advanced (LTE-A) network.
  • the wireless communications system 100 may be an NR network, such as a 5G network, a 5G-Advanced (5G-A) network, or a 5G ultrawideband (5G-UWB) network.
  • the wireless communications system 100 may be a combination of a 4G network and a 5G network, or another suitable radio access technology, including Institute of Electrical and Electronics Engineers (IEEE) 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), and IEEE 802.20.
  • IEEE Institute of Electrical and Electronics Engineers
  • an NE 102 and a UE 104 may support wireless communication of signals related to services (voice, video, packet data, messaging, broadcast, etc.) according to one or multiple radio access technologies.
  • an NE 102 may be moveable, for example, a satellite associated with a non-terrestrial network (NTN).
  • NTN non-terrestrial network
  • different geographic coverage areas associated with the same or different radio access technologies may overlap, but the different geographic coverage areas may be associated with different NE 102.
  • the one or more UEs 104 may be dispersed throughout a geographic region of the wireless communications system 100.
  • a UE 104 may include or may be referred to as a remote unit, a mobile device, a wireless device, a remote device, a subscriber device, a transmitter device, a receiver device, or some other suitable terminology.
  • the UE 104 may be referred to as a unit, a station, a terminal, or a client, among other examples.
  • the UE 104 may be referred to as an Internet-of-Things (IoT) device, an Internet-of-Everything (IoE) device, or a machine-type communication (MTC) device, among other examples.
  • IoT Internet-of-Things
  • IoE Internet-of-Everything
  • MTC machine-type communication
  • One or more numerologies may be supported in the wireless communications system 100, and a numerology may include a subcarrier spacing and a cyclic prefix.
  • the first subcarrier spacing e.g., 15 kHz
  • a time interval of a resource may be organized according to slots.
  • a subframe may include a number (e.g., quantity) of slots.
  • the number of slots in each subframe may also depend on the one or more numerologies supported in the wireless communications system 100.
  • Each slot may include a number (e.g., quantity) of symbols (e.g., orthogonal frequency division multiplexing (OFDM) symbols).
  • OFDM orthogonal frequency division multiplexing
  • the number (e.g., quantity) of slots for a subframe may depend on a numerology.
  • For a normal cyclic prefix, a slot may include 14 symbols.
  • For an extended cyclic prefix (e.g., applicable for 60 kHz subcarrier spacing), a slot may include 12 symbols.
  • the NEs 102 and the UEs 104 may perform wireless communications over one or more of the operating frequency bands.
  • FR1 may be used by the NEs 102 and the UEs 104, among other equipment or devices for cellular communications traffic (e.g., control information, data).
  • FR2 may be used by the NEs 102 and the UEs 104, among other equipment or devices for short-range, high data rate capabilities.
  • FR1 may be associated with one or multiple numerologies (e.g., at least three numerologies).
  • a UE 104 can have N_r antennas and an NE (e.g., base station or gNB) can have N_t antennas.
  • a channel between the NE 102 and the UE 104 has a total of P paths.
  • a path defines a route or trajectory that an electromagnetic wave takes from a transmitting device to a receiving device.
  • a signal can follow one or more paths from a transmitting device to a receiving device due to propagation, such as reflection, diffraction, and scattering.
  • a discrete-time channel in which an NE 102 transmits signaling to a UE 104 can be represented as an N_r × N_t dimensional complex-valued matrix H, with element h_{i,j} of H denoting the complex-valued channel gain between the i-th receive antenna and the j-th transmit antenna, 1 ≤ i ≤ N_r, 1 ≤ j ≤ N_t.
  • a discrete-time channel refers to a channel in which transmitted signals and received signals are represented and processed in discrete time instants or intervals. That is, the wireless communications system 100 uses discrete-time samples of signals rather than continuous-time signals.
  • the channel gains, or the channel matrix H, depend on the physical propagation medium.
  • the UE 104 estimates downlink CSI by performing one or more measurements for pilot signals and/or reference signals transmitted from the NE 102.
  • the CSI estimation can include a channel matrix and/or a channel covariance matrix, where elements of a channel matrix include channel gains and phase shifts.
  • the UE 104 can measure one or more pilot signals and/or reference signals (e.g., from the NE 102) to obtain the CSI.
  • CSI includes information that describes the characteristics and conditions of a communication channel between a transmitting device (e.g., the UE 104 and/or the NE 102) and a receiving device (e.g., the NE 102 and/or the UE 104) in a wireless communications system 100.
  • the CSI data and corresponding data samples can include, but are not limited to, channel gain data, phase shift data, frequency response data, delay spread data, and Doppler shift data, among other information.
  • the UE transmits the estimated CSI to the NE 102.
  • a UE 104 can compress CSI prior to transmitting the CSI to the NE 102. For example, once a receiving device estimates CSI, the receiving device can compress the estimated CSI and transmit the compressed version of CSI to a transmitting device.
  • the transmitting device and the receiving device can implement, or execute, a two-sided learning model for CSI compression.
  • a two-sided learning model includes two neural networks, an encoder neural network (e.g., an encoder component) and a decoder neural network (e.g., a decoder component).
  • the encoder component computes a low-dimensional feature vector (e.g., a low-dimensional latent representation) of the input CSI.
  • the decoder component reconstructs the CSI using the low-dimensional feature vector.
  • feedback of CSI by the receiving device amounts to the feedback of the low-dimensional feature vector computed by the encoder at the receiving device.
  • the two-sided learning model achieves CSI compression for transmitting the CSI from a receiving device to a transmitting device, as described in further detail with respect to Figure 2.
  • one or more devices that execute the learning models, whether at one of the transmitting device or the receiving device (e.g., one-sided) or at both the transmitting device and the receiving device (e.g., two-sided), expect the learning models to perform with a same level of accuracy and/or precision for inferences and/or predictions as during a training and testing phase of the learning models.
  • a learning model can make predictions and/or inferences based on input data having different statistical characteristics than a dataset over which the learning model is trained.
  • the learning model can output inferences and/or predictions based on input data that has one or more different statistics than the input data that is used during a training and testing phase of the learning model.
  • the learning models can output erroneous and/or incorrect inferences and/or predictions.
  • the learning model can be trained with a relatively general set of data.
  • the performance (e.g., the accuracy or precision) of the learning model may decline for one or more domains for which the learning model is implemented due to the generalized data.
  • a device that implements an encoder component of a two-sided learning model for CSI compression can provide estimated CSI as input to the encoder component.
  • the input is a function of the estimated CSI matrix H, such as a covariance matrix of the channel matrix, eigen vectors of the CSI matrix, singular vectors of the CSI matrix, a precoder matrix computed from the CSI, or some other function of the estimated CSI.
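  • To make the preceding point concrete, the sketch below forms two common functions of an estimated channel matrix H as candidate encoder inputs: the dominant eigenvector of the channel covariance and the right singular vectors of H. The antenna counts and the random channel are assumptions for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)
num_rx, num_tx = 4, 32                      # illustrative antenna counts at the UE and the NE

# Estimated downlink channel matrix H (num_rx x num_tx), complex-valued.
H = (rng.standard_normal((num_rx, num_tx)) + 1j * rng.standard_normal((num_rx, num_tx))) / np.sqrt(2)

# One possible encoder input: the channel covariance and its dominant eigenvector.
R = H.conj().T @ H                          # num_tx x num_tx covariance of the channel
eigvals, eigvecs = np.linalg.eigh(R)        # Hermitian eigendecomposition (eigenvalues ascending)
v_dominant = eigvecs[:, -1]                 # eigenvector for the largest eigenvalue (a precoder candidate)

# Another possible input: the right singular vectors of H itself.
_, _, Vh = np.linalg.svd(H, full_matrices=False)

print(v_dominant.shape, Vh.shape)           # (32,), (4, 32)
```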
  • One or more channel characteristics can be time variant and influenced by various factors, such as a physical environment (e.g., urban, dense urban, rural), device mobility (e.g., stationary UE 104, low-speed UE 104, or high-speed UE), device location (e.g., indoor, outdoor), weather, and/or seasonal conditions (fog, rain, foliage etc.), and/or antenna array configurations used by the NE 102 and the UE 104, among other factors.
  • the marginal distribution of the estimated CSI and the marginal distribution of a function of the estimated CSI can change over time due to the factors that impact channel characteristics or realizations.
  • a training dataset that a device uses to train a two-sided model for CSI compression may not include data samples that are representative of the target domains encountered during inference.
  • a device that is adapting a learning model to a distribution of input data at the encoder component may not have access to output of the decoder component without using additional network resources (e.g., used to send an output from the decoder component to the encoder component).
  • one or more of the NEs 102 and the UEs 104 are operable to implement various aspects of the techniques described with reference to the present disclosure.
  • an NE 102 (e.g., a base station) and a UE 104 can execute a two-sided learning model that can be adapted to changing input data distributions without using additional network resources.
  • the device can adapt and/or adjust the parameters without accessing learning model output (e.g., from a decoder component) and without computing a loss function, computing gradients, or performing back propagation, among other computationally complex tasks.
  • although the learning model is described in the context of CSI compression, the devices can implement learning models that are adaptable to changing distributions of input data for other tasks in the wireless communications system 100.
  • Figure 2 illustrates an example of a learning model diagram 200 in accordance with aspects of the present disclosure.
  • the learning model diagram 200 implements aspects of the wireless communications system 100.
  • the learning model diagram 200 can implement aspects of, or can be implemented by, a UE and/or an NE, which may be examples of a UE 104 and an NE 102 as described with reference to Figure 1.
  • the learning model diagram 200 illustrates an example of one or more learning models 202.
  • the one or more learning models 202 can include a single learning model 202 or multiple learning models 202.
  • the learning models 202 can be examples of ML models and/or AI models.
  • a learning model can be an algorithm that includes learnable parameters, such as a support vector machine or a decision tree.
  • the learning model can be a part of a neural network with the neuron weights as learnable parameters.
  • the one or more learning models 202 can be an example of a neural network with multiple layers.
  • the learning model can include a two-sided deep neural network (DNN), such as an autoencoder, which is described in further detail with respect to Figure 3.
  • DNN deep neural network
  • a neural network is a computational model that includes multiple layers of artificial neurons, which may be referred to as nodes or units, organized into an input layer 204, one or more hidden layers 206, and an output layer 208.
  • a DNN is a type of learning model 202 that has multiple hidden layers 206 between the input layer 204 and the output layer 208.
  • An input layer 204 passes input data to subsequent layers and may not include learnable parameters.
  • Normalization or standardization refers to scaling and shifting a set of data samples to have zero mean and unit variance. Additionally, or alternatively, normalization can refer to scaling a set of data samples to maintain a value of the set of data samples within a defined range (e.g., a range of 0 to 1 or −1 to 1) without changing a mean and variance of the data samples. Standardization can refer to modifying a set of data samples to have zero mean and unit variance. However, the terms normalization and standardization may be used synonymously, where normalization or standardization is defined herein as transforming a set of data samples to have zero mean and unit variance. In some cases, a device performs input data normalization or standardization independent of (e.g., outside of) the neural network.
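  • The distinction drawn above between the two senses of normalization can be shown with a short sketch; the sample values and the target range are arbitrary.
```python
import numpy as np

x = np.array([3.0, 7.0, 11.0, 19.0])

# Standardization (the sense used herein): transform to zero mean and unit variance.
standardized = (x - x.mean()) / x.std()

# Range scaling: keep values within a defined range (here 0 to 1) without forcing zero mean / unit variance.
scaled = (x - x.min()) / (x.max() - x.min())

print(standardized.mean().round(6), standardized.std().round(6))  # ~0.0, ~1.0
print(scaled.min(), scaled.max())                                 # 0.0, 1.0
```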
  • normalization occurs within a neural network (e.g., internally in a deep neural network).
  • when normalization layers 210 are introduced, there can be one normalization layer 210 between the input layer 204 and the first hidden layer 206, another normalization layer 210 between the first and the second hidden layers 206, and another normalization layer 210 between the second hidden layer 206 and the output layer 208.
  • a device can provide input data samples to a first layer of the neural network at different time instants, which may be referred to as an input layer 204.
  • the device can perform batch normalization, layer normalization, group normalization, instance normalization, and/or root mean square normalization, among others.
  • Batch normalization reduces a training time of a learning model 202 and stabilizes model weights in a relatively short time (e.g., less than a threshold duration) by addressing an internal covariate shift.
  • Batch normalization scales input data to have a consistent range or distribution, such that features in a portion of a batch of training samples (e.g., a subset of a batch of data samples) are scaled by the mean and variance computed within the portion of the batch of training data samples. That is, batch normalization normalizes respective features within a portion of a batch of training data samples.
  • Batch normalization is performed between two or more layers of a learning model 202 by inserting another layer, referred to as a batch normalization layer.
  • a batch normalization layer receives inputs from a preceding network layer and provides (e.g., outputs) a batch normalization transformed (e.g., a normalized) version of the inputs to a succeeding network layer.
  • the learning model 202 can perform well with a mean β and a variance γ² (e.g., rather than with zero mean and unit variance), which the learning model 202 can learn and optimize over the course of training.
  • for a mini-batch B of data samples and a given feature x, a batch normalization layer computes the mini-batch mean μ_B = (1/|B|) Σ_{i∈B} x_i and the mini-batch variance σ²_B = (1/|B|) Σ_{i∈B} (x_i − μ_B)², where x_i denotes the i-th sample of the feature x within the mini-batch B.
  • |B| denotes the cardinality of the set B, and the normalized feature is computed as x̂_i = (x_i − μ_B) / √(σ²_B + ε), where ε is a small (e.g., less than a threshold value) positive constant.
  • when the input is a two-dimensional image, pixels having a same channel index are normalized together, meaning that batch normalization computes μ_B and σ²_B along the batch, height, and width axes.
  • the output of the batch normalization layer is y_i = γ ⊙ x̂_i + β, where the multiplication of x̂_i with γ is element-wise multiplication and the addition of β is also element-wise addition.
  • γ and β have the same length as that of x̂_i.
  • the parameter γ can be referred to as a learnable scale parameter and the parameter β can be referred to as a learnable shift parameter for a feature x.
  • the learnable scale parameter and the learnable shift parameter can be referred to as affine parameters of a normalization layer 210.
  • an m-th batch normalization layer has one or more learnable and/or trainable parameters, including γ_m and β_m, where the length of γ_m and β_m is equal to the length of the feature x at the input of that layer.
  • the features are normalized across respective subsets of the batch (e.g., mini batches), using the mean and variance computed from the corresponding subsets of the batch of training samples.
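  • A per-mini-batch sketch of the batch normalization transform just described is shown below; the feature dimension, mini-batch size, and ε value are illustrative.
```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization over one mini-batch.

    x: array of shape (batch_size, num_features)
    gamma, beta: learnable scale and shift, each of shape (num_features,)
    """
    mu_b = x.mean(axis=0)                         # mini-batch mean, per feature
    var_b = x.var(axis=0)                         # mini-batch variance, per feature
    x_hat = (x - mu_b) / np.sqrt(var_b + eps)     # zero mean, unit variance within the mini-batch
    y = gamma * x_hat + beta                      # element-wise scale and shift
    return y, mu_b, var_b

rng = np.random.default_rng(0)
x = 5.0 + 2.0 * rng.standard_normal((32, 8))      # mini-batch of 32 samples, 8 features
gamma, beta = np.ones(8), np.zeros(8)
y, mu_b, var_b = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # per-feature ~0 and ~1
```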
  • a device can continuously compute an expectation, mean, and/or average value of μ_B across the subsets of the batch, denoted by E[μ_B], and the expectation, mean, and/or average value of σ²_B across the subsets of the batch, denoted by E[σ²_B], for a normalization layer.
  • a device can compute an estimate of E[μ_B] and E[σ²_B] through computationally efficient methods. For example, the device can compute E[μ_B] and E[σ²_B] using an exponential moving average (EMA) method, in which a device assigns exponentially decreasing weights to past observations.
  • EMA exponential moving average
  • with the EMA method, the running statistics can be updated after each subset of the batch as μ_run ← α μ_run + (1 − α) μ_B and σ²_run ← α σ²_run + (1 − α) σ²_B, where N_B is the total number of subsets of a batch (e.g., mini batches) during the training and 0 < α < 1.
  • μ_run and σ²_run are an estimate of the true mean and true variance, respectively, of the training data samples.
  • additionally, or alternatively, the device can use μ̂ and σ̂², where μ̂ is an estimate of E[μ_B] computed through another method different from the EMA and σ̂² is an estimate of E[σ²_B] computed through a procedure different from the EMA.
  • example methods for estimating E[μ_B] and E[σ²_B] include, but are not limited to, simple moving average (SMA) and weighted moving average (WMA), among other methods.
  • SMA includes computing an average of a fixed number of recent data points
  • WMA includes assigning different weights to respective data points within a window, providing for more recent observations to have a greater impact on an average.
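  • The running-statistic estimators mentioned above (EMA, SMA, and WMA) can be sketched as follows; the smoothing factor, window length, and weights are assumptions for the example.
```python
import numpy as np

def ema_update(running, batch_stat, alpha=0.9):
    """Exponential moving average: exponentially decreasing weights on past mini-batch statistics."""
    return alpha * running + (1.0 - alpha) * batch_stat

def sma(history, window=5):
    """Simple moving average over a fixed number of the most recent mini-batch statistics."""
    return np.mean(history[-window:], axis=0)

def wma(history, weights):
    """Weighted moving average: more recent mini-batch statistics receive larger weights."""
    h = np.asarray(history[-len(weights):])
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * h).sum(axis=0) / w.sum()

# Track the running mean of an 8-feature normalization layer over several mini-batches.
rng = np.random.default_rng(0)
running_mean = np.zeros(8)
batch_means = []
for _ in range(10):
    mu_b = rng.standard_normal((32, 8)).mean(axis=0)   # per-mini-batch mean
    batch_means.append(mu_b)
    running_mean = ema_update(running_mean, mu_b)

print(running_mean.round(3))
print(sma(batch_means).round(3))
print(wma(batch_means, weights=[1, 2, 3, 4, 5]).round(3))
```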
  • for the m-th batch normalization layer, m = 1, ..., L_n, F_m is the number of features at the input of the m-th batch normalization layer.
  • the parameters μ̂_{m,f} and σ̂²_{m,f}, for f = 1, ..., F_m, can be learned without computing gradients and back propagation.
  • a learning model 202 (e.g., a DNN) can include L_n number of normalization layers 210, denoted by l_1, ..., l_{L_n}.
  • Θ_n denotes the set of the learnable and/or trainable parameters of the normalization layers 210 in the neural network (e.g., the learning model 202) f_Θ, where Θ_n is a subset of Θ (e.g., Θ_n ⊆ Θ), as Θ denotes the set of all learnable and/or trainable parameters of the neural network f_Θ.
  • the m-th batch normalization layer is l_m, where m ∈ {1, ..., L_n}.
  • the m-th normalization layer will introduce F_m number of learnable and/or trainable scale parameters γ_{m,1}, ..., γ_{m,F_m} and F_m number of learnable and/or trainable shift parameters β_{m,1}, ..., β_{m,F_m}.
  • F_m is the total number of features at the input of l_m, or, equivalently, the length of the feature vector at the input of l_m (e.g., the number of neurons in l_m).
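  • To illustrate how small the set of normalization-layer parameters Θ_n typically is relative to the full parameter set Θ, the sketch below counts both for a small hypothetical network; the use of PyTorch and the layer sizes are assumptions.
```python
import torch.nn as nn

# Hypothetical encoder-like network with two batch normalization layers.
net = nn.Sequential(
    nn.Linear(64, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, 64), nn.BatchNorm1d(64), nn.ReLU(),
    nn.Linear(64, 16),
)

total_params = sum(p.numel() for p in net.parameters())                     # |Theta|
norm_layers = [m for m in net.modules() if isinstance(m, nn.BatchNorm1d)]
norm_params = sum(p.numel() for m in norm_layers for p in m.parameters())   # scale gamma and shift beta
norm_stats = sum(b.numel() for m in norm_layers for b in m.buffers()
                 if b.ndim > 0)                                             # running mean and variance

print(f"normalization layers: {len(norm_layers)}")
print(f"|Theta| = {total_params}, |Theta_n| (affine) = {norm_params}, running stats = {norm_stats}")
```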
  • a device in a wireless communications system can implement the one or more learning models 202 for CSI compression, as described with reference to Figure 1, and as described in further detail with respect to Figure 3.
  • the UE and/or the NE can train the one or more learning models 202 on a training dataset to perform the CSI compression.
  • a training dataset can include labeled and/or unlabeled data.
  • when a dataset includes both an input data sample and a corresponding output data sample for data samples in the dataset, the dataset is referred to as a labeled dataset.
  • when the dataset includes the input data sample without the corresponding output data sample, the dataset is referred to as an unlabeled dataset.
  • a device can implement supervised learning techniques to train the learning models 202.
  • the NE 102 can train the learning models 202 to map input data to output labels based on example input-output pairs provided during training.
  • the learning model 202 computes a mapping function from input features to output labels by observing a dataset that includes labeled examples, such that the learning models 202 can generalize the mapping to make accurate predictions on new, unseen data.
  • the device can additionally, or alternatively, implement any other type of training techniques to train the learning models 202, including, but not limited to, unsupervised learning techniques and semi-supervised learning techniques, among others.
  • in unsupervised learning, the learning models 202 detect patterns, structures, or relationships within training data.
  • Semi-supervised learning leverages both labeled and unlabeled data during a training procedure.
  • the learning models 202 use the labeled examples, while also using the structure of the unlabeled data to improve a performance of the learning models 202.
  • a source domain refers to a set of data samples over which a learning model 202 is trained.
  • a data sample in the source domain can include an input x_s, where x_s is a scalar, a one-dimensional vector, or a multi-dimensional vector, and a corresponding output or label y_s, where y_s is a scalar, a one-dimensional vector, or a multi-dimensional vector.
  • the probability density function (PDF) p_S(x_s, y_s) of the source-domain data samples denotes the source distribution.
  • a target domain refers to a set of data samples over which a learning model 202 makes inferences and/or predictions.
  • a device can obtain input data from a target domain that is not included in a training dataset.
  • the learning model can make inferences and/or predictions (e.g., by compressing CSI data) for input data obtained from the target domain.
  • domain adaptation techniques that are implemented during training can include training a learning model 202 over multiple source domains (e.g., multiple datasets that are rich in diversity), which provides for a learning model 202 that is generalized for target domains that are not included in training datasets.
  • using domain adaptation techniques that are implemented during training may not result in a generalized learning model, can require relatively large (e.g., greater than a threshold value) amounts of data from multiple source domains, and can increase use of computational resources for training.
  • a device can perform domain adaptation during inference and/or testing of a learning model 202 by adapting and/or generalizing a learning model 202 after the learning model 202 is trained and deployed.
  • the device can obtain labeled or unlabeled data from the target domain for the adaptation and/or generalization of the learning model 202. For example, the device can obtain data in batches, such as by obtaining multiple data samples at a same time. In some other examples, the device can obtain data in samples one at a time, or sequentially.
  • conventional techniques for domain adaptation include computing a value of an end-to-end loss function and back-propagation through a network. To compute the end-to-end loss function for a two-sided learning model, a device hosting the encoder component can use the decoder component output or a device hosting the decoder component can use the encoder component input.
  • domain adaptation during inference and/or testing of a learning model 202 includes back propagation, which leads to increased signaling overhead between the device hosting the encoder component and the device hosting the decoder component.
  • an algorithm works backward through the learning model 202 to calculate gradients of a loss function with respect to respective weights in the learning model 202 to determine how much a weight contributes to an overall error.
  • domain adaptation and/or domain generalization techniques adapt and/or adjust parameters of the learning model 202 by minimizing a loss function, where computing the loss function value depends on output of the learning model 202.
  • a device provides input data samples (e.g., estimated CSI) to an encoder component and the output of the learning model 202 is produced by the decoder component.
  • the encoder component and the decoder component can be located at two geographically separate devices.
  • adaptation of a two-sided learning model can include signaling between the devices to exchange information.
  • the device can adapt parameters of the normalization layers 210 based on distributions of input data (e.g., rather than adapting all of the learnable parameters of the learning model 202, including the parameters of the hidden layers 206 and the normalization layers 210).
  • the device can perform domain adaptation for a learning model 202, such as a learning model 202 that is implemented at devices at different geographic locations, without computing a value of an end-to-end loss function, without additional signaling overhead, and without back propagation through the learning model 202, as described in further detail with respect to Figure 3.
  • Figure 3 illustrates an example of a wireless communications system 300 in accordance with aspects of the present disclosure.
  • the wireless communications system 300 implements aspects of the wireless communications system 100 and the learning model diagram 200.
  • the wireless communications system 300 includes a UE 104 and an NE 102, which may be examples of a UE 104 and an NE 102 as described with reference to Figure 1.
  • an NE 102 may be in wireless communications with one or more other devices in the wireless communications system 300.
  • the NE 102 may transmit and/or receive signaling to and from the UE 104.
  • the UE 104 may transmit signaling to the NE 102 via a feedback channel 302, which may be an example of a communication link.
  • the signaling between the UE 104 and the NE 102 may include control signaling and/or data transmissions.
  • the signaling may include data samples, such as CSI data samples measured by the UE 104.
  • CSI data samples include information about a channel between two nodes or devices in the wireless communications system 300, such as a UE 104 and an NE 102 (e.g., a base station, gNB, or other network entity), over time, frequency, and space.
  • the devices use the information for coherent reception and/or transmission of data between the devices by effective combining and precoding.
  • a device receiving a data transmission can be referred to as a receiving device, while a device transmitting a data transmission can be referred to as a transmitting device.
  • the receiving device can use combining techniques to maximize a received signal quality by combining one or more signals received from multiple antennas.
  • a transmitting device can use precoding techniques to maximize the received signal quality at the receiver or reduce interference at other devices by assigning precoding weights according to channel gains of respective antennas and/or canceling interference in a data transmission using the CSI data.
  • a UE 104 can acquire downlink CSI using uplink reference signals (e.g., sent from the UE 104 to the NE 102) for uplink-downlink channel reciprocity. Uplink-downlink channel reciprocity occurs when channel characteristics (e.g., channel gains, phase shifts, and fading conditions) observed for an uplink transmission from the UE 104 to the NE 102 are approximately equivalent to those observed for a downlink transmission from the NE 102 to the UE 104.
  • the UE 104 can perform one or more measurements on downlink reference signals from the NE 102 and can transmit CSI feedback to the NE 102 that indicates the measurements, which may be referred to as CSI data.
  • transmitting CSI data to the NE 102 leads to a relatively high signaling overhead on one or more uplink communication resources (e.g., time-frequency resource) allocated for feedback.
  • the UE 104 can compress the CSI data with a quantization scheme that minimizes the feedback overhead, while maintaining a quality of the CSI data.
  • Example quantization schemes can include, but are not limited to, a scalar quantization scheme, a vector quantization scheme, an adaptive quantization scheme, a non-uniform quantization scheme, a differential quantization scheme, and an entropy coding scheme, among others.
  • in scalar quantization, an element of a CSI matrix (e.g., channel gains, phase shifts) is independently quantized to a finite set of discrete levels.
  • Vector quantization groups multiple elements of the CSI matrix into vectors and quantizes them jointly.
  • Adaptive quantization adjusts quantization levels dynamically based on instantaneous channel conditions, which provides for adaptation to variations in the channel environment (e.g., changes in signal-to-noise ratio (SNR) or channel fading). Adaptive quantization can minimize a trade-off between quantization error and data rate.
  • Non-uniform quantization provides for quantization levels to be spaced unevenly based on statistical properties of the CSI data, which can improve a representation accuracy of the CSI data by allocating additional quantization levels to regions of the CSI matrix with higher variability or importance.
  • Differential quantization encodes the differences between consecutive CSI measurements rather than the absolute values, which can reduce a dynamic range of the quantized data and improve the efficiency of representation of the CSI data, such as for relatively slow variations in the channel.
  • Entropy coding techniques such as Huffman coding or arithmetic coding, can be applied to further compress the quantized CSI data by using statistical redundancies. Entropy coding techniques assign shorter codewords to more probable CSI values, leading to more compact representation.
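  • As a simple illustration of the scalar quantization scheme listed above, the sketch below independently maps each real-valued element of a CSI-like matrix to a finite set of evenly spaced levels; the bit width and value range are assumptions.
```python
import numpy as np

def scalar_quantize(values, num_bits=4, vmin=-1.0, vmax=1.0):
    """Independently map each element to one of 2**num_bits evenly spaced levels."""
    levels = 2 ** num_bits
    step = (vmax - vmin) / (levels - 1)
    indices = np.clip(np.round((values - vmin) / step), 0, levels - 1).astype(int)
    return indices, vmin + indices * step          # transmitted indices and their dequantized values

rng = np.random.default_rng(0)
csi = np.clip(rng.standard_normal((4, 8)) * 0.3, -1.0, 1.0)   # toy real-valued CSI elements
idx, csi_hat = scalar_quantize(csi)
print("max quantization error:", np.abs(csi - csi_hat).max())
```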
  • one or more devices in the wireless communications system 300 (e.g., the UE 104 and the NE 102) implement an autoencoder.
  • An autoencoder may be an example of a neural network architecture, or ML architecture, in which data is compressed from an original form to a lower-dimensional representation of the data at an encoder component 304, and the data is reconstructed to the original form at a decoder component 306.
  • the encoder component 304 and the decoder component 306 may be components of the autoencoder and may implement respective learning models (e.g., learning models 202, as described with reference to Figure 2) to perform the compression and reconstruction of the data.
  • the NE 102 and the UE 104 may use the autoencoder for feature extraction and/or noise reduction.
  • a device can train an autoencoder on a large dataset of wireless signals, and the autoencoder can learn a compact (e.g., lower-dimensional) representation of the signals to use for subsequent tasks, such as modulation classification, channel equalization, and/or interference detection.
  • Autoencoders can additionally, or alternatively, be used for reducing or eliminating noise in a wireless signal by learning to reconstruct a signal without noise from one or more measurements of a signal with noise, which leads to improved robustness and reliability of the wireless communications system 300 in a noisy environment.
  • the encoder component 304 receives input data and maps the input data to a lower-dimensional representation.
  • the input data can include a set of data samples 308, including, but not limited to, CSI data obtained from measurements performed by the UE 104.
  • the UE 104 can measure one or more reference signals (e.g., from the NE 102) to obtain the CSI data.
  • CSI data includes information that describes the characteristics and conditions of a communication channel between a transmitting device (e.g., the UE 104 and/or the NE 102) and a receiving device (e.g., the NE 102 and/or the UE 104) in a wireless communications system 300.
  • the CSI data and corresponding data samples can include, but are not limited to, channel gain data, phase shift data, frequency response data, delay spread data, and Doppler shift data, among other information.
  • Mapping the input data to the lower-dimensional representation can include transforming the input data into a compressed representation of the input data using multiple layers of a learning model.
  • the output of the encoder component 304 is the compressed representation of the input data.
  • the UE 104 can transmit the compressed representation of the input data to the NE 102 via the feedback channel 302.
  • the UE 104 can include the compressed representation of the input data in control signaling to the NE 102, where one or more fields in the control signaling include the compressed representation of the input data.
  • the NE 102 receives the compressed representation of the input data and reconstructs the original input data at the decoder component 306.
  • the decoder component 306 uses multiple layers of a neural network to map the compressed representation of the input data back to the original input data.
  • the output of the decoder component 306 is the reconstructed version of the input data.
  • the NE 102 can reconstruct CSI data to determine one or more characteristics and/or conditions of a communication channel between the UE 104 (e.g., a receiver device) and the NE 102 (e.g., a transmitter device).
  • the NE 102 can use the CSI data for adaptive modulation and coding, beamforming, precoding, equalization, and interference cancellation, among other processes and procedures.
  • the NE 102 can adapt one or more transmissions between the NE 102 and the UE 104 to changing channel conditions, thereby improving spectral efficiency, reliability, and overall performance.
  • although the encoder component 304 is illustrated as being implemented by the UE 104 and the decoder component 306 is illustrated as being implemented by the NE 102, the UE 104 may additionally, or alternatively, implement the decoder component 306 and the NE 102 may additionally, or alternatively, implement the encoder component 304.
  • the autoencoder includes an encoder component 304, f_θ, which is an example of a DNN with learnable and/or trainable parameters denoted by θ.
  • the autoencoder can include a decoder component 306, g_φ, which is an example of a DNN with φ as its set of trainable and/or learnable parameters.
  • the encoder component 304 learns a representation of the input signal and/or data (e.g., encodes the input signal and/or data), such that one or more features of the input signal and/or data are captured as low-dimensional feature vectors 310.
  • the encoder component 304 can obtain a set of data samples 308 and can output a feature vector 310 that includes characteristics, attributes, and/or properties (e.g., features) of the set of data samples 308.
  • the UE 104 can transmit the feature vector 310 to the NE 102 via the feedback channel 302.
  • the learning model can include a normalization layer at an output of the encoder component 304 and at the input of the decoder component 306.
  • although the learning model can be described as including normalization layers between hidden layers (e.g., and not prior to the input layer), the learning model can include any numerical quantity of normalization layers at any location within the learning model.
  • the set θ includes the learnable parameters of the encoder component 304, including the learnable parameters of the batch normalization layers in the encoder component 304.
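  • Putting the pieces of Figure 3 together, the sketch below wires a small encoder f_θ and decoder g_φ, each containing batch normalization layers (including one at the encoder output and one at the decoder input), into an autoencoder for CSI-style vectors. The use of PyTorch, the dimensions, and the exact layer placement are assumptions rather than the architecture claimed in the disclosure.
```python
import torch
import torch.nn as nn

CSI_DIM, LATENT_DIM = 64, 16

# Encoder f_theta: CSI vector -> low-dimensional feature vector (the feedback payload).
encoder = nn.Sequential(
    nn.Linear(CSI_DIM, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, LATENT_DIM), nn.BatchNorm1d(LATENT_DIM),   # normalization at the encoder output
)

# Decoder g_phi: feature vector -> reconstructed CSI vector.
decoder = nn.Sequential(
    nn.BatchNorm1d(LATENT_DIM),                               # normalization at the decoder input
    nn.Linear(LATENT_DIM, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, CSI_DIM),
)

encoder.eval()
decoder.eval()
with torch.no_grad():
    csi = torch.randn(32, CSI_DIM)       # batch of estimated-CSI samples at the UE
    z = encoder(csi)                     # feature vectors 310 sent over the feedback channel 302
    csi_hat = decoder(z)                 # reconstruction at the NE
print(z.shape, csi_hat.shape)            # torch.Size([32, 16]) torch.Size([32, 64])
```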

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

Various aspects of the present disclosure relate to obtaining learning model parameters at a device. A device selects a subset of data samples from a set of data samples based on a determination that one or more parameters for at least one normalization layer of an encoder learning model are to be updated. The encoder learning model is part of a two-sided learning model that includes the encoder learning model and a decoder learning model. A device updates the parameters for the normalization layer of the encoder learning model based on providing the subset of data samples as input to the encoder learning model. A device transmits a message to a second device to use for updating parameters of the decoder learning model. The message includes an output associated with the subset of data samples from the encoder learning model.

Description

OBTAINING LEARNING MODEL PARAMETERS AT A DEVICE

RELATED APPLICATION

[0001] This application claims priority to U.S. Provisional Application Serial No. 63/632,786 filed April 11, 2024, entitled “OBTAINING LEARNING MODEL PARAMETERS AT A DEVICE,” the disclosure of which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

[0002] The present disclosure relates to wireless communications, and more specifically to learning techniques for communications.

BACKGROUND

[0003] A wireless communications system may include one or multiple network communication devices, which may be otherwise known as network equipment (NE), supporting wireless communications for one or multiple user communication devices, which may be otherwise known as user equipment (UE), or other suitable terminology. The wireless communications system may support wireless communications with one or multiple user communication devices by utilizing resources of the wireless communication system (e.g., time resources (e.g., symbols, slots, subframes, frames, or the like)) or frequency resources (e.g., subcarriers, carriers, or the like). Additionally, the wireless communications system may support wireless communications across various radio access technologies including third generation (3G) radio access technology, fourth generation (4G) radio access technology, fifth generation (5G) radio access technology, among other suitable radio access technologies beyond 5G (e.g., sixth generation (6G)).

SUMMARY

[0004] An article “a” before an element is unrestricted and understood to refer to “at least one” of those elements or “one or more” of those elements. The terms “a,” “at least one,” “one or more,” and “at least one of one or more” may be interchangeable. As used herein, including in the claims, “or” as used in a list of items (e.g., a list of items prefaced by a phrase such as “at least one of” or “one or more of” or “one or both of”) indicates an inclusive list such that, for example, a list of at least one of A, B, or C means A or B or C or AB or AC or BC or ABC (i.e., A and B and C). Also, as used herein, the phrase “based on” shall not be construed as a reference to a closed set of conditions. For example, an example step that is described as “based on condition A” may be based on both a condition A and a condition B without departing from the scope of the present disclosure. In other words, as used herein, the phrase “based on” shall be construed in the same manner as the phrase “based at least in part on.” Further, as used herein, including in the claims, a “set” may include one or more elements.

[0005] A first device for wireless communication is described. The first device may be configured to, capable of, or operable to perform one or more operations as described herein.
For example, the first device may be configured to, capable of, or operable to select a subset of data samples from a set of data samples associated with an encoder learning model, where the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, update one or more parameters corresponding to at least one normalization layer of the encoder learning model based on providing the subset of data samples as input to the encoder learning model, and transmit, to a second device, a message including an output from the encoder learning model, where the output is associated with the subset of data samples. [0006] A processor (e.g., a standalone processor chipset, or a component of a first device) for wireless communication is described. The processor may be configured to, capable of, or operable to perform one or more operations as described herein. For example, the processor may be configured to, capable of, or operable to select a subset of data samples from a set of data samples associated with an encoder learning model, where the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, update one or more parameters corresponding to at least one normalization layer of the encoder learning model based on providing the subset of data samples as input to the encoder learning model, and transmit, to a second device, a message including an output from the encoder learning model, where the output is associated with the subset of data samples. [0007] A method performed or performable by a first device for wireless communication is described. The method may include selecting a subset of data samples from a set of data samples associated with an encoder learning model, where the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, updating one or more parameters corresponding to at least one normalization layer of the encoder Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 3 learning model based on providing the subset of data samples as input to the encoder learning model, and transmitting, to a second device, a message including an output from the encoder learning model, where the output is associated with the subset of data samples. [0008] In some implementations of the first device, the processor, and the method described herein, to update the one or more parameters, the first device, the processor, and the method may further be configured to, capable of, or operable to update the one or more parameters based on a periodicity. In some implementations of the first device, the processor, and the method described herein, to update the one or more parameters, the first device, the processor, and the method may further be configured to, capable of, or operable to receive, from at least one of the second device or a third device, an additional message that indicates for the first device to update the one or more parameters. In some implementations of the first device, the processor, and the method described herein, to update the one or more parameters, the first device, the processor, and the method may further be configured to, capable of, or operable to detect a change in one or more characteristics associated with channel state information (CSI). 
In some implementations of the first device, the processor, and the method described herein, to update the one or more parameters, the first device, the processor, and the method may further be configured to, capable of, or operable to detect a change in one or more channel conditions associated with a channel between the first device and the second device. In some implementations of the first device, the processor, and the method described herein, to update the one or more parameters, the first device, the processor, and the method may further be configured to, capable of, or operable to determine a quality of output from the encoder learning model fails to satisfy a threshold value. [0009] In some implementations of the first device, the processor, and the method described herein, the first device, the processor, and the method may further be configured to, capable of, or operable to obtain the set of data samples based on estimating CSI. In some implementations of the first device, the processor, and the method described herein, the first device, the processor, and the method may further be configured to, capable of, or operable to receive an additional message that indicates the set of data samples. In some implementations of the first device, the processor, and the method described herein, to select the subset of data samples, the first device, the processor, and the method may further be configured to, capable of, or operable to determine the subset of data samples satisfy one or more conditions, where the one or more conditions include a distance metric Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 4 between respective data samples in the subset of data samples being greater than a threshold dissimilarity value and an entropy value of respective data samples in the subset of data samples satisfying a threshold entropy value. In some implementations of the first device, the processor, and the method described herein, the first device, the processor, and the method may further be configured to, capable of, or operable to determine one or more of the threshold dissimilarity value or the threshold entropy value. In some implementations of the first device, the processor, and the method described herein, the first device, the processor, and the method may further be configured to, capable of, or operable to receive an additional message that indicates one or more of the threshold dissimilarity value or the threshold entropy value. In some implementations of the first device, the processor, and the method described herein, to update the one or more parameters, the first device, the processor, and the method may further be configured to, capable of, or operable to propagate the subset of data samples from a first layer associated with the encoder learning model to a last layer associated with the encoder learning model, and compute one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. In some implementations of the first device, the processor, and the method described herein, the one or more parameters include one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. 
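As one possible illustration of the selection conditions described above (a distance metric between respective data samples being greater than a threshold dissimilarity value and an entropy value satisfying a threshold entropy value), the following Python sketch selects such a subset. The histogram-based entropy estimate, the Euclidean distance metric, the function names, and the threshold values are illustrative assumptions rather than requirements of the described techniques.

```python
import numpy as np

def sample_entropy(x, num_bins=32):
    # Histogram-based entropy estimate (in bits) of one flattened data sample.
    counts, _ = np.histogram(np.abs(np.ravel(x)), bins=num_bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def select_subset(samples, dissimilarity_threshold, entropy_threshold):
    # Greedily keep samples whose entropy satisfies the entropy threshold and whose
    # distance to every already-selected sample exceeds the dissimilarity threshold,
    # so the subset stays informative and mutually dissimilar.
    selected = []
    for x in samples:
        if sample_entropy(x) < entropy_threshold:
            continue
        if all(np.linalg.norm(x - s) > dissimilarity_threshold for s in selected):
            selected.append(x)
    return selected
```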
In some implementations of the first device, the processor, and the method described herein, the at least one normalization layer is a batch normalization layer. [0010] A first device for wireless communication is described. The first device may be configured to, capable of, or operable to perform one or more operations as described herein. For example, the first device may be configured to, capable of, or operable to receive, from a second device, a message including an output from an encoder learning model, where the output is associated with a subset of data samples of a set of data samples, and the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, and update one or more parameters corresponding to at least one normalization layer of the decoder learning model based on providing the output as input to the decoder learning model. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 5 [0011] A processor (e.g., a standalone processor chipset, or a component of a first device) for wireless communication is described. The processor may be configured to, capable of, or operable to perform one or more operations as described herein. For example, the processor may be configured to, capable of, or operable to receive, from a second device, a message including an output from an encoder learning model, where the output is associated with a subset of data samples of a set of data samples, and the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, and update one or more parameters corresponding to at least one normalization layer of the decoder learning model based on providing the output as input to the decoder learning model. [0012] A method performed or performable by a first device for wireless communication is described. The method may include receiving, from a second device, a message including an output from an encoder learning model, where the output is associated with a subset of data samples of a set of data samples, and the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, and updating one or more parameters corresponding to at least one normalization layer of the decoder learning model based on providing the output as input to the decoder learning model. [0013] In some implementations of the first device, the processor, and the method described herein, to update the one or more parameters, the first device, the processor, and the method may further be configured to, capable of, or operable to update the one or more parameters based on a periodicity. In some implementations of the first device, the processor, and the method described herein, to update the one or more parameters, the first device, the processor, and the method may further be configured to, capable of, or operable to receive, from at least one of the second device or a third device, an additional message that indicates for the first device to update the one or more parameters. In some implementations of the first device, the processor, and the method described herein, to update the one or more parameters, the first device, the processor, and the method may further be configured to, capable of, or operable to detect a change in one or more channel conditions associated with a channel between the first device and the second device. 
In some implementations of the first device, the processor, and the method described herein, to update the one or more parameters, the first device, the processor, and the method may further be configured Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 6 to, capable of, or operable to determine a quality of output from the decoder learning model fails to satisfy a threshold value. [0014] In some implementations of the first device, the processor, and the method described herein, to update the one or more parameters, the first device, the processor, and the method may further be configured to, capable of, or operable to propagate the subset of data samples from a first layer associated with the decoder learning model to a last layer associated with the decoder learning model and computes one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. In some implementations of the first device, the processor, and the method described herein, the output includes a set of information samples corresponding to feature vectors associated with the encoder learning model. In some implementations of the first device, the processor, and the method described herein, the one or more parameters include one or more normalization statistics associated with the at least one normalization layer, and where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. In some implementations of the first device, the processor, and the method described herein, the at least one normalization layer is a batch normalization layer. BRIEF DESCRIPTION OF THE DRAWINGS [0015] Figure 1 illustrates an example of a wireless communications system in accordance with aspects of the present disclosure. [0016] Figure 2 illustrates an example of a learning model diagram, in accordance with aspects of the present disclosure. [0017] Figure 3 illustrates an example of wireless communications system, in accordance with aspects of the present disclosure. [0018] Figure 4 illustrates an example of a signaling diagram, in accordance with aspects of the present disclosure. [0019] Figure 5 illustrates an example of a UE in accordance with aspects of the present disclosure. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 7 [0020] Figure 6 illustrates an example of a processor in accordance with aspects of the present disclosure. [0021] Figure 7 illustrates an example of an NE in accordance with aspects of the present disclosure. [0022] Figure 8 illustrates a flowchart of a method performed by a UE in accordance with aspects of the present disclosure. [0023] Figure 9 illustrates a flowchart of a method performed by an NE in accordance with aspects of the present disclosure. DETAILED DESCRIPTION [0024] A wireless communications system may include one or more devices, such as UEs and NEs, among other devices that transmit and receive signaling. The devices can transmit and receive the signaling via one or more channels between the devices, where a channel refers to a medium through which information is transmitted between the devices. The devices can exchange information, such as CSI, related to the channels over time, frequency, and space. The devices use the information for coherent reception and/or transmission of control information and/or data between the devices. 
Coherent reception and/or transmission refers to maintaining phase synchronization between the transmitted and received signals, which enables more accurate signal detection and demodulation, among other advantages. A device can acquire downlink CSI using uplink reference signals (e.g., transmitted from a UE to an NE), where there is uplink-downlink channel reciprocity. Uplink-downlink channel reciprocity occurs when channel characteristics (e.g., channel gains, phase shifts, and fading conditions) observed for an uplink transmission from a UE to an NE are approximately equivalent to those observed for a downlink transmission from the NE to the UE. Additionally, or alternatively, in the absence of uplink-downlink channel reciprocity, the device can measure one or more downlink reference signals (e.g., transmitted from an NE to a UE) to obtain CSI, where a UE can transmit a message indicating the CSI to the NE. To reduce signaling overhead, the UE can compress CSI for transmission to the NE. That is, the UE can compress (e.g., minimize a complexity of and/or encode to a more compact representation) a CSI report, including CSI information provided in the CSI report. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 8 [0025] In some variations, one or more devices (e.g., the UE and the NE) in the wireless communications systems can implement one or more learning models, which can be examples of machine learning (ML) models and/or artificial intelligence (AI) models, to perform CSI compression. For example, the devices can implement a two-sided CSI compression model that includes an encoder component that generates compressed CSI data at a UE and a decoder component that generates estimated CSI data from signaling indicating the compressed CSI data at the NE. In some examples, an NE or other device trains the learning models on a training dataset that includes CSI data samples, where the training includes obtaining one or more parameters of the learning models by inputting the training dataset (e.g., for the encoder component and the decoder component). For example, the NE obtains one or more parameters for the learning models that minimize a loss function for a training data set, where the loss function calculates an error between a predicted output of a learning model and a true output. One or more devices execute (e.g., process, implement) the learning models using the parameters. However, when executing the learning models, the devices can input data to the learning models that has different statistical characteristics than the dataset over which the learning models are trained, which leads to loss of performance (e.g., erroneous inferences and predictions) by the learning models. The loss of performance can lead to increased use of processing resources at the devices executing the model, as well as an increase in signaling overhead, due to correction of errors in output from the learning models that result from the distribution mismatch. [0026] In some cases, the NE can train multiple different learning models for respective domains over which the learning models make inferences or predictions, where a domain represents data for a scenario, a configuration, or a parameter of a network, a physical propagation medium, or a device behavior. The NE can store parameters for the different learning models for transmission to one or more other devices (e.g., UEs or other devices with reduced memory storage capabilities when compared with the NE). 
Additionally, or alternatively, the one or more other devices can receive and store the parameters of different learning models. The other devices can use respective learning models for the different domains. However, transmitting and/or storing parameters of learning models for different domains results in increased signaling overhead and memory storage usage, as well as inefficient use of time-frequency resources due to the learning models having a relatively large (e.g., greater than a threshold value) numerical quantity of parameters. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 9 [0027] As described herein, to reduce loss of performance without increasing processing and signaling overhead by transmitting parameters for multiple learning models, one or more devices can generate parameters for learning models to adapt the learning models for different data distributions of input data. For example, a transmitting device, which is a device that transmits signaling, can execute an encoder component of a two-sided CSI compression model, and a receiving device, which is a device that receives the signaling, can execute a decoder component of the two-sided CSI compression model. The encoder component and the decoder component can include multiple layers, such as hidden layers (e.g., neural layers) and normalization layers. A layer of the learning model is defined by one or more parameters, which can include weights, biases, and/or other parameters. A normalization layer can include a relatively small numerical quantity of parameters (e.g., less than a threshold value) when compared with a hidden layer. A transmitting device and/or a receiving device can determine to update one or more parameters of the normalization layers of the encoder component (e.g., according to a periodicity, by detecting a change in characteristics of estimated CSI, by detecting a change in performance of the encoder component, and/or upon receiving signaling indicating for the transmitting device to update the encoder neural network, among other examples). [0028] In some examples, the transmitting device can update the parameters for normalization layers of the encoder component by providing a subset of data samples obtained by the transmitting device as input to the encoder component. The transmitting device can transmit an output from the encoder component to the receiving device. Additionally, or alternatively, the receiving device can use the output from the encoder component to update parameters for normalization layers of the decoder component. For example, the receiving device can provide the output from the encoder component as input to the decoder learning model to update parameters for the normalization layers. The transmitting device and/or the receiving device can execute the encoder component and the decoder component for CSI compression for remaining data samples obtained by the transmitting device, which reduce loss of performance for the encoder component and the decoder component by adapting parameters to the statistical characteristics of the subset of data samples without increasing processing and signaling overhead by transmitting parameters for multiple learning models. [0029] Reference is made herein to receiving, transmitting, or communicating data or information, such as signaling communication resources and/or communications that are transmitted Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 10 or received between devices. 
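A minimal end-to-end sketch of the adaptation flow outlined in the two preceding paragraphs is shown below, using PyTorch for brevity. The layer sizes, architectures, and sample shapes are placeholder assumptions; the only point illustrated is that a forward pass in training mode refreshes the batch normalization running statistics on each side without computing a loss, gradients, or back propagation.

```python
import torch
import torch.nn as nn

# Placeholder stand-ins for the encoder and decoder components of a two-sided model.
encoder = nn.Sequential(nn.Linear(64, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 8))
decoder = nn.Sequential(nn.Linear(8, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 64))

def refresh_norm_stats(model, batch):
    # Forward-propagate the batch in training mode so that only the batch
    # normalization running statistics (running_mean, running_var) are updated;
    # no loss, gradient computation, or back propagation is involved.
    model.train()
    with torch.no_grad():
        out = model(batch)
    model.eval()
    return out

csi_subset = torch.randn(16, 64)                  # selected subset of CSI data samples
latent = refresh_norm_stats(encoder, csi_subset)  # transmitting device updates encoder statistics
# The latent feature vectors are the output carried in the message to the receiving
# device, which uses them the same way to update the decoder-side statistics.
refresh_norm_stats(decoder, latent)
```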
It is to be appreciated that other terms may be used interchangeably with communicating, such as signaling, transmitting, receiving, outputting, forwarding, retrieving, obtaining, and so forth. Similarly, other terms may be used interchangeably with transmitting (e.g., communicating, signaling, outputting, forwarding, and so forth), and other terms may be used interchangeably with receiving (e.g., communicating, retrieving, obtaining, and so forth). [0030] Aspects of the present disclosure are described in the context of a wireless communications system. [0031] Figure 1 illustrates an example of a wireless communications system 100 in accordance with aspects of the present disclosure. The wireless communications system 100 may include one or more NE 102, one or more UE 104, and a core network (CN) 106. The wireless communications system 100 may support various radio access technologies. In some implementations, the wireless communications system 100 may be a 4G network, such as an LTE network or an LTE-Advanced (LTE-A) network. In some other implementations, the wireless communications system 100 may be a NR network, such as a 5G network, a 5G-Advanced (5G-A) network, or a 5G ultrawideband (5G-UWB) network. In other implementations, the wireless communications system 100 may be a combination of a 4G network and a 5G network, or other suitable radio access technology including Institute of Electrical and Electronics Engineers (IEEE) 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), IEEE 802.20. The wireless communications system 100 may support radio access technologies beyond 5G, for example, 6G. Additionally, the wireless communications system 100 may support technologies, such as time division multiple access (TDMA), frequency division multiple access (FDMA), or code division multiple access (CDMA), etc. [0032] The one or more NE 102 may be dispersed throughout a geographic region to form the wireless communications system 100. One or more of the NE 102 described herein may be or include or may be referred to as a network node, a base station, an access point (AP), a network element, a network function, a network entity, network infrastructure (or infrastructure), a radio access network (RAN), a NodeB, an eNodeB (eNB), a next-generation NodeB (gNB), or other suitable terminology. An NE 102 and a UE 104 may communicate via a communication link, which may be a wireless or wired connection. For example, an NE 102 and a UE 104 may perform wireless communication (e.g., receive signaling, transmit signaling) over a Uu interface. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 11 [0033] An NE 102 may provide a geographic coverage area for which the NE 102 may support services for one or more UEs 104 within the geographic coverage area. For example, an NE 102 and a UE 104 may support wireless communication of signals related to services (voice, video, packet data, messaging, broadcast, etc.) according to one or multiple radio access technologies. In some implementations, an NE 102 may be moveable, for example, a satellite associated with a non-terrestrial network (NTN). In some implementations, different geographic coverage areas associated with the same or different radio access technologies may overlap, but the different geographic coverage areas may be associated with different NE 102. [0034] The one or more UEs 104 may be dispersed throughout a geographic region of the wireless communications system 100. 
A UE 104 may include or may be referred to as a remote unit, a mobile device, a wireless device, a remote device, a subscriber device, a transmitter device, a receiver device, or some other suitable terminology. In some implementations, the UE 104 may be referred to as a unit, a station, a terminal, or a client, among other examples. Additionally, or alternatively, the UE 104 may be referred to as an Internet-of-Things (IoT) device, an Internet-of- Everything (IoE) device, or machine-type communication (MTC) device, among other examples. [0035] A UE 104 may be able to support wireless communication directly with other UEs 104 over a communication link. For example, a UE 104 may support wireless communication directly with another UE 104 over a device-to-device (D2D) communication link. In some implementations, such as vehicle-to-vehicle (V2V) deployments, vehicle-to-everything (V2X) deployments, or cellular-V2X deployments, the communication link may be referred to as a sidelink. For example, a UE 104 may support wireless communication directly with another UE 104 over a PC5 interface. [0036] An NE 102 may support communications with the CN 106, or with another NE 102, or both. For example, an NE 102 may interface with other NE 102 or the CN 106 through one or more backhaul links (e.g., S1, N2, N6, or another network interface). In some implementations, the NE 102 may communicate with each other directly. In some other implementations, the NE 102 may communicate with each other indirectly (e.g., via the CN 106). In some implementations, one or more NE 102 may include subcomponents, such as an access network entity, which may be an example of an access node controller (ANC). An ANC may communicate with the one or more UEs 104 through one or more other access network transmission entities, which may be referred to as a radio heads, smart radio heads, or transmission-reception points (TRPs). Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 12 [0037] The CN 106 may support user authentication, access authorization, tracking, connectivity, and other access, routing, or mobility functions. The CN 106 may be an evolved packet core (EPC), or a 5G core (5GC), which may include a control plane entity that manages access and mobility (e.g., a mobility management entity (MME), an access and mobility management functions (AMF)) and a user plane entity that routes packets or interconnects to external networks (e.g., a serving gateway (S-GW), a packet data network (PDN) gateway (P-GW), or a user plane function (UPF)). In some implementations, the control plane entity may manage non-access stratum (NAS) functions, such as mobility, authentication, and bearer management (data bearers, signal bearers, etc.) for the one or more UEs 104 served by the one or more NE 102 associated with the CN 106. [0038] The CN 106 may communicate with a packet data network over one or more backhaul links (e.g., via an S1, N2, N6, or another network interface). The packet data network may include an application server. In some implementations, one or more UEs 104 may communicate with the application server. A UE 104 may establish a session (e.g., a protocol data unit (PDU) session, or the like) with the CN 106 via an NE 102. The CN 106 may route traffic (e.g., control information, data, and the like) between the UE 104 and the application server using the established session (e.g., the established PDU session). 
The PDU session may be an example of a logical connection between the UE 104 and the CN 106 (e.g., one or more network functions of the CN 106). [0039] In the wireless communications system 100, the NEs 102 and the UEs 104 may use resources of the wireless communications system 100 (e.g., time resources (e.g., symbols, slots, subframes, frames, or the like) or frequency resources (e.g., subcarriers, carriers)) to perform various operations (e.g., wireless communications). In some implementations, the NEs 102 and the UEs 104 may support different resource structures. For example, the NEs 102 and the UEs 104 may support different frame structures. In some implementations, such as in 4G, the NEs 102 and the UEs 104 may support a single frame structure. In some other implementations, such as in 5G and among other suitable radio access technologies, the NEs 102 and the UEs 104 may support various frame structures (i.e., multiple frame structures). The NEs 102 and the UEs 104 may support various frame structures based on one or more numerologies. [0040] One or more numerologies may be supported in the wireless communications system 100, and a numerology may include a subcarrier spacing and a cyclic prefix. A first numerology (e.g., μ=0) may be associated with a first subcarrier spacing (e.g., 15 kHz) and a normal cyclic prefix. In some implementations, the first numerology (e.g., μ=0) associated with the first subcarrier spacing (e.g., 15 kHz) may utilize one slot per subframe. A second numerology (e.g., μ=1) may be associated with a second subcarrier spacing (e.g., 30 kHz) and a normal cyclic prefix. A third numerology (e.g., μ=2) may be associated with a third subcarrier spacing (e.g., 60 kHz) and a normal cyclic prefix or an extended cyclic prefix. A fourth numerology (e.g., μ=3) may be associated with a fourth subcarrier spacing (e.g., 120 kHz) and a normal cyclic prefix. A fifth numerology (e.g., μ=4) may be associated with a fifth subcarrier spacing (e.g., 240 kHz) and a normal cyclic prefix. [0041] A time interval of a resource (e.g., a communication resource) may be organized according to frames (also referred to as radio frames). Each frame may have a duration, for example, a 10 millisecond (ms) duration. In some implementations, each frame may include multiple subframes. For example, each frame may include 10 subframes, and each subframe may have a duration, for example, a 1 ms duration. In some implementations, each frame may have the same duration. In some implementations, each subframe of a frame may have the same duration. [0042] Additionally, or alternatively, a time interval of a resource (e.g., a communication resource) may be organized according to slots. For example, a subframe may include a number (e.g., quantity) of slots. The number of slots in each subframe may also depend on the one or more numerologies supported in the wireless communications system 100. For instance, the first, second, third, fourth, and fifth numerologies (i.e., μ=0, μ=1, μ=2, μ=3, μ=4) associated with respective subcarrier spacings of 15 kHz, 30 kHz, 60 kHz, 120 kHz, and 240 kHz may utilize a single slot per subframe, two slots per subframe, four slots per subframe, eight slots per subframe, and 16 slots per subframe, respectively. Each slot may include a number (e.g., quantity) of symbols (e.g., orthogonal frequency division multiplexing (OFDM) symbols).
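The subcarrier spacings and slot counts listed above follow a power-of-two pattern in the numerology index μ, which the short sketch below reproduces for μ = 0 through μ = 4 (shown for illustration only).

```python
# Subcarrier spacing scales as 15 kHz * 2**mu and the number of slots per
# subframe scales as 2**mu for the numerologies described above.
for mu in range(5):
    scs_khz = 15 * 2 ** mu
    slots_per_subframe = 2 ** mu
    print(f"mu={mu}: {scs_khz} kHz subcarrier spacing, {slots_per_subframe} slot(s) per subframe")
```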
In some implementations, the number (e.g., quantity) of slots for a subframe may depend on a numerology. For a normal cyclic prefix, a slot may include 14 symbols. For an extended cyclic prefix (e.g., applicable for 60 kHz subcarrier spacing), a slot may include 12 symbols. The relationship between the number of symbols per slot, the number of slots per subframe, and the number of slots per frame for a normal cyclic prefix and an extended cyclic prefix may depend on a numerology. It should be understood that reference to a first numerology (e.g., μ=0) associated with a first subcarrier spacing (e.g., 15 kHz) may be used interchangeably between subframes and slots. [0043] In the wireless communications system 100, an electromagnetic (EM) spectrum may be split, based on frequency or wavelength, into various classes, frequency bands, frequency channels, etc. By way of example, the wireless communications system 100 may support one or multiple operating frequency bands, such as frequency range designations FR1 (410 MHz – 7.125 GHz), FR2 (24.25 GHz – 52.6 GHz), FR3 (7.125 GHz – 24.25 GHz), FR4 (52.6 GHz – 114.25 GHz), FR4a or FR4-1 (52.6 GHz – 71 GHz), and FR5 (114.25 GHz – 300 GHz). In some implementations, the NEs 102 and the UEs 104 may perform wireless communications over one or more of the operating frequency bands. In some implementations, FR1 may be used by the NEs 102 and the UEs 104, among other equipment or devices for cellular communications traffic (e.g., control information, data). In some implementations, FR2 may be used by the NEs 102 and the UEs 104, among other equipment or devices for short-range, high data rate capabilities. [0044] FR1 may be associated with one or multiple numerologies (e.g., at least three numerologies). For example, FR1 may be associated with a first numerology (e.g., μ=0), which includes 15 kHz subcarrier spacing; a second numerology (e.g., μ=1), which includes 30 kHz subcarrier spacing; and a third numerology (e.g., μ=2), which includes 60 kHz subcarrier spacing. FR2 may be associated with one or multiple numerologies (e.g., at least two numerologies). For example, FR2 may be associated with a third numerology (e.g., μ=2), which includes 60 kHz subcarrier spacing; and a fourth numerology (e.g., μ=3), which includes 120 kHz subcarrier spacing. [0045] In some examples, the wireless communications system 100 can include one or more transmitting devices that transmit signaling, including data and/or control signaling, to one or more devices that receive the signaling (e.g., receiving devices). Example transmitting devices include, but are not limited to, NEs 102 and/or UEs 104. Additionally, or alternatively, example receiving devices include, but are not limited to, NEs 102 and/or UEs 104. The transmitting devices and the receiving devices can communicate via one or more channels (e.g., wireless channels). A channel includes a medium through which signaling propagates between the transmitting devices and the receiving devices. In some variations, the transmitting devices and/or the receiving devices can have multiple antennas. For example, a UE 104 can have N antennas and an NE (e.g., a base station or gNB) can have M antennas. A channel between the NE 102 and the UE 104 has a total of NM paths.
A path defines a route or trajectory that an electromagnetic wave takes from a transmitting device to a receiving device. A signal can follow one or more paths from a transmitting device to a receiving device due to propagation, such as reflection, diffraction, and scattering. [0046] In downlink communication, in which an NE 102 transmits signaling to a UE 104, a discrete-time channel can be represented as an N × M dimensional complex-valued matrix H, with element h_ij of H denoting the complex-valued channel gain between the i-th receive antenna and the j-th transmit antenna, 1 ≤ i ≤ N, 1 ≤ j ≤ M. A discrete-time channel refers to a channel in which transmitted signals and received signals are represented and processed in discrete time instants or intervals. That is, the wireless communications system 100 uses discrete-time samples of signals rather than continuous-time signals. The channel gains, or the channel matrix H, depend on the physical propagation medium. The wireless channel is a time-varying channel due to the dynamic nature of the physical propagation medium. Further, the channel gains depend on the frequency the devices use for the signaling. For example, with a multicarrier waveform, such as OFDM, the channel matrix can have different values for different sub-carriers (e.g., frequencies) at a same instant of time. That is, the wireless channel matrix H is stochastic, varying across time, frequency, and spatial dimensions. [0047] In some examples, one or more devices in the wireless communications system 100 can adapt a transmission method for a current channel realization and/or preprocess signaling to be transmitted according to a current channel realization to increase signaling throughput over a communication link while improving reliability of the communication link. A transmitting device can use CSI to achieve an adaptive transmission or to implement preprocessing at the transmitting device. For example, a transmitting device can determine a channel matrix H over a frequency range of operation (e.g., at respective subcarriers for OFDM and/or multi-carrier waveforms) when characteristics of a channel change. [0048] A receiving device can estimate a channel using reference signals and/or pilot signals transmitted by the transmitting device. The transmitting device can obtain (e.g., acquire, determine) CSI by receiving signaling indicating the CSI from the receiving device. For example, the receiving device can measure CSI and can transmit the CSI to the transmitting device via a feedback channel. Thus, in downlink communication (e.g., from an NE 102 to a UE 104), the UE 104 estimates downlink CSI by performing one or more measurements for pilot signals and/or reference signals transmitted from the NE 102. The CSI estimation can include a channel matrix and/or a channel covariance matrix, where elements of a channel matrix include channel gains and phase shifts. In some variations, the UE 104 can measure one or more pilot signals and/or reference signals (e.g., from the NE 102) to obtain the CSI. In some cases, CSI includes information that describes the characteristics and conditions of a communication channel between a transmitting device (e.g., the UE 104 and/or the NE 102) and a receiving device (e.g., the NE 102 and/or the UE 104) in a wireless communications system 100.
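For illustration only, the following sketch draws an N × M complex-valued channel matrix H for each of several subcarriers; the independent Rayleigh-fading draw and the dimensions are assumptions made for the example and are not part of the described techniques.

```python
import numpy as np

N, M, K = 4, 8, 12   # receive antennas, transmit antennas, subcarriers (assumed values)

# One N x M channel matrix per subcarrier; element H[k, i, j] plays the role of the
# channel gain h_ij between the i-th receive and j-th transmit antenna on subcarrier k.
H = (np.random.randn(K, N, M) + 1j * np.random.randn(K, N, M)) / np.sqrt(2.0)
h_11_across_frequency = H[:, 0, 0]   # a single channel gain varying across subcarriers
```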
For example, the CSI data and corresponding data samples can include, but are not limited to, channel gain data, phase shift data, frequency response data, delay spread data, and Doppler shift data, among other information. The UE transmits the estimated CSI to the NE 102. [0049] Additionally, or alternatively, for uplink communication from a UE 104 to an NE 102, the NE 102 estimates CSI using uplink reference signals and/or pilot signals from a UE 104. The NE 102 transmits the CSI to the UE 104. However, an NE 102 and/or UE 104 transmitting CSI to the UE 104 or the NE 102, respectively, leads to relatively high signaling overhead (e.g., greater than a threshold value), as the CSI does not include data. To reduce signaling overhead related to transmission of CSI while providing for the NE 102 to acquire CSI of sufficient quality to enable the NE 102 to adapt communications over the link, a UE 104 can compress CSI prior to transmitting the CSI to the NE 102. For example, once a receiving device estimates CSI, the receiving device can compress the estimated CSI and transmit the compressed version of CSI to a transmitting device. [0050] In some variations, the transmitting device and the receiving device can implement, or execute, a two-sided learning model for CSI compression. A two-sided learning model includes two neural networks, an encoder neural network (e.g., an encoder component) and a decoder neural network (e.g., a decoder component). The encoder component computes a low-dimensional feature vector (e.g., a low-dimensional latent representation) of the input CSI. The decoder component reconstructs the CSI using the low-dimensional feature vector. By deploying the encoder component at the receiving device and the decoder component at the transmitting device, feedback of CSI by the receiving device amounts to the feedback of the low-dimensional feature vector Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 17 computed by the encoder at the receiving device. Thus, the two-sided learning model achieves CSI compression for transmitting the CSI from a receiving device to a transmitting device, as described in further detail with respect to Figure 2. [0051] In some examples, one or more devices that execute the learning models, whether at one of the transmitting device or the receiving device (e.g., one-sided) or by both the transmitting device and the receiving device (e.g., two-sided), expect the learning models to perform with a same level of accuracy and/or precision for inferences and/or predictions as during a training and testing phase of the learning models. However, in a real-world wireless network, a learning model can make predictions and/or inferences based on input data having different statistical characteristics than a dataset over which the learning model is trained. That is, after the learning model is trained, tested, and deployed at a device, the learning model can output inferences and/or predictions based on input data that has one or more different statistics than the input data that is used during a training and testing phase of the learning model. Thus, the learning models can output erroneous and/or incorrect inferences and/or predictions. To reduce the errors produced by the learning model, the learning model can be trained with a relatively general set of data. 
However, the performance (e.g., the accuracy or precision) of the learning model may decline for one or more domains for which the learning model is implemented due to the generalized data. [0052] In some examples, a device that implements an encoder component of a two-sided learning model for CSI compression (e.g., an autoencoder or another two-sided neural network) can provide estimated CSI as input to the encoder component. In some cases, the input is a function of the estimated channel matrix H, such as a covariance matrix of the channel matrix, eigenvectors of the CSI matrix, singular vectors of the CSI matrix, a precoder matrix computed from the CSI, or some other function of the estimated CSI. One or more channel characteristics can be time variant and influenced by various factors, such as a physical environment (e.g., urban, dense urban, rural), device mobility (e.g., stationary UE 104, low-speed UE 104, or high-speed UE 104), device location (e.g., indoor, outdoor), weather and/or seasonal conditions (fog, rain, foliage, etc.), and/or antenna array configurations used by the NE 102 and the UE 104, among other factors. The marginal distribution of the estimated CSI and the marginal distribution of a function of the estimated CSI can change over time due to the factors that impact channel characteristics or realizations. A training dataset that a device uses to train a two-sided model for CSI compression may not include one or more possible channel variations. Thus, a device that executes a two-sided learning model trained for CSI compression may obtain input data that has a different distribution than the distribution of the training data. [0053] In some examples, one or more devices can adapt the two-sided CSI compression learning model when a distribution of input data changes from a distribution of training data. Unlike a single-sided learning model, a two-sided learning model is deployed or executed at devices that are geographically separated from each other after training. For example, the encoder component is deployed at a UE 104 and the decoder component is deployed at an NE 102, or vice-versa. However, a device that is adapting a learning model to a distribution of input data at the encoder component may not have access to output of the decoder component without using additional network resources (e.g., used to send an output from the decoder component to the encoder component). [0054] According to implementations, one or more of the NEs 102 and the UEs 104 are operable to implement various aspects of the techniques described with reference to the present disclosure. For example, an NE 102 (e.g., a base station) and/or a UE 104 can execute a two-sided learning model that can be adapted to changing input data distributions without using additional network resources. Model adaptation of a learning model refers to when a device changes or adjusts one or more parameters or weights of a learning model that has been trained. Additionally, or alternatively, a process for a device to adapt a learning model from one domain to another domain is referred to as domain adaptation and/or domain generalization. A device (e.g., the NE 102 and/or the UE 104) can adapt and/or adjust one or more parameters of a learning model to improve performance (e.g., accuracy and/or precision) of the learning model for a changing distribution of input data.
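As context for the CSI-derived encoder inputs mentioned above (e.g., singular vectors of the channel matrix or a precoder matrix computed from the CSI), the sketch below computes one such candidate input from an estimated channel; the choice of the dominant right singular vector is only an example.

```python
import numpy as np

def dominant_precoder(H):
    # H has shape (subcarriers, N, M). The dominant right singular vector of each
    # per-subcarrier channel matrix is returned as a candidate rank-1 precoder,
    # one possible function of the estimated CSI used as encoder input.
    _, _, Vh = np.linalg.svd(H, full_matrices=False)
    return np.conj(Vh[:, 0, :])   # shape (subcarriers, M)
```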
The device can adapt and/or adjust the parameters without accessing learning model output (e.g., from a decoder component) and does not include computing a loss function, computing gradients, and/or performing back propagation, among other computationally complex tasks. Although the learning model is described in the context of CSI compression, the devices can implement learning models that are adaptable for changing distributions of input data for other tasks in the wireless communications system 100. [0055] Figure 2 illustrates an example of a learning model diagram 200 in accordance with aspects of the present disclosure. In some examples, the learning model diagram 200 implements Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 19 aspects of the wireless communications system 100. For example, the learning model diagram 200 can implement aspects of, or can be implemented by, a UE and/or an NE, which may be examples of a UE 104 and an NE 102 as described with reference to Figure 1. The learning model diagram 200 illustrates an example of one or more learning models 202. In some cases, a learning model 202 can include a single learning model 202. In some other cases, a learning model 202 can include multiple learning models 202. [0056] In some examples, the learning models 202 can be examples of ML models and/or AI models. A learning model can be an algorithm that includes learnable parameters, such as a support vector machine or a decision tree. Additionally, or alternatively, the learning model can be a part of a neural network with the neuron weights as learnable parameters. For example, the one or more learning models 202 can be an example of a neural network with multiple layers. The learning model can include a two-sided deep neural network (DNN), such as an autoencoder, which is described in further detail with respect to Figure 3. A neural network is a computational model that includes multiple layers of artificial neurons, which may be referred to as nodes or units, organized into an input layer 204, one or more hidden layers 206, and an output layer 208. A DNN is a type of learning model 202 that has multiple hidden layers 206 between the input layer 204 and the output layer 208. An input layer 204 passes input data to subsequent layers and may not include learnable parameters. An output layer 208 represents a prediction generated by the neural network and may not include learnable parameters. A node in a hidden layer 206 receives an input signal, performs a mathematical operation on the input data, and produces an output signal, which is then passed on to other nodes in subsequent layers. The connections between nodes are represented by weighted edges, which determine the strength of the connection between nodes. [0057] In some examples, during a training process, one or more parameters of the hidden layers 206 are updated. Example parameters of a hidden layer 206 include, but are not limited to, weights, biases, and activation function parameters, among other parameters. Weight parameters represent a strength of connections between neurons in different layers of the learning model. Biases are additional parameters added to neurons in the learning model that provide for the learning model to capture offset or bias in input data. Activation function parameters can include slope parameters or parameters defining a shape of an activation function in a parametric activation function. 
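A single hidden-layer computation of the kind described above (weighted connections, biases, and an activation) can be sketched as follows; the rectified linear activation and the dimensions are illustrative assumptions.

```python
import numpy as np

def hidden_layer(x, W, b):
    # Each node computes a weighted sum of its inputs plus a bias, followed by an
    # activation; W holds the weighted edges and b holds the biases of the layer.
    return np.maximum(0.0, x @ W + b)

x = np.random.randn(1, 8)     # output of the previous layer
W = np.random.randn(8, 4)     # weights connecting 8 inputs to 4 nodes
b = np.zeros(4)               # biases of the 4 nodes
y = hidden_layer(x, W, b)     # passed on to nodes in the subsequent layer
```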
Determining one or more values of model parameters for a defined use case is referred to as training the model or learning the model. The general procedure of developing a learning model includes updating one or more parameters to minimize a cost function based on a training dataset that includes either labeled samples or unlabeled samples, resulting in supervised learning or training or unsupervised learning or training, respectively. For example, the weights are adjusted using input-output pairs from a training dataset, with the goal of minimizing a defined loss or error function. Neural networks are capable of learning complex patterns and representations from data, enabling them to perform a wide range of tasks, including classification, regression, clustering, pattern recognition, and sequence generation. Example neural networks include, but are not limited to, an autoencoder (e.g., which is described in further detail with respect to Figure 3), a convolutional neural network (CNN), a recurrent neural network (RNN), a generative adversarial network (GAN), a long short-term memory (LSTM) network, or any other type of neural network. [0058] Additionally, or alternatively, a learning model 202 can include one or more normalization layers 210. Normalization or standardization refers to scaling and shifting a set of data samples to have zero mean and unit variance. Additionally, or alternatively, normalization can refer to scaling a set of data samples to maintain a value of the set of data samples within a defined range (e.g., a range of 0 to 1 or −1 to 1) without changing a mean and variance of the data samples. Standardization can refer to modifying a set of data samples to have zero mean and unit variance. However, the terms normalization and standardization may be used synonymously, where normalization or standardization is defined herein as transforming a set of data samples to have zero mean and unit variance. In some cases, a device performs input data normalization or standardization independent of (e.g., outside of) the neural network. Additionally, or alternatively, normalization occurs within a neural network (e.g., internally in a deep neural network). [0059] In some cases, such as for a four-layer learning model 202 with one input layer 204, one output layer 208, and two hidden layers 206, when normalization layers 210 are introduced, there can be one normalization layer 210 between the input layer 204 and the first hidden layer 206, another normalization layer 210 between the first and the second hidden layers 206, and another normalization layer 210 between the second hidden layer 206 and the output layer 208. Thus, excluding the possibility of employing a normalization layer 210 prior to the input layer 204 of the learning model 202, there can be three normalization layers 210 in a four-layer learning model 202. Thus, for a learning model 202 having a total number of N_L layers, with one normalization layer 210 in between every pair of adjacent layers, there will be N_norm = N_L − 1 normalization layers 210. [0060] For normalization that occurs within a network (e.g., a learning model 202), a device can provide input data samples to a first layer of the neural network at different time instants, which may be referred to as an input layer 204.
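A minimal sketch of the normalization (standardization) defined above, i.e., transforming a set of data samples to zero mean and unit variance per feature, is shown below; the small constant added inside the square root is an assumption for numerical stability.

```python
import numpy as np

def standardize(samples, eps=1e-5):
    # Transform a set of data samples (rows) to zero mean and unit variance per
    # feature (columns), the sense in which normalization is used in this description.
    mean = samples.mean(axis=0)
    var = samples.var(axis=0)
    return (samples - mean) / np.sqrt(var + eps)
```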
If the input data samples have a relatively high dynamic range, in which less than a threshold numerical quantity of input samples have values that are greater than a threshold value when compared with other data samples in a dataset, then the learned weights of the layers of the learning model 202 and the instantaneous gradient values fluctuate. Fluctuation in the instantaneous gradient values leads to a relatively long duration (e.g., greater than a threshold duration) for the neural network to converge to one or more optimal weights (e.g., weights that maximize a performance, including precision and accuracy, of the learning model). In some cases, the learning model 202 converges faster during training if the device normalizes the inputs to have zero mean and unit variance. A hidden layer 206 (e.g., the hidden layers 206 after the input layer 204) receives an input from a previous layer. Inputs for the second and greater layers of the learning model 202 (e.g., including the second layer to the output layer 208) are outputs from other layers. Thus, normalizing the output sample values of the input layer 204 and subsequent hidden layers 206 to have zero mean and unit variance can decrease a duration to convergence. For example, the inputs to respective layers (e.g., hidden layers 206 and/or an output layer 208) after the input layer 204 are normalized to zero mean and unit variance. [0061] In some variations, normalization within a learning model 202 is also referred to as feature normalization or normalization of activations within a learning model 202. The device can perform batch normalization, layer normalization, group normalization, instance normalization, and/or root mean square normalization, among others. Batch normalization reduces a training time of a learning model 202 and stabilizes model weights in a relatively short time (e.g., less than a threshold duration) by addressing an internal covariate shift. Batch normalization scales input data to have a consistent range or distribution, such that features in a portion of a batch of training samples (e.g., a subset of a batch of data samples) are scaled by the mean and variance computed within the portion of the batch of training data samples. That is, batch normalization normalizes respective features within a portion of a batch of training data samples. [0062] Batch normalization is performed between two or more layers of a learning model 202 by inserting another layer, referred to as a batch normalization layer. A batch normalization layer receives inputs from a preceding network layer and provides (e.g., outputs) a batch normalization transformed (e.g., a normalized) version of the inputs to a succeeding network layer. The learning model 202 can perform well with mean β and variance γ² (e.g., rather than with zero mean and unit variance), which the learning model 202 can learn and optimize over the course of training. For example, a batch normalization layer can introduce two parameters β and γ, which are learned by the learning model 202 during the training. In some examples, x_i is a feature computed by a layer and i is an index. In the case of two-dimensional inputs, such as images with non-zero width and non-zero height, the index i is four dimensional with i = (i_N, i_C, i_H, i_W), where i_N, i_C, i_H, and i_W represent batch index, channel index, height index, and width index, respectively.
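For the two-dimensional case just described, the per-channel statistics can be sketched as follows: one mean and one variance per channel index, computed over the batch, height, and width axes. The tensor shape is assumed for illustration.

```python
import numpy as np

x = np.random.randn(8, 3, 16, 16)          # (batch, channel, height, width), assumed shape
mu_per_channel = x.mean(axis=(0, 2, 3))    # one mean per channel, taken over the (N, H, W) axes
var_per_channel = x.var(axis=(0, 2, 3))    # one variance per channel, taken over the (N, H, W) axes
```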
In the case of one- dimensional input (e.g., an input is an array of length (), ^ is a three-dimensional vector with ^ = "^#, ^$ , ^%'. Batch normalization is defined according to Equation 1, Equation 2, and Equation 3: !)^ = *+,-. / (1) . (2) (3) [0063] In some examples, the set 12 is defined as 12 = {^ | ^$ = ^$}, where ^$ (and ^$) denotes the sub-index of ^ and ^ along the -axis (e.g., axis). In some variations, |12| denotes cardinality of a set 12 and ^ is a small (e.g., less than a threshold value) positive constant. For example, in the the input is a two-dimensional image, pixels having a same channel index are normalized together, meaning that batch normalization computes μ and σ along the "A, (, B' axes. Finally, !)^ is transformed into BNE,F"!^' = γG!)^ + H^ , where γG and H^ are learnable and/or trainable parameters, indexed by ^$. In some variations, the multiplication of !)^ with γG is element wise multiplication and the addition of β^ is also element wise addition. Thus, γG and β^ have the same length as that of !^ . The parameter γG can be referred to as a learnable scale parameter and the Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 23 parameter β^ can be referred to as a learnable shift parameter for a feature ^. The learnable scale parameter and the learnable shift parameter can be referred to as affine parameters of a normalization layer 210. In some cases, an ^^^ batch normalization layer has one or more learnable and/or trainable parameters, including γG, H^, where a length of γG, and H^ is equal to the length of the feature !^. [0064] During the training, the features are normalized across respective subsets of the batch (e.g., mini batches), using the mean and variance computed from the corresponding subsets of the batch of training samples. During training, a device can continuously compute an expectation, mean, and/or average value of ^^ across the subsets of the batch, denoted by IJ{^^}L ^K M ^ N, and the expectation, mean, and/or average value of 5 across the subsets of the ^ LM by IJ{5^ }^K^ N , for a normalization layer. The device stores the values of IJ{^ }LM N and IJ{5^}LM N in the ^^^ ^ ^K^ ^ ^K^ batch normalization layer to use for inference and/or prediction, such as when the learning model 202 is implemented at another device and/or at the device. In some variations, O# denotes a total numerical quantity of subsets of a batch used during the training. [0065] To reduce the computational complexity involved in computing the true values of IJ{^^}L ^K M ^ N and IJ{5^ ^}L ^K M ^ N, a device can compute an estimate of IJ{^^}L ^K M ^ N and IJ{5^ ^}L ^K M ^ N through computationally efficient methods. For example, the device can compute LM IJ{^^}^K^ N and IJ{5^ ^}L ^K M ^ N using an exponential moving average (EMA) method, in which a device assigns exponentially decreasing weights to past observations. An EMA of the mean denoted by ^PLQ, and the variance R of the mini batches using a parameter τ is computed according to 4 Equation 5: ^PLQ = τ ^PLQ + "1 − τ'^^, ^ = 1, … , O# (4) (5) where O# is the total number of subsets of a batch (e.g., mini batches) during the training and 0 ≤ τ ≤ 1. Note that ^PLQ and RPLQ are an estimate of the true mean and true variance, respectively, of the training data samples. In some examples, in addition to, or as an alternative to EMA values (e.g., ^ and RPLQ), the device can use μ) and R), where μ) is an estimate of IJ{^^}L ^K M ^ N (e.g., Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. 
[0066] At the end of the training, the n-th batch normalization layer stores μ̂_{n,f} and σ̂²_{n,f}, f = 1, …, F_n, where F_n is the number of features at the input of the n-th batch normalization layer, μ̂_{n,f} is an estimate of E{μ_b}, and σ̂²_{n,f} is an estimate of E{σ²_b}. In some variations, μ̂_{n,f} = μ_EMA^{n,f} and σ̂²_{n,f} = σ²_EMA^{n,f} when EMA is employed for computing the estimates of E{μ_b} and E{σ²_b} during the training. Thus, when EMA is used, the n-th batch normalization layer stores μ_EMA^{n,f} and σ²_EMA^{n,f}, f = 1, …, F_n, where F_n is the number of features at the input of the n-th batch normalization layer. In some examples, the parameters μ̂_{n,f} and σ̂²_{n,f} can be learned without computing gradients and without back propagation. In the case of computing μ̂_{n,f} and σ̂²_{n,f} through EMA, μ_EMA^{n,f} and σ²_EMA^{n,f}, f = 1, …, F_n, for the n-th batch normalization layer can be learned through simple exponential averaging, without computing gradients and back propagation. The parameters μ̂_{n,f} and σ̂²_{n,f} are commonly known as the batch normalization layer statistics of the n-th batch normalization layer. When EMA is employed during training to compute the estimates of the mean and variance, μ_EMA^{n,f} and σ²_EMA^{n,f} are the batch normalization statistics of the n-th batch normalization layer. [0067] In some examples, a learning model 202 (e.g., a DNN) can include N_norm normalization layers 210, denoted by ℓ_1, …, ℓ_{N_norm}. In some variations, Θ_N denotes the set of the learnable and/or trainable parameters of the normalization layers 210 in the neural network (e.g., the learning model 202) f_Θ, where Θ_N is a subset of Θ (e.g., Θ_N ⊂ Θ), and Θ denotes the set of all learnable and/or trainable parameters of the neural network f_Θ. When the normalization layers 210 are batch normalization layers, the n-th batch normalization layer is ℓ_n, where n ∈ {1, …, N_norm}. The n-th normalization layer introduces F_n learnable and/or trainable scale parameters γ_{n,1}, …, γ_{n,F_n} and F_n learnable and/or trainable shift parameters β_{n,1}, …, β_{n,F_n}. In some variations, F_n is the total number of features at the input of the n-th batch normalization layer, and respective γ_{n,f} and β_{n,f}, f = 1, …, F_n, further include L_n scalar parameters (e.g., γ_{n,f} = {γ_{n,f,1}, γ_{n,f,2}, …, γ_{n,f,L_n}} and β_{n,f} = {β_{n,f,1}, β_{n,f,2}, …, β_{n,f,L_n}} for f = 1, …, F_n), where L_n is the length of the feature.
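The per-layer bookkeeping described above can be illustrated with a short sketch. The following Python fragment (using the PyTorch library; the layer sizes and model structure are arbitrary choices made here for illustration) builds a small network containing batch normalization layers and collects, for each such layer ℓ_n, its scale parameters γ, shift parameters β, and stored statistics μ̂ and σ̂²:

import torch
import torch.nn as nn

# A small feed-forward model with a batch normalization layer between
# pairs of neural layers, as described for the learning model 202.
model = nn.Sequential(
    nn.Linear(64, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, 32), nn.BatchNorm1d(32), nn.ReLU(),
    nn.Linear(32, 16),
)

# Collect the learnable affine parameters and the stored statistics of
# every normalization layer (the set referred to as Theta_N above).
norm_parameters = []
for module in model.modules():
    if isinstance(module, nn.BatchNorm1d):
        norm_parameters.append({
            "gamma": module.weight,         # learnable scale, length F_n
            "beta": module.bias,            # learnable shift, length F_n
            "mu_hat": module.running_mean,  # stored mean estimate
            "var_hat": module.running_var,  # stored variance estimate
        })

for n, params in enumerate(norm_parameters, start=1):
    print(f"normalization layer {n}: {params['gamma'].numel()} features")

In PyTorch's convention, the momentum argument of BatchNorm1d plays the role of (1 − τ) in Equations 4 and 5, and running_mean and running_var hold the stored statistics μ̂ and σ̂².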
Thus, the learnable and/or trainable parameters introduced by the n-th batch normalization layer also include μ̂_{n,f} and σ̂²_{n,f}, f = 1, …, F_n (e.g., which are equivalent to μ_EMA^{n,f} and σ²_EMA^{n,f}, respectively, in the case when EMA is employed to compute E{μ_b} and E{σ²_b}). Thus, for the batch normalization layers ℓ_1, …, ℓ_{N_norm}, the set of learnable parameters is given by Equation 6:

Θ_N = {γ_{n,f,1}, γ_{n,f,2}, …, γ_{n,f,L_n}, β_{n,f,1}, β_{n,f,2}, …, β_{n,f,L_n}, μ̂_{n,f}, σ̂²_{n,f}},  n = 1, …, N_norm,  f = 1, …, F_n    (6)

where F_n is the total number of features at the input of ℓ_n and L_n is the length of the feature vector at the input of ℓ_n, or, equivalently, a number of neurons in ℓ_n. [0069] In some variations, a device in a wireless communications system (e.g., a UE and/or an NE) can implement the one or more learning models 202 for CSI compression, as described with reference to Figure 1, and as described in further detail with respect to Figure 3. The UE and/or the NE can train the one or more learning models 202 on a training dataset to perform the CSI compression. For example, a training dataset can include labeled and/or unlabeled data. When a dataset includes both an input data sample and a corresponding output data sample for data samples in the dataset, the dataset is referred to as a labeled dataset. When the dataset includes the input data sample without the corresponding output data sample, the dataset is referred to as an unlabeled dataset. [0070] In some examples, a device can implement supervised learning techniques to train the learning models 202. For example, the NE 102 can train the learning models 202 to map input data to output labels based on example input-output pairs provided during training. In supervised learning, the learning model 202 computes a mapping function from input features to output labels by observing a dataset that includes labeled examples, such that the learning models 202 can generalize the mapping to make accurate predictions on new, unseen data. Although supervised learning techniques are described, the device can additionally, or alternatively, implement any other type of training technique to train the learning models 202, including, but not limited to, unsupervised learning techniques and semi-supervised learning techniques, among others. In unsupervised learning, the learning models 202 detect patterns, structures, or relationships within training data. Unlike supervised learning, there are no explicit output labels provided during training. Semi-supervised learning leverages both labeled and unlabeled data during a training procedure. The learning models 202 use the labeled examples, while also using the structure of the unlabeled data to improve a performance of the learning models 202. [0071] In some examples, a source domain refers to a set of data samples over which a learning model 202 is trained. For example, when the learning model 202 is trained on data samples D^s = {(x_i^s, y_i^s)}_{i=1}^{N_s} ∼ P^s_{XY}, D^s is referred to as the source domain. In some variations, x_i^s is a scalar, a one-dimensional vector, or a multi-dimensional vector, and y_i^s is a scalar, a one-dimensional vector, or a multi-dimensional vector. With x^s ∈ X and y^s ∈ Y, the joint probability density function (PDF) P^s_{XY} : X × Y → ℝ_+ denotes the source distribution.
In some cases, a learning model is trained on more than one source domain and/or over a dataset that includes data samples coming from more than one source domain, which leads to multiple source domains, D^{s_j} = {(x_i^{s_j}, y_i^{s_j})}_{i=1}^{N_{s_j}} ∼ P^{s_j}_{XY}, where j is a positive integer. A target domain refers to a set of data samples over which a learning model 202 makes inferences and/or predictions. Thus, when a learning model 202 makes inferences on x_i^t to predict a corresponding label y_i^t, where D^t = {(x_i^t, y_i^t)}_{i=1}^{N_t} ∼ P^t_{XY}, D^t is referred to as the target domain and the PDF P^t_{XY} : X × Y → ℝ_+ denotes the target distribution. In some cases, there can be different domains for different network conditions, device parameters, channel characteristics, or any other factor that impacts the data distribution of one or more data samples. In some examples, a device can obtain input data from a target domain that is not included in a training dataset. Thus, the learning model can make inferences and/or predictions (e.g., by compressing CSI originating from a different probability distribution than the distribution of the training dataset) over a domain that is not used during training. [0072] Conventionally, to train a learning model 202 to make inferences and/or predictions across different data distributions, a device can implement a domain adaptation technique. For example, a device can perform domain adaptation during training of a learning model 202 by using domain-invariant feature learning, data mix-up over multiple source domains during training, and/or data-augmentation techniques, among other domain adaptation techniques. In some variations, domain adaptation techniques that are implemented during training can include training a learning model 202 over multiple source domains (e.g., multiple datasets that are rich in diversity), which provides for a learning model 202 that is generalized for target domains that are not included in the training datasets. However, using domain adaptation techniques that are implemented during training may not result in a generalized learning model, can use relatively large (e.g., greater than a threshold value) amounts of data from multiple source domains, and can increase the use of computational resources for training. [0073] Additionally, or alternatively, a device can perform domain adaptation during inference and/or testing of a learning model 202 by adapting and/or generalizing a learning model 202 after the learning model 202 is trained and deployed. The device can obtain labeled or unlabeled data from the target domain for the adaptation and/or generalization of the learning model 202. For example, the device can obtain data in batches, such as by obtaining multiple data samples at a same time. In some other examples, the device can obtain data samples one at a time, or sequentially. However, conventional techniques for domain adaptation include computing a value of an end-to-end loss function and back-propagation through a network. To compute the end-to-end loss function for a two-sided learning model, a device hosting the encoder component can use the decoder component output or a device hosting the decoder component can use the encoder component input. However, the devices transmitting and/or receiving signaling that includes the decoder component output and/or the encoder component input can lead to increased signaling overhead.
In some cases, domain adaptation during inference and/or testing of a learning model 202 includes back propagation, which leads to increased signaling overhead between the device hosting the encoder component and the device hosting the decoder component. During back propagation, an algorithm works backward through the learning model 202 to calculate gradients of a loss function with respect to respective weights in the learning model 202 to determine how much a weight contributes to an overall error. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 28 [0074] In some variations, domain adaptation and/or domain generalization techniques adapt and/or adjust parameters of the learning model 202 by minimizing a loss function, where computing the loss function value depends on output of the learning model 202. For a two-sided learning model, a device provides input data samples (e.g., estimated CSI) to an encoder component and the output of the learning model 202 is produced by the decoder component. The encoder component and the decoder component can be located at two geographically separate devices. Thus, adaptation of a two- sided learning model can include signaling between the devices to exchange information. That is, the encoder component may not have access to output of a decoder component while adapting the two- sided learning model to a data distribution of input data samples at the encoder component. To make the decoder component output available at the encoder component, the device may use additional network resources to transmit decoder component output to the encoder component. [0075] In some examples, to adapt a learning model 202 (e.g., a one-sided learning model and/or a two-sided learning model) without increasing signaling overhead, and while maintaining a relatively low use (e.g., less than a threshold value) of processing, power, and memory resources, a device can develop a learning model 202 with normalization layers 210. The device can adapt parameters of the normalization layers 210 based on distributions of input data (e.g., rather than adapting all of the learnable parameters of the learning model 202, including the parameters of the hidden layers 206 and the normalization layers 210). Thus, the device can perform domain adaptation for a learning model 202, such as a learning model 202 that is implemented at devices at different geographic locations, without computing a value of an end-to-end loss function, without additional signaling overhead, and without back propagation through the learning model 202, as described in further detail with respect to Figure 3. [0076] Figure 3 illustrates an example of a wireless communications system 300 in accordance with aspects of the present disclosure. In some examples, the wireless communications system 300 implements aspects of the wireless communications system 100 and the learning model diagram 200. For example, the wireless communications system 300 includes a UE 104 and an NE 102, which may be examples of a UE 104 and an NE 102 as described with reference to Figure 1. In some examples, an NE 102 may be in wireless communications with one or more other devices in the wireless communications system 300. For example, the NE 102 may transmit and/or receive signaling to and from the UE 104. The UE 104 may transmit signaling to the NE 102 via a feedback Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 29 channel 302, which may be an example of a communication link. 
The signaling between the UE 104 and the NE 102 may include control signaling and/or data transmissions. For example, the signaling may include data samples, such as CSI data samples measured by the UE 104. [0077] In some examples, CSI data samples include information about a channel between two nodes or devices in the wireless communications system 300, such as a UE 104 and an NE 102 (e.g., a base station, gNB, or other network entity), over time, frequency, and space. The devices use the information for coherent reception and/or transmission of data between the devices by effective combining and precoding. A device receiving a data transmission can be referred to as a receiving device, while a device transmitting a data transmission can be referred to as a transmitting device. The receiving device can use combining techniques to maximize a received signal quality by combining one or more signals received from multiple antennas. A transmitting device can use precoding techniques to maximize the received signal quality at the receiver or reduce interference at other devices by assigning precoding weights according to channel gains of respective antennas and/or canceling interference in a data transmission using the CSI data. [0078] A UE 104 can acquire downlink CSI using uplink reference signals (e.g., sent from the UE 104 to the NE 102) for uplink-downlink channel reciprocity. Uplink-downlink channel reciprocity occurs when channel characteristics (e.g., channel gains, phase shifts, and fading conditions) observed for an uplink transmission from the UE 104 to the NE 102 are approximately equivalent to those observed for a downlink transmission from the NE 102 to the UE 104. In the absence of this reciprocity, the UE 104 can perform one or more measurements on downlink reference signals from the NE 102 and can transmit CSI feedback to the NE 102 that indicates the measurements, which may be referred to as CSI data. However, transmitting CSI data to the NE 102 leads to a relatively high signaling overhead on one or more uplink communication resources (e.g., time-frequency resource) allocated for feedback. [0079] To reduce the signaling overhead, the UE 104 can compress the CSI data with a quantization scheme that minimizes the feedback overhead, while maintaining a quality of the CSI data. Example quantization schemes can include, but are not limited to, a scalar quantization scheme, a vector quantization scheme, an adaptive quantization scheme, a non-uniform quantization scheme, a differential quantization scheme, and an entropy coding scheme, among others. In scalar quantization, an element of a CSI matrix (e.g., channel gains, phase shifts) is independently Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 30 quantized to a finite set of discrete levels. Vector quantization groups multiple elements of the CSI matrix into vectors and quantizes them jointly. Adaptive quantization adjusts quantization levels dynamically based on instantaneous channel conditions, which provides for adaptation to variations in the channel environment (e.g., changes in signal-to-noise ratio (SNR) or channel fading). Adaptive quantization can minimize a trade-off between quantization error and data rate. 
Non- uniform quantization provides for quantization levels to be spaced unevenly based on statistical properties of the CSI data, which can improve a representation accuracy of the CSI data by allocating additional quantization levels to regions of the CSI matrix with higher variability or importance. Differential quantization encodes the differences between consecutive CSI measurements rather than the absolute values, which can reduce a dynamic range of the quantized data and improve the efficiency of representation of the CSI data, such as for relatively slow variations in the channel. Entropy coding techniques, such as Huffman coding or arithmetic coding, can be applied to further compress the quantized CSI data by using statistical redundancies. Entropy coding techniques assign shorter codewords to more probable CSI values, leading to more compact representation. [0080] In some examples, one or more devices in the wireless communications system 300 (e.g., the UE 104 and the NE 102) implement an autoencoder. An autoencoder may be an example of a neural network architecture, or ML architecture, in which data is compressed from an original form to a lower-dimensional representation of the data at an encoder component 304, and the data is reconstructed to the original form at a decoder component 306. The encoder component 304 and the decoder component 306 may be components of the autoencoder and may implement respective learning models (e.g., learning models 202, as described with reference to Figure 2) to perform the compression and reconstruction of the data. In addition to, or as an alternative to signal compression, the NE 102 and the UE 104 may use the autoencoder for feature extraction and/or noise reduction. A device can train an autoencoder on a large dataset of wireless signals, and the autoencoder can learn a compact (e.g., lower-dimensional) representation of the signals to use for subsequent tasks, such as modulation classification, channel equalization, and/or interference detection. Autoencoders can additionally, or alternatively, be used for reducing or eliminating noise in a wireless signal by learning to reconstruct a signal without noise from one or more Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 31 measurements of a signal with noise, which leads to improved robustness and reliability of the wireless communications system 300 in a noisy environment. [0081] In some examples, such as for signal compression, the encoder component 304 receives input data and maps the input data to a lower-dimensional representation. The input data can include a set of data samples 308, including, but not limited to, CSI data obtained from measurements performed by the UE 104. For example, the UE 104 can measure one or more reference signals (e.g., from the NE 102) to obtain the CSI data. In some cases, CSI data includes information that describes the characteristics and conditions of a communication channel between a transmitting device (e.g., the UE 104 and/or the NE 102) and a receiving device (e.g., the NE 102 and/or the UE 104) in a wireless communications system 300. For example, the CSI data and corresponding data samples can include, but are not limited to, channel gain data, phase shift data, frequency response data, delay spread data, and Doppler shift data, among other information. 
Mapping the input data to the lower-dimensional representation can include transforming the input data into a compressed representation of the input data using multiple layers of a learning model. The output of the encoder component 304 is the compressed representation of the input data. [0082] The UE 104 can transmit the compressed representation of the input data to the NE 102 via the feedback channel 302. For example, the UE 104 can include the compressed representation of the input data in control signaling to the NE 102, where one or more fields in the control signaling include the compressed representation of the input data. The NE 102 receives the compressed representation of the input data and reconstructs the original input data at the decoder component 306. The decoder component 306 uses multiple layers of a neural network to map the compressed representation of the input data back to the original input data. That is, the output of the decoder component 306 is the reconstructed version of the input data. For example, the NE 102 can reconstruct CSI data to determine one or more characteristics and/or conditions of a communication channel between the UE 104 (e.g., a receiver device) and the NE 102 (e.g., a transmitter device). The NE 102 can use the CSI data for adaptive modulation and coding, beamforming, precoding, equalization, and interference cancellation, among other processes and procedures. Thus, the NE 102 can adapt one or more transmissions between the NE 102 and the UE 104 to changing channel conditions, thereby improving spectral efficiency, reliability, and overall performance. [0083] In some variations, although the encoder component 304 is illustrated as being implemented by the UE 104 and the decoder component 306 is illustrated as being implemented by the NE 102, the UE 104 may additionally, or alternatively, implement the decoder component 306 and the NE 102 may additionally, or alternatively, implement the encoder component 304. [0084] In some examples, the autoencoder includes an encoder component 304, E_θ, which is an example of a DNN with learnable and/or trainable parameters denoted by θ. The autoencoder can include a decoder component 306, D_φ, which is an example of a DNN with φ as its set of trainable and/or learnable parameters. The encoder component 304 learns a representation of the input signal and/or data (e.g., encodes the input signal and/or data), such that one or more features of the input signal and/or data are captured as low-dimensional feature vectors 310. The encoder component 304 can obtain a set of data samples 308 and can output a feature vector 310 that includes characteristics, attributes, and/or properties (e.g., features) of the set of data samples 308. The UE 104 can transmit the feature vector 310 to the NE 102 via the feedback channel 302. For example, the UE 104 can transmit control signaling and/or a data transmission that includes one or more parameters with values including the feature vector 310. The decoder component 306 validates the feature vector 310 (e.g., the encoding of the set of data samples 308) and provides for the encoder component 304 to refine the encoding by regenerating the input signal and/or data from the feature vectors 310 generated by the encoder component 304. For example, the decoder component 306 can generate an estimated set of data samples 312 that is an estimate of the set of data samples 308.
Thus, the encoder component 304 and the decoder component 306 are trained and developed together, such that the signal and/or data at the input to the encoder component 304 is reconstructed at the output of the decoder component 306. Thus, the two learning models 202 (e.g., E_θ and D_φ) together constitute an autoencoder, denoted as A = {E_θ, D_φ}. In some examples, such as for downlink communication from the NE 102 to the UE 104, a device can train an autoencoder to encode and decode channel matrices. A training dataset can include a relatively large (e.g., greater than a threshold) number of channel matrices, which can be collected by a device or generated through simulations. The device trains the encoder component 304 to generate a lower-dimensional latent representation of the input channel matrix, and the decoder component 306 reconstructs the channel matrix from the latent representation generated by the encoder component 304. After training, the encoder component 304 of the autoencoder, E_θ, is deployed at the UE 104 and the decoder component 306 of the autoencoder, D_φ, is deployed at the NE 102. The UE 104 estimates the channel matrix using the reference and/or pilot signals received from the NE 102, encodes the channel matrix using the encoder component 304, E_θ, and transmits the encoded output (e.g., feature vectors and/or a latent representation of a channel matrix) computed by the encoder component 304 over the feedback channel 302 to the NE 102. The NE 102, using the decoder component 306, D_φ, decodes and/or reconstructs the channel matrix from the feature vectors received from the UE 104. As the autoencoder reduces the dimensionality of the data (e.g., CSI), the data is compressed prior to transmission over the feedback channel 302. The compressed CSI data at the output of the encoder component 304 includes the features and/or feature vectors 310 computed by the encoder component 304. [0086] In some examples, a transmitting device can determine the left-singular vectors of a channel matrix H and/or the eigenvectors of H H^H. In addition to, or as an alternative to, implementing the autoencoder for CSI compression, a device can use the autoencoder to transmit singular vectors or eigenvectors of a channel matrix. A device can train an autoencoder to represent and/or compress a matrix including the singular vectors and/or the eigenvectors. [0087] In some examples, a device (e.g., an NE 102 and/or a UE 104) can develop and/or train a two-sided DNN model or an autoencoder, denoted by A = {E_θ, D_φ}, that includes a normalization layer (e.g., a batch normalization layer) in between pairs of hidden layers. A batch normalization layer adapts and/or changes a statistical distribution of the data samples the batch normalization layer receives from a preceding layer before providing the data samples as input to a succeeding layer. In some cases, a four-layer learning model with one input layer, one output layer, and two hidden layers can include three normalization layers, as described with reference to Figure 2. Thus, in a learning model having a total number of N_L layers, with one normalization layer in between every pair of neural layers, there will be N_norm = N_L − 1 normalization layers. [0088] In some cases, a device can use an available training dataset given by D^s = {x_i^s}_{i=1}^{N_s} ∼ P^s_X, or the source domain dataset, available for training a two-sided learning model, A = {E_θ, D_φ}, for CSI compression.
For example, the device can develop and/or train the two-sided learning model A = {E_θ, D_φ} using the training dataset D^s and a self-supervised learning and/or training algorithm. The device can determine an architecture and/or structure of the learning model (a number of layers, how the layers are connected, etc.), where the learning model can be an example of a CNN, an RNN, or an LSTM, among other types of learning models. The device can determine values of the learnable and/or trainable parameters that maximize a performance of the learning model (e.g., an accuracy and a precision, among other performance metrics) by minimizing a loss function ℒ over the training dataset. [0089] In some examples, a learning model (e.g., a two-sided DNN) can include normalization layers. For example, an encoder component 304 and a decoder component 306 can include batch normalization layers, such that a batch normalization layer is between pairs of hidden layers of the encoder component 304 and the decoder component 306. Additionally, or alternatively, the encoder component 304 can include a batch normalization layer prior to the input layer to normalize input data prior to providing the input data to the encoder component 304. Additionally, or alternatively, the decoder component 306 can include a batch normalization layer between the output layer of the encoder component 304 and the input layer of the decoder component 306. After training the two-sided model, the encoder component 304 is deployed at a device (e.g., the UE 104) and the decoder component 306 is deployed at another device (e.g., the NE 102). The batch normalization layer that exists in between the output layer of the encoder component 304 and the input layer of the decoder component 306 during the training phase can be at the end of the encoder component 304 and/or can be prior to the input layer of the decoder component 306 (e.g., when the two-sided learning model is split to deploy the encoder component 304 at one device and the decoder component 306 at another device). In some examples, the learning model can include a normalization layer at an output of the encoder component 304 and at the input of the decoder component 306. Although the learning model can be described as including normalization layers between hidden layers (e.g., and not prior to the input layer), the learning model can include any numerical quantity of normalization layers at any location within the learning model. [0090] In some cases, a set θ = {θ_1, θ_2, …, θ_{N_E}} includes the learnable parameters of the encoder component 304, including the learnable parameters of the batch normalization layers in the encoder component 304. If a batch normalization layer is in between the encoder component 304 and the decoder component 306 and the batch normalization layer is at an output of the encoder component 304 during training of the learning model, then the parameters of the batch normalization layer belong to the set θ. Additionally, or alternatively, the set φ = {φ_1, φ_2, …, φ_{N_D}} includes the learnable parameters of the decoder component 306, including the learnable parameters of the batch normalization layers in the decoder component 306.
If the batch normalization layer is in between the encoder component 304 and the decoder component 306 and is at an input of the decoder component 306 during training of the learning model, then the parameters of the batch normalization layer belong to the set φ. In some examples, Θ = θ ∪ φ = {θ_1, θ_2, …, θ_{N_E}, φ_1, φ_2, …, φ_{N_D}} denotes the set of the learnable and/or trainable parameters of the model. For example, the set includes the learnable and/or trainable parameters of the neural layers and the batch normalization layers. [0091] Once the device selects and/or determines a network architecture for a learning model that includes normalization layers, the device trains the learning model using self-supervised learning by minimizing a loss function over the training dataset D^s. For example, the set of parameters Θ = θ ∪ φ = {θ_1, θ_2, …, θ_{N_E}, φ_1, φ_2, …, φ_{N_D}} is determined by solving Equation 7 through an optimization algorithm (e.g., stochastic gradient descent (SGD), adaptive moment estimation (ADAM), etc.):

Θ̂ = argmin_{Θ ∈ ℝ^{N_Θ}} ℒ(Θ, D^s)    (7)

where N_Θ = N_E + N_D. In some examples, the device can train the autoencoder, A = {E_θ, D_φ}, on more than one dataset, D^{s_j} = {x_i^{s_j}}_{i=1}^{N_{s_j}} ∼ P^{s_j}_X, where j ≥ 1 is a positive integer and the distributions P^{s_j}_X are different from one another. Additionally, or alternatively, the device can train the two-sided learning model over a dataset that includes data samples from more than one data distribution.
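A minimal sketch of such a two-sided learning model and of one training step toward Equation 7 is shown below (in Python with PyTorch). The layer sizes, the use of a mean squared reconstruction error as the loss function ℒ, and the ADAM settings are assumptions made here for illustration; the disclosure does not fix these choices:

import torch
import torch.nn as nn

# Illustrative dimensions only; the disclosure does not fix these values.
INPUT_DIM, LATENT_DIM = 256, 32

encoder = nn.Sequential(                                   # E_theta
    nn.Linear(INPUT_DIM, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, LATENT_DIM),
    nn.BatchNorm1d(LATENT_DIM),  # normalization layer at the encoder output
)
decoder = nn.Sequential(                                   # D_phi
    nn.Linear(LATENT_DIM, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, INPUT_DIM),
)

# Theta = theta U phi: the joint parameter set of both components.
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)
loss_fn = nn.MSELoss()  # assumed reconstruction loss L(Theta, D^s)

def train_step(x_batch):
    # One stochastic step toward Equation 7: minimize the reconstruction
    # error of the decoder output against the encoder input.
    optimizer.zero_grad()
    reconstruction = decoder(encoder(x_batch))
    loss = loss_fn(reconstruction, x_batch)
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage: a mini-batch standing in for estimated CSI samples
# drawn from the source-domain training dataset D^s.
loss_value = train_step(torch.randn(64, INPUT_DIM))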
[0092] In some examples, a device can adapt a trained learning model (e.g., a two-sided DNN), A = {E_θ, D_φ}, by updating the normalization statistics of the different normalization layers. In some cases, one or more devices (e.g., the UE 104 and/or the NE 102) can execute, implement, or deploy the trained and/or developed two-sided learning model or autoencoder, A = {E_θ, D_φ}. For example, the encoder component 304 is deployed at one device (e.g., the UE 104) and the decoder component 306 is deployed at another device (e.g., the NE 102), where both devices are geographically separated. After deployment, while operating in a real wireless network (e.g., the wireless communications system 300) to compress CSI, the devices can provide input data to the learning model that has a different distribution than the distribution of the training data. When the distribution of the input data changes from P^s_X (e.g., or any of the P^{s_j}_X, in the case of training over multiple data distributions) to another distribution, P^t_X, the device can adapt the two-sided learning model or the autoencoder, A = {E_θ, D_φ}, to the new input data distribution to satisfy a threshold performance for the learning model (e.g., a threshold value for a precision and/or accuracy of the learning model). In some examples, to adapt the learning model, the devices can update (change, adjust, replace, modify, etc.) one or more parameters, including one or more weights, of the learning model. [0093] It may be difficult for a device to adapt and/or retrain a learning model using techniques that involve computing an end-to-end loss or techniques that adapt the parameters of the learning model based on the final output of the learning model. For example, the final output of a two-sided learning model is the output of the decoder component 306, which is located at a device geographically separated from the device that hosts the encoder component 304. In some variations, the output of the decoder component 306 can be made available at the encoder component 304 by transmitting signaling that includes the output of the decoder component 306, which leads to inefficient use of time-frequency resources due to increased signaling overhead. In some examples, the device can adapt a learning model without computing a loss function or accessing the output of the decoder component 306. [0094] In some examples, N_BE and N_BD denote a number (e.g., numerical quantity) of batch normalization layers at an encoder component 304 and a decoder component 306 that a device has trained. At the end of the training, the n-th batch normalization layer stores μ̂_{n,f} and σ̂²_{n,f}, f = 1, …, F_n, where F_n is a number of features at the input of the n-th batch normalization layer, μ̂ is an estimate and/or true value of E{μ_b}, and σ̂² is an estimate and/or true value of E{σ²_b}. In some variations, μ̂_{n,f} = μ_EMA^{n,f} and σ̂²_{n,f} = σ²_EMA^{n,f} when EMA is employed for computing the estimates of E{μ_b} and E{σ²_b}. The parameters μ̂_{n,f} and σ̂²_{n,f} can be learned without computing gradients and without back propagation. In some cases, such as for computing μ̂_{n,f} and σ̂²_{n,f} through EMA, the device can determine (e.g., learn) μ_EMA^{n,f} and σ²_EMA^{n,f}, f = 1, …, F_n, for the n-th batch normalization layer using exponential averaging (e.g., without computing gradients and back propagation). The parameters μ̂_{n,f} and σ̂²_{n,f} include the batch normalization statistics of the n-th batch normalization layer. If the device computes the estimates of the mean and variance using EMA during training, then μ_EMA^{n,f} and σ²_EMA^{n,f} are the batch normalization statistics of the n-th batch normalization layer. [0095] In some examples, when a distribution of the input data changes to P^t_X, which is different from a distribution of the training data P^s_X (e.g., or any of the P^{s_j}_X, in the case of training over multiple data distributions), the device can adapt the two-sided learning model or autoencoder, A = {E_θ, D_φ}. For example, the device can discard the batch normalization statistics (e.g., the values of the mean and variance μ̂ and σ̂²) computed from the source domain data or the training data and stored in the respective batch normalization layers. The device can calculate and/or recompute the batch normalization statistics (e.g., mean and variance) for the respective batch normalization layers in a forward pass using the data samples x^t = {x_1^t, x_2^t, …, x_M^t} that are distributed as per a new target domain distribution, P^t_X. In some variations, the adaptation can be performed for the encoder component 304 and the decoder component 306 separately. [0096] For the encoder component 304, the device (e.g., the UE 104) can compute new batch normalization statistics for the respective batch normalization layers in the encoder component 304 using the target domain data samples x^t = {x_1^t, x_2^t, …, x_M^t}. The device can divide the data samples into subsets of data samples referred to as mini-batches, where the number of mini-batches is equal to N_B^t. For the encoder component 304, the device performs a forward pass of the respective mini-batches of data samples from the target domain and computes a mean μ_b and a variance σ²_b at the respective batch normalization layers for different features. Then, from the values of μ_b, b = 1, …, N_B^t, and the variance σ²_b, b = 1, …, N_B^t, the device computes new values of μ̂ and σ̂² (e.g., using EMA or any other method) and stores the new values in the respective batch normalization layers of the encoder component 304 as the adapted, adjusted, and/or new values of the batch normalization statistics. For the decoder component 306, the device computes new values for the batch normalization statistics for the respective batch normalization layers in the decoder component 306 using the output from the encoder component 304.
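A sketch of the encoder-side statistics refresh described above is shown below (in Python with PyTorch; the function name and the momentum value are illustrative assumptions). The stored source-domain statistics are discarded and re-estimated in forward passes over target-domain mini-batches, without any loss computation, gradients, or back propagation:

import torch
import torch.nn as nn

def refresh_encoder_bn_statistics(encoder, target_batches, momentum=0.1):
    """Recompute batch normalization statistics from target-domain data.

    Sketch of the encoder-side adaptation: discard the stored source-domain
    statistics and re-estimate them by running forward passes over
    mini-batches drawn from the new input distribution.
    """
    for module in encoder.modules():
        if isinstance(module, nn.BatchNorm1d):
            module.reset_running_stats()  # discard source-domain mu, sigma^2
            module.momentum = momentum    # EMA weight, i.e., (1 - tau)
    encoder.train()                       # mini-batch stats update the EMA
    with torch.no_grad():                 # no gradients or weight updates
        for batch in target_batches:
            encoder(batch)
    encoder.eval()                        # freeze the adapted statistics
    return encoder

# Hypothetical usage: 'encoder' is a trained encoder component containing
# BatchNorm1d layers and 'target_batches' is an iterable of mini-batches
# drawn from the target-domain samples x^t.

The device hosting the decoder component 306 can apply the same routine to its own batch normalization layers, with the mini-batches formed from the feature vectors received from the encoder component 304, as described in the following paragraphs.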
[0097] The device provides input data samples to the decoder component 306 for adaptation that include the latent and/or feature vectors computed by the encoder component 304 after adapting the batch normalization statistics of the encoder component 304. The device divides the input data samples received from the encoder component 304 into subsets of data samples referred to as mini-batches, where the number of mini-batches is equal to N_B^t. For the decoder component 306, the device performs a forward pass of the respective mini-batches of samples received from the encoder component 304 and computes the mean μ_b and the variance σ²_b at the respective batch normalization layers for different features. Then, the device computes the new values of μ̂ and σ̂² (e.g., through EMA or any other method) from the values of μ_b and σ²_b and stores the new values of μ̂ and σ̂² in the respective batch normalization layers of the decoder component 306. In some examples, the device can select a subset of data samples for the mini-batches that includes a diverse set of data samples by computing a distance between the data samples. For example, if the data samples are obtained sequentially, then the device can use reservoir sampling to select samples that are most likely independent and identically distributed (IID). IID data samples are from a same underlying population and the data samples are obtained independent of one another. [0098] The device can obtain a set of data samples x^t = {x_1^t, x_2^t, …, x_M^t} from a new input data distribution P^t_X. The device can obtain data samples that are independent and identically distributed according to P^t_X to provide for a reliable estimate of the new data distribution using the computed mean and variance of the data samples. If a numerical quantity of correlated data samples in a dataset, x^t = {x_1^t, x_2^t, …, x_M^t}, exceeds a threshold value, then the computed mean and variance can be biased, and the adaptation may not be effective in adapting the learning model to the new data distribution. Thus, the device can perform the adaptation procedure using IID data samples. The device can obtain IID data samples by selecting uncorrelated data samples from the dataset x^t = {x_1^t, x_2^t, …, x_M^t}. That is, the device can extract a set of samples x̃^t ⊆ x^t, where x̃^t includes diverse samples or data samples that have a relatively low correlation (e.g., less than a threshold correlation, zero correlation). Additionally, or alternatively, the device can generate the dataset x̃^t by generating and/or selecting a set of samples from x^t, such that the generated set x̃^t has a relatively high entropy (e.g., greater than a threshold entropy). For example, the device can generate different subsets of the set x^t, compute the entropy of the respective subsets, and then select a subset that has a greatest or highest entropy.
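For sequentially obtained samples, the classic reservoir sampling procedure referenced above can be sketched as follows (in Python; the function and variable names are chosen here for illustration):

import random

def reservoir_sample(stream, k, seed=None):
    """Classic reservoir sampling (Algorithm R).

    Keeps a uniformly random subset of k samples from a data stream whose
    length is not known in advance, which is one way to approximate an IID
    selection when target-domain samples arrive sequentially.
    """
    rng = random.Random(seed)
    reservoir = []
    for n, sample in enumerate(stream):
        if n < k:
            reservoir.append(sample)
        else:
            j = rng.randint(0, n)      # inclusive on both ends
            if j < k:
                reservoir[j] = sample  # replace with decreasing probability
    return reservoir

# Hypothetical usage: keep 64 samples from a stream of sequentially
# estimated CSI feature vectors.
# subset = reservoir_sample(csi_sample_stream, k=64)

Because each element of the stream ends up in the reservoir with equal probability, the retained subset approximates a uniform draw from the target-domain data even when the total number of samples is not known in advance.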
[0099] In some examples, the device (e.g., the UE 104) can obtain data samples, x^t = {x_1^t, x_2^t, …, x_M^t}, for input to an encoder component 304 of a learning model, where the encoder component includes batch normalization layers ℓ_{n_1}, …, ℓ_{n_{N_BE}}. For the values n, where n ∈ {n_1, …, n_{N_BE}}, the n-th batch normalization layer in the encoder component 304 stores μ̂_{n,f} and σ̂²_{n,f}, f = 1, …, F_n, where F_n is the number of features at the input of the n-th batch normalization layer. In some examples, a decoder component 306 includes batch normalization layers ℓ_{n_1}, …, ℓ_{n_{N_BD}}. For the values n, where n ∈ {n_1, …, n_{N_BD}}, the n-th batch normalization layer in the decoder component 306 stores μ̂_{n,f} and σ̂²_{n,f}, f = 1, …, F_n, where F_n is the number of features at the input of the n-th batch normalization layer. The UE 104 obtains a set of samples x̃^t from the set of samples x^t = {x_1^t, x_2^t, …, x_M^t}, where x̃^t is a subset of x^t that includes data samples that are different, distant, and/or uncorrelated from one another. The set x̃^t can be equal to the set x^t (e.g., x̃^t ⊆ x^t). [0100] In some examples, a device (e.g., a UE 104) can generate the set x̃^t from x^t. For example, the device can select a non-negative, non-zero valued distance threshold δ and select a distance metric and/or measure distance(x_i, x_j), where the distance metric and/or measure distance(x_i, x_j) returns a measure of the distance between any two arbitrary data samples x_i and x_j. In some examples, the distance metric and/or measure is a Euclidean distance, where distance(x_i, x_j) = ||x_i − x_j||_2. In some other examples, distance(x_i, x_j) = ||x_i − x_j||_1, where ||x||_1 denotes the ℓ1 norm of x. In some other examples, the distance measure is a cosine distance between the data samples (e.g., by treating the multi-dimensional data samples x_i, x_j as row vectors or column vectors). The device can initialize x̃^t = {x_i^t} by selecting x_i^t randomly from the set x^t. The device can set x^t ← x^t ∖ x̃^t, where x^t ∖ x̃^t is a set that includes the elements of the set x^t that are not in the set x̃^t. The notation A ∖ B denotes a set A minus a set B and/or a difference between the two sets A and B, where A ∖ B is a set that includes the elements of set A that are not in set B. The device can select another data sample, x_i^t, randomly from the set x^t. The device can compute a distance between x_i^t and the samples in the set x̃^t, using the selected distance metric and/or measure distance(x_i, x_j). If any of the computed distances is less than the distance threshold δ, then the device does not include that data sample in the set x̃^t and removes that sample from the set x^t (e.g., sets x^t ← x^t ∖ {x_i^t}). If the computed distances are greater than or equal to the distance threshold δ, then the device can include that data sample in the set x̃^t (e.g., sets x̃^t ← x̃^t ∪ {x_i^t} and x^t ← x^t ∖ {x_i^t}). In some examples, if the set x^t is empty, then the device can terminate the procedure for generating the set x̃^t from x^t. If the set x^t is not empty, then the device can return to select another data sample, x_i^t, randomly from the set x^t.
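The selection procedure of this example can be sketched as follows (in Python with NumPy; the Euclidean distance and the numerical values are illustrative assumptions consistent with, but not mandated by, the description above):

import random
import numpy as np

def select_diverse_subset(samples, delta):
    """Greedy selection of mutually distant data samples.

    Starting from one randomly chosen sample, a candidate is kept only if
    its distance to every sample already selected is at least the
    threshold delta; otherwise it is discarded. Euclidean distance is used
    here as the distance measure.
    """
    remaining = list(samples)
    random.shuffle(remaining)            # random draw order from x^t
    selected = [remaining.pop()]         # initialize x_tilde^t with one sample
    while remaining:
        candidate = remaining.pop()      # pick another sample from x^t
        distances = [np.linalg.norm(candidate - s) for s in selected]
        if min(distances) >= delta:
            selected.append(candidate)   # keep: far from all kept samples
        # else: drop the candidate (too close to an already-kept sample)
    return selected

# Hypothetical usage with 1000 CSI-like feature vectors of length 64.
data = [np.random.randn(64) for _ in range(1000)]
subset = select_diverse_subset(data, delta=9.0)
print(len(subset))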
The device can divide and/or group the data samples in the set x̃^t into N_B^t mini-batches, where a mini-batch includes ⌊|x̃^t| / N_B^t⌋ data samples, where |x̃^t| denotes the number of elements (e.g., data samples) in the set x̃^t and ⌊·⌋ denotes a floor of the value. In some variations, N_B^t can be any non-zero value and/or any integer value from a unit value to the number of samples in the set x̃^t (e.g., 1 ≤ N_B^t ≤ |x̃^t|). When N_B^t = 1, there is one mini-batch that includes all of the available data samples for adaptation of the learning model, and when N_B^t = |x̃^t|, respective data samples are each considered a mini-batch.
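The disclosure does not specify how the entropy of a candidate subset is computed; one simple possibility, sketched below in Python with NumPy (all function names and parameter values are illustrative assumptions), is to reduce each sample to a scalar statistic (here, its norm), estimate the Shannon entropy of the resulting histogram, and keep the candidate subset with the highest estimate:

import numpy as np

def subset_entropy(subset, bins=20):
    """Histogram-based entropy estimate for a candidate subset.

    As a simple proxy, the samples are reduced to their norms and the
    Shannon entropy of the resulting histogram is computed.
    """
    values = np.array([np.linalg.norm(x) for x in subset])
    counts, _ = np.histogram(values, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def pick_highest_entropy_subset(samples, num_subsets=10, subset_size=128, seed=0):
    # Draw several random candidate subsets and keep the one whose
    # estimated entropy is highest, as described above.
    rng = np.random.default_rng(seed)
    best_subset, best_entropy = None, -np.inf
    for _ in range(num_subsets):
        idx = rng.choice(len(samples), size=subset_size, replace=False)
        candidate = [samples[i] for i in idx]
        h = subset_entropy(candidate)
        if h > best_entropy:
            best_subset, best_entropy = candidate, h
    return best_subset

# Hypothetical usage on 1000 synthetic feature vectors of length 64.
samples = [np.random.randn(64) for _ in range(1000)]
x_tilde_t = pick_highest_entropy_subset(samples)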
[0102] For the encoder component 304, the device can perform a forward pass of the respective mini-batches of the data samples and can compute the mean μ_b and the variance σ²_b at the respective batch normalization layers for different features. Then, from the values of μ_b, b = 1, …, N_B^t, and the variance σ²_b, b = 1, …, N_B^t, the device can compute the new values of μ̂ and σ̂² (e.g., using EMA or any other method) and can store the new values of μ̂ and σ̂² in the respective batch normalization layers of the encoder component 304 as the adapted, adjusted, and/or new values of the batch normalization statistics. In some cases, for the respective n-th batch normalization layers at the encoder component 304, where n ∈ {n_1, …, n_{N_BE}}, with the forward pass of the respective mini-batches of data samples, the device computes a value of a mean μ_b, b = 1, …, N_B^t, and a variance σ²_b, b = 1, …, N_B^t. The device uses the values of μ_b and σ²_b to compute μ̂_{n,f} and σ̂²_{n,f}, f = 1, …, F_n, where F_n is a number of features at the input of the n-th batch normalization layer. The device stores the values of μ̂_{n,f} and σ̂²_{n,f}, f = 1, …, F_n, at the n-th batch normalization layer in the encoder component 304, where n ∈ {n_1, …, n_{N_BE}}. [0103] For the decoder component 306, the device (e.g., the NE 102) can provide data samples as input to the decoder neural network for adaptation that include the latent and/or feature vectors computed by the encoder component 304 for the respective input data samples, denoted by z_1, …, z_M. In some examples, the outputs of the encoder component 304, with M = |x̃^t|, include a latent representation and/or vector (e.g., or a feature vector) for respective input data samples, and the set x̃^t includes the data samples input to the encoder component 304 during adaptation of the two-sided learning model. The device divides and/or groups the data samples into N_B^t mini-batches, where respective mini-batches include ⌊M / N_B^t⌋ data samples, where ⌊·⌋ denotes a floor of the value. In some examples, N_B^t can be any non-zero value, and any integer value from a unit value to the value of M (e.g., 1 ≤ N_B^t ≤ M). When N_B^t = 1, there is one mini-batch that includes all of the available data samples for adaptation of the learning model, and when N_B^t = M, respective data samples are each considered a mini-batch. [0104] A device can perform a forward pass of the respective mini-batches of data samples in the decoder component 306. The device can compute a mean μ_b and a variance σ²_b at the respective batch normalization layers for different features. Then, from the values of μ_b, b = 1, …, N_B^t, and the variance σ²_b, b = 1, …, N_B^t, the device can compute new values of μ̂ and σ̂² (e.g., using EMA or any other method) and store the new values of μ̂ and σ̂² in the respective batch normalization layers of the decoder component 306 as the adapted, adjusted, and/or new values of the batch normalization statistics. With the forward pass of the respective mini-batches of data samples, the device computes the value of the mean μ_b, b = 1, …, N_B^t, and the variance σ²_b, b = 1, …, N_B^t, for the n-th batch normalization layer at the decoder component 306, where n ∈ {n_1, …, n_{N_BD}}. The device computes μ̂_{n,f} and σ̂²_{n,f}, f = 1, …, F_n, from the values of μ_b and σ²_b, where F_n is a number of features at the input of the n-th batch normalization layer. The device stores the values of μ̂_{n,f} and σ̂²_{n,f}, f = 1, …, F_n, at the n-th batch normalization layer at the decoder component 306, where n ∈ {n_1, …, n_{N_BD}}. [0105] Figure 4 illustrates an example of a signaling diagram 400 in accordance with aspects of the present disclosure. In some examples, the signaling diagram 400 may implement aspects of the wireless communications system 100, the learning model diagram 200, and the wireless communications system 300. The signaling diagram 400 may illustrate an example of a device 402-a and a device 402-b adapting a two-sided learning model with normalization layer parameters. In some cases, the device 402-a may be an example of a UE 104 and the device 402-b may be an example of an NE 102, as described with reference to Figures 1 through 3. Alternative examples of the following may be implemented, where some processes are performed in a different order than described or are not performed. In some cases, processes may include additional features not mentioned below, or further processes may be added. [0106] In some examples, at 404, the device 402-a and/or the device 402-b can transmit an indication to update learning model parameters. The learning model parameters can define layers of a learning model that is implemented at the device 402-a and/or the device 402-b, such as an autoencoder, as described with reference to Figures 2 and 3. For example, the learning model can be a two-sided learning model that includes an encoder learning model and a decoder learning model. [0107] In some cases, at 406, the device 402-b can transmit data samples to the device 402-a. For example, the device 402-b can obtain the data samples by performing one or more measurements (e.g., measurements to obtain CSI) and/or by simulating a channel to generate the data samples. Additionally, or alternatively, the device 402-a can obtain the data samples, such as via a simulation and/or measurements. The device 402-b and/or the device 402-a can obtain the data samples by estimating CSI. [0108] At 408, the device 402-a can select a subset of data samples based on a determination that one or more parameters of at least one normalization layer of a learning model are to be updated. The parameters can include one or more normalization statistics of at least one normalization layer (e.g., a batch normalization layer). The normalization statistics can include, but are not limited to, a value of a mean parameter and a value of a variance parameter. In some examples, the device 402-a can determine that the subset of data samples satisfies one or more conditions.
The conditions can include a distance metric between respective data samples in the subset of data samples being greater than a threshold dissimilarity value and an entropy value of respective data samples in the subset of data samples satisfying a threshold entropy value. The device 402-a can determine one or more of the threshold dissimilarity value or the threshold entropy value. For example, the device 402-a can select the threshold dissimilarity value and/or the threshold entropy value. In some other examples, the device 402-a can receive signaling (e.g., from the device 402-b or from another device) that indicates or configures the threshold dissimilarity value and/or the threshold entropy value. [0109] In some variations, the device 402-a can determine to update the parameters according to a periodicity. For example, the device 402-a can be configured with and/or determine a periodicity Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 43 for updating the parameters. The device 402-a can update the parameters periodically. In some other variations, the device 402-a can determine to update the parameters after receiving the message at 404. In yet other variations, the device 402-a can detect a change in one or more characteristics of the CSI. The device 402-a can determine to update the parameters upon detection of the change. For example, a change in characteristics of the CSI can include, but is not limited to, a change in a channel gain, a change in phase shift, a change in a delay spread, and/or a change in Doppler shift. The characteristics of the CSI can change due to fading, interference, device mobility, environmental factors, and/or antenna configuration changes. In yet other variations, the device 402-a can detect a change in one or more channel conditions for a channel between the device 402-a and the device 402-b. The device 402-a can determine to update the one or more parameters upon detecting the change. A change in channel conditions can be due to a change in fading for the channel, a change in interference for the channel, a change in device mobility of the device 402-a and/or the device 402-b, a change in environmental factors for the channel (e.g., weather conditions and obstruction of objects, among other environmental factors), and/or changes to an antenna configuration at the device 402-a and/or the device 402-b. In yet other variations, the device 402-a can determine to update the parameters if a quality of output from the encoder learning model fails to satisfy a threshold value. For example, if an accuracy and/or precision of the encoder learning model, among other quality metrics, fail to satisfy respective threshold values, then the device 402-a can determine to update the parameters. [0110] At 410, the device 402-a can generate one or more updated parameters by providing the subset of data samples as input to encoder learning model. The updated parameters replace the one or more parameters of the normalization layer of the encoder learning model. In some examples, the device 402-a can propagate the subset of data samples from a first layer of the encoder learning model to a last layer of the encoder learning model. The device 402-a can compute one or more normalization statistics of the at least one normalization layer. The updated parameters can include, but are not limited to, the normalization statistics for the at least one normalization layer. 
Example normalization statistics include, but are not limited to, a value of a mean parameter and a value of a variance parameter. [0111] At 412, the device 402-a can transmit an output from the encoder learning model to the device 402-b. In some examples, the output is a compressed (e.g., encoded) representation of the Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 44 subset of data samples. The device 402-b can provide the output from the encoder learning model as input to a decoder learning model. [0112] At 414, the device 402-b can generate updated parameters for a decoder learning model by providing output from the encoder learning model as input to decoder learning model. For example, the device 402-b can determine that one or more parameters for at least one normalization layer (e.g., a batch normalization layer) of the decoder learning model are to be updated. The output can include a set of information samples of feature vectors generated by the encoder learning model. The updated parameters can include, but are not limited to, normalization statistics for the at least one normalization layer. The normalization statistics can include a value of a mean parameter and a value of a variance parameter. [0113] In some variations, the device 402-b can determine to update the parameters according to a periodicity (e.g., a same periodicity as the device 402-a uses for determining whether to update parameters of the encoder learning model). For example, the device 402-b can be configured with and/or determine a periodicity for updating the parameters. The device 402-b can update the parameters periodically. In some other variations, the device 402-b can determine to update the parameters after receiving the message at 404 and/or after receiving the message at 412. In yet other variations, the device 402-b can detect a change in one or more characteristics of the CSI. The device 402-b can determine to update the parameters upon detection of the change. For example, a change in characteristics of the CSI can include, but is not limited to, a change in a channel gain, a change in phase shift, a change in a delay spread, and/or a change in Doppler shift. The characteristics of the CSI can change due to fading, interference, device mobility, environmental factors, and/or antenna configuration changes. In yet other variations, the device 402-b can detect a change in one or more channel conditions for a channel between the device 402-a and the device 402-b. The device 402-a can determine to update the one or more parameters upon detecting the change. A change in channel conditions can be due to a change in fading for the channel, a change in interference for the channel, a change in device mobility of the device 402-a and/or the device 402-b, a change in environmental factors for the channel (e.g., weather conditions and obstruction of objects, among other environmental factors), and/or changes to an antenna configuration at the device 402-a and/or the device 402-b. In yet other variations, the device 402-b can determine to update the parameters if a quality of output from the decoder learning model fails to satisfy a Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 45 threshold value. For example, if an accuracy and/or precision of the decoder learning model, among other quality metrics, fail to satisfy respective threshold values, then the device 402-b can determine to update the parameters. 
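The update triggers described for the device 402-a and the device 402-b can be summarized in a short decision routine. The following Python sketch is purely illustrative; the argument names, the use of a mean-drift test for detecting a change in CSI characteristics, and the thresholds are assumptions rather than elements of the disclosure:

import numpy as np

def should_update_normalization_stats(
    slots_since_update,
    update_period_slots,
    recent_input_mean,
    stored_input_mean,
    drift_threshold,
    output_quality,
    quality_threshold,
):
    """Decide whether the normalization statistics should be refreshed.

    Combines the triggers described above: a configured periodicity, a
    detected drift in the input (e.g., CSI) statistics, or a model output
    quality that fails to satisfy a threshold value.
    """
    periodic = slots_since_update >= update_period_slots
    drifted = np.linalg.norm(recent_input_mean - stored_input_mean) > drift_threshold
    degraded = output_quality < quality_threshold
    return periodic or drifted or degraded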
[0114] The device 402-b can propagate the subset of data samples from a first layer of the decoder learning model to a last layer of the decoder learning model. The device 402-b can compute one or more normalization statistics of the at least one normalization layer. The normalization statistics can include, but are not limited to, a value of a mean parameter and a value of a variance parameter. [0115] Figure 5 illustrates an example of a UE 500 in accordance with aspects of the present disclosure. The UE 500 may include a processor 502, a memory 504, a controller 506, and a transceiver 508. The processor 502, the memory 504, the controller 506, or the transceiver 508, or various combinations thereof or various components thereof may be examples of means for performing various aspects of the present disclosure as described herein. These components may be coupled (e.g., operatively, communicatively, functionally, electronically, electrically) via one or more interfaces. [0116] The processor 502, the memory 504, the controller 506, or the transceiver 508, or various combinations or components thereof may be implemented in hardware (e.g., circuitry). The hardware may include a processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or other programmable logic device, or any combination thereof configured as or otherwise supporting a means for performing the functions described in the present disclosure. [0117] The processor 502 may include an intelligent hardware device (e.g., a general-purpose processor, a DSP, a CPU, an ASIC, a field-programmable gate array (FPGA), or any combination thereof). In some implementations, the processor 502 may be configured to operate the memory 504. In some other implementations, the memory 504 may be integrated into the processor 502. The processor 502 may be configured to execute computer-readable instructions stored in the memory 504 to cause the UE 500 to perform various functions of the present disclosure. [0118] The memory 504 may include volatile or non-volatile memory. The memory 504 may store computer-readable, computer-executable code including instructions that, when executed by the processor 502, cause the UE 500 to perform various functions described herein. The code may be stored in a non-transitory computer-readable medium such as the memory 504 or another type of memory. Computer-readable media includes both non-transitory computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A non-transitory storage medium may be any available medium that may be accessed by a general-purpose or special-purpose computer. [0119] In some implementations, the processor 502 and the memory 504 coupled with the processor 502 may be configured or operable to cause the UE 500 to perform one or more of the functions described herein (e.g., executing, by the processor 502, instructions stored in the memory 504). For example, the processor 502 may support wireless communication at the UE 500 in accordance with examples as disclosed herein.
The UE 500 may be configured to or operable to support a means for selecting a subset of data samples from a set of data samples associated with an encoder learning model, where the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, updating one or more parameters corresponding to at least one normalization layer of the encoder learning model based on providing the subset of data samples as input to the encoder learning model, and transmitting, to a second device, a message including an output from the encoder learning model, where the output is associated with the subset of data samples. [0120] Additionally, the UE 500 may be configured to support any one or combination of to update the one or more parameters, updating the one or more parameters based on a periodicity. Additionally, or alternatively, to update the one or more parameters, the UE 500 may be configured to support receiving, from at least one of the second device or a third device, an additional message that indicates for the first device to update the one or more parameters. Additionally, or alternatively, to update the one or more parameters, the UE 500 may be configured to support detecting a change in one or more characteristics associated with CSI. Additionally, or alternatively, to update the one or more parameters, the UE 500 may be configured to support detecting a change in one or more channel conditions associated with a channel between the first device and the second device. Additionally, or alternatively, to update the one or more parameters, the UE 500 may be configured to support determining a quality of output from the encoder learning model fails to satisfy a threshold value. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 47 [0121] Additionally, or alternatively, the UE 500 may be configured to support obtaining the set of data samples based on estimating CSI. Additionally, or alternatively, the UE 500 may be configured to support receiving an additional message that indicates the set of data samples. Additionally, or alternatively, selecting the subset of data samples includes determining the subset of data samples satisfy one or more conditions, where the one or more conditions include a distance metric between respective data samples in the subset of data samples being greater than a threshold dissimilarity value and an entropy value of respective data samples in the subset of data samples satisfying a threshold entropy value. Additionally, or alternatively, the UE 500 may be configured to support determining one or more of the threshold dissimilarity value or the threshold entropy value. Additionally, or alternatively, the UE 500 may be configured to support receiving an additional message that indicates one or more of the threshold dissimilarity value or the threshold entropy value. Additionally, or alternatively, updating the one or more parameters includes propagating the subset of data samples from a first layer associated with the encoder learning model to a last layer associated with the encoder learning model, and computing one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. 
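As a rough illustration of the selection rule recited above, the sketch below keeps only samples whose pairwise distance to the already-kept samples exceeds a dissimilarity threshold and whose estimated entropy meets an entropy threshold. The greedy strategy, the Euclidean distance, the histogram-based entropy estimate, and the percentile-based fallback used when no thresholds are configured or signaled are illustrative assumptions, not requirements of the disclosure.

```python
import numpy as np

def sample_entropy(x: np.ndarray, bins: int = 16) -> float:
    """Estimate the entropy of one data sample from a histogram of its values (assumed estimator)."""
    hist, _ = np.histogram(x, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_subset(
    samples: np.ndarray,
    dissimilarity_threshold: float | None = None,
    entropy_threshold: float | None = None,
) -> np.ndarray:
    """Greedily keep samples that are dissimilar to every sample already kept and that
    carry enough entropy; thresholds may be signaled or, as assumed here, derived locally."""
    flat = samples.reshape(len(samples), -1)
    # Fallback: derive thresholds from the data when none are configured or signaled.
    if dissimilarity_threshold is None:
        d = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
        dissimilarity_threshold = float(np.percentile(d[d > 0], 50))
    if entropy_threshold is None:
        entropy_threshold = float(np.percentile([sample_entropy(s) for s in flat], 25))

    kept: list[int] = []
    for i, s in enumerate(flat):
        if sample_entropy(s) < entropy_threshold:
            continue  # entropy condition not satisfied
        if all(np.linalg.norm(s - flat[j]) > dissimilarity_threshold for j in kept):
            kept.append(i)  # sufficiently dissimilar to every sample already kept
    return samples[kept]
```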
Additionally, or alternatively, the one or more parameters include one or more normalization statistics associated with the at least one normalization layer, and where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. Additionally, or alternatively, the at least one normalization layer is a batch normalization layer. [0122] Additionally, or alternatively, the UE 500 may support at least one memory (e.g., the memory 504) and at least one processor (e.g., the processor 502) coupled with the at least one memory and configured or operable to cause the UE to select a subset of data samples from a set of data samples associated with an encoder learning model, where the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, update one or more parameters corresponding to at least one normalization layer of the encoder learning model based on providing the subset of data samples as input to the encoder learning model, and transmit, to a second device, a message including an output from the encoder learning model, where the output is associated with the subset of data samples. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 48 [0123] Additionally, the UE 500 may be configured to support any one or combination of to update the one or more parameters, the at least one processor is configured to update the one or more parameters based on a periodicity. Additionally, or alternatively, to update the one or more parameters, the at least one processor is configured to receive, from at least one of the second device or a third device, an additional message that indicates for the first device to update the one or more parameters. Additionally, or alternatively, to update the one or more parameters, the at least one processor is configured to detect a change in one or more characteristics associated with CSI. Additionally, or alternatively, to update the one or more parameters, the at least one processor is configured to detect a change in one or more channel conditions associated with a channel between the first device and the second device. Additionally, or alternatively, to update the one or more parameters, the at least one processor is configured to determine a quality of output from the encoder learning model fails to satisfy a threshold value. [0124] Additionally, or alternatively, the at least one processor is configured to obtain the set of data samples based on estimating CSI. Additionally, or alternatively, the at least one processor is configured to receive an additional message that indicates the set of data samples. Additionally, or alternatively, to select the subset of data samples, the at least one processor is configured to determine the subset of data samples satisfy one or more conditions, where the one or more conditions include a distance metric between respective data samples in the subset of data samples being greater than a threshold dissimilarity value and an entropy value of respective data samples in the subset of data samples satisfying a threshold entropy value. Additionally, or alternatively, the at least one processor is configured to determine one or more of the threshold dissimilarity value or the threshold entropy value. 
Additionally, or alternatively, the at least one processor is configured to receive an additional message that indicates one or more of the threshold dissimilarity value or the threshold entropy value. Additionally, or alternatively, to update the one or more parameters, the at least one processor is configured to propagate the subset of data samples from a first layer associated with the encoder learning model to a last layer associated with the encoder learning model, and compute one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. Additionally, or alternatively, the one or more parameters include one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. Additionally, or alternatively, the at least one normalization layer is a batch normalization layer. [0125] The controller 506 may manage input and output signals for the UE 500. The controller 506 may also manage peripherals not integrated into the UE 500. In some implementations, the controller 506 may utilize an operating system such as iOS®, ANDROID®, WINDOWS®, or other operating systems. In some implementations, the controller 506 may be implemented as part of the processor 502. [0126] In some implementations, the UE 500 may include at least one transceiver 508. In some other implementations, the UE 500 may have more than one transceiver 508. The transceiver 508 may represent a wireless transceiver. The transceiver 508 may include one or more receiver chains 510, one or more transmitter chains 512, or a combination thereof. [0127] A receiver chain 510 may be configured to receive signals (e.g., control information, data, packets) over a wireless medium. For example, the receiver chain 510 may include one or more antennas to receive a signal over the air or wireless medium. The receiver chain 510 may include at least one amplifier (e.g., a low-noise amplifier (LNA)) configured to amplify the received signal. The receiver chain 510 may include at least one demodulator configured to demodulate the received signal and obtain the transmitted data by reversing the modulation technique applied during transmission of the signal. The receiver chain 510 may include at least one decoder for decoding the demodulated signal to receive the transmitted data. [0128] A transmitter chain 512 may be configured to generate and transmit signals (e.g., control information, data, packets). The transmitter chain 512 may include at least one modulator for modulating data onto a carrier signal, preparing the signal for transmission over a wireless medium. The at least one modulator may be configured to support one or more techniques such as amplitude modulation (AM), frequency modulation (FM), or digital modulation schemes like phase-shift keying (PSK) or quadrature amplitude modulation (QAM). The transmitter chain 512 may also include at least one power amplifier configured to amplify the modulated signal to an appropriate power level suitable for transmission over the wireless medium. The transmitter chain 512 may also include one or more antennas for transmitting the amplified signal into the air or wireless medium.
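As a simplified illustration of the digital modulation step mentioned for the transmitter chain, the short sketch below maps bit pairs to Gray-coded QPSK symbols; pulse shaping, up-conversion, and power amplification, which a real transmitter chain 512 would also perform, are omitted, and the mapping shown is only one common convention.

```python
import numpy as np

# Gray-coded QPSK: each pair of bits selects one of four unit-energy complex symbols.
QPSK_MAP = {
    (0, 0): (1 + 1j) / np.sqrt(2),
    (0, 1): (-1 + 1j) / np.sqrt(2),
    (1, 1): (-1 - 1j) / np.sqrt(2),
    (1, 0): (1 - 1j) / np.sqrt(2),
}

def qpsk_modulate(bits: np.ndarray) -> np.ndarray:
    """Map an even-length bit sequence to baseband QPSK symbols (no pulse shaping)."""
    assert len(bits) % 2 == 0, "QPSK carries two bits per symbol"
    pairs = bits.reshape(-1, 2)
    return np.array([QPSK_MAP[tuple(int(b) for b in p)] for p in pairs])

symbols = qpsk_modulate(np.array([0, 0, 1, 1, 1, 0, 0, 1]))
```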
[0129] Figure 6 illustrates an example of a processor 600 in accordance with aspects of the present disclosure. The processor 600 may be an example of a processor configured to perform various operations in accordance with examples as described herein. The processor 600 may include a controller 602 configured to perform various operations in accordance with examples as described herein. The processor 600 may optionally include at least one memory 604, which may be, for example, an L1/L2/L3 cache. Additionally, or alternatively, the processor 600 may optionally include one or more arithmetic-logic units (ALUs) 606. One or more of these components may be in electronic communication or otherwise coupled (e.g., operatively, communicatively, functionally, electronically, electrically) via one or more interfaces (e.g., buses). [0130] The processor 600 may be a processor chipset and include a protocol stack (e.g., a software stack) executed by the processor chipset to perform various operations (e.g., receiving, obtaining, retrieving, transmitting, outputting, forwarding, storing, determining, identifying, accessing, writing, reading) in accordance with examples as described herein. The processor chipset may include one or more cores and one or more caches (e.g., memory local to or included in the processor chipset (e.g., the processor 600) or other memory (e.g., random access memory (RAM), read-only memory (ROM), dynamic RAM (DRAM), synchronous dynamic RAM (SDRAM), static RAM (SRAM), ferroelectric RAM (FeRAM), magnetic RAM (MRAM), resistive RAM (RRAM), flash memory, phase change memory (PCM), and others)). [0131] The controller 602 may be configured to manage and coordinate various operations (e.g., signaling, receiving, obtaining, retrieving, transmitting, outputting, forwarding, storing, determining, identifying, accessing, writing, reading) of the processor 600 to cause the processor 600 to support various operations in accordance with examples as described herein. For example, the controller 602 may operate as a control unit of the processor 600, generating control signals that manage the operation of various components of the processor 600. These control signals may include signals for enabling or disabling functional units, selecting data paths, initiating memory access, and coordinating timing of operations. [0132] The controller 602 may be configured to fetch (e.g., obtain, retrieve, receive) instructions from the memory 604 and determine subsequent instruction(s) to be executed to cause the processor 600 to support various operations in accordance with examples as described herein. The controller 602 may be configured to track memory addresses of instructions associated with the memory 604. The controller 602 may be configured to decode instructions to determine the operation to be performed and the operands involved. For example, the controller 602 may be configured to interpret the instruction and determine control signals to be output to other components of the processor 600 to cause the processor 600 to support various operations in accordance with examples as described herein. Additionally, or alternatively, the controller 602 may be configured to manage flow of data within the processor 600. The controller 602 may be configured to control transfer of data between registers, ALUs 606, and other functional units of the processor 600.
[0133] The memory 604 may include one or more caches (e.g., memory local to or included in the processor 600 or other memory, such as RAM, ROM, DRAM, SDRAM, SRAM, MRAM, flash memory, etc.). In some implementations, the memory 604 may reside within or on a processor chipset (e.g., local to the processor 600). In some other implementations, the memory 604 may reside external to the processor chipset (e.g., remote to the processor 600). [0134] The memory 604 may store computer-readable, computer-executable code including instructions that, when executed by the processor 600, cause the processor 600 to perform various functions described herein. The code may be stored in a non-transitory computer-readable medium such as system memory or another type of memory. The controller 602 and/or the processor 600 may be configured to execute computer-readable instructions stored in the memory 604 to cause the processor 600 to perform various functions. For example, the processor 600, the controller 602, and the memory 604 may be coupled with or to one another and may be configured to perform various functions described herein. In some examples, the processor 600 may include multiple processors and the memory 604 may include multiple memories. One or more of the multiple processors may be coupled with one or more of the multiple memories, which may, individually or collectively, be configured to perform various functions herein. [0135] The one or more ALUs 606 may be configured to support various operations in accordance with examples as described herein. In some implementations, the one or more ALUs 606 may reside within or on a processor chipset (e.g., the processor 600). In some other implementations, the one or more ALUs 606 may reside external to the processor chipset (e.g., the processor 600). One or more ALUs 606 may perform one or more computations such as addition, subtraction, multiplication, and division on data. For example, one or more ALUs 606 may receive input operands and an operation code, which determines an operation to be executed. One or more ALUs 606 may be configured with a variety of logical and arithmetic circuits, including adders, subtractors, shifters, and logic gates, to process and manipulate the data according to the operation. Additionally, or alternatively, the one or more ALUs 606 may support logical operations such as AND, OR, exclusive-OR (XOR), not-OR (NOR), and not-AND (NAND), enabling the one or more ALUs 606 to handle conditional operations, comparisons, and bitwise operations. [0136] The processor 600 may support wireless communication in accordance with examples as disclosed herein.
The processor 600 may be configured to or operable to support at least one controller (e.g., the controller 602) coupled with at least one memory (e.g., the memory 604) and configured or operable to cause the processor to select a subset of data samples from a set of data samples associated with an encoder learning model, where the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, update one or more parameters corresponding to at least one normalization layer of the encoder learning model based on providing the subset of data samples as input to the encoder learning model, and transmit, to a device, a message including an output from the encoder learning model, where the output is associated with the subset of data samples. [0137] Additionally, the processor 600 may be configured to or operable to support any one or combination of to update the one or more parameters, the at least one controller is configured or operable to cause the processor to update the one or more parameters based on a periodicity. Additionally, or alternatively, to update the one or more parameters, the at least one controller is configured or operable to cause the processor to receive, from at least one of the device or an additional device, an additional message that indicates for the processor to update the one or more parameters. Additionally, or alternatively, to update the one or more parameters, the at least one controller is configured or operable to cause the processor to detect a change in one or more characteristics associated with CSI. Additionally, or alternatively, to update the one or more parameters, the at least one controller is configured or operable to cause the processor to detect a change in one or more channel conditions associated with a channel between the processor and the device. Additionally, or alternatively, to update the one or more parameters, the at least one controller is configured or operable to cause the processor to determine a quality of output from the encoder learning model fails to satisfy a threshold value. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 53 [0138] Additionally, or alternatively, the at least one controller is configured or operable to cause the processor to obtain the set of data samples based on estimating CSI. Additionally, or alternatively, the at least one controller is configured or operable to cause the processor to receive an additional message that indicates the set of data samples. Additionally, or alternatively, to select the subset of data samples, the at least one controller is configured or operable to cause the processor to determine the subset of data samples satisfy one or more conditions, where the one or more conditions include a distance metric between respective data samples in the subset of data samples being greater than a threshold dissimilarity value and an entropy value of respective data samples in the subset of data samples satisfying a threshold entropy value. Additionally, or alternatively, the at least one controller is configured or operable to cause the processor to determine one or more of the threshold dissimilarity value or the threshold entropy value. Additionally, or alternatively, the at least one controller is configured or operable to cause the processor to receive an additional message that indicates one or more of the threshold dissimilarity value or the threshold entropy value. 
Additionally, or alternatively, to update the one or more parameters, the at least one controller is configured or operable to cause the processor to propagate the subset of data samples from a first layer associated with the encoder learning model to a last layer associated with the encoder learning model and compute one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. Additionally, or alternatively, the one or more parameters include one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. Additionally, or alternatively, the at least one normalization layer is a batch normalization layer. [0139] Figure 7 illustrates an example of an NE 700 in accordance with aspects of the present disclosure. The NE 700 may include a processor 702, a memory 704, a controller 706, and a transceiver 708. The processor 702, the memory 704, the controller 706, or the transceiver 708, or various combinations thereof or various components thereof may be examples of means for performing various aspects of the present disclosure as described herein. These components may be coupled (e.g., operatively, communicatively, functionally, electronically, electrically) via one or more interfaces. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 54 [0140] The processor 702, the memory 704, the controller 706, or the transceiver 708, or various combinations or components thereof may be implemented in hardware (e.g., circuitry). The hardware may include a processor, a DSP, an ASIC, or other programmable logic device, or any combination thereof configured as or otherwise supporting a means for performing the functions described in the present disclosure. [0141] The processor 702 may include an intelligent hardware device (e.g., a general-purpose processor, a DSP, a CPU, an ASIC, an FPGA, or any combination thereof). In some implementations, the processor 702 may be configured to operate the memory 704. In some other implementations, the memory 704 may be integrated into the processor 702. The processor 702 may be configured to execute computer-readable instructions stored in the memory 704 to cause the NE 700 to perform various functions of the present disclosure. [0142] The memory 704 may include volatile or non-volatile memory. The memory 704 may store computer-readable, computer-executable code including instructions when executed by the processor 702 cause the NE 700 to perform various functions described herein. The code may be stored in a non-transitory computer-readable medium such as the memory 704 or another type of memory. Computer-readable media includes both non-transitory computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A non-transitory storage medium may be any available medium that may be accessed by a general-purpose or special-purpose computer. [0143] In some implementations, the processor 702 and the memory 704 coupled with the processor 702 may be configured or operable to cause the NE 700 to perform one or more of the functions described herein (e.g., executing, by the processor 702, instructions stored in the memory 704). 
For example, the processor 702 may support wireless communication at the NE 700 in accordance with examples as disclosed herein. The NE 700 may be configured to or operable to support a means for receiving, from a second device, a message including an output from an encoder learning model, where the output is associated with a subset of data samples of a set of data samples, and the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, and updating one or more parameters corresponding to at least one normalization layer of the decoder learning model based on providing the output as input to the decoder learning model. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 55 [0144] Additionally, the NE 700 may be configured to or operable to support any one or combination of to update the one or more parameters, the method further including updating the one or more parameters based on a periodicity. Additionally, or alternatively, to update the one or more parameters, the NE 700 may be configured to or operable to support receiving, from at least one of the second device or a third device, an additional message that indicates for the first device to update the one or more parameters. Additionally, or alternatively, to update the one or more parameters, the NE 700 may be configured to or operable to support detecting a change in one or more channel conditions associated with a channel between the first device and the second device. Additionally, or alternatively, to update the one or more parameters, the NE 700 may be configured to or operable to support determining a quality of output from the decoder learning model fails to satisfy a threshold value. [0145] Additionally, or alternatively, updating the one or more parameters includes propagating the subset of data samples from a first layer associated with the decoder learning model to a last layer associated with the decoder learning model, and computing one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. Additionally, or alternatively, the output includes a set of information samples corresponding to feature vectors associated with the encoder learning model. Additionally, or alternatively, the one or more parameters include one or more normalization statistics associated with the at least one normalization layer, and where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. Additionally, or alternatively, the at least one normalization layer is a batch normalization layer. [0146] Additionally, or alternatively, the NE 700 may support at least one memory (e.g., the memory 704) and at least one processor (e.g., the processor 702) coupled with the at least one memory and configured or operable to cause the NE to receive, from a second device, a message including an output from an encoder learning model, where the output is associated with a subset of data samples of a set of data samples, and the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model, and update one or more parameters corresponding to at least one normalization layer of the decoder learning model based on providing the output as input to the decoder learning model. Firm Ref. 
No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 56 [0147] Additionally, the NE 700 may be configured to support any one or combination of to update the one or more parameters, the at least one processor is configured or operable to cause the NE to update the one or more parameters based on a periodicity. Additionally, or alternatively, to update the one or more parameters, the at least one processor is configured to receive, from at least one of the second device or a third device, an additional message that indicates for the first device to update the one or more parameters. Additionally, or alternatively, to update the one or more parameters, the at least one processor is configured to detect a change in one or more channel conditions associated with a channel between the first device and the second device. Additionally, or alternatively, to update the one or more parameters, the at least one processor is configured to determine a quality of output from the decoder learning model fails to satisfy a threshold value. [0148] Additionally, or alternatively, to update the one or more parameters, the at least one processor is configured to propagate the subset of data samples from a first layer associated with the decoder learning model to a last layer associated with the decoder learning model and computes one or more normalization statistics associated with the at least one normalization layer, where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. Additionally, or alternatively, the output includes a set of information samples corresponding to feature vectors associated with the encoder learning model. Additionally, or alternatively, the one or more parameters include one or more normalization statistics associated with the at least one normalization layer, and where the one or more normalization statistics include a value of a mean parameter and a value of a variance parameter. Additionally, or alternatively, the at least one normalization layer is a batch normalization layer. [0149] The controller 706 may manage input and output signals for the NE 700. The controller 706 may also manage peripherals not integrated into the NE 700. In some implementations, the controller 706 may utilize an operating system such as iOS®, ANDROID®, WINDOWS®, or other operating systems. In some implementations, the controller 706 may be implemented as part of the processor 702. [0150] In some implementations, the NE 700 may include at least one transceiver 708. In some other implementations, the NE 700 may have more than one transceiver 708. The transceiver 708 may represent a wireless transceiver. The transceiver 708 may include one or more receiver chains 710, one or more transmitter chains 712, or a combination thereof. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 57 [0151] A receiver chain 710 may be configured to receive signals (e.g., control information, data, packets) over a wireless medium. For example, the receiver chain 710 may include one or more antennas to receive a signal over the air or wireless medium. The receiver chain 710 may include at least one amplifier (e.g., a low-noise amplifier (LNA)) configured to amplify the received signal. The receiver chain 710 may include at least one demodulator configured to demodulate the receive signal and obtain the transmitted data by reversing the modulation technique applied during transmission of the signal. 
The receiver chain 710 may include at least one decoder for decoding the demodulated signal to receive the transmitted data. [0152] A transmitter chain 712 may be configured to generate and transmit signals (e.g., control information, data, packets). The transmitter chain 712 may include at least one modulator for modulating data onto a carrier signal, preparing the signal for transmission over a wireless medium. The at least one modulator may be configured to support one or more techniques such as amplitude modulation (AM), frequency modulation (FM), or digital modulation schemes like phase-shift keying (PSK) or quadrature amplitude modulation (QAM). The transmitter chain 712 may also include at least one power amplifier configured to amplify the modulated signal to an appropriate power level suitable for transmission over the wireless medium. The transmitter chain 712 may also include one or more antennas for transmitting the amplified signal into the air or wireless medium. [0153] Figure 8 illustrates a flowchart of a method 800 in accordance with aspects of the present disclosure. The operations of the method may be implemented by a UE as described herein. In some implementations, the UE may execute a set of instructions to control the function elements of the UE to perform the described functions. It should be noted that the method described herein describes a possible implementation, and that the operations and the steps may be rearranged or otherwise modified and that other implementations are possible. [0154] At 802, the method may include selecting a subset of data samples from a set of data samples associated with an encoder learning model, where the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model. The operations of 802 may be performed in accordance with examples as described herein. In some implementations, aspects of the operations of 802 may be performed by a UE as described with reference to Figure 5. [0155] At 804, the method may include updating one or more parameters corresponding to at least one normalization layer of the encoder learning model based on providing the subset of data samples as input to the encoder learning model. The operations of 804 may be performed in accordance with examples as described herein. In some implementations, aspects of the operations of 804 may be performed by a UE as described with reference to Figure 5. [0156] At 806, the method may include transmitting, to a second device, a message including an output from the encoder learning model, where the output is associated with the subset of data samples. The operations of 806 may be performed in accordance with examples as described herein. In some implementations, aspects of the operations of 806 may be performed by a UE as described with reference to Figure 5. [0157] Figure 9 illustrates a flowchart of a method 900 in accordance with aspects of the present disclosure. The operations of the method may be implemented by an NE as described herein. In some implementations, the NE may execute a set of instructions to control the function elements of the NE to perform the described functions. It should be noted that the method described herein describes a possible implementation, and that the operations and the steps may be rearranged or otherwise modified and that other implementations are possible.
[0158] At 902, the method may include receiving, from a second device, a message including an output from an encoder learning model, where the output is associated with a subset of data samples of a set of data samples and the encoder learning model is associated with a two-sided learning model that includes the encoder learning model and a decoder learning model. The operations of 902 may be performed in accordance with examples as described herein. In some implementations, aspects of the operations of 902 may be performed by an NE as described with reference to Figure 7. [0159] At 904, the method may include updating one or more parameters corresponding to at least one normalization layer of the decoder learning model based on providing the output as input to the decoder learning model. The operations of 904 may be performed in accordance with examples as described herein. In some implementations, aspects of the operations of 904 may be performed by an NE as described with reference to Figure 7. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 59 [0160] The description herein is provided to enable a person having ordinary skill in the art to make or use the disclosure. Various modifications to the disclosure will be apparent to a person having ordinary skill in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not limited to the examples and designs described herein but is to be accorded the broadest scope consistent with the principles and novel features disclosed herein. Firm Ref. No. SMM920240007-WO-PCT

Claims

Lenovo Ref. No. SMM92040007-WO-PCT 60 CLAIMS What is claimed is: 1. A first device, comprising: at least one memory; and at least one processor coupled with the at least one memory and operable to cause the first device to: select a subset of data samples from a plurality of data samples associated with an encoder learning model, wherein the encoder learning model is associated with a two-sided learning model that comprises the encoder learning model and a decoder learning model; update one or more parameters corresponding to at least one normalization layer of the encoder learning model based at least in part on providing the subset of data samples as input to the encoder learning model; and transmit, to a second device, a message comprising an output from the encoder learning model, wherein the output is associated with the subset of data samples. 2. The first device of claim 1, wherein to update the one or more parameters, the at least one processor is operable to cause the first device to update the one or more parameters based at least in part on a periodicity. 3. The first device of claim 1, wherein to update the one or more parameters, the at least one processor is operable to cause the first device to receive, from at least one of the second device or a third device, an additional message that indicates for the first device to update the one or more parameters. 4. The first device of claim 1, wherein to update the one or more parameters, the at least one processor is operable to cause the first device to detect a change in one or more characteristics associated with channel state information (CSI). 5. The first device of claim 1, wherein to update the one or more parameters, the at least one processor is further operable to cause the first device to detect a change in one or more channel conditions associated with a channel between the first device and the second device. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 61 6. The first device of claim 1, wherein to update the one or more parameters, the at least one processor is operable to cause the first device to determine a quality of an output from the encoder learning model fails to satisfy a threshold value. 7. The first device of claim 1, wherein the at least one processor is further operable to cause the first device to obtain the plurality of data samples based at least in part on estimating channel state information (CSI). 8. The first device of claim 1, wherein the at least one processor is further operable to cause the first device to receive an additional message that indicates the plurality of data samples. 9. The first device of claim 1, wherein to select the subset of data samples, the at least one processor is operable to cause the first device to determine that the subset of data samples satisfies one or more conditions, and wherein the one or more conditions comprise a distance metric between respective data samples in the subset of data samples being greater than a threshold dissimilarity value and an entropy value of respective data samples in the subset of data samples satisfying a threshold entropy value. 10. The first device of claim 9, wherein the at least one processor is further operable to cause the first device to determine one or more of the threshold dissimilarity value or the threshold entropy value. 11. 
The first device of claim 9, wherein the at least one processor is further operable to cause the first device to receive an additional message that indicates one or more of the threshold dissimilarity value or the threshold entropy value. 12. The first device of claim 1, wherein to update the one or more parameters, the at least one processor is operable to cause the first device to: propagate the subset of data samples from a first layer associated with the encoder learning model to a last layer associated with the encoder learning model; and compute one or more normalization statistics associated with the at least one normalization layer, wherein the one or more normalization statistics comprise a value of a mean parameter and a value of a variance parameter. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 62 13. The first device of claim 1, wherein: the one or more parameters comprise one or more normalization statistics associated with the at least one normalization layer; the one or more normalization statistics comprise a value of a mean parameter and a value of a variance parameter; and the at least one normalization layer is a batch normalization layer. 14. A method performed by a first device, the method comprising: selecting a subset of data samples from a plurality of data samples associated with an encoder learning model, wherein the encoder learning model is associated with a two-sided learning model that comprises the encoder learning model and a decoder learning model; updating one or more parameters corresponding to at least one normalization layer of the encoder learning model based at least in part on providing the subset of data samples as input to the encoder learning model; and transmitting, to a second device, a message comprising an output from the encoder learning model, wherein the output is associated with the subset of data samples. 15. A first device, comprising: at least one memory; and at least one processor coupled with the at least one memory and operable to cause the first device to: receive, from a second device, a message comprising an output from an encoder learning model, wherein: the output is associated with a subset of data samples of a plurality of data samples; and the encoder learning model is associated with a two-sided learning model that comprises the encoder learning model and a decoder learning model; and update one or more parameters corresponding to at least one normalization layer of the decoder learning model based at least in part on providing the output as input to the decoder learning model. Firm Ref. No. SMM920240007-WO-PCT Lenovo Ref. No. SMM92040007-WO-PCT 63 16. The first device of claim 15, wherein to update the one or more parameters, the at least one processor is operable to cause the first device to update the one or more parameters based at least in part on a periodicity. 17. The first device of claim 15, wherein to update the one or more parameters, the at least one processor is operable to cause the first device to receive, from at least one of the second device or a third device, an additional message that indicates for the first device to update the one or more parameters. 18. The first device of claim 15, wherein to update the one or more parameters, the at least one processor is operable to cause the first device to detect a change in one or more channel conditions associated with a channel between the first device and the second device. 19. 
The first device of claim 15, wherein: the one or more parameters comprise one or more normalization statistics associated with the at least one normalization layer; the one or more normalization statistics comprise a value of a mean parameter and a value of a variance parameter; and the at least one normalization layer is a batch normalization layer. 20. A method performed by a first device, the method comprising: receiving, from a second device, a message comprising an output from an encoder learning model, wherein: the output is associated with a subset of data samples of a plurality of data samples; and the encoder learning model is associated with a two-sided learning model that comprises the encoder learning model and a decoder learning model; and updating one or more parameters corresponding to at least one normalization layer of the decoder learning model based at least in part on providing the output as input to the decoder learning model. Firm Ref. No. SMM920240007-WO-PCT
PCT/IB2025/052898 2024-04-11 2025-03-19 Obtaining learning model parameters at a device Pending WO2025172978A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463632786P 2024-04-11 2024-04-11
US63/632,786 2024-04-11

Publications (1)

Publication Number Publication Date
WO2025172978A1 true WO2025172978A1 (en) 2025-08-21

Family

ID=95309986

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2025/052898 Pending WO2025172978A1 (en) 2024-04-11 2025-03-19 Obtaining learning model parameters at a device

Country Status (1)

Country Link
WO (1) WO2025172978A1 (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024073168A1 (en) * 2022-09-28 2024-04-04 Qualcomm Incorporated Adapting machine learning models for domain-shifted data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MIRZA M JEHANZEB ET AL: "The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization", 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 18 June 2022 (2022-06-18), pages 14745 - 14755, XP034193214, [retrieved on 20220927], DOI: 10.1109/CVPR52688.2022.01435 *
VAN BERLO BRAM ET AL: "Insights on Mini-Batch Alignment for WiFi-CSI Data Domain Factor Independent Feature Extraction", 2022 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS WORKSHOPS AND OTHER AFFILIATED EVENTS (PERCOM WORKSHOPS), IEEE, 21 March 2022 (2022-03-21), pages 527 - 532, XP034119612, DOI: 10.1109/PERCOMWORKSHOPS53856.2022.9767360 *

Similar Documents

Publication Publication Date Title
US12334997B2 (en) Encoding and decoding of information for wireless transmission using multi-antenna transceivers
CN113938232A (en) Communication method and communication device
US20220060917A1 (en) Online training and augmentation of neural networks for channel state feedback
Awan et al. Detection for 5G-NOMA: An online adaptive machine learning approach
JP2023520245A (en) Discrete Digital Signal Estimation Method in Noisy and Overloaded Wireless Communication Systems with CSI Errors
CN117813913A (en) Method and system for source coding using neural networks
US20220123966A1 (en) Data-driven probabilistic modeling of wireless channels using conditional variational auto-encoders
CN115118566B (en) Improved transmission of information in wireless communications
EP4369620A1 (en) Communication method and apparatus
WO2024150208A1 (en) Improving accuracy of artificial intelligence/machine learning (ai/ml) based channel state information (csi) feedback
WO2025172978A1 (en) Obtaining learning model parameters at a device
US11700070B2 (en) Hypernetwork Kalman filter for channel estimation and tracking
WO2025169176A1 (en) Obtaining parameters for a learning model with normalization layers
WO2025163627A1 (en) Density ratio estimate parameter reporting
WO2025172980A1 (en) Obtaining parameters for a learning model
WO2025134100A1 (en) Neural network decoder adaptation in a wireless communications system involving data augmentation
WO2025163525A1 (en) Machine learning at a wireless device
US20250253966A1 (en) Generating channel feedback information for a wireless communications system
US20250373361A1 (en) Techniques for distributed autoencoding in a distributed wireless communications system
Kim AI-Enabled Physical Layer
WO2025172979A1 (en) Learning model selection at a device
US20250254540A1 (en) Indicating model parameters for continual learning of communications network models
Mei et al. Learning Aided Closed-Loop Feedback: A Concurrent Dual Channel Information Feedback Mechanism for Wi-Fi
US20250373287A1 (en) Multiple-input and multiple-output channel feedback with dictionary learning
US20250047331A1 (en) Method and apparatus for csi prediction at the user terminal and performance monitoring at the network in wireless communication system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25717339

Country of ref document: EP

Kind code of ref document: A1