WO2025212470A1 - Methods for offline policy validation in reinforcement learning - Google Patents
Methods for offline policy validation in reinforcement learning
- Publication number
- WO2025212470A1 (PCT/US2025/022230)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- configuration
- action
- wtru
- indication
- receive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/02—Arrangements for optimising operational condition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/10—Scheduling measurement reports ; Arrangements for measurement reports
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/06—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
- H04B7/0686—Hybrid systems, i.e. switching and simultaneous transmission
- H04B7/0695—Hybrid systems, i.e. switching and simultaneous transmission using beam selection
- H04B7/06952—Selecting one or more beams from a plurality of beams, e.g. beam training, management or sweeping
Definitions
- Reinforcement learning (RL), along with supervised and unsupervised learning, forms one of the three key learning paradigms in the field of artificial intelligence/machine learning (AI/ML). While the three methods seek to learn a model from data through training, they are fundamentally different in the way the learning process is carried out. For instance, supervised learning algorithms learn patterns and relationships between the input and output pairs, and the trained algorithm is then used to predict outcomes based on new input data. Unsupervised learning algorithms, however, receive inputs with no specified outputs during the training process, with the aim of finding hidden patterns and relationships within the data using statistical means. Different from supervised and unsupervised learning, RL has a predetermined and well-defined end goal in the form of a desired result, which can be achieved through rewarding the desired behaviors.
- RL takes an exploratory approach, with a reward-and-punishment paradigm as the data is processed, wherein the explorations are continuously validated and improved to increase the probability of reaching the end goal.
- RL is expected to be one of the major components of automation of wireless networks and it has already found a wide variety of applications in the wireless domain.
- a wireless transmit/receive unit may include a processor and a memory.
- the WTRU may be configured to receive a first indication of a first configuration to be implemented for performing a first action related to communications on a network.
- the WTRU may be configured to receive, from the network, a second indication of a second configuration to be implemented for performing a second action related to communications on the network.
- the second action may be an anticipated or hypothetical action associated with a network-side (NW-side) reinforcement learning (RL) model.
- the WTRU may perform the first action related to the communications on the network using the first configuration and determine a first outcome based on a first performance metric associated with the first action.
- the WTRU may perform a validation of the NW-side RL model by being configured to determine a second outcome based on a second performance metric associated with performance of the second action without actually implementing the second configuration to perform the second action related to the communications on the network, and determine a quality of the second action compared to a quality of the first action based on the first outcome and the second outcome.
- the WTRU may report the determined quality of the second action for validation of the anticipated or hypothetical action associated with the NW-side RL model.
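The validation flow described in the preceding items can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `Outcome` and `validate_hypothetical_action`, the beam identifiers, and the choice of average RSRP in dBm as the single performance metric are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Outcome:
    """Outcome of an action, summarised by one performance metric (hypothetical type)."""
    config_id: str
    metric: float  # assumed metric: average RSRP in dBm


def validate_hypothetical_action(applied_id, applied_samples,
                                 hypothetical_id, hypothetical_samples):
    """Compare a performed (first) action with a hypothetical (second) action.

    The first configuration is the one actually applied. The second is only
    measured and never applied. Returns a small report dictionary.
    """
    first = Outcome(applied_id, mean(applied_samples))
    second = Outcome(hypothetical_id, mean(hypothetical_samples))
    delta = second.metric - first.metric
    return {
        "reference": first.config_id,
        "hypothetical": second.config_id,
        "delta": delta,
        "hypothetical_is_better": delta > 0,
    }


report = validate_hypothetical_action(
    "beam-3", [-92.0, -91.0, -93.0],  # measured while using the first DL beam
    "beam-7", [-88.0, -89.0, -87.0],  # measured on the second DL beam, not applied
)
print(report)
```

The report only carries the comparison result, which mirrors the idea that the WTRU feeds back the quality of the hypothetical action rather than switching to it.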
- the first configuration and the second configuration may each include a different downlink (DL) beam configuration, different power saving modes, different protocol parameters, and/or different timers.
- DL downlink
- the first configuration may be a first downlink (DL) beam configuration and the second configuration may be a second DL beam configuration.
- the first action may include the WTRU switching to the first DL beam to receive data transmissions.
- the second action may include the WTRU switching to the second DL beam provided by the NW-side RL model to receive the data transmissions.
- the second performance metric may include a measurement on the second DL beam.
- the first configuration may be a first downlink control information (DCI) configuration indicating a first modulation and coding scheme (MCS).
- the second configuration may be a second DCI configuration indicating a second MCS.
- the first action may include the WTRU applying the first DCI configuration to receive DL transmissions.
- the second action may include the WTRU switching to apply the second DCI configuration to receive the DL transmissions.
- the second performance metric may include a measurement on the second DCI configuration.
- the first configuration may be a first sub-band configuration.
- the second configuration may be a second sub-band configuration.
- the first action may include the WTRU applying the first sub-band configuration to receive DL transmissions.
- the second action may include the WTRU switching to apply the second sub-band configuration to receive the DL transmissions.
- the second performance metric may include a measurement on the second sub-band configuration.
- the first configuration may be a first power level configuration for a first DL transmission.
- the second configuration may be a second power level configuration for a second DL transmission.
- the first action may include the WTRU applying the first power level configuration to receive DL transmissions.
- the second action may include the WTRU switching to apply the second power level configuration to receive the DL transmissions.
- the second performance metric may include a measurement on the second power level configuration.
- the first performance metric and the second performance metric may be determined based on one or more preconfigured criteria.
- Each of the one or more preconfigured criteria may include a hybrid automatic repeat request (HARQ) acknowledgement (ACK), an average reference signal received power (RSRP), a signal-to-interference-plus-noise ratio (SINR), and/or a channel quality indicator (CQI).
- HARQ hybrid automatic repeat request
- RSRP average reference signal received power
- SINR signal-to-interference-plus-noise ratio
- CQI channel quality indicator
- Each of the one or more preconfigured criteria may include an average number of beam switches.
- FIG. 1A is a system diagram illustrating an example communications system in which one or more disclosed embodiments may be implemented.
- FIG. 1B is a system diagram illustrating an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
- WTRU wireless transmit/receive unit
- FIG. 1C is a system diagram illustrating an example radio access network (RAN) and an example core network (CN) that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
- RAN radio access network
- CN core network
- FIG. 1D is a system diagram illustrating a further example RAN and a further example CN that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
- FIG. 2 illustrates an example of a reinforcement learning framework.
- FIG. 3 illustrates a network-sided (NW-sided) reinforcement learning (RL) model used for beam selection with the aim of reducing latency and/or measurement overhead.
- NW-sided network-sided
- RL Reinforcement learning
- FIG. 4 illustrates a WTRU-sided RL model used for PMI selection with the aim of reducing the average overhead while achieving a target performance.
- FIG. 1A is a diagram illustrating an example communications system 100 in which one or more disclosed embodiments may be implemented.
- the communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users.
- the communications system 100 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth.
- the communications system 100 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), zero-tail unique-word DFT-Spread OFDM (ZT UW DTS-s OFDM), unique word OFDM (UW-OFDM), resource block-filtered OFDM, filter bank multicarrier (FBMC), and the like.
- CDMA code division multiple access
- TDMA time division multiple access
- FDMA frequency division multiple access
- OFDMA orthogonal FDMA
- SC-FDMA single-carrier FDMA
- ZT UW DTS-s OFDM zero-tail unique-word DFT-Spread OFDM
- UW-OFDM unique word OFDM
- FBMC filter bank multicarrier
- the communications system 100 may include wireless transmit/receive units (WTRUs) 102a, 102b, 102c, 102d, a RAN 104/113, a CN 106/115, a public switched telephone network (PSTN) 108, the Internet 110, and other networks 112, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements.
- WTRUs 102a, 102b, 102c, 102d may be any type of device configured to operate and/or communicate in a wireless environment.
- any of the WTRUs 102a, 102b, 102c and 102d may be interchangeably referred to as a WTRU. Further, any description herein that is described with reference to a UE may be equally applicable to a WTRU (or vice versa). For example, a WTRU may be configured to perform any of the processes or procedures described herein as being performed by a UE (or vice versa).
- the communications system 100 may also include a base station 114a and/or a base station 114b.
- Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the CN 106/115, the Internet 110, and/or the other networks 112.
- the base station 114a may be part of the RAN 104/113, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc.
- BSC base station controller
- RNC radio network controller
- the base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals on one or more carrier frequencies, which may be referred to as a cell (not shown). These frequencies may be in licensed spectrum, unlicensed spectrum, or a combination of licensed and unlicensed spectrum.
- a cell may provide coverage for a wireless service to a specific geographical area that may be relatively fixed or that may change over time. The cell may further be divided into cell sectors.
- the cell associated with the base station 114a may be divided into three sectors.
- the base station 114a may include three transceivers, i.e., one for each sector of the cell.
- the base station 114a may employ multiple-input multiple output (MIMO) technology and may utilize multiple transceivers for each sector of the cell.
- MIMO multiple-input multiple output
- beamforming may be used to transmit and/or receive signals in desired spatial directions.
- the communications system 100 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like.
- the base station 114a in the RAN 104/113 and the WTRUs 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA).
- WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+).
- HSPA may include High-Speed Downlink (DL) Packet Access (HSDPA) and/or High-Speed UL Packet Access (HSUPA).
- the base station 114a and the WTRUs 102a, 102b, 102c may implement radio technologies such as IEEE 802.11 (i.e., Wireless Fidelity (WiFi)), IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
- IEEE 802.11 i.e., Wireless Fidelity (WiFi)
- IEEE 802.16 i.e., Worldwide Interoperability for Microwave Access (WiMAX)
- CDMA2000, CDMA2000 1X, CDMA2000 EV-DO Code Division Multiple Access 2000
- IS-95 Interim Standard 95
- IS-856 Interim Standard 856
- GSM Global System for Mobile communications
- the base station 114b in FIG. 1A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, an industrial facility, an air corridor (e.g., for use by drones), a roadway, and the like.
- the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN).
- WLAN wireless local area network
- the RAN 104/113 may be in communication with the CN 106/115, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 102a, 102b, 102c, 102d.
- the data may have varying quality of service (QoS) requirements, such as differing throughput requirements, latency requirements, error tolerance requirements, reliability requirements, data throughput requirements, mobility requirements, and the like.
- QoS quality of service
- the CN 106/115 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication.
- the CN 106/115 may also serve as a gateway for the WTRUs 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or the other networks 112.
- the PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS).
- POTS plain old telephone service
- the Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and/or the internet protocol (IP) in the TCP/IP internet protocol suite.
- the networks 112 may include wired and/or wireless communications networks owned and/or operated by other service providers.
- the networks 112 may include another CN connected to one or more RANs, which may employ the same RAT as the RAN 104/113 or a different RAT.
- Some or all of the WTRUs 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities (e.g., the WTRUs 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wireless networks over different wireless links).
- the WTRU 102c shown in FIG. 1A may be configured to communicate with the base station 114a, which may employ a cellular-based radio technology, and with the base station 114b, which may employ an IEEE 802 radio technology.
- FIG. 1B is a system diagram illustrating an example WTRU 102.
- the WTRU 102 may include a processor 118, a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, non-removable memory 130, removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and/or other peripherals 138, among others.
- GPS global positioning system
- the processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like.
- the processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment.
- the processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 1B depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.
- the WTRU 102 may include any number of transmit/receive elements 122. More specifically, the WTRU 102 may employ MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 116.
- the transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122.
- the WTRU 102 may have multi-mode capabilities.
- the transceiver 120 may include multiple transceivers for enabling the WTRU 102 to communicate via multiple RATs, such as NR and IEEE 802.11, for example.
- the non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device.
- the removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like.
- SIM subscriber identity module
- SD secure digital
- the processor 118 may access information from, and store data in, memory that is not physically located on the WTRU 102, such as on a server or a home computer (not shown).
- the processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the WTRU 102.
- the power source 134 may be any suitable device for powering the WTRU 102.
- the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
- the processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
- the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, a Virtual Reality and/or Augmented Reality (VR/AR) device, an activity tracker, and the like.
- FM frequency modulated
- the peripherals 138 may include one or more sensors. The sensors may be one or more of a gyroscope, an accelerometer, a hall effect sensor, a magnetometer, an orientation sensor, a proximity sensor, a temperature sensor, a time sensor, a geolocation sensor, an altimeter, a light sensor, a touch sensor, a barometer, a gesture sensor, a biometric sensor, and/or a humidity sensor.
- the WTRU 102 may include a full duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for both the UL (e.g., for transmission) and the downlink (e.g., for reception)) may be concurrent and/or simultaneous.
- the full duplex radio may include an interference management unit 139 to reduce and/or substantially eliminate self-interference via either hardware (e.g., a choke) or signal processing via a processor (e.g., a separate processor (not shown) or via processor 118).
- the WTRU 102 may include a half-duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for either the UL (e.g., for transmission) or the downlink (e.g., for reception)).
- the WTRU may perform the action (e.g., a reference action), or apply the configuration, associated with the first indication.
- the WTRU may perform a validation of the NW-side RL model by determining a second outcome (e.g., a hypothetical or anticipated outcome) of the second action associated with the second indication without actually performing the second action.
- the validation of the NW-side RL model may be performed without applying the configuration associated with the second indication.
- anticipated or hypothetical action(s) may include hypothetical application of one or more DL beam(s) indices selected by the RL model and hypothetical application of one or more sub-band indices and/or allocated power selected by the RL model.
- the WTRU may determine whether an indication is a first indication or a second indication based on explicit information or implicit information (e.g., a separate field in DCI, MAC CE, order of indication, different resources/channels/messages, etc.).
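One way to picture the explicit/implicit distinction is the sketch below. It is illustrative only: the dictionary standing in for a decoded DCI or MAC CE, the field name `hypothetical_flag`, and the ordering rule (the first indication in a burst is the one to apply) are assumptions made for the example and are not taken from the source.

```python
from enum import Enum


class IndicationType(Enum):
    FIRST = "first"    # configuration to apply (reference action)
    SECOND = "second"  # configuration to evaluate only (hypothetical action)


def classify_indication(message, position_in_burst=None):
    """Classify an indication using explicit, then implicit, information.

    `message` is a plain dict standing in for a decoded DCI or MAC CE.
    The key "hypothetical_flag" is a made-up explicit field.
    """
    # Explicit information: a dedicated field in the message.
    if "hypothetical_flag" in message:
        if message["hypothetical_flag"]:
            return IndicationType.SECOND
        return IndicationType.FIRST
    # Implicit information: order of indication (assumed rule).
    if position_in_burst is not None:
        if position_in_burst == 0:
            return IndicationType.FIRST
        return IndicationType.SECOND
    raise ValueError("indication type cannot be determined")
```

For instance, `classify_indication({"hypothetical_flag": 1})` yields `IndicationType.SECOND`, while `classify_indication({}, position_in_burst=0)` falls back to the ordering rule and yields `IndicationType.FIRST`.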
- the WTRU may determine the quality of the second action (e.g., hypothetical or anticipated action), based on one or more of the following.
- the WTRU may determine the quality of hypothetical action(s) based on measuring the performance metric (e.g., function of use-case) assuming hypothetical application of the action.
- the performance metric may include an average reference signal received power (RSRP), signal-to-interference-plus-noise ratio (SINR) over a configured period, an average number of beam switches in a configured period, a number of beam failures over a configured period (with or without successful recovery), and/or an average throughput over a configured period.
- RSRP average reference signal received power
- SINR signal-to-interference-plus-noise ratio
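The per-period performance metrics listed above could be computed along these lines. This is a hedged sketch: the sample layout `(beam_index, rsrp_dbm, beam_failed)` and the function name `period_metrics` are assumptions, and averaging RSRP directly in dBm is a simplification chosen to keep the example short.

```python
from statistics import mean


def period_metrics(samples):
    """Summarise one configured period.

    `samples` is a list of (beam_index, rsrp_dbm, beam_failed) tuples, one per
    measurement occasion. This layout is an assumption for the example.
    """
    beams = [s[0] for s in samples]
    # A beam switch is counted whenever consecutive occasions use different beams.
    switches = sum(1 for prev, cur in zip(beams, beams[1:]) if prev != cur)
    return {
        "avg_rsrp_dbm": mean(s[1] for s in samples),  # simplification: mean of dBm values
        "beam_switches": switches,
        "beam_failures": sum(1 for s in samples if s[2]),
    }


period = [(1, -90.0, False), (1, -94.0, False), (2, -88.0, True), (2, -92.0, False)]
print(period_metrics(period))
```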
- the WTRU may determine the quality of hypothetical action(s) based on a comparison of the performance metric associated with the hypothetical action(s) with a preconfigured performance criterion and/or condition.
- the determination of the quality of hypothetical actions may include measuring a quantity and comparing it with a threshold (e.g., SINR/signal-to-noise ratio (SNR)/RSRP on one of the sub-bands), measuring an average number of beam switches and comparing it with a configured metric threshold, measuring an average number of beam failures and comparing it with a configured metric threshold, measuring an average RSRP over a period and comparing it with a threshold, and/or measuring a reward over a period and comparing it with the configured expected reward.
- SNR signal-to-noise ratio
- the WTRU may determine the quality of hypothetical action(s) based on an improvement/degradation of performance metric or quality of hypothetical action(s) relative to the performance metric or quality of a reference action(s). For example, the WTRU may compare the performance metrics associated with the hypothetical action(s) and reference action(s).
- the WTRU may report the determined quality of the second action(s) (e.g., hypothetical or anticipated actions), such as for NW-sided RL model validation. For example, reporting may be triggered when preconfigured conditions are satisfied, such as when the determined quality of the hypothetical action(s) exceeds or is less than the preconfigured threshold, or when a performance metric associated with the hypothetical action(s) is better or worse, by an offset, than a performance metric associated with the reference action(s) over n configured periods. For example, reporting may be transmitted periodically (e.g., every n configured periods over which the performance metrics may be calculated). For example, the report may include one or more indication formats. The one or more indication formats may include Boolean and/or multiple quantized levels for quality (e.g., represented in multiple levels depending on the difference between the measured metric and the configured metric threshold).
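The reporting trigger and the quantized indication format can be illustrated with the sketch below. The function names, the level numbering, and the decision to model only the "better by an offset" case are assumptions for illustration. The source also allows triggering on degradation, which would use the mirrored comparison.

```python
def quantize_quality(measured, threshold, step, levels):
    """Map the gap between a measured metric and its threshold to a level.

    Level 0 means "at or below the threshold". Higher levels mean a larger
    margin, capped at `levels - 1`. The level scheme is hypothetical.
    """
    if measured <= threshold:
        return 0
    return min(levels - 1, 1 + int((measured - threshold) // step))


def should_report(hyp_metrics, ref_metrics, offset, n):
    """True if the hypothetical metric beat the reference metric by at least
    `offset` in each of the last `n` configured periods."""
    if len(hyp_metrics) < n or len(ref_metrics) < n:
        return False
    pairs = zip(hyp_metrics[-n:], ref_metrics[-n:])
    return all(h - r >= offset for h, r in pairs)
```

A Boolean format is the degenerate case with two levels: `quantize_quality(measured, threshold, step, 2)` returns 1 when the metric exceeds the threshold and 0 otherwise.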
- the proposed embodiment may offer a viable way to evaluate and/or validate offline-trained NW-sided RL models. This may be beneficial when there is a discrepancy between the state and/or action space used in the training environment and the state and/or action space in the real environment, or when some of the high-reward regions of the state space are missing from the training.
- WTRU-assisted NW-sided RL offline policy evaluation is introduced herein.
- Configuration of WTRU-based assistance of NW-sided RL model validation is introduced herein.
- the WTRU may be configured to receive a first indication and a second indication.
- the WTRU may perform a first action based on the first indication and a second action based on the second indication.
- the WTRU may perform a reference action upon receiving the first indication.
- the reference action may be associated with one or more of applying the configuration associated with the first indication, performing a measurement and/or reception using a configuration at least in part determined by the first indication, triggering and/or performing a transmission using a configuration at least in part determined by the first indication, suspending ongoing and/or future transmission, entering power saving mode, setting/resetting/modifying one or more of the protocol parameters, timers, and/or counters using a configuration at least in part determined by the first indication, and the like.
- the WTRU may not act directly on the second indication but may be configured to monitor and/or measure the impacts assuming that the second action (e.g., the hypothetical action) associated with the second indication were performed.
- the second action may be associated with one or more of applying the configuration associated with the second indication, performing a measurement and/or reception using a configuration at least in part determined by the second indication, triggering and/or performing a transmission using a configuration at least in part determined by the second indication, suspending ongoing and/or future transmission, entering power saving mode, setting/resetting/modifying one or more of the protocol parameters, timers, and/or counters using a configuration at least in part determined by the second indication, and the like.
- the first indication and the second indication may each be associated with the same logical identity.
- Configuration of performance and quality metric is discussed herein.
- the WTRU may be configured to measure the performance metric associated with hypothetical actions based on the second indication. Such a performance metric may be determined based on one or more preconfigured criteria, rules, and/or conditions.
- the performance metric may be modeled as a reward metric. For example, hypothetical actions that lead to better performance (e.g., higher throughput, lower latency, higher power saving, etc.) may be assigned higher rewards than the actions that lead to degraded performance (e.g., lower throughput, higher latency, high power consumption etc.).
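As a rough illustration of modeling the performance metric as a reward, the sketch below combines throughput, latency, and power into one number. The linear form, the units, and the default weights are purely hypothetical. The source only states that better-performing actions may receive higher rewards.

```python
def reward(throughput_mbps, latency_ms, power_mw,
           w_throughput=1.0, w_latency=0.5, w_power=0.01):
    """Toy reward: higher throughput raises it, latency and power lower it.

    The weights are illustrative assumptions, not configured values.
    """
    return (w_throughput * throughput_mbps
            - w_latency * latency_ms
            - w_power * power_mw)


# A hypothetical action with better throughput, latency, and power use
# scores higher than one that is worse on all three.
better = reward(throughput_mbps=100.0, latency_ms=10.0, power_mw=200.0)
worse = reward(throughput_mbps=50.0, latency_ms=20.0, power_mw=400.0)
print(better > worse)
```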
- the WTRU may be configured to determine and/or report a performance metric within a preconfigured range. For example, the WTRU may be configured with a minimum value, a maximum value, and quantized values in between with preconfigured step size.
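Reporting within a preconfigured range with a fixed step size amounts to clamping and snapping the measured value, as in the sketch below. The function name and the example range (RSRP between -120 dBm and -60 dBm in 5 dB steps) are assumptions for illustration.

```python
def quantize_to_range(value, minimum, maximum, step):
    """Clamp `value` to [minimum, maximum] and snap it to the nearest step,
    counting steps upward from `minimum`."""
    if step <= 0 or maximum < minimum:
        raise ValueError("invalid range configuration")
    clamped = max(minimum, min(maximum, value))
    index = round((clamped - minimum) / step)
    # Guard against overshooting when the range is not a multiple of the step.
    return min(maximum, minimum + index * step)


print(quantize_to_range(-97.3, -120.0, -60.0, 5.0))
```

Values outside the range are reported as the nearest bound, so a measurement of -30 dBm in this example would be reported as -60 dBm.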
- the WTRU may be configured to measure the quality of hypothetical actions associated with the second indication.
- the quality metric may be derived based on a comparison of the performance metric associated with the hypothetical action against a preconfigured threshold. For example, the WTRU may be configured to monitor and/or determine if the hypothetical actions based on the second indication would lead to at least one performance metric better than the threshold.
- the quality metric may be derived based on a comparison of the performance metric of the hypothetical actions against the performance metric of the reference action. For example, the WTRU may be configured to monitor and/or determine if the hypothetical actions based on the second indication would be better than the reference actions associated with the first indication.
- the WTRU may be configured to determine and/or report a quality metric within a preconfigured range.
- the WTRU may be configured with a minimum value, a maximum value, and quantized values in between with preconfigured step size.
- the WTRU may be configured with multiple quantized levels for quality (e.g., represented in multiple levels depending on the difference between the measured metric and the configured metric threshold).
- the WTRU may be configured to report a Boolean value as quality metric.
- the Boolean value may indicate whether the hypothetical action is better or worse than the reference action. Alternatively, the Boolean value may indicate whether the hypothetical action is better or worse than the preconfigured threshold.
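The reporting formats described above (a value quantized within a preconfigured range, multiple quality levels, and a Boolean) can be illustrated with a minimal sketch. The function names, the level mapping, and the choice that a higher metric is better are assumptions made for illustration; they are not taken from the disclosure.

```python
def quantize_metric(value, min_value, max_value, step):
    """Clamp value to [min_value, max_value] and snap it to the nearest step."""
    clamped = max(min_value, min(max_value, value))
    steps = round((clamped - min_value) / step)
    return min(max_value, min_value + steps * step)


def quality_level(metric, threshold, level_width, num_levels):
    """Map the margin (metric - threshold) to a level in 0..num_levels-1.

    Level 0 means the metric is at or below the threshold (assumed convention).
    """
    if metric <= threshold:
        return 0
    level = int((metric - threshold) // level_width) + 1
    return min(level, num_levels - 1)


def boolean_quality(hypothetical_metric, reference_metric):
    """True if the hypothetical action outperforms the reference action."""
    return hypothetical_metric > reference_metric
```

For instance, with an assumed RSRP range of -140 dBm to -44 dBm and a 1 dB step, a measured value of -97.3 dBm would be reported as -97.0 dBm.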
- the evaluations may include measuring a performance metric and comparing it to a threshold (e.g., SINR/SNR/RSRP on one of the sub-bands, or RSRP from another cell).
- the evaluation may include computing a performance metric and comparing it to a threshold (e.g., computing the block error rate for the current SINR and MCS).
- the evaluations may include measuring an average RSRP, SINR over a configured period and comparing it with a configured metric threshold.
- the evaluations may include measuring an average number of beam switches and comparing it with a configured metric threshold.
- the evaluations may include measuring an average number of beam failures and comparing it with a configured metric threshold.
- the evaluations may include measuring an average RSRP over a period and comparing it with a threshold.
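The evaluations listed above share one pattern: average a measurement over a configured period and compare the average with a configured threshold. The sketch below assumes a sliding window of the most recent samples; the class name and the `higher_is_better` flag (True for RSRP or SINR, False for counts such as beam switches or beam failures) are hypothetical.

```python
from collections import deque


class WindowedEvaluator:
    """Averages the last window_size samples and compares against a threshold."""

    def __init__(self, window_size, threshold, higher_is_better=True):
        self.samples = deque(maxlen=window_size)  # oldest sample is dropped first
        self.threshold = threshold
        self.higher_is_better = higher_is_better

    def add(self, sample):
        self.samples.append(sample)

    def average(self):
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)

    def passes(self):
        avg = self.average()
        if avg is None:
            return False
        if self.higher_is_better:
            return avg >= self.threshold
        return avg <= self.threshold
```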
- the certain configured threshold or criteria may include when the performance of the hypothetical action exceeds or is lower than the performance of the reference action.
- the hypothetical action represents the output of the RL-model to be validated.
- the second indication may imply that the NW may provide feedback on the quality of the hypothetical action.
- the reference action may represent a first precoding matrix indicator (PMI) selection by the WTRU.
- the first PMI may be used for beamforming in the next transmission.
- the hypothetical action may represent a second PMI selection by the WTRU.
- the second selected PMI may be evaluated (e.g., only evaluated) by the NW.
- the reference action may represent a first MCS selected by the WTRU.
- the selected modulation and coding scheme (MCS) may be used in data transmission.
- the hypothetical action may represent a second PMI selected by the WTRU.
- the selected PMI may be evaluated by the NW.
- the WTRU may receive a first transmission and a second transmission.
- the first transmission may be associated with the network response to the WTRU first indication.
- the second transmission may be associated with the network response to the WTRU second indication.
- the first transmission may represent data transmission using the first selected/indicated PMI, for example, associated with the reference action.
- the second transmission may represent reference signal (RS) transmission using the second selected PMI (e.g., only RS transmission using the second selected PMI), for example, associated with the hypothetical action and/or the second transmission may represent feedback information about one or more performance aspect related to the second PMI.
- RS reference signal
- the WTRU may be configured to determine the quality of second transmission, for example, the transmission associated with the hypothetical action generated by the WTRU-sided RL model.
- the WTRU may be configured to measure a performance metric associated with the hypothetical action.
- the WTRU may measure HARQ ACKs and/or NACKs rate or block error rate (BLER) associated with the hypothetical action.
- BLER block error rate
- the WTRU may perform one or more measurements associated with the hypothetical action.
- when the hypothetical action represents a PMI selected by the RL model and is applied on RS transmission by the NW, the measurements may include SINR and/or RSRP and/or CQI.
- the performance metric and/or measurements may be collected over a configured period.
- the WTRU may be configured to report the quality information based on one or more of the following triggers.
- the WTRU may be configured to report the quality information based on a time-event trigger.
- the WTRU may be configured with time instances or specific slots to determine and report the quality information of the hypothetical actions.
- the WTRU may be configured to report the quality information of the hypothetical actions every N configured periods over which the performance metrics are calculated.
- the WTRU may be configured to report the quality information based on a measurement trigger.
- the WTRU may report the quality information associated with the hypothetical actions based on one or more measurements achieving certain configured threshold or criteria.
- the one or more measurements achieving certain configured threshold or criteria may include threshold or criteria when the performance of the hypothetical actions exceeds or is lower than a preconfigured threshold.
- the one or more measurements achieving certain configured threshold or criteria may include threshold or criteria when the performance of the hypothetical action exceeds or is lower than the performance of the reference action.
- the WTRU may be configured to report the quality information based on a feedback resource selection, or feedback resource, or change thereof.
- the WTRU may be configured to report the quality information based on a performance of an associated function. For example, the WTRU may determine the rate of HARQ-ACK associated with the hypothetical actions. If the performance drops below a threshold, the WTRU may report the quality of the hypothetical action.
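The reporting triggers above (time-event, measurement-based, and associated-function performance) can be sketched as a single check that returns which triggers fire. All field names are hypothetical, and the directions of the comparisons (report when the hypothetical metric exceeds the threshold or the reference, and when the HARQ-ACK rate drops below its threshold) are one assumed configuration among those the text allows.

```python
from dataclasses import dataclass


@dataclass
class TriggerConfig:
    report_period_slots: int        # time-event trigger: report every N slots
    metric_threshold: float         # measurement trigger: absolute threshold
    harq_ack_rate_threshold: float  # associated-function trigger


def report_triggers(cfg, slot, hyp_metric, ref_metric, harq_ack_rate):
    """Return the names of the triggers satisfied in this slot."""
    triggers = []
    if slot % cfg.report_period_slots == 0:
        triggers.append("time")
    if hyp_metric > cfg.metric_threshold:
        triggers.append("above_threshold")
    if hyp_metric > ref_metric:
        triggers.append("better_than_reference")
    if harq_ack_rate < cfg.harq_ack_rate_threshold:
        triggers.append("harq_ack_rate_low")
    return triggers
```

A report would be sent when the returned list is non-empty; which triggers are enabled would depend on the received configuration.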
- a WTRU may receive a configuration for validating the NW-sided RL model.
- the configuration may include a configuration of a validation mode. In the validation mode, actions selected/determined by the RL model may be indicated but not performed; such actions are referred to as hypothetical actions.
- the configuration may include criteria(s)/rules/conditions for RL model validation. The criteria may be associated with determined quality of actions evaluated against one or more preconfigured performance threshold(s). For example, the WTRU may determine a metric associated with hypothetical action. The WTRU may determine expected reward based on the metric and evaluate and/or report the quality based on absolute or relative threshold.
- the configuration may include a reporting configuration for RL model validation (e.g., resources for reporting, periodicity, model validation reporting format, etc.)
- the WTRU may receive a first indication and a second indication.
- the WTRU may perform the action (e.g., reference action) and/or apply the configuration associated with first indication.
- the WTRU may perform validation of the RL model by determining the hypothetical outcome of the action associated with the second indication without actually performing the action (e.g., hypothetical action) and/or without applying the configuration associated with the second indication.
- Examples of hypothetical action(s) may include a hypothetical application of one or more DL beam(s) indices selected by the RL model, a hypothetical application of one or more sub-band indices, and/or allocated power selected by the RL model.
- the WTRU may determine whether an indication is a first indication or a second indication based on explicit information or implicit information (e.g., a separate field in DCI, MAC CE, order of indication, different resources/channels/messages, etc.).
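The handling of the two indications can be sketched as follows: a first indication is applied as the reference action, while a second indication is only recorded for evaluation and never applied. The `is_hypothetical` flag stands in for the explicit information (e.g., a separate field) and, like the other names here, is an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Indication:
    beam_index: int
    is_hypothetical: bool  # assumed explicit flag distinguishing the second indication


@dataclass
class Wtru:
    active_beam: int = 0
    hypothetical_beams: list = field(default_factory=list)

    def handle(self, indication):
        if indication.is_hypothetical:
            # Second indication: keep for quality evaluation, do not apply.
            self.hypothetical_beams.append(indication.beam_index)
        else:
            # First indication: perform the reference action.
            self.active_beam = indication.beam_index
```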
- the WTRU may determine the quality of hypothetical action(s), based on one or more of the following steps.
- the one or more steps may include measuring the performance metric (e.g., function of use-case) assuming hypothetical application of the action (e.g., an average RSRP or SINR over a configured period, an average number of beam switches in a configured period, a number of beam failures over a configured period with or without successful recovery, and/or an average throughput over a configured period).
- the WTRU may determine the quality of the hypothetical action(s) based on a comparison of performance metric associated with hypothetical action(s) with a preconfigured performance criteria and/or condition.
- the WTRU may determine the quality of the hypothetical actions based on improvement and/or degradation of the performance metric or quality of the hypothetical action(s) relative to the performance metric or quality of the reference action(s). For example, the WTRU may determine the quality of the hypothetical actions based on comparing the performance metrics associated with the hypothetical action(s) and reference action(s).
- the WTRU may report the determined quality of hypothetical action(s) (e.g., for NW-sided RL model validation). For example, reporting may be triggered when preconfigured conditions are satisfied. Reporting may be triggered if the determined quality of the hypothetical action(s) exceeds or is less than the preconfigured threshold, or if a performance metric associated with the hypothetical action(s) is offset better/worse than a performance metric associated with the reference action(s) over n configured periods. For example, reporting may be transmitted periodically (e.g., every n configured periods over which the performance metrics are calculated). For example, the report may include one or more indication formats: Boolean, or multiple quantized levels for quality (e.g., represented in multiple levels depending on the difference between the measured metric and the configured metric threshold).
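The "offset better over n configured periods" condition can be sketched as below. The text does not say whether the n periods must be consecutive; this sketch assumes the n most recent periods must all satisfy the offset, and the function name is hypothetical.

```python
def offset_condition_met(hyp_metrics, ref_metrics, offset, n):
    """True if the hypothetical metric beats the reference metric by at least
    `offset` in each of the n most recent periods (assumed interpretation)."""
    if n <= 0 or len(hyp_metrics) < n or len(ref_metrics) < n:
        return False
    recent = zip(hyp_metrics[-n:], ref_metrics[-n:])
    return all(h - r >= offset for h, r in recent)
```

The "offset worse" case would use the mirrored comparison `r - h >= offset`.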
- FIG. 3 illustrates a flow diagram 300 comprising an NW-sided RL model 302 used for beam selection with the aim of reducing latency and/or measurement overhead.
- the state space 304 may be defined with RS or CSI measurements while the action space 306 may include one or more selected beam indices.
- the goal of the RL model is to select the beams that maximize the average RSRP while minimizing the number of beam switches over a configured period.
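One way to express this goal as a scalar reward is to subtract a per-switch penalty from the average RSRP over the period. The linear form and the `switch_penalty_db` weight are assumptions for illustration; the disclosure does not specify a reward formula.

```python
def beam_selection_reward(rsrp_dbm, beam_indices, switch_penalty_db):
    """Average RSRP over the period minus a penalty for each beam switch.

    rsrp_dbm[i] is the RSRP measured while beam_indices[i] was selected.
    """
    if not rsrp_dbm or len(rsrp_dbm) != len(beam_indices):
        raise ValueError("need equal-length, non-empty RSRP and beam sequences")
    average_rsrp = sum(rsrp_dbm) / len(rsrp_dbm)
    switches = sum(
        1 for prev, cur in zip(beam_indices, beam_indices[1:]) if prev != cur
    )
    return average_rsrp - switch_penalty_db * switches
```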
- a WTRU may transmit a capability associated with an RL model.
- the capability may include an RL model ID (e.g., implicitly indicating the configuration of state and/or action space or target policy), a support of a feature (e.g., beam management), and/or a status of the RL model (e.g., validated or not, time of last validation, area of last validation, etc.).
- the WTRU may transmit the first and second indication.
- the WTRU may differentiate the first indication and second indication based on explicit information or implicit means (e.g., a separate field in DCI, MAC CE, order of indication, different resources/channels/messages, etc.).
- the WTRU may determine the quality of second transmission (e.g., reception associated with the RL selected PMI) based on one or more of the following steps.
- the WTRU may determine the quality of the second transmission by measuring the performance metric (e.g., function of use-case) assuming hypothetical action.
- the performance metric may include a HARQ ACK and/or a SINR/RSRP/CQI (e.g., associated with the selected PMI by the RL model).
- the WTRU may determine the quality of the second transmission based on a comparison of performance metric associated with hypothetical action(s) with a preconfigured performance criteria/condition.
- the WTRU may determine the quality of the second transmission by comparing a measured quantity with a threshold (e.g., SINR/SNR/RSRP associated with the selected PMI by the RL model). For example, the WTRU may determine the quality of the second transmission by measuring a reward over a period and comparing it with the training reward. The WTRU may determine the quality of the second transmission by an improvement and/or degradation of hypothetical action(s) relative to a reference action(s) (e.g., comparing the performance metrics, for example, RSRP and/or SINR, associated with the hypothetical action(s) and reference action(s)). The WTRU may determine the quality of the second transmission based on NW feedback associated with the hypothetical action, for example, a multiuser interference metric associated with the selected PMI by the RL model.
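The comparison of a measured reward with the training reward can be sketched as a simple gap check. The use of a mean over the period and of a `tolerance` parameter are assumptions; the text only states that the two rewards are compared.

```python
def reward_gap(measured_rewards, training_reward):
    """Mean measured reward over the period minus the training reward."""
    if not measured_rewards:
        raise ValueError("no rewards measured in the period")
    return sum(measured_rewards) / len(measured_rewards) - training_reward


def within_training_reward(measured_rewards, training_reward, tolerance):
    """True if the mean measured reward is no more than `tolerance` below
    the reward observed in training (hypothetical acceptance rule)."""
    return reward_gap(measured_rewards, training_reward) >= -tolerance
```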
- Reinforcement learning (RL), supervised learning, and unsupervised learning form the three key learning paradigms in the field of artificial intelligence/machine learning (AI/ML). While all three methods seek to learn a model from data through training, they are fundamentally different in the way the learning process is carried out. For instance, supervised learning algorithms learn patterns and relationships between the input and output pairs, and then the trained algorithm is used to predict outcomes based on new input data. Unsupervised learning algorithms, however, receive inputs with no specified outputs during the training process, with the aim of finding hidden patterns and relationships within the data using statistical means. Different from supervised and unsupervised learning, RL has a predetermined and well-defined end goal in the form of a desired result, which can be achieved through rewarding the desired behaviors.
- RL takes an exploratory approach, with a reward-and-punishment paradigm as the data is processed, wherein the explorations are continuously validated and improved to increase the probability of reaching the end goal.
- RL is expected to be one of the major components of automation of wireless networks and it has already found a wide variety of applications in the wireless domain.
- the first action may be a reference action which includes one or more of applying a configuration associated with the first indication, performing a measurement or a reception using a first configuration determined by the first indication, triggering or performing a transmission using a second configuration determined by the first indication, suspending an ongoing or future transmission, entering a power saving mode, and/or setting, resetting, or modifying one or more of protocol parameters, timers, and/or counters using a third configuration determined by the first indication.
- the first configuration may be a first downlink control information (DCI) configuration indicating a first modulation and coding scheme (MCS).
- the second configuration may be a second DCI configuration indicating a second MCS.
- the first action may include the WTRU applying the first DCI configuration to receive DL transmissions.
- the second action may include the WTRU switching to apply the second DCI configuration to receive the DL transmissions.
- the second performance metric may include a measurement on the second DCI configuration.
- the second performance metric may include a measurement on the second power level configuration.
- FIG. 1B is a system diagram illustrating an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
- WTRU wireless transmit/receive unit
- FIG. 1C is a system diagram illustrating an example radio access network (RAN) and an example core network (CN) that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
- RAN radio access network
- CN core network
- FIG. 1D is a system diagram illustrating a further example RAN and a further example CN that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
- FIG. 2 illustrates an example of a reinforcement learning framework.
- FIG. 4 illustrates a WTRU-sided RL model used for PMI selection with the aim of reducing the average overhead while achieving a target performance.
- FIG. 1A is a diagram illustrating an example communications system 100 in which one or more disclosed embodiments may be implemented.
- the communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users.
- the communications system 100 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth.
- CDMA code division multiple access
- TDMA time division multiple access
- FDMA frequency division multiple access
- OFDMA orthogonal FDMA
- SC-FDMA single-carrier FDMA
- ZT UW DFT-s OFDM zero-tail unique-word DFT-Spread OFDM
- UW-OFDM unique word OFDM
- FBMC filter bank multicarrier
- the communications system 100 may include wireless transmit/receive units (WTRUs) 102a, 102b, 102c, 102d, a RAN 104/113, a CN 106/115, a public switched telephone network (PSTN) 108, the Internet 110, and other networks 112, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements.
- WTRUs 102a, 102b, 102c, 102d may be any type of device configured to operate and/or communicate in a wireless environment.
- any of the WTRUs 102a, 102b, 102c and 102d may be interchangeably referred to as a WTRU. Further, any description herein that is described with reference to a UE may be equally applicable to a WTRU (or vice versa). For example, a WTRU may be configured to perform any of the processes or procedures described herein as being performed by a UE (or vice versa).
- the communications system 100 may also include a base station 114a and/or a base station 114b.
- Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the CN 106/115, the Internet 110, and/or the other networks 112.
- the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a gNB, a NR NodeB, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.
- the base stations 114a, 114b may communicate with one or more of the WTRUs 102a, 102b, 102c, 102d over an air interface 116, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, centimeter wave, micrometer wave, infrared (IR), ultraviolet (UV), visible light, etc.).
- the air interface 116 may be established using any suitable radio access technology (RAT).
- RAT radio access technology
- the communications system 100 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like.
- the base station 114a in the RAN 104/113 and the WTRUs 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA).
- WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+).
- HSPA may include High-Speed Downlink (DL) Packet Access (HSDPA) and/or High-Speed UL Packet Access (HSUPA).
- the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 116 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A) and/or LTE-Advanced Pro (LTE-A Pro).
- E-UTRA Evolved UMTS Terrestrial Radio Access
- LTE Long Term Evolution
- LTE-A LTE-Advanced
- LTE-A Pro LTE-Advanced Pro
- the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as NR Radio Access, which may establish the air interface 116 using New Radio (NR).
- the base station 114a and the WTRUs 102a, 102b, 102c may implement radio technologies such as IEEE 802.11 (i.e., Wireless Fidelity (WiFi)), IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
- IEEE 802.11 i.e., Wireless Fidelity (WiFi)
- IEEE 802.16 i.e., Worldwide Interoperability for Microwave Access (WiMAX)
- CDMA2000, CDMA2000 1X, CDMA2000 EV-DO Code Division Multiple Access 2000
- IS-95 Interim Standard 95
- IS-856 Interim Standard 856
- GSM Global System for Mobile communications
- the base station 114b in FIG. 1A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, an industrial facility, an air corridor (e.g., for use by drones), a roadway, and the like.
- the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN).
- WLAN wireless local area network
- the RAN 104/113 may be in communication with the CN 106/115, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 102a, 102b, 102c, 102d.
- the data may have varying quality of service (QoS) requirements, such as differing throughput requirements, latency requirements, error tolerance requirements, reliability requirements, data throughput requirements, mobility requirements, and the like.
- QoS quality of service
- the CN 106/115 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication. Although not shown in FIG. 1A, it will be appreciated that the RAN 104/113 and/or the CN 106/115 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 104/113 or a different RAT.
- the CN 106/115 may also serve as a gateway for the WTRUs 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or the other networks 112.
- the PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS).
- POTS plain old telephone service
- the Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and/or the internet protocol (IP) in the TCP/IP internet protocol suite.
- the networks 112 may include wired and/or wireless communications networks owned and/or operated by other service providers.
- the networks 112 may include another CN connected to one or more RANs, which may employ the same RAT as the RAN 104/113 or a different RAT.
- Some or all of the WTRUs 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities (e.g., the WTRUs 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wireless networks over different wireless links).
- the WTRU 102c shown in FIG. 1A may be configured to communicate with the base station 114a, which may employ a cellular-based radio technology, and with the base station 114b, which may employ an IEEE 802 radio technology.
- the processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment.
- the processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 1B depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.
- the transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 116.
- the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals.
- the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example.
- the transmit/receive element 122 may be configured to transmit and/or receive both RF and light signals. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wireless signals.
- the WTRU 102 may include any number of transmit/receive elements 122. More specifically, the WTRU 102 may employ MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 116.
- the transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122.
- the WTRU 102 may have multi-mode capabilities.
- the transceiver 120 may include multiple transceivers for enabling the WTRU 102 to communicate via multiple RATs, such as NR and IEEE 802.11, for example.
- the processor 118 of the WTRU 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit).
- the processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128.
- the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132.
- the non-removable memory 130 may
- the processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 102.
- the WTRU 102 may receive location information over the air interface 116 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
- the processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
- the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, a Virtual Reality and/or Augmented Reality (VR/AR) device, an activity tracker, and the like.
- FM frequency modulated
- the peripherals 138 may include one or more sensors; the sensors may be one or more of a gyroscope, an accelerometer, a hall effect sensor, a magnetometer, an orientation sensor, a proximity sensor, a temperature sensor, a time sensor, a geolocation sensor, an altimeter, a light sensor, a touch sensor, a barometer, a gesture sensor, a biometric sensor, and/or a humidity sensor.
- the WTRU 102 may include a full duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for both the UL (e.g., for transmission) and downlink (e.g., for reception)) may be concurrent and/or simultaneous.
- the full duplex radio may include an
- the WTRU 102 may include a half-duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for either the UL (e.g., for transmission) or the downlink (e.g., for reception)).
- FIG. 1C is a system diagram illustrating the RAN 104 and the CN 106 according to an embodiment.
- the RAN 104 may employ an E-UTRA radio technology to communicate with the WTRUs 102a, 102b, 102c over the air interface 116.
- the RAN 104 may also be in communication with the CN 106.
- the RAN 104 may include eNode-Bs 160a, 160b, 160c, though it will be appreciated that the RAN 104 may include any number of eNode-Bs while remaining consistent with an embodiment.
- the eNode-Bs 160a, 160b, 160c may each include one or more transceivers for communicating with the WTRUs 102a, 102b, 102c over the air interface 116.
- the eNode-Bs 160a, 160b, 160c may implement MIMO technology.
- the eNode-B 160a, for example, may use multiple antennas to transmit wireless signals to, and/or receive wireless signals from, the WTRU 102a.
- the CN 106 shown in FIG. 1C may include a mobility management entity (MME) 162, a serving gateway (SGW) 164, and a packet data network (PDN) gateway (or PGW) 166. While each of the foregoing elements is depicted as part of the CN 106, it will be appreciated that any of these elements may be owned and/or operated by an entity other than the CN operator.
- MME mobility management entity
- SGW serving gateway
- PGW packet data network gateway
- the MME 162 may be connected to each of the eNode-Bs 160a, 160b, 160c in the RAN 104 via an S1 interface and may serve as a control node.
- the MME 162 may be responsible for authenticating users of the WTRUs 102a, 102b, 102c, bearer activation/deactivation, selecting a particular serving gateway during an initial attach of the WTRUs 102a, 102b, 102c, and the like.
- the MME 162 may provide a control plane function for switching between the RAN 104 and other RANs (not shown) that employ other radio technologies, such as GSM and/or WCDMA.
- the SGW 164 may perform other functions, such as anchoring user planes during inter-eNode B handovers, triggering paging when DL data is available for the WTRUs 102a, 102b, 102c, managing and storing contexts of the WTRUs 102a, 102b, 102c, and the like.
- the CN 106 may facilitate communications with other networks.
- the CN 106 may provide the WTRUs 102a, 102b, 102c with access to circuit-switched networks, such as the PSTN 108, to facilitate communications between the WTRUs 102a, 102b, 102c and traditional land-line communications devices.
- the CN 106 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the CN 106 and the PSTN 108.
- IMS IP multimedia subsystem
- the CN 106 may provide the WTRUs 102a, 102b, 102c with access to the other networks 112, which may include other wired and/or wireless networks that are owned and/or operated by other service providers.
- although the WTRU is described in FIGS. 1A-1D as a wireless terminal, it is contemplated that in certain representative embodiments such a terminal may use (e.g., temporarily or permanently) wired communication interfaces with the communication network.
- the traffic between STAs within a BSS may be considered and/or referred to as peer-to-peer traffic.
- the peer-to-peer traffic may be sent between (e.g., directly between) the source and destination STAs with a direct link setup (DLS).
- the DLS may use an 802.11e DLS or an 802.11z tunneled DLS (TDLS).
- a WLAN using an Independent BSS (IBSS) mode may not have an AP, and the STAs (e.g., all of the STAs) within or using the IBSS may communicate
- the AP may transmit a beacon on a fixed channel, such as a primary channel.
- the primary channel may be a fixed width (e.g., 20 MHz wide bandwidth) or a dynamically set width via signaling.
- the primary channel may be the operating channel of the BSS and may be used by the STAs to establish a connection with the AP.
- Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) may be implemented, for example in 802.11 systems.
- the STAs (e.g., every STA), including the AP, may sense the primary channel. If the primary channel is sensed/detected and/or determined to be busy by a particular STA, the particular STA may back off.
- One STA (e.g., only one station) may transmit at any given time in a given BSS.
- High Throughput (HT) STAs may use a 40 MHz wide channel for communication, for example, via a combination of the primary 20 MHz channel with an adjacent or nonadjacent 20 MHz channel to form a 40 MHz wide channel.
- VHT STAs may support 20 MHz, 40 MHz, 80 MHz, and/or 160 MHz wide channels.
- the 40 MHz, and/or 80 MHz, channels may be formed by combining contiguous 20 MHz channels.
- a 160 MHz channel may be formed by combining 8 contiguous 20 MHz channels, or by combining two non-contiguous 80 MHz channels, which may be referred to as an 80+80 configuration.
- the data, after channel encoding, may be passed through a segment parser that may divide the data into two streams.
- Inverse Fast Fourier Transform (IFFT) processing, and time domain processing may be done on each stream separately.
- IFFT Inverse Fast Fourier Transform
- the streams may be mapped on to the two 80 MHz channels, and the data may be transmitted by a transmitting STA.
- the above described operation for the 80+80 configuration may be reversed, and the combined data may be sent to the Medium Access Control (MAC).
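As a rough illustration of the 80+80 flow described above, the sketch below splits an encoded bit sequence into two streams and recombines them at the receiver. The round-robin split and the function names are assumptions made for illustration only: an actual 802.11 segment parser distributes blocks of bits according to the transmission parameters, and the IFFT and time domain processing steps are omitted here.

```python
def segment_parse(bits):
    """Split encoded bits into two streams, one per 80 MHz segment.

    Simplified round-robin split (an assumption, not the standard's rule).
    """
    return bits[0::2], bits[1::2]


def segment_deparse(stream_a, stream_b):
    """Reverse the split at the receiver before passing data to the MAC."""
    combined = []
    for i, bit in enumerate(stream_a):
        combined.append(bit)
        if i < len(stream_b):
            combined.append(stream_b[i])
    return combined


encoded = [1, 0, 1, 1, 0, 0, 1]          # hypothetical encoded bits
stream_a, stream_b = segment_parse(encoded)
recovered = segment_deparse(stream_a, stream_b)
```

The round trip returns the original sequence, which mirrors the statement that the receiver reverses the transmit-side operation.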
- MAC Medium Access Control
- the MTC devices may include a battery with a battery life above a threshold (e.g., to maintain a very long battery life).
- the primary channel may be 1 MHz wide for STAs (e.g., MTC type devices) that support (e.g., only support) a 1 MHz mode, even if the AP, and other STAs in the BSS support 2 MHz, 4 MHz, 8 MHz, 16 MHz, and/or other channel bandwidth operating modes.
- Carrier sensing and/or Network Allocation Vector (NAV) settings may depend on the status of the primary channel. If the primary channel is busy, for example, due to a STA (which supports only a 1 MHz operating mode), transmitting to the AP, the entire available frequency bands may be considered busy even though a majority of the frequency bands remains idle and may be available.
- STAs e.g., MTC type devices
- NAV Network Allocation Vector
- in the United States, the available frequency bands, which may be used by 802.11ah, are from 902 MHz to 928 MHz. In Korea, the available frequency bands are from 917.5 MHz to 923.5 MHz. In Japan, the available frequency bands are from 916.5 MHz to 927.5 MHz. The total bandwidth available for 802.11ah is 6 MHz to 26 MHz depending on the country code.
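The 6 MHz to 26 MHz range follows directly from the band edges quoted above. A minimal worked check is sketched below; the dictionary labels are illustrative, and the first band is labeled by its edges rather than by region.

```python
# Band edges in MHz, taken from the text above; the keys are illustrative labels.
BANDS_MHZ = {
    "902-928 MHz band": (902.0, 928.0),
    "Korea": (917.5, 923.5),
    "Japan": (916.5, 927.5),
}


def band_width_mhz(edges):
    """Return the width of a band given its (low, high) edges in MHz."""
    low, high = edges
    return high - low


widths = {label: band_width_mhz(edges) for label, edges in BANDS_MHZ.items()}
narrowest = min(widths.values())
widest = max(widths.values())
```

Under these figures the narrowest allocation is the Korean band and the widest is the 902 MHz to 928 MHz band.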
- the RAN 113 may include gNBs 180a, 180b, 180c, though it will be appreciated that the RAN 113 may include any number of gNBs while remaining consistent with an embodiment.
- the gNBs 180a, 180b, 180c may each include one or more transceivers for communicating with the WTRUs 102a, 102b, 102c over the air interface 116.
- the gNBs 180a, 180b, 180c may implement MIMO technology.
- gNBs 180a, 180b may utilize beamforming to transmit signals to and/or receive signals from the WTRUs 102a, 102b, 102c.
- the WTRUs 102a, 102b, 102c may communicate with gNBs 180a, 180b, 180c using transmissions associated with a scalable numerology. For example, the OFDM symbol spacing and/or OFDM subcarrier spacing may vary for different transmissions, different cells, and/or different portions of the wireless transmission spectrum.
- the WTRUs 102a, 102b, 102c may communicate with gNBs 180a, 180b, 180c using subframe or transmission time intervals (TTIs) of various or scalable lengths (e.g., containing varying number of OFDM symbols and/or lasting varying lengths of absolute time).
- TTIs subframe or transmission time intervals
- the gNBs 180a, 180b, 180c may be configured to communicate with the WTRUs 102a, 102b, 102c in a standalone configuration and/or a non-standalone configuration.
- WTRUs 102a, 102b, 102c may communicate with gNBs 180a, 180b, 180c without also accessing other RANs (e.g., such as eNode-Bs 160a, 160b, 160c).
- WTRUs 102a, 102b, 102c may utilize one or more of gNBs 180a, 180b, 180c as a mobility anchor point.
- WTRUs 102a, 102b, 102c may communicate with gNBs 180a, 180b, 180c using signals in an unlicensed band.
- WTRUs 102a, 102b, 102c may communicate with/connect to gNBs 180a, 180b, 180c while also communicating with/connecting to another RAN such as eNode-Bs 160a, 160b, 160c.
- WTRUs 102a, 102b, 102c may implement DC principles to communicate with one or more gNBs 180a, 180b, 180c and one or more eNode-Bs 160a, 160b, 160c substantially simultaneously.
- eNode-Bs 160a, 160b, 160c may serve as a mobility anchor for WTRUs 102a, 102b, 102c and gNBs 180a, 180b, 180c may provide additional coverage and/or throughput for servicing WTRUs 102a, 102b, 102c.
- Each of the gNBs 180a, 180b, 180c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the UL and/or DL, support of network slicing, dual connectivity, interworking between NR and E-UTRA, routing of user plane data towards User Plane Function (UPF) 184a, 184b, routing of control plane information towards Access and Mobility Management Function (AMF) 182a, 182b and the like. As shown in FIG. 1D, the gNBs 180a, 180b, 180c may communicate with one another over an Xn interface.
- UPF User Plane Function
- AMF Access and Mobility Management Function
- the CN 115 shown in FIG. 1D may include at least one AMF 182a, 182b, at least one UPF 184a, 184b, at least one Session Management Function (SMF) 183a, 183b, and possibly a Data Network
- SMF Session Management Function
- the AMF 182a, 182b may be connected to one or more of the gNBs 180a, 180b, 180c in the RAN 113 via an N2 interface and may serve as a control node.
- the AMF 182a, 182b may be responsible for authenticating users of the WTRUs 102a, 102b, 102c, support for network slicing (e.g., handling of different PDU sessions with different requirements), selecting a particular SMF 183a, 183b, management of the registration area, termination of NAS signaling, mobility management, and the like.
- Network slicing may be used by the AMF 182a, 182b in order to customize CN support for WTRUs 102a, 102b, 102c based on the types of services being utilized by the WTRUs 102a, 102b, 102c.
- different network slices may be established for different use cases such as services relying on ultra-reliable low latency (URLLC) access, services relying on enhanced massive mobile broadband (eMBB) access, services for machine type communication (MTC) access, and/or the like.
- URLLC ultra-reliable low latency
- eMBB enhanced massive mobile broadband
- MTC machine type communication
- the AMF 182a, 182b may provide a control plane function for switching between the RAN 113 and other RANs (not shown) that employ other radio technologies, such as LTE, LTE-A, LTE-A Pro, and/or non-3GPP access technologies such as WiFi.
- the SMF 183a, 183b may be connected to an AMF 182a, 182b in the CN 115 via an N11 interface.
- the SMF 183a, 183b may also be connected to a UPF 184a, 184b in the CN 115 via an N4 interface.
- the SMF 183a, 183b may select and control the UPF 184a, 184b and configure the routing of traffic through the UPF 184a, 184b.
- the UPF 184a, 184b may be connected to one or more of the gNBs 180a, 180b, 180c in the RAN 113 via an N3 interface, which may provide the WTRUs 102a, 102b, 102c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102a, 102b, 102c and IP-enabled devices.
- the UPF 184a, 184b may perform other functions, such as routing and forwarding packets, enforcing user plane policies, supporting multi-homed PDU sessions, handling user plane QoS, buffering downlink packets, providing mobility anchoring, and the like.
- one or more, or all, of the functions described herein with regard to one or more of: WTRU 102a-d, Base Station 114a-b, eNode-B 160a-c, MME 162, SGW 164, PGW 166, gNB 180a-c, AMF 182a-b, UPF 184a-b, SMF 183a-b, DN 185a-b, and/or any other device(s) described herein, may be performed by one or more emulation devices (not shown).
- the emulation devices may be one or more devices configured to emulate one or more, or all, of the functions described herein.
- the emulation devices may be used to test other devices and/or to simulate network and/or WTRU functions.
- the emulation devices may be designed to implement one or more tests of other devices in a lab environment and/or in an operator network environment.
- the one or more emulation devices may perform the one or more, or all, functions while being fully or partially implemented and/or deployed as part of a wired and/or wireless communication network in order to test other devices within the communication network.
- the one or more emulation devices may perform the one or more, or all, functions while being temporarily implemented/deployed as part of a wired and/or wireless communication network.
- the emulation device may be directly coupled to another device for purposes of testing and/or may perform testing using over-the-air wireless communications.
- FIG. 2 depicts an example reinforcement learning (RL) framework 200.
- the RL framework may include some key elements.
- the key elements may include the agent, which is the ML algorithm; the environment, which is the adaptive problem space with attributes such as boundary values, rules, valid actions, etc.; the actions, which are the steps that the RL agent takes to navigate the environment; the state
- the agent 202 may determine an action $A_t$ based on the current state $S_t$ using its policy. Then, an environment 204 may execute the action and return a reward $R_{t+1}$ and the next state. Using the tuple (state, action, reward, next state) collected over past steps, the agent 202 may update its policy to maximize the cumulative reward.
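The interaction loop described above can be sketched as follows. This is a minimal illustration only: the two-state `ToyEnvironment`, the tabular value update, and all parameter values are assumptions introduced here and are not part of the described framework.

```python
import random


class ToyEnvironment:
    """Hypothetical two-state environment (not from the source)."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # Reward 1 when the action matches the current state, else 0.
        reward = 1.0 if action == self.state else 0.0
        self.state = 1 - self.state  # the state toggles on every step
        return reward, self.state


class TabularAgent:
    """Agent with an epsilon-greedy policy over a table of value estimates."""

    def __init__(self, n_states, n_actions, lr=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[state]))
        return max(range(len(self.q[state])), key=lambda a: self.q[state][a])

    def update(self, state, action, reward, next_state):
        # Policy update from one (state, action, reward, next state) tuple.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.lr * (target - self.q[state][action])


def run(steps=2000):
    env, agent = ToyEnvironment(), TabularAgent(n_states=2, n_actions=2)
    state = env.state
    for _ in range(steps):
        action = agent.act(state)               # agent picks an action from its policy
        reward, next_state = env.step(action)   # environment returns reward and next state
        agent.update(state, action, reward, next_state)
        state = next_state
    return agent
```

Calling `run()` returns an agent whose value table favors the rewarded action in each state of this toy problem.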
- RL models may be trained in different ways. For example, RL models may be trained online wherein the agent interacts with the real/field environment during training the RL model. For another example, RL models may be trained offline wherein the RL agent cannot interact with the real/field environment during training the RL model.
- Offline RL may provide a more efficient and flexible training framework as data collection is decoupled from policy training and may use less memory and computational resources relative to online RL training.
- Variable $a_t^i$ may denote the action taken by the agent at state $s_t^i$.
- Variable $r_t^i$ may denote the resulting reward associated with the action $a_t^i$.
- Variable $s_{t+1}^i$ may be the state at time $t+1$.
- offline RL may require the learning algorithm to find a sufficient understanding of the system dynamics from a fixed dataset, and then find a policy π(a
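To illustrate learning from a fixed dataset without further interaction, the sketch below repeatedly sweeps over logged (state, action, reward, next state) tuples and then reads off a greedy policy. The tabular update rule, the four logged transitions, and the hyperparameters are assumptions chosen for illustration; the source does not prescribe a particular offline RL algorithm.

```python
def q_from_dataset(dataset, n_states, n_actions, gamma=0.9, lr=0.5, sweeps=200):
    """Estimate action values from logged transitions only (no environment access)."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for state, action, reward, next_state in dataset:
            target = reward + gamma * max(q[next_state])
            q[state][action] += lr * (target - q[state][action])
    return q


def greedy_policy(q):
    """Pick, for each state, the action with the highest estimated value."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


# Hypothetical logged transitions: (state, action, reward, next_state).
DATASET = [
    (0, 0, 1.0, 1),
    (0, 1, 0.0, 1),
    (1, 0, 0.0, 0),
    (1, 1, 1.0, 0),
]

Q = q_from_dataset(DATASET, n_states=2, n_actions=2)
POLICY = greedy_policy(Q)
```

Because the dataset here covers every state-action pair, the learned policy is reliable; when coverage is partial, the estimates for unseen actions remain at their initial values, which is one reason a separate validation step is useful.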
- the embodiments described herein address how the WTRU can validate its off-policy RL model before using it, for example, for a WTRU-sided off-policy RL model. The embodiments described herein further address how the WTRU can determine and report the quality of the generated RL model actions from the offline-trained model.
- the WTRU may receive a first indication of a first configuration to be implemented for performing a first action related to communications on a network.
- the first action may be a reference action which includes one or more of applying a configuration associated with the first indication, performing a measurement or a reception using a first configuration determined by the first indication, triggering and/or performing a transmission using a second configuration determined by the first indication, suspending an ongoing or future transmission, entering a power saving mode, setting, resetting, and/or modifying one or more of protocol parameters, timers, or counters using a third configuration determined by the first indication.
- the WTRU may receive, from the network, a second indication of a second configuration to be implemented for performing a second action related to communications on the network.
- the second action for example, may be an anticipated or hypothetical action associated with a NW-side RL model received.
- anticipated or hypothetical action(s) may include hypothetical application of one or more DL beam(s) indices selected by the RL model and/or hypothetical application of one or more sub-band indices and/or allocated power selected by the RL model.
- the first configuration and the second configuration may each include a different downlink (DL) beam configuration, different power saving modes, different protocol
- the WTRU may perform a validation of the NW-side RL model by at least determining a second outcome based on a second performance metric associated with performance of the second action without actually implementing the second configuration to perform the second action related to the communications on the network, and/or determining a quality of the second action compared to a quality of the first action based on the first outcome and the second outcome.
- the WTRU may report the determined quality of the second action for validation of the anticipated or hypothetical action associated with the NW-side RL model.
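The validation flow described above can be sketched as follows: the first (reference) configuration is applied and measured, while the second (hypothetical) configuration is only measured. The `Indication` type, the callback names, the RSRP figures, and the use of a simple difference as the quality value are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Indication:
    kind: str        # "first" (reference) or "second" (hypothetical); illustrative labels
    config_id: int   # e.g., a DL beam index


def validate(first, second, measure, apply_config):
    """Return both outcomes and a quality value for the second action."""
    apply_config(first.config_id)               # the reference action is actually performed
    first_outcome = measure(first.config_id)
    second_outcome = measure(second.config_id)  # measured without applying the configuration
    return {
        "first_outcome": first_outcome,
        "second_outcome": second_outcome,
        "quality": second_outcome - first_outcome,
    }


# Hypothetical per-beam RSRP measurements in dBm.
RSRP_DBM = {3: -95.0, 5: -90.0}
applied = []
report = validate(
    Indication("first", 3),
    Indication("second", 5),
    measure=RSRP_DBM.__getitem__,
    apply_config=applied.append,
)
```

In this example only beam 3 is ever applied, and the report indicates that the hypothetical beam would have been 5 dB better on the chosen metric.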
- the first configuration may be a first downlink (DL) beam configuration and the second configuration may be a second DL beam configuration.
- the first action may include the WTRU switching to the first DL beam to receive data transmissions.
- the second action may include the WTRU switching to the second DL beam provided by the NW-side RL model to receive the data transmissions.
- the second performance metric may include a measurement on the second DL beam.
- the first configuration may be a first downlink control information (DCI) configuration indicating a first modulation and coding scheme (MCS).
- the second configuration may be a second DCI configuration indicating a second MCS.
- the first action may include the WTRU applying the first DCI configuration to receive DL transmissions.
- the second action may include the WTRU switching to apply the second DCI configuration to receive the DL transmissions.
- the second performance metric may include a measurement on the second DCI configuration.
- the first configuration may be a first sub-band configuration.
- the second configuration may be a second sub-band configuration.
- the first action may include the WTRU applying the first sub-band configuration to receive DL transmissions.
- the second action may include the WTRU switching to apply the second sub-band configuration to receive the DL transmissions.
- the second performance metric may include a measurement on the second sub-band configuration.
- the first configuration may be a first power level configuration for a first DL transmission.
- the second configuration may be a second power level configuration for a second DL transmission.
- the first action may include the WTRU applying the first power level configuration to receive DL transmissions.
- the second action may include the WTRU switching to apply the second power level configuration to receive the DL transmissions.
- the second performance metric may include a measurement on the second power level configuration
- HARQ HARQ acknowledgement
- RSRP average signal received power
- SINR signal-to- interference-plus-noise ratio
- CQI channel quality indicator
- the WTRU may perform the action (e.g., a reference action) and/or apply the configuration associated with the first indication.
- the WTRU may perform a validation of the NW-side RL model by determining a second outcome (e.g., a hypothetical or anticipated outcome) of the second action associated with the second indication without actually performing the second action.
- the validation of the NW-side RL model may be performed without applying the configuration associated with the second indication.
- anticipated or hypothetical action(s) may include hypothetical application of one or more DL beam(s) indices selected by the RL model and hypothetical application of one or more sub-band indices and/or allocated power selected by the RL model.
- the WTRU may determine whether an indication is a first indication or a second indication based on explicit information or implicit information (e.g., a separate field in DCI, MAC CE, order of indication, different resources/channels/messages, etc.).
- the WTRU may determine the quality of the second action (e.g., hypothetical or anticipated action), based on one or more of the following.
- the WTRU may determine the quality of hypothetical action(s) based on measuring the performance metric (e.g., function of use-case) assuming hypothetical application of the action.
- the performance metric may include an average reference signal received power (RSRP), signal-to-interference-plus-noise ratio (SINR) over a configured period, an average number of beam switches in a configured period, a number of beam failures over a configured period (with or without successful recovery), and/or an average throughput over a configured period.
- RSRP average reference signal received power
- SINR signal-to-interference-plus-noise ratio
- the WTRU may determine the quality of hypothetical action(s) based on a comparison of a performance metric associated with the hypothetical action(s) with a preconfigured performance criterion and/or condition.
- the determination of the quality of hypothetical actions may include measuring a quantity and comparing it with a threshold (e.g., SINR/signal-to-noise ratio (SNR)/RSRP on one of the sub-bands), measuring an average number of beam switches and comparing it with a configured metric threshold, measuring an average number of beam failures and comparing it with a configured metric threshold, measuring an average RSRP over a period and comparing it with a threshold, and/or measuring a reward over a period and comparing it with the configured expected reward.
- the WTRU may determine the quality of hypothetical action(s) based on an improvement/degradation of performance metric or quality of hypothetical action(s) relative to the
- the WTRU may compare the performance metrics associated with the hypothetical action(s) and reference action(s).
- the WTRU may report the determined quality of the second action(s) (e.g., hypothetical or anticipated actions), such as for NW-sided RL model validation. For example, reporting may be triggered when preconfigured conditions are satisfied, such as when the determined quality of the hypothetical action(s) exceeds or is less than the preconfigured threshold, or when a performance metric associated with the hypothetical action(s) is better or worse, by an offset, than a performance metric associated with the reference action(s) over n configured periods. For example, reporting may be transmitted periodically (e.g., every n configured periods over which the performance metrics may be calculated). For example, the report may include one or more indication formats. The one or more indication formats may include Boolean and/or multiple quantized levels for quality (e.g., represented in multiple levels depending on the difference between the measured metric and the configured metric threshold).
- the proposed embodiment may offer a viable way to evaluate and/or validate offline-trained NW-sided RL models. This may be beneficial when there is a discrepancy between the state and/or action space used in the training environment and the state and/or action space in the real environment, or when some of the high-reward regions of the state space are missed during training.
- WTRU-assisted NW-sided RL offline policy evaluation is introduced herein.
- Configuration of WTRU-based assistance of NW-sided RL model validation is introduced herein.
- the WTRU may be configured to receive a first indication and a second indication.
- the WTRU may perform a first action based on the first indication and a second action based on the second indication.
- the WTRU may perform a reference action upon receiving the first indication.
- the reference action may be associated with one or more of applying the configuration associated with the first indication, performing a measurement and/or reception using a configuration at least in part determined by the first indication, triggering and/or performing a transmission using a configuration at least in part determined by the first indication, suspending an ongoing and/or future transmission, entering a power saving mode, setting/resetting/modifying one or more of the protocol parameters, timers, and/or counters using a configuration at least in part determined by the first indication, and the like.
- the first indication may be a first downlink control information (DCI) indicating a first modulation and coding scheme (MCS), a first sub-band, or a first power level for a first DL transmission, which is possibly determined based on a legacy algorithm and/or a supervised learning algorithm and the like.
- the second indication may be a second DCI indicating a second MCS, a second sub-band and/or a second power level for a second transmission, possibly determined based on the RL model to be evaluated.
- the reference action may be the WTRU applying the first DCI to receive a DL data transmission.
- the hypothetical action may be the WTRU measuring one or more performance metrics assuming that the WTRU receives a DL transmission based on the second DCI.
- the WTRU may be configured to determine the linkage between the first indication and a corresponding second indication.
- the WTRU may be configured to determine the reference action and the corresponding hypothetical action which is to be evaluated against the reference action.
- the linkage may be implicitly configured.
- the linkage may be implicitly configured based on the relationship between resources, channels, and/or messages carrying the first indication and second indication.
- the first and second indication may be inferred by order of inclusion in a message.
- the linkage may be explicitly configured, for example, via a separate field in DCI, a media access control control element (MAC CE), a radio resource control (RRC) message, etc.
- MAC CE media access control control element
- RRC radio resource control
- a specific IE may be configured for second indication or hypothetical action.
- the first indication and the second indication may each be associated with the same logical identity.
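One way to picture the linkage by a shared logical identity is sketched below, where each reference indication is paired with the hypothetical indication carrying the same identifier. The dictionary layout and the `link_id`, `kind`, and `config` field names are hypothetical and are not taken from any signaling format.

```python
def link_indications(indications):
    """Pair first and second indications that carry the same logical identity."""
    grouped = {}
    for indication in indications:
        grouped.setdefault(indication["link_id"], {})[indication["kind"]] = indication["config"]
    # Keep only identities for which both a reference and a hypothetical entry exist.
    return {
        link_id: (entry["first"], entry["second"])
        for link_id, entry in grouped.items()
        if "first" in entry and "second" in entry
    }


# Hypothetical received indications; "config" stands for, e.g., a DL beam index.
received = [
    {"link_id": 1, "kind": "first", "config": 3},
    {"link_id": 2, "kind": "first", "config": 4},
    {"link_id": 1, "kind": "second", "config": 5},
]
pairs = link_indications(received)
```

Here only identity 1 yields a complete pair, so only that hypothetical action would be evaluated against its reference action.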
- the WTRU may be configured to measure the performance metric associated with hypothetical actions based on the second indication. Possibly such a performance metric may be determined based on one or more preconfigured criteria, rule(s), and/or condition(s).
- the performance metric may be modeled as a reward metric. For example, hypothetical actions that lead to better performance (e.g., higher throughput, lower latency, higher power saving, etc.) may be assigned higher rewards than the actions that lead to degraded performance (e.g., lower throughput, higher latency, high power consumption etc.).
- the WTRU may be configured to determine and/or report a performance metric within a preconfigured range. For example, the WTRU may be configured with a minimum value, a maximum value, and quantized values in between with preconfigured step size.
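A minimal sketch of reporting within a preconfigured range is shown below: the measured value is clamped to the configured minimum and maximum and then mapped to the nearest quantized step. The function name, the example range, and the round-to-nearest rule are assumptions; the sketch also assumes the step size divides the range evenly.

```python
def quantize_metric(value, minimum, maximum, step):
    """Clamp a metric to [minimum, maximum] and snap it to the configured step grid.

    Returns the step index and the corresponding quantized value.
    """
    clamped = min(max(value, minimum), maximum)
    index = round((clamped - minimum) / step)
    return index, minimum + index * step


# Hypothetical configuration: -120 dBm to -60 dBm in 2 dB steps.
in_range = quantize_metric(-91.3, -120.0, -60.0, 2.0)
above_max = quantize_metric(-30.0, -120.0, -60.0, 2.0)
below_min = quantize_metric(-200.0, -120.0, -60.0, 2.0)
```

Reporting the index rather than the raw value keeps the report size fixed, at the cost of a resolution equal to the configured step.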
- the WTRU may be configured with resources to report one or more of the following associated with RL model validation, which may include the performance metric associated with a hypothetical action, quality of a hypothetical action, and the like.
- the WTRU may be configured with resources to report one or more of the following associated with RL model validation, which may include the performance metric associated with a set of hypothetical action(s)
- the WTRU may be configured with resources for periodically transmitting an RL model validation report. Possibly the periodicity of reporting may be an integer multiple of the time window over which the performance and/or quality metric is calculated.
- the WTRU may be configured with physical uplink control channel (PUCCH) resources to transmit the RL model validation report.
- the WTRU may be configured with semi-persistent physical uplink shared channel (PUSCH) resources to transmit the RL model validation report.
- the WTRU may be configured to transmit such reports in one or more of Uplink Control Information (UCI), MAC Control Element (CE), etc.
- UCI Uplink Control Information
- CE MAC Control Element
- the RL model validation report may be multiplexed with CSI reporting.
- the feedback based on reference action may be transmitted in the first part of CSI report and the feedback based on hypothetical action may be transmitted in the second part of CSI report.
- the RL model validation report may be transmitted in PUSCH resources.
- the RL model validation report may be transmitted in a higher layer message (e.g., in an RRC message).
- the WTRU may be configured to transmit the validation report in an RRC message carrying a beam level measurement report.
- the WTRU may be configured to transmit the validation report in an uplink (UL) information transfer message.
- the WTRU may be configured to transmit the validation report in a WTRU assistance information message.
- the WTRU may be configured with resources for transmitting an RL model validation report when one or more preconfigured conditions are satisfied. For example, the WTRU may transmit a report when the performance metric and/or the quality of the hypothetical action(s) exceeds and/or becomes lower than the preconfigured threshold. For example, the WTRU may transmit a report when the performance metric and/or the quality of the hypothetical action(s) exceeds and/or becomes lower than that of the reference action(s) over n configured periods.
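The event-triggered condition above can be sketched as a check over the most recent n periods. The interpretation that the condition must hold in each of n consecutive periods, the `offset` parameter, and the function name are assumptions made for this sketch.

```python
def should_report(hypothetical_history, reference_history, offset, n):
    """Return True when the hypothetical metric has been better, or worse, than
    the reference metric by at least `offset` in each of the last n periods."""
    if len(hypothetical_history) < n or len(reference_history) < n:
        return False  # not enough periods observed yet
    differences = [
        hyp - ref
        for hyp, ref in zip(hypothetical_history[-n:], reference_history[-n:])
    ]
    consistently_better = all(diff >= offset for diff in differences)
    consistently_worse = all(diff <= -offset for diff in differences)
    return consistently_better or consistently_worse
```

For example, with per-period metrics `[1, 5, 6]` for the hypothetical actions and `[1, 2, 3]` for the reference actions, an offset of 2 and n = 2 would trigger a report, whereas a single period of improvement would not.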
- a WTRU configured with model validation for a NW-sided RL model may receive a first indication and a second indication from the NW. The WTRU may determine whether an indication is the first or the second based on one or more of the following: dynamically, by receiving an indication in a DCI field of a physical downlink control channel (PDCCH) resource; dynamically or semi-persistently, by MAC CE; based on a configuration that indicates the order or pattern of the first indication and the second indication; and/or based on the resources, channels, or messages through which the indication is received.
- PDCCH physical downlink control channel
- the WTRU receiving a first indication may perform an action and/or apply a configuration (e.g., reference action) associated with the first indication.
- a WTRU receiving a second indication may perform validation of the action and/or configuration (e.g., hypothetical action) associated with the second indication without actually performing the action or applying the configuration.
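A minimal sketch of distinguishing and handling the two indication types is given below. The message layout, the `is_hypothetical` flag, the pattern list, and the fallback to "first" when nothing distinguishes the indication are all assumptions for illustration and do not correspond to defined signaling fields.

```python
def classify_indication(message, order_pattern=None, position=0):
    """Return "first" or "second" for a received indication."""
    # Explicit information: a dedicated flag carried, e.g., in DCI or a MAC CE.
    if "is_hypothetical" in message:
        return "second" if message["is_hypothetical"] else "first"
    # Implicit information: a configured order or pattern of indications.
    if order_pattern:
        return order_pattern[position % len(order_pattern)]
    # Assumed fallback: treat an unmarked indication as a reference indication.
    return "first"


def handle_indication(kind, config, apply_config, evaluate_only):
    """Apply reference configurations; only evaluate hypothetical ones."""
    if kind == "first":
        apply_config(config)     # reference action: actually performed
    else:
        evaluate_only(config)    # hypothetical action: measured, never applied


applied, evaluated = [], []
kind = classify_indication({"is_hypothetical": True})
handle_indication(kind, 7, applied.append, evaluated.append)
```

After this run, configuration 7 appears only in the evaluated list, reflecting that a second indication never changes the WTRU's active configuration.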
- actions associated with the first or second indication may include one or more DL beam indices selected by the RL model, one or more sub-band indices and/or allocated power selected by the RL model, an MCS selected by the RL model, and/or a handover request determined by the RL model.
- the WTRU may measure a performance metric for the hypothetical action associated with the second indication without performing the action, and possibly likewise for the reference action.
- the performance metrics may be determined and reported with every reception of the second indication.
- the performance metrics may be calculated and recorded with every reception of the second indication but reported over a configured period. This may enable the WTRU to determine the performance metric upon collecting measurements over a sequence of hypothetical actions.
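Recording a metric at every second indication while reporting only once per configured period can be sketched with a small accumulator. The class name, the period expressed as a count of indications, and the use of a plain average are assumptions made for illustration.

```python
class MetricWindow:
    """Accumulate per-indication metrics and emit one average per period."""

    def __init__(self, period_length):
        self.period_length = period_length  # number of indications per reporting period
        self.samples = []

    def record(self, value):
        """Store one measurement; return the period average when the period ends."""
        self.samples.append(value)
        if len(self.samples) < self.period_length:
            return None  # period still open, nothing to report yet
        average = sum(self.samples) / len(self.samples)
        self.samples.clear()
        return average


window = MetricWindow(period_length=3)
outputs = [window.record(value) for value in (1.0, 2.0, 3.0, 4.0)]
```

The first two calls return nothing, the third returns the period average, and the fourth starts a new period.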
- the measured performance metric may be determined based on the hypothetical action.
- the hypothetical action may represent a DL beam index selected by the RL model.
- the WTRU may be configured to determine and/or measure one or more performance metrics associated with the selected beams. For example, the WTRU may estimate the beamforming gain of the DL beam index.
- the performance metric may be the average RSRP and/or SINR associated with the selected DL beam indices measured over a configured period.
- the performance metric may be the average number of beam switches measured over a configured period.
- the performance metric may be the average number of beam failures measured over a configured period.
- the hypothetical action may represent sub-band indices and/or allocated power selected by the RL model.
- the WTRU may be configured to measure the SINR and/or SNR for the sub-band indices over a configured period.
- the hypothetical action may represent the MCS selected by the RL model, and the WTRU may compute the block error rate for the
- the hypothetical action may represent a handover request determined by the RL model and the WTRU may measure the RSRP from the neighboring cell.
- the WTRU may determine the quality of a hypothetical action, for example depending on the measured performance metric.
- the WTRU may determine the quality based on one or more of the following.
- the WTRU may determine the quality based on a comparison of performance metric associated with hypothetical actions with a preconfigured performance criterion.
- the WTRU may determine the quality by performing one of the following evaluations.
- the evaluations may include estimating a performance metric and comparing it to a threshold (e.g., estimated beamforming gain with another DL beam, or the number of beam switches over a configured period, or the average RSRP/SINR/SNR over a configured period).
- the evaluations may include measuring a performance metric and comparing it to a threshold (e.g., SINR/SNR/RSRP on one of the sub-bands, or RSRP from another cell).
- the evaluation may include computing a performance metric and comparing it to a threshold (e.g., computing the block error rate for the current SINR and MCS).
- the evaluations may include measuring an average RSRP and/or SINR over a configured period and comparing it with a configured metric threshold.
- the evaluations may include measuring an average number of beam switches and comparing it with a configured metric threshold.
- the evaluations may include measuring an average number of beam failures and comparing it with a configured metric threshold.
- the evaluations may include measuring an average RSRP over a period and comparing it with a threshold.
- the evaluations may include measuring a reward over a period and comparing it with the configured expected reward (e.g., average throughput, block error rate, etc.).
- the WTRU may determine the quality based on a comparison of hypothetical action(s) relative to reference action(s).
- the comparison may include comparing the performance metrics associated with the hypothetical action(s) and reference action(s).
- the WTRU may measure the performance of the hypothetical action and reference action over a configured period. For example, the WTRU may measure the number of the hypothetical DL beam switches over a configured period and compare it against the number of the reference DL beam switches over the same period.
- where the hypothetical action is an MCS selected by the RL model, the WTRU may re-encode the decoded bits for the previous reference action using the new MCS, decode under a new SINR model, and compare the block error rate.
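As an illustrative, non-limiting sketch, the two bases for determining quality described above (comparison with a configured threshold, and comparison with a reference action) might be expressed as follows. The function names, the sample values, and the assumption that a higher metric is better (as for RSRP, but not for a count of beam switches) are hypothetical and are not taken from the disclosure.

```python
from statistics import mean

def average_metric(samples):
    """Average a performance metric (e.g., RSRP in dBm) over one configured period."""
    return mean(samples)

def quality_vs_threshold(hyp_samples, threshold):
    """True if the hypothetical action's average metric meets the configured threshold.

    Assumes a metric for which higher is better.
    """
    return average_metric(hyp_samples) >= threshold

def quality_vs_reference(hyp_samples, ref_samples):
    """Signed difference between the hypothetical and reference averages."""
    return average_metric(hyp_samples) - average_metric(ref_samples)

# Hypothetical measurements collected over the same configured period.
hyp_rsrp = [-92.0, -90.0, -91.0]   # beam proposed by the RL model
ref_rsrp = [-95.0, -94.0, -96.0]   # beam actually applied (reference action)

meets_threshold = quality_vs_threshold(hyp_rsrp, threshold=-93.0)
gain_db = quality_vs_reference(hyp_rsrp, ref_rsrp)
```

For a metric where lower is better, such as the number of beam switches or beam failures, the comparison direction would be reversed.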
- the WTRU may be configured to report the quality information associated with the hypothetical actions for WTRU-sided RL-model validation.
- the quality information may be indicated in multiple formats.
- the formats may be configured, for example, in a DCI field or MAC CE.
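- a first format may be Boolean, for example, a binary bit indicating if the hypothetical action performance exceeds the reference action performance, or a binary bit indicating if the hypothetical action performance exceeds a configured performance threshold.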
- a second format may be multiple quantized quality levels, indicated in two or more bits.
- each quality level may represent a specific range of performance difference between the hypothetical action performance and the reference action performance.
- each quality level may represent a range of difference between the hypothetical action performance and the configured threshold.
- a third format may indicate the direct performance of the hypothetical action.
- the performance may be a quantized performance metric, for example, quantized RSRP/SINR/SNR, configured by the NW.
- the WTRU may be preconfigured with resources to indicate the validation report of the WTRU-sided RL model.
- the validation report may include the quality information of the hypothetical action.
- the WTRU may be configured with reserved symbols on PUSCH for transmission of validation report. For example, the reserved symbols may be configured on preconfigured time-frequency locations.
- the WTRU may be configured to report the quality information associated with the hypothetical actions in Uplink Control Information (UCI).
- the resources for the RL model validation report transmission may be preconfigured for a WTRU.
- the resources may be defined as PUSCH format or UCI format.
- the resources may be configured via RRC configuration. These resources may be activated and/or deactivated by MAC CE.
- the configured threshold or criteria may be met when the performance of the hypothetical action exceeds or is lower than the performance of the reference action.
- the WTRU may be configured to report the quality information based on a feedback resource selection, or feedback resource, or change thereof.
- the WTRU may be configured to report the quality information based on a performance of an associated function. For example, the WTRU may determine the rate of HARQ-ACK associated with the hypothetical actions. If the performance drops below a threshold, the WTRU may report the quality of the hypothetical action.
- WTRU procedures for RL offline policy validation are discussed herein.
- Configuration of WTRU-sided RL model evaluation is discussed herein.
- a WTRU equipped with an RL model, and capable of performing actions based on its WTRU-sided RL model, may report its capabilities specific to the supported RL model.
- the report may include the RL model identification (ID) and its associated information and configuration, for example, applicable conditions such as environment setup and constraints associated to the pre-trained RL model, configuration of state and/or action space, target policy, dedicated and/or supported features (e.g., beam management), and/or status of the model (e.g., validated, not validated, conditions under which the model was validated, restricted conditions for the model validation, etc.).
- a reporting omission configuration may specify, for example, that the WTRU omits reporting if the difference between the current report and a previous one for the same hypothetical action(s) is below a configured threshold.
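As a minimal sketch of the reporting omission rule, assuming the reported quality is a single numeric value and that the first report for a given hypothetical action is always sent, the check might look as follows. The function and parameter names are hypothetical.

```python
def should_omit_report(current, previous, omission_threshold):
    """Return True if the report may be omitted.

    current / previous: quality values for the same hypothetical action(s).
    previous is None when no earlier report exists (assumption: always report then).
    """
    if previous is None:
        return False
    return abs(current - previous) < omission_threshold

# Hypothetical values: a 0.1 dB change is suppressed, a 1.0 dB change is reported.
omit_small_change = should_omit_report(3.1, 3.0, omission_threshold=0.5)
omit_large_change = should_omit_report(4.0, 3.0, omission_threshold=0.5)
```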
- the WTRU may be configured to report the quality information based on a quality information configuration based on one or more preconfigured conditions or a quality information reporting and/or indication format.
- the one or more preconfigured conditions may include one or more thresholds/offsets for comparing reference and hypothetical action outcomes or a number of configured periods for validating the pre-configured conditions.
- the quality information reporting and/or indication format may include a Boolean format, which indicates whether the conditions associated with validation are satisfied, or a quantized format (e.g., multiple levels/scores of the quality of hypothetical action(s)).
- the WTRU may be configured to report the quality information based on one or more performance metric(s) for evaluating hypothetical actions. For example, the one or more performance metric(s) for evaluating hypothetical actions may be based on one or more of quantity-based (e.g., SINR/SNR/RSRP associated with the selected PMI by the RL model), reward-based (e.g., measuring a reward over a period and comparing it with the training reward), horizon (e.g., time period for reward measurement, or number of repeated actions and repetition rate(s), or expected long-term or short-term reward limit), processing-time based (e.g., required time for evaluating hypothetical actions with respect to reference actions), complexity-based (e.g., required number of FLOPs), or performance criteria and/or conditions for comparison with performance metric(s) of hypothetical actions (e.g., hybrid automatic repeat request (HARQ) acknowledgement (ACK) or SINR/RSRP/CQI (e.g., associated with the selected PMI by the RL model)).
- WTRU procedure for validating its RL model is discussed herein.
- WTRU determination of first indication and second indication for action generation is discussed herein.
- the WTRU may be configured to determine one or more parameters associated with a WTRU-sided RL model validation.
- the WTRU may determine a first indication and a second indication.
- the first indication may be based on non-AIML/legacy or a different RL model while the second indication may be based on the WTRU-sided RL model to be validated.
- the first indication may be associated with a reference action.
- the reference action may be applied by the WTRU for example, to perform a transmission using a configuration associated with the first indication.
- the second indication may be associated with a hypothetical action.
- the hypothetical action represents the output of the RL-model to be validated.
- the second indication may imply that the NW may provide feedback on the quality of the hypothetical action.
- the reference action may represent a first precoding matrix indicator (PMI) selection by the WTRU.
- the first PMI may be used for beamforming in the next transmission.
- the hypothetical action may represent a second PMI selection by the WTRU.
- the second selected PMI may be evaluated (e.g., only evaluated) by the NW.
- the reference action may represent a first MCS selected by the WTRU.
- the selected modulation and coding scheme (MCS) may be used in data transmission.
- the hypothetical action may represent a second PMI selected by the WTRU.
- the selected PMI may be evaluated by the NW.
- the WTRU may differentiate and/or determine the first indication and second indication from the DCI scheduling the transmission.
- the first indication and second indication may be determined from the contents of the DCI (e.g., explicit indication).
- the DCI may have a specific field that indicates whether the WTRU needs to determine a first indication or second indication or both.
- the first indication and second indication may be determined from a parameter in the DCI (e.g., implicit indication).
- the parameters of DCI may include one or more of radio network temporary identifier (RNTI) used to decode the DCI, CORESET or search-space of the PDCCH transmission including the DCI, aggregation level of the PDCCH/PSCCH
- the WTRU may receive a first transmission and a second transmission.
- the first transmission may be associated with the network response of the WTRU first indication.
- the second transmission may be associated with the network response to the WTRU second indication.
- the first transmission may represent data transmission using the first selected/indicated PMI, for example, associated with the reference action.
- the second transmission may represent a reference signal (RS) transmission using the second selected PMI (e.g., only RS transmission using the second selected PMI), for example, associated with the hypothetical action and/or the second transmission may represent feedback information about one or more performance aspects related to the second PMI.
- the second transmission may represent an interference metric associated with the second PMI, e.g., resulting interference on other WTRUs served by the NW.
- the two transmissions may happen over the same channel (e.g., PDSCH/PDCCH) or over two different channels (e.g., the first transmission occurs over PDSCH while the second transmission occurs over PDCCH).
- the WTRU may be configured to determine the quality of second transmission, for example, the transmission associated with the hypothetical action generated by the WTRU-sided RL model.
- the WTRU may be configured to measure a performance metric associated with the hypothetical action.
- the WTRU may measure HARQ ACKs and/or NACKs rate or block error rate (BLER) associated with the hypothetical action.
- the WTRU may perform one or more measurements associated with the hypothetical action.
- where the hypothetical action represents a PMI selected by the RL model and is applied on RS transmission by the NW, the measurements may include SINR and/or RSRP and/or CQI.
- the performance metric and/or measurements may be collected over a configured period.
- the WTRU may determine the quality of actions by performing a comparison between the measured and/or determined performance metric associated with the hypothetical action and a preconfigured performance threshold/criteria/condition. For example, the WTRU may determine the quality of the selected PMI by comparing the measured SINR/SNR/RSRP with a configured SINR/SNR/RSRP threshold. The WTRU may determine the quality of hypothetical actions by comparing the hypothetical reward with a configured measured reward by the NW. The hypothetical reward may be the expected
- the WTRU may be configured with a reward associated with the RL-based selected PMI.
- the reward may be a function of interference and expected reception quality associated with the selected PMI.
- the WTRU may determine the quality by comparing the configured reward with the hypothetical reward.
- the WTRU may be configured to report the quality information associated with the hypothetical actions for WTRU-sided RL-model validation.
- the quality information may be indicated in multiple formats.
- the formats may be configured, e.g., in a DCI field or MAC CE.
- a first format may be Boolean, for example, a binary bit indicating if the hypothetical action performance exceeds the reference action performance, or a binary bit indicating if the hypothetical action performance exceeds a configured performance threshold.
- a second format may be multiple quantized quality levels, indicated in two or more bits. Each quality level may represent a specific range of performance difference between the hypothetical action performance and the reference action performance.
- each quality level may represent a range of difference between the hypothetical action performance and the configured threshold.
- a third format may indicate the direct performance of the hypothetical action.
- the performance may be a quantized performance metric, e.g., quantized RSRP/SINR/SNR, configured by the NW.
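The three indication formats described above might be encoded as in the following non-limiting sketch. The uniform bin width `step`, the centering of the level range on zero difference, and the `floor_value` used for direct quantization are assumptions made for illustration only; the disclosure does not specify a mapping.

```python
import math

def encode_boolean(hyp_perf, baseline):
    """First format: one bit, 1 if the hypothetical performance exceeds the baseline.

    baseline may be the reference action performance or a configured threshold.
    """
    return 1 if hyp_perf > baseline else 0

def encode_level(hyp_perf, baseline, step, num_bits=2):
    """Second format: quantize (hyp_perf - baseline) into 2**num_bits levels.

    Assumes uniform bins of width `step`, with the lower half of the levels
    covering negative differences and the upper half covering non-negative ones.
    """
    levels = 2 ** num_bits
    index = math.floor((hyp_perf - baseline) / step) + levels // 2
    return max(0, min(levels - 1, index))

def encode_direct(value, floor_value, step, num_bits):
    """Third format: quantize the metric itself onto a uniform grid (hypothetical)."""
    levels = 2 ** num_bits
    index = math.floor((value - floor_value) / step)
    return max(0, min(levels - 1, index))

# Hypothetical average RSRP values in dBm.
bit = encode_boolean(-91.0, -95.0)
level = encode_level(-91.0, -95.0, step=2.0)
direct = encode_direct(-91.0, floor_value=-140.0, step=1.0, num_bits=7)
```

With two bits and a 2 dB step, a 4 dB improvement saturates at the highest level, while differences below -4 dB saturate at level 0.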
- the WTRU may be preconfigured with resources to indicate the validation report of the WTRU-sided RL model.
- the validation report may include the quality information of the hypothetical action.
- the WTRU may be configured with reserved symbols on PUSCH for transmission of validation report. For example, the reserved symbols may be configured on preconfigured time-frequency locations.
- the WTRU may be configured to report the quality information associated with the hypothetical actions in Uplink Control Information (UCI).
- the resources for the RL model validation report transmission may be preconfigured for a WTRU.
- the resources may be defined as PUSCH format or UCI format.
- the resources may be configured via RRC configuration. These resources may be activated and/or deactivated by MAC CE.
- the WTRU may be configured to report the quality information based on one or more of the following triggers.
- the WTRU may be configured to report the quality information based on a time-event trigger.
- the WTRU may be configured with time instances or specific slots to determine and report the quality information of the hypothetical actions.
- the WTRU may be configured to report the quality information of the hypothetical actions every N configured periods over which the performance metrics are calculated.
- the WTRU may be configured to report the quality information based on a measurement trigger.
- the WTRU may report the quality information associated with the hypothetical actions based on one or more measurements meeting a configured threshold or criteria.
- the configured threshold or criteria may be met when the performance of the hypothetical actions exceeds or is lower than a preconfigured threshold.
- the configured threshold or criteria may be met when the performance of the hypothetical action exceeds or is lower than the performance of the reference action.
- the WTRU may be configured to report the quality information based on a feedback resource selection, or feedback resource, or change thereof.
- the WTRU may be configured to report the quality information based on a performance of an associated function. For example, the WTRU may determine the rate of HARQ-ACK associated with the hypothetical actions. If the performance drops below a threshold, the WTRU may report the quality of the hypothetical action.
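As an illustrative sketch, two of the triggers listed above (a time-event trigger every N configured periods, and a trigger based on the HARQ-ACK rate of an associated function) might be evaluated as follows. The representation of HARQ feedback as a list of booleans and all names are assumptions, not part of the disclosure.

```python
def harq_ack_rate(feedback):
    """Fraction of ACKs in a list of HARQ outcomes (True = ACK, False = NACK).

    Returns None when no feedback was collected in the period.
    """
    if not feedback:
        return None
    return sum(feedback) / len(feedback)

def periodic_trigger(period_index, n):
    """Time-event trigger: report at the end of every n-th configured period.

    period_index is 1-based.
    """
    return period_index % n == 0

def performance_trigger(feedback, min_ack_rate):
    """Report when the HARQ-ACK rate drops below the configured threshold."""
    rate = harq_ack_rate(feedback)
    return rate is not None and rate < min_ack_rate

# Hypothetical HARQ outcomes associated with the hypothetical actions.
outcomes = [True, True, False, True]
rate = harq_ack_rate(outcomes)
report_now = performance_trigger(outcomes, min_ack_rate=0.9)
```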
- a WTRU may receive a configuration for validating the NW-sided RL model.
- the configuration may include a configuration of a
- the configuration may include criteria/rules/conditions for RL model validation.
- the criteria may be associated with determined quality of actions evaluated against one or more preconfigured performance threshold(s).
- the WTRU may determine a metric associated with hypothetical action.
- the WTRU may determine expected reward based on the metric and evaluate and/or report the quality based on absolute or relative threshold.
- the configuration may include a reporting configuration for RL model validation (e.g., resources for reporting, periodicity, model validation reporting format, etc.)
- the WTRU may determine the quality of hypothetical action(s), based on one or more of the following steps.
- the one or more steps may include measuring the performance metric (e.g., as a function of the use case) assuming hypothetical application of the action (e.g., an average RSRP and/or SINR over a configured period, an average number of beam switches in a configured period, a number of beam failures over a configured period with or without successful recovery, and/or an average throughput over a configured period).
- the WTRU may determine the quality of the hypothetical action(s) based on a comparison of performance metric associated with hypothetical action(s) with a preconfigured performance criteria and/or condition.
- the preconfigured performance criteria and/or condition may include measuring a quantity and comparing it with a threshold (e.g., SINR/SNR/RSRP on one of the sub-bands).
- the preconfigured performance criteria and/or condition may include measured average number of beam switches and comparing it with a configured metric threshold.
- the preconfigured performance criteria and/or condition may include measured average number of beam failures and comparing it with a configured metric threshold.
- the preconfigured performance criteria and/or condition may include
- the WTRU may report the determined quality of hypothetical action(s) (e.g., for NW-sided RL model validation). For example, reporting may be triggered when preconfigured conditions are satisfied. Reporting may be triggered if the determined quality of the hypothetical action(s) exceeds or is less than the preconfigured threshold, or if a performance metric associated with the hypothetical action(s) is better or worse, by an offset, than a performance metric associated with the reference action(s) over n configured periods. For example, the report may be transmitted periodically (e.g., every n configured periods over which the performance metrics are calculated). For example, the report may include one or more indication formats: Boolean, or multiple quantized levels for quality (e.g., represented in multiple levels depending on the difference between the measured metric and the configured metric threshold).
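The offset-based trigger over n configured periods might be sketched as follows. This sketch assumes that the condition must hold in each of the n most recent periods and that a higher metric is better; the disclosure does not state whether the periods must be consecutive, so that reading is an assumption, as are the names used.

```python
def offset_trigger(hyp_per_period, ref_per_period, offset, n):
    """True if the hypothetical metric beat the reference metric by at least
    `offset` in each of the last n configured periods.

    hyp_per_period / ref_per_period: per-period averages, oldest first.
    """
    if len(hyp_per_period) < n or len(ref_per_period) < n:
        return False  # not enough periods observed yet
    recent = zip(hyp_per_period[-n:], ref_per_period[-n:])
    return all(h - r >= offset for h, r in recent)

# Hypothetical per-period average RSRP values in dBm.
hyp = [-91.0, -90.0, -92.0]
ref = [-95.0, -94.0, -95.0]
trigger_report = offset_trigger(hyp, ref, offset=3.0, n=3)
```

The "offset worse" case could be handled by the same function with the roles of the two series swapped.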
- FIG. 3 illustrates a flow diagram 300 comprising an NW-sided RL model 302 used for beam selection with the aim of reducing latency and/or measurement overhead.
- the state space 304 may be defined with RS or CSI measurements while the action space 306 may include one or more selected beam indices.
- the goal of the RL model is to select the beams that maximize the average RSRP while minimizing the number of beam switches over a configured period.
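One way to express this goal as a scalar reward is a weighted combination of the average RSRP and the number of beam switches over the configured period. The weighted-sum form and the `switch_penalty_db` weight are assumptions made for illustration; the disclosure states only the goal and does not specify a reward function.

```python
def count_beam_switches(beam_indices):
    """Number of times the selected beam index changes within the period."""
    return sum(1 for a, b in zip(beam_indices, beam_indices[1:]) if a != b)

def beam_reward(rsrp_samples_dbm, beam_indices, switch_penalty_db):
    """Hypothetical reward: average RSRP minus a penalty per beam switch."""
    avg_rsrp = sum(rsrp_samples_dbm) / len(rsrp_samples_dbm)
    return avg_rsrp - switch_penalty_db * count_beam_switches(beam_indices)

# Hypothetical period: four measurement occasions, one beam switch (3 -> 5).
reward = beam_reward(
    rsrp_samples_dbm=[-90.0, -92.0, -91.0, -91.0],
    beam_indices=[3, 3, 5, 5],
    switch_penalty_db=0.5,
)
```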
- the WTRU may receive a configuration for RL model validation.
- the WTRU may receive a resource configuration for reporting RL model validation (e.g., resources for reporting, periodicity, model validation reporting format, etc.).
- the WTRU may determine a first indication and a second indication.
- the WTRU may determine a first indication based on non-AIML/legacy methods or a different RL model. The first indication may be associated with a reference action.
- the WTRU may
- the second indication may be associated with a hypothetical action, for example, indicating that the network provides feedback on the quality of the hypothetical action (e.g., the PMI selected by the RL model to satisfy an average overhead requirement while achieving a target performance).
- the WTRU may transmit the first and second indication.
- the WTRU may differentiate the first indication and second indication based on explicit information or implicit means (e.g., a separate field in DCI, MAC CE, order of indication, different resources/channels/messages, etc.).
- the WTRU may receive a first transmission based on the first indication and a second transmission based on the second indication.
- the WTRU may determine the quality of the second transmission by comparing a measured quantity with a threshold (e.g., SINR/SNR/RSRP associated with the selected PMI by the RL model). For example, the WTRU may determine the quality of the second transmission by measuring a reward over a period and comparing it with the training reward. The WTRU may determine the quality of the second transmission by an improvement and/or degradation of hypothetical action(s) relative to reference action(s) (e.g., comparing the performance metrics, for example, RSRP and/or SINR, associated with the hypothetical action(s) and reference action(s)). The WTRU may determine the quality of the second transmission based on NW feedback associated with the hypothetical action, for example, a multiuser interference metric associated with the PMI selected by the RL model.
- the WTRU may transmit the quality information associated with hypothetical action(s) for WTRU-sided RL model validation, for example, when preconfigured conditions are satisfied. For example, the WTRU may transmit the quality information if the quality of the hypothetical action(s) exceeds or is lower than the preconfigured threshold, or is better or worse, by an offset, than that of the reference action(s) over n configured periods. The WTRU may transmit the quality information periodically (e.g., every n configured periods over which the performance metrics are calculated).
- the format of the quality information indication may be Boolean,
- the format of the quality information indication may include multiple quantized levels for quality.
- the multiple quantized levels for quality may be represented in multiple levels depending on the difference between the measured metric and the configured metric threshold.
- the WTRU may receive the indication from gNB on RL model validation, for example, for MU transmissions.
- FIG. 4 illustrates a flow diagram 400 comprising WTRU-sided RL model 402 used for PMI selection with the aim of reducing the average overhead while achieving a target performance.
- the state space 404 may be defined with power, interference, and CSI measurements while the action space 406 may include one or more selected PMI.
- the goal of the RL model may be to select the PMI that maximizes the RSRP under average overhead constraints.
Abstract
A wireless transmit/receive unit (WTRU) may be configured to receive a first indication of a first configuration to be implemented for performing a first action and receive a second indication of a second configuration to be implemented for performing a second action, perform the first action using the first configuration and determine a first outcome based on a first performance metric associated with the first action. The WTRU may perform a validation of the NW-side RL model by determining a second outcome based on a second performance metric associated with performance of the second action without actually implementing the second configuration to perform the second action, and determining a quality of the second action compared to a quality of the first action based on the first outcome and the second outcome. The WTRU may report the determined quality of the second action for validation of the anticipated action.
Description
METHODS FOR OFFLINE POLICY VALIDATION IN REINFORCEMENT LEARNING
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of United States Provisional Application No. 63/572,521 filed on April 1, 2024, the entire contents of which are incorporated herein by reference.
BACKGROUND
[0002] Reinforcement learning (RL), along with supervised and unsupervised learning, forms one of the three key learning paradigms in the field of artificial intelligence/machine learning (AI/ML). While the three methods seek to learn a model from data through training, they are fundamentally different in the way the learning process is carried out. For instance, supervised learning algorithms learn patterns and relationships between the input and output pairs, and then the trained algorithm is used to predict outcomes based on new input data. Unsupervised learning algorithms, however, receive inputs with no specified outputs during the training process, with the aim of finding hidden patterns and relationships within the data using statistical means. Different from supervised and unsupervised learning, RL has a predetermined and well-defined end goal in the form of a desired result, which can be achieved through rewarding the desired behaviors. RL takes an exploratory approach, with a reward-and-punishment paradigm as the data is processed, wherein the explorations are continuously validated and improved to increase the probability of reaching the end goal. RL is expected to be one of the major components of automation of wireless networks, and it has already found a wide variety of applications in the wireless domain.
SUMMARY
[0003] A wireless transmit/receive unit (WTRU) may include a processor and a memory. The WTRU may be configured to receive a first indication of a first configuration to be implemented for performing a first action related to communications on a network. The WTRU may be configured to receive, from the network, a second indication of a second configuration to be implemented for performing a second action related to communications on the network. The second action may be an anticipated or hypothetical action associated with a network-side (NW-side) reinforcement learning (RL) model. The WTRU may perform the first action related to the communications on the network using the first configuration and determine a first outcome based on a first performance metric associated with the first action. The WTRU may perform a validation of the NW-side RL model by being configured to determine a second outcome based on a second performance metric associated with performance of the second action without actually implementing the second configuration to perform the second action related to the communications on the network, and determine a quality of the second action compared to a quality of the first action based on the first outcome and the second outcome. The WTRU may report the determined quality of the second action for validation of the anticipated or hypothetical action associated with the NW-side RL model.
[0004] The first action may be a reference action which includes one or more of applying a configuration associated with the first indication, performing a measurement or a reception using a first configuration determined by the first indication, triggering or performing a transmission using a second configuration determined by the first indication, suspending an ongoing or future transmission, entering a power saving mode, and/or setting, resetting, or modifying one or more of protocol parameters, timers, and/or counters using a third configuration determined by the first indication.
[0005] The first configuration and the second configuration may each include a different downlink (DL) beam configuration, different power saving modes, different protocol parameters, and/or different timers.
[0006] The first configuration may be a first downlink (DL) beam configuration and the second configuration may be a second DL beam configuration. The first action may include the WTRU switching to the first DL beam to receive data transmissions. The second action may include the WTRU switching to the second DL beam provided by the NW-side RL model to receive the data transmissions. The second performance metric may include a measurement on the second DL beam.
[0007] The first configuration may be a first downlink control information (DCI) configuration indicating a first modulation and coding scheme (MCS). The second configuration may be a second DCI configuration indicating a second MCS. The first action may include the WTRU applying the first DCI configuration to receive DL transmissions. The second action may include the WTRU switching to apply the second DCI configuration to receive the DL transmissions. The second performance metric may include a measurement on the second DCI configuration.
[0008] The first configuration may be a first sub-band configuration. The second configuration may be a second sub-band configuration. The first action may include the WTRU applying the first sub-band configuration to receive DL transmissions. The second action may include the WTRU switching to apply the second sub-band configuration to receive the DL transmissions. The second performance metric may include a measurement on the second sub-band configuration.
[0009] The first configuration may be a first power level configuration for a first DL transmission. The second configuration may be a second power level configuration for a second DL transmission. The first action may include the WTRU applying the first power level configuration to receive DL transmissions. The second action may include the WTRU switching to apply the second power level configuration to receive the DL transmissions. The second performance metric may include a measurement on the second power level configuration.
[0010] The first performance metric and the second performance metric may be determined based on one or more preconfigured criteria. Each of the one or more preconfigured criteria may include a hybrid automatic repeat request (HARQ) acknowledgement (ACK), an average reference signal received power (RSRP), a signal-to-interference-plus-noise ratio (SINR), and/or a channel quality indicator (CQI). Each of the one or more preconfigured criteria may include an average number of beam switches.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1A is a system diagram illustrating an example communications system in which one or more disclosed embodiments may be implemented.
[0012] FIG. 1B is a system diagram illustrating an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
[0013] FIG. 1C is a system diagram illustrating an example radio access network (RAN) and an example core network (CN) that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
[0014] FIG. 1D is a system diagram illustrating a further example RAN and a further example CN that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
[0015] FIG. 2 illustrates an example of a reinforcement learning framework.
[0016] FIG. 3 illustrates a network-sided (NW-sided) reinforcement learning (RL) model used for beam selection with the aim of reducing latency and/or measurement overhead.
[0017] FIG. 4 illustrates a WTRU-sided RL model used for PMI selection with the aim of reducing the average overhead while achieving a target performance.
DETAILED DESCRIPTION
[0018] FIG. 1A is a diagram illustrating an example communications system 100 in which one or more disclosed embodiments may be implemented. The communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users. The communications system 100 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth. For example, the communications system 100 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), zero-tail unique-word DFT-Spread OFDM (ZT UW DTS-s OFDM), unique word OFDM (UW-OFDM), resource block-filtered OFDM, filter bank multicarrier (FBMC), and the like.
[0019] As shown in FIG. 1A, the communications system 100 may include wireless transmit/receive units (WTRUs) 102a, 102b, 102c, 102d, a RAN 104/113, a CN 106/115, a public switched telephone network (PSTN) 108, the Internet 110, and other networks 112, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements. Each of the WTRUs 102a, 102b, 102c, 102d may be any type of device configured to operate and/or communicate in a wireless environment. By way of example, the WTRUs 102a, 102b, 102c, 102d, any of which may be referred to as a “station” and/or a “STA”, may be configured to transmit and/or receive wireless signals and may include a user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a subscription-based unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, a hotspot or Mi-Fi device, an Internet of Things (IoT) device, a watch or other wearable, a head-mounted display (HMD), a vehicle, a drone, a medical device and applications (e.g., remote surgery), an industrial device and applications (e.g., a robot and/or other wireless devices operating in an industrial and/or an automated processing chain contexts), a consumer electronics device, a device operating on commercial and/or industrial wireless networks, and the like. Any of the WTRUs 102a, 102b, 102c and 102d may be interchangeably referred to as a WTRU. Further, any description herein that is described with reference to a UE may be equally applicable to a WTRU (or vice versa). For example, a WTRU may be configured to perform any of the processes or procedures described herein as being performed by a UE (or vice versa).
[0020] The communications system 100 may also include a base station 114a and/or a base station 114b. Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the CN 106/115, the Internet 110, and/or the other networks 112. By way of example, the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a gNB, a NR NodeB, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.
[0021] The base station 114a may be part of the RAN 104/113, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc. The base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals on one or more carrier frequencies, which may be referred to as a cell (not shown). These frequencies may be in licensed spectrum, unlicensed spectrum, or a combination of licensed and unlicensed spectrum. A cell may provide coverage for a wireless service to a specific geographical area that may be relatively fixed or that may change over time. The cell may further be divided into cell sectors. For example, the cell associated with the base station 114a may be divided into three sectors. Thus, in one embodiment, the base station 114a may include three transceivers, i.e., one for each sector of the cell. In an embodiment, the base station 114a may employ multiple-input multiple-output (MIMO) technology and may utilize multiple transceivers for each sector of the cell. For example, beamforming may be used to transmit and/or receive signals in desired spatial directions.
[0022] The base stations 114a, 114b may communicate with one or more of the WTRUs 102a, 102b, 102c, 102d over an air interface 116, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, centimeter wave, micrometer wave, infrared (IR), ultraviolet (UV), visible light, etc.). The air interface 116 may be established using any suitable radio access technology (RAT).
[0023] More specifically, as noted above, the communications system 100 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 114a in the RAN 104/113 and the WTRUs 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink (DL) Packet Access (HSDPA) and/or High-Speed UL Packet Access (HSUPA).
[0024] In an embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 116 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A) and/or LTE-Advanced Pro (LTE-A Pro).
[0025] In an embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as NR Radio Access, which may establish the air interface 116 using New Radio (NR).
[0026] In an embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement multiple radio access technologies. For example, the base station 114a and the WTRUs 102a, 102b, 102c may implement LTE radio access and NR radio access together, for instance using dual connectivity (DC) principles. Thus, the air interface utilized by WTRUs 102a, 102b, 102c may be characterized by multiple types of radio access technologies and/or transmissions sent to/from multiple types of base stations (e.g., an eNB and a gNB).
[0027] In other embodiments, the base station 114a and the WTRUs 102a, 102b, 102c may implement radio technologies such as IEEE 802.11 (i.e., Wireless Fidelity (WiFi)), IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
[0028] The base station 114b in FIG. 1A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, an industrial facility, an air corridor (e.g., for use by drones), a roadway, and the like. In one embodiment, the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN). In an embodiment, the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN). In yet another embodiment, the base station 114b and the WTRUs 102c, 102d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, LTE-A Pro, NR, etc.) to establish a picocell or femtocell. As shown in FIG. 1A, the base station 114b may have a direct connection to the Internet 110. Thus, the base station 114b may not be required to access the Internet 110 via the CN 106/115.
[0029] The RAN 104/113 may be in communication with the CN 106/115, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 102a, 102b, 102c, 102d. The data may have varying quality of service (QoS) requirements, such as differing throughput requirements, latency requirements, error tolerance requirements, reliability requirements, data throughput requirements, mobility requirements, and the like.
The CN 106/115 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication. Although not shown in FIG. 1A, it will be appreciated that the RAN 104/113 and/or the CN 106/115 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 104/113 or a different RAT. For example, in addition to being connected to the RAN 104/113, which may be utilizing a NR radio technology, the CN 106/115 may also be in communication with another RAN (not shown) employing a GSM, UMTS, CDMA 2000, WiMAX, E-UTRA, or WiFi radio technology.
[0030] The CN 106/115 may also serve as a gateway for the WTRUs 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or the other networks 112. The PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS). The Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and/or the internet protocol (IP) in the TCP/IP internet protocol suite. The networks 112 may include wired and/or wireless communications networks owned and/or operated by other service providers. For example, the networks 112 may include another CN connected to one or more RANs, which may employ the same RAT as the RAN 104/113 or a different RAT.
[0031] Some or all of the WTRUs 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities (e.g., the WTRUs 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wireless networks over different wireless links). For example, the WTRU 102c shown in FIG. 1A may be configured to communicate with the base station 114a, which may employ a cellular-based radio technology, and with the base station 114b, which may employ an IEEE 802 radio technology.
[0032] FIG. 1B is a system diagram illustrating an example WTRU 102. As shown in FIG. 1B, the WTRU 102 may include a processor 118, a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, non-removable memory 130, removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and/or other peripherals 138, among others. It will be appreciated that the WTRU 102 may include any sub-combination of the foregoing elements while remaining consistent with an embodiment.
[0033] The processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment. The processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 1B depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.
[0034] The transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 116. For example, in one embodiment, the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals. In an embodiment, the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. In yet another embodiment, the transmit/receive element 122 may be configured to transmit and/or receive both RF and light signals. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wireless signals.
[0035] Although the transmit/receive element 122 is depicted in FIG. 1B as a single element, the WTRU 102 may include any number of transmit/receive elements 122. More specifically, the WTRU 102 may employ MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 116.
[0036] The transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122. As noted above, the WTRU 102 may have multi-mode capabilities. Thus, the transceiver 120 may include multiple transceivers for enabling the WTRU 102 to communicate via multiple RATs, such as NR and IEEE 802.11, for example.
[0037] The processor 118 of the WTRU 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128. In addition, the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132. The non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 118 may access information from, and store data in, memory that is not physically located on the WTRU 102, such as on a server or a home computer (not shown).
[0038] The processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the WTRU 102. The power source 134 may be any suitable device for powering the WTRU 102. For example, the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
[0039] The processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 102. In addition to, or in lieu of, the information from the GPS chipset 136, the WTRU 102 may receive location information over the air interface 116 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
[0040] The processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands-free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, a Virtual Reality and/or Augmented Reality (VR/AR) device, an activity tracker, and the like. The peripherals 138 may include one or more sensors; the sensors may be one or more of a gyroscope, an accelerometer, a hall effect sensor, a magnetometer, an orientation sensor, a proximity sensor, a temperature sensor, a time sensor, a geolocation sensor, an altimeter, a light sensor, a touch sensor, a barometer, a gesture sensor, a biometric sensor, and/or a humidity sensor.
[0041] The WTRU 102 may include a full duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for both the UL (e.g., for transmission) and downlink (e.g., for reception)) may be concurrent and/or simultaneous. The full duplex radio may include an interference management unit 139 to reduce and/or substantially eliminate self-interference via either hardware (e.g., a choke) or signal processing via a processor (e.g., a separate processor (not shown) or via processor 118). In an embodiment, the WTRU 102 may include a half-duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for either the UL (e.g., for transmission) or the downlink (e.g., for reception)).
[0042] FIG. 1C is a system diagram illustrating the RAN 104 and the CN 106 according to an embodiment. As noted above, the RAN 104 may employ an E-UTRA radio technology to communicate with the WTRUs 102a, 102b, 102c over the air interface 116. The RAN 104 may also be in communication with the CN 106.
[0043] The RAN 104 may include eNode-Bs 160a, 160b, 160c, though it will be appreciated that the RAN 104 may include any number of eNode-Bs while remaining consistent with an embodiment. The eNode-Bs 160a, 160b, 160c may each include one or more transceivers for communicating with the WTRUs 102a, 102b, 102c over the air interface 116. In one embodiment, the eNode-Bs 160a, 160b, 160c may implement MIMO technology. Thus, the eNode-B 160a, for example, may use multiple antennas to transmit wireless signals to, and/or receive wireless signals from, the WTRU 102a.
[0044] Each of the eNode-Bs 160a, 160b, 160c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the UL and/or DL, and the like. As shown in FIG. 1C, the eNode-Bs 160a, 160b, 160c may communicate with one another over an X2 interface.
[0045] The CN 106 shown in FIG. 1C may include a mobility management entity (MME) 162, a serving gateway (SGW) 164, and a packet data network (PDN) gateway (or PGW) 166. While each of the foregoing elements is depicted as part of the CN 106, it will be appreciated that any of these elements may be owned and/or operated by an entity other than the CN operator.
[0046] The MME 162 may be connected to each of the eNode-Bs 160a, 160b, 160c in the RAN 104 via an S1 interface and may serve as a control node. For example, the MME 162 may be responsible for authenticating users of the WTRUs 102a, 102b, 102c, bearer activation/deactivation, selecting a particular serving gateway during an initial attach of the WTRUs 102a, 102b, 102c, and the like. The MME 162 may provide a control plane function for switching between the RAN 104 and other RANs (not shown) that employ other radio technologies, such as GSM and/or WCDMA.
[0047] The SGW 164 may be connected to each of the eNode-Bs 160a, 160b, 160c in the RAN 104 via the S1 interface. The SGW 164 may generally route and forward user data packets to/from the WTRUs 102a, 102b, 102c. The SGW 164 may perform other functions, such as anchoring user planes during inter-eNode B handovers, triggering paging when DL data is available for the WTRUs 102a, 102b, 102c, managing and storing contexts of the WTRUs 102a, 102b, 102c, and the like.
[0048] The SGW 164 may be connected to the PGW 166, which may provide the WTRUs 102a, 102b, 102c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102a, 102b, 102c and IP-enabled devices.
[0049] The CN 106 may facilitate communications with other networks. For example, the CN 106 may provide the WTRUs 102a, 102b, 102c with access to circuit-switched networks, such as the PSTN 108, to facilitate communications between the WTRUs 102a, 102b, 102c and traditional land-line communications devices. For example, the CN 106 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the CN 106 and the PSTN 108. In addition, the CN 106 may provide the WTRUs 102a, 102b, 102c with access to the other networks 112, which may include other wired and/or wireless networks that are owned and/or operated by other service providers.
[0050] Although the WTRU is described in FIGS. 1A-1D as a wireless terminal, it is contemplated that in certain representative embodiments such a terminal may use (e.g., temporarily or permanently) wired communication interfaces with the communication network.
[0051] In representative embodiments, the other network 112 may be a WLAN.
[0052] A WLAN in Infrastructure Basic Service Set (BSS) mode may have an Access Point (AP) for the BSS and one or more stations (STAs) associated with the AP. The AP may have an access or an interface to a Distribution System (DS) or another type of wired/wireless network that carries traffic into and/or out of the BSS. Traffic to STAs that originates from outside the BSS may arrive through the AP and may be delivered to the STAs. Traffic originating from STAs to destinations outside the BSS may be sent to the AP to be delivered to respective destinations. Traffic between STAs within the BSS may be sent through the AP, for example, where the source STA may send traffic to the AP and the AP may deliver the traffic to the destination STA. The traffic between STAs within a BSS may be considered and/or referred to as peer-to-peer traffic. The peer-to-peer traffic may be sent between (e.g., directly between) the source and destination STAs with a direct link setup (DLS). In certain representative embodiments, the DLS may use an 802.11e DLS or an 802.11z tunneled DLS (TDLS). A WLAN using an Independent BSS (IBSS) mode may not have an AP, and the STAs (e.g., all of the STAs) within or using the IBSS may communicate directly with each other. The IBSS mode of communication may sometimes be referred to herein as an “ad-hoc” mode of communication.
[0053] When using the 802.11ac infrastructure mode of operation or a similar mode of operations, the AP may transmit a beacon on a fixed channel, such as a primary channel. The primary channel may be a fixed width (e.g., 20 MHz wide bandwidth) or a dynamically set width via signaling. The primary channel may be the operating channel of the BSS and may be used by the STAs to establish a connection with the AP. In certain representative embodiments, Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) may be implemented, for example in 802.11 systems. For CSMA/CA, the STAs (e.g., every STA), including the AP, may sense the primary channel. If the primary channel is sensed/detected and/or determined to be busy by a particular STA, the particular STA may back off. One STA (e.g., only one station) may transmit at any given time in a given BSS.
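By way of illustration only, the sense-then-back-off behavior described above may be sketched as follows. The function name `csma_ca_attempt`, the random slot-count back-off, and the 15-slot upper bound are hypothetical choices made for this sketch and are not taken from the disclosure; a real 802.11 station applies further rules (e.g., inter-frame spacing and contention-window growth) that are omitted here.

```python
import random


def csma_ca_attempt(primary_channel_busy, rng, max_backoff_slots=15):
    """Decide what a single STA does after sensing the primary channel.

    Returns ("transmit", 0) when the channel is sensed idle, or
    ("backoff", n) with a randomly drawn number of slots n when it is busy.
    The 15-slot bound is an illustrative assumption.
    """
    if primary_channel_busy:
        # Channel sensed busy: defer and wait a random number of slots.
        return ("backoff", rng.randint(1, max_backoff_slots))
    # Channel sensed idle: this STA may transmit.
    return ("transmit", 0)


rng = random.Random(0)  # seeded so the sketch is repeatable
idle_result = csma_ca_attempt(False, rng)
busy_action, busy_slots = csma_ca_attempt(True, rng)
```

Drawing the back-off at random makes it less likely that two STAs that sensed the same busy channel retry in the same slot, which is consistent with one STA transmitting at any given time in a given BSS.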
[0054] High Throughput (HT) STAs may use a 40 MHz wide channel for communication, for example, via a combination of the primary 20 MHz channel with an adjacent or nonadjacent 20 MHz channel to form a 40 MHz wide channel.
[0055] Very High Throughput (VHT) STAs may support 20 MHz, 40 MHz, 80 MHz, and/or 160 MHz wide channels. The 40 MHz and/or 80 MHz channels may be formed by combining contiguous 20 MHz channels. A 160 MHz channel may be formed by combining 8 contiguous 20 MHz channels, or by combining two non-contiguous 80 MHz channels, which may be referred to as an 80+80 configuration. For the 80+80 configuration, the data, after channel encoding, may be passed through a segment parser that may divide the data into two streams. Inverse Fast Fourier Transform (IFFT) processing, and time domain processing, may be done on each stream separately. The streams may be mapped on to the two 80 MHz channels, and the data may be transmitted by a transmitting STA. At the receiver of the receiving STA, the above described operation for the 80+80 configuration may be reversed, and the combined data may be sent to the Medium Access Control (MAC).
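As a non-limiting sketch of the 80+80 flow described above, the following shows coded data being divided into two streams at the transmitter and recombined at the receiver. The bit-by-bit alternation and the names `segment_parse` and `segment_deparse` are assumptions made for illustration; an actual segment parser may group bits differently, and the IFFT and time domain processing applied to each stream are omitted.

```python
def segment_parse(coded_bits):
    """Split coded bits into two streams, one per 80 MHz segment.

    Simplifying assumption: bits are dealt out alternately, one at a time.
    """
    return coded_bits[0::2], coded_bits[1::2]


def segment_deparse(stream0, stream1):
    """Reverse segment_parse at the receiver by re-interleaving the streams."""
    combined = []
    for i in range(len(stream0) + len(stream1)):
        source = stream0 if i % 2 == 0 else stream1
        combined.append(source[i // 2])
    return combined


# Channel-width arithmetic from the paragraph above.
contiguous_160_mhz = 8 * 20     # eight contiguous 20 MHz channels
non_contiguous_160_mhz = 2 * 80  # the 80+80 configuration

coded = [1, 0, 1, 1, 0, 0, 1]
s0, s1 = segment_parse(coded)
recovered = segment_deparse(s0, s1)
```

The round trip returns the original sequence, which mirrors the statement that the receiver reverses the transmitter-side operation before passing the combined data to the MAC.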
[0056] Sub 1 GHz modes of operation are supported by 802.11af and 802.11ah. The channel operating bandwidths, and carriers, are reduced in 802.11af and 802.11ah relative to those used in 802.11n, and 802.11ac. 802.11af supports 5 MHz, 10 MHz and 20 MHz bandwidths in the TV White Space (TVWS) spectrum, and 802.11ah supports 1 MHz, 2 MHz, 4 MHz, 8 MHz, and 16 MHz bandwidths using non-TVWS spectrum. According to a representative embodiment, 802.11ah may support Meter Type Control/Machine-Type Communications, such as MTC devices in a macro coverage area. MTC devices may have certain capabilities, for example, limited capabilities including support for (e.g., only support for) certain and/or limited bandwidths. The MTC devices may include a battery with a battery life above a threshold (e.g., to maintain a very long battery life).
[0057] WLAN systems, which may support multiple channels, and channel bandwidths, such as 802.11n, 802.11ac, 802.11af, and 802.11ah, include a channel which may be designated as the primary channel. The primary channel may have a bandwidth equal to the largest common operating bandwidth supported by all STAs in the BSS. The bandwidth of the primary channel may be set and/or limited by a STA, from among all STAs operating in a BSS, which supports the smallest bandwidth operating mode. In the example of 802.11ah, the primary channel may be 1 MHz wide for STAs (e.g., MTC type devices) that support (e.g., only support) a 1 MHz mode, even if the AP and other STAs in the BSS support 2 MHz, 4 MHz, 8 MHz, 16 MHz, and/or other channel bandwidth operating modes. Carrier sensing and/or Network Allocation Vector (NAV) settings may depend on the status of the primary channel. If the primary channel is busy, for example, due to a STA (which supports only a 1 MHz operating mode) transmitting to the AP, the entire available frequency bands may be considered busy even though a majority of the frequency bands remains idle and may be available.
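For illustration, the selection of the primary channel bandwidth described above may be sketched as taking the largest bandwidth that every STA in the BSS supports. The function name `primary_channel_width_mhz` and the representation of each STA as a set of supported bandwidths in MHz are hypothetical and are used only for this sketch.

```python
def primary_channel_width_mhz(supported_widths_per_sta):
    """Return the largest operating bandwidth common to all STAs in a BSS.

    supported_widths_per_sta: one set of supported bandwidths (in MHz) per
    STA, with the AP included as one of the entries.
    """
    if not supported_widths_per_sta:
        raise ValueError("the BSS must contain at least one STA")
    # Bandwidths that every STA supports.
    common = set.intersection(*supported_widths_per_sta)
    if not common:
        raise ValueError("the STAs share no common operating bandwidth")
    return max(common)


# 802.11ah-style example: an AP, a regular STA, and a 1 MHz-only MTC device.
ap = {1, 2, 4, 8, 16}
regular_sta = {1, 2, 4}
mtc_device = {1}

with_mtc = primary_channel_width_mhz([ap, regular_sta, mtc_device])
without_mtc = primary_channel_width_mhz([ap, regular_sta])
```

In this example the single 1 MHz-only device limits the primary channel to 1 MHz, whereas the same BSS without that device could use a 4 MHz primary channel.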
[0058] In the United States, the available frequency bands, which may be used by 802.11ah, are from 902 MHz to 928 MHz. In Korea, the available frequency bands are from 917.5 MHz to 923.5 MHz. In Japan, the available frequency bands are from 916.5 MHz to 927.5 MHz. The total bandwidth available for 802.11ah is 6 MHz to 26 MHz depending on the country code.
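The 6 MHz to 26 MHz range stated above follows directly from the band edges listed for each country, as the short calculation below shows. The dictionary layout and the name `BANDS_MHZ` are illustrative only; the band edges are those given in the preceding paragraph.

```python
# Band edges in MHz, as listed in the preceding paragraph.
BANDS_MHZ = {
    "United States": (902.0, 928.0),
    "Korea": (917.5, 923.5),
    "Japan": (916.5, 927.5),
}


def total_bandwidth_mhz(band):
    """Width of a band given as a (lower edge, upper edge) pair in MHz."""
    low, high = band
    return high - low


widths = {country: total_bandwidth_mhz(band) for country, band in BANDS_MHZ.items()}
narrowest = min(widths.values())
widest = max(widths.values())
```

The resulting widths are 26 MHz for the United States, 6 MHz for Korea, and 11 MHz for Japan, so the narrowest and widest allocations match the stated 6 MHz and 26 MHz bounds.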
[0059] FIG. 1D is a system diagram illustrating the RAN 113 and the CN 115 according to an embodiment. As noted above, the RAN 113 may employ an NR radio technology to communicate with the WTRUs 102a, 102b, 102c over the air interface 116. The RAN 113 may also be in communication with the CN 115.
[0060] The RAN 113 may include gNBs 180a, 180b, 180c, though it will be appreciated that the RAN 113 may include any number of gNBs while remaining consistent with an embodiment. The gNBs 180a, 180b, 180c may each include one or more transceivers for communicating with the WTRUs 102a, 102b, 102c over the air interface 116. In one embodiment, the gNBs 180a, 180b, 180c may implement MIMO technology. For example, gNBs 180a, 180b may utilize beamforming to transmit signals to and/or receive signals from the gNBs 180a, 180b, 180c. Thus, the gNB 180a, for example, may use multiple antennas to transmit wireless signals to, and/or receive wireless signals from, the WTRU 102a. In an embodiment, the gNBs 180a, 180b, 180c may implement carrier aggregation technology. For example, the gNB 180a may transmit multiple component carriers to the WTRU 102a (not shown). A subset of these component carriers may be on unlicensed spectrum while the remaining component carriers may be on licensed spectrum. In an embodiment, the gNBs 180a, 180b, 180c may implement Coordinated Multi-Point (CoMP) technology. For example, WTRU 102a may receive coordinated transmissions from gNB 180a and gNB 180b (and/or gNB 180c).
[0061] The WTRUs 102a, 102b, 102c may communicate with gNBs 180a, 180b, 180c using transmissions associated with a scalable numerology. For example, the OFDM symbol spacing and/or OFDM subcarrier spacing may vary for different transmissions, different cells, and/or different portions of the wireless transmission spectrum. The WTRUs 102a, 102b, 102c may communicate with gNBs 180a, 180b, 180c using subframe or transmission time intervals (TTIs) of various or scalable lengths (e.g., containing varying number of OFDM symbols and/or lasting varying lengths of absolute time).
[0062] The gNBs 180a, 180b, 180c may be configured to communicate with the WTRUs 102a, 102b, 102c in a standalone configuration and/or a non-standalone configuration. In the standalone configuration, WTRUs 102a, 102b, 102c may communicate with gNBs 180a, 180b, 180c without also accessing other RANs (e.g., such as eNode-Bs 160a, 160b, 160c). In the standalone configuration, WTRUs 102a, 102b, 102c may utilize one or more of gNBs 180a, 180b, 180c as a mobility anchor point. In the standalone configuration, WTRUs 102a, 102b, 102c may communicate with gNBs 180a, 180b, 180c using signals in an unlicensed band. In a non-standalone configuration WTRUs 102a, 102b, 102c may communicate with/connect to gNBs 180a, 180b, 180c while also communicating with/connecting to another RAN such as eNode-Bs 160a, 160b, 160c. For example, WTRUs 102a, 102b, 102c may implement DC principles to communicate with one or more gNBs 180a, 180b, 180c and one or more eNode-Bs 160a, 160b, 160c substantially simultaneously. In the non-standalone configuration, eNode-Bs 160a, 160b, 160c may serve as a mobility anchor for WTRUs 102a, 102b, 102c and gNBs 180a, 180b, 180c may provide additional coverage and/or throughput for servicing WTRUs 102a, 102b, 102c.
[0063] Each of the gNBs 180a, 180b, 180c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the UL and/or DL, support of network slicing, dual connectivity, interworking between NR and E-UTRA, routing of user plane data towards User Plane Function (UPF) 184a, 184b, routing of control plane information towards Access and Mobility Management Function (AMF) 182a, 182b and the like. As shown in FIG. 1D, the gNBs 180a, 180b, 180c may communicate with one another over an Xn interface.
[0064] The CN 115 shown in FIG. 1D may include at least one AMF 182a, 182b, at least one UPF 184a, 184b, at least one Session Management Function (SMF) 183a, 183b, and possibly a Data Network (DN) 185a, 185b. While each of the foregoing elements is depicted as part of the CN 115, it will be appreciated that any of these elements may be owned and/or operated by an entity other than the CN operator.
[0065] The AMF 182a, 182b may be connected to one or more of the gNBs 180a, 180b, 180c in the RAN 113 via an N2 interface and may serve as a control node. For example, the AMF 182a, 182b may be responsible for authenticating users of the WTRUs 102a, 102b, 102c, support for network slicing (e.g., handling of different PDU sessions with different requirements), selecting a particular SMF 183a, 183b, management of the registration area, termination of NAS signaling, mobility management, and the like. Network slicing may be used by the AMF 182a, 182b in order to customize CN support for WTRUs 102a, 102b, 102c based on the types of services being utilized by WTRUs 102a, 102b, 102c. For example, different network slices may be established for different use cases such as services relying on ultra-reliable low latency (URLLC) access, services relying on enhanced mobile broadband (eMBB) access, services for machine type communication (MTC) access, and/or the like. The AMF 182a, 182b may provide a control plane function for switching between the RAN 113 and other RANs (not shown) that employ other radio technologies, such as LTE, LTE-A, LTE-A Pro, and/or non-3GPP access technologies such as WiFi.
[0066] The SMF 183a, 183b may be connected to an AMF 182a, 182b in the CN 115 via an N11 interface. The SMF 183a, 183b may also be connected to a UPF 184a, 184b in the CN 115 via an N4 interface. The SMF 183a, 183b may select and control the UPF 184a, 184b and configure the routing of traffic through the UPF 184a, 184b. The SMF 183a, 183b may perform other functions, such as managing and allocating WTRU IP addresses, managing PDU sessions, controlling policy enforcement and QoS, providing downlink data notifications, and the like. A PDU session type may be IP-based, non-IP based, Ethernet-based, and the like.
[0067] The UPF 184a, 184b may be connected to one or more of the gNBs 180a, 180b, 180c in the RAN 113 via an N3 interface, which may provide the WTRUs 102a, 102b, 102c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102a, 102b, 102c and IP-enabled devices. The UPF 184a, 184b may perform other functions, such as routing and forwarding packets, enforcing user plane policies, supporting multi-homed PDU sessions, handling user plane QoS, buffering downlink packets, providing mobility anchoring, and the like.
[0068] The CN 115 may facilitate communications with other networks. For example, the CN 115 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the CN 115 and the PSTN 108. In addition, the CN 115 may provide the WTRUs 102a, 102b, 102c with access to the other networks 112, which may include other wired and/or wireless networks that are owned and/or operated by other service providers. In one embodiment, the WTRUs 102a, 102b, 102c may be connected to a local Data Network (DN) 185a, 185b through the UPF 184a, 184b via the N3 interface to the UPF 184a, 184b and an N6 interface between the UPF 184a, 184b and the DN 185a, 185b.
[0069] In view of Figures 1A-1D, and the corresponding description of Figures 1A-1D, one or more, or all, of the functions described herein with regard to one or more of: WTRU 102a-d, Base Station 114a-b, eNode-B 160a-c, MME 162, SGW 164, PGW 166, gNB 180a-c, AMF 182a-b, UPF 184a-b, SMF 183a-b, DN 185a-b, and/or any other device(s) described herein, may be performed by one or more emulation devices (not shown). The emulation devices may be one or more devices configured to emulate one or more, or all, of the functions described herein. For example, the emulation devices may be used to test other devices and/or to simulate network and/or WTRU functions.
[0070] The emulation devices may be designed to implement one or more tests of other devices in a lab environment and/or in an operator network environment. For example, the one or more emulation devices may perform the one or more, or all, functions while being fully or partially implemented and/or deployed as part of a wired and/or wireless communication network in order to test other devices within the communication network. The one or more emulation devices may perform the one or more, or all, functions while being temporarily implemented/deployed as part of a wired and/or wireless communication network. The emulation device may be directly coupled to another device for purposes of testing and/or may perform testing using over-the-air wireless communications.
[0071] The one or more emulation devices may perform the one or more, including all, functions while not being implemented/deployed as part of a wired and/or wireless communication network. For example, the emulation devices may be utilized in a testing scenario in a testing laboratory and/or a non-deployed (e.g., testing) wired and/or wireless communication network in order to implement testing of one or more components. The one or more emulation devices may be test equipment. Direct RF coupling and/or wireless communications via RF circuitry (e.g., which may include one or more antennas) may be used by the emulation devices to transmit and/or receive data.
[0072] FIG. 2 depicts an example reinforcement learning (RL) framework 200. The RL framework may include the following key elements: the agent, which is the ML algorithm; the environment, which is the adaptive problem space with attributes such as boundary values, rules, valid actions, etc.; the action, which is a step that the RL agent takes to navigate the environment; the state, which is the environment at a given point in time; the reward, which is the response of the environment to the taken actions (e.g., the response may be a positive, negative, or zero value); and/or the cumulative reward or value function, which specifies what is good in the long run. On top of those elements may be the RL policy, which defines the RL agent's way of behaving at a given time. The RL policy may represent the mapping that determines the appropriate actions to take based on the observed states. At each time step t, an agent 202 may determine an action A_t based on the current state S_t using its policy. Then, an environment 204 may execute the action and return a reward R_{t+1} and a next state S_{t+1}. Using the tuples (state, action, reward, next state) collected over past steps, the agent 202 may update its policy to maximize the cumulative reward.
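The interaction loop described above may be sketched as follows. This is a minimal illustration only: the two-state `ToyEnvironment`, its reward rule, and the function names are hypothetical and do not correspond to any element of FIG. 2 beyond the agent/environment roles.

```python
class ToyEnvironment:
    """Hypothetical two-state environment used only for illustration.

    A reward of 1.0 is returned when the action differs from the current
    state, and the state toggles deterministically after every step.
    """

    def __init__(self):
        self.state = 0

    def step(self, action):
        reward = 1.0 if action != self.state else 0.0
        self.state = 1 - self.state
        return reward, self.state


def run_episode(policy, env, num_steps):
    """Collect (state, action, reward, next_state) tuples for num_steps steps."""
    transitions = []
    state = env.state
    for _ in range(num_steps):
        action = policy(state)                 # agent: A_t from S_t
        reward, next_state = env.step(action)  # environment: R_{t+1}, S_{t+1}
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions


def cumulative_reward(transitions):
    """Undiscounted sum of the rewards in the collected tuples."""
    return sum(reward for _, _, reward, _ in transitions)
```

In such a sketch, the collected tuples are what the agent would use to update its policy; the update rule itself is left out here.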
[0073] RL models may be trained in different ways. For example, RL models may be trained online wherein the agent interacts with the real/field environment during training the RL model. For another example, RL models may be trained offline wherein the RL agent cannot interact with the real/field environment during training the RL model. Offline RL may provide a more efficient and flexible training framework as data collection is decoupled from policy training and may use less memory and computational resources relative to online RL training. In offline RL, the learning algorithm may be provided with a static dataset of transitions, D = {(s_t^i, a_t^i, r_t^i, s_{t+1}^i)}, and may learn the best policy it can using this dataset. Variable s_t^i may represent the state of the environment at time t. Variable a_t^i may denote the action taken by the agent at state s_t^i. Variable r_t^i may denote the resulting reward associated with the action a_t^i. Variable s_{t+1}^i may be the state at time t+1. In other words, offline RL may require the learning algorithm to find a sufficient understanding of the system dynamics from a fixed dataset, and then find a policy π(a|s) that achieves the largest possible cumulative reward.
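As an illustration of learning from a fixed dataset D without further environment interaction, the following sketch runs repeated tabular Q-value updates over a static list of (state, action, reward, next state) tuples. The choice of tabular Q-learning, the hyperparameters `gamma`, `sweeps`, and `lr`, and all function names are assumptions made for this example; the description does not prescribe a particular offline RL algorithm.

```python
from collections import defaultdict


def fitted_q_from_dataset(dataset, actions, gamma=0.9, sweeps=50, lr=0.5):
    """Estimate Q(s, a) from a static dataset of (s, a, r, s_next) tuples.

    No new transitions are collected: every update reuses the same dataset.
    """
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next in dataset:
            target = r + gamma * max(q[(s_next, b)] for b in actions)
            q[(s, a)] += lr * (target - q[(s, a)])
    return q


def greedy_policy(q, actions):
    """Return a deterministic policy that picks the highest-valued action."""
    def pi(state):
        return max(actions, key=lambda a: q[(state, a)])
    return pi
```

A state-action pair that never appears in the dataset keeps its initial value of zero, which is one simple way to see why a dataset that does not cover the state and/or action space can yield a policy that behaves poorly in the field.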
[0074] While offline RL seems to be more efficient from a practical standpoint as it provides a more flexible and less complex training framework relative to online RL, the performance of offline RL may be impacted when the RL model is deployed to the field. For instance, an offline-trained RL model may result in a policy that is well trained on the dataset but not good enough when the RL model is brought to the real environment, especially if the training dataset does not cover the entirety of the state and/or action space. Embodiments described herein discuss how to validate and/or evaluate an offline RL policy prior to field deployment. In other words, embodiments described herein discuss how to ensure that the RL actions taken by an offline-trained RL model result in an acceptable performance when this model is used in the real/field environment. The embodiments described herein address how the WTRU assists the network (NW) to validate and/or evaluate its RL model before running the model in the real system, for example, for a network-sided (NW-sided) off-policy RL model. The embodiments described herein address how the WTRU can validate its off-policy RL model before using it, for example, for a WTRU-sided off-policy RL model. The embodiments described herein further address how the WTRU can determine and report the quality of the generated RL model actions from the offline-trained model.
[0075] WTRU-assisted NW-sided offline RL policy validation is provided herein. Methods are provided for evaluating/validating offline-trained NW-sided RL models using WTRU assistance by determining the quality of the hypothetical actions as a function of one or more measurements satisfying configured criteria for performance requirements.
[0076] A WTRU may receive a configuration for validating the NW-sided RL model. The configuration may include a configuration of a validation mode. In the validation mode, actions selected and/or determined by the RL model may be indicated to the WTRU but not performed; such actions are referred to as hypothetical actions. The configuration may include criteria, rules, and/or conditions for RL model validation. The criteria may be associated with a determined quality of actions evaluated against one or more preconfigured performance threshold(s). For example, the criteria may be associated with determining a metric associated with a hypothetical action, determining an expected reward based on the metric, and/or evaluating and/or reporting the quality based on an absolute or relative threshold. The configuration may include a reporting configuration for RL model validation (e.g., resources for reporting, periodicity, model validation reporting format, etc.).
[0077] The WTRU may receive a first indication of a first configuration to be implemented for performing a first action related to communications on a network. The first action may be a reference action which includes one or more of applying a configuration associated with the first indication, performing a measurement or a reception using a first configuration determined by the first indication, triggering and/or performing a transmission using a second configuration determined by the first indication, suspending an ongoing or future transmission, entering a power saving mode, and/or setting, resetting, and/or modifying one or more of protocol parameters, timers, or counters using a third configuration determined by the first indication.
[0078] The WTRU may receive, from the network, a second indication of a second configuration to be implemented for performing a second action related to communications on the network. The second action, for example, may be an anticipated or hypothetical action associated with a NW-side RL model. Examples of anticipated or hypothetical action(s) may include hypothetical application of one or more DL beam indices selected by the RL model and/or hypothetical application of one or more sub-band indices and/or allocated power selected by the RL model. The first configuration and the second configuration may each include a different downlink (DL) beam configuration, different power saving modes, different protocol parameters, and/or different timers. The WTRU may perform a validation of the NW-side RL model by at least determining a second outcome based on a second performance metric associated with performance of the second action without actually implementing the second configuration to perform the second action related to the communications on the network, and/or determining a quality of the second action compared to a quality of the first action based on the first outcome and the second outcome. The WTRU may report the determined quality of the second action for validation of the anticipated or hypothetical action associated with the NW-side RL model.
[0079] In various embodiments described herein, the first configuration may be a first downlink (DL) beam configuration and the second configuration may be a second DL beam configuration. The first action may include the WTRU switching to the first DL beam to receive data transmissions. The second action may include the WTRU switching to the second DL beam provided by the NW-side RL model to receive the data transmissions. The second performance metric may include a measurement on the second DL beam.
[0080] In various embodiments described herein, the first configuration may be a first downlink control information (DCI) configuration indicating a first modulation and coding scheme (MCS). The second configuration may be a second DCI configuration indicating a second MCS. The first action may include the WTRU applying the first DCI configuration to receive DL transmissions. The second action may include the WTRU switching to apply the second DCI configuration to receive the DL transmissions. The second performance metric may include a measurement on the second DCI configuration.
[0081] In various embodiments described herein, the first configuration may be a first sub-band configuration. The second configuration may be a second sub-band configuration. The first action may include the WTRU applying the first sub-band configuration to receive DL transmissions. The second action may include the WTRU switching to apply the second sub-band configuration to receive the DL transmissions. The second performance metric may include a measurement on the second sub-band configuration.
[0082] In various embodiments described herein, the first configuration may be a first power level configuration for a first DL transmission. The second configuration may be a second power level configuration for a second DL transmission. The first action may include the WTRU applying the first power level configuration to receive DL transmissions. The second action may include the WTRU switching to apply the second power level configuration to receive the DL transmissions. The second performance metric may include a measurement on the second power level configuration.
[0083] The one or more performance metrics in each example may be determined based on one or more preconfigured criteria. Each of the one or more preconfigured criteria may include a hybrid automatic repeat request (HARQ) acknowledgement (ACK), an average reference signal received power (RSRP), a signal-to-interference-plus-noise ratio (SINR), and/or a channel quality indicator (CQI). Each of the one or more preconfigured criteria may include an average number of beam switches. Though reference may be made to a hypothetical or anticipated action or outcome, embodiments described herein may implement either similarly throughout.
[0084] The WTRU may perform the action (e.g., a reference action), or apply the configuration, associated with the first indication. The WTRU may perform a validation of the NW-side RL model by determining a second outcome (e.g., a hypothetical or anticipated outcome) of the second action associated with the second indication without actually performing the second action. For example, the validation of the NW-side RL model may be performed without applying the configuration associated with the second indication. Examples of anticipated or hypothetical action(s) may include hypothetical application of one or more DL beam indices selected by the RL model and hypothetical application of one or more sub-band indices and/or allocated power selected by the RL model. For example, the WTRU may determine whether an indication is a first indication or a second indication based on explicit information or implicit information (e.g., a separate field in DCI, MAC CE, order of indication, different resources/channels/messages, etc.).
[0085] The WTRU may determine the quality of the second action (e.g., hypothetical or anticipated action) based on one or more of the following. The WTRU may determine the quality of hypothetical action(s) based on measuring the performance metric (e.g., a function of the use case) assuming hypothetical application of the action. The performance metric may include an average reference signal received power (RSRP) and/or signal-to-interference-plus-noise ratio (SINR) over a configured period, an average number of beam switches in a configured period, a number of beam failures over a configured period (with or without successful recovery), and/or an average throughput over a configured period. The WTRU may determine the quality of hypothetical action(s) based on a comparison of the performance metric associated with the hypothetical action(s) with a preconfigured performance criterion and/or condition. For example, the determination of the quality of hypothetical actions may include measuring a quantity and comparing it with a threshold (e.g., SINR/signal-to-noise ratio (SNR)/RSRP on one of the sub-bands), measuring an average number of beam switches and comparing it with a configured metric threshold, measuring an average number of beam failures and comparing it with a configured metric threshold, measuring an average RSRP over a period and comparing it with a threshold, and/or measuring a reward over a period and comparing it with the configured expected reward. The WTRU may determine the quality of hypothetical action(s) based on an improvement/degradation of the performance metric or quality of the hypothetical action(s) relative to the performance metric or quality of the reference action(s). For example, the WTRU may compare the performance metrics associated with the hypothetical action(s) and reference action(s).
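The two comparison styles described above (against a configured threshold, and relative to a reference action) may be sketched as follows. The function names, the `higher_is_better` flag, and the use of a simple arithmetic mean over the configured period are assumptions made for illustration, not signalling or procedures defined in this description.

```python
from statistics import mean


def quality_vs_threshold(samples, threshold, higher_is_better=True):
    """Absolute check: does the period average satisfy the configured threshold?

    higher_is_better is True for metrics such as RSRP, SINR, or throughput and
    False for metrics such as the number of beam switches or beam failures.
    """
    average = mean(samples)
    return average >= threshold if higher_is_better else average <= threshold


def quality_vs_reference(hyp_samples, ref_samples, higher_is_better=True):
    """Relative check: signed improvement of the hypothetical action.

    A positive return value means the hypothetical action outperformed the
    reference action over the same configured period.
    """
    delta = mean(hyp_samples) - mean(ref_samples)
    return delta if higher_is_better else -delta
```

For instance, with RSRP samples in dBm, `quality_vs_reference([-80, -80], [-90, -90])` yields a 10 dB improvement of the hypothetical beam over the reference beam.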
[0086] The WTRU may report the determined quality of the second action(s) (e.g., hypothetical or anticipated actions), such as for NW-sided RL model validation. For example, reporting may be triggered when preconfigured conditions are satisfied, such as when the determined quality of the hypothetical action(s) exceeds or is less than the preconfigured threshold, or when a performance metric associated with the hypothetical action(s) is better or worse, by an offset, than a performance metric associated with the reference action(s) over n configured periods. For example, reporting may be transmitted periodically (e.g., every n configured periods over which the performance metrics may be calculated). For example, the report may include one or more indication formats. The one or more indication formats may include Boolean and/or multiple quantized levels for quality (e.g., represented in multiple levels depending on the difference between the measured metric and the configured metric threshold).
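One possible reading of the offset-based trigger is sketched below. The interpretation that the condition must hold in each of the n most recent configured periods, the assumption that a larger metric value is better, and the name `should_report` are all illustrative assumptions; the description leaves these details to configuration.

```python
def should_report(hyp_per_period, ref_per_period, offset, n):
    """Decide whether an event-triggered validation report should be sent.

    Each list holds one metric value per configured period (larger is better).
    The report is triggered if, in every one of the last n periods, the
    hypothetical metric is at least `offset` better than the reference metric,
    or in every one of them at least `offset` worse.
    """
    if n <= 0 or len(hyp_per_period) < n or len(ref_per_period) < n:
        return False
    recent = list(zip(hyp_per_period[-n:], ref_per_period[-n:]))
    offset_better = all(h >= r + offset for h, r in recent)
    offset_worse = all(h <= r - offset for h, r in recent)
    return offset_better or offset_worse
```

Requiring the condition over several periods, rather than a single one, is a way to avoid reports triggered by short-lived fluctuations.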
[0087] The proposed embodiment may offer a viable way to evaluate and/or validate offline-trained NW-sided RL models. This may be beneficial when there is a discrepancy between the state and/or action space used in the training environment and the state and/or action space in the real environment, or when some of the high-reward regions of the state space are missing from the training.
[0088] WTRU-assisted NW-sided RL offline policy evaluation is introduced herein. Configuration of WTRU- based assistance of NW-sided RL model validation is introduced herein. The WTRU may be configured to receive a first indication and a second indication. The WTRU may perform a first action based on the first indication and a second action based on the second indication.
[0089] For example, the WTRU may perform a reference action upon receiving the first indication. For example, the reference action may be associated with one or more of applying the configuration associated with the first indication, performing a measurement and/or reception using a configuration at least in part determined by the first indication, triggering and/or performing a transmission using a configuration at least in part determined by the first indication, suspending an ongoing and/or future transmission, entering a power saving mode, setting/resetting/modifying one or more of the protocol parameters, timers, or counters using a configuration at least in part determined by the first indication, and the like.
[0090] For example, the WTRU may not act directly on the second indication but may be configured to monitor and/or measure the impacts assuming that the second action (e.g., the hypothetical action) associated with the second indication were performed. For example, the second action may be associated with one or more of applying the configuration associated with the second indication, performing a measurement and/or reception using a configuration at least in part determined by the second indication, triggering and/or performing a transmission using a configuration at least in part determined by the second indication, suspending an ongoing and/or future transmission, entering a power saving mode, setting/resetting/modifying one or more of the protocol parameters, timers, or counters using a configuration at least in part determined by the second indication, and the like.
[0091] Example realizations of the reference action and hypothetical action(s) are introduced herein. In a first example realization, the first indication may be an indication associated with a first DL beam, which is possibly determined based on a legacy algorithm, a supervised learning algorithm, and the like. A second indication may be associated with a second DL beam, which is possibly determined based on the RL model to be evaluated. The reference action may be the WTRU switching to the first DL beam to receive a data transmission. The hypothetical action may be the WTRU measuring one or more performance metrics assuming that the WTRU switches to the second DL beam.
[0092] In a second example realization, the first indication may be a first downlink control information (DCI) indicating a first modulation and coding scheme (MCS), a first sub-band, and/or a first power level for a first DL transmission, which is possibly determined based on a legacy algorithm, a supervised learning algorithm, and the like. The second indication may be a second DCI indicating a second MCS, a second sub-band, and/or a second power level for a second transmission, possibly determined based on the RL model to be evaluated. The reference action may be the WTRU applying the first DCI to receive a DL data transmission. The hypothetical action may be the WTRU measuring one or more performance metrics assuming that the WTRU receives a DL transmission based on the second DCI.
[0093] The WTRU may be configured to determine the linkage between a first indication and a corresponding second indication. In other words, the WTRU may be configured to determine the reference action and the corresponding hypothetical action which is to be evaluated against the reference action. The linkage may be implicitly configured. For example, the linkage may be implicitly configured based on the relationship between resources, channels, and/or messages carrying the first indication and the second indication. For example, the first and second indications may be inferred by order of inclusion in a message. The linkage may be explicitly configured, for example, via a separate field in DCI, a medium access control (MAC) control element (CE), a radio resource control (RRC) message, etc. For example, a specific information element (IE) may be configured for the second indication or hypothetical action. In one example, the first indication and the second indication may each be associated with the same logical identity.
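The last example, in which both indications carry the same logical identity, may be sketched as follows. The `Indication` record and its fields `link_id`, `hypothetical`, and `config` are hypothetical names chosen for illustration; they do not correspond to any defined DCI field, MAC CE, or RRC information element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indication:
    link_id: int        # hypothetical shared logical identity
    hypothetical: bool  # hypothetical explicit flag marking a second indication
    config: str         # e.g., a DL beam or DCI configuration label


def pair_indications(indications):
    """Pair each reference indication with its linked hypothetical indication.

    Returns a dict mapping link_id to (reference, hypothetical). Indications
    without a counterpart are left out, since there is nothing to compare.
    """
    references, hypotheticals = {}, {}
    for indication in indications:
        bucket = hypotheticals if indication.hypothetical else references
        bucket[indication.link_id] = indication
    return {
        link_id: (references[link_id], hypotheticals[link_id])
        for link_id in references
        if link_id in hypotheticals
    }
```

Each resulting pair identifies which reference action a given hypothetical action is to be evaluated against.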
[0094] Configuration of performance and quality metric is discussed herein. The WTRU may be configured to measure the performance metric associated with hypothetical actions based on second indication. Possibly such performance metric may be determined based on one or more preconfigured criteria(s), rule(s), and/or condition(s). The performance metric may be modeled as a reward metric. For example, hypothetical actions that lead to better performance (e.g., higher throughput, lower latency, higher power saving, etc.) may be assigned higher rewards than the actions that lead to degraded performance (e.g., lower throughput, higher latency, high power consumption etc.). The WTRU may be configured to determine and/or report a performance metric within a preconfigured range. For example, the WTRU may be configured with a minimum value, a maximum value, and quantized values in between with preconfigured step size.
[0095] The WTRU may be configured to measure the quality of hypothetical actions associated with second indication. The quality metric may be derived based on comparison of performance metric associated with hypothetical action against a preconfigured threshold. For example, the WTRU may be configured to monitor and/or determine if the hypothetical actions based on second indication would lead to at least one performance metric better than threshold. The quality metric may be derived based on comparison of performance metric of hypothetical actions against the performance metric of reference action. For example, the WTRU may be configured to monitor and/or determine if the hypothetical actions based on second indications would be better than the reference actions associated with first indication. The WTRU may be configured to determine and/or report a quality metric within a preconfigured range. For example, the WTRU may be configured with a minimum value, a maximum value, and quantized values in between with preconfigured step size. For example, the WTRU may be configured with multiple quantized levels for quality (e.g., represented in multiple levels depending on the difference between the measured metric and the configured metric threshold). The WTRU may be configured to report a Boolean value as quality metric. The Boolean value may indicate whether the hypothetical action is better or worse than the reference action. Alternately the Boolean value may indicate whether the hypothetical action is better or worse than the preconfigured threshold.
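The reporting range with a minimum value, a maximum value, and a preconfigured step size may be illustrated by the following sketch. The function name `quantize` and the round-to-nearest rule are assumptions; the description does not specify how a measured value is mapped onto the configured grid.

```python
import math


def quantize(value, minimum, maximum, step):
    """Map a measured metric onto the configured grid minimum, minimum+step, ...

    Values outside [minimum, maximum] are clipped to the range first, then
    rounded to the nearest grid point (halves round up).
    """
    if step <= 0 or maximum < minimum:
        raise ValueError("step must be positive and maximum >= minimum")
    clipped = min(max(value, minimum), maximum)
    index = math.floor((clipped - minimum) / step + 0.5)
    return min(minimum + index * step, maximum)
```

For example, with a hypothetical RSRP range of -100 dBm to -60 dBm in 5 dB steps, a measured -87.2 dBm would be reported as -85 dBm.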
[0096] Resources for the RL model validation report are discussed herein. The WTRU may be configured with resources to report one or more of the following associated with RL model validation, which may include the performance metric associated with a hypothetical action, the quality of a hypothetical action, and the like. The WTRU may be configured with resources to report one or more of the following associated with RL model validation, which may include the performance metric associated with a set of hypothetical action(s) in a preconfigured time window, the quality of a set of hypothetical action(s) in a preconfigured time window, and the like. The WTRU may be configured to report an average performance metric and/or quality metric within the preconfigured window. The WTRU may be configured to report pairs of a hypothetical action and the associated performance and/or quality metric for more than one (e.g., all or a subset) of the hypothetical actions within the time window. For example, the subset of hypothetical actions may be selected based on preconfigured criteria or explicitly requested by the network.
[0097] The WTRU may be configured with resources for reporting the RL model validation report periodically. Possibly the periodicity of reporting may be an integer multiple of the time window over which the performance and/or quality metric is calculated. For example, the WTRU may be configured with physical uplink control channel (PUCCH) resources to transmit the RL model validation report. For example, the WTRU may be configured with semi-persistent physical uplink shared channel (PUSCH) resources to transmit the RL model validation report. The WTRU may be configured to transmit such reports in one or more of Uplink Control Information (UCI), MAC Control Element (CE), etc. For example, the RL model validation report may be multiplexed with CSI reporting. For example, the feedback based on the reference action may be transmitted in the first part of the CSI report and the feedback based on the hypothetical action may be transmitted in the second part of the CSI report. For example, the RL model validation report may be transmitted in PUSCH resources. The RL model validation report may be transmitted in a higher layer message (e.g., in an RRC message). For example, the WTRU may be configured to transmit the validation report in an RRC message carrying a beam level measurement report. For example, the WTRU may be configured to transmit the validation report in an uplink (UL) information transfer message. For example, the WTRU may be configured to transmit the validation report in a WTRU assistance information message.
[0098] The WTRU may be configured to transmit the RL model validation report based on request from the network. In one example, the WTRU may receive the request for transmission of RL model validation report in an aperiodic channel state information (CSI) request. In another example, the WTRU may receive the request for transmission of RL model validation report in a higher layer (e.g., RRC) request.
[0099] The WTRU may be configured with resources for reporting the RL model validation report when one or more preconfigured conditions are satisfied. For example, the WTRU may transmit a report when the performance metric and/or the quality of the hypothetical action(s) exceeds and/or becomes lower than the preconfigured threshold. For example, the WTRU may transmit a report when the performance metric and/or the quality of the hypothetical action(s) exceeds and/or becomes lower than that of the reference action(s) over n configured periods.
[0100] Determination of quality of actions associated with model validation is discussed herein. A WTRU configured with model validation for NW-sided RL model may receive a first and second indication from the NW where the WTRU may determine whether the indication is first or second based on one or more of the following, which includes dynamically by receiving an indication in the DCI field of a physical downlink control channel (PDCCH) resource, dynamically or semi persistently by MAC CE, based on a configuration that indicates the order or pattern of first indication and second indication, and/or based on the resources or channels or messages that the indication is received through.
[0101] The WTRU receiving a first indication may perform an action and/or apply a configuration (e.g., reference action) associated with the first indication. A WTRU receiving a second indication may perform validation of the action and/or configuration (e.g., hypothetical action) associated with the second indication without actually performing the action or applying the configuration. For example, actions associated with the first or second indication may include one or more DL beam indices selected by the RL model, one or more sub-band indices and/or allocated power selected by the RL model, an MCS selected by the RL model, and/or a handover request determined by the RL model.
[0102] The WTRU may measure a performance metric for the hypothetical action associated with the second indication without performing the action, and possibly likewise for the reference action. The performance metrics may be determined and reported with every reception of the second indication. The performance metrics may be calculated and recorded with every reception of the second indication but reported over a configured period. This may enable the WTRU to determine the performance metric upon collecting measurements over a sequence of hypothetical actions. The measured performance metric may be determined based on the hypothetical action. For example, the hypothetical action may represent a DL beam index selected by the RL model. The WTRU may be configured to determine and/or measure one or more performance metrics associated with the selected beams. For example, the WTRU may estimate the beamforming gain of the DL beam index. For example, the performance metric may be the average RSRP and/or SINR associated with the selected DL beam indices measured over a configured period. For example, the performance metric may be the average number of beam switches measured over a configured period. For example, the performance metric may be the average number of beam failures measured over a configured period. For example, the hypothetical action may represent sub-band indices and/or allocated power selected by the RL model. The WTRU may be configured to measure the SINR and/or SNR for the sub-band indices over a configured period. For example, the hypothetical action may represent the MCS selected by the RL model, and the WTRU may compute the block error rate for the current SINR and MCS. For example, the hypothetical action may represent a handover request determined by the RL model and the WTRU may measure the RSRP from the neighboring cell.
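For the beam-selection example, the per-period metrics over a sequence of hypothetical actions may be computed as in the sketch below. The data layout (one dictionary of per-beam RSRP measurements per time step) and the function names are assumptions for illustration, and the RSRP values are averaged directly in dBm for simplicity, which is itself an assumption rather than a requirement of the description.

```python
def count_beam_switches(beam_indices):
    """Number of times the selected DL beam index changes within the period."""
    return sum(
        1 for previous, current in zip(beam_indices, beam_indices[1:])
        if previous != current
    )


def average_rsrp_dbm(rsrp_by_beam, beam_indices):
    """Average RSRP of the beam that would have been selected at each step.

    rsrp_by_beam[t] maps a beam index to the RSRP (dBm) measured at step t;
    beam_indices[t] is the hypothetical beam index indicated for step t.
    The WTRU only measures these beams; it does not switch to them.
    """
    values = [rsrp_by_beam[t][beam] for t, beam in enumerate(beam_indices)]
    return sum(values) / len(values)
```

The same two functions could be applied to the sequence of reference beam indices so that both sequences are evaluated over the same configured period.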
[0103] The WTRU may determine the quality of a hypothetical action, for example depending on the measured performance metric. The WTRU may determine the quality based on one of more of the following. The WTRU may determine the quality based on a comparison of performance metric associated with hypothetical actions with a preconfigured performance criterion. For example, the WTRU may determine the quality by performing one of the following evaluations. The evaluations may include estimating a performance metric and comparing it to a threshold (e.g., estimated beamforming gain with another DL beam, or the number of beam switches over a configured period, or the average RSRP/SINR/SNR over a configured period). The evaluations may include measuring a performance metric and comparing it to a threshold (e.g., SINR/SNR/RSRP on one of the sub-bands, or RSRP from another cell). The evaluation may include computing a performance metric and comparing it to a threshold (e.g., computing the block error rate for the current SINR and MCS). The evaluations may include measuring an average RSRP, SINR over a configured period and comparing it with a configured metric threshold. The evaluations may include measuring an average number of beam switches and comparing it with a configured metric threshold. The evaluations may include measuring an average number of beam failures and comparing it with a configured metric threshold. The evaluations may include measuring an average RSRP over a period and comparing it with a threshold. The evaluations may include measuring a reward over a period and comparing it with the configured expected reward (e.g., average throughput, block error rate, etc.). The WTRU may determine the quality based on a comparison of hypothetical action(s) relative to reference action(s). The reference action may include comparing the performance metrics associated with the hypothetical action(s) and reference action(s). 
The WTRU may measure the performance of the hypothetical action and reference action over a configured period. For example, the WTRU may measure the number of the hypothetical DL beam switches over a configured period and compare it against the number of the reference DL beam switches over the same period. Where the hypothetical action is an MCS selected by the RL model, the WTRU may re-encode the decoded bits for the previous reference action using the new MCS, decode under a new SINR model, and compare the block error rate.
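As a minimal illustrative sketch (not part of the disclosed procedures), the two comparisons described above, one against a configured threshold and one against a reference action, might be expressed as follows. The function names, the use of a simple average over the configured period, and the sample values are assumptions chosen only for illustration.

```python
from statistics import mean


def quality_vs_threshold(samples, threshold):
    """Return True if the average metric over the configured period meets the threshold."""
    return mean(samples) >= threshold


def quality_vs_reference(hypothetical_samples, reference_samples):
    """Return the average gain (positive) or loss (negative) of the hypothetical action."""
    return mean(hypothetical_samples) - mean(reference_samples)


# Hypothetical RSRP samples (dBm) collected over one configured period (assumed values).
hyp_rsrp = [-90.0, -88.0, -92.0]
ref_rsrp = [-95.0, -94.0, -96.0]

meets_threshold = quality_vs_threshold(hyp_rsrp, threshold=-93.0)
relative_gain_db = quality_vs_reference(hyp_rsrp, ref_rsrp)
```

In this sketch a positive `relative_gain_db` would indicate that the hypothetical action outperformed the reference action over the period.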
[0104] Reporting of WTRU assistance information of NW-sided RL model validation is discussed herein. The WTRU may be configured to report the quality information associated with the hypothetical actions for WTRU-sided RL-model validation. For example, the quality information may be indicated in multiple formats. The formats may be configured, for example, in a DCI field or MAC CE. A first format may be
Boolean, for example, a binary bit indicating if the hypothetical action performance exceeds the reference action performance, or a binary bit indicating if the hypothetical action performance exceeds a configured performance threshold. A second format may be multiple quantized quality levels, indicated in two or more bits. In one example, each quality level may represent a specific range of performance difference between the hypothetical action performance and the reference action performance. In another example, each quality level may represent a range of difference between the hypothetical action performance and the configured threshold. A third format may indicate the direct performance of the hypothetical action. The performance may be a quantized performance metric, for example, quantized RSRP/SINR/SNR, configured by the NW.
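The three report formats described above may be sketched as follows. This is an illustrative sketch only: the function names, the range boundaries, the bit width, and the metric range used for quantization are assumptions and are not taken from the disclosure or from any specification.

```python
import bisect


def boolean_report(hyp_metric, baseline):
    """First format: one bit, 1 if the hypothetical action exceeds the baseline.

    The baseline may be the reference action performance or a configured threshold.
    """
    return 1 if hyp_metric > baseline else 0


def quantized_level_report(hyp_metric, baseline, edges):
    """Second format: index of the range that contains the performance difference.

    `edges` is an ascending list of range boundaries; with three edges the
    result is one of four levels (0..3) and fits in two bits.
    """
    diff = hyp_metric - baseline
    return bisect.bisect_right(edges, diff)


def direct_report(hyp_metric, lo, hi, bits):
    """Third format: uniformly quantize the metric itself into 2**bits levels."""
    max_code = (1 << bits) - 1
    clipped = min(max(hyp_metric, lo), hi)
    return round((clipped - lo) / (hi - lo) * max_code)
```

For example, with assumed boundaries `[-3, 0, 3]` in dB, a hypothetical action that is 5 dB better than the reference would map to the highest of the four levels.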
[0105] The WTRU may be preconfigured with resources to indicate the validation report of the WTRU-sided RL model. The validation report may include the quality information of the hypothetical action. The WTRU may be configured with reserved symbols on PUSCH for transmission of the validation report. For example, the reserved symbols may be configured on preconfigured time-frequency locations. The WTRU may be configured to report the quality information associated with the hypothetical actions in Uplink Control Information (UCI). The resources for the RL model validation report transmission may be preconfigured for a WTRU. The resources may be defined as PUSCH format or UCI format. The resources may be configured via RRC configuration. These resources may be activated and/or deactivated by MAC CE.
[0106] The WTRU may be configured to report the quality information based on one or more of the following triggers. The WTRU may be configured to report the quality information based on a time-event. The WTRU may be configured with time instances or specific slots to determine and report the quality information of the hypothetical actions. In another option, the WTRU may be configured to report the quality information of the hypothetical actions every N configured periods over which the performance metrics are calculated. The WTRU may be configured to report the quality information based on a measurement. The WTRU may report the quality information associated with the hypothetical actions based on one or more measurements meeting a configured threshold or criteria. For example, the WTRU may report when the performance of the hypothetical actions exceeds or is lower than a preconfigured threshold. In another example, the WTRU may report when the performance of the hypothetical action exceeds or is lower than the performance of the reference action. The WTRU may be configured to report the quality information based on a feedback resource selection, or feedback resource, or change thereof. The WTRU may be configured to report the quality information based on a performance of an associated function. For example, the WTRU may determine the rate of HARQ-ACK associated with the hypothetical actions. If the performance drops below a threshold, the WTRU may report the quality of the hypothetical action.
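The reporting triggers listed above may be sketched as a single check performed at the end of each configured period. This is a hedged illustration: the function name, the parameter names, the 1-based period index, and the choice of an "exceeds" direction for the measurement trigger are all assumptions.

```python
def fired_triggers(period_index, n_periods, hyp_metric, metric_threshold,
                   harq_ack_rate, ack_rate_threshold):
    """Return the names of the reporting triggers that fire in this period."""
    fired = []
    # Time-event trigger: report every n_periods configured periods.
    if period_index % n_periods == 0:
        fired.append("time")
    # Measurement trigger: the hypothetical metric exceeds the configured threshold.
    if hyp_metric > metric_threshold:
        fired.append("measurement")
    # Associated-function trigger: the HARQ-ACK rate drops below a threshold.
    if harq_ack_rate < ack_rate_threshold:
        fired.append("harq")
    return fired
```

A report would be sent in a given period if the returned list is non-empty.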
[0107] WTRU procedures for RL offline policy validation are discussed herein. Configuration of WTRU-sided RL model evaluation is discussed herein. A WTRU equipped with an RL model, and capable of performing actions based on its WTRU-sided RL model, may report its capabilities specific to the supported RL model. The report may include the RL model identification (ID) and its associated information and configuration, for example, applicable conditions such as environment setup and constraints associated with the pre-trained RL model, configuration of state and/or action space, target policy, dedicated and/or supported features (e.g., beam management), and/or status of the model (e.g., validated, not validated, conditions under which the model was validated, restricted conditions for the model validation, etc.).
[0108] The WTRU may receive the configuration for the evaluation and validation of a supported RL model. The configuration may include one or more of a reporting resource configuration, a reporting periodicity, a quality information configuration, and/or performance metrics for evaluating hypothetical actions. The reporting resource configuration may include resources for reporting, a maximum reporting overhead, and/or a reporting granularity. The reporting granularity may include the number of separate transmissions required for the report (e.g., separate transmissions for each action or each subset of hypothetical actions). The WTRU may be configured to report the quality information based on a reporting periodicity. For example, the reporting may be periodic, semi-persistent, aperiodic, or triggered. A reporting omission configuration may specify, for example, that the WTRU omits reporting if the difference between the current report and a previous one for the same hypothetical action(s) is below a configured threshold.
The WTRU may be configured to report the quality information based on a quality information configuration, which may include one or more preconfigured conditions or a quality information reporting and/or indication format. The one or more preconfigured conditions may include one or more thresholds/offsets for comparing reference and hypothetical action outcomes or a number of configured periods for validating the preconfigured conditions. The quality information reporting and/or indication format may include Boolean (e.g., indicating whether the conditions associated with validation are satisfied) or quantized (e.g., multiple levels/scores of the quality of hypothetical action(s)).
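The reporting omission rule described above may be sketched as follows. The class name, the method name, and the use of a per-action dictionary are hypothetical; the sketch assumes that the quality is a single number and that the comparison is made against the last report actually sent for the same hypothetical action.

```python
class OmissionFilter:
    """Suppress a report that differs too little from the last one sent."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.last_reported = {}  # action identifier -> last reported quality

    def filter(self, action_id, quality):
        """Return the quality to report, or None if the report is omitted."""
        previous = self.last_reported.get(action_id)
        if previous is not None and abs(quality - previous) < self.threshold:
            return None  # difference below the configured threshold: omit
        self.last_reported[action_id] = quality
        return quality


omission = OmissionFilter(threshold=1.0)
first = omission.filter("pmi_3", 5.0)   # nothing reported yet, so it is sent
second = omission.filter("pmi_3", 5.4)  # within 1.0 of the last report, omitted
third = omission.filter("pmi_3", 6.5)   # differs by 1.5 from the last report, sent
```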
[0109] The WTRU may be configured to report the quality information based on one or more performance metric(s) for evaluating hypothetical actions. For example, the one or more performance metric(s) for evaluating hypothetical actions may be based on one or more of quantity-based (e.g., SINR/SNR/RSRP associated with the selected PMI by the RL model), reward-based (e.g., measured reward over a period and comparing it with the training reward), horizon (e.g., time-period for reward measurement, or number of repeated actions and repetition rate(s) or expected long-term or short-term reward limit), processing-time based (e.g., required time for evaluating hypothetical actions with respect to reference actions), complexity-based (e.g., required number of FLOPs), or performance criteria and/or conditions for comparison with performance metric(s) of hypothetical actions (e.g., hybrid automatic repeat request (HARQ) acknowledgement (ACK) or SINR/RSRP/CQI associated with the selected PMI by the RL model).
[0110] WTRU procedure for validating its RL model is discussed herein. WTRU determination of first indication and second indication for action generation is discussed herein. The WTRU may be configured to determine one or more parameters associated with a WTRU-sided RL model validation. The WTRU may determine a first indication and a second indication. The first indication may be based on non-AIML/legacy or a different RL model while the second indication may be based on the WTRU-sided RL model to be validated. The first indication may be associated with a reference action. The reference action may be applied by the WTRU, for example, to perform a transmission using a configuration associated with the first indication. The second indication may be associated with a hypothetical action. The hypothetical action represents the output of the RL model to be validated. The second indication may imply that the NW may provide feedback on the quality of the hypothetical action. For example, the reference action may represent a first precoding matrix indicator (PMI) selection by the WTRU. The first PMI may be used for beamforming in the next transmission.
The hypothetical action may represent a second PMI selection by the WTRU. The second selected PMI may be evaluated (e.g., only evaluated) by the NW. In another example, the reference action may represent a first MCS selected by the WTRU. The selected modulation and coding scheme (MCS) may be used in data transmission. The hypothetical action may represent a second PMI selected by the WTRU. The selected PMI may be evaluated by the NW.
[0111] The WTRU may differentiate and/or determine the first indication and second indication from the DCI scheduling the transmission. The first indication and second indication may be determined from the contents of the DCI (e.g., explicit indication). The DCI may have a specific field that indicates whether the WTRU needs to determine a first indication, a second indication, or both. The first indication and second indication may be determined from a parameter of the DCI (e.g., implicit indication). The parameters of the DCI may include one or more of the radio network temporary identifier (RNTI) used to decode the DCI, the CORESET or search-space of the PDCCH transmission including the DCI, the aggregation level of the PDCCH/PSCCH transmission of the DCI, the timing of the DCI reception, and/or the beam or transmission configuration indication (TCI) state used for the PDCCH/PSCCH transmission of the DCI.
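The explicit and implicit determinations described above may be sketched as follows. This is an illustration under stated assumptions only: the `DciInfo` structure, its field names, the string values, and the idea of a dedicated RNTI or CORESET reserved for validation are hypothetical and do not correspond to any standardized DCI format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DciInfo:
    rnti: int                                # RNTI used to decode the DCI
    coreset_id: int                          # CORESET of the PDCCH carrying the DCI
    indication_field: Optional[str] = None   # explicit field, if present


def classify_indication(dci, validation_rnti, validation_coreset_id):
    """Return "first", "second", or "both" for a received DCI."""
    # Explicit indication: a dedicated field in the DCI states the type.
    if dci.indication_field is not None:
        return dci.indication_field
    # Implicit indication: inferred from a parameter of the DCI.
    if dci.rnti == validation_rnti or dci.coreset_id == validation_coreset_id:
        return "second"
    return "first"
```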
[0112] Reception of the first transmission and second transmission based on the first indication and second indication is discussed herein. The WTRU may receive a first transmission and a second transmission. The first transmission may be associated with the network response to the WTRU first indication. The second transmission may be associated with the network response to the WTRU second indication. For example, the first transmission may represent data transmission using the first selected/indicated PMI, for example, associated with the reference action. The second transmission may represent reference signal (RS) transmission using the second selected PMI (e.g., only RS transmission using the second selected PMI), for example, associated with the hypothetical action, and/or the second transmission may represent feedback information about one or more performance aspects related to the second PMI. For example, the second transmission may represent an interference metric associated with the second PMI, e.g., the resulting interference on other WTRUs served by the NW. The two transmissions may happen over the same channel (e.g., PDSCH/PDCCH) or over two different channels (e.g., the first transmission occurs over PDSCH while the second transmission occurs over PDCCH).
[0113] WTRU determination of the second transmission quality based on a configured performance threshold is discussed herein. The WTRU may be configured to determine the quality of the second transmission, for example, the transmission associated with the hypothetical action generated by the WTRU-sided RL model. The WTRU may be configured to measure a performance metric associated with the hypothetical action. For example, the WTRU may measure the HARQ ACK and/or NACK rate or the block error rate (BLER) associated with the hypothetical action. For example, the WTRU may perform one or more measurements associated with the hypothetical action. As an example, if the hypothetical action represents a PMI selected by the RL model and is applied on RS transmission by the NW, the measurements may include SINR and/or RSRP and/or CQI. The performance metric and/or measurements may be collected over a configured period.
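One simple way to turn HARQ feedback counts collected over a configured period into a block error rate estimate is the ratio of NACKs to all feedback. The sketch below assumes one ACK or NACK per transport block; the function name and the handling of an empty period are illustrative assumptions.

```python
def estimate_bler(acks, nacks):
    """Estimate the block error rate as NACKs over total HARQ feedback.

    Returns None when no feedback was collected in the period.
    """
    total = acks + nacks
    if total == 0:
        return None
    return nacks / total
```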
[0114] The WTRU may determine the quality of actions by performing a comparison between the measured and/or determined performance metric associated with the hypothetical action and a preconfigured performance threshold/criteria/condition. For example, the WTRU may determine the quality of the selected PMI by comparing the measured SINR/SNR/RSRP with a configured SINR/SNR/RSRP threshold. The WTRU may determine the quality of hypothetical actions by comparing the hypothetical reward with a configured measured reward by the NW. The hypothetical reward may be the expected
training reward that the WTRU would have received if the hypothetical action had been taken in the training environment. For example, the WTRU may be configured with a reward associated with the RL-based selected PMI. The reward may be a function of interference and expected reception quality associated with the selected PMI. The WTRU may determine the quality by comparing the configured reward with the hypothetical reward.
[0115] WTRU determination of the quality of the second transmission (e.g., associated with the hypothetical action), based on the performance of the first transmission (e.g., associated with the reference action), is discussed herein. The WTRU may determine the quality of the hypothetical action by comparing it against the reference action. For example, the WTRU may determine the quality of the hypothetical PMI (e.g., RL-based selected PMI) by comparing the measured SINR/SNR/RSRP against the measured SINR/SNR/RSRP associated with the reference PMI (e.g., the PMI used for data beamforming). The WTRU may determine and indicate the performance degradation or improvement of the hypothetical action relative to the reference action. For example, the WTRU may determine the RSRP/SNR/SINR difference between the RL-based selected PMI and the reference PMI.
[0116] WTRU determination of the quality of the second transmission (e.g., associated with the hypothetical action), based on the NW feedback (e.g., performance metric) associated with the hypothetical action, is discussed herein. The WTRU may determine the quality of hypothetical actions based on the NW feedback. For example, the NW may provide feedback (e.g., performance metric) associated with the WTRU-indicated hypothetical action. For example, the WTRU may report the RL-selected PMI and the NW may respond with interference metric feedback that reflects the amount of interference caused by the WTRU RL-based selected PMI. The WTRU may determine and report the quality of the RL-based selected PMI based on the received interference metric, or a combination of the received interference metric and the measured RSRP associated with the RL-based selected PMI.
[0117] Indication of the WTRU RL validation report is discussed herein. The WTRU may be configured to report the quality information associated with the hypothetical actions for WTRU-sided RL-model validation. For example, the quality information may be indicated in multiple formats. The formats may be configured, e.g., in a DCI field or MAC CE. A first format may be Boolean, for example, a binary bit indicating if the hypothetical action performance exceeds the reference action performance, or a binary bit indicating if the hypothetical action performance exceeds a configured performance threshold. A second format may be multiple quantized quality levels, indicated in two or more bits. Each quality level may represent a specific range of performance difference between the hypothetical action performance and the
reference action performance, or each quality level may represent a range of difference between the hypothetical action performance and the configured threshold. A third format may indicate the direct performance of the hypothetical action. The performance may be a quantized performance metric, e.g. quantized RSRP/SINR/SNR, configured by the NW.
[0118] The WTRU may be preconfigured with resources to indicate the validation report of the WTRU-sided RL model. The validation report may include the quality information of the hypothetical action. The WTRU may be configured with reserved symbols on PUSCH for transmission of the validation report. For example, the reserved symbols may be configured on preconfigured time-frequency locations. The WTRU may be configured to report the quality information associated with the hypothetical actions in Uplink Control Information (UCI). The resources for the RL model validation report transmission may be preconfigured for a WTRU. The resources may be defined as PUSCH format or UCI format. The resources may be configured via RRC configuration. These resources may be activated and/or deactivated by MAC CE.
[0119] The WTRU may be configured to report the quality information based on one or more of the following triggers. The WTRU may be configured to report the quality information based on a time-event trigger. The WTRU may be configured with time instances or specific slots to determine and report the quality information of the hypothetical actions. In another option, the WTRU may be configured to report the quality information of the hypothetical actions every N configured periods over which the performance metrics are calculated. The WTRU may be configured to report the quality information based on a measurement trigger. The WTRU may report the quality information associated with the hypothetical actions based on one or more measurements meeting a configured threshold or criteria. For example, the measurement trigger may be met when the performance of the hypothetical actions exceeds or is lower than a preconfigured threshold. In another example, the measurement trigger may be met when the performance of the hypothetical action exceeds or is lower than the performance of the reference action. The WTRU may be configured to report the quality information based on a feedback resource selection, or feedback resource, or change thereof. The WTRU may be configured to report the quality information based on a performance of an associated function. For example, the WTRU may determine the rate of HARQ-ACK associated with the hypothetical actions. If the performance drops below a threshold, the WTRU may report the quality of the hypothetical action.
[0120] WTRU-assisted NW-sided RL offline policy evaluation is discussed herein. A WTRU may receive a configuration for validating the NW-sided RL model. The configuration may include a configuration of a
validation mode. In the validation mode, actions selected/determined by the RL model may be indicated but not performed; such actions are referred to as hypothetical actions. The configuration may include criteria/rules/conditions for RL model validation. The criteria may be associated with the determined quality of actions evaluated against one or more preconfigured performance threshold(s). For example, the WTRU may determine a metric associated with a hypothetical action. The WTRU may determine an expected reward based on the metric and evaluate and/or report the quality based on an absolute or relative threshold. The configuration may include a reporting configuration for RL model validation (e.g., resources for reporting, periodicity, model validation reporting format, etc.).
[0121] The WTRU may receive a first indication and a second indication. The WTRU may perform the action (e.g., reference action) and/or apply the configuration associated with the first indication. The WTRU may perform validation of the RL model by determining the hypothetical outcome of the action associated with the second indication without actually performing the action (e.g., hypothetical action) and/or without applying the configuration associated with the second indication. Examples of hypothetical action(s) may include a hypothetical application of one or more DL beam indices selected by the RL model, a hypothetical application of one or more sub-band indices, and/or allocated power selected by the RL model. For example, the WTRU may determine whether an indication is a first indication or a second indication based on explicit information or implicit information (e.g., a separate field in DCI, MAC CE, order of indication, different resources/channels/messages, etc.).
[0122] The WTRU may determine the quality of hypothetical action(s) based on one or more of the following steps. The one or more steps may include measuring the performance metric (e.g., function of use-case) assuming hypothetical application of the action (e.g., an average RSRP and/or SINR over a configured period, an average number of beam switches in a configured period, a number of beam failures over a configured period with or without successful recovery, and/or an average throughput over a configured period). The WTRU may determine the quality of the hypothetical action(s) based on a comparison of a performance metric associated with hypothetical action(s) with a preconfigured performance criteria and/or condition. For example, the preconfigured performance criteria and/or condition may include a measured quantity compared with a threshold (e.g., SINR/SNR/RSRP on one of the sub-bands). The preconfigured performance criteria and/or condition may include a measured average number of beam switches compared with a configured metric threshold. The preconfigured performance criteria and/or condition may include a measured average number of beam failures compared with a configured metric threshold. The preconfigured performance criteria and/or condition may include a measured average RSRP over a period compared with a threshold. The preconfigured performance criteria and/or condition may include a measured reward over a period compared with the configured expected reward. The WTRU may determine the quality of the hypothetical actions based on improvement and/or degradation of the performance metric or quality of hypothetical action(s) relative to the performance metric or quality of reference action(s). For example, the WTRU may determine the quality of the hypothetical actions based on comparing the performance metrics associated with the hypothetical action(s) and reference action(s).
[0123] The WTRU may report the determined quality of hypothetical action(s) (e.g., for NW-sided RL model validation). For example, reporting may be triggered when preconfigured conditions are satisfied. Reporting may be triggered if the determined quality of hypothetical action(s) exceeds or is less than the preconfigured threshold, or if a performance metric associated with hypothetical action(s) is better or worse, by an offset, than a performance metric associated with reference action(s) over n configured periods. For example, the report may be transmitted periodically (e.g., every n configured periods over which the performance metrics are calculated). For example, the report may include one or more indication formats: Boolean, or multiple quantized levels for quality (e.g., represented in multiple levels depending on the difference between the measured metric and the configured metric threshold).
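The offset-based trigger over n configured periods may be sketched as follows. The sketch assumes one aggregated metric value per configured period and interprets "over n configured periods" as the n most recent periods, each of which must satisfy the offset condition; both the interpretation and the names are assumptions.

```python
def offset_trigger(hyp_per_period, ref_per_period, offset, n):
    """Return True if the hypothetical metric beats the reference metric by at
    least `offset` in each of the last `n` configured periods."""
    if n <= 0:
        raise ValueError("n must be a positive number of periods")
    if len(hyp_per_period) < n or len(ref_per_period) < n:
        return False  # not enough periods observed yet
    recent = zip(hyp_per_period[-n:], ref_per_period[-n:])
    return all(h - r >= offset for h, r in recent)
```

A "worse by an offset" trigger could be obtained in the same way by swapping the two sequences.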
[0124] FIG. 3 illustrates a flow diagram 300 comprising an NW-sided RL model 302 used for beam selection with the aim of reducing latency and/or measurement overhead. The state space 304 may be defined with RS or CSI measurements while the action space 306 may include one or more selected beam indices. The goal of the RL model is to select the beams that maximize the average RSRP while minimizing the number of beam switches over a configured period.
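The stated goal (maximizing the average RSRP while minimizing the number of beam switches over a configured period) could be captured by many reward definitions. The sketch below uses a penalized average as one possible form; the linear penalty, its weight, and the function name are assumptions and are not specified in the disclosure.

```python
from statistics import mean


def beam_selection_reward(rsrp_dbm, beam_indices, switch_penalty_db):
    """Average RSRP over the period minus a fixed penalty per beam switch."""
    # A switch is counted whenever consecutive selected beam indices differ.
    switches = sum(1 for a, b in zip(beam_indices, beam_indices[1:]) if a != b)
    return mean(rsrp_dbm) - switch_penalty_db * switches


# Assumed example: three measurement occasions with one beam switch.
reward = beam_selection_reward([-90.0, -88.0, -92.0], [3, 3, 5], switch_penalty_db=2.0)
```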
[0125] WTRU procedures for off-policy RL model evaluation/validation are discussed herein. A WTRU may transmit a capability associated with an RL model. The capability may include an RL model ID (e.g., implicitly indicating the configuration of state and/or action space or target policy), support of a feature (e.g., beam management), and/or a status of the RL model (e.g., validated or not, time of last validation, area of last validation, etc.).
[0126] The WTRU may receive a configuration for RL model validation. For example, the WTRU may receive a resource configuration for reporting RL model validation (e.g., resources for reporting, periodicity, model validation reporting format, etc.). The WTRU may determine a first indication and a second indication. The WTRU may determine a first indication based on non-AIML/legacy methods or a different RL model. The first indication may be associated with a reference action. The WTRU may determine a second indication based on an output of the RL model being validated. The second indication may be associated with a hypothetical action, for example, indicating that the network provides feedback on the quality of the hypothetical action (e.g., the PMI selected by the RL model to satisfy an average overhead requirement while achieving a target performance).
[0127] The WTRU may transmit the first and second indication. The WTRU may differentiate the first indication and second indication based on explicit information or implicit means (e.g., a separate field in DCI, MAC CE, order of indication, different resources/channels/messages, etc.).
[0128] The WTRU may receive a first transmission based on the first indication and a second transmission based on the second indication.
[0129] The WTRU may determine the quality of the second transmission (e.g., reception associated with the RL-selected PMI) based on one or more of the following steps. The WTRU may determine the quality of the second transmission by measuring the performance metric (e.g., function of use-case) assuming the hypothetical action. The performance metric may include a HARQ ACK and/or a SINR/RSRP/CQI (e.g., associated with the selected PMI by the RL model). The WTRU may determine the quality of the second transmission based on a comparison of a performance metric associated with hypothetical action(s) with a preconfigured performance criteria/condition. For example, the WTRU may determine the quality of the second transmission by comparing a measured quantity with a threshold (e.g., SINR/SNR/RSRP associated with the selected PMI by the RL model). For example, the WTRU may determine the quality of the second transmission by comparing a reward measured over a period with the training reward. The WTRU may determine the quality of the second transmission by an improvement and/or degradation of hypothetical action(s) relative to reference action(s) (e.g., comparing the performance metrics, for example, RSRP and/or SINR, associated with the hypothetical action(s) and reference action(s)). The WTRU may determine the quality of the second transmission based on NW feedback associated with the hypothetical action, for example, a multiuser interference metric associated with the selected PMI by the RL model.
[0130] The WTRU may transmit the quality information associated with hypothetical action(s) for WTRU-sided RL model validation, for example, when preconfigured conditions are satisfied. For example, the WTRU may transmit the quality information if the quality of hypothetical action(s) exceeds or is lower than the preconfigured threshold, or is better or worse, by an offset, than reference action(s) over n configured periods. The WTRU may transmit the quality information periodically (e.g., every n configured periods over which the performance metrics are calculated). The format of the quality information indication may be Boolean,
indicating whether the conditions associated with validation are satisfied. The format of the quality information indication may include multiple quantized levels for quality. The multiple quantized levels for quality may be represented in multiple levels depending on the difference between the measured metric and the configured metric threshold.
[0131] The WTRU may receive an indication from the gNB on RL model validation, for example, for MU transmissions.
[0132] FIG. 4 illustrates a flow diagram 400 comprising WTRU-sided RL model 402 used for PMI selection with the aim of reducing the average overhead while achieving a target performance. The state space 404 may be defined with power, interference, and CSI measurements while the action space 406 may include one or more selected PMI. The goal of the RL model may be to select the PMI that maximizes the RSRP under average overhead constraints.
METHODS FOR OFFLINE POLICY VALIDATION IN REINFORCEMENT LEARNING
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of United States Provisional Application No. 63/572,521 filed on April 1, 2024, the entire contents of which are incorporated herein by reference.
BACKGROUND
[0002] Reinforcement learning (RL), along with supervised and unsupervised learning, forms one of the three key learning paradigms in the field of artificial intelligence/machine learning (AI/ML). While the three methods seek to learn a model from data through training, they are fundamentally different in the way the learning process is carried out. For instance, supervised learning algorithms learn patterns and relationships between the input and output pairs and then the trained algorithm is used to predict outcomes based on new input data. Unsupervised learning algorithms, however, receive inputs with no specified outputs during the training process, with the aim of finding hidden patterns and relationships within the data using statistical means. Different from supervised and unsupervised learning, RL has a predetermined and well-defined end goal in the form of a desired result which can be achieved through rewarding the desired behaviors. RL takes an exploratory approach, with a reward-and-punishment paradigm as the data is processed, wherein the explorations are continuously validated and improved to increase the probability of reaching the end goal. RL is expected to be one of the major components of automation of wireless networks and it has already found a wide variety of applications in the wireless domain.
SUMMARY
[0003] A wireless transmit/receive unit (WTRU) may include a processor and a memory. The WTRU may be configured to receive a first indication of a first configuration to be implemented for performing a first action related to communications on a network. The WTRU may be configured to receive, from the network, a second indication of a second configuration to be implemented for performing a second action related to communications on the network. The second action may be an anticipated or hypothetical action associated with a network-side (NW-side) reinforcement learning (RL) model. The WTRU may perform the first action related to the communications on the network using the first configuration and determine a first outcome based on a first performance metric associated with the first action. The WTRU may perform a validation of the NW-side RL model by being configured to determine a second outcome based on a second performance metric associated with performance of the second action without actually
implementing the second configuration to perform the second action related to the communications on the network, and determine a quality of the second action compared to a quality of the first action based on the first outcome and the second outcome. The WTRU may report the determined quality of the second action for validation of the anticipated or hypothetical action associated with the NW-side RL model.
[0004] The first action may be a reference action which includes one or more of applying a configuration associated with the first indication, performing a measurement or a reception using a first configuration determined by the first indication, triggering or performing a transmission using a second configuration determined by the first indication, suspending an ongoing or future transmission, entering a power saving mode, and/or setting, resetting, or modifying one or more of protocol parameters, timers, and/or counters using a third configuration determined by the first indication.
[0005] The first configuration and the second configuration may each include a different downlink (DL) beam configuration, different power saving modes, different protocol parameters, and/or different timers.
[0006] The first configuration may be a first downlink (DL) beam configuration and the second configuration may be a second DL beam configuration. The first action may include the WTRU switching to the first DL beam to receive data transmissions. The second action may include the WTRU switching to the second DL beam provided by the NW-side RL model to receive the data transmissions. The second performance metric may include a measurement on the second DL beam.
[0007] The first configuration may be a first downlink control information (DCI) configuration indicating a first modulation and coding scheme (MCS). The second configuration may be a second DCI configuration indicating a second MCS. The first action may include the WTRU applying the first DCI configuration to receive DL transmissions. The second action may include the WTRU switching to apply the second DCI configuration to receive the DL transmissions. The second performance metric may include a measurement on the second DCI configuration.
[0008] The first configuration may be a first sub-band configuration. The second configuration may be a second sub-band configuration. The first action may include the WTRU applying the first sub-band configuration to receive DL transmissions. The second action may include the WTRU switching to apply the second sub-band configuration to receive the DL transmissions. The second performance metric may include a measurement on the second sub-band configuration.
[0009] The first configuration may be a first power level configuration for a first DL transmission. The second configuration may be a second power level configuration for a second DL transmission. The first action may include the WTRU applying the first power level configuration to receive DL transmissions. The second action
may include the WTRU switching to apply the second power level configuration to receive the DL transmissions. The second performance metric may include a measurement on the second power level configuration.
[0010] The first performance metric and the second performance metric may be determined based on one or more preconfigured criteria. Each of the one or more preconfigured criteria may include a hybrid automatic repeat request (HARQ) acknowledgement (ACK), an average reference signal received power (RSRP), a signal-to-interference-plus-noise ratio (SINR), and/or a channel quality indicator (CQI). Each of the one or more preconfigured criteria may include an average number of beam switches.
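As a rough illustration of how some of the criteria listed above might be computed from logged observations, consider the following sketch. The function names, the averaging of RSRP samples directly in dBm, and the input formats are hypothetical assumptions and are not specified by this disclosure.

```python
from statistics import mean


def harq_ack_ratio(feedback: list[bool]) -> float:
    """Fraction of transmissions acknowledged (True = ACK, False = NACK)."""
    return sum(feedback) / len(feedback)


def average_rsrp_dbm(samples_dbm: list[float]) -> float:
    """Arithmetic mean of RSRP samples, taken directly in dBm for simplicity."""
    return mean(samples_dbm)


def beam_switch_count(beam_ids: list[int]) -> int:
    """Number of times the selected beam changes between consecutive steps."""
    return sum(1 for prev, cur in zip(beam_ids, beam_ids[1:]) if prev != cur)


# Illustrative inputs only, not measured data.
print(harq_ack_ratio([True, True, False, True]))
print(average_rsrp_dbm([-90.0, -92.0, -94.0]))
print(beam_switch_count([1, 1, 2, 2, 3, 1]))
```

An average number of beam switches could then be obtained by averaging `beam_switch_count` over several observation windows.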
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1A is a system diagram illustrating an example communications system in which one or more disclosed embodiments may be implemented.
[0012] FIG. 1B is a system diagram illustrating an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
[0013] FIG. 1C is a system diagram illustrating an example radio access network (RAN) and an example core network (CN) that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
[0014] FIG. 1D is a system diagram illustrating a further example RAN and a further example CN that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
[0015] FIG. 2 illustrates an example of a reinforcement learning framework.
[0016] FIG. 3 illustrates a network-sided (NW-sided) reinforcement learning (RL) model used for beam selection with the aim of reducing latency and/or measurement overhead.
[0017] FIG. 4 illustrates a WTRU-sided RL model used for PMI selection with the aim of reducing the average overhead while achieving a target performance.
DETAILED DESCRIPTION
[0018] FIG. 1A is a diagram illustrating an example communications system 100 in which one or more disclosed embodiments may be implemented. The communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users. The communications system 100 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth. For example, the communications system 100 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), zero-tail unique-word DFT-Spread OFDM (ZT UW DFT-s OFDM), unique word OFDM (UW-OFDM), resource block-filtered OFDM, filter bank multicarrier (FBMC), and the like.
[0019] As shown in FIG. 1A, the communications system 100 may include wireless transmit/receive units (WTRUs) 102a, 102b, 102c, 102d, a RAN 104/113, a CN 106/115, a public switched telephone network (PSTN) 108, the Internet 110, and other networks 112, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements. Each of the WTRUs 102a, 102b, 102c, 102d may be any type of device configured to operate and/or communicate in a wireless environment. By way of example, the WTRUs 102a, 102b, 102c, 102d, any of which may be referred to as a “station” and/or a “STA”, may be configured to transmit and/or receive wireless signals and may include a user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a subscription-based unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, a hotspot or Mi-Fi device, an Internet of Things (IoT) device, a watch or other wearable, a head-mounted display (HMD), a vehicle, a drone, a medical device and applications (e.g., remote surgery), an industrial device and applications (e.g., a robot and/or other wireless devices operating in industrial and/or automated processing chain contexts), a consumer electronics device, a device operating on commercial and/or industrial wireless networks, and the like. Any of the WTRUs 102a, 102b, 102c and 102d may be interchangeably referred to as a WTRU. Further, any description herein that is described with reference to a UE may be equally applicable to a WTRU (or vice versa). For example, a WTRU may be configured to perform any of the processes or procedures described herein as being performed by a UE (or vice versa).
[0020] The communications system 100 may also include a base station 114a and/or a base station 114b. Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the CN 106/115, the Internet 110, and/or the other networks 112. By way of example, the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a gNB, a NR NodeB, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.
[0021] The base station 114a may be part of the RAN 104/113, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc. The base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals on one or more carrier frequencies, which may be referred to as a cell (not shown). These frequencies may be in licensed spectrum, unlicensed spectrum, or a combination of licensed and unlicensed spectrum. A cell may provide coverage for a wireless service to a specific geographical area that may be relatively fixed or that may change over time. The cell may further be divided into cell sectors. For example, the cell associated with the base station 114a may be divided into three sectors. Thus, in one embodiment, the base station 114a may include three transceivers, i.e., one for each sector of the cell. In an embodiment, the base station 114a may employ multiple-input multiple output (MIMO) technology and may utilize multiple transceivers for each sector of the cell. For example, beamforming may be used to transmit and/or receive signals in desired spatial directions.
[0022] The base stations 114a, 114b may communicate with one or more of the WTRUs 102a, 102b, 102c, 102d over an air interface 116, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, centimeter wave, micrometer wave, infrared (IR), ultraviolet (UV), visible light, etc.). The air interface 116 may be established using any suitable radio access technology (RAT).
[0023] More specifically, as noted above, the communications system 100 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 114a in the RAN 104/113 and the WTRUs 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink (DL) Packet Access (HSDPA) and/or High-Speed UL Packet Access (HSUPA).
[0024] In an embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 116 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A) and/or LTE-Advanced Pro (LTE-A Pro).
[0025] In an embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as NR Radio Access, which may establish the air interface 116 using New Radio (NR).
[0026] In an embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement multiple radio access technologies. For example, the base station 114a and the WTRUs 102a, 102b, 102c may implement LTE radio access and NR radio access together, for instance using dual connectivity (DC) principles. Thus, the air interface utilized by WTRUs 102a, 102b, 102c may be characterized by multiple types of radio access technologies and/or transmissions sent to/from multiple types of base stations (e.g., a eNB and a gNB).
[0027] In other embodiments, the base station 114a and the WTRUs 102a, 102b, 102c may implement radio technologies such as IEEE 802.11 (i.e., Wireless Fidelity (WiFi)), IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
[0028] The base station 114b in FIG. 1A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, an industrial facility, an air corridor (e.g., for use by drones), a roadway, and the like. In one embodiment, the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN). In an embodiment, the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN). In yet another embodiment, the base station 114b and the WTRUs 102c, 102d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, LTE-A Pro, NR, etc.) to establish a picocell or femtocell. As shown in FIG. 1A, the base station 114b may have a direct connection to the Internet 110. Thus, the base station 114b may not be required to access the Internet 110 via the CN 106/115.
[0029] The RAN 104/113 may be in communication with the CN 106/115, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 102a, 102b, 102c, 102d. The data may have varying quality of service (QoS) requirements, such as differing throughput requirements, latency requirements, error tolerance requirements, reliability requirements, data throughput requirements, mobility requirements, and the like.
The CN 106/115 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication. Although not shown in FIG. 1A, it will be appreciated that the RAN 104/113 and/or the CN 106/115 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 104/113 or a different RAT. For example, in addition to being connected to the RAN 104/113, which may be utilizing a NR radio technology, the CN 106/115 may also be in communication with another RAN (not shown) employing a GSM, UMTS, CDMA 2000, WiMAX, E-UTRA, or WiFi radio technology.
[0030] The CN 106/115 may also serve as a gateway for the WTRUs 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or the other networks 112. The PSTN 108 may include circuit- switched telephone networks that provide plain old telephone service (POTS). The Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and/or the internet protocol (IP) in the TCP/IP internet protocol suite. The networks 112 may include wired and/or wireless communications networks owned and/or operated by other service providers. For example, the networks 112 may include another CN connected to one or more RANs, which may employ the same RAT as the RAN 104/113 or a different RAT.
[0031] Some or all of the WTRUs 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities (e.g., the WTRUs 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wireless networks over different wireless links). For example, the WTRU 102c shown in FIG. 1A may be configured to communicate with the base station 114a, which may employ a cellular-based radio technology, and with the base station 114b, which may employ an IEEE 802 radio technology.
[0032] FIG. 1B is a system diagram illustrating an example WTRU 102. As shown in FIG. 1B, the WTRU 102 may include a processor 118, a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, non-removable memory 130, removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and/or other peripherals 138, among others. It will be appreciated that the WTRU 102 may include any sub-combination of the foregoing elements while remaining consistent with an embodiment.
[0033] The processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific
Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment. The processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 1B depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.
[0034] The transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 116. For example, in one embodiment, the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals. In an embodiment, the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. In yet another embodiment, the transmit/receive element 122 may be configured to transmit and/or receive both RF and light signals. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wireless signals.
[0035] Although the transmit/receive element 122 is depicted in FIG. 1B as a single element, the WTRU 102 may include any number of transmit/receive elements 122. More specifically, the WTRU 102 may employ MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 116.
[0036] The transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122. As noted above, the WTRU 102 may have multi-mode capabilities. Thus, the transceiver 120 may include multiple transceivers for enabling the WTRU 102 to communicate via multiple RATs, such as NR and IEEE 802.11, for example.
[0037] The processor 118 of the WTRU 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128. In addition, the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132. The non-removable memory 130 may
include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 118 may access information from, and store data in, memory that is not physically located on the WTRU 102, such as on a server or a home computer (not shown).
[0038] The processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the WTRU 102. The power source 134 may be any suitable device for powering the WTRU 102. For example, the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
[0039] The processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 102. In addition to, or in lieu of, the information from the GPS chipset 136, the WTRU 102 may receive location information over the air interface 116 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
[0040] The processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, a Virtual Reality and/or Augmented Reality (VR/AR) device, an activity tracker, and the like. The peripherals 138 may include one or more sensors. The sensors may be one or more of a gyroscope, an accelerometer, a hall effect sensor, a magnetometer, an orientation sensor, a proximity sensor, a temperature sensor, a time sensor, a geolocation sensor, an altimeter, a light sensor, a touch sensor, a barometer, a gesture sensor, a biometric sensor, and/or a humidity sensor.
[0041] The WTRU 102 may include a full duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for both the UL (e.g., for transmission) and the downlink (e.g., for reception)) may be concurrent and/or simultaneous. The full duplex radio may include an interference management unit 139 to reduce and/or substantially eliminate self-interference via either hardware (e.g., a choke) or signal processing via a processor (e.g., a separate processor (not shown) or via processor 118). In an embodiment, the WTRU 102 may include a half-duplex radio for transmission and reception of some or all of the signals (e.g., associated with particular subframes for either the UL (e.g., for transmission) or the downlink (e.g., for reception)).
[0042] FIG. 1C is a system diagram illustrating the RAN 104 and the CN 106 according to an embodiment. As noted above, the RAN 104 may employ an E-UTRA radio technology to communicate with the WTRUs 102a, 102b, 102c over the air interface 116. The RAN 104 may also be in communication with the CN 106.
[0043] The RAN 104 may include eNode-Bs 160a, 160b, 160c, though it will be appreciated that the RAN 104 may include any number of eNode-Bs while remaining consistent with an embodiment. The eNode-Bs 160a, 160b, 160c may each include one or more transceivers for communicating with the WTRUs 102a, 102b, 102c over the air interface 116. In one embodiment, the eNode-Bs 160a, 160b, 160c may implement MIMO technology. Thus, the eNode-B 160a, for example, may use multiple antennas to transmit wireless signals to, and/or receive wireless signals from, the WTRU 102a.
[0044] Each of the eNode-Bs 160a, 160b, 160c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the UL and/or DL, and the like. As shown in FIG. 1C, the eNode-Bs 160a, 160b, 160c may communicate with one another over an X2 interface.
[0045] The CN 106 shown in FIG. 1C may include a mobility management entity (MME) 162, a serving gateway (SGW) 164, and a packet data network (PDN) gateway (or PGW) 166. While each of the foregoing elements is depicted as part of the CN 106, it will be appreciated that any of these elements may be owned and/or operated by an entity other than the CN operator.
[0046] The MME 162 may be connected to each of the eNode-Bs 160a, 160b, 160c in the RAN 104 via an S1 interface and may serve as a control node. For example, the MME 162 may be responsible for authenticating users of the WTRUs 102a, 102b, 102c, bearer activation/deactivation, selecting a particular serving gateway during an initial attach of the WTRUs 102a, 102b, 102c, and the like. The MME 162 may provide a control plane function for switching between the RAN 104 and other RANs (not shown) that employ other radio technologies, such as GSM and/or WCDMA.
[0047] The SGW 164 may be connected to each of the eNode-Bs 160a, 160b, 160c in the RAN 104 via the S1 interface. The SGW 164 may generally route and forward user data packets to/from the WTRUs 102a, 102b, 102c. The SGW 164 may perform other functions, such as anchoring user planes during inter-eNode B handovers, triggering paging when DL data is available for the WTRUs 102a, 102b, 102c, managing and storing contexts of the WTRUs 102a, 102b, 102c, and the like.
[0048] The SGW 164 may be connected to the PGW 166, which may provide the WTRUs 102a, 102b, 102c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102a, 102b, 102c and IP-enabled devices.
[0049] The CN 106 may facilitate communications with other networks. For example, the CN 106 may provide the WTRUs 102a, 102b, 102c with access to circuit-switched networks, such as the PSTN 108, to facilitate communications between the WTRUs 102a, 102b, 102c and traditional land-line communications devices. For example, the CN 106 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the CN 106 and the PSTN 108. In addition, the CN 106 may provide the WTRUs 102a, 102b, 102c with access to the other networks 112, which may include other wired and/or wireless networks that are owned and/or operated by other service providers.
[0050] Although the WTRU is described in FIGS. 1A-1D as a wireless terminal, it is contemplated that in certain representative embodiments such a terminal may use (e.g., temporarily or permanently) wired communication interfaces with the communication network.
[0051] In representative embodiments, the other network 112 may be a WLAN.
[0052] A WLAN in Infrastructure Basic Service Set (BSS) mode may have an Access Point (AP) for the BSS and one or more stations (STAs) associated with the AP. The AP may have an access or an interface to a Distribution System (DS) or another type of wired/wireless network that carries traffic into and/or out of the BSS. Traffic to STAs that originates from outside the BSS may arrive through the AP and may be delivered to the STAs. Traffic originating from STAs to destinations outside the BSS may be sent to the AP to be delivered to respective destinations. Traffic between STAs within the BSS may be sent through the AP, for example, where the source STA may send traffic to the AP and the AP may deliver the traffic to the destination STA. The traffic between STAs within a BSS may be considered and/or referred to as peer-to-peer traffic. The peer-to-peer traffic may be sent between (e.g., directly between) the source and destination STAs with a direct link setup (DLS). In certain representative embodiments, the DLS may use an 802.11e DLS or an 802.11z tunneled DLS (TDLS). A WLAN using an Independent BSS (IBSS) mode may not have an AP, and the STAs (e.g., all of the STAs) within or using the IBSS may communicate directly with each other. The IBSS mode of communication may sometimes be referred to herein as an “ad-hoc” mode of communication.
[0053] When using the 802.11ac infrastructure mode of operation or a similar mode of operations, the AP may transmit a beacon on a fixed channel, such as a primary channel. The primary channel may be a fixed width (e.g., 20 MHz wide bandwidth) or a dynamically set width via signaling. The primary channel may be the operating channel of the BSS and may be used by the STAs to establish a connection with the AP. In certain representative embodiments, Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) may be implemented, for example in 802.11 systems. For CSMA/CA, the STAs (e.g., every STA), including the AP, may sense the primary channel. If the primary channel is sensed/detected and/or determined to be busy by a particular STA, the particular STA may back off. One STA (e.g., only one station) may transmit at any given time in a given BSS.
[0054] High Throughput (HT) STAs may use a 40 MHz wide channel for communication, for example, via a combination of the primary 20 MHz channel with an adjacent or nonadjacent 20 MHz channel to form a 40 MHz wide channel.
[0055] Very High Throughput (VHT) STAs may support 20 MHz, 40 MHz, 80 MHz, and/or 160 MHz wide channels. The 40 MHz and/or 80 MHz channels may be formed by combining contiguous 20 MHz channels. A 160 MHz channel may be formed by combining 8 contiguous 20 MHz channels, or by combining two non-contiguous 80 MHz channels, which may be referred to as an 80+80 configuration. For the 80+80 configuration, the data, after channel encoding, may be passed through a segment parser that may divide the data into two streams. Inverse Fast Fourier Transform (IFFT) processing, and time domain processing, may be done on each stream separately. The streams may be mapped on to the two 80 MHz channels, and the data may be transmitted by a transmitting STA. At the receiver of the receiving STA, the above described operation for the 80+80 configuration may be reversed, and the combined data may be sent to the Medium Access Control (MAC).
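The channel bonding arithmetic and the split-and-recombine idea behind the 80+80 configuration can be sketched as follows. This is a toy illustration only: the function names are hypothetical, and the bit-by-bit alternation used here is an assumed simplification and not the segment parser defined by the 802.11 specifications.

```python
BASE_CHANNEL_MHZ = 20  # width of one basic channel


def bonded_width_mhz(num_contiguous_channels: int) -> int:
    """Width obtained by combining contiguous 20 MHz channels."""
    return num_contiguous_channels * BASE_CHANNEL_MHZ


def segment_parse(bits: list[int]) -> tuple[list[int], list[int]]:
    """Toy segment parser: alternate encoded bits between two 80 MHz segments."""
    return bits[0::2], bits[1::2]


def segment_deparse(seg_a: list[int], seg_b: list[int]) -> list[int]:
    """Receiver side: reverse the toy parser and recombine the two streams."""
    combined = []
    for a, b in zip(seg_a, seg_b):
        combined.extend((a, b))
    combined.extend(seg_a[len(seg_b):])  # leftover bit when the input length is odd
    return combined


encoded = [1, 0, 1, 1, 0, 0, 1]           # illustrative encoded bits
stream_a, stream_b = segment_parse(encoded)
recovered = segment_deparse(stream_a, stream_b)
print(bonded_width_mhz(8), stream_a, stream_b, recovered)
```

In this sketch each stream would then be processed separately (IFFT and time domain processing) before transmission on its own 80 MHz channel.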
[0056] Sub 1 GHz modes of operation are supported by 802.11af and 802.11ah. The channel operating bandwidths, and carriers, are reduced in 802.11af and 802.11ah relative to those used in 802.11n and 802.11ac. 802.11af supports 5 MHz, 10 MHz and 20 MHz bandwidths in the TV White Space (TVWS) spectrum, and 802.11ah supports 1 MHz, 2 MHz, 4 MHz, 8 MHz, and 16 MHz bandwidths using non-TVWS spectrum. According to a representative embodiment, 802.11ah may support Meter Type Control/Machine-Type Communications, such as MTC devices in a macro coverage area. MTC devices may have certain capabilities, for example, limited capabilities including support for (e.g., only support for) certain and/or limited bandwidths. The MTC devices may include a battery with a battery life above a threshold (e.g., to maintain a very long battery life).
[0057] WLAN systems, which may support multiple channels, and channel bandwidths, such as 802.11n, 802.11ac, 802.11af, and 802.11ah, include a channel which may be designated as the primary channel. The primary channel may have a bandwidth equal to the largest common operating bandwidth supported by all STAs in the BSS. The bandwidth of the primary channel may be set and/or limited by a STA, from among all STAs operating in a BSS, which supports the smallest bandwidth operating mode. In the example of 802.11ah, the primary channel may be 1 MHz wide for STAs (e.g., MTC type devices) that support (e.g., only support) a 1 MHz mode, even if the AP and other STAs in the BSS support 2 MHz, 4 MHz, 8 MHz, 16 MHz, and/or other channel bandwidth operating modes. Carrier sensing and/or Network Allocation Vector (NAV) settings may depend on the status of the primary channel. If the primary channel is busy, for example, due to a STA (which supports only a 1 MHz operating mode) transmitting to the AP, the entire available frequency bands may be considered busy even though a majority of the frequency bands remains idle and may be available.
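The rule for the primary channel bandwidth can be read as taking the largest bandwidth that every STA in the BSS supports. The sketch below illustrates that reading; the function name, the device labels, and the representation of capabilities as sets of bandwidths in MHz are hypothetical assumptions.

```python
def primary_channel_width_mhz(supported: dict[str, set[int]]) -> int:
    """Largest operating bandwidth (MHz) common to every device in the BSS."""
    if not supported:
        raise ValueError("at least one device is required")
    common = set.intersection(*supported.values())
    if not common:
        raise ValueError("devices share no common operating bandwidth")
    return max(common)


# Illustrative 802.11ah-style BSS: one MTC device supports only the 1 MHz mode.
bss = {
    "AP": {1, 2, 4, 8, 16},
    "STA-1": {1, 2, 4},
    "MTC-1": {1},
}
print(primary_channel_width_mhz(bss))
```

With the MTC device removed from `bss`, the same function would return 4, which shows how a single narrowband STA limits the primary channel for the whole BSS.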
[0058] In the United States, the available frequency bands, which may be used by 802.11ah, are from 902 MHz to 928 MHz. In Korea, the available frequency bands are from 917.5 MHz to 923.5 MHz. In Japan, the available frequency bands are from 916.5 MHz to 927.5 MHz. The total bandwidth available for 802.11ah is 6 MHz to 26 MHz depending on the country code.
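The 6 MHz to 26 MHz range follows from subtracting the band edges listed above. A short Python sketch of that arithmetic, using only the figures in this paragraph (the dictionary and helper names are illustrative):

```python
# Band edges in MHz, taken from the paragraph above.
BANDS_MHZ = {
    "United States": (902.0, 928.0),
    "Korea": (917.5, 923.5),
    "Japan": (916.5, 927.5),
}


def total_bandwidth_mhz(country):
    """Width of the band available to 802.11ah in the given country."""
    low, high = BANDS_MHZ[country]
    return high - low


widths = {country: total_bandwidth_mhz(country) for country in BANDS_MHZ}
```

Korea gives the 6 MHz lower bound and the United States the 26 MHz upper bound; Japan falls in between at 11 MHz.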
[0059] FIG. 1D is a system diagram illustrating the RAN 113 and the CN 115 according to an embodiment. As noted above, the RAN 113 may employ an NR radio technology to communicate with the WTRUs 102a, 102b, 102c over the air interface 116. The RAN 113 may also be in communication with the CN 115.
[0060] The RAN 113 may include gNBs 180a, 180b, 180c, though it will be appreciated that the RAN 113 may include any number of gNBs while remaining consistent with an embodiment. The gNBs 180a, 180b, 180c may each include one or more transceivers for communicating with the WTRUs 102a, 102b, 102c over the air interface 116. In one embodiment, the gNBs 180a, 180b, 180c may implement MIMO technology. For example, gNBs 180a, 180b may utilize beamforming to transmit signals to and/or receive signals from the WTRUs 102a, 102b, 102c. Thus, the gNB 180a, for example, may use multiple antennas to transmit wireless signals to, and/or receive wireless signals from, the WTRU 102a. In an embodiment, the gNBs 180a, 180b, 180c may implement carrier aggregation technology. For example, the gNB 180a may transmit multiple component carriers to the WTRU 102a (not shown). A subset of these component carriers may be on unlicensed spectrum while the remaining component carriers may be on licensed spectrum. In an embodiment, the gNBs 180a, 180b, 180c may implement Coordinated Multi-Point (CoMP) technology. For example, WTRU 102a may receive coordinated transmissions from gNB 180a and gNB 180b (and/or gNB 180c).
[0061] The WTRUs 102a, 102b, 102c may communicate with gNBs 180a, 180b, 180c using transmissions associated with a scalable numerology. For example, the OFDM symbol spacing and/or OFDM subcarrier spacing may vary for different transmissions, different cells, and/or different portions of the wireless transmission spectrum. The WTRUs 102a, 102b, 102c may communicate with gNBs 180a, 180b, 180c using subframes or transmission time intervals (TTIs) of various or scalable lengths (e.g., containing a varying number of OFDM symbols and/or lasting varying lengths of absolute time).
[0062] The gNBs 180a, 180b, 180c may be configured to communicate with the WTRUs 102a, 102b, 102c in a standalone configuration and/or a non-standalone configuration. In the standalone configuration, WTRUs 102a, 102b, 102c may communicate with gNBs 180a, 180b, 180c without also accessing other RANs (e.g., eNode-Bs 160a, 160b, 160c). In the standalone configuration, WTRUs 102a, 102b, 102c may utilize one or more of gNBs 180a, 180b, 180c as a mobility anchor point. In the standalone configuration, WTRUs 102a, 102b, 102c may communicate with gNBs 180a, 180b, 180c using signals in an unlicensed band. In a non-standalone configuration, WTRUs 102a, 102b, 102c may communicate with/connect to gNBs 180a, 180b, 180c while also communicating with/connecting to another RAN such as eNode-Bs 160a, 160b, 160c. For example, WTRUs 102a, 102b, 102c may implement DC principles to communicate with one or more gNBs 180a, 180b, 180c and one or more eNode-Bs 160a, 160b, 160c substantially simultaneously. In the non-standalone configuration, eNode-Bs 160a, 160b, 160c may serve as a mobility anchor for WTRUs 102a, 102b, 102c and gNBs 180a, 180b, 180c may provide additional coverage and/or throughput for servicing WTRUs 102a, 102b, 102c.
[0063] Each of the gNBs 180a, 180b, 180c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the UL and/or DL, support of network slicing, dual connectivity, interworking between NR and E-UTRA, routing of user plane data towards User Plane Function (UPF) 184a, 184b, routing of control plane information towards Access and Mobility Management Function (AMF) 182a, 182b and the like. As shown in FIG. 1D, the gNBs 180a, 180b, 180c may communicate with one another over an Xn interface.
[0064] The CN 115 shown in FIG. 1D may include at least one AMF 182a, 182b, at least one UPF 184a, 184b, at least one Session Management Function (SMF) 183a, 183b, and possibly a Data Network (DN) 185a, 185b. While each of the foregoing elements is depicted as part of the CN 115, it will be appreciated that any of these elements may be owned and/or operated by an entity other than the CN operator.
[0065] The AMF 182a, 182b may be connected to one or more of the gNBs 180a, 180b, 180c in the RAN 113 via an N2 interface and may serve as a control node. For example, the AMF 182a, 182b may be responsible for authenticating users of the WTRUs 102a, 102b, 102c, support for network slicing (e.g., handling of different PDU sessions with different requirements), selecting a particular SMF 183a, 183b, management of the registration area, termination of NAS signaling, mobility management, and the like. Network slicing may be used by the AMF 182a, 182b in order to customize CN support for WTRUs 102a, 102b, 102c based on the types of services being utilized by WTRUs 102a, 102b, 102c. For example, different network slices may be established for different use cases such as services relying on ultra-reliable low latency communication (URLLC) access, services relying on enhanced mobile broadband (eMBB) access, services for machine type communication (MTC) access, and/or the like. The AMF 182a, 182b may provide a control plane function for switching between the RAN 113 and other RANs (not shown) that employ other radio technologies, such as LTE, LTE-A, LTE-A Pro, and/or non-3GPP access technologies such as WiFi.
[0066] The SMF 183a, 183b may be connected to an AMF 182a, 182b in the CN 115 via an N11 interface. The SMF 183a, 183b may also be connected to a UPF 184a, 184b in the CN 115 via an N4 interface. The SMF 183a, 183b may select and control the UPF 184a, 184b and configure the routing of traffic through the UPF 184a, 184b. The SMF 183a, 183b may perform other functions, such as managing and allocating WTRU IP addresses, managing PDU sessions, controlling policy enforcement and QoS, providing downlink data notifications, and the like. A PDU session type may be IP-based, non-IP based, Ethernet-based, and the like.
[0067] The UPF 184a, 184b may be connected to one or more of the gNBs 180a, 180b, 180c in the RAN 113 via an N3 interface, which may provide the WTRUs 102a, 102b, 102c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102a, 102b, 102c and IP-enabled devices. The UPF 184a, 184b may perform other functions, such as routing and forwarding packets, enforcing user plane policies, supporting multi-homed PDU sessions, handling user plane QoS, buffering downlink packets, providing mobility anchoring, and the like.
[0068] The CN 115 may facilitate communications with other networks. For example, the CN 115 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the CN 115 and the PSTN 108. In addition, the CN 115 may provide the WTRUs 102a, 102b, 102c with access to the other networks 112, which may include other wired and/or wireless networks that are owned and/or operated by other service providers. In one embodiment, the WTRUs 102a, 102b, 102c may be connected to a local Data Network (DN) 185a, 185b through the UPF 184a, 184b via the N3 interface to the UPF 184a, 184b and an N6 interface between the UPF 184a, 184b and the DN 185a, 185b.
[0069] In view of Figures 1A-1D, and the corresponding description of Figures 1A-1D, one or more, or all, of the functions described herein with regard to one or more of: WTRU 102a-d, Base Station 114a-b, eNode-B 160a-c, MME 162, SGW 164, PGW 166, gNB 180a-c, AMF 182a-b, UPF 184a-b, SMF 183a-b, DN 185a-b, and/or any other device(s) described herein, may be performed by one or more emulation devices (not shown). The emulation devices may be one or more devices configured to emulate one or more, or all, of the functions described herein. For example, the emulation devices may be used to test other devices and/or to simulate network and/or WTRU functions.
[0070] The emulation devices may be designed to implement one or more tests of other devices in a lab environment and/or in an operator network environment. For example, the one or more emulation devices may perform the one or more, or all, functions while being fully or partially implemented and/or deployed as part of a wired and/or wireless communication network in order to test other devices within the communication network. The one or more emulation devices may perform the one or more, or all, functions while being temporarily implemented/deployed as part of a wired and/or wireless communication network. The emulation device may be directly coupled to another device for purposes of testing and/or may perform testing using over-the-air wireless communications.
[0071] The one or more emulation devices may perform the one or more, including all, functions while not being implemented/deployed as part of a wired and/or wireless communication network. For example, the emulation devices may be utilized in a testing scenario in a testing laboratory and/or a non-deployed (e.g., testing) wired and/or wireless communication network in order to implement testing of one or more components. The one or more emulation devices may be test equipment. Direct RF coupling and/or wireless communications via RF circuitry (e.g., which may include one or more antennas) may be used by the emulation devices to transmit and/or receive data.
[0072] FIG. 2 depicts an example reinforcement learning (RL) framework 200. The RL framework may include several key elements. The key elements may include the agent, which is the ML algorithm; the environment, which is the adaptive problem space with attributes such as boundary values, rules, valid actions, etc.; the action, which is a step that the RL agent takes to navigate the environment; the state, which is the environment at a given point in time; the reward, which is the response of the environment to the actions taken (e.g., the response may be a positive, negative, or zero value); and/or the cumulative reward or value function, which specifies what is good in the long run. On top of those elements may be the RL policy, which defines the RL agent's way of behaving at a given time. The RL policy may represent the mapping that determines the appropriate actions to take based on the observed states. At each time step t, an agent 202 may determine an action A_t based on the current state S_t using its policy. Then, an environment 204 may execute the action and return a reward R_{t+1} and the next state S_{t+1}. Using the tuples (state, action, reward, next state) collected over past steps, the agent 202 may update its policy to maximize the cumulative reward.
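The agent/environment interaction of FIG. 2 can be sketched as a short loop. The Python below is an illustrative assumption, not the disclosed method: the two-state toy environment, the epsilon-greedy action selection, the tabular Q-learning update, and all hyperparameters are hypothetical choices made only to show the (state, action, reward, next state) cycle.

```python
import random


class ToyEnvironment:
    """Hypothetical two-state environment: reward 1.0 when the action equals the state."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.state = 1 - self.state  # the state toggles deterministically
        return reward, self.state


class TabularAgent:
    """Epsilon-greedy agent with a tabular Q-learning update (one possible choice)."""

    def __init__(self, n_states=2, n_actions=2, alpha=0.5, gamma=0.5, epsilon=0.2, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def greedy(self, state):
        return max(range(self.n_actions), key=lambda a: self.q[state][a])

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)  # explore
        return self.greedy(state)  # exploit the current policy

    def update(self, state, action, reward, next_state):
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def run(steps=2000):
    env, agent = ToyEnvironment(), TabularAgent()
    state = env.state
    for _ in range(steps):
        action = agent.act(state)                        # A_t chosen from S_t
        reward, next_state = env.step(action)            # R_{t+1} and S_{t+1}
        agent.update(state, action, reward, next_state)  # policy update
        state = next_state
    return agent


agent = run()
```

After training, the greedy policy in this toy setting selects the action that matches the state, which is the reward-maximizing behavior for the assumed environment.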
[0073] RL models may be trained in different ways. For example, RL models may be trained online wherein the agent interacts with the real/field environment during training the RL model. For another example, RL models may be trained offline wherein the RL agent cannot interact with the real/field environment during training the RL model. Offline RL may provide a more efficient and flexible training framework as data collection is decoupled from policy training and may use less memory and computational resources relative to online RL training. In offline RL, the learning algorithm may be provided with a static dataset of transitions, $\mathcal{D} = \{(s_t^i, a_t^i, r_t^i, s_{t+1}^i)\}$, and may learn the best policy it can using this dataset. Variable $s_t^i$ may represent the state of the environment at time t. Variable $a_t^i$ may denote the action taken by the agent at state $s_t^i$. Variable $r_t^i$ may denote the resulting reward associated with the action $a_t^i$. Variable $s_{t+1}^i$ may be the state at time t+1. In other words, offline RL may require the learning algorithm to find a sufficient understanding of the system dynamics from a fixed dataset, and then find a policy $\pi(a|s)$ that achieves the largest possible cumulative reward.
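As a hedged illustration of learning from a static dataset, the sketch below runs repeated Q-learning sweeps over a fixed list of (s, a, r, s_next) tuples without ever querying an environment. The dataset values, the discount factor, the learning rate, and the choice of tabular Q-learning are all assumptions made for the example; the source does not prescribe a particular offline algorithm.

```python
from collections import defaultdict

# Static dataset D of (s, a, r, s_next) transitions; the values are made up.
# The pair (state 1, action 0) is deliberately absent from the logged data.
DATASET = [
    (0, 0, 1.0, 1),
    (0, 1, 0.0, 1),
    (1, 1, 1.0, 0),
]


def offline_q_learning(dataset, n_actions=2, gamma=0.5, alpha=0.5, sweeps=200):
    """Repeatedly replay the fixed dataset; no new transitions are collected."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next in dataset:
            best_next = max(q[(s_next, b)] for b in range(n_actions))
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


def greedy_policy(q, states, n_actions=2):
    return {s: max(range(n_actions), key=lambda a: q[(s, a)]) for s in states}


q = offline_q_learning(DATASET)
policy = greedy_policy(q, states=[0, 1])
```

The value for the unlogged pair (1, 0) never moves from its initial value, so the learned policy carries no information about that action. This is a small instance of the coverage gap that motivates validating an offline-trained policy before deployment.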
[0074] While offline RL seems to be more efficient from a practical standpoint, as it provides a more flexible and less complex training framework relative to online RL, the performance of offline RL may be impacted when the RL model is deployed to the field. For instance, offline training may result in a policy that is well trained but that does not perform well enough when the RL model is brought to the real environment, especially if the training dataset does not cover the entirety of the state and/or action space. Embodiments described herein discuss how to validate and/or evaluate an offline RL policy prior to field deployment. In other words, embodiments described herein discuss how to ensure that the RL actions taken by an offline-trained RL model result in an acceptable performance when this model is used in the real/field environment. The embodiments described herein address how the WTRU assists the network (NW) to validate and/or evaluate its RL model before running the model in the real system, for example, for a network-sided (NW-sided) off-policy RL model. The embodiments described herein address how the WTRU can validate its off-policy RL model before using it, for example, for a WTRU-sided off-policy RL model. The embodiments described herein further address how the WTRU can determine and report the quality of the generated RL model actions from the offline-trained model.
[0075] WTRU-assisted NW-sided offline RL policy validation is provided herein. Methods are provided for evaluating and/or validating offline-trained NW-sided RL models using WTRU assistance, by determining the quality of the hypothetical actions as a function of one or more measurements satisfying configured criteria for performance requirements.
[0076] A WTRU may receive a configuration for validating the NW-sided RL model. The configuration may include configuration of a validation mode. In the validation mode, actions selected and/or determined by the RL model may be indicated to the WTRU but not performed; such actions are referred to as hypothetical actions. The configuration may include criteria, rules, and/or conditions for RL model validation. The criteria may be associated with the determined quality of actions evaluated against one or more preconfigured performance threshold(s). For example, the criteria may be associated with determining a metric associated with a hypothetical action, determining an expected reward based on the metric, and/or evaluating and/or reporting the quality based on an absolute or relative threshold. The configuration may include a reporting configuration for RL model validation (e.g., resources for reporting, periodicity, model validation reporting format, etc.).
[0077] The WTRU may receive a first indication of a first configuration to be implemented for performing a first action related to communications on a network. The first action may be a reference action which includes one or more of applying a configuration associated with the first indication, performing a measurement or a reception using a first configuration determined by the first indication, triggering and/or performing a transmission using a second configuration determined by the first indication, suspending an ongoing or future transmission, entering a power saving mode, and/or setting, resetting, and/or modifying one or more of protocol parameters, timers, or counters using a third configuration determined by the first indication.
[0078] The WTRU may receive, from the network, a second indication of a second configuration to be implemented for performing a second action related to communications on the network. The second action, for example, may be an anticipated or hypothetical action associated with a NW-side RL model. Examples of anticipated or hypothetical action(s) may include hypothetical application of one or more DL beam indices selected by the RL model and/or hypothetical application of one or more sub-band indices and/or allocated power selected by the RL model. The first configuration and the second configuration may each include a different downlink (DL) beam configuration, different power saving modes, different protocol parameters, and/or different timers. The WTRU may perform a validation of the NW-side RL model by at least determining a second outcome based on a second performance metric associated with performance of the second action without actually implementing the second configuration to perform the second action related to the communications on the network, and/or determining a quality of the second action compared to a quality of the first action based on the first outcome and the second outcome. The WTRU may report the determined quality of the second action for validation of the anticipated or hypothetical action associated with the NW-side RL model.
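The validation flow above can be sketched for the DL beam case as follows. This is a minimal sketch under stated assumptions: the function name, the use of RSRP as the performance metric, the beam indices, and the measurement callback are all hypothetical and are not defined by the source.

```python
def validate_hypothetical_beam(first_beam, second_beam, measure_rsrp_dbm):
    """Apply only the first (reference) beam; measure both beams."""
    applied_beam = first_beam                        # the reference action is performed
    first_outcome = measure_rsrp_dbm(first_beam)     # outcome of the reference action
    second_outcome = measure_rsrp_dbm(second_beam)   # measured without switching beams
    return {
        "applied_beam": applied_beam,
        "first_outcome_dbm": first_outcome,
        "second_outcome_dbm": second_outcome,
        "quality_db": second_outcome - first_outcome,  # > 0: hypothetical beam looks better
    }


# Made-up RSRP values per beam index, standing in for real measurements.
fake_rsrp = {3: -95.0, 7: -90.0}
report = validate_hypothetical_beam(3, 7, fake_rsrp.__getitem__)
```

The key property is that the second configuration is never applied: the WTRU stays on the reference beam and only the measured outcome of the RL-selected beam enters the quality computation.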
[0079] In various embodiments described herein, the first configuration may be a first downlink (DL) beam configuration and the second configuration may be a second DL beam configuration. The first action may include the WTRU switching to the first DL beam to receive data transmissions. The second action may include the WTRU switching to the second DL beam provided by the NW-side RL model to receive the data transmissions. The second performance metric may include a measurement on the second DL beam.
[0080] In various embodiments described herein, the first configuration may be a first downlink control information (DCI) configuration indicating a first modulation and coding scheme (MCS). The second configuration may be a second DCI configuration indicating a second MCS. The first action may include the WTRU applying the first DCI configuration to receive DL transmissions. The second action may include the WTRU switching to apply the second DCI configuration to receive the DL transmissions. The second performance metric may include a measurement on the second DCI configuration.
[0081] In various embodiments described herein, the first configuration may be a first sub-band configuration. The second configuration may be a second sub-band configuration. The first action may include the WTRU applying the first sub-band configuration to receive DL transmissions. The second action may include the WTRU switching to apply the second sub-band configuration to receive the DL transmissions. The second performance metric may include a measurement on the second sub-band configuration.
[0082] In various embodiments described herein, the first configuration may be a first power level configuration for a first DL transmission. The second configuration may be a second power level configuration for a second DL transmission. The first action may include the WTRU applying the first power level configuration to receive DL transmissions. The second action may include the WTRU switching to apply the second power level configuration to receive the DL transmissions. The second performance metric may include a measurement on the second power level configuration.
[0083] The one or more performance metrics in each example may be determined based on one or more preconfigured criteria. Each of the one or more preconfigured criteria may include a hybrid automatic repeat request (HARQ) acknowledgement (ACK), an average reference signal received power (RSRP), a signal-to-interference-plus-noise ratio (SINR), and/or a channel quality indicator (CQI). Each of the one or more preconfigured criteria may include an average number of beam switches. Though reference may be made to a hypothetical or anticipated action or outcome, embodiments described herein may implement either similarly throughout.
[0084] The WTRU may perform the action (e.g., a reference action) associated with the first indication (e.g., apply the configuration associated with the first indication). The WTRU may perform a validation of the NW-side RL model by determining a second outcome (e.g., a hypothetical or anticipated outcome) of the second action associated with the second indication without actually performing the second action. For example, the validation of the NW-side RL model may be performed without applying the configuration associated with the second indication. Examples of anticipated or hypothetical action(s) may include hypothetical application of one or more DL beam indices selected by the RL model and hypothetical application of one or more sub-band indices and/or allocated power selected by the RL model. For example, the WTRU may determine whether an indication is a first indication or a second indication based on explicit information or implicit information (e.g., a separate field in DCI, MAC CE, order of indication, different resources/channels/messages, etc.).
[0085] The WTRU may determine the quality of the second action (e.g., hypothetical or anticipated action) based on one or more of the following. The WTRU may determine the quality of hypothetical action(s) based on measuring the performance metric (e.g., as a function of the use case) assuming hypothetical application of the action. The performance metric may include an average reference signal received power (RSRP) and/or signal-to-interference-plus-noise ratio (SINR) over a configured period, an average number of beam switches in a configured period, a number of beam failures over a configured period (with or without successful recovery), and/or an average throughput over a configured period. The WTRU may determine the quality of hypothetical action(s) based on a comparison of the performance metric associated with the hypothetical action(s) with a preconfigured performance criterion and/or condition. For example, the determination of the quality of hypothetical actions may include measuring a quantity (e.g., SINR, signal-to-noise ratio (SNR), and/or RSRP on one of the sub-bands) and comparing it with a threshold, measuring an average number of beam switches and comparing it with a configured metric threshold, measuring an average number of beam failures and comparing it with a configured metric threshold, measuring an average RSRP over a period and comparing it with a threshold, and/or measuring a reward over a period and comparing it with the configured expected reward. The WTRU may determine the quality of hypothetical action(s) based on an improvement/degradation of the performance metric or quality of the hypothetical action(s) relative to the performance metric or quality of the reference action(s). For example, the WTRU may compare the performance metrics associated with the hypothetical action(s) and reference action(s).
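Two of the quality determinations described above, comparison against a configured threshold and comparison against the reference action, can be sketched in a few lines. The function names, the choice of averaged RSRP as the metric, and the sample values are assumptions for illustration only.

```python
from statistics import mean


def quality_vs_threshold(hypothetical_samples, threshold):
    """Margin of the averaged hypothetical metric over a configured threshold."""
    return mean(hypothetical_samples) - threshold


def quality_vs_reference(hypothetical_samples, reference_samples):
    """Improvement (positive) or degradation (negative) relative to the reference action."""
    return mean(hypothetical_samples) - mean(reference_samples)


# Made-up RSRP samples (dBm) collected over one configured period.
hyp = [-90.0, -92.0, -88.0]
ref = [-93.0, -95.0, -91.0]
margin = quality_vs_threshold(hyp, threshold=-95.0)
delta = quality_vs_reference(hyp, ref)
```

The same two functions apply unchanged to other averaged metrics named in the text, such as throughput; for metrics where lower is better (e.g., beam failures), the sign convention would need to be inverted.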
[0086] The WTRU may report the determined quality of the second action(s) (e.g., hypothetical or anticipated actions), such as for NW-sided RL model validation. For example, reporting may be triggered when preconfigured conditions are satisfied, such as when the determined quality of the hypothetical action(s) exceeds or is less than the preconfigured threshold, or when a performance metric associated with the hypothetical action(s) is better or worse, by an offset, than a performance metric associated with the reference action(s) over n configured periods. For example, reporting may be transmitted periodically (e.g., every n configured periods over which the performance metrics may be calculated). For example, the report may include one or more indication formats. The one or more indication formats may include Boolean and/or multiple quantized levels for quality (e.g., represented in multiple levels depending on the difference between the measured metric and the configured metric threshold).
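The two indication formats mentioned above, a Boolean and multiple quantized levels, might be encoded as follows. This is a sketch only: the level numbering (middle level at the threshold), the step size, and the function names are assumptions, since the source does not define an encoding.

```python
def boolean_quality(metric, threshold):
    """Boolean format: True when the measured metric exceeds the configured threshold."""
    return metric > threshold


def quantized_quality(metric, threshold, step, n_levels):
    """Map the difference (metric - threshold) onto n_levels integer levels."""
    offset_levels = int((metric - threshold) // step)  # floor division
    level = n_levels // 2 + offset_levels               # middle level = at threshold
    return max(0, min(n_levels - 1, level))             # saturate at both ends


level = quantized_quality(-90.0, threshold=-95.0, step=2.0, n_levels=8)
```

With eight levels the report fits in three bits, while the Boolean format needs only one; which trade-off applies would depend on the configured reporting format.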
[0087] The proposed embodiment may offer a viable way to evaluate and/or validate offline-trained NW-sided RL models. This may be beneficial when there is a discrepancy between the state and/or action space used in the training environment and the state and/or action space in the real environment, or when some of the high-reward regions of the state space are missing from the training data.
[0088] WTRU-assisted NW-sided RL offline policy evaluation is introduced herein. Configuration of WTRU-based assistance of NW-sided RL model validation is introduced herein. The WTRU may be configured to receive a first indication and a second indication. The WTRU may perform a first action based on the first indication and a second action based on the second indication.
[0089] For example, the WTRU may perform a reference action upon receiving the first indication. For example, the reference action may be associated with one or more of applying the configuration associated with the first indication, performing a measurement and/or reception using a configuration at least in part determined by the first indication, triggering and/or performing a transmission using a configuration at least in part determined by the first indication, suspending an ongoing and/or future transmission, entering a power saving mode, setting/resetting/modifying one or more of the protocol parameters, timers, and/or counters using a configuration at least in part determined by the first indication, and the like.
[0090] For example, the WTRU may not act directly on the second indication but may be configured to monitor and/or measure the impacts assuming that the second action (e.g., the hypothetical action) associated with the second indication were performed. For example, the second action may be associated with one or more of applying the configuration associated with the second indication, performing a measurement and/or reception using a configuration at least in part determined by the second indication, triggering and/or performing a transmission using a configuration at least in part determined by the second indication, suspending an ongoing and/or future transmission, entering a power saving mode, setting/resetting/modifying one or more of the protocol parameters, timers, and/or counters using a configuration at least in part determined by the second indication, and the like.
[0091] Example realizations of the reference action and hypothetical action(s) are introduced herein. In a first example realization, the first indication may be an indication associated with a first DL beam, which is possibly determined based on a legacy algorithm, a supervised learning algorithm, and the like. A second indication may be associated with a second DL beam, which is possibly determined based on the RL model to be evaluated. The reference action may be the WTRU switching to the first DL beam to receive a data transmission. The hypothetical action may be the WTRU measuring one or more performance metrics assuming that the WTRU switches to the second DL beam.
[0092] In a second example realization, the first indication may be a first downlink control information (DCI) indicating a first modulation and coding scheme (MCS), a first sub-band, and/or a first power level for a first DL transmission, which is possibly determined based on a legacy algorithm and/or a supervised learning algorithm and the like. The second indication may be a second DCI indicating a second MCS, a second sub-band, and/or a second power level for a second transmission, possibly determined based on the RL model to be evaluated. The reference action may be the WTRU applying the first DCI to receive a DL data transmission. The hypothetical action may be the WTRU measuring one or more performance metrics assuming that the WTRU receives a DL transmission based on the second DCI.
[0093] The WTRU may be configured to determine the linkage between a first indication and a corresponding second indication. In other words, the WTRU may be configured to determine the reference action and the corresponding hypothetical action which is to be evaluated against the reference action. The linkage may be implicitly configured. For example, the linkage may be implicitly configured based on the relationship between resources, channels, and/or messages carrying the first indication and second indication. For example, the first and second indication may be inferred by order of inclusion in a message. The linkage may be explicitly configured, for example, using a separate field in DCI, a medium access control (MAC) control element (CE), a radio resource control (RRC) message, etc. For example, a specific information element (IE) may be configured for the second indication or hypothetical action. In one example, the first indication and second indication may each be associated with the same logical identity.
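One of the linkage options above, associating the first and second indication with the same logical identity, can be sketched as a simple pairing step. The dictionary field names (`logical_id`, `hypothetical`, `beam`) are hypothetical placeholders and do not correspond to any standardized DCI, MAC CE, or RRC field.

```python
def link_indications(indications):
    """Pair reference and hypothetical indications that share a logical identity."""
    by_id = {}
    for ind in indications:
        role = "hypothetical" if ind["hypothetical"] else "reference"
        by_id.setdefault(ind["logical_id"], {})[role] = ind
    # Keep only identities for which both roles were received.
    return {lid: roles for lid, roles in by_id.items() if len(roles) == 2}


received = [
    {"logical_id": 1, "hypothetical": False, "beam": 3},
    {"logical_id": 1, "hypothetical": True, "beam": 7},
    {"logical_id": 2, "hypothetical": False, "beam": 5},  # no linked hypothetical action
]
links = link_indications(received)
```

Here the explicit `hypothetical` flag stands in for the "separate field" option; an implicit scheme based on message order would replace the flag with the position of each indication in the message.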
[0094] Configuration of performance and quality metrics is discussed herein. The WTRU may be configured to measure the performance metric associated with hypothetical actions based on the second indication. Such a performance metric may be determined based on one or more preconfigured criteria, rule(s), and/or condition(s). The performance metric may be modeled as a reward metric. For example, hypothetical actions that lead to better performance (e.g., higher throughput, lower latency, higher power saving, etc.) may be assigned higher rewards than the actions that lead to degraded performance (e.g., lower throughput, higher latency, high power consumption, etc.). The WTRU may be configured to determine and/or report a performance metric within a preconfigured range. For example, the WTRU may be configured with a minimum value, a maximum value, and quantized values in between with a preconfigured step size.
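Reporting a metric within a preconfigured range with a fixed step size amounts to clamping and rounding. A minimal sketch, assuming round-to-nearest behavior (the source does not specify a rounding rule) and hypothetical parameter names:

```python
import math


def quantize_metric(value, minimum, maximum, step):
    """Clamp value to [minimum, maximum] and snap it to the nearest configured step."""
    clamped = max(minimum, min(maximum, value))
    n_steps = math.floor((clamped - minimum) / step + 0.5)  # round to nearest step
    return min(maximum, minimum + n_steps * step)


reported = quantize_metric(7.3, minimum=0.0, maximum=10.0, step=0.5)
```

Values outside the configured range saturate at the minimum or maximum, so the report always uses one of the preconfigured quantized values.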
[0095] The WTRU may be configured to measure the quality of hypothetical actions associated with the second indication. The quality metric may be derived based on a comparison of the performance metric associated with a hypothetical action against a preconfigured threshold. For example, the WTRU may be configured to monitor and/or determine if the hypothetical actions based on the second indication would lead to at least one performance metric better than a threshold. The quality metric may be derived based on a comparison of the performance metric of hypothetical actions against the performance metric of the reference action. For example, the WTRU may be configured to monitor and/or determine if the hypothetical actions based on second indications would be better than the reference actions associated with the first indication. The WTRU may be configured to determine and/or report a quality metric within a preconfigured range. For example, the WTRU may be configured with a minimum value, a maximum value, and quantized values in between with a preconfigured step size. For example, the WTRU may be configured with multiple quantized levels for quality (e.g., represented in multiple levels depending on the difference between the measured metric and the configured metric threshold). The WTRU may be configured to report a Boolean value as the quality metric. The Boolean value may indicate whether the hypothetical action is better or worse than the reference action. Alternatively, the Boolean value may indicate whether the hypothetical action is better or worse than the preconfigured threshold.
[0096] Resources for the RL model validation report are discussed herein. The WTRU may be configured with resources to report one or more of the following associated with RL model validation, which may include the performance metric associated with a hypothetical action, quality of a hypothetical action, and the like. The WTRU may be configured with resources to report one or more of the following associated with RL model validation, which may include the performance metric associated with a set of hypothetical action(s) in a preconfigured time window, quality of a set of hypothetical action(s) in a preconfigured time window, and the like. The WTRU may be configured to report an average performance metric and/or quality metric within the preconfigured window. The WTRU may be configured to report pairs of a hypothetical action and the associated performance and/or quality metric for more than one (e.g., all or a subset) of the hypothetical actions within the time window. For example, the subset of hypothetical actions may be selected based on preconfigured criteria or explicitly requested by the network.
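The window-based report contents described above (an average over the window plus per-action pairs, optionally restricted to a subset) could be assembled as in the sketch below. The report layout, the action labels, and the "top-k by metric" subset criterion are assumptions made for illustration; the source leaves the selection criteria to configuration.

```python
def build_validation_report(window, top_k=None):
    """Summarize (hypothetical_action, metric) pairs observed in one time window."""
    average_metric = sum(metric for _, metric in window) / len(window)
    pairs = sorted(window, key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        pairs = pairs[:top_k]  # assumed subset rule: keep the k best-scoring actions
    return {"average_metric": average_metric, "pairs": pairs}


# Made-up metrics for three hypothetical actions within one window.
window = [("beam3", 2.0), ("beam7", 6.0), ("beam1", 4.0)]
report = build_validation_report(window, top_k=2)
```

Passing `top_k=None` reports every pair in the window, corresponding to the "all" case in the text.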
[0097] The WTRU may be configured with resources for reporting RL model validation report periodically. Possibly the periodicity of reporting may be an integer multiple of the time window over which the performance and/or quality metric is calculated. For example, the WTRU may be configured with physical uplink control channel (PUCCH) resources to transmit RL model validation report. For example, the WTRU may be configured with semi-persistent physical uplink shared channel (PUSCH) resources to transmit RL model validation report. The WTRU may be configured to transmit such reports in one or more of Uplink Control Information (UCI), MAC Control Element (CE), etc. For example, the RL model validation report may be multiplexed with CSI reporting. For example, the feedback based on reference action may be transmitted in the first part of CSI report and the feedback based on hypothetical action may be transmitted in the second part of CSI report. For example, the RL model validation report may be transmitted in PUSCH resources. The RL model validation report may be transmitted in a higher layer message (e.g., in an RRC message). For example, the WTRU may be configured to transmit the validation report in a RRC message carrying beam level measurement report. For example, the WTRU may be configured to transmit the validation report in a uplink (UL) information transfer message. For example, the WTRU may be configured to transmit the validation report in a WTRU assistance information message.
[0098] The WTRU may be configured to transmit the RL model validation report based on request from the network. In one example, the WTRU may receive the request for transmission of RL model validation report in an aperiodic channel state information (CSI) request. In another example, the WTRU may receive the request for transmission of RL model validation report in a higher layer (e.g., RRC) request.
[0099] The WTRU may be configured with resources for reporting an RL model validation report when one or more preconfigured conditions are satisfied. For example, the WTRU may transmit a report when the performance metric and/or the quality of hypothetical action(s) exceeds and/or becomes lower than the preconfigured threshold. For example, the WTRU may transmit a report when the performance metric and/or the quality of hypothetical action(s) exceeds and/or becomes lower than that of the reference action(s) over n configured periods.
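One way to picture the condition-based trigger is a small tracker that fires once the hypothetical action has outperformed the reference action in each of the last n configured periods. This is only a sketch: the class name `ReportTrigger`, the optional `offset`, and the choice to require n consecutive periods are assumptions made for illustration, and only the "exceeds" direction is shown.

```python
from collections import deque


class ReportTrigger:
    """Hypothetical tracker for an event-triggered validation report."""

    def __init__(self, n_periods: int, offset: float = 0.0) -> None:
        self.offset = offset
        # Keep only the outcomes of the last n configured periods.
        self._history = deque(maxlen=n_periods)

    def update(self, hypothetical: float, reference: float) -> bool:
        """Record one period; return True when a report should be sent."""
        self._history.append(hypothetical > reference + self.offset)
        window_full = len(self._history) == self._history.maxlen
        return window_full and all(self._history)
```

A mirrored comparison (hypothetical below reference) would cover the "becomes lower than" case described in the paragraph above.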
[0100] Determination of quality of actions associated with model validation is discussed herein. A WTRU configured with model validation for NW-sided RL model may receive a first and second indication from the NW where the WTRU may determine whether the indication is first or second based on one or more of the following, which includes dynamically by receiving an indication in the DCI field of a physical downlink control channel (PDCCH) resource, dynamically or semi persistently by MAC CE, based on a configuration that indicates the order or pattern of first indication and second indication, and/or based on the resources or channels or messages that the indication is received through.
[0101] The WTRU receiving a first indication may perform an action and/or apply a configuration (e.g., reference action) associated with the first indication. A WTRU receiving a second indication may perform validation of the action and/or configuration (e.g., hypothetical action) associated with the second indication without actually performing the action or applying the configuration. For example, actions associated with the first or second indication may include one or more DL beam indices selected by the RL model, one or more sub-band indices and/or allocated power selected by the RL model, an MCS selected by the RL model, and/or a handover request determined by the RL model.
[0102] The WTRU may measure a performance metric for the hypothetical action associated with the second indication without performing the action, and possibly likewise for the reference action. The performance metrics may be determined and reported with every reception of the second indication. The performance metrics may be calculated and recorded with every reception of the second indication but reported over a configured period. This may enable the WTRU to determine the performance metric upon collecting measurements over a sequence of hypothetical actions. The measured performance metric may be determined based on the hypothetical action. For example, the hypothetical action may represent a DL beam index selected by the RL model. The WTRU may be configured to determine and/or measure one or more performance metrics associated with the selected beams. For example, the WTRU may estimate the beamforming gain of the DL beam index. For example, the performance metric may be the average RSRP and/or SINR associated with the selected DL beam indices measured over a configured period. For example, the performance metric may be the average number of beam switches measured over a configured period. For example, the performance metric may be the average number of beam failures measured over a configured period. For example, the hypothetical action may represent sub-band indices and/or allocated power selected by the RL model. The WTRU may be configured to measure the SINR and/or SNR for the sub-band indices over a configured period. For example, the hypothetical action may represent the MCS selected by the RL model, and the WTRU may compute the block error rate for the
current SINR and MCS. For example, the hypothetical action may represent a handover request determined by the RL model and the WTRU may measure the RSRP from the neighboring cell.
[0103] The WTRU may determine the quality of a hypothetical action, for example depending on the measured performance metric. The WTRU may determine the quality based on one or more of the following. The WTRU may determine the quality based on a comparison of a performance metric associated with hypothetical actions with a preconfigured performance criterion. For example, the WTRU may determine the quality by performing one of the following evaluations. The evaluations may include estimating a performance metric and comparing it to a threshold (e.g., estimated beamforming gain with another DL beam, or the number of beam switches over a configured period, or the average RSRP/SINR/SNR over a configured period). The evaluations may include measuring a performance metric and comparing it to a threshold (e.g., SINR/SNR/RSRP on one of the sub-bands, or RSRP from another cell). The evaluations may include computing a performance metric and comparing it to a threshold (e.g., computing the block error rate for the current SINR and MCS). The evaluations may include measuring an average RSRP and/or SINR over a configured period and comparing it with a configured metric threshold. The evaluations may include measuring an average number of beam switches and comparing it with a configured metric threshold. The evaluations may include measuring an average number of beam failures and comparing it with a configured metric threshold. The evaluations may include measuring an average RSRP over a period and comparing it with a threshold. The evaluations may include measuring a reward over a period and comparing it with the configured expected reward (e.g., average throughput, block error rate, etc.). The WTRU may determine the quality based on a comparison of hypothetical action(s) relative to reference action(s). The comparison may include comparing the performance metrics associated with the hypothetical action(s) and reference action(s).
The WTRU may measure the performance of the hypothetical action and reference action over a configured period. For example, the WTRU may measure the number of the hypothetical DL beam switches over a configured period and compare it against the number of the reference DL beam switches over the same period. For the hypothetical action, MCS selected by the RL model, the WTRU may re-encode the decoded bits for the previous reference action using the new MCS, decode under new SINR model and compare the block error rate.
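The beam-switch comparison in the preceding example can be sketched as follows. The function names are hypothetical, and each input is assumed to be the sequence of DL beam indices (hypothetical or reference) observed over the same configured period, one index per decision instant.

```python
from typing import Sequence


def count_beam_switches(beam_indices: Sequence[int]) -> int:
    """Count how often consecutive beam indices differ."""
    return sum(1 for prev, cur in zip(beam_indices, beam_indices[1:]) if prev != cur)


def beam_switch_saving(hypothetical: Sequence[int], reference: Sequence[int]) -> int:
    """Positive result: the hypothetical policy switched beams less often."""
    return count_beam_switches(reference) - count_beam_switches(hypothetical)
```

For example, a reference sequence `[1, 2, 1, 2]` contains three switches and a hypothetical sequence `[1, 1, 1, 2]` contains one, so the saving is two switches over the period.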
[0104] Reporting of WTRU assistance information of NW-sided RL model validation is discussed herein. The WTRU may be configured to report the quality information associated with the hypothetical actions for WTRU-sided RL-model validation. For example, the quality information may be indicated in multiple formats. The formats may be configured, for example, in a DCI field or MAC CE. A first format may be
Boolean, for example, a binary bit indicating if the hypothetical action performance exceeds the reference action performance, or a binary bit indicating if the hypothetical action performance exceeds a configured performance threshold. A second format may be multiple quantized quality levels, indicated in two or more bits. In one example, each quality level may represent a specific range of performance difference between the hypothetical action performance and the reference action performance. In another example, each quality level may represent a range of difference between the hypothetical action performance and the configured threshold. A third format may indicate the direct performance of the hypothetical action. The performance may be a quantized performance metric, for example, quantized RSRP/SINR/SNR, configured by the NW.
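The three indication formats can be pictured as different bit widths for the same report field. The sketch below is illustrative only: the enum `ReportFormat`, the function `encode_quality_report`, and the bit-string representation are assumptions and do not reflect any standardized UCI or MAC CE layout.

```python
from enum import Enum


class ReportFormat(Enum):
    """Hypothetical labels for the three formats described above."""
    BOOLEAN = 1          # single bit
    QUANTIZED_LEVEL = 2  # quality level, two or more bits
    DIRECT_METRIC = 3    # quantized performance metric (e.g., RSRP index)


def encode_quality_report(fmt: ReportFormat, value: int, num_bits: int = 1) -> str:
    """Return the report payload as a string of '0'/'1' characters."""
    if fmt is ReportFormat.BOOLEAN:
        return "1" if value else "0"
    if fmt is ReportFormat.QUANTIZED_LEVEL and num_bits < 2:
        raise ValueError("quantized levels use two or more bits")
    if not 0 <= value < (1 << num_bits):
        raise ValueError("value does not fit in the configured number of bits")
    # Zero-padded binary representation, most significant bit first.
    return format(value, f"0{num_bits}b")
```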
[0105] The WTRU may be preconfigured with resources to indicate the validation report of the WTRU-sided RL model. The validation report may include the quality information of the hypothetical action. The WTRU may be configured with reserved symbols on PUSCH for transmission of the validation report. For example, the reserved symbols may be configured on preconfigured time-frequency locations. The WTRU may be configured to report the quality information associated with the hypothetical actions in Uplink Control Information (UCI). The resources for the RL model validation report transmission may be preconfigured for a WTRU. The resources may be defined as PUSCH format or UCI format. The resources may be configured via RRC configuration. These resources may be activated and/or deactivated by MAC CE.
[0106] The WTRU may be configured to report the quality information based on one or more of the following triggers. The WTRU may be configured to report the quality information based on a time-event. The WTRU may be configured with time instances or specific slots to determine and report the quality information of the hypothetical actions. In another option, the WTRU may be configured to report the quality information of the hypothetical actions every N configured periods over which the performance metrics are calculated. The WTRU may be configured to report the quality information based on a measurement. The WTRU may report the quality information associated with the hypothetical actions based on one or more measurements meeting a certain configured threshold or criteria. For example, the configured threshold or criteria may be met when the performance of the hypothetical actions exceeds or is lower than a preconfigured threshold. In another example, the configured threshold or criteria may be met when the performance of the hypothetical action exceeds or is lower than the performance of the reference action. The WTRU may be configured to report the quality information based on a feedback resource selection, or feedback resource, or change thereof. The WTRU may be configured to report the quality
information based on a performance of an associated function. For example, the WTRU may determine the rate of HARQ-ACK associated with the hypothetical actions. If the performance drops below a threshold, the WTRU may report the quality of the hypothetical action.
[0107] WTRU procedures for RL offline policy validation are discussed herein. Configuration of WTRU-sided RL model evaluation is discussed herein. A WTRU equipped with an RL model, and capable of performing actions based on its WTRU-sided RL model, may report its capabilities specific to the supported RL model. The report may include the RL model identification (ID) and its associated information and configuration, for example, applicable conditions such as environment setup and constraints associated with the pre-trained RL model, configuration of state and/or action space, target policy, dedicated and/or supported features (e.g., beam management), and/or status of the model (e.g., validated, not validated, conditions under which the model was validated, restricted conditions for the model validation, etc.).
[0108] The WTRU may receive the configuration for the evaluation and validation of a supported RL model. The configuration may include one or more of a reporting resource configuration, a reporting periodicity, a quality information configuration, and/or performance metrics for evaluating hypothetical actions. The configuration may include a reporting resource configuration. The reporting resource configuration may include resources for reporting, a maximum reporting overhead, and/or a reporting granularity. The reporting granularity may include the number of separate transmissions required for the report (e.g., separate transmissions for each action or each subset of hypothetical actions). The WTRU may be configured to report the quality information based on a reporting periodicity. For example, the reporting may be periodic, semi-persistent, aperiodic, or triggered. The configuration may include a reporting omission configuration under which, for example, the WTRU omits reporting if the difference between the current report and a previous report for the same hypothetical action(s) is below a configured threshold.
The WTRU may be configured to report the quality information based on a quality information configuration, which may include one or more preconfigured conditions or a quality information reporting and/or indication format. The one or more preconfigured conditions may include one or more thresholds/offsets for comparing reference and hypothetical action outcomes or a number of configured periods for validating the preconfigured conditions. The quality information reporting and/or indication format may include a Boolean format, which indicates whether the conditions associated with validation are satisfied, or a quantized format (e.g., multiple levels/scores of the quality of hypothetical action(s)).
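The reporting omission rule mentioned in this paragraph (skip a report when it differs from the previous report for the same hypothetical action(s) by less than a configured threshold) can be sketched in a few lines. The function name and the convention that a missing previous report always results in transmission are assumptions for illustration.

```python
from typing import Optional


def should_omit_report(current: float, previous: Optional[float], threshold: float) -> bool:
    """Return True when the new report may be omitted.

    `previous` is the last value reported for the same hypothetical
    action(s), or None if nothing has been reported yet (assumption:
    the first report is always sent).
    """
    if previous is None:
        return False
    return abs(current - previous) < threshold
```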
[0109] The WTRU may be configured to report the quality information based on one or more performance metric(s) for evaluating hypothetical actions. For example, the one or more performance metric(s) for
evaluating hypothetical actions may be based on one or more of quantity-based (e.g., SINR/SNR/RSRP associated with the selected PMI by the RL model), reward-based (e.g., measuring a reward over a period and comparing it with the training reward), horizon (e.g., time-period for reward measurement, or number of repeated actions and repetition rate(s) or expected long-term or short-term reward limit), processing-time based (e.g., required time for evaluating hypothetical actions with respect to reference actions), complexity-based (e.g., required number of FLOPs), or performance criteria and/or conditions for comparison with performance metric(s) of hypothetical actions (e.g., such as hybrid automatic repeat request (HARQ) acknowledgement (ACK) or SINR/RSRP/CQI (e.g., associated with the selected PMI by the RL model)).
[0110] WTRU procedure for validating its RL model is discussed herein. WTRU determination of first indication and second indication for action generation is discussed herein. The WTRU may be configured to determine one or more parameters associated with a WTRU-sided RL model validation. The WTRU may determine a first indication and a second indication. The first indication may be based on non-AIML/legacy or a different RL model while the second indication may be based on the WTRU-sided RL model to be validated. The first indication may be associated with a reference action. The reference action may be applied by the WTRU, for example, to perform a transmission using a configuration associated with the first indication. The second indication may be associated with a hypothetical action. The hypothetical action represents the output of the RL model to be validated. The second indication may imply that the NW may provide feedback on the quality of the hypothetical action. For example, the reference action may represent a first precoding matrix indicator (PMI) selection by the WTRU. The first PMI may be used for beamforming in the next transmission.
The hypothetical action may represent a second PMI selection by the WTRU. The second selected PMI may be evaluated (e.g., only evaluated) by the NW. In another example, the reference action may represent a first MCS selected by the WTRU. The selected modulation and coding scheme (MCS) may be used in data transmission. The hypothetical action may represent a second PMI selected by the WTRU. The selected PMI may be evaluated by the NW.
[0111] The WTRU may differentiate and/or determine the first indication and second indication from the DCI scheduling the transmission. The first indication and second indication may be determined from the contents of the DCI (e.g., explicit indication). The DCI may have specific field that indicates whether the WTRU needs to determine a first indication or second indication or both. The first indication and second indication may be determined from a parameter in the DCI (e.g., implicit indication). The parameters of DCI may include one or more of radio network temporary identifier (RNTI) used to decode the DCI, CORESET or search-space of the PDCCH transmission including the DCI, aggregation level of the PDCCH/PSCCH
transmission of the DCI, timing of the DCI reception, and/or beam or transmission configuration indication (TCI) state used for the PDCCH/PSCCH transmission of the DCI.
[0112] Reception of the first transmission and second transmission based on the first indication and second indication is discussed herein. The WTRU may receive a first transmission and a second transmission. The first transmission may be associated with the network response to the WTRU first indication. The second transmission may be associated with the network response to the WTRU second indication. For example, the first transmission may represent data transmission using the first selected/indicated PMI, for example, associated with the reference action. The second transmission may represent reference signal (RS) transmission using the second selected PMI (e.g., only RS transmission using the second selected PMI), for example, associated with the hypothetical action, and/or the second transmission may represent feedback information about one or more performance aspects related to the second PMI. For example, the second transmission may represent an interference metric associated with the second PMI, e.g., resulting interference on other WTRUs served by the NW. The two transmissions may happen over the same channel (e.g., PDSCH/PDCCH) or over two different channels (e.g., the first transmission occurs over PDSCH while the second transmission occurs over PDCCH).
[0113] WTRU determination of the second transmission quality based on configured performance threshold is discussed herein. The WTRU may be configured to determine the quality of second transmission, for example, the transmission associated with the hypothetical action generated by the WTRU-sided RL model. The WTRU may be configured to measure a performance metric associated with the hypothetical action. For example, the WTRU may measure HARQ ACKs and/or NACKs rate or block error rate (BLER) associated with the hypothetical action. For example, the WTRU may perform one or more measurements associated with the hypothetical action. As an example, if the hypothetical action represents a PMI selected by the RL model and is applied on RS transmission by the NW, the measurements may include SINR and/or RSRP and/or CQI. The performance metric and/or measurements may be collected over a configured period.
[0114] The WTRU may determine the quality of actions by performing a comparison between the measured and/or determined performance metric associated with the hypothetical action and a preconfigured performance threshold/criteria/condition. For example, the WTRU may determine the quality of the selected PMI by comparing the measured SINR/SNR/RSRP with a configured SINR/SNR/RSRP threshold. The WTRU may determine the quality of hypothetical actions by comparing the hypothetical reward with a configured measured reward by the NW. The hypothetical reward may be the expected
training reward that the WTRU would have received if the hypothetical action had been taken in the training environment. For example, the WTRU may be configured with a reward associated with the RL-based selected PMI. The reward may be a function of interference and expected reception quality associated with the selected PMI. The WTRU may determine the quality by comparing the configured reward with the hypothetical reward.
[0115] WTRU determination of the quality of the second transmission (e.g., associated with the hypothetical action), based on the performance of the first transmission (e.g., associated with the reference action), is discussed herein. The WTRU may determine the quality of the hypothetical action by comparing it against the reference action. For example, the WTRU may determine the quality of the hypothetical PMI (e.g., the RL-based selected PMI) by comparing the measured SINR/SNR/RSRP against the measured SINR/SNR/RSRP associated with the reference PMI (e.g., the PMI used for data beamforming). The WTRU may determine and indicate the performance degradation or improvement of the hypothetical action relative to the reference action. For example, the WTRU may determine the RSRP/SNR/SINR difference between the RL-based selected PMI and the reference PMI.
[0116] WTRU determination of the quality of the second transmission (e.g., associated with the hypothetical action), based on the NW feedback (e.g., performance metric) associated with the hypothetical action, is discussed herein. The WTRU may determine the quality of hypothetical actions based on the NW feedback. For example, the NW may provide feedback (e.g., performance metric) associated with the WTRU indicated hypothetical action. For example, the WTRU may report the RL-selected PMI and the NW may respond by interference metric feedback that reflects the amount of interference caused by the WTRU RL-based selected PMI. The WTRU may determine and report the quality of the RL-based selected PMI based on the received interference metric, or a combination of the received interference metric and the measured RSRP associated with the RL-based selected PMI.
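A minimal sketch of how the NW interference feedback could be combined with a local RSRP measurement is given below. The function name, the use of simple thresholds, and the rule that both conditions must hold are assumptions for illustration; the text above leaves the exact combination open.

```python
from typing import Optional


def rl_pmi_quality(
    interference_metric: float,
    interference_limit: float,
    rsrp_dbm: Optional[float] = None,
    rsrp_threshold_dbm: Optional[float] = None,
) -> bool:
    """Return True if the RL-selected PMI is judged acceptable.

    With only the NW interference feedback available, quality depends on
    the interference metric alone. If an RSRP measurement and threshold
    are also given, both conditions must hold (assumed combination rule).
    """
    interference_ok = interference_metric <= interference_limit
    if rsrp_dbm is None or rsrp_threshold_dbm is None:
        return interference_ok
    return interference_ok and rsrp_dbm >= rsrp_threshold_dbm
```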
[0117] Indication of the WTRU RL validation report is discussed herein. The WTRU may be configured to report the quality information associated with the hypothetical actions for WTRU-sided RL-model validation. For example, the quality information may be indicated in multiple formats. The formats may be configured, e.g., in a DCI field or MAC CE. A first format may be Boolean, for example, a binary bit indicating if the hypothetical action performance exceeds the reference action performance, or a binary bit indicating if the hypothetical action performance exceeds a configured performance threshold. A second format may be multiple quantized quality levels, indicated in two or more bits. Each quality level may represent a specific range of performance difference between the hypothetical action performance and the
reference action performance, or each quality level may represent a range of difference between the hypothetical action performance and the configured threshold. A third format may indicate the direct performance of the hypothetical action. The performance may be a quantized performance metric, e.g., quantized RSRP/SINR/SNR, configured by the NW.
[0118] The WTRU may be preconfigured with resources to indicate the validation report of the WTRU-sided RL model. The validation report may include the quality information of the hypothetical action. The WTRU may be configured with reserved symbols on PUSCH for transmission of the validation report. For example, the reserved symbols may be configured on preconfigured time-frequency locations. The WTRU may be configured to report the quality information associated with the hypothetical actions in Uplink Control Information (UCI). The resources for the RL model validation report transmission may be preconfigured for a WTRU. The resources may be defined as PUSCH format or UCI format. The resources may be configured via RRC configuration. These resources may be activated and/or deactivated by MAC CE.
[0119] The WTRU may be configured to report the quality information based on one or more of the following triggers. The WTRU may be configured to report the quality information based on a time-event trigger. The WTRU may be configured with time instances or specific slots to determine and report the quality information of the hypothetical actions. In another option, the WTRU may be configured to report the quality information of the hypothetical actions every N configured periods over which the performance metrics are calculated. The WTRU may be configured to report the quality information based on a measurement trigger. The WTRU may report the quality information associated with the hypothetical actions based on one or more measurements meeting a certain configured threshold or criteria. For example, the configured threshold or criteria may be met when the performance of the hypothetical actions exceeds or is lower than a preconfigured threshold. In another example, the configured threshold or criteria may be met when the performance of the hypothetical action exceeds or is lower than the performance of the reference action. The WTRU may be configured to report the quality information based on a feedback resource selection, or feedback resource, or change thereof. The WTRU may be configured to report the quality information based on a performance of an associated function. For example, the WTRU may determine the rate of HARQ-ACK associated with the hypothetical actions. If the performance drops below a threshold, the WTRU may report the quality of the hypothetical action.
[0120] WTRU-assisted NW-sided RL offline policy evaluation is discussed herein. A WTRU may receive a configuration for validating the NW-sided RL model. The configuration may include a configuration of a
validation mode. In the validation mode, actions selected/determined by the RL model may not be performed but only indicated; such actions are referred to as hypothetical actions. The configuration may include criteria/rules/conditions for RL model validation. The criteria may be associated with a determined quality of actions evaluated against one or more preconfigured performance threshold(s). For example, the WTRU may determine a metric associated with a hypothetical action. The WTRU may determine an expected reward based on the metric and evaluate and/or report the quality based on an absolute or relative threshold. The configuration may include a reporting configuration for RL model validation (e.g., resources for reporting, periodicity, model validation reporting format, etc.).
[0121] The WTRU may receive a first indication and a second indication. The WTRU may perform the action (e.g., reference action) and/or apply the configuration associated with the first indication. The WTRU may perform validation of the RL model by determining the hypothetical outcome of the action associated with the second indication without actually performing the action (e.g., hypothetical action) and/or without applying the configuration associated with the second indication. Examples of hypothetical action(s) may include a hypothetical application of one or more DL beam indices selected by the RL model, a hypothetical application of one or more sub-band indices, and/or allocated power selected by the RL model. For example, the WTRU may determine whether an indication is a first indication or a second indication based on explicit information or implicit information (e.g., a separate field in DCI, MAC CE, order of indication, different resources/channels/messages, etc.).
[0122] The WTRU may determine the quality of hypothetical action(s) based on one or more of the following steps. The one or more steps may include measuring the performance metric (e.g., a function of the use case) assuming hypothetical application of the action (e.g., an average RSRP and/or SINR over a configured period, an average number of beam switches in a configured period, a number of beam failures over a configured period with or without successful recovery, and/or an average throughput over a configured period). The WTRU may determine the quality of the hypothetical action(s) based on a comparison of the performance metric associated with the hypothetical action(s) with preconfigured performance criteria and/or a condition. For example, the preconfigured performance criteria and/or condition may include measuring a quantity and comparing it with a threshold (e.g., SINR/SNR/RSRP on one of the sub-bands). The preconfigured performance criteria and/or condition may include measuring an average number of beam switches and comparing it with a configured metric threshold. The preconfigured performance criteria and/or condition may include measuring an average number of beam failures and comparing it with a configured metric threshold. The preconfigured performance criteria and/or condition may include
measuring an average RSRP over a period and comparing it with a threshold. The preconfigured performance criteria and/or condition may include measuring a reward over a period and comparing it with the configured expected reward. The WTRU may determine the quality of the hypothetical actions based on improvement and/or degradation of the performance metric or quality of the hypothetical action(s) relative to the performance metric or quality of the reference action(s). For example, the WTRU may determine the quality of the hypothetical actions based on comparing the performance metrics associated with the hypothetical action(s) and reference action(s).
[0123] The WTRU may report the determined quality of hypothetical action(s) (e.g., for NW-sided RL model validation). For example, reporting may be triggered when preconfigured conditions are satisfied. Reporting may be triggered if the determined quality of hypothetical action(s) exceeds or is less than the preconfigured threshold, or if a performance metric associated with hypothetical action(s) is better or worse, by an offset, than a performance metric associated with reference action(s) over n configured periods. For example, the report may be transmitted periodically (e.g., every n configured periods over which the performance metrics are calculated). For example, the report may include one or more indication formats, such as Boolean or multiple quantized levels for quality (e.g., represented in multiple levels depending on the difference between the measured metric and the configured metric threshold).
[0124] FIG. 3 illustrates a flow diagram 300 comprising an NW-sided RL model 302 used for beam selection with the aim of reducing latency and/or measurement overhead. The state space 304 may be defined with RS or CSI measurements while the action space 306 may include one or more selected beam indices. The goal of the RL model is to select the beams that maximize the average RSRP while minimizing the number of beam switches over a configured period.
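The stated goal (maximize average RSRP while minimizing beam switches over a configured period) could be expressed as a scalar reward in several ways. The sketch below assumes one possible scalarization, a weighted difference, which is not specified in the text; the function name and the `switch_penalty_db` weight are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def beam_selection_reward(
    rsrp_dbm: Sequence[float],
    beam_indices: Sequence[int],
    switch_penalty_db: float = 1.0,
) -> float:
    """Average RSRP over the period minus a penalty per beam switch.

    `rsrp_dbm[i]` is the RSRP measured on `beam_indices[i]`; both cover
    the same configured period. The linear penalty is an assumption.
    """
    if not rsrp_dbm or len(rsrp_dbm) != len(beam_indices):
        raise ValueError("inputs must be non-empty and of equal length")
    switches = sum(1 for a, b in zip(beam_indices, beam_indices[1:]) if a != b)
    return fmean(rsrp_dbm) - switch_penalty_db * switches
```

A larger `switch_penalty_db` makes the sketch favor stable beam choices over small RSRP gains.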
[0125] WTRU procedures for off-policy RL model evaluation/validation are discussed herein. A WTRU may transmit a capability associated with an RL model. The capability may include an RL model ID (e.g., which implicitly indicates the configuration of state and/or action space or target policy), a support of a feature (e.g., beam management), and/or a status of the RL model (e.g., validated or not, time of last validation, area of last validation, etc.).
[0126] The WTRU may receive a configuration for RL model validation. For example, the WTRU may receive a resource configuration for reporting RL model validation (e.g., resources for reporting, periodicity, model validation reporting format, etc.). The WTRU may determine a first indication and a second indication. The WTRU may determine the first indication based on non-AIML/legacy methods or a different RL model. The first indication may be associated with a reference action. The WTRU may determine the second indication based on an output of the RL model being validated. The second indication may be associated with a hypothetical action, for example, indicating that the network provides feedback on the quality of the hypothetical action (e.g., the PMI selected by the RL model to satisfy an average overhead requirement while achieving a target performance).
[0127] The WTRU may transmit the first indication and the second indication. The WTRU may differentiate the first indication from the second indication based on explicit information or implicit means (e.g., a separate field in DCI, MAC CE, order of indication, different resources/channels/messages, etc.).
[0128] The WTRU may receive a first transmission based on the first indication and a second transmission based on the second indication.
[0129] The WTRU may determine the quality of the second transmission (e.g., reception associated with the RL-selected PMI) based on one or more of the following steps. The WTRU may determine the quality of the second transmission by measuring the performance metric (e.g., a function of the use case) assuming the hypothetical action. The performance metric may include a HARQ ACK and/or an SINR/RSRP/CQI (e.g., associated with the PMI selected by the RL model). The WTRU may determine the quality of the second transmission based on a comparison of the performance metric associated with the hypothetical action(s) with preconfigured performance criteria or a preconfigured condition. For example, the WTRU may determine the quality of the second transmission by comparing a measured quantity with a threshold (e.g., SINR/SNR/RSRP associated with the PMI selected by the RL model). For example, the WTRU may determine the quality of the second transmission by comparing a reward measured over a period with the training reward. The WTRU may determine the quality of the second transmission by an improvement and/or degradation of the hypothetical action(s) relative to the reference action(s) (e.g., comparing the performance metrics, for example, RSRP and/or SINR, associated with the hypothetical action(s) and reference action(s)). The WTRU may determine the quality of the second transmission based on NW feedback associated with the hypothetical action, for example, a multiuser interference metric associated with the PMI selected by the RL model.
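The comparison of a measured reward with the training reward can be sketched as follows. This is an illustrative assumption, not a procedure stated in the description: the function name `reward_meets_training_level` and the `tolerance` margin are hypothetical, and the measured reward is assumed to be a simple average of per-step rewards over the period.

```python
from typing import Sequence


def reward_meets_training_level(measured_rewards: Sequence[float],
                                training_reward: float,
                                tolerance: float = 0.0) -> bool:
    """Compare the average reward measured over a period with the reward
    observed during training, allowing a hypothetical tolerance margin."""
    if not measured_rewards:
        raise ValueError("at least one reward sample is required")
    average = sum(measured_rewards) / len(measured_rewards)
    return average >= training_reward - tolerance
```

A result of False would suggest that the model performs below its training-time level under current conditions, which is one possible input to the quality determination.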
[0130] The WTRU may transmit the quality information associated with hypothetical action(s) for WTRU-sided RL model validation, for example, when preconfigured conditions are satisfied. For example, the WTRU may transmit the quality information if the quality of the hypothetical action(s) exceeds or is lower than the preconfigured threshold, or is better or worse, by a configured offset, than that of the reference action(s) over n configured periods. The WTRU may transmit the quality information periodically (e.g., every n configured periods over which the performance metrics are calculated). The format of the quality information indication may be Boolean, indicating whether the conditions associated with validation are satisfied. The format of the quality information indication may include multiple quantized levels for quality. The quantized levels may depend on the difference between the measured metric and the configured metric threshold.
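The quantized reporting format can be illustrated with a short sketch that maps the difference between the measured metric and the configured threshold to a small integer level. The uniform step size, the number of levels, and the name `quantize_quality` are assumptions chosen for illustration; the description does not define a particular quantization.

```python
def quantize_quality(measured: float, threshold: float,
                     step: float = 2.0, num_levels: int = 4) -> int:
    """Map (measured - threshold) to a level in 0..num_levels-1.

    Level 0 means the metric is below the threshold; higher levels mean a
    larger margin above it, in hypothetical uniform steps of `step`.
    """
    diff = measured - threshold
    if diff < 0:
        return 0
    level = 1 + int(diff // step)
    return min(level, num_levels - 1)  # saturate at the top level
```

Under these assumptions, the Boolean format corresponds to reporting whether the level is greater than zero.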
[0131] The WTRU may receive the indication from gNB on RL model validation, for example, for MU transmissions.
[0132] FIG. 4 illustrates a flow diagram 400 comprising WTRU-sided RL model 402 used for PMI selection with the aim of reducing the average overhead while achieving a target performance. The state space 404 may be defined with power, interference, and CSI measurements while the action space 406 may include one or more selected PMI. The goal of the RL model may be to select the PMI that maximizes the RSRP under average overhead constraints.
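The stated objective (select the PMI that maximizes RSRP under an average overhead constraint) can be made concrete with a simple greedy rule. This is not an RL model and is not the method of the description; it is a hand-written stand-in that shows the constraint. The name `select_pmi`, the candidate table layout, and the overhead units are all hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def select_pmi(candidates: Dict[int, Tuple[float, float]],
               past_overhead: Sequence[float],
               budget: float) -> Optional[int]:
    """Pick the PMI with the highest predicted RSRP whose overhead keeps the
    running average overhead within `budget`.

    candidates maps PMI -> (predicted RSRP in dBm, overhead cost).
    Returns None if no candidate satisfies the constraint.
    """
    best: Optional[int] = None
    best_rsrp = float("-inf")
    for pmi, (rsrp, overhead) in candidates.items():
        # Average overhead if this PMI were selected in the current step.
        avg = (sum(past_overhead) + overhead) / (len(past_overhead) + 1)
        if avg <= budget and rsrp > best_rsrp:
            best, best_rsrp = pmi, rsrp
    return best
```

With candidates `{0: (-85.0, 1.0), 1: (-80.0, 4.0), 2: (-82.0, 2.0)}`, past overheads `[2.0, 2.0]`, and a budget of 2.0, PMI 1 is excluded because it would raise the average overhead above the budget, and PMI 2 is selected.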
Claims
1. A wireless transmit/receive unit (WTRU) comprising a processor and a memory, wherein the processor and memory are configured to: receive a first indication of a first configuration to be implemented for performing a first action related to communications on a network; receive, from the network, a second indication of a second configuration to be implemented for performing a second action related to communications on the network, wherein the second action is an anticipated action associated with a network-side (NW-side) reinforcement learning (RL) model; perform the first action related to the communications on the network using the first configuration and determine a first outcome based on a first performance metric associated with the first action; perform a validation of the NW-side RL model by being configured to: determine a second outcome based on a second performance metric associated with performance of the second action without actually implementing the second configuration to perform the second action related to the communications on the network; and determine a quality of the second action compared to a quality of the first action based on the first outcome and the second outcome; and report the determined quality of the second action for validation of the anticipated action associated with the NW-side RL model.
2. The WTRU of claim 1, wherein the first action is a reference action comprising one or more of: applying a configuration associated with the first indication; performing a measurement or a reception using a first configuration determined by the first indication; triggering or performing a transmission using a second configuration determined by the first indication; suspending an ongoing or future transmission; entering a power saving mode; setting, resetting, or modifying one or more of protocol parameters, timers, or counters using a third configuration determined by the first indication.
3. The WTRU of claim 1, wherein the first configuration and the second configuration each comprises a different downlink (DL) beam configuration, different power saving modes, different protocol parameters, or different timers.
4. The WTRU of claim 1, wherein the first configuration is a first downlink (DL) beam configuration and the second configuration is a second DL beam configuration, wherein the first action comprises the WTRU switching to the first DL beam to receive data transmissions, wherein the second action comprises the WTRU switching to the second DL beam provided by the NW-side RL model to receive the data transmissions, and wherein the second performance metric comprises a measurement on the second DL beam.
5. The WTRU of claim 1, wherein the first configuration is a first downlink control information (DCI) configuration indicating a first modulation and coding scheme (MCS), wherein the second configuration is a second DCI configuration indicating a second MCS, wherein the first action comprises the WTRU applying the first DCI configuration to receive DL transmissions, and wherein the second action comprises the WTRU switching to apply the second DCI configuration to receive the DL transmissions, and wherein the second performance metric comprises a measurement on the second DCI configuration.
6. The WTRU of claim 1, wherein the first configuration is a first sub-band configuration, wherein the second configuration is a second sub-band configuration, wherein the first action comprises the WTRU applying the first sub-band configuration to receive DL transmissions, and wherein the second action comprises the WTRU switching to apply the second sub-band configuration to receive the DL transmissions, and wherein the second performance metric comprises a measurement on the second sub-band configuration.
7. The WTRU of claim 1, wherein the first configuration is a first power level configuration for a first DL transmission, wherein the second configuration is a second power level configuration for a second DL transmission, wherein the first action comprises the WTRU applying the first power level configuration to receive DL transmissions, and wherein the second action comprises the WTRU switching to apply the second power level configuration to receive the DL transmissions, and wherein the second performance metric comprises a measurement on the second power level configuration.
8. The WTRU of claim 1, wherein the first performance metric and the second performance metric are determined based on one or more preconfigured criteria.
9. The WTRU of claim 8, wherein each of the one or more preconfigured criteria comprises a hybrid automatic repeat request (HARQ) acknowledgement (ACK), an average signal received power (RSRP), a signal-to-interference-plus-noise ratio (SINR), or a channel quality indicator (CQI).
10. The WTRU of claim 8, wherein each of the one or more preconfigured criteria comprises an average number of beam switches.
11. A method comprising: receiving a first indication of a first configuration to be implemented for performing a first action related to communications on a network; receiving, from the network, a second indication of a second configuration to be implemented for performing a second action related to communications on the network, wherein the second action is an anticipated action associated with a network-side (NW-side) reinforcement learning (RL) model; performing the first action related to the communications on the network using the first configuration and determining a first outcome based on a first performance metric associated with the first action; performing a validation of the NW-side RL model by being configured to: determine a second outcome based on a second performance metric associated with performance of the second action without actually implementing the second configuration to perform the second action related to the communications on the network; and determine a quality of the second action compared to a quality of the first action based on the first outcome and the second outcome; and reporting the determined quality of the second action for validation of the anticipated action associated with the NW-side RL model.
12. The method of claim 1, wherein the first action is a reference action comprising one or more of: applying a configuration associated with the first indication; performing a measurement or a reception using a first configuration determined by the first indication; triggering or performing a transmission using a second configuration determined by the first indication; suspending an ongoing or future transmission; entering a power saving mode; setting, resetting, or modifying one or more of protocol parameters, timers, or counters using a third configuration determined by the first indication.
13. The method of claim 11, wherein the first configuration and the second configuration each comprises a different downlink (DL) beam configuration, different power saving modes, different protocol parameters, or different timers.
14. The method of claim 11, wherein the first configuration is a first downlink (DL) beam configuration and the second configuration is a second DL beam configuration, wherein the first action comprises the WTRU switching to the first DL beam to receive data transmissions, wherein the second action comprises the WTRU switching to the second DL beam provided by the NW-side RL model to receive the data transmissions, and wherein the second performance metric comprises a measurement on the second DL beam.
15. The method of claim 11, wherein the first configuration is a first downlink control information (DCI) configuration indicating a first modulation and coding scheme (MCS), wherein the second configuration is a second DCI configuration indicating a second MCS, wherein the first action comprises the WTRU applying the first DCI configuration to receive DL transmissions, and wherein the second action comprises the WTRU switching to apply the second DCI configuration to receive the DL transmissions, and wherein the second performance metric comprises a measurement on the second DCI configuration.
16. The method of claim 11, wherein the first configuration is a first sub-band configuration, wherein the second configuration is a second sub-band configuration, wherein the first action comprises the WTRU applying the first sub-band configuration to receive DL transmissions, and wherein the second action comprises the WTRU switching to apply the second sub-band configuration to receive the DL transmissions, and wherein the second performance metric comprises a measurement on the second sub-band configuration.
17. The method of claim 11, wherein the first configuration is a first power level configuration for a first DL transmission, wherein the second configuration is a second power level configuration for a second DL transmission, wherein the first action comprises the WTRU applying the first power level configuration to receive DL transmissions, and wherein the second action comprises the WTRU switching to apply the second power level configuration to receive the DL transmissions, and wherein the second performance metric comprises a measurement on the second power level configuration.
18. The method of claim 11, wherein the first performance metric and the second performance metric are determined based on one or more preconfigured criteria.
19. The method of claim 18, wherein each of the one or more preconfigured criteria comprises a hybrid automatic repeat request (HARQ) acknowledgement (ACK), an average signal received power (RSRP), a signal-to-interference-plus-noise ratio (SINR), or a channel quality indicator (CQI).
20. The method of claim 18, wherein each of the one or more preconfigured criteria comprises an average number of beam switches.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463572521P | 2024-04-01 | 2024-04-01 | |
| US63/572,521 | 2024-04-01 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025212470A1 (en) | 2025-10-09 |
Family
ID=95446499
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2025/022230 Pending WO2025212470A1 (en) | 2024-04-01 | 2025-03-31 | Methods for offline policy validation in reinforcement learning |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025212470A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2023146749A1 (en) * | 2022-01-26 | 2023-08-03 | Qualcomm Incorporated | Machine learning model validation with verification data |
| WO2024030410A1 (en) * | 2022-08-01 | 2024-02-08 | Interdigital Patent Holdings, Inc. | Methods for online training for devices performing ai/ml based csi feedback |
| WO2024030604A1 (en) * | 2022-08-05 | 2024-02-08 | Interdigital Patent Holdings, Inc. | Validation of artificial intelligence (ai)/machine learning (ml) in beam management and hierarchical beam prediction |
Non-Patent Citations (1)
| Title |
|---|
| "3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Study on Artificial Intelligence (AI)/Machine Learning (ML) for NR air interface (Release 18)", no. V18.0.0, 16 January 2024 (2024-01-16), pages 1 - 187, XP052576826, Retrieved from the Internet <URL:https://ftp.3gpp.org/Specs/archive/38_series/38.843/38843-i00.zip 38843-i00.docx> [retrieved on 20240116] * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 25719586; Country of ref document: EP; Kind code of ref document: A1 |