WO2025172598A1 - First node, second node, communications system and methods performed thereby

Info

Publication number: WO2025172598A1
Application number: PCT/EP2025/054150
Authority: WO
Inventors: Miguel Angel MUÑOZ DE LA TORRE ALONSO; Antonio INIESTA GONZALEZ; Danesh DAROUI; Ulf Mattsson; Magnus HALLENSTÅL
Original assignee: Telefonaktiebolaget LM Ericsson AB
Current assignee: Telefonaktiebolaget LM Ericsson AB
Priority date: 2024-02-16
Filing date: 2025-02-17
Publication date: 2025-08-21
Anticipated expiration: 2026-08-16

Abstract

A computer-implemented method, performed by a first node, for handling a Quality of Experience, QoE, aimed to be achieved, the first node operating in a communications system, the method comprising determining, using a reinforcement learning procedure of machine learning, one or more actions to be applied based on the QoE aimed to be achieved, for the at least one of the one or more services and the one or more slices, wherein the determining is based on information received from a second node operating in the communications system, and initiating application of the determined one or more actions. In some embodiments the one or more actions are to be applied on at least one of: one or more services, one or more slices, at least a first subset of one or more devices operating in the communications system, the one or more actions to be applied comprise updating one or more policies, the one or more actions comprise using one or more Quality of Service, QoS, parameters, and/or the at least one of the one or more services and the one or more slices are for data communication to be provided to the first subset, or at least a second subset, of the one or more devices.

Description

FIRST NODE, SECOND NODE, COMMUNICATIONS SYSTEM AND METHODS PERFORMED THEREBY

TECHNICAL FIELD

The present disclosure relates generally to a first node and methods performed thereby. The present disclosure also relates generally to a second node, and methods performed thereby. The present disclosure also relates generally to a communications system, and methods performed thereby.

BACKGROUND

Computer systems in a communications network or communications system may comprise one or more nodes. A node may comprise a processing circuitry which, together with computer program code may perform different functions and actions, a memory, a receiving port, and a sending port. A node may be, for example, a server. Nodes may perform their functions entirely on the cloud.

The communications system may cover a geographical area which may be divided into cell areas, each cell area being served by a type of node, a network node in the Radio Access Network (RAN), radio network node or Transmission Point (TP), for example, an access node such as a Base Station (BS), e.g., a Radio Base Station (RBS), which sometimes may be referred to as e.g., gNB, evolved Node B (“eNB”), “eNodeB”, “NodeB”, “B node”, or Base Transceiver Station (BTS), depending on the technology and terminology used. The base stations may be of different classes such as e.g., Wide Area Base Stations, Medium Range Base Stations, Local Area Base Stations, and Home Base Stations, based on transmission power and thereby also cell size. A cell may be understood to be the geographical area where radio coverage may be provided by the base station at a base station site. One base station, situated on the base station site, may serve one or several cells. Further, each base station may support one or several communication technologies. The telecommunications network may also comprise network nodes which may serve receiving nodes, such as user equipments, with serving beams.

The standardization organization Third Generation Partnership Project (3GPP) is currently in the process of specifying a New Radio Interface called Next Generation Radio or New Radio (NR) or 5G-Universal Terrestrial Radio Access (UTRA), as well as a Fifth Generation (5G) Packet Core Network, which may be referred to as 5G Core Network, abbreviated as 5GC.

Figure 1 is a schematic diagram depicting a particular example of a 5G reference architecture of a policy and charging control framework, as defined by 3GPP, which may be used as a reference for the present disclosure. An Application Function (AF) 1 may provide a service in the communications system and may interact with the 3GPP Core Network through a Network Exposure Function (NEF) 2. The AF 1 may allow external parties to use the Exposure Application Programming Interfaces (APIs) offered by the network operator. In case the AF 1 is trusted, e.g., internal to the network operator, the AF 1 may interact with the 3GPP Core Network directly, with no NEF 2 involved. The NEF 2 may support different functionality. Specifically, the NEF 2 may support different Exposure APIs, such as, for example, a NEF Application Programming Interface (API) for Packet Flow Description (PFD) Management. The NEF 2 may therefore be understood in such examples to comprise a PFD Function (PFDF). Management of Packet Flow Descriptions (PFDs) may be understood to refer to a capability to create, update or remove PFDs in the NEF 2 (PFDF), and the distribution from the NEF 2 (PFDF) to a Session Management Function (SMF) 3 and finally to a User Plane function (UPF) 4. This feature may be used when the UPF 4 may be configured to detect a particular application provided by an Application Service Provider (ASP). The SMF 3 may support different functionalities, e.g., the SMF 3 may receive Policy and Charging Control (PCC) rules from the Policy Control Function (PCF) 5 and may configure the UPF 4 accordingly. The UPF 4 may support handling of user plane traffic, including packet inspection, packet routing and forwarding, traffic usage reporting, and Quality of Service (QoS) handling, e.g., for user plane, e.g., UL/DL rate enforcement. The User Plane may receive PFDs from the AF 1 through the NEF 2 and SMF 3. 
The SMF 3 may receive the PFD from the NEF 2, and convert it to applications and filters in Packet Detection Rules (PDRs) to be sent to the UPF 4. A Unified Data Repository (UDR) 6 may store data, grouped into distinct collections of subscription-related information: subscription data, policy data, structured data for exposure, and application data. Particularly relevant for this disclosure, stored data may comprise subscription policy data to be used by the PCF 5. The PCF 5 may support a unified policy framework to govern the network behavior. Specifically, the PCF 5 may provide PCC rules to a Policy and Charging Enforcement Function (PCEF), that is, the SMF 3 and/or the UPF 4 that may enforce policy and charging decisions according to provisioned PCC rules. A Charging Function (CHF) 7 may support charging related functionality, specifically online and offline charging. The PCF 5 may provide policy rules to a User Equipment (UE) through an Access and Mobility Function (AMF) 8. The AMF 8 may manage access of the UE, for example, when the UE may be connected through different access networks, as well as mobility aspects of the UE. A Network Data Analytics Function (NWDAF) 9 may be understood to represent an operator managed network analytics logical function. The NWDAF 9 may be part of the 5GC architecture and may use the mechanisms and interfaces specified for 5GC and Operations, Administration and Maintenance (OAM).
The NWDAF 9 may interact with different entities for different purposes: a) data collection based on event subscription, provided by AMF 8, SMF 3, PCF 5, Unified Data Management Function (UDM), AF 1, directly or via NEF 2, and OAM, b) retrieval of information from data repositories, e.g., UDR 6 via UDM for subscriber-related information, c) retrieval of information about Network Functions (NFs), e.g., Network Repository Function (NRF) for NF-related information, and Network Slice Selection Function (NSSF) for slice-related information, and d) on demand provision of analytics to consumers. Each of the UDR 6, the NEF 2, the NWDAF 9, the AF 1, the PCF 5, the CHF 7, the AMF 8, the SMF 3 and the UPF 4 may have an interface through which they may be accessed, which as depicted in the Figure, may be, respectively: Nudr 10, Nnef 11, Nnwdaf 12, Naf 13, Npcf 14, Nchf 15, Namf 16, Nsmf 17 and N4 18.

Machine Learning

Machine learning (ML) may be understood as the study of computer algorithms that may improve automatically through experience. It is seen as a part of Artificial Intelligence (AI). ML algorithms may build a model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so. ML algorithms may be used in a wide variety of applications, such as email filtering and computer vision, where it may be difficult or unfeasible to develop conventional algorithms to perform the needed tasks.

There may be basically three types of ML Algorithms: Supervised Learning, Unsupervised Learning, and Reinforcement Learning (RL).

Supervised Learning algorithms may comprise a target/outcome variable, or dependent variable, which may have to be predicted from a given set of predictors, that is, independent variables. Using this set of variables, a function may be generated that may map inputs to desired outputs. The training process may continue until the model may achieve a desired level of accuracy on the training data. Once an ML model may have been trained, an inference process may begin, whereby new data may be run through the ML model to calculate an output. Examples of Supervised Learning may be Regression, Decision Tree, Random Forest, KNN, Logistic Regression etc.
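As a minimal illustrative sketch of the training and inference phases described above, the following fits a one-variable linear regression by ordinary least squares and then runs new data through the fitted model. The data values and the names fit_line and predict are hypothetical and chosen only for illustration; they are not taken from the disclosure.

```python
from statistics import mean


def fit_line(xs, ys):
    """Training: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Inference: run a new input through the trained model."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: predictor = offered load, target = latency in ms.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [12.0, 14.0, 16.0, 18.0]

model = fit_line(train_x, train_y)  # training phase
estimate = predict(model, 5.0)      # inference phase on unseen input
```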

In Unsupervised Learning algorithms, there may be no target or outcome variable to predict/estimate. It may be used for clustering a population into different groups, which may be widely used for segmenting customers in different groups for specific intervention. Examples of Unsupervised Learning may be K-means, mean-shift clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Expectation-Maximization (EM) Clustering using Gaussian Mixture Models (GMM), Agglomerative Hierarchical Clustering, etc.

Cluster analysis or clustering may be understood as an ML technique which may comprise grouping a set of objects in such a way that objects in the same group, which may be called a cluster, may be understood to be more similar, in some sense, to each other than to those in other groups, that is, other clusters. It may be understood as a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics and ML.
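The grouping idea may be sketched with a small one-dimensional K-means, one of the algorithms named above. The sample points, the initial centroids and the function name kmeans_1d are assumptions made for illustration only.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain K-means on scalar values with caller-supplied initial centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical measurements forming two visibly separate groups.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
```

Objects that end up in the same list are closer to their own centroid than to the other one, which is the similarity criterion described above.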

Reinforcement learning (RL) may be understood to be a type of ML where an agent may learn to make decisions by taking actions in an environment to achieve some goal. The agent may receive feedback in the form of rewards, which it may use to learn the best strategy, or policy, to accumulate the most reward over time.

Reinforcement learning may involve the following. An agent may be understood to be a learner or decision maker that may interact with the environment. The environment may be understood to refer to the world that the agent may interact with and learn from. A state may be understood to refer to a representation of the current situation that the agent may be in. It may be understood to be the context within which the agent may make decisions. An action may be understood to refer to a choice made by the agent that may affect the state. A reward may be understood to refer to feedback from the environment in response to the actions by the agent. It may be a scalar signal that may indicate how well the agent is doing at a given moment. A policy may be understood to be a strategy used by the agent, which may map states to actions. The policy may be deterministic, that is, always the same action for a given state, or stochastic, that is, probabilistic actions for a given state. A value function may be understood to refer to a function that may estimate how good it may be for the agent to be in a given state, or how good it may be to perform a certain action in a given state. The "goodness" may be typically measured as the expected return, e.g., the cumulative reward, that may be achieved. A Q-function, or Action-Value Function, may be understood to refer to a function that may estimate the value of taking a certain action in a given state, and then following the current policy thereafter. A model of the environment may be understood to refer to the fact that some RL approaches may involve learning a model that may predict the next state and the reward for the current state and action. This may be understood to allow for planning and reasoning about the future without needing to actually take the action.
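To make the notions of state, action, reward, policy and Q-function concrete, the following is a minimal sketch of a tabular Q-learning update. Q-learning is used here only as one well-known value-based algorithm; the disclosure does not prescribe it. The state names, action names, learning rate alpha and discount factor gamma are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max over a' of Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def greedy_policy(q, state, actions):
    """Deterministic policy: map a state to the action with the highest Q-value."""
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical states and actions; unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
actions = ["lower_rate", "raise_rate"]

q_update(q, "congested", "lower_rate", reward=1.0, next_state="normal", actions=actions)
chosen = greedy_policy(q, "congested", actions)
```

After this single update, the table holds a positive value only for the rewarded action, so the greedy policy selects it in that state.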

RL may involve making decisions in regard to exploration vs. exploitation. In reinforcement learning, the agent may need to balance exploration, that is, trying new things to discover better rewards, with exploitation, that is, using known information to maximize rewards. This may be understood to be a trade-off in RL.
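One common way to balance the two, shown here as an illustrative sketch rather than as the approach of the disclosure, is an epsilon-greedy rule: with probability epsilon the agent explores a random action, and otherwise it exploits the action with the highest estimated value. The action names and values below are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        # Exploration: try any action, regardless of its current estimate.
        return rng.choice(sorted(q_values))
    # Exploitation: use known information to pick the highest estimate.
    return max(q_values, key=q_values.get)


rng = random.Random(0)  # seeded only to make the sketch repeatable
q_values = {"a": 0.1, "b": 0.7, "c": 0.3}

always_exploit = epsilon_greedy(q_values, 0.0, rng)  # epsilon = 0: pure exploitation
may_explore = epsilon_greedy(q_values, 1.0, rng)     # epsilon = 1: pure exploration
```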

In the reinforcement learning problem, the state may change every time the agent may apply a new action. The problem may be represented in the following way: The agent may receive the state of the environment at a certain time (s). Then the agent may select an action (a) and apply it in the environment. When this action is applied, the environment may provide a reward (r) and change to a new state (s’). The reward and the new state may finally be provided by an interpreter. In reinforcement learning, the term "interpreter" may be used to describe a component of a reinforcement learning system that may interpret the state of the environment and the actions of an agent. It may be understood to be the part of the system that may bridge the agent with the environment it may be trying to learn from.

This cyclic procedure may be understood to bring a sequence of states, actions and rewards: s1, a1, r1; ...; sT, aT, rT. The agent may use different learning algorithms to learn the most appropriate action to take on every different state of the network (NW), e.g., policy-learning based, such as actor-critic approaches, or value-based learning, such as deep Q-networks.
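As a small worked example of how such a sequence may be summarized, the sketch below computes the discounted cumulative reward of a trajectory. The discount factor gamma and the trajectory values are assumptions for illustration; the disclosure refers only to the cumulative reward in general terms.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r1 + gamma * r2 + gamma**2 * r3 + ... for a finite reward sequence."""
    total = 0.0
    # Accumulating from the last reward backwards avoids computing explicit powers.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical trajectory of (state, action, reward) triples: s1, a1, r1; ...; s3, a3, r3.
trajectory = [("s1", "a1", 1.0), ("s2", "a2", 0.0), ("s3", "a3", 2.0)]
g = discounted_return([r for _, _, r in trajectory], gamma=0.5)
# With gamma = 0.5: 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
```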

3GPP Rel19 Artificial Intelligence (AI)/ML (AIML) Ph2

The 3GPP Rel19 study on Core Network Enhanced Support for AI/ML includes Key Issue #3 to study NWDAF-assisted policy control and Quality of Service (QoS) enhancement. The NWDAF may be understood to be able to gather quite a lot of data from 5GC NFs, AF and OAM and thus may further assist the PCF in making PCC decisions. The PCF may be understood to traditionally determine QoS parameters based on its own data and knowledge, as well as optional statistics and predictions collected from the NWDAF.

This Key issue may be understood to aim to study whether and what may need to be additionally supported in order to enhance 5GC NF operations related to policy control and QoS with the assistance of the NWDAF.

In this key issue, the following aspects may be understood to be studied in relation to the identification of use cases where policy control and QoS may be further enhanced with assistance from NWDAF: a) whether and how to introduce new 5GC functionality, e.g., of the NWDAF and/or PCF, to enhance the policy control and QoS, considering policies of the operator, b) whether and what additional input information may be needed by the NWDAF for providing assistance to policy control and QoS, and how to gather it, c) whether and what output information, on top of that already provided, the NWDAF may provide to assist with policy control and QoS enhancements, and d) whether and how to evaluate the quality of the enhanced NWDAF assistance to policy control and QoS.

The study may focus primarily on existing enforcement mechanisms when available and identify new ones only when no existing ones may be used.

A problematic aspect of existing solutions is that they do not provide an optimal mechanism for obtaining the most appropriate values of Quality of Service (QoS) parameters to achieve a desired Quality of Experience (QoE) across varying network conditions. In current implementations, the determination of QoS parameters is often based on static rules or limited data, which may not adequately adapt to the different states of a dynamic network, thereby resulting in suboptimal service quality.

A further problematic aspect of existing solutions is that they lack the integration of advanced machine learning techniques — specifically reinforcement learning (RL) — into the policy control framework. Traditional approaches rely on predetermined policies and basic statistical inputs, without leveraging RL’s capability to dynamically adjust actions based on state and reward feedback, which could otherwise lead to more effective optimization of QoS settings in real time.

A further problematic aspect of existing solutions is the uncertainty regarding the additional input and output information required by the Network Data Analytics Function (NWDAF) to effectively assist in enhanced policy control and QoS management. There is a challenge in identifying what new data sources are necessary and in defining the appropriate interfaces for delivering supplementary information, which limits the ability to fully exploit NWDAF’s potential for improving network performance.

A further problematic aspect of existing solutions is that the roles within the network functions are not sufficiently flexible to optimally perform RL-based policy control. For example, while the Policy Control Function (PCF) traditionally might be expected to act as the RL agent, there is a recognized need to allow other network functions — such as Operations, Administration, and Maintenance (OAM) — to perform this role. This rigidity in role assignment can hinder the dynamic configuration of QoS parameters, particularly in complex network environments with rapidly changing conditions.

SUMMARY

The invention is set out in the appended set of claims.

A first aspect of the invention relates to a computer-implemented method, performed by a first node (111), for handling a Quality of Experience, QoE, aimed to be achieved, the first node (111) operating in a communications system (100), the method comprising: determining (405), using a reinforcement learning procedure of machine learning, one or more actions to be applied based on the QoE aimed to be achieved, for the at least one of the one or more services and the one or more slices, wherein the determining (405) is based on information received from a second node (112) operating in the communications system (100), and initiating (406) application of the determined one or more actions. In some embodiments, at least one of: the one or more actions are to be applied on at least one of: one or more services, one or more slices, at least a first subset of one or more devices (140) operating in the communications system (100), the one or more actions to be applied comprise updating one or more policies, the one or more actions comprise using one or more Quality of Service, QoS, parameters, the at least one of the one or more services and the one or more slices are for data communication to be provided to the first subset, or at least a second subset, of the one or more devices (140). In some embodiments, the information comprises at least one of: an initial state of an environment of the communications system (100), an initial reward corresponding to the initial state of the environment, the state of the environment after an earlier action triggered by the first node (111) on the environment, and the reward for the earlier action triggered by the first node (111) in the environment, based on the state of the environment after the action.
In some embodiments the method further comprises at least one of: obtaining (401) a first indication of the QoE aimed to be achieved, and wherein the determining (405) is based on the obtained first indication, sending (402), before the determining (405) of the one or more actions, a second indication to the second node (112), the second indication indicating a subscription to, or a query for, a service provided by the second node (112), the service being to provide at least one of the state and the reward, obtaining (403), before the determining (405) of the one or more actions and responsive to the sent second indication, a third indication from the second node (112), the third indication indicating an identifier of the reinforcement learning procedure, and obtaining (404) one or more fourth indications from the second node (112), the one or more fourth indications indicating the at least one of the state and the reward, and wherein the determining (405) is based on the obtained one or more fourth indications. In some embodiments, for every respective iteration subsequent to an initial iteration the method further comprises: sending (407) a fifth indication to the second node (112) after performance of the determined one or more actions, the fifth indication requesting the respective one or more fourth indications for the respective iteration. In some embodiments, the initiating (406) application of the one or more actions comprises triggering performance of the determined one or more actions, and wherein the first node (111) iterates the obtaining (404) of the one or more fourth indications, the determining (405) of the one or more actions, the initiating (406) application of the one or more actions and the sending (407) of the fifth indication until an obtained reward exceeds a threshold for a number of iterations. 
In some embodiments, the second indication indicates at least one of: a respective definition of the at least one of the state and the reward, the first subset, or at least a second subset, of the one or more devices (140) the at least one of the state and the reward applies to. In some embodiments, the communications system (100) is a Fifth Generation, 5G, network and at least one of: the first node (111) is a network function, the second node (112) is a Network Data Analytics Function, NWDAF, and the network function is one of a Policy Control Function, PCF, a Session Management Function, SMF, and an Operation, Administration and Maintenance, OAM, node.
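The iteration performed by the first node in the first aspect may be sketched as follows. This is a hedged illustration of the message sequence (402 to 407) and of the stopping criterion only: the class StubSecondNode, the function run_first_node, the toy bitrate environment, the reward formula and the fixed rule used in place of a learned reinforcement learning policy are all hypothetical. The sketch also assumes that "for a number of iterations" means a number of consecutive iterations, which the text does not state.

```python
class StubSecondNode:
    """Hypothetical stand-in for the second node; it also plays the environment."""

    def __init__(self, target_qoe):
        self.target_qoe = target_qoe
        self.bitrate = 1  # toy environment state, e.g., a QoS bitrate level

    def subscribe(self):
        # Answers the second indication with a third indication (procedure identifier).
        return "rl-procedure-1"

    def state_and_reward(self):
        # Fourth indications: the state and a reward that grows toward 0 near the target.
        return self.bitrate, -abs(self.target_qoe - self.bitrate)

    def apply(self, action):
        self.bitrate += action


def run_first_node(second_node, threshold=-0.5, required_hits=3, max_iterations=50):
    procedure_id = second_node.subscribe()              # 402 and 403
    state, reward = second_node.state_and_reward()      # 404 (initial state and reward)
    hits, history = 0, []
    for _ in range(max_iterations):
        # 405: placeholder decision rule standing in for the learned RL policy.
        action = 1 if reward < threshold else 0
        second_node.apply(action)                       # 406: initiate application
        state, reward = second_node.state_and_reward()  # 407 then 404 for this iteration
        history.append((state, action, reward))
        # Stop once the reward has exceeded the threshold for enough iterations in a row.
        hits = hits + 1 if reward > threshold else 0
        if hits >= required_hits:
            break
    return procedure_id, history


procedure_id, history = run_first_node(StubSecondNode(target_qoe=4))
```

In this toy run the state climbs from 1 to the target of 4, after which the reward stays above the threshold for three iterations and the loop ends.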

A second aspect of the invention relates to a computer-implemented method, performed by a second node (112), for handling a Quality of Experience, QoE, aimed to be achieved, the second node (112) operating in a communications system (100), the method comprising: receiving (501) a second indication from a first node (111) operating in the communications system (100), the second indication indicating a subscription to, or a query for, a service provided by the second node (112), the service being to provide information to be used as input in a reinforcement learning procedure of machine learning to be performed by the first node (111) based on the QoE aimed to be achieved, determining (504) the information responsive to the received second indication, and sending (505) the determined information to the first node (111). In some embodiments, the reinforcement learning procedure is to determine one or more actions to be applied on the at least one of one or more services and the one or more slices, based on the QoE aimed to be achieved, and wherein at least one of: the QoE aimed to be achieved is for the at least one of one or more services, one or more slices, and at least a first subset of one or more devices (140) operating in the communications system (100), the one or more actions to be applied comprise updating one or more policies, the one or more actions comprise using one or more Quality of Service, QoS, parameters, and the at least one of the one or more services and the one or more slices are for data communication to be provided to the first subset, or at least a second subset, of the one or more devices (140).
In some embodiments, the information comprises at least one of: an initial state of an environment of the communications system (100), an initial reward corresponding to the initial state of the environment, the state of the environment after an earlier action triggered by the first node (111) on the environment, and the reward for the earlier action triggered by the first node (111) in the environment, based on the state of the environment after the action. In some embodiments, the information sent comprises one or more fourth indications, the one or more fourth indications indicating the at least one of the state and the reward. In some embodiments the method further comprises at least one of: sending (502) a third indication to the first node (111), the third indication indicating an identifier of the reinforcement learning procedure, and collecting (503) data from the environment responsive to the received second indication, and wherein the determining (504) of the information is based on the collected data. In some embodiments, for every respective iteration subsequent to an initial iteration, the method further comprises: receiving (506) a fifth indication from the first node (111) after performance of the one or more actions determined by the first node (111) based on the sent information, the fifth indication requesting the respective one or more fourth indications for the respective iteration. In some embodiments, the second node (112) iterates the collecting (503) of the data from the environment, the determining (504) of the information, the sending (505) of the information, and the receiving (506) of the fifth indication until an obtained reward exceeds a threshold for a number of iterations. 
In some embodiments, the second indication indicates at least one of: a respective definition of the at least one of the state and the reward, the first subset, or at least a second subset, of the one or more devices (140) the at least one of the state and the reward applies to. In some embodiments, the communications system (100) is a Fifth Generation, 5G, network and at least one of: the first node (111) is a network function, the second node (112) is a Network Data Analytics Function, NWDAF, and the network function is one of a Policy Control Function, PCF, a Session Management Function, SMF, and an Operation, Administration and Maintenance, OAM, node.
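How the second node might turn collected data (503) into the state and reward it determines (504) and sends (505) may be sketched as below. The Sample fields, the choice of averaged throughput and latency as the state, and the reward defined as the negative distance between the average QoE and the target are assumptions for illustration; the disclosure leaves the definitions of state and reward open, and the second indication may carry them.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """Hypothetical per-device measurement collected from the environment."""
    throughput_mbps: float
    latency_ms: float
    qoe_score: float  # assumed QoE estimate, e.g., on a 1 to 5 scale


def determine_state_and_reward(samples, target_qoe):
    """Aggregate collected samples into a state and a scalar reward."""
    state = {
        "avg_throughput_mbps": mean(s.throughput_mbps for s in samples),
        "avg_latency_ms": mean(s.latency_ms for s in samples),
    }
    avg_qoe = mean(s.qoe_score for s in samples)
    # The reward is 0 when the average QoE meets the target and negative otherwise.
    reward = -abs(target_qoe - avg_qoe)
    return state, reward


collected = [Sample(10.0, 20.0, 4.0), Sample(20.0, 40.0, 3.0)]
state, reward = determine_state_and_reward(collected, target_qoe=4.0)
```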

Certain embodiments disclosed herein may provide one or more of the following technical advantage(s), which may be summarized as follows.

Embodiments in this disclosure may be understood to allow the network operator to enhance 5GC NF operations related to policy control and QoS with the assistance of the NWDAF.

Embodiments in this disclosure may be understood to enable the PCF to select optimally (fast) the QoS parameters for a service to fulfill a defined target QoE.

BRIEF DESCRIPTION OF THE DRAWINGS

Examples of embodiments herein are described in more detail with reference to the accompanying drawings, according to the following description.

Figure 1 is a schematic diagram illustrating an example of a 5G Network Architecture, according to existing methods.

Figure 2 is a schematic diagram illustrating RL, according to existing methods.

Figure 3 is a schematic diagram illustrating a non-limiting example of a communications system, according to embodiments herein.

Figure 4 is a flowchart depicting embodiments of a first method in a first node, according to embodiments herein.

Figure 5 is a flowchart depicting embodiments of a first method in a second node, according to embodiments herein.

Figure 6 is a schematic diagram illustrating aspects of a first method performed by the first node and the second node, according to embodiments herein.

Figure 7 is a signalling diagram illustrating a non-limiting example of signalling between nodes in a communications system, according to a first method of embodiments herein.

Figure 8 is a schematic block diagram illustrating two non-limiting examples, a) and b), of a first node, according to embodiments herein.

Figure 9 is a schematic block diagram illustrating two non-limiting examples, a) and b), of a second node, according to embodiments herein.

Figure 10 is a flowchart depicting embodiments of a second method in a first node, according to embodiments herein.

Figure 11 is a flowchart depicting embodiments of a second method in a second node, according to embodiments herein.

Figure 12 is a signalling diagram illustrating a non-limiting example of signalling between nodes in a communications system, according to a second method of embodiments herein.

Figure 13 is an example virtualization environment.

DETAILED DESCRIPTION

The embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which examples are shown. In this section, embodiments herein are illustrated by exemplary embodiments. It should be noted that these embodiments are not mutually exclusive. Components from one embodiment or example may be tacitly assumed to be present in another embodiment or example and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. All possible combinations are not described to simplify the description.

Figure 3 depicts four non-limiting examples, in panels “a”, “b”, “c” and “d” respectively, of a communications system 100, in which embodiments herein may be implemented. In some example implementations, such as that depicted in the non-limiting examples of Figure 3 a) and 3 c), the communications system 100 may be a computer network. In other example implementations, such as that depicted in the non-limiting examples of Figure 3 b) and 3 d), the communications system 100 may be implemented in a telecommunications system, sometimes also referred to as a telecommunications network, cellular radio system, cellular network, or wireless communications system. In some examples, the telecommunications system may comprise network nodes which may serve receiving nodes, such as wireless devices. The communications system 100 may for example be a network such as a 5G system, or a newer system supporting similar functionality. The telecommunications system may additionally support other technologies such as, for example, Long-Term Evolution (LTE), e.g., LTE Frequency Division Duplex (FDD), LTE Time Division Duplex (TDD), LTE Half-Duplex Frequency Division Duplex (HD-FDD), or LTE operating in an unlicensed band. The telecommunications system may also support yet other technologies, such as Wideband Code Division Multiple Access (WCDMA), Universal Mobile Telecommunications System Terrestrial Radio Access (UTRA) TDD, Global System for Mobile communications (GSM) network, GSM/Enhanced Data Rate for GSM Evolution (EDGE) Radio Access Network (GERAN) network, Ultra-Mobile Broadband (UMB), EDGE, any combination of Radio Access Technologies (RATs) such as, e.g., Multi-Standard Radio (MSR) base stations, multi-RAT base stations etc., any 3rd Generation Partnership Project (3GPP) cellular network, Wireless Local Area Network/s (WLAN) or WiFi network/s, Worldwide Interoperability for Microwave Access (WiMax), IEEE 802.15.4-based low-power short-range networks such as IPv6 over Low-Power Wireless Personal Area Networks (6LoWPAN), Zigbee, Z-Wave, Bluetooth Low Energy (BLE), or any cellular network or system. The telecommunications system may for example support a Low Power Wide Area Network (LPWAN). LPWAN technologies may comprise Long Range physical layer protocol (LoRa), Haystack, SigFox, LTE-M, and Narrow-Band IoT (NB-IoT).

The communications system 100 comprises nodes, whereof a first node 111 and a second node 112 are depicted in Figure 3. The communications system 100 may comprise additional nodes. In particular examples, such as those depicted in Figure 3 c) and 3 d), the communications system 100 may comprise one or more third nodes 113. The one or more third nodes 113 may comprise any one or more of one or more first third nodes 114, one or more second third nodes 115, one or more third third nodes 116 and one or more fourth third nodes 117, e.g., a fourth third node 117.

Any of the first node 111, the second node 112 and the one or more third nodes 113 may be understood, respectively, as a first computer system or server, a second computer system or server and one or more third computer systems or servers. Any of the first node 111, the second node 112 and the one or more third nodes 113 may be implemented as a standalone server in e.g., a host computer in the cloud 120, as depicted in the non-limiting examples of Figure 3 b), for the first node 111 and the second node 112, and of Figure 3 d), for the first node 111, the second node 112 and the one or more third nodes 113. In other examples, any of the first node 111, the second node 112 and the one or more third nodes 113 may be a distributed node or distributed server, such as a virtual node in the cloud 120, and may perform some of its respective functions locally, e.g., by a client manager, and some of its functions in the cloud 120, by e.g., a server manager. In other examples, any of the first node 111, the second node 112 and the one or more third nodes 113 may perform its functions entirely on the cloud 120, or partially, in collaboration or collocated with a radio network node. Yet in other examples, any of the first node 111, the second node 112 and the one or more third nodes 113 may also be implemented as processing resources in a server farm.

Yet in other examples, any of the first node 111, the second node 112 and the one or more third nodes 113 may also be implemented as virtual network functions, e.g., according to a Network Functions Virtualization (NFV) Architecture.

Any of the first node 111, the second node 112 and the one or more third nodes 113 may be under the ownership or control of a service provider or may be operated by the service provider, or on behalf of the service provider.

The first node 111 may be understood as a node having a capability to consume a service offered by the second node 112. In a particular non-limiting example, wherein the communications system 100 may be a 5G network, the first node 111 may, in such examples, be a Network Function (NF). In particular examples, the network function may be one of: a PCF, an SMF, and an OAM node. In further particular examples, the first node 111 may be a PCF.

In particular examples of embodiments herein, the second node 112 may be an operator managed network analytics logical function. That is, a node that may have a capability to handle data collection and analysis from different sources in the communications system 100. The second node 112 may interact with different entities for different purposes, such as data collection provided by, e.g., Access and Mobility Function (AMF), Session Management Function (SMF), Policy Control Function (PCF), Unified Data Management Function (UDM), Application Function (AF), based on event subscription, directly or via a Network Exposure Function (NEF), and Operations And Management (OAM), retrieval of information from data repositories, e.g., Unified Data Repository (UDR) via UDM for subscriber-related information, retrieval of information about NFs, e.g., NRF for NF-related information, and Network Slice Selection Function (NSSF) for slice-related information, on demand provision of analytics to consumers, and storage in a data repository, e.g., an Analytics Data Repository Function (ADRF). As depicted in Figure 3, a non-limiting example of the second node 112, wherein the communications system 100 may be a 5G network, may be an NWDAF.

Any of the first node 111 and the second node 112 may have a capability to perform machine-implemented learning procedures, which may be also referred to as “machine learning” (ML). In some examples, the first node 111 may have the capability to perform machine-implemented learning procedures, e.g., RL. In other examples, the second node 112 may have the capability to perform machine-implemented learning procedures. Yet in other examples, both of the first node 111 and the second node 112 may have the capability to perform machine-implemented learning procedures.

The one or more third nodes 113 may be understood to be nodes in the communications system 100 from which the second node 112 and/or the first node 111 may collect data.

In some particular examples, the one or more third third nodes 116 may comprise at least one of an AF, an AMF, an SMF, a UPF, and an OAM. The one or more third nodes 114, 115, 116 may comprise a first NF.

The one or more first third nodes 114, in some particular examples, may comprise an SMF. The one or more second third nodes 115, in some particular examples, may comprise at least one of a UPF, a second NF other than the UPF, and a radio network node, such as the radio network node 130 described below. The one or more fourth third nodes 117 may, in some examples, comprise an OAM.

The communications system 100 may, in some examples, comprise one or more radio network nodes, such as radio network node 130, depicted in Figure 3 b) and Figure 3 d). The radio network node 130 may be, e.g., comprised in a Radio Access Network of the telecommunications system. That is, the radio network node 130 may be a transmission point such as a radio base station, for example a gNB, an eNB, or any other network node with similar features capable of serving a wireless device, such as a user equipment or a machine type communication device, in the communications system 100. In typical examples, the radio network node 130 may be a base station, such as a gNB or an eNB. In other examples, the radio network node 130 may be a distributed node, such as a virtual node in the cloud 120, and may perform its functions entirely on the cloud 120, or partially, in collaboration with a radio network node.

The telecommunications system may cover a geographical area, which in some embodiments may be divided into cell areas, wherein each cell area may be served by a radio network node 130, although, one radio network node 130 may serve one or several cells. In the examples of Figure 3, the cells are not depicted to simplify the figure. The radio network node 130 may be of different classes, such as, e.g., macro eNodeB, home eNodeB or pico base station, based on transmission power and thereby also cell size. In some examples, the radio network node 130 may serve receiving nodes with serving beams. The radio network node 130 may be directly connected to one or more core networks.

Any of the first node 111, the second node 112, the one or more third nodes 113 and the radio network node 130, and/or any of the other nodes comprised in the communications system 100 may support one or several communication technologies, and its name may depend on the technology and terminology used.

One or more devices 140 may be comprised in the telecommunications network, which are depicted with a single device 140 in the non-limiting examples of Figure 3 b) and Figure 3 d). The one or more devices 140 comprised in the communications system 100 may be a wireless communication device such as a 5G UE, or a UE, which may also be known as e.g., mobile terminal, wireless terminal and/or mobile station, a Customer Premises Equipment (CPE), a mobile telephone, cellular telephone, or laptop with wireless capability, just to mention some further examples. The one or more devices 140 comprised in the telecommunications system may be, for example, portable, pocket-storable, hand-held, computer-comprised, or a vehicle-mounted mobile device, enabled to communicate voice and/or data, via the RAN, with another entity, such as a server, a laptop, a Personal Digital Assistant (PDA), or a tablet, Machine-to-Machine (M2M) device, device equipped with a wireless interface, such as a printer or a file storage device, modem, sensor, IoT device, or any other radio network unit capable of communicating over a radio link in a communications system. The one or more devices 140 comprised in the telecommunications system may be enabled to communicate wirelessly in the telecommunications system. The communication may be performed e.g., via a RAN, and possibly the one or more core networks, which may be comprised within the telecommunications system.

It may be understood that the telecommunications network may comprise more radio network nodes 130 and/or more devices 140 than those depicted in Figure 3.

The first node 111 may be configured to communicate within the communications system 100 with the second node 112 over a first link 151, e.g., a radio link, or a wired link. The first node 111 may be configured to communicate within the communications system 100 with the radio network node 130 over a second link 152, e.g., a radio link, or a wired link. The second node 112 may be configured to communicate within the communications system 100 with the radio network node 130 over a third link 153, e.g., a radio link, or a wired link. The radio network node 130 may be configured to communicate within the communications system 100 with the one or more devices 140 over a respective fourth link 154, e.g., a radio link. The second node 112 may be configured to communicate within the communications system 100 with the one or more third nodes 113 over a respective fifth link 155, e.g., a radio link, or a wired link. The one or more third nodes 113 may be configured to communicate within the communications system 100 with the radio network node 130 over a respective sixth link 156, e.g., a radio link.

Any of the first link 151, the second link 152, the third link 153, the fourth link 154, the respective fifth link 155, and the respective sixth link 156 may be a direct link or may be comprised of a plurality of individual links, wherein it may go via one or more computer systems or one or more core networks in the communications system 100, which are not depicted in Figure 3, or it may go via an optional intermediate network. The intermediate network may be one of, or a combination of more than one of, a public, private or hosted network; the intermediate network, if any, may be a backbone network or the Internet; in particular, the intermediate network may comprise two or more sub-networks, which is not shown in Figure 3.

In general, the usage of “first”, “second”, “third”, “fourth”, “fifth”, and/or “sixth” herein may be understood to be an arbitrary way to denote different elements or entities, and may be understood to not confer a cumulative or chronological character to the nouns they modify.

Some of the embodiments contemplated herein will now be described more fully with reference to the accompanying drawings. Other embodiments, however, are contained within the scope of the subject matter disclosed herein. The disclosed subject matter should not be construed as limited to only the embodiments set forth herein; rather, these embodiments are provided by way of example to convey the scope of the subject matter to those skilled in the art.

Some embodiments herein may relate to a first method performed by the first node 111. Some embodiments herein may relate to a first method performed by the second node 112.

Embodiments of a first computer-implemented method, performed by the first node 111, will now be described with reference to the flowchart depicted in Figure 4. The first method may be understood to be for handling a Quality of Experience (QoE) aimed to be achieved. The first node 111 may operate in the communications system 100.

In some embodiments, the communications system 100 may be a 5G network.

In some embodiments, the first node 111 may be a network function.

In some embodiments, the network function may be one of a PCF, an SMF, and an OAM node.

Several embodiments are comprised herein. In some embodiments, all the actions may be performed. In some embodiments, one or more of the actions may be performed. It should be noted that the examples herein are not mutually exclusive. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. A non-limiting example of the first method performed by the first node 111 is depicted in Figure 4. In a particular non-limiting example of the first method, Action 405 and Action 407 may be performed.

In Figure 4, optional actions are represented with dashed lines.

Action 401

In this Action 401, the first node 111 may obtain a first indication. The first indication may be of the QoE aimed to be achieved.

The QoE aimed to be achieved may be for at least one of: i. one or more services, and ii. one or more slices.

The at least one of the one or more services and the one or more slices may be for data communication to be provided to a first subset of the one or more devices 140, or at least a second subset of the one or more devices 140.

Action 402

In this Action 402, the first node 111 may send a second indication.

The first node 111 may send the second indication to the second node 112.

In some embodiments, the second node 112 may be an NWDAF.

The second indication may indicate a subscription to, or a query for, a service provided by the second node 112. The service may be to provide at least one of a state and a reward.

The state may be of an environment of the communications system 100. The state may be of the environment of the communications system 100 after an action, e.g., an earlier action, triggered by the first node 111 on the environment.

The reward may be for the action, e.g., for the earlier action triggered by the first node 111 in the environment. The reward may be based on the state of the environment after the action.

The state and the reward may be to be used by the first node 111 in a reinforcement learning procedure. The reinforcement learning procedure may be of machine learning. The reinforcement learning procedure of machine learning may be to be performed by the first node 111, e.g., based on information received from the second node 112.

In a first iteration of the reinforcement learning procedure of machine learning, the state may be an initial state of the environment of the communications system 100. Initial may be understood to mean prior to the first node 111 taking any action as part of the reinforcement learning procedure of machine learning.

In the first iteration of the reinforcement learning procedure of machine learning, the reward may be an initial reward corresponding to the initial state of the environment.

In some embodiments, the second indication may indicate at least one of: a. a respective definition of the at least one of the state and the reward, and b. the first subset, or at least the second subset, of the one or more devices 140 the at least one of the state and the reward may apply to.
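As a purely illustrative sketch, the content of the second indication described above could be modelled as a small data structure. The class name `SecondIndication`, its field names and the example values are hypothetical and are not defined by the embodiments herein; only the split into a subscription/query mode, a state and reward definition, and a target subset of devices follows the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SecondIndication:
    """Hypothetical request sent by the first node 111 to the second node 112."""

    mode: str                                 # "subscribe" or "query" (assumed encoding)
    state_definition: Optional[str] = None    # item a: definition of the state
    reward_definition: Optional[str] = None   # item a: definition of the reward
    target_devices: Tuple[str, ...] = ()      # item b: subset of the devices 140

    def __post_init__(self) -> None:
        # Reject anything that is neither a subscription nor a query.
        if self.mode not in ("subscribe", "query"):
            raise ValueError("mode must be 'subscribe' or 'query'")


# Example values are placeholders, not identifiers from any specification.
request = SecondIndication(
    mode="subscribe",
    state_definition="nw_state",
    reward_definition="qoe_delta",
    target_devices=("ue-1", "ue-2"),
)
```

Both the definitions and the target subset are optional in this sketch, mirroring the "at least one of" wording above.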

Action 403

In this Action 403, the first node 111 may obtain a third indication. The first node 111 may obtain the third indication from the second node 112. The third indication may be a response to the sent second indication. That is, the obtaining of the third indication may be responsive to the sent second indication.

The third indication may indicate an identifier of the reinforcement learning procedure.

The reinforcement learning procedure may be of machine learning. The reinforcement learning procedure of machine learning may be to be performed by the first node 111, e.g., based on information received from the second node 112.

Action 404

In some embodiments, in this Action 404, the first node 111 may obtain information.

The obtaining in this Action 404 of the information may be from the second node 112.

The information may comprise one or more fourth indications. In some particular embodiments, in this Action 404, the first node 111 may obtain one or more fourth indications.

The obtaining in this Action 404 of the one or more fourth indications may be from the second node 112.

The one or more fourth indications may indicate the at least one of the state and the reward.

As stated earlier, in the first iteration of the reinforcement learning procedure of machine learning, the state may be an initial state of the environment of the communications system 100 and the reward may be the initial reward corresponding to the initial state of the environment.

Action 405

In this Action 405, the first node 111 may determine, using the reinforcement learning procedure of machine learning, one or more actions. The determined one or more actions may be to be applied based on the QoE aimed to be achieved.

The QoE aimed to be achieved may be for the at least one of the one or more services and the one or more slices.

The determining in this Action 405 may be based on information received from the second node 112 operating in the communications system 100, e.g., the information obtained in Action 404.

In some embodiments, the determining in this Action 405 may be based on the obtained one or more fourth indications. The information received from the second node 112 may comprise the one or more fourth indications.

In some embodiments, the information may comprise at least one of: i. the initial state of an environment of the communications system 100, ii. the initial reward corresponding to the initial state of the environment, iii. the state of the environment after an earlier action triggered by the first node 111 on the environment, and iv. the reward for the earlier action triggered by the first node 111 in the environment, based on the state of the environment after the action.

In some embodiments, at least one of the following may apply: a. the one or more actions may be to be applied on at least one of: i. the one or more services, ii. the one or more slices, iii. at least the first subset of the one or more devices 140 operating in the communications system 100, and iv. at least a second subset of the one or more devices 140, b. the one or more actions to be applied may comprise updating one or more policies, c. the one or more actions may comprise using one or more Quality of Service (QoS) parameters, and d. the at least one of the one or more services and the one or more slices may be for data communication to be provided to the first subset, or at least the second subset, of the one or more devices 140.

In some examples, the one or more actions may comprise configuring a QoS enforcement action space based on the QoE aimed to be achieved.

In some embodiments, the determining in this Action 405 may be based on the obtained first indication.

In some embodiments, the sending in Action 402 may be performed before the determining in this Action 405 of the one or more actions.

In some embodiments, the obtaining in Action 403 may be performed before the determining in this Action 405 of the one or more actions.

In some embodiments, the obtaining in Action 403 may be performed before the determining in this Action 405 of the one or more actions and responsive to the sent second indication.

Action 406

In this Action 406, the first node 111 may initiate application of the determined one or more actions.

Initiating application may be understood as triggering, enabling, facilitating, the application by another node, or starting the application itself.

In some embodiments, the initiating, in this Action 406, application of the one or more actions may comprise triggering performance of the determined one or more actions, e.g., on at least one of: the one or more services, the one or more slices, the one or more devices 140, the first subset of the one or more devices 140, and at least a second subset of the one or more devices 140.

Action 407

In this Action 407, the first node 111 may send a fifth indication.

The first node 111 may send the fifth indication to the second node 112.

The sending in this Action 407 may be after performance of the determined one or more actions.

The fifth indication may request the respective one or more fourth indications for the respective iteration.

The sending in this Action 407 may be performed for every respective iteration subsequent to the initial iteration.

In some embodiments, the initiating, in Action 406, application of the one or more actions may comprise triggering performance of the determined one or more actions, and the first node 111 may iterate the obtaining of Action 404 of the one or more fourth indications, the determining of Action 405 of the one or more actions, the initiating application of Action 406 of the one or more actions and the sending of this Action 407 of the fifth indication until an obtained reward may exceed a threshold for a number of iterations. That is, until the first node 111 may have learned to determine the best one or more actions consistently over the number of iterations.
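The stopping condition described above can be sketched as follows. This is a minimal illustration that assumes "for a number of iterations" means the most recent consecutive iterations; the function name `converged` and its parameters are hypothetical.

```python
from typing import Sequence


def converged(rewards: Sequence[float], threshold: float, n_iterations: int) -> bool:
    """Return True once the last n_iterations rewards all exceed the threshold.

    Assumption: the iterations counted are the most recent, consecutive ones.
    """
    if n_iterations < 1:
        raise ValueError("n_iterations must be at least 1")
    recent = list(rewards)[-n_iterations:]
    return len(recent) == n_iterations and all(r > threshold for r in recent)
```

Under this reading, the first node 111 would keep iterating Actions 404 to 407 while `converged(...)` returns False.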

Embodiments of a first computer-implemented method performed by the second node 112, will now be described with reference to the flowchart depicted in Figure 5. The method may be understood to be for handling the QoE aimed to be achieved. The second node 112 may operate in the communications system 100.

The first method may comprise the following actions. Several embodiments are comprised herein. In some embodiments, the first method may comprise all the actions. In other embodiments, the first method may comprise one or more actions. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. It should be noted that the examples herein are not mutually exclusive. Components from one example may be tacitly assumed to be present in another example and it will be obvious to a person skilled in the art how those components may be used in the other examples. In Figure 5, optional actions are depicted with dashed lines. In particular embodiments, Actions 501, 504 and 505 may be performed. In other particular embodiments, Action 505 may be performed. The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 for the first method and will thus not be repeated here to simplify the description. For example, in some examples, the communications system 100 may be a 5G network. In some embodiments, the first node 111 may be a network function. In some embodiments, the network function may be one of a PCF, an SMF, and an OAM node. In some embodiments, the second node 112 may be an NWDAF.

Action 501

In this Action 501, the second node 112 may receive the second indication.

The receiving in this Action 501 may be from the first node 111 operating in the communications system 100.

The second indication may indicate the subscription to, or the query for, the service provided by the second node 112.

The service may be to provide information. The information may be to be used as input in the reinforcement learning procedure of machine learning to be performed by the first node 111 based on the QoE aimed to be achieved.

In some embodiments, the reinforcement learning procedure may be to determine the one or more actions to be applied on the at least one of the one or more services and the one or more slices, based on the QoE aimed to be achieved. In some of such embodiments, at least one of the following may apply: a. the QoE aimed to be achieved may be for the at least one of: the one or more services, the one or more slices, at least the first subset of one or more devices 140 operating in the communications system 100, b. the one or more actions to be applied may comprise updating the one or more policies, c. the one or more actions may comprise using the one or more QoS parameters, and d. the at least one of the one or more services and the one or more slices may be for data communication to be provided to the first subset, or at least the second subset, of the one or more devices 140.

The second indication may indicate at least one of: a. the respective definition of the at least one of the state and the reward, b. the first subset, or at least the second subset, of the one or more devices 140 the at least one of the state and the reward may apply to.

Action 502

In this Action 502, the second node 112 may send the third indication. The sending in this Action 502 may be to the first node 111.

The third indication may indicate the identifier of the reinforcement learning procedure.

Action 503

In this Action 503, the second node 112 may collect data from the environment responsive to the received second indication.

Action 504

In this Action 504, the second node 112 may determine the information. The determining in this Action 504 may be responsive to the received second indication.

The determining in this Action 504 of the information may be based on the data collected in Action 503.

In some embodiments, the information may comprise the one or more fourth indications. The information may comprise at least one of: i. the initial state of the environment of the communications system 100, ii. the initial reward corresponding to the initial state of the environment, iii. the state of the environment after the earlier action triggered by the first node 111 on the environment, and iv. the reward for the earlier action triggered by the first node 111 in the environment, based on the state of the environment after the action.

Action 505

In this Action 505, the second node 112 may send the determined information. The second node 112 may send the determined information to the first node 111.

In some embodiments, the information sent may comprise the one or more fourth indications.

Action 506

In some embodiments, for every respective iteration subsequent to an initial iteration, the first method may further comprise this Action 506.

In this Action 506, the second node 112 may receive the fifth indication. The receiving in this Action 506 may be from the first node 111.

The receiving in this Action 506 may be after performance of the one or more actions determined by the first node 111 based on the sent information. The fifth indication may request the respective one or more fourth indications for the respective iteration.

In some embodiments, the second node 112 may iterate the collecting of the data from the environment of Action 503, the determining of the information of Action 504, the sending of the information of Action 505, and the receiving of the fifth indication of this Action 506 until the obtained reward may exceed the threshold for the number of iterations.

Some embodiments herein will now be further described with some non-limiting examples, which may be combined with the embodiments just described.

In the following description, any reference to a/the PCF, simply a/the “PCF”, or a/the “Agent (PCF)” or simply a/the “Agent” may be understood to equally refer to the first node 111; any reference to a/the NWDAF, and/or a/the “Interpreter (NWDAF)” or simply “Interpreter”, may be understood to equally refer to the second node 112; any reference to a/the “network” or simply a/the “NW” may be understood to equally refer to the communications system 100; any reference to a/the “UE” and/or “UEs” may be understood to equally refer to any of the one or more devices 140, or a subset, as indicated; any reference to a/the “target UEs” may be understood to equally refer to at least the second subset of the one or more devices 140, the first subset, or both, or in some examples, to the one or more devices 140.

Figure 6 is a schematic diagram illustrating the architecture of embodiments herein and its relationship with Reinforcement Learning at a high level.

According to the embodiments of the first method, the RL technique may be used to learn the actions the Agent (PCF) may take to reach a target (wanted) QoE for a service, or group of services. The same solution may be used for QoE of a NW slice. The target QoE may be provided to the PCF via configuration based e.g., on SLA.

The PCF as Agent may take different actions to maximize the QoE, e.g., to change the QoS parameters, e.g., 5QI, GBR, MBR, etc., applied to a service. The QoS Enforcement action space may be the set of different QoS Enforcement actions that may be chosen, e.g., the so-called action space in Reinforcement Learning. Specifically, it may be composed of a set of QoS parameters, each including max, min and step values, e.g., for throttling as a QoS parameter: throttling max set to 1 Mbps, throttling min set to 64 kbps and step set to 64 kbps.
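A minimal sketch of how such an action space might be enumerated from min, max and step values is given below. The names `QosParameterRange` and `build_action_space` are hypothetical, and 1 Mbps is assumed here to mean 1000 kbps, so the largest throttling value reachable in 64 kbps steps is 960 kbps.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List


@dataclass(frozen=True)
class QosParameterRange:
    """One QoS parameter with its min, max and step values (hypothetical type)."""

    name: str
    min_value: int
    max_value: int
    step: int

    def values(self) -> List[int]:
        # All values from min_value up to (and including) max_value, in steps.
        return list(range(self.min_value, self.max_value + 1, self.step))


def build_action_space(params: List[QosParameterRange]) -> List[Dict[str, int]]:
    """Enumerate every combination of parameter values as one candidate action."""
    names = [p.name for p in params]
    return [dict(zip(names, combo)) for combo in product(*(p.values() for p in params))]


# Throttling example from the text: min 64 kbps, max 1 Mbps (assumed 1000 kbps), step 64 kbps.
throttling = QosParameterRange("throttling_kbps", min_value=64, max_value=1000, step=64)
action_space = build_action_space([throttling])
```

With several parameters the space grows as the product of the individual value counts, which is one reason the step size matters in practice.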

The NWDAF as Interpreter may need to provide the reward and state once an action has been applied by the PCF. In this context, state may be understood to refer to the NW state and reward may be understood to refer to a measure of the increment/decrement of the QoE:

- state, e.g., NW state: the NWDAF may be able today to collect large amounts of data from the 5GC NFs and OAM that may be used to provide a measure of the state of the NW. As an example, a combination of some existing analytics may also be used for such purpose, e.g., slice load, NW performance, OSE, User Data Congestion, etc.

- reward: the NWDAF may calculate the reward of the action applied by the PCF by, e.g., comparing the value of the OSE analytic before and after the action, e.g., reward(t) = OSE(t) - OSE(t-1). Note that the actions may not take immediate effect on the value of the OSE analytic.
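The reward definition above can be illustrated with a short sketch. The function name `rewards_from_ose` and the sample values are hypothetical; for the first entry, where no previous OSE value exists, the sketch reports the initial OSE value itself, following the first-iteration behaviour described for the sequence of Figure 7.

```python
from typing import List, Sequence


def rewards_from_ose(ose: Sequence[float]) -> List[float]:
    """Compute reward(t) = OSE(t) - OSE(t-1) for a series of OSE values.

    The first reward is the initial OSE value, since no earlier value exists.
    """
    if not ose:
        return []
    return [ose[0]] + [ose[t] - ose[t - 1] for t in range(1, len(ose))]


# Made-up OSE samples on an arbitrary scale.
example_rewards = rewards_from_ose([3.0, 3.5, 3.25])
```

A positive reward indicates that the OSE improved after the action, and a negative reward indicates that it degraded.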

The PCF may need to implement an algorithm for learning with the assistance of the NWDAF, which may provide the state and reward each time a new action is applied in every iteration of the cycle.
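The text does not name a specific learning algorithm. As one possible illustration only, the sketch below uses a tabular Q-learning update, which is a swapped-in, well-known technique and not necessarily the algorithm intended by the embodiments. The names `q_update`, `alpha` (learning rate) and `gamma` (discount factor), as well as the state labels, are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, List, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: List[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one tabular Q-learning step and return the updated Q-value."""
    # Best value the agent currently expects from the state reported after the action.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Illustrative use: states and actions are placeholders (e.g., throttling values in kbps).
q_table: QTable = defaultdict(float)
candidate_actions = [64, 128, 192]
new_value = q_update(q_table, state="congested", action=128, reward=0.5,
                     next_state="normal", actions=candidate_actions)
```

In this reading, the state and reward arguments would be the values reported by the NWDAF after each applied action.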

Exploration vs. Exploitation: In reinforcement learning, the agent may need to balance exploration, that is, trying new things to discover better rewards, with exploitation, that is, using known information to maximize rewards. This may be understood to be a trade-off in RL.
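One common way to implement this trade-off is an epsilon-greedy rule, shown below as an assumption rather than as the method of the embodiments: with probability `epsilon` the agent explores a random action, otherwise it exploits the action with the highest learned value. The function name `select_action` and its parameters are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Optional


def select_action(q_values: Dict[Hashable, float], actions: List[Hashable],
                  epsilon: float, rng: Optional[random.Random] = None) -> Hashable:
    """Pick an action with an epsilon-greedy rule."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Exploration: try any action, regardless of what has been learned.
        return rng.choice(actions)
    # Exploitation: use the action with the best known value (unknown actions count as 0.0).
    return max(actions, key=lambda a: q_values.get(a, 0.0))
```

Setting `epsilon` to 0.0 gives pure exploitation, which would correspond to deploying an already trained agent.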

Note: On exploration and exploitation phases of the RL agent, the exploration phase may take place in a controlled environment, e.g., a laboratory, and not in a production environment. Another possibility may be to use existing production data to pre-train the RL agent. The agent may be deployed in the production environment already trained to avoid an initial extensive exploration phase in production, which may not be desirable.

Embodiments herein of the first method may define new service(s) in NWDAF to provide the state and reward to the PCF. The services and methods may be understood to be described generically to allow the application of this mechanism also to other cases where RL may be needed, just by adding new definitions of states and rewards. For that purpose, the NF consumer, e.g., PCF for this NWDAF-assisted policy control and QoS enhancement, may need to identify which state and reward, e.g., from a list of predefined ones, may be used for the RL procedure.

In addition, in order to apply the RL process, e.g., the adjustment of QoS to reach a wanted QoE, just to a set of UEs, the new NWDAF services may allow setting the target UEs for the computation of the reward.

Figure 7 is a sequence diagram showing a non-limiting example of the proposed mechanism for NWDAF-assisted policy control according to the first method described herein. Steps are detailed below:

1. PCF may start the process by invoking the new Nnwdaf_RLInterpreter service providing the definition of state (NW state) and reward, e.g., based on QoE for an appId, and target UEs.

2. NWDAF may answer providing the identifier of the RL process. This identifier may be used to correlate all the iterations in the cycle.

3. If not available, the NWDAF may start data collection in order to calculate the state, e.g., the state may be based on a combination of some existing analytics such as slice load, NW performance, OSE, User Data Congestion, and the reward, based on the OSE analytic.

4. NWDAF may run analytics based on collected data and derive the state and reward.

5. NWDAF may notify the PCF by invoking Nnwdaf_RLInterpreter_NotifyState providing the current state of the NW and the initial value of the reward. Since there is no previous value of OSE, the NWDAF may provide in this first iteration the initial value of OSE.

6. PCF may configure the QoS enforcement action space based on the target QoE, may select one action and start applying such action to the target UEs, e.g., by updating the PCC rules.

7. The NW may enforce the QoS action.

8. The PCF may ask NWDAF to get a new value of reward and state.

9. The NWDAF may start collecting data to get the new value of OSE and NW state. Note that the NWDAF may need to collect data from the NW after the QoS action has been applied by the NW, so that the new value of OSE may take such action into account.

10. NWDAF may run analytics based on collected data and may derive the state and reward.

11. NWDAF may trigger Nnwdaf_RLInterpreter_NotifyState to provide the new state and reward to the PCF.

12. PCF may trigger RL agent learning process based on the state and reward received. RL agent may learn the effect of the past QoS enforcement action decisions for a given NW state. The learning phase may be understood to basically mean that the RL agent may learn how to map the states to the actions in an optimal way, usually by trying to maximize the reward of the actions. The RL agent may decide the QoS enforcement actions based on the target QoE, the set of possible actions, e.g., the QoS enforcement action space, whether it may be on exploration or exploitation mode, the learnt information, etc.

The PCF may select one action and start applying such action to the target UEs, e.g., by updating the PCC rules.

13. The NW may enforce the QoS action.

The steps 8-13 may be repeated sequentially.
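The cycle of steps 1-13 may be sketched as follows, purely as an illustration. `ToyNwdaf` and `PcfAgent` are hypothetical stand-ins, not 3GPP-defined interfaces; the two NW states, the three-entry QoS action space, the toy OSE table and the use of tabular Q-learning with epsilon-greedy selection are all assumptions made for this example.

```python
import random
from collections import defaultdict

STATES = ("low_load", "high_load")   # assumed NW states
ACTIONS = (0, 1, 2)                  # assumed indices into a QoS action space
# Toy OSE obtained for each (state, action); values are made up.
OSE_TABLE = {"low_load": (3.0, 4.0, 4.5), "high_load": (1.5, 2.5, 4.0)}
INITIAL_OSE = 3.0                    # assumed OSE before any QoS action


class ToyNwdaf:
    """Hypothetical stand-in for the NWDAF acting as RL Interpreter."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = "low_load"
        self.prev_ose = INITIAL_OSE

    def notify_state(self, enforced_action=None):
        if enforced_action is None:
            # Step 5: no previous OSE, so the initial OSE is reported.
            return self.state, self.prev_ose
        # Steps 9-10: OSE is measured after the NW enforced the action.
        ose = OSE_TABLE[self.state][enforced_action]
        reward = ose - self.prev_ose
        self.prev_ose = ose
        self.state = self.rng.choice(STATES)
        return self.state, reward


class PcfAgent:
    """Tabular Q-learning agent with epsilon-greedy action selection."""

    def __init__(self, epsilon=0.2, alpha=0.5, gamma=0.9, seed=1):
        self.q = defaultdict(float)
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
        self.rng = random.Random(seed)

    def select_action(self, state):
        if self.rng.random() < self.epsilon:                     # exploration
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])    # exploitation

    def learn(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def run_cycle(iterations=50):
    nwdaf, agent = ToyNwdaf(), PcfAgent()
    state, _initial_reward = nwdaf.notify_state()       # steps 1-5
    action = agent.select_action(state)                 # steps 6-7
    history = []
    for _ in range(iterations):                         # steps 8-13, repeated
        next_state, reward = nwdaf.notify_state(action)
        agent.learn(state, action, reward, next_state)  # step 12
        history.append((state, action, reward))
        state = next_state
        action = agent.select_action(state)
    return agent, history, nwdaf
```

Because the reward is the difference between consecutive OSE values, the rewards collected over a run sum to the final OSE minus the initial OSE.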

Finally, although not shown in the sequence diagram of Figure 7 above, in an alternative embodiment, OAM may use the proposed NWDAF service, e.g., RLInterpreter. In this case, the OAM action may be to configure the, e.g., optimized, values of the QoS parameters in the PCF, for the PCF to apply them.

Embodiments herein may have a Technical Specification Impact to e.g., 3GPP TS 23.288. This may refer to:

• a new NWDAF service for NWDAF to act as RL Interpreter, to provide state and reward to assist in RL procedures.

• NWDAF support for the calculation of state = NW state, e.g., based on a combination of existing analytics such as slice load, NW performance, OSE and User Data Congestion.

• NWDAF support for the calculation of reward based on comparing the value of the OSE analytic between different iterations of the cycle, e.g., Reward(t) = OSE(t) - OSE(t-1), where the data collected for every OSE(t) may need to be within the timeslot [t, t-1].
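The reward rule in the last bullet may be illustrated with a short sketch. Taking the OSE of a timeslot as the mean of per-sample scores, and treating the timeslot as the half-open interval (t-1, t], are assumptions made here; the function names are hypothetical.

```python
def ose_for_slot(samples, t):
    """Mean score of the (timestamp, score) samples with t-1 < timestamp <= t."""
    window = [score for ts, score in samples if t - 1 < ts <= t]
    return sum(window) / len(window) if window else None


def rewards_from_ose(ose_values):
    """Reward(t) = OSE(t) - OSE(t-1); the first iteration reports the initial OSE."""
    if not ose_values:
        return []
    rewards = [ose_values[0]]
    rewards += [cur - prev for prev, cur in zip(ose_values, ose_values[1:])]
    return rewards


samples = [(0.5, 3.0), (1.5, 4.0), (1.8, 5.0)]
ose_series = [ose_for_slot(samples, 1), ose_for_slot(samples, 2)]
```

With these samples, the OSE of the first slot is 3.0 and that of the second slot is 4.5, so the second iteration yields a reward of 1.5.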

Embodiments herein may have a Technical Specification Impact to e.g., 3GPP TS 23.501. This may refer to the PCF, SMF or OAM acting as RL Agent, which may implement an algorithm for learning with the assistance of the NWDAF, which may provide the state and reward after every new action applied in every iteration of the RL cycle.

The embodiments described above in relation to Figures 4-7 may be understood to correspond to a first group of embodiments corresponding to the first method performed by the first node 111 and/or the second node 112.

As a summarized overview of the foregoing, embodiments herein in a first group of embodiments may be understood to provide a mechanism which may allow the network operator to enhance 5GC NF operations related to policy control and QoS with the assistance of the NWDAF, based on PCF or OAM acting as RL Agent and NWDAF acting as Interpreter.

Embodiments herein in the first group of embodiments may be understood to allow the network operator to enhance 5GC NF operations related to policy control and QoS with the assistance of the NWDAF.

The PCF may thereby select, optimally and quickly, the QoS parameters for a service to fulfill a defined target QoE.

Figure 8 depicts an example of the arrangement that the first node 111 may comprise to perform embodiments herein. The first node 111 may be understood to be for handling the QoE aimed to be achieved. The first node 111 may be configured to operate in the communications system 100. Several embodiments are comprised herein. It should be noted that the examples herein are not mutually exclusive. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 and will thus not be repeated here. For example, in some examples, the communications system 100 may be configured to be a 5G network. In some embodiments, the first node 111 may be configured to be a network function. In some embodiments, the network function may be configured to be one of a PCF, an SMF, and an OAM node. In some embodiments, the second node 112 may be configured to be an NWDAF.

The embodiments herein in the first node 111 may be implemented through one or more processors, such as a processing circuitry 801 in the first node 111 depicted in Figure 8, together with computer program code for performing the functions and actions of the embodiments herein. A processor, as used herein, may be understood to be a hardware component. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the first node 111. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the first node 111.

The first node 111 may further comprise a memory 802 comprising one or more memory units. The memory 802 is arranged to be used to store obtained information, store data, configurations, schedulings, and applications etc. to perform the methods herein when being executed in the first node 111.

In some embodiments, the first node 111 may receive information from, e.g., the second node 112, the third node 113, any of the one or more third nodes 113, such as the one or more first third nodes 114, one or more second third nodes 115, one or more third third nodes 116 and one or more fourth third nodes 117, e.g., a fourth third node 117, the radio network node 140, the one or more devices 130, such as the first subset and the second subset, another node or user equipment, and/or another structure in the communications system 100, through a receiving port 803. In some embodiments, the receiving port 803 may be, for example, connected to one or more antennas in first node 111. In other embodiments, the first node 111 may receive information from another structure in the communications system 100 through the receiving port 803. Since the receiving port 803 may be in communication with the processing circuitry 801 , the receiving port 803 may then send the received information to the processing circuitry 801 . The receiving port 803 may also be configured to receive other information.

The processing circuitry 801 in the first node 111 may be further configured to transmit or send information to e.g., the second node 112, the third node 113, any of the one or more third nodes 113, such as the one or more first third nodes 114, one or more second third nodes 115, one or more third third nodes 116 and one or more fourth third nodes 117, e.g., a fourth third node 117, the radio network node 140, the one or more devices 130, such as the first subset and the second subset, another node or user equipment, and/or another structure in the communications system 100, through a sending port 804, which may be in communication with the processing circuitry 801 , and the memory 802.

Those skilled in the art will also appreciate that the units comprised within the first node 111 described above as being configured to perform different actions, may refer to a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g., stored in memory, that, when executed by the one or more processors such as the processing circuitry 801 , perform as described herein. One or more of these processors, as well as the other digital hardware, may be included in a single Application-Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC).

Thus, the methods according to the embodiments described herein for the first node 111 may be respectively implemented by means of a computer program 805 product, comprising instructions, i.e., software code portions, which, when executed on at least one processing circuitry 801, cause the at least one processing circuitry 801 to carry out the actions described herein, as performed by the first node 111. The computer program 805 product may be stored on a computer-readable storage medium 806. The computer-readable storage medium 806, having stored thereon the computer program 805, may comprise instructions which, when executed on at least one processing circuitry 801, cause the at least one processing circuitry 801 to carry out the actions described herein, as performed by the first node 111. In some embodiments, the computer-readable storage medium 806 may be a non-transitory computer-readable storage medium, such as a CD ROM disc, or a memory stick. In other embodiments, the computer program 805 product may be stored on a carrier containing the computer program 805 just described, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 806, as described above.

The first node 111 may comprise a communication interface configured to facilitate, or an interface unit to facilitate, communications between the first node 111 and other nodes or devices, e.g., the second node 112, the third node 113, any of the one or more third nodes 113, such as the one or more first third nodes 114, one or more second third nodes 115, one or more third third nodes 116 and one or more fourth third nodes 117, e.g., a fourth third node 117, the radio network node 140, the one or more devices 130, such as the first subset and the second subset, another node or user equipment, and/or another structure in the communications system 100. The interface may, for example, include a transceiver configured to transmit and receive radio signals over an air interface in accordance with a suitable standard.

In other embodiments, the first node 111 may comprise a radio circuitry 807, which may comprise e.g., the receiving port 803 and the sending port 804.

The radio circuitry 807 may be configured to set up and maintain at least a wireless connection with the second node 112, the third node 113, any of the one or more third nodes 113, such as the one or more first third nodes 114, one or more second third nodes 115, one or more third third nodes 116 and one or more fourth third nodes 117, e.g., a fourth third node 117, the radio network node 140, the one or more devices 130, such as the first subset and the second subset, another node or user equipment, and/or another structure in the communications system 100. Circuitry may be understood herein as a hardware component.

Hence, embodiments herein also relate to the first node 111 operative to operate in the communications system 100. The first node 111 may comprise the processing circuitry 801 and the memory 802, said memory 802 containing instructions executable by said processing circuitry 801 , whereby the first node 111 is further operative to perform the actions described herein in relation to the first node 111.

Figure 9 depicts an example of the arrangement that the second node 112 may comprise to perform the embodiments herein. The second node 112 may be understood to be for handling the QoE aimed to be achieved. The second node 112 may be configured to operate in the communications system 100.

Several embodiments are comprised herein. It should be noted that the examples herein are not mutually exclusive. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the second node 112 and will thus not be repeated here. For example, in some examples, the communications system 100 may be configured to be a 5G network. In some embodiments, the first node 111 may be configured to be a network function. In some embodiments, the network function may be configured to be one of a PCF, an SMF, and an OAM node. In some embodiments, the second node 112 may be configured to be an NWDAF.

The embodiments herein in the second node 112 may be implemented through one or more processors, such as a processing circuitry 901 in the second node 112 depicted in Figure 9, together with computer program code for performing the functions and actions of the embodiments herein. A processor, as used herein, may be understood to be a hardware component. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the second node 112. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the second node 112.

The second node 112 may further comprise a memory 902 comprising one or more memory units. The memory 902 is arranged to be used to store obtained information, store data, configurations, schedulings, and applications etc. to perform the methods herein when being executed in the second node 112.

In some embodiments, the second node 112 may receive information from, e.g., the first node 111 , the third node 113, any of the one or more third nodes 113, such as the one or more first third nodes 114, one or more second third nodes 115, one or more third third nodes 116 and one or more fourth third nodes 117, e.g., a fourth third node 117, the radio network node 140, the one or more devices 130, such as the first subset and the second subset, another node or user equipment, and/or another structure in the communications system 100, through a receiving port 903. In some embodiments, the receiving port 903 may be, for example, connected to one or more antennas in second node 112. In other embodiments, the second node 112 may receive information from another structure in the communications system 100 through the receiving port 903. Since the receiving port 903 may be in communication with the processing circuitry 901 , the receiving port 903 may then send the received information to the processing circuitry 901 . The receiving port 903 may also be configured to receive other information.

The processing circuitry 901 in the second node 112 may be further configured to transmit or send information to e.g., the first node 111 , the third node 113, any of the one or more third nodes 113, such as the one or more first third nodes 114, one or more second third nodes 115, one or more third third nodes 116 and one or more fourth third nodes 117, e.g., a fourth third node 117, the radio network node 140, the one or more devices 130, such as the first subset and the second subset, another node or user equipment, and/or another structure in the communications system 100, through a sending port 904, which may be in communication with the processing circuitry 901 , and the memory 902.

Those skilled in the art will also appreciate that the units comprised within the second node 112 described above as being configured to perform different actions, may refer to a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g., stored in memory, that, when executed by the one or more processors such as the processing circuitry 901 , perform as described above. One or more of these processors, as well as the other digital hardware, may be included in a single Application-Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC).

Thus, the methods according to the embodiments described herein for the second node 112 may be respectively implemented by means of a computer program 905 product, comprising instructions, i.e., software code portions, which, when executed on at least one processing circuitry 901, cause the at least one processing circuitry 901 to carry out the actions described herein, as performed by the second node 112. The computer program 905 product may be stored on a computer-readable storage medium 906. The computer-readable storage medium 906, having stored thereon the computer program 905, may comprise instructions which, when executed on at least one processing circuitry 901, cause the at least one processing circuitry 901 to carry out the actions described herein, as performed by the second node 112. In some embodiments, the computer-readable storage medium 906 may be a non-transitory computer-readable storage medium, such as a CD ROM disc, or a memory stick. In other embodiments, the computer program 905 product may be stored on a carrier containing the computer program 905 just described, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 906, as described above.

The second node 112 may comprise a communication interface configured to facilitate, or an interface unit to facilitate, communications between the second node 112 and other nodes or devices, e.g., the first node 111 , the third node 113, any of the one or more third nodes 113, such as the one or more first third nodes 114, one or more second third nodes 115, one or more third third nodes 116 and one or more fourth third nodes 117, e.g., a fourth third node 117, the radio network node 140, the one or more devices 130, such as the first subset and the second subset, another node or user equipment, and/or another structure in the communications system 100. The interface may, for example, include a transceiver configured to transmit and receive radio signals over an air interface in accordance with a suitable standard.

In other embodiments, the second node 112 may comprise a radio circuitry 907, which may comprise e.g., the receiving port 903 and the sending port 904.

The radio circuitry 907 may be configured to set up and maintain at least a wireless connection with the first node 111 , the third node 113, any of the one or more third nodes 113, such as the one or more first third nodes 114, one or more second third nodes 115, one or more third third nodes 116 and one or more fourth third nodes 117, e.g., a fourth third node 117, the radio network node 140, the one or more devices 130, such as the first subset and the second subset, another node or user equipment, and/or another structure in the communications system 100. Circuitry may be understood herein as a hardware component.

Hence, embodiments herein also relate to the second node 112, operative to operate in the communications system 100. The second node 112 may comprise the processing circuitry 901 and the memory 902, said memory 902 containing instructions executable by said processing circuitry 901 , whereby the second node 112 is further operative to perform the actions described herein in relation to the second node 112.

The embodiments described above in relation to Figures 4-7 may be understood to correspond to a first group of embodiments corresponding to a first method performed by the first node 111 and/or the second node 112.

In the first group of embodiments, the arrangement that the first node 111 may comprise may be to perform the method described in Figure 4, Figure 6 and/or Figure 7.

In the first group of embodiments, the arrangement that the first node 111 may comprise may be further operative to perform the actions described herein in relation to the first node 111 , e.g., in Figure 4, Figure 6 and/or Figure 7.

The first node 111 may be configured to perform any of the Actions described in relation to Figure 4, Figure 6 and/or Figure 7, e.g., by means of the processing circuitry 801 within the first node 111 , configured to perform any of such actions.

Also, in some embodiments, different units comprised within the first node 111 may be configured to perform the different actions described herein, implemented as one or more applications running on one or more processors such as the processing circuitry 801.

In the first group of embodiments, the arrangement that the second node 112 may comprise may be to perform the method described in Figure 5, Figure 6 and/or Figure 7.

In the first group of embodiments, the arrangement that the second node 112 may comprise may be further operative to perform the actions described herein in relation to the second node 112, e.g., in Figure 5, Figure 6 and/or Figure 7.

The second node 112 may be configured to perform any of the Actions described in relation to Figure 5, Figure 6 and/or Figure 7, e.g., by means of the processing circuitry 901 within the second node 112, configured to perform any of such actions.

Also, in some embodiments, different units comprised within the second node 112 may be configured to perform different actions described herein, implemented as one or more applications running on one or more processors such as the processing circuitry 901.

Second group of embodiments

Some embodiments corresponding to a second group of embodiments herein will now be further described with some non-limiting examples, which may be combined with the embodiments just described.

In a second group of embodiments, the arrangement that the first node 111 may comprise may be to perform the method described in Figure 10 and/or Figure 12.

In the second group of embodiments, the arrangement that the first node 111 may comprise may be further operative to perform the actions described herein in relation to the first node 111 , e.g., in Figure 10 and/or Figure 12.

The first node 111 may be configured to perform any of the Actions described in relation to Figure 10 and/or Figure 12, e.g., by means of the processing circuitry 801 within the first node 111 , configured to perform any of such actions.

In the second group of embodiments, the arrangement that the second node 112 may comprise may be to perform the method described in Figure 11 and/or Figure 12.

In the second group of embodiments, the arrangement that the second node 112 may comprise may be further operative to perform the actions described herein in relation to the second node 112, e.g., in Figure 11 and/or Figure 12.

The second node 112 may be configured to perform any of the Actions described in relation to Figure 11 and/or Figure 12, e.g., by means of the processing circuitry 901 within the second node 112, configured to perform any of such actions.

Also, in some embodiments, different units comprised within the second node 112 may be configured to perform different actions described herein, implemented as one or more applications running on one or more processors such as the processing circuitry 901.

As part of the development of embodiments herein in the second group of embodiments, one or more problems with the existing technology will first be identified and discussed.

It has been agreed to study NWDAF-assisted policy control, specifically: a) identification of use cases where policy control and QoS may be further enhanced with assistance from NWDAF, b) whether and how to introduce new 5GC functionality e.g. of the NWDAF or PCF, to enhance the policy control and QoS, considering operator’s policies, c) whether and what additional input information may be needed by the NWDAF to provide assistance to policy control and QoS, and how to gather it, d) whether and what enhanced output information on top of already provided the NWDAF may provide to assist with policy control and QoS enhancements, and e) whether and how to evaluate the quality of NWDAF assistance to policy control and QoS.

In addition, a further problem may be how to obtain, in an optimal way, the most appropriate values of the QoS parameters for one or more services in order to obtain a wanted QoE for these services for the different states of the NW.

Embodiments herein in the second group of embodiments may be understood to address the problems identified with the existing methods for the second group of embodiments.

In the following description, any reference to a/the PCF, or simply a/the “PCF”, may be understood to equally refer to the first node 111; any reference to a/the NWDAF may be understood to equally refer to the second node 112; any reference to a/the “network” or simply a/the “NW” may be understood to equally refer to the communications system 100.

Embodiments herein may relate to an NWDAF analytic to assist the PCF in QoS decisions. More particularly, embodiments herein may be understood to relate to a mechanism which may address the above problems and may be based on the definition of a new NWDAF analytics Id which may provide the predicted optimal values of QoS parameters for a service to reach a target QoE for this service. In the proposed approach, the PCF may subscribe to this new analytics Id and use the output to set the QoS of a service. The new analytics Id may be defined as: input: list of (appId, Target QoE), (optional) QoS action/s space, TargetUEs, [location], [S-NSSAI/DNN]; output: recommended QoS action/s (to achieve the Target QoE).

To obtain the new analytics report, the NWDAF may need to collect data from the PDU sessions established by the TargetUEs, optionally in a certain location and for a specific S-NSSAI/DNN, about the QoS parameters applied for the QoS flows associated to the service (QoS). In addition, the NWDAF may need to retrieve the state of the NW, optionally for the slice and location provided, and finally calculate the QoE for that service for the target UEs, optionally in the provided slice and location, by calculating an Observed Service Experience (OSE) analytics report. The NWDAF may then create a labelled dataset at least with [QoE, state, QoS] and train an ML model which may estimate the QoS parameters for a service with a given QoE (target QoE) and state.
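As an illustration of the labelled dataset [QoE, state, QoS] and of a model estimating the QoS for a target QoE and state, the following sketch uses a nearest-neighbour lookup as a deliberately simple stand-in for the ML model, which the disclosure does not specify. The scalar NW state, the QoS profile labels and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    qoe: float    # QoE observed for the service, e.g. derived from OSE
    state: float  # assumed scalar NW state, e.g. a load ratio in [0, 1]
    qos: str      # hypothetical label of the QoS parameters that were applied


class NearestNeighbourQosModel:
    """Stand-in model: returns the QoS label of the closest (QoE, state) sample."""

    def fit(self, samples):
        self.samples = list(samples)
        return self

    def predict(self, target_qoe, state):
        # Features are not rescaled here; a real model would likely normalise them.
        def distance(sample):
            return (sample.qoe - target_qoe) ** 2 + (sample.state - state) ** 2

        return min(self.samples, key=distance).qos


dataset = [
    Sample(qoe=4.5, state=0.2, qos="qos_profile_A"),
    Sample(qoe=4.5, state=0.9, qos="qos_profile_B"),
    Sample(qoe=3.0, state=0.9, qos="qos_profile_A"),
]
model = NearestNeighbourQosModel().fit(dataset)
recommended = model.predict(target_qoe=4.4, state=0.8)
```

In this toy dataset, reaching a QoE near 4.5 under high load was only observed with "qos_profile_B", so that profile is the one returned for the query above.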

The embodiments in the second group of embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which examples are shown. In this section, embodiments herein are illustrated by exemplary embodiments. It should be noted that these embodiments are not mutually exclusive. Components from one embodiment or example may be tacitly assumed to be present in another embodiment or example and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. All possible combinations are not described to simplify the description.

Embodiments of a second computer-implemented method, performed by the first node 111 , will now be described with reference to the flowchart depicted in Figure 10. The second method may be understood to be for handling a Quality of Experience (QoE) aimed to be achieved. The first node 111 may operate in the communications system 100.

In some embodiments, the communications system 100 may be a 5G network.

In some embodiments, the first node 111 may be a network function.

Several embodiments are comprised herein. In some embodiments, all the actions may be performed. In some embodiments, one or more of the actions may be performed. It should be noted that the examples herein are not mutually exclusive. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. A non-limiting example of the second method performed by the first node 111 is depicted in Figure 10. In a particular non-limiting example of the second method, Action 1001 and Action 1002 may be performed.

In Figure 10, optional actions are represented with dashed lines.

Action 1001

In this Action 1001 , the first node 111 may send an indication. The first node 111 may send the indication to the second node 112 operating in the communications system 100.

The indication may indicate a request to subscribe, or a query, to receive.

The request to subscribe, or the query, to receive, may be from the second node 112.

The second node 112 may be an NWDAF.

The indication may indicate the request to subscribe, or the query, to receive a recommendation. The recommendation may be for an action to be triggered by the first node 111 on an environment of the communications system 100, based on the QoE aimed to be achieved, e.g., for the one or more services.

The QoE aimed to be achieved may be for at least one of: i. one or more services, and ii. one or more slices. The at least one of the one or more services and the one or more slices may be for data communication to be provided to a first subset of the one or more devices 140, or at least a second subset of the one or more devices 140.

The indication may comprise at least one of: a. a first identifier of an analytic requested from the second node 112 to receive the recommendation, b. one or more identifiers of one or more respective applications the request may apply to, c. a further indication indicating the QoE aimed to be achieved, d. a set of actions the recommended action may be to be selected from, e. one or more filters the request may apply to, f. the one or more filters comprising at least one of: i. one or more devices 140 the request may apply to, ii. one or more slices the request may apply to, iii. one or more data networks the request may apply to, and iv. one or more areas the request may apply to.

Action 1002

In this Action 1002, the first node 111 may receive another indication.

The first node 111 may receive the another indication from the second node 112.

The receiving of the another indication may be responsive to the sent indication in Action 1001.

The another indication may indicate the recommended action.

In some embodiments, at least one of the following may apply: a. the another indication may further indicate the first identifier, b. the indication may be an NWDAF_AnalyticsSubscription_Subscribe request or a query, and c. the another indication may be an NWDAF_AnalyticsSubscription_Notify request or a response.

In some embodiments, the recommended action may be at least one of: a. updating one or more policies, b. using one or more QoS parameters, and c. to be applied to at least one of one or more services and one or more slices for data communication to be provided to one or more devices 140 operating in the communications system 100.

Action 1003

In this Action 1003, the first node 111 may initiate performing the recommended action on the environment.

Initiating performing may be understood as triggering, enabling, facilitating, the performing by another node, or starting the performing itself.

In some embodiments, the initiating, in this Action 1003, performing of the recommended action may comprise triggering performance of the recommended action, e.g., on at least one of: the one or more services, and the one or more devices 140.
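As a non-limiting illustration, Actions 1001 to 1003 may be sketched as follows. The class names, field names and the callables `second_node` and `enforce` are hypothetical and are not defined by the embodiments herein. The check that the another indication echoes the first identifier is likewise an assumption based on the optional behaviour described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Indication:
    """Indication sent in Action 1001 (field names are illustrative)."""
    analytic_id: str
    app_ids: List[str]
    target_qoe: float
    action_set: List[str] = field(default_factory=list)


@dataclass
class AnotherIndication:
    """Another indication received in Action 1002 (illustrative)."""
    analytic_id: str
    recommended_action: str


def run_first_node(
    indication: Indication,
    second_node: Callable[[Indication], AnotherIndication],
    enforce: Callable[[str], None],
) -> str:
    # Actions 1001 and 1002: send the indication, receive the response.
    response = second_node(indication)
    # Assumption: the response echoes the first identifier.
    if response.analytic_id != indication.analytic_id:
        raise ValueError("unexpected analytic identifier in response")
    # If a set of actions was offered, the recommendation must be in it.
    if indication.action_set and response.recommended_action not in indication.action_set:
        raise ValueError("recommended action is outside the offered set")
    # Action 1003: initiate performing the recommended action.
    enforce(response.recommended_action)
    return response.recommended_action


def stub_second_node(ind: Indication) -> AnotherIndication:
    # Hypothetical stand-in for the second node 112: picks the first action.
    return AnotherIndication(ind.analytic_id, ind.action_set[0])


applied: List[str] = []
result = run_first_node(
    Indication("QoSAction", ["example.com"], 4.0, ["QFI-1", "QFI-2"]),
    stub_second_node,
    applied.append,
)
```

In this sketch, the enforcement step is passed in as a callable, so that the triggering of another node and the performing by the first node 111 itself may both be represented.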

Embodiments of a second computer-implemented method performed by the second node 112, will now be described with reference to the flowchart depicted in Figure 11. The method may be understood to be for handling the QoE aimed to be achieved. The second node 112 may operate in the communications system 100.

The second method may comprise the following actions. Several embodiments are comprised herein. In some embodiments, the second method may comprise all the actions. In other embodiments, the second method may comprise one or more actions. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. It should be noted that the examples herein are not mutually exclusive. Components from one example may be tacitly assumed to be present in another example and it will be obvious to a person skilled in the art how those components may be used in the other examples. In Figure 11, optional actions are depicted with dashed lines. In particular embodiments, Action 1101 and Action 1104 may be performed.

The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 for the second method and will thus not be repeated here to simplify the description. For example, in some examples, the communications system 100 may be a 5G network. In some embodiments, the first node 111 may be a network function. In some embodiments, the second node 112 may be an NWDAF. In some embodiments, the network function may be one of a PCF, an SMF, and an OAM node.

Action 1101

In this Action 1101, the second node 112 may receive the indication.

The receiving in this Action 1101 may be from the first node 111 operating in the communications system 100.

The indication may indicate the request to subscribe, or the query, to receive.

The request to subscribe, or the query, to receive, may be from the second node 112. The indication may indicate the request to subscribe, or the query, to receive the recommendation. The recommendation may be for the action to be triggered by the first node 111 on the environment of the communications system 100, based on the QoE aimed to be achieved, e.g., for the one or more services.

The QoE aimed to be achieved may be for at least one of: iii. the one or more services, and iv. the one or more slices.

The at least one of the one or more services and the one or more slices may be for data communication to be provided to the first subset of the one or more devices 140, or at least the second subset of the one or more devices 140.

The indication may comprise at least one of: a. the first identifier of the analytic requested from the second node 112 to receive the recommendation, b. the one or more identifiers of the one or more respective applications the request may apply to, c. the further indication indicating the QoE aimed to be achieved, d. the set of actions the recommended action may be to be selected from, e. the one or more filters the request may apply to, f. the one or more filters may comprise at least one of: i. the one or more devices 140 the request may apply to, ii. the one or more slices the request may apply to, iii. the one or more data networks the request may apply to, and iv. the one or more areas the request may apply to.

Action 1102

In this Action 1102, the second node 112 may collect respective data. The collecting of the respective data in this Action 1102 may be from the one or more third nodes 113 operating in the communications system 100.

The collecting in this Action 1102 may be responsive to the received indication in Action 1101.

The respective data may comprise at least one of: a. first respective data; the first respective data may be collected from the one or more first third nodes 114; the first respective data may be indicative of QoS parameters; the QoS parameters may be applied for QoS flows, e.g., associated to an application the indication may relate to, b. second respective data; the second respective data may be collected from the one or more second third nodes 115; the second respective data may be indicative of a state of the communications system 100, c. third respective data; the third respective data may be collected from the one or more third third nodes 116; the third respective data may indicate an Observed Service Experience, OSE, analytics, and d. fourth respective data; the fourth respective data may be collected from the fourth third node 117; the fourth respective data may indicate a first set of actions the recommended action may be to be selected from.

In some embodiments, at least one of the following may apply: a. the one or more third nodes 114, 115, 116 may comprise a first NF, b. one of the one or more first third nodes 114 may be an SMF, c. the one or more second third nodes 115 may comprise at least one of a UPF, a second network function, NF, other than the UPF, and the radio network node 130, d. the one or more third third nodes 116 may comprise at least one of an AF, an AMF, an SMF, a UPF, and an OAM, and e. the fourth third node 117 may be an OAM.
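As a non-limiting illustration, the collecting of Action 1102 may be sketched as a set of collectors grouped by data category. The category names, the collector callables and all returned values are hypothetical placeholders, not data formats defined by the embodiments herein.

```python
from typing import Callable, Dict, List

# Hypothetical collector: a callable returning one record from a third node.
Collector = Callable[[], dict]


def collect_respective_data(
    collectors: Dict[str, List[Collector]]
) -> Dict[str, List[dict]]:
    """Run every collector and group the records by data category."""
    collected: Dict[str, List[dict]] = {}
    for category, sources in collectors.items():
        collected[category] = [source() for source in sources]
    return collected


# Illustrative wiring; the values are made up for the example.
collectors = {
    # first respective data, e.g., from an SMF
    "qos_parameters": [lambda: {"source": "SMF", "qfi": 5}],
    # second respective data, e.g., from a UPF and a radio network node
    "network_state": [
        lambda: {"source": "UPF", "load": 0.4},
        lambda: {"source": "RAN", "load": 0.7},
    ],
    # third respective data, e.g., OSE analytics input from an AF
    "ose_analytics": [lambda: {"source": "AF", "qoe": 3.8}],
    # fourth respective data, e.g., the set of actions from an OAM
    "action_space": [lambda: {"source": "OAM", "actions": ["QFI-1", "QFI-2"]}],
}
data = collect_respective_data(collectors)
```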

Action 1103

In this Action 1103, the second node 112 may determine the recommended action. The determining of the recommended action in this Action 1103 may be based on the received indication in Action 1101 and/or the collected respective data in Action 1102.

Determining may be understood as calculating, estimating, deriving or similar.

Action 1104

In this Action 1104, the second node 112 may send the another indication. The sending in this Action 1104 may be to the first node 111.

The sending in this Action 1104 may be responsive to the received indication in Action 1101.

The another indication may indicate the recommended action.

In some embodiments, at least one of the following may apply: a. the another indication may further indicate the first identifier, b. the indication may be the NWDAF_AnalyticsSubscription_Subscribe request or the query, and c. the another indication may be the NWDAF_AnalyticsSubscription_Notify request or the response.

In some embodiments, the recommended action may be at least one of: a. updating the one or more policies, b. using the one or more QoS parameters, c. to be applied to at least one of the one or more services and the one or more slices for data communication to be provided to the one or more devices 140 operating in the communications system 100.

In the following description, any reference to a/the PCF, simply a/the “PCF”, and/or a/the “Consumer” may be understood to equally refer to the first node 111; any reference to a/the NWDAF may be understood to equally refer to the second node 112; any reference to a/the “network” or simply a/the “NW” may be understood to equally refer to the communications system 100; any reference to a/the “UE”, “TargetUEs”, “anyUE” and/or “UEs” may be understood to equally refer to the one or more devices 140; any reference to a/the “groupOfUEs” may be understood to equally refer to the first subset of devices, and/or the second subset of devices.

Figure 12 shows a sequence diagram describing a non-limiting example of the proposed approach based on the new NWDAF analytic. Steps are detailed below:

Steps 1 and 2) A consumer, e.g., PCF, may subscribe to NWDAF (New) analytic (Analytic-ID=QoSAction) by triggering a Nnwdaf_AnalyticsSubscription_Request message including the following parameters:

• Analytic-ID= QoSAction

• List of:

• appId, e.g., example.com. This may indicate which application the request may apply to.

• TargetQoE. This may indicate the requested target QoE for the application.

• (optional) QoSAction/s space. This may indicate the QoS action/s space, e.g., the list of possible QoS actions or QoS profiles, e.g., PCF may include a list of QFIs, for NWDAF to select one of them or provide an ordered list, this as analytic output.

• TargetUEs (anyUE, groupOfUEs). This may indicate which UEs the request may apply to.

• Analytic-Filter (S-NSSAI, DNN, Area). This may indicate which S-NSSAI, DNN and/or Area the request may apply to.

Step 3) NWDAF may answer the request message in Step 2 with a successful response (accepting the request).
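As a non-limiting illustration, the parameters of the request of Steps 1 and 2 may be assembled as follows. The function name and the key names, e.g., `analyticId` and `qosActionSpace`, are hypothetical and do not reproduce any 3GPP-defined schema; only the parameter list itself is taken from the description above.

```python
import json
from typing import List, Optional


def build_subscription_request(
    app_id: str,
    target_qoe: float,
    target_ues: str,
    qos_action_space: Optional[List[int]] = None,
    s_nssai: Optional[str] = None,
    dnn: Optional[str] = None,
    area: Optional[str] = None,
) -> dict:
    """Assemble the request parameters (key names are assumptions)."""
    request = {
        "analyticId": "QoSAction",
        "applications": [{"appId": app_id, "targetQoE": target_qoe}],
        "targetUEs": target_ues,
    }
    # Optional QoS action space, e.g., a list of candidate QFIs.
    if qos_action_space:
        request["qosActionSpace"] = list(qos_action_space)
    # Only the filter entries that were actually provided are included.
    analytic_filter = {
        key: value
        for key, value in (("sNssai", s_nssai), ("dnn", dnn), ("area", area))
        if value is not None
    }
    if analytic_filter:
        request["analyticFilter"] = analytic_filter
    return request


request = build_subscription_request(
    "example.com", 4.0, "anyUE", qos_action_space=[1, 5, 9], dnn="internet"
)
print(json.dumps(request, indent=2))
```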

Step 4) NWDAF may trigger data collection from SMF, e.g., for the PDU sessions established by the TargetUEs, optionally in certain location and for a specific S-NSSAI/DNN, about the QoS parameters applied for the QoS flows associated to the service (QoS). In case the QoS parameter may be the QFI, the existing procedure in Table 6.4.2-2 may be reused, specifically to retrieve the QFI value from SMF, shown in bold in table below:

Table 6.4.2-2: QoS flow level Network Data from 5GC NF related to the QoS profile assigned for a particular service (identified by an Application Id or IP filter information)

Step 5) NWDAF may trigger data collection from 5GC NFs and RAN to obtain the state of the NW, optionally for slice, DNN and location provided. This may be calculated by using a combination of existing analytics such as Slice load, NW Performance, User Data Congestion, etc.

Step 6) NWDAF may trigger data collection from 5GC NFs and AF relative to existing OSE analytics.

Step 7) (Optional, present in case it was not provided in Step 2 above) NWDAF may trigger data collection from OAM to retrieve the QoS Action/s space, e.g., the list of possible QoS actions (QFI, GBR, MBR, etc.) or QoS profiles which may be selected.

Step 8) NWDAF, based on the data collected in Steps 4 to 7 above, may run analytic processes, specifically:

• NWDAF may generate a labelled dataset at least with [QoE, state, QoS] and train an ML model which may estimate the QoS for a service from the QoE (target QoE) and state. In order to get diversity of data, e.g., samples in the dataset with many different values of QoS, it may be possible to configure the PCF to provide random combinations of QoS in a controlled environment (lab) or train the model in a production environment but with data from friendly users using the service (no problem for those UEs if getting poor QoE for the service).

• Based on the above, NWDAF may derive the recommended QoS action/s and generate the analytic result, including:

• recommendedQoSAction(s)
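As a non-limiting illustration of Step 8, the estimation of the QoS from the target QoE and the state may be sketched with a nearest-neighbour lookup over the labelled dataset. The nearest-neighbour technique is a stand-in chosen for brevity; the description does not specify the ML model. The representation of the state as a single load value, the use of the QFI as the QoS label, the absence of feature scaling and all sample values are assumptions.

```python
from typing import List, Tuple

# One labelled sample: (observed QoE, network load as state, QFI applied).
Sample = Tuple[float, float, int]


def recommend_qos(dataset: List[Sample], target_qoe: float, state: float) -> int:
    """Return the QoS label of the sample closest to (target_qoe, state)."""

    def squared_distance(sample: Sample) -> float:
        qoe, load, _ = sample
        return (qoe - target_qoe) ** 2 + (load - state) ** 2

    return min(dataset, key=squared_distance)[2]


# Made-up samples: under high load, only QFI 1 reached a high QoE.
dataset: List[Sample] = [
    (2.5, 0.8, 9),
    (3.5, 0.8, 5),
    (4.5, 0.8, 1),
    (4.5, 0.2, 5),
]
recommended = recommend_qos(dataset, target_qoe=4.4, state=0.75)
```

In this sketch, a target QoE of 4.4 under a load of 0.75 is closest to the third sample, so QFI 1 is recommended.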

Step 9) NWDAF may notify the PCF by triggering a Nnwdaf_AnalyticsSubscription_Notify request message including the following parameters:

• Analytic-ID= QoSAction

• AnalyticResult. This may include the following information (from Step 8 above):

• recommendedQoSAction(s)

Step 10) Consumer may answer the message in Step 9 with a successful response.

Step 11) Based on the Analytic-Result, PCF may apply the recommendedQoSAction(s), e.g., by updating the PCC rules accordingly.
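As a non-limiting illustration, the handling of the notification in Steps 9 to 11 may be sketched as follows. The key names, the in-memory representation of the PCC rules and the choice of the first entry of the list are assumptions; the latter relies on the NWDAF having provided an ordered list, as mentioned for the QoS action/s space above.

```python
from typing import Dict


def handle_notification(
    notification: dict, pcc_rules: Dict[str, dict], app_id: str
) -> bool:
    """Apply the recommended QoS action to the PCC rule of app_id.

    Returns True if a rule was updated, False otherwise.
    """
    # Ignore notifications for analytics that were not subscribed to.
    if notification.get("analyticId") != "QoSAction":
        return False
    result = notification.get("analyticResult", {})
    actions = result.get("recommendedQoSActions", [])
    if not actions:
        return False
    # Assumption: the list is ordered, best recommendation first.
    pcc_rules.setdefault(app_id, {})["qfi"] = actions[0]
    return True


rules = {"example.com": {"qfi": 9}}
updated = handle_notification(
    {
        "analyticId": "QoSAction",
        "analyticResult": {"recommendedQoSActions": [5, 1]},
    },
    rules,
    "example.com",
)
```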

Embodiments herein in the second group of embodiments may have a Technical Specification Impact in 3GPP TS 23.288:

• (New) NWDAF analytic:

• Input: list of (appId, Target QoE), (optional) QoS action/s space, TargetUEs, [location], [S-NSSAI/DNN]

• Output: Recommended QoS action/s (to achieve the Target QoE).

As a summarized overview of the foregoing, embodiments herein in the second group of embodiments may be understood to relate to defining a new Analytics Id with input and output data that may be delivered with a service consumer.

A machinery used by NWDAF to generate a labelled dataset at least with [QoE, state, QoS] and to use the dataset to train an ML model to predict a QoS parameter given the service, the QoE for that service and the state of the network.

Generation of recommended actions by NWDAF to a service consumer to reach the intended QoS parameters.

Retrieval of QoS related actions from OAM, by NWDAF which may be triggered for QoS optimizations. Such information may add knowledge about the actions in NWDAF which does not exist in legacy definitions of NWDAF.

Certain embodiments in the second group of embodiments disclosed herein may provide one or more of the following technical advantage(s), which may be summarized as follows.

Embodiments herein in the second group of embodiments may be understood to allow the network operator to enhance 5GC NF operations related to policy control and QoS with the assistance of the NWDAF.

Another advantage may be that PCF may adjust QoS parameters more efficiently using the data received from NWDAF.

A further advantage may be less signaling and efficient communication in the network, since NWDAF may reuse already collected data to assist PCF, while PCF may not need to collect complementary information itself.

Figure 13 is a block diagram illustrating a virtualization environment 1300 in which functions implemented by some embodiments may be virtualized. In the present context, virtualizing means creating virtual versions of apparatuses or devices which may include virtualizing hardware platforms, storage devices and networking resources. As used herein, virtualization can be applied to any device described herein, or components thereof, and relates to an implementation in which at least a portion of the functionality is implemented as one or more virtual components. Some or all of the functions described herein may be implemented as virtual components executed by one or more virtual machines (VMs) implemented in one or more virtual environments 1300 hosted by one or more of hardware nodes, such as a hardware computing device that operates as a network node, UE, core network node, or host. Further, in embodiments in which the virtual node does not require radio connectivity (e.g., a core network node or host), then the node may be entirely virtualized. In some embodiments, the virtualization environment 1300 includes components defined by the O-RAN Alliance, such as an O-Cloud environment orchestrated by a Service Management and Orchestration Framework via an O2 interface.

Applications 1302 (which may alternatively be called software instances, virtual appliances, network functions, virtual nodes, virtual network functions, etc.) are run in the virtualization environment 1300 to implement some of the features, functions, and/or benefits of some of the embodiments disclosed herein.

Hardware 1304 includes processing circuitry, memory that stores software and/or instructions executable by hardware processing circuitry, and/or other hardware devices as described herein, such as a network interface, input/output interface, and so forth. Software may be executed by the processing circuitry to instantiate one or more virtualization layers 1306 (also referred to as hypervisors or virtual machine monitors (VMMs)), provide VMs 1308a and 1308b (one or more of which may be generally referred to as VMs 1308), and/or perform any of the functions, features and/or benefits described in relation with some embodiments described herein. The virtualization layer 1306 may present a virtual operating platform that appears like networking hardware to the VMs 1308.

The VMs 1308 comprise virtual processing, virtual memory, virtual networking or interface and virtual storage, and may be run by a corresponding virtualization layer 1306. Different embodiments of the instance of a virtual appliance 1302 may be implemented on one or more of VMs 1308, and the implementations may be made in different ways. Virtualization of the hardware is in some contexts referred to as network function virtualization (NFV). NFV may be used to consolidate many network equipment types onto industry standard high volume server hardware, physical switches, and physical storage, which can be located in data centers, and customer premise equipment.

In the context of NFV, a VM 1308 may be a software implementation of a physical machine that runs programs as if they were executing on a physical, non-virtualized machine. Each of the VMs 1308, and that part of hardware 1304 that executes that VM, be it hardware dedicated to that VM and/or hardware shared by that VM with others of the VMs, forms separate virtual network elements. Still in the context of NFV, a virtual network function is responsible for handling specific network functions that run in one or more VMs 1308 on top of the hardware 1304 and corresponds to the application 1302.

Hardware 1304 may be implemented in a standalone network node with generic or specific components. Hardware 1304 may implement some functions via virtualization. Alternatively, hardware 1304 may be part of a larger cluster of hardware (e.g. such as in a data center or CPE) where many hardware nodes work together and are managed via management and orchestration 1310, which, among others, oversees lifecycle management of applications 1302. In some embodiments, hardware 1304 is coupled to one or more radio units that each include one or more transmitters and one or more receivers that may be coupled to one or more antennas. Radio units may communicate directly with other hardware nodes via one or more appropriate network interfaces and may be used in combination with the virtual components to provide a virtual node with radio capabilities, such as a radio access node or a base station. In some embodiments, some signaling can be provided with the use of a control system 1312 which may alternatively be used for communication between hardware nodes and radio units.

General

When using the word “comprise” or “comprising”, it shall be interpreted as non-limiting, i.e., meaning “consist at least of”.

The embodiments herein are not limited to the above-described preferred embodiments. Various alternatives, modifications and equivalents may be used. Therefore, the above embodiments should not be taken as limiting the scope of the invention.

Generally, all terms used herein are to be interpreted according to their ordinary meaning in the relevant technical field, unless a different meaning is clearly given and/or is implied from the context in which it is used. All references to a/an/the element, apparatus, component, means, step, etc. are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any methods disclosed herein do not have to be performed in the exact order disclosed, unless a step is explicitly described as following or preceding another step and/or where it is implicit that a step must follow or precede another step. Any feature of any of the embodiments disclosed herein may be applied to any other embodiment, wherever appropriate. Likewise, any advantage of any of the embodiments may apply to any other embodiments, and vice versa. Other objectives, features and advantages of the enclosed embodiments will be apparent from the following description.

As used herein, the expression “at least one of:” followed by a list of alternatives separated by commas, and wherein the last alternative is preceded by the “and” term, may be understood to mean that only one of the list of alternatives may apply, more than one of the list of alternatives may apply or all of the list of alternatives may apply. This expression may be understood to be equivalent to the expression “at least one of:” followed by a list of alternatives separated by commas, and wherein the last alternative is preceded by the “or” term.

Any of the terms processor and circuitry may be understood herein as a hardware component.

As used herein, the expression “in some embodiments” has been used to indicate that the features of the embodiment described may be combined with any other embodiment or example disclosed herein.

As used herein, the expression “in some examples” has been used to indicate that the features of the example described may be combined with any other embodiment or example disclosed herein.

REFERENCES

1. 3GPP TS 23.288 v18.4.0 (December 2023): Architecture enhancements for 5G System (5GS) to support network data analytics services.

2. 3GPP TS 23.700-84 v0.1.0 (February 2024): Study on Core Network Enhanced Support for Artificial Intelligence (AI) / Machine Learning (ML).

EXAMPLES of the first group of embodiments:

1. A first computer-implemented method, performed by a first node (111), for handling a Quality of Experience, QoE, aimed to be achieved, the first node (111) operating in a communications system (100), the first method comprising:

- determining (405), using a reinforcement learning procedure of machine learning, one or more actions to be applied based on the QoE aimed to be achieved, for the at least one of the one or more services and the one or more slices, wherein the determining (405) is based on information received from a second node (112) operating in the communications system (100), and

- initiating (406) application of the determined one or more actions.

2. The first method according to example 1, wherein at least one of: a. the one or more actions are to be applied on at least one of: one or more services, one or more slices, at least a first subset of one or more devices (140) operating in the communications system (100), b. the one or more actions to be applied comprise updating one or more policies, c. the one or more actions comprise using one or more Quality of Service, QoS, parameters, d. the at least one of the one or more services and the one or more slices are for data communication to be provided to the first subset, or at least a second subset, of the one or more devices (140).

3. The first method according to any of examples 1-2, wherein the information comprises at least one of: i. an initial state of an environment of the communications system (100), ii. an initial reward corresponding to the initial state of the environment, iii. the state of the environment after an earlier action triggered by the first node (111) on the environment, and iv. the reward for the earlier action triggered by the first node (111) in the environment, based on the state of the environment after the action.

4. The first method according to example 3, further comprising at least one of:

- obtaining (401) a first indication of the QoE aimed to be achieved, and wherein the determining (405) is based on the obtained first indication,

- sending (402), before the determining (405) of the one or more actions, a second indication to the second node (112), the second indication indicating a subscription to, or a query for, a service provided by the second node (112), the service being to provide at least one of the state and the reward,

- obtaining (403), before the determining (405) of the one or more actions and responsive to the sent second indication, a third indication from the second node (112), the third indication indicating an identifier of the reinforcement learning procedure, and

- obtaining (404) one or more fourth indications from the second node (112), the one or more fourth indications indicating the at least one of the state and the reward, and wherein the determining (405) is based on the obtained one or more fourth indications.

5. The first method according to example 4, wherein for every respective iteration subsequent to an initial iteration the first method further comprises:

- sending (407) a fifth indication to the second node (112) after performance of the determined one or more actions, the fifth indication requesting the respective one or more fourth indications for the respective iteration.

6. The first method according to examples 4 and 5, wherein the initiating (406) application of the one or more actions comprises triggering performance of the determined one or more actions, and wherein the first node (111) iterates the obtaining (404) of the one or more fourth indications, the determining (405) of the one or more actions, the initiating (406) application of the one or more actions and the sending (407) of the fifth indication until an obtained reward exceeds a threshold for a number of iterations.

7. The first method according to example 2 and any of examples 4-6, wherein the second indication indicates at least one of: a. a respective definition of the at least one of the state and the reward, b. the first subset, or at least a second subset, of the one or more devices (140) the at least one of the state and the reward applies to.

8. The first method according to any of examples 1-7, wherein the communications system (100) is a Fifth Generation, 5G, network and at least one of: i. the first node (111) is a network function, ii. the second node (112) is a Network Data Analytics Function, NWDAF, and iii. the network function is one of a Policy Control Function, PCF, a Session Management Function, SMF, and an Operation, Administration and Maintenance, OAM, node.

9. A first computer-implemented method, performed by a second node (112), for handling a Quality of Experience, QoE, aimed to be achieved, the second node (112) operating in a communications system (100), the first method comprising:

- receiving (501) a second indication from a first node (111) operating in the communications system (100), the second indication indicating a subscription to, or a query for, a service provided by the second node (112), the service being to provide information to be used as input in a reinforcement learning procedure of machine learning to be performed by the first node (111) based on the QoE aimed to be achieved,

- determining (504) the information responsive to the received second indication, and

- sending (505) the determined information to the first node (111).

10. The first method according to example 9, wherein the reinforcement learning procedure is to determine one or more actions to be applied on the at least one of one or more services and the one or more slices, based on the QoE aimed to be achieved, and wherein at least one of: d. the QoE aimed to be achieved is for the at least one of one or more services, one or more slices, and at least a first subset of one or more devices (140) operating in the communications system (100), e. the one or more actions to be applied comprise updating one or more policies, f. the one or more actions comprise using one or more Quality of Service, QoS, parameters, and g. the at least one of the one or more services and the one or more slices are for data communication to be provided to the first subset, or at least a second subset, of the one or more devices (140).

11. The first method according to any of examples 9-10, wherein the information comprises at least one of: v. an initial state of an environment of the communications system (100), vi. an initial reward corresponding to the initial state of the environment, vii. the state of the environment after an earlier action triggered by the first node (111) on the environment, and viii. the reward for the earlier action triggered by the first node (111) in the environment, based on the state of the environment after the action.

12. The first method according to example 11, wherein the information sent comprises one or more fourth indications, the one or more fourth indications indicating the at least one of the state and the reward.

13. The first method according to any of examples 9-12, further comprising at least one of:

- sending (502) a third indication to the first node (111), the third indication indicating an identifier of the reinforcement learning procedure, and

- collecting (503) data from the environment responsive to the received second indication, and wherein the determining (504) of the information is based on the collected data.

14. The first method according to example 12, wherein for every respective iteration subsequent to an initial iteration, the first method further comprises:

- receiving (506) a fifth indication from the first node (111) after performance of the one or more actions determined by the first node (111) based on the sent information, the fifth indication requesting the respective one or more fourth indications for the respective iteration.

15. The first method according to examples 13 and 14, wherein the second node (112) iterates the collecting (503) of the data from the environment, the determining (504) of the information, the sending (505) of the information, and the receiving (506) of the fifth indication until an obtained reward exceeds a threshold for a number of iterations.

16. The first method according to example 9 and any of examples 11-12, wherein the second indication indicates at least one of: h. a respective definition of the at least one of the state and the reward, i. the first subset, or at least a second subset, of the one or more devices (140) the at least one of the state and the reward applies to.

17. The first method according to any of examples 9-16, wherein the communications system (100) is a Fifth Generation, 5G, network and at least one of: i. the first node (111) is a network function, ii. the second node (112) is a Network Data Analytics Function, NWDAF, and iii. the network function is one of a Policy Control Function, PCF, a Session Management Function, SMF, and an Operation, Administration and Maintenance, OAM, node.

18. A communications system (100) comprising one or more of: a first node (111) according to any of the examples 1-8, a second node (112) according to any of the examples 9-17.
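As a non-limiting illustration of the iteration in examples 4 to 6 of the first group, the loop performed by the first node (111) may be sketched with an epsilon-greedy bandit. This is a stateless simplification chosen for brevity: the examples do not name a specific reinforcement learning algorithm, and the state is not modelled here. The callable `get_reward` is a hypothetical stand-in for applying the action and then requesting and obtaining the reward from the second node (112). Interpreting the stop condition as a reward above the threshold for consecutive iterations, and the `max_iterations` safeguard, are further assumptions.

```python
import random
from typing import Callable, Dict, List


def run_rl_loop(
    actions: List[str],
    get_reward: Callable[[str], float],
    threshold: float,
    patience: int,
    epsilon: float = 0.1,
    max_iterations: int = 1000,
    seed: int = 0,
) -> str:
    """Epsilon-greedy loop; stops once the reward exceeds the threshold
    for `patience` consecutive iterations (assumed interpretation)."""
    rng = random.Random(seed)
    value: Dict[str, float] = {a: 0.0 for a in actions}
    count: Dict[str, int] = {a: 0 for a in actions}
    streak = 0
    for _ in range(max_iterations):
        # Determining (405): try each action once, then explore or exploit.
        untried = [a for a in actions if count[a] == 0]
        if untried:
            action = untried[0]
        elif rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: value[a])
        # Initiating (406), sending (407) and obtaining (404), collapsed
        # into one hypothetical call that returns the reward.
        reward = get_reward(action)
        count[action] += 1
        # Incremental mean of the rewards observed for this action.
        value[action] += (reward - value[action]) / count[action]
        streak = streak + 1 if reward > threshold else 0
        if streak >= patience:
            return action
    # Safeguard: fall back to the best action seen so far.
    return max(actions, key=lambda a: value[a])


# Made-up, deterministic rewards per candidate action.
rewards = {"QFI-9": 0.2, "QFI-5": 0.6, "QFI-1": 0.9}
best = run_rl_loop(list(rewards), rewards.__getitem__, threshold=0.8, patience=3)
```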

EXAMPLES of the second group of embodiments:

1. A second computer-implemented method, performed by a first node (111), for handling a Quality of Experience, QoE, aimed to be achieved, the first node (111) operating in a communications system (100), the second method comprising:

- sending (1001), an indication to a second node (112) operating in the communications system (100), the indication indicating a request to subscribe, or a query, to receive, from the second node (112), a recommendation for an action to be triggered by the first node (111) on an environment of the communications system (100) based on the QoE aimed to be achieved for the one or more services, and

- receiving (1002), responsive to the sent indication, another indication from the second node (112), the another indication indicating the recommended action.

2. The second method according to example 1, further comprising:

- initiating (1003) performing the recommended action on the environment.

3. The second method according to any of examples 1-2, wherein the indication comprises at least one of: a. a first identifier of an analytic requested from the second node (112) to receive the recommendation, b. one or more identifiers of one or more respective applications the request applies to, c. a further indication indicating the QoE aimed to be achieved, d. a set of actions the recommended action is to be selected from, e. one or more filters the request applies to, f. the one or more filters comprising at least one of: i. one or more devices (140) the request applies to, ii. one or more slices the request applies to, iii. one or more data networks the request applies to, and iv. one or more areas the request applies to.

4. The second method according to any of examples 1-4, wherein the communications system (100) is a Fifth Generation, 5G, network and at least one of: i. the first node (111) is a network function, ii. the second node (112) is a Network Data Analytics Function, NWDAF, iii. the network function is one of a Policy Control Function, PCF, a Session Management Function, SMF, and an Operation, Administration and Maintenance, OAM, node.

5. The second method according to examples 3 and 4, wherein at least one of: a. the another indication further indicates the first identifier, b. the indication is an NWDAF_AnalyticsSubscription_Subscribe request or a query, and c. the another indication is an NWDAF_AnalyticsSubscription_Notify request or a response.

6. The second method according to any of examples 1-5, wherein the recommended action is at least one of: e. updating one or more policies, f. using one or more Quality of Service, QoS, parameters, g. to be applied to at least one of one or more services and one or more slices for data communication to be provided to one or more devices (140) operating in the communications system (100).

7. A second computer-implemented method, performed by a second node (112), for handling a Quality of Experience, QoE, aimed to be achieved, the second node (112) operating in a communications system (100), the second method comprising:

- receiving (1101), an indication from a first node (111) operating in the communications system (100), the indication indicating a request to subscribe, or a query, to receive, from the second node (112), a recommendation for an action to be triggered by the first node (111) on an environment of the communications system (100) based on the QoE aimed to be achieved for the one or more services, and

- sending (1104), responsive to the received indication, another indication to the first node (111), the another indication indicating the recommended action.

8. The second method according to example 6, further comprising:

- collecting (1002) respective data from one or more third nodes (113) operating in the communications system (100), the collecting (1002) being responsive to the received indication, and

- determining (1003) the recommended action based on the received indication, and/or the collected respective data.

9. The second method according to example 7, wherein the respective data comprises at least one of: a. first respective data collected from one or more first third nodes (114) indicative of Quality of Service, QoS, parameters applied for QoS flows associated to an application the indication relates to, b. second respective data collected from one or more second third nodes (115) indicative of a state of the communications system (100), c. third respective data collected from one or more third third nodes (116) indicating an Observed Service Experience, OSE, analytics, and d. fourth respective data collected from a fourth third node (117) indicating a first set of actions the recommended action is to be selected from.

10. The second method according to any of examples 6-8, wherein the indication comprises at least one of: a. a first identifier of an analytic requested from the second node (112) to receive the recommendation, b. one or more identifiers of one or more respective applications the request applies to, c. a further indication indicating the QoE aimed to be achieved, d. a set of actions the recommended action is to be selected from, e. one or more filters the request applies to, f. the one or more filters comprising at least one of: i. one or more devices (140) the request applies to, ii. one or more slices the request applies to, iii. one or more data networks the request applies to, and iv. one or more areas the request applies to.

11. The second method according to any of examples 6-9, wherein the communications system (100) is a Fifth Generation, 5G, network and at least one of: i. the first node (111) is a network function, ii. the second node (112) is a Network Data Analytics Function, NWDAF, iii. the network function is one of a Policy Control Function, PCF, a Session Management Function, SMF, and an Operation, Administration and Maintenance, OAM, node.

12. The second method according to examples 9 and 10, wherein at least one of: a. the another indication further indicates the first identifier, b. the indication is an NWDAF_AnalyticsSubscription_Subscribe request or a query, and c. the another indication is an NWDAF_AnalyticsSubscription_Notify request or a response.

13. The second method according to examples 8 and 10, wherein at least one of: a. the one or more third nodes (114, 115, 116) comprise a first NF, b. one of the one or more first third nodes (114) is a Session Management Function, SMF, c. the one or more second third nodes (115) comprise at least one of a User Plane Function, UPF, a second network function, NF, other than a UPF, and a radio network node (130), d. the one or more third third nodes (116) comprise at least one of an Application Function, AF, an Access and Mobility Management Function, AMF, a Session Management Function, SMF, a UPF, and an Operations and Maintenance node, OAM, and e. the fourth third node (117) is an OAM.

14. The second method according to any of examples 7-13, wherein the recommended action is at least one of: h. updating one or more policies, i. using one or more Quality of Service, QoS, parameters, j. to be applied to at least one of one or more services and one or more slices for data communication to be provided to one or more devices (140) operating in the communications system (100).

15. A communications system (100) comprising one or more of: a first node (111) according to any of the examples 1-6, a second node (112) according to any of the examples 7-14.
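
As a non-authoritative illustration of the exchange described in examples 1 to 14, the following minimal Python sketch models the indication sent by the first node (111) and the another indication returned by the second node (112). The class names, field names, the numeric QoE score, the `expected_gain` table and the selection rule are all assumptions made for illustration; they are not taken from the disclosure or from any 3GPP-defined API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RecommendationRequest:
    """Indication sent by the first node (all field names are hypothetical)."""
    analytic_id: str                  # first identifier of the requested analytic
    app_ids: List[str]                # applications the request applies to
    target_qoe: float                 # QoE aimed to be achieved (assumed numeric score)
    candidate_actions: List[str]      # set of actions to select from
    # Optional filters: devices, slices, data networks, areas.
    filters: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class RecommendationNotification:
    """Another indication returned by the second node."""
    analytic_id: str                  # echoes the first identifier
    recommended_action: Optional[str]  # None when no action is needed or possible


def recommend_action(request: RecommendationRequest,
                     observed_qoe: float,
                     expected_gain: Dict[str, float]) -> RecommendationNotification:
    """Select one candidate action (assumed rule, not the disclosed procedure).

    expected_gain maps an action name to an assumed QoE improvement.
    """
    gap = request.target_qoe - observed_qoe
    if gap <= 0 or not request.candidate_actions:
        return RecommendationNotification(request.analytic_id, None)
    gains = {a: expected_gain.get(a, 0.0) for a in request.candidate_actions}
    sufficient = [a for a in gains if gains[a] >= gap]
    if sufficient:
        # Smallest gain that still closes the gap.
        best = min(sufficient, key=gains.get)
    else:
        # Nothing closes the gap: take the largest available gain.
        best = max(gains, key=gains.get)
    return RecommendationNotification(request.analytic_id, best)
```

In this sketch the recommended action is always drawn from the set of actions carried in the request, mirroring item d of examples 3 and 10; how the second node actually determines the recommendation is left open here.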

Claims

1. A computer-implemented method, performed by a first node (111), for handling a Quality of Experience, QoE, aimed to be achieved, the first node (111) operating in a communications system (100), the method comprising:

- determining (405), using a reinforcement learning procedure of machine learning, one or more actions to be applied based on the QoE aimed to be achieved, wherein the determining (405) is based on information received from a second node (112) operating in the communications system (100), and

- initiating (406) application of the determined one or more actions.

2. The method according to claim 1, wherein at least one of: a. the one or more actions are to be applied on at least one of: one or more services, one or more slices, at least a first subset of one or more devices (140) operating in the communications system (100), b. the one or more actions to be applied comprise updating one or more policies, c. the one or more actions comprise using one or more Quality of Service, QoS, parameters, d. the at least one of the one or more services and the one or more slices are for data communication to be provided to the first subset, or at least a second subset, of the one or more devices (140).

3. The method according to any of claims 1-2, wherein the information comprises at least one of: i. an initial state of an environment of the communications system (100), ii. an initial reward corresponding to the initial state of the environment, iii. the state of the environment after an earlier action triggered by the first node (111) on the environment, and iv. the reward for the earlier action triggered by the first node (111) in the environment, based on the state of the environment after the action.

4. The method according to claim 3, further comprising at least one of:

- obtaining (401) a first indication of the QoE aimed to be achieved, and wherein the determining (405) is based on the obtained first indication,

- sending (402), before the determining (405) of the one or more actions, a second indication to the second node (112), the second indication indicating a subscription to, or a query for, a service provided by the second node (112), the service being to provide at least one of the state and the reward,

5. The method according to claim 4, wherein for every respective iteration subsequent to an initial iteration the method further comprises:

6. The method according to claims 4 and 5, wherein the initiating (406) application of the one or more actions comprises triggering performance of the determined one or more actions, and wherein the first node (111) iterates the obtaining (404) of the one or more fourth indications, the determining (405) of the one or more actions, the initiating (406) application of the one or more actions and the sending (407) of the fifth indication until an obtained reward exceeds a threshold for a number of iterations.
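
The iteration recited in claim 6 can be pictured with the following minimal Python sketch, offered under stated assumptions rather than as the claimed implementation. It assumes tabular Q-learning with epsilon-greedy selection as the reinforcement learning procedure, reads "for a number of iterations" as a number of consecutive iterations, and adds a `max_iterations` safety bound; the function name and the `get_state_reward` and `apply_action` callbacks are hypothetical stand-ins for the interaction with the second node (112) and the environment. The fifth indication (407) is not modelled because its content is not stated in this section.

```python
import random
from collections import defaultdict


def run_first_node(get_state_reward, apply_action, actions,
                   reward_threshold, required_hits,
                   max_iterations=1000, epsilon=0.1,
                   alpha=0.5, gamma=0.9, seed=0):
    """Iterate until the reward exceeds the threshold `required_hits` times in a row.

    get_state_reward() -> (state, reward): hypothetical callback returning the
    state and reward reported by the second node.
    apply_action(action): hypothetical callback that triggers the action.
    Returns (iteration at which the loop stopped, or None; learned Q-table).
    """
    rng = random.Random(seed)
    q = defaultdict(float)               # Q-values keyed by (state, action)
    state, _ = get_state_reward()        # initial state and initial reward
    hits = 0
    for iteration in range(1, max_iterations + 1):
        # determining (405): epsilon-greedy choice over the known actions
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(state, a)])
        # initiating (406): trigger performance of the chosen action
        apply_action(action)
        # obtaining (404): state and reward after the action
        next_state, reward = get_state_reward()
        best_next = max(q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
        # stopping rule: count consecutive rewards above the threshold
        hits = hits + 1 if reward > reward_threshold else 0
        if hits >= required_hits:
            return iteration, dict(q)
    return None, dict(q)
```

With a stub that always reports a reward of 1.0, a threshold of 0.5 and `required_hits=3`, the loop stops at the third iteration; a stub that never exceeds the threshold runs until `max_iterations` and returns `None`.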

7. The method according to claim 2 and any of claims 4-6, wherein the second indication indicates at least one of: a. a respective definition of the at least one of the state and the reward, b. the first subset, or at least a second subset, of the one or more devices (140) the at least one of the state and the reward applies to.

8. The method according to any of claims 1-7, wherein the communications system (100) is a Fifth Generation, 5G, network and at least one of: iv. the first node (111) is a network function, v. the second node (112) is a Network Data Analytics Function, NWDAF, and vi. the network function is one of a Policy Control Function, PCF, a Session Management Function, SMF, and an Operation, Administration and Maintenance, OAM, node.

9. A computer-implemented method, performed by a second node (112), for handling a Quality of Experience, QoE, aimed to be achieved, the second node (112) operating in a communications system (100), the method comprising:

- sending (505) the determined information to the first node (111).

10. The method according to claim 9, wherein the reinforcement learning procedure is to determine one or more actions to be applied on the at least one of one or more services and the one or more slices, based on the QoE aimed to be achieved, and wherein at least one of: j. the QoE aimed to be achieved is for the at least one of one or more services, one or more slices, and at least a first subset of one or more devices (140) operating in the communications system (100), k. the one or more actions to be applied comprise updating one or more policies, l. the one or more actions comprise using one or more Quality of Service, QoS, parameters, and m. the at least one of the one or more services and the one or more slices are for data communication to be provided to the first subset, or at least a second subset, of the one or more devices (140).

11. The method according to any of claims 9-10, wherein the information comprises at least one of: i. an initial state of an environment of the communications system (100), ii. an initial reward corresponding to the initial state of the environment, iii. the state of the environment after an earlier action triggered by the first node (111) on the environment, and iv. the reward for the earlier action triggered by the first node (111) in the environment, based on the state of the environment after the action.

12. The method according to claim 11, wherein the information sent comprises one or more fourth indications, the one or more fourth indications indicating the at least one of the state and the reward.

13. The method according to any of claims 9-12, further comprising at least one of:

14. The method according to claim 12, wherein for every respective iteration subsequent to an initial iteration, the method further comprises:

15. The method according to claims 13 and 14, wherein the second node (112) iterates the collecting (503) of the data from the environment, the determining (504) of the information, the sending (505) of the information, and the receiving (506) of the fifth indication until an obtained reward exceeds a threshold for a number of iterations.
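
For the second node side of the iteration in claim 15, a minimal Python sketch under stated assumptions might look as follows. The state and reward definitions (a bucketed mean of observed QoE scores, and the non-positive shortfall against the target QoE) are hypothetical, as are the function names and the `collect`, `send` and `receive_fifth_indication` callbacks. The fifth indication is treated here only as a signal that the first node (111) has acted, which is an assumption, and "for a number of iterations" is again read as consecutive iterations.

```python
from statistics import mean


def determine_state_and_reward(samples, target_qoe, bucket=0.5):
    """Derive a (state, reward) pair from collected QoE samples (assumed definitions).

    state: mean observed QoE discretised into buckets of width `bucket`.
    reward: 0.0 when the target is met, otherwise the negative shortfall.
    """
    observed = mean(samples)
    state = int(observed // bucket)
    reward = min(0.0, observed - target_qoe)
    return state, reward


def run_second_node(collect, send, receive_fifth_indication,
                    target_qoe, reward_threshold, required_hits,
                    max_iterations=100):
    """Collect, determine, send and wait, until the reward stays above the threshold.

    collect() -> list of QoE samples          (hypothetical data source)
    send(state, reward)                       (hypothetical delivery to the first node)
    receive_fifth_indication()                (hypothetical blocking wait)
    Returns the iteration at which the loop stopped, or None.
    """
    hits = 0
    for iteration in range(1, max_iterations + 1):
        samples = collect()                                   # collecting (503)
        state, reward = determine_state_and_reward(samples, target_qoe)  # determining (504)
        send(state, reward)                                   # sending (505)
        hits = hits + 1 if reward > reward_threshold else 0
        if hits >= required_hits:
            return iteration
        receive_fifth_indication()                            # receiving (506)
    return None
```

Keeping the state and reward derivation in its own function reflects that claim 16 lets the second indication carry a respective definition of the state and the reward, so that definition could be swapped without touching the loop.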

16. The method according to claim 9 and any of claims 11-12, wherein the second indication indicates at least one of: n. a respective definition of the at least one of the state and the reward, o. the first subset, or at least a second subset, of the one or more devices (140) the at least one of the state and the reward applies to.

17. The method according to any of claims 9-16, wherein the communications system (100) is a Fifth Generation, 5G, network and at least one of: vii. the first node (111) is a network function, viii. the second node (112) is a Network Data Analytics Function, NWDAF, and ix. the network function is one of a Policy Control Function, PCF, a Session Management Function, SMF, and an Operation, Administration and Maintenance, OAM, node.

18. Apparatus (111) comprising a processor and a memory, the memory containing instructions executable by the processor such that the apparatus is operable to perform the method of any one of claims from claim 1 to claim 8.

19. Apparatus (112) comprising a processor and a memory, the memory containing instructions executable by the processor such that the apparatus is operable to perform the method of any one of claims from claim 9 to claim 17.

20. A communications system (100) comprising one or more of: a first node (111) according to claim 18, a second node (112) according to claim 19.