WO2025178519A1 - First node and methods performed thereby for handling a first machine learning model - Google Patents
- Publication number
- WO2025178519A1 (PCT/SE2024/050169)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- node
- data
- machine learning
- learning model
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/096—Transfer learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- the present disclosure relates generally to a first node and methods performed thereby for handling a first machine learning model.
- the present disclosure also relates generally to a computer program and a computer-readable storage medium, having stored thereon the computer program to carry out this method.
- Computer systems in a communications network or communications system may comprise one or more nodes.
- a node may comprise one or more processors which, together with computer program code may perform different functions and actions, a memory, a receiving port, and a sending port.
- a node may be, for example, a server. Nodes may perform their functions entirely on the cloud.
- the base stations may be of different classes such as e.g., Wide Area Base Stations, Medium Range Base Stations, Local Area Base Stations and Home Base Stations, based on transmission power and thereby also cell size.
- a cell may be understood to be the geographical area where radio coverage may be provided by the base station at a base station site.
- One base station, situated on the base station site, may serve one or several cells. Further, each base station may support one or several communication technologies.
- the telecommunications network may also comprise network nodes which may serve receiving nodes, such as user equipments, with serving beams.
- 5GC 5G Core Network
- eMBB enhanced Mobile Broad Band
- mMTC massive Machine Type Communication
- URLLC Ultra Reliable Low Latency Communication
- 5G may be understood to bring in sizeable flexibility with technological advancements along with innovations of cloud and Artificial Intelligence (AI). This may be understood to bring a whole new set of opportunities in the enterprise segment.
- Figure 8 is a signalling diagram depicting aspects of a non-limiting example of the method performed by the first node, according to embodiments herein.
- Figure 11 is a signalling diagram depicting other aspects of a non-limiting example of the method performed by the first node, according to embodiments herein.
- the computer system 100 comprises a first node 111.
- the computer system 100 may comprise further nodes.
- the computer system 100 may comprise a second node 112.
- the computer system 100 may alternatively or additionally comprise a plurality of third nodes 113.
- the computer system 100 may comprise additional nodes.
- any of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113 may have a capability to manage an artificial neural network.
- the artificial neural network may be understood as a machine learning framework, which may comprise a collection of connected nodes, where in each node or perceptron, there may be an elementary decision unit. Each such node may have one or more inputs and an output. The input to a node may be from the output of another node or from a data source.
- Each of the nodes and connections may have certain weights or parameters associated with it. In order to solve a decision task, the weights may be learnt or optimized over a data set which may be representative of the decision task.
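- The elementary decision unit described above may be sketched in a few lines; the following is a minimal, generic perceptron, not taken from the disclosure, in which the weights and bias are assumed to have already been learnt:

```python
def perceptron(inputs, weights, bias):
    """Elementary decision unit: a weighted sum of the inputs, offset by a
    bias, passed through a step activation to produce a binary output."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0.0 else 0
```

With weights [1, 1] and bias -0.5, for example, the unit implements a logical OR of two binary inputs.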
- the device 140 comprised in the telecommunications system may be, for example, portable, pocket-storable, hand-held, computer-comprised, or a vehicle-mounted mobile device, enabled to communicate voice and/or data, via the RAN, with another entity, such as a server, a laptop, a Personal Digital Assistant (PDA), or a tablet, Machine-to-Machine (M2M) device, device equipped with a wireless interface, such as a printer or a file storage device, modem, sensor, IoT device, or any other radio network unit capable of communicating over a radio link in a communications system.
- the device 140 may be, or comprise, a microphone.
- the device 140 comprised in the telecommunications system may be enabled to communicate wirelessly in the telecommunications system.
- the communication may be performed e.g., via a RAN, and possibly the one or more core networks, which may be comprised within the telecommunications system.
- the telecommunications network may comprise additional radio network nodes 130 and/or additional devices 140.
- the first node 111 may be configured to communicate within the computer system 100 with the second node 112 over a first link 141, e.g., a radio link, or a wired link.
- the first node 111 may be configured to communicate within the computer system 100 with the plurality of third nodes 113 over a respective second link 142, e.g., a radio link, or a wired link.
- the first node 111 may be configured to communicate within the computer system 100 with the radio network node 130 over a third link 143, e.g., a radio link, or a wired link.
- the plurality of third nodes 113 may be configured to communicate within the computer system 100 with the radio network node 130 over a fourth link 144, e.g., a radio link, or a wired link.
- the first link 141, the respective second link 142, the third link 143, the fourth link 144, the respective fifth link 145 and the sixth link 146 may be a direct link or may be comprised of a plurality of individual links, wherein it may go via one or more computer systems or one or more core networks in the computer system 100, which are not depicted in Figure 3, or it may go via an optional intermediate network.
- the intermediate network may be one of, or a combination of more than one of, a public, private or hosted network; the intermediate network, if any, may be a backbone network or the Internet; in particular, the intermediate network may comprise two or more sub-networks, which is not shown in Figure 3.
- “first”, “second”, “third”, “fourth”, “fifth” and/or “sixth” herein may be understood to be an arbitrary way to denote different elements or entities, and may be understood to not confer a cumulative or chronological character to the nouns they modify.
- Embodiments of a computer-implemented method, performed by the first node 111, will now be described with reference to the flowchart depicted in Figure 4.
- the method is for handling a first machine learning model.
- the first node 111 operates in the computer system 100.
- a domain may be understood as an environment where a machine learning model may be being trained or may have been trained.
- domain may be understood to refer to a data domain, wherein the data domain may have unique characteristics, such as e.g., data distribution, data features, system features, environment properties, data protection rights, etc. These unique characteristics may differ between data domains.
- the first domain may be understood to be a target domain. That is, a domain where there may be an interest, e.g., by the second node 112, in making predictions.
- the receiving in this Action 401 may be performed, e.g., via the first link 141.
- Obtaining in this Action 401 may comprise receiving, e.g., from the second node 112, e.g., via the first link 141.
- a second machine learning model has already been trained with second data from a second domain to make predictions in the second domain.
- the second domain may be referred to herein as a source domain, as it may be understood to be the domain wherein the second machine learning model may have been trained.
- the second data may be also referred to herein as the source data.
- the second machine learning model may be referred to as the source model.
- the second machine learning model may be understood to be a model that may have been trained on the data samples from different datasets, each of which with their own specifications for a specific use case.
- the second machine learning model may be, in some examples, a global model that may continuously evolve as new data may become available.
- the first node 111 may fetch the latest updates of the information characterizing the second data regularly.
- the first threshold may be understood to be a configurable, e.g., a user-defined, threshold.
- the first node 111 may be enabled to determine whether or not the first machine learning model may need to be determined from scratch. That is, whether or not it may be worthwhile to derive the first machine learning model to make predictions in the first domain by adapting the second machine learning model.
- the first node 111 may learn a new model tailored for the first domain instead of initiating the unlearning procedure that will be described in Action 409.
- the first node 111 may proceed to determine which second machine learning model of the plurality of second machine learning models may be closest to the one or more criteria, that is, to the target specification, and use that second machine learning model.
- the first node 111 may refrain from determining the first machine learning model from the second machine learning model and may instead proceed to determine the first machine learning model from scratch.
- if the one or more criteria differ by more than the first, predefined, threshold from all the available source specifications, that is, from all the respective information characterizing the respective second data, then a new first machine learning model may be constructed for the target.
- the first threshold may be a configurable, e.g., a user-defined, threshold. The first node 111 may then choose to add this newly built model to a dictionary of models.
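- The comparison against the first threshold and the fallback to building a new model may be sketched as follows. The encoding of specifications as dictionaries of numeric characteristics, and closeness as a Euclidean distance over shared characteristics, are assumptions made purely for this sketch; the disclosure leaves both open.

```python
import math

def spec_distance(target_spec, source_spec):
    """Distance between the target criteria and one source specification,
    computed over the characteristics the two specifications share."""
    keys = target_spec.keys() & source_spec.keys()
    return math.sqrt(sum((target_spec[k] - source_spec[k]) ** 2 for k in keys))

def select_source_model(target_spec, source_specs, first_threshold):
    """Return the index of the closest source specification, or None when even
    the closest one differs by more than the first threshold, in which case a
    new model would be built from scratch and added to the dictionary."""
    distances = [spec_distance(target_spec, s) for s in source_specs]
    best = min(range(len(distances)), key=distances.__getitem__)
    return None if distances[best] > first_threshold else best
```

A larger first threshold thus makes reuse of an existing source model more likely, while a smaller one favours training from scratch.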
- the first node 111, in this Action 405, may select the second machine learning model out of the plurality of second machine learning models.
- the selecting in this Action 405 may be based on the respective level of fulfilment of the one or more criteria. That is, the first node 111 may select the second machine learning model that may have a highest level of fulfilment of the one or more criteria, that is, that may be closer to the one or more criteria.
- the first node 111 may perform an assessment and decide which second machine learning model may be useful for the first domain and ask the corresponding third node 113 in the second domain, e.g., NWDAF source, to send the second machine learning model and the respective data, or representation, that may not be desired to be represented in the first machine learning model.
- the first node 111 may perform the selection in this Action 405 by comparing the one or more criteria, that is, the target specification, against the respective information characterizing the respective second data, that is, the source specifications.
- the source specification that may be determined to be closest to the one or more criteria may then be chosen and accordingly its corresponding second machine learning model and respective second data.
- the first node 111 may then proceed with unlearning as will be described in Action 409. This procedure is schematically depicted later in Figure 9.
- once the second machine learning model may have been selected, it may be understood that not all the data samples in the second data it may have been trained with may equally fulfil the one or more criteria at the same level. There may be data samples that may have a good match to the one or more criteria, while other data samples may have a poorer match to the one or more criteria, or may not match them at all. In this Action 406, the first node 111 may try to identify which data samples may be the poor matches, or the ones with no match at all.
- the first node 111 determines first data samples in the second data lacking a level of fulfilment of the one or more criteria exceeding the first threshold.
- the determined first data samples have an effect on the second machine learning model.
- Determining may be understood as calculating, estimating, deriving, or similar, or obtaining or receiving from another node.
- the first node 111 may perform the determining in this Action 406 by checking the correspondence between the one or more criteria and the information characterizing the second data, that is, the meta information of the source and target specifications, for example, the type of feature attributes, environment, data access rights, etc.
- the determining in this Action 406 of the first data samples may be performed by a third module, referred to herein as an “assessment module”, comprised in the first node 111.
- the first node 111 may be enabled to identify the, e.g., unwanted, noisy, biased, erroneous, and privacy-sensitive samples from a fully trained second machine learning model.
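- The identification of the first data samples in Action 406 may be sketched as a partition of the source data by level of fulfilment. Representing the one or more criteria as a list of predicate functions, and the level of fulfilment as the fraction of predicates a sample satisfies, are assumptions made for this illustration only:

```python
def level_of_fulfilment(sample, criteria):
    """Fraction of the criteria that the sample satisfies, in [0, 1]."""
    return sum(bool(predicate(sample)) for predicate in criteria) / len(criteria)

def split_first_data_samples(second_data, criteria, first_threshold):
    """Partition the second data into samples whose fulfilment exceeds the
    first threshold and the 'first data samples' that lack it; the latter
    are the candidates for the unlearning procedure of Action 409."""
    keep, first_data_samples = [], []
    for sample in second_data:
        if level_of_fulfilment(sample, criteria) > first_threshold:
            keep.append(sample)
        else:
            first_data_samples.append(sample)
    return keep, first_data_samples
```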
- the first node 111 may then be enabled to remove the effect of such samples from the second machine learning model while not requiring retraining the second machine learning model by performing an unlearning procedure of the second machine learning model in order to minimize the effect that the first data samples may have had on the second machine learning model. That is, in order to “erase” the effect they may have had on the second machine learning model, as will be explained in Action 409.
- the posterior distribution of the second machine learning model may be denoted by p(θ|D_s), where θ may denote the model parameters and D_s the second data.
- the posterior distribution may be learned from all the data D_s.
- for the probabilistic class of models, e.g., a Gaussian process, the posterior may be immediately available as part of the learning.
- the first node 111 may first need to approximate the posterior.
- the first node 111 may approximate the posterior through this Action 407 and the next Action 408.
- the first node 111 may, in this Action 407, determine second data samples from a posterior function of the second machine learning model using a non-parametric sampling procedure.
- This Action 407 may be also referred to as nonparametric modelling, wherein the empirical distribution of these samples may follow approximately the true posterior distribution.
- the difference between the empirical posterior and the true posterior may be understood to be that the latter may be understood to be expressed in terms of a probability density function, in other words, it may be understood to have a known functional form.
- the first node 111 may have only access to the samples and may only know the empirical distribution of the posterior but not know the parametric functional form of the posterior.
- the first node 111 may perform the determining of the second data samples in this Action 407 by drawing samples that approximately follow the empirical posterior distribution using a non-parametric Markov chain Monte Carlo (MCMC) sampling technique.
- MCMC Markov chain Monte Carlo
- a well-known technique that may be used may be Hamiltonian Monte Carlo (HMC) sampling.
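- As an illustration of Action 407, samples approximately following the posterior may be drawn with an MCMC method. The disclosure mentions HMC; the simpler random-walk Metropolis variant sketched below is used instead, purely to convey that only an unnormalized log-posterior, and no parametric functional form, is needed:

```python
import numpy as np

def metropolis_samples(log_posterior, theta0, n_samples, step=0.5, seed=0):
    """Draw samples whose empirical distribution approximately follows the
    posterior; log_posterior only needs to be known up to a constant."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.shape)
        logp_prop = log_posterior(proposal)
        # Accept or reject the proposed move (Metropolis criterion)
        if np.log(rng.random()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
        samples.append(theta.copy())
    return np.array(samples)

# Example: a standard normal "posterior", known only up to a constant
draws = metropolis_samples(lambda t: -0.5 * float(t @ t), np.zeros(1), 5000)
```

The empirical mean and spread of the draws then approximate the moments of the true posterior without its functional form ever being written down.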
- the first node 111 may then be enabled, in the next Action 408, to determine a functional form of the posterior function, particularly, to derive a parametric approximation of the posterior function, which may enable the first node 111 to in turn then proceed with the derivation of the first machine learning model in Action 409, according to embodiments herein, in the event that the second machine learning model may be a non-probabilistic model. That is, by performing this Action 407, as well as the next Action 408, the first node 111 may make the method described herein applicable to instances wherein the second machine learning model may be non-probabilistic.
- the first node 111 may determine a parametric approximation of the posterior function using a parametric mixture model and based on the second data.
- p(θ|D_s) may be understood to become a mixture of Student-t distributions.
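- The parametric approximation of Action 408 may be sketched as a small Expectation-Maximization fit of a mixture to the drawn posterior samples. A one-dimensional Gaussian mixture is used below as a simpler stand-in for the mixture of Student-t distributions, purely to keep the illustration short:

```python
import numpy as np

def fit_mixture_1d(x, k=2, iters=50):
    """Tiny EM fit of a 1-D Gaussian mixture to posterior samples x.
    (Stand-in: the disclosure's approximation uses Student-t components.)"""
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # deterministic init
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        n_k = resp.sum(axis=0)
        w = n_k / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / n_k
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k + 1e-9
    return w, mu, var
```

The fitted weights, means and variances give the parametric functional form of the posterior that the subsequent unlearning step can operate on.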
- the first node 111 obtains the first machine learning model by performing an unlearning procedure of the second machine learning model.
- the first node 111 performs the unlearning procedure by reducing the effect of the determined first data samples from the second machine learning model over a second threshold.
- the unlearning procedure may comprise erasing the determined first data samples from the second data to yield third data.
- the first node 111 may be enabled to remove the effect of unwanted, noisy, biased, erroneous, and privacy-sensitive samples from a fully trained second machine learning model while not requiring retraining the second machine learning model.
- as the second machine learning model may be understood to not need retraining, but rather only maintenance, this may be understood to provide potential energy savings.
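- For a conjugate model, the idea of erasing the effect of the first data samples without retraining can be made exact. The sketch below, an illustration of the principle rather than the disclosure's general procedure, considers the posterior over a Gaussian mean with known noise variance; unlearning subtracts the forgotten samples' contribution to the natural parameters of the posterior:

```python
import numpy as np

def gaussian_posterior(prior_mean, prior_var, data, noise_var):
    """Conjugate posterior over a Gaussian mean with known noise variance."""
    precision = 1.0 / prior_var + len(data) / noise_var
    mean = (prior_mean / prior_var + np.sum(data) / noise_var) / precision
    return mean, 1.0 / precision

def unlearn_gaussian(post_mean, post_var, forget, noise_var):
    """Erase the forgotten samples' effect by subtracting their contribution
    to the natural parameters, with no retraining on the retained data."""
    precision = 1.0 / post_var - len(forget) / noise_var
    mean = (post_mean / post_var - np.sum(forget) / noise_var) / precision
    return mean, 1.0 / precision
```

In this conjugate setting, the unlearned posterior coincides exactly with the posterior one would obtain by retraining on the retained data only, which is what makes the shortcut worthwhile for expensive models.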
- Context-based dynamic handover management use case in O-RAN: another O-RAN use case may be about context-based handover (HO) management.
- This use case may preferably necessitate datasets from different entities and domains, such as historical traffic/navigation data, e.g., road conditions, radio/HO data, e.g., about neighboring cells and their network traffic load, as well as dataset from a UE, e.g., received signal strength, vehicle speed, to enable better handover.
- a particular entity may stop sharing data, e.g., due to changed regulations or technical link failure. In that case, removal of dependencies from the model may need to be performed.
- Figure 7 is a schematic block diagram depicting a non-limiting example of aspects of a method performed by the first node 111 according to embodiments herein. Particularly, Figure
- Figure 8 is a signalling diagram depicting a non-limiting example of the method performed by the first node 111, according to embodiments herein.
- the first node 111 is indicated as an “apparatus node” and the second node 112 is indicated as a “target node”.
- the first node 111 may be implemented centralized or distributed as shown in the signalling diagram.
- the first node 111 may, according to Action 401, receive the request from the second node 112 for a machine learning model to make predictions in the first domain. That is, for the first machine learning model as, in this case, the model of the first domain.
- Figure 10 is a signalling diagram depicting a non-limiting example of a method performed by the first node 111 in an embodiment wherein the computer system 100 may comprise the plurality of third nodes 113, and wherein each of the first node 111, the second node 112 and the third nodes 113 may be NWDAFs.
- the plurality of third nodes 113 may comprise N third nodes 113.
- a first third node 113 denoted NWDAF Source 1
- a second third node 113 denoted NWDAF Source 2
- an Nth third node 113 denoted NWDAF Source N.
- the first node 111 may request additional data specifications from the plurality of third nodes 113, that is, those potential source NWDAFs where a model or plurality of models may be, or may have been, trained historically.
- the first node 111 may then, according to Action 403, obtain the respective information characterizing the respective second data, that is, the source data specifications, from each of the third nodes 113.
- the first node 111, in accordance with Action 404, may perform an assessment and, in accordance with Action 405, may decide which source model may be useful for the target domain and ask the corresponding NWDAF source to send the model and the data, or representation, that may not be desired to be represented in the target model.
- the second node 112 may send an indication of the accuracy of the first machine learning model to the first node 111, and at 15, in accordance with Action 411, the accuracy of the target model may be recorded at the first node 111 along with its specifications for similar future requests.
- Embodiments herein may be understood to not be specific to a particular use case; potentially many Open RAN (O-RAN) use cases, e.g., QoE optimization, context-based dynamic handover management for Vehicle-to-everything (V2X), etc., may host the embodiments herein.
- O-RAN Open RAN
- embodiments herein may involve an extension of the existing O-RAN flow diagram described in Figure 4.4.3-1 in the O-RAN Work Group 1, Use Cases and Overall Architecture, Use Cases Detailed Specifications, Technical Specification.
- the first node 111 may be comprised in a non-RT RIC, and co-localized with the second node 112.
- the Non-RT RIC may be comprised, along with a collector 1101 in a Service Management and Orchestration system 1102.
- the computer system 100 may further comprise an O-RAN 1103 comprising a Near-RT RIC 1104 and an Open Centralized Unit/Open Distributed Unit (O-CU/O-DU) 1105.
- the computer system 100 may further comprise the plurality of third nodes 113 as external application servers 1106: Application Server #1, Application Server #2 and Application Server #N. Starting in panel a), at 1, the O-CU/O-DU 1105 may trigger data collection from the collector 1101.
- the collector 1101 may trigger retrieval of collected data from the Non-RT RIC 111, 112. This may in turn trigger the data retrieval of application data by the non-RT RIC 111, 112 from the Application Server #1 at 3, the Application Server #2 at 4 and the Application Server #N at 5.
- the Non-RT RIC 111, 112 may trigger an ML workflow by training ML models.
- the Non-RT RIC 111, 112 may deploy internal ML models.
- the Non-RT RIC 111, 112 may deploy AI/ML models at the Near RT-RIC 1104.
- a performance evaluation and optimization phase may be triggered.
- Such benefit may be understood to be particularly advantageous in large language models (LLMs). Such models may be understood to be very expensive to train. Hence, it may be understood to be advantageous to unlearn from an LLM instead of learning from scratch, or retraining.
- LLMs large language models
- the second machine learning model prior to the unlearning procedure, may be configured to be a non-probabilistic model
- the first node 111 may be further configured to determine the second data samples from the posterior function of the second machine learning model using the non-parametric sampling procedure.
- the determining of the parametric approximation of the posterior function may be configured to use the parametric mixture model and the second data samples configured to be determined.
- the first node 111 may be further configured to store the second indication.
- the radio circuitry 1207 may be configured to set up and maintain at least a wireless connection with the second node 112, the third node, the radio network node 130, the device 140, and/or another structure in the computer system 100. Circuitry may be understood herein as a hardware component.
Abstract
A method performed by a first node (111). The first node (111) obtains (402) one or more criteria to be met by first data to be used as input to obtain a first ML model to make predictions in a first domain. A second ML model has already been trained with second data from a second domain to make predictions in the second domain. The first node (111) obtains (403) information characterizing the second data and determines (406) first data samples in the second data lacking a fulfilment of the one or more criteria exceeding a first threshold. The first data samples have an effect on the second ML model. The first node (111) obtains (409) the first ML model by performing unlearning of the second ML model by reducing the effect of the first data samples. The first node (111) then provides (410) an indication indicating the first ML model.
Description
FIRST NODE AND METHODS PERFORMED THEREBY FOR HANDLING A FIRST MACHINE LEARNING MODEL
TECHNICAL FIELD
The present disclosure relates generally to a first node and methods performed thereby for handling a first machine learning model. The present disclosure also relates generally to a computer program and a computer-readable storage medium, having stored thereon the computer program to carry out this method.
BACKGROUND
Computer systems in a communications network or communications system may comprise one or more nodes. A node may comprise one or more processors which, together with computer program code may perform different functions and actions, a memory, a receiving port, and a sending port. A node may be, for example, a server. Nodes may perform their functions entirely on the cloud.
Computer systems may be comprised in a telecommunications network. The telecommunications network may cover a geographical area which may be divided into cell areas, each cell area being served by a type of node, a network node in the Radio Access Network (RAN), radio network node or Transmission Point (TP), for example, an access node such as a Base Station (BS), e.g., a Radio Base Station (RBS), which sometimes may be referred to as e.g., gNB, evolved Node B (“eNB”), “eNodeB”, “NodeB”, “B node”, or Base Transceiver Station (BTS), depending on the technology and terminology used. The base stations may be of different classes such as e.g., Wide Area Base Stations, Medium Range Base Stations, Local Area Base Stations and Home Base Stations, based on transmission power and thereby also cell size. A cell may be understood to be the geographical area where radio coverage may be provided by the base station at a base station site. One base station, situated on the base station site, may serve one or several cells. Further, each base station may support one or several communication technologies. The telecommunications network may also comprise network nodes which may serve receiving nodes, such as user equipments, with serving beams.
The standardization organization Third Generation Partnership Project (3GPP) is currently in the process of specifying a New Radio Interface called Next Generation Radio or New Radio (NR), as well as a Fifth Generation (5G) Packet Core Network, which may be referred to as 5G Core Network (5GC). The advantages of 5G NR may include higher bandwidth, more resources, low latency and network slicing. 5G may provide services to various applications, such as enhanced Mobile Broad Band (eMBB), massive Machine Type Communication (mMTC), Ultra Reliable Low Latency Communication (URLLC), etc.
5G may be understood to bring in sizeable flexibility with technological advancements along with innovations of cloud and Artificial Intelligence (AI). This may be understood to bring a whole new set of opportunities in the enterprise segment.
For many enterprises, mobile cellular technology has already proven to bring great value to their digitalization process, which may include numerous use cases, such as autonomous robotics, enhanced video services, connected vehicles, remote operations, hazard, and maintenance sensors etc. This may be understood to not only enhance productivity in connected factories, but also make workplaces safer.
In the course of operations of the telecommunications network, data may be collected via the telecommunications network, which may enable to monitor and manage different functions.
The advent of, for example, the Internet of Things (IoT) has exponentially increased the amount of data to be monitored. The availability of large amounts of data, such as those collected for example, from IoT devices, may be understood to enable the possibility of analysing such data to make predictions on events, with a high predictive power. To make predictions on events may be understood to refer to building mathematical models that may fit those data, which mathematical models may then be used to make predictions for such events. Within this context, machine learning models may be used to analyze the data collected, and enable an improved management of different types of operations via the telecommunications network.
Machine Learning
Machine learning (ML) may be understood as the study of computer algorithms that may improve automatically through experience. It is seen as a part of AI. ML algorithms may build a model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so. ML algorithms may be used in a wide variety of applications, such as email filtering and computer vision, where it may be difficult or unfeasible to develop conventional algorithms to perform the needed tasks.
There may be basically three types of ML Algorithms: Supervised Learning, Unsupervised Learning, and Reinforcement Learning (RL).
Supervised Learning algorithms may comprise a target/outcome variable, or dependent variable, which may have to be predicted from a given set of predictors, that is, independent variables. Using this set of variables, a function may be generated that may map inputs to desired outputs. The training process may continue until the model may achieve a desired level of accuracy on the training data. Once an ML model may have been trained, an inference process may begin, whereby new data may be run through the ML model to
calculate an output. Examples of Supervised Learning may be Regression, Decision Tree, Random Forest, KNN, Logistic Regression etc.
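The supervised training and inference process described above may be illustrated with the following non-limiting Python sketch of a K-Nearest Neighbours (KNN) classifier, one of the listed examples; the training samples, labels and choice of k are invented purely for illustration:

```python
# Non-limiting sketch of supervised learning with a KNN classifier.
# The data samples, the labels and k are hypothetical illustrative values.
from collections import Counter
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` from the k closest training samples."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest_labels = [y for _, y in dists[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Training data: predictors (independent variables) with a known
# target/outcome variable (dependent variable).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
train_y = ["low", "low", "low", "high", "high", "high"]

# Inference: new data is run through the trained model to calculate an output.
print(knn_predict(train_x, train_y, (0.15, 0.1)))  # -> "low"
```

During inference, a new data point near the first group of samples is mapped to the output learnt from those samples.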
In Unsupervised Learning algorithms, there may be no target or outcome variable to predict/estimate. It may be used for clustering a population into different groups, which may be widely used for segmenting customers in different groups for specific intervention. Examples of Unsupervised Learning may be K-means, mean-shift clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Expectation-Maximization (EM) Clustering using Gaussian Mixture Models (GMM), Agglomerative Hierarchical Clustering, etc.
Cluster analysis or clustering may be understood as an ML technique which may comprise grouping a set of objects in such a way that objects in the same group, which may be called a cluster, may be understood to be more similar, in some sense, to each other than to those in other groups, that is, other clusters. It may be understood as a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics and ML.
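As a non-limiting illustration of the clustering technique described above, the following Python sketch implements a minimal K-means loop; the data points, the number of clusters and the initial centroids are invented for illustration:

```python
# Non-limiting sketch of K-means clustering, one of the Unsupervised
# Learning examples above. Data and initial centroids are hypothetical.
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning each point to its nearest centroid and
    recomputing each centroid as the mean of its assigned points."""
    for _ in range(iterations):
        clusters = {i: [] for i in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*pts)) if pts else centroids[i]
            for i, pts in clusters.items()
        ]
    return centroids, clusters

# Two groups of similar objects; no target variable is involved.
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (5, 5)])
print(centroids)  # one centroid near each group of points
```

Objects in the same resulting cluster end up more similar, in the distance sense used here, to each other than to objects in the other cluster.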
Using an RL algorithm, a machine may be trained to make specific decisions. It may be understood to work as follows: the machine may be exposed to an environment where it may train itself continually using trial and error. This machine may learn from past experience and may try to capture the best possible knowledge to make accurate decisions. An example of RL may be a Markov Decision Process (MDP). The training using RL may comprise generating an ML model. To train such an ML model, an agent, given a state of the environment, may take an action in this environment and receive a reward. The action may result in a new state of the environment. This process may be repeated in a loop. Over time, the agent may learn to take actions that may result in larger immediate and future rewards, meaning that it may be understood to be in the best interest of the agent not to take the action that may only lead to the highest reward in the next state, but the action that may cumulatively lead to the highest reward in the next state and in a future number of states.
The agent may comprise a neural network which may input the state and may produce an action. There may be several ML algorithms that may be used for training the network of the agent, e.g., policy-learning based, such as actor-critic approaches, or value-based learning, such as deep-q networks.
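The agent-environment loop described above may be illustrated with the following non-limiting tabular Q-learning sketch; the chain-shaped environment, the reward scheme and all hyperparameters are hypothetical and chosen only to keep the example small (a neural-network agent, as mentioned above, would replace the table):

```python
# Non-limiting sketch of the RL loop: an agent takes an action in a state,
# receives a reward, observes the new state, and repeats. The environment
# (a chain of 5 states with a reward at the end) is invented for illustration.
import random

N_STATES = 5            # states 0..4; reaching state 4 yields the reward
ACTIONS = (-1, +1)      # move left or right along the chain

def step(state, action):
    """The environment: the action results in a new state and a reward."""
    new_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if new_state == N_STATES - 1 else 0.0
    return new_state, reward

random.seed(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(500):                      # repeated episodes of trial and error
    state = 0
    while state != N_STATES - 1:
        # epsilon-greedy: explore occasionally, otherwise exploit
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        new_state, reward = step(state, action)
        # update towards the immediate reward plus the discounted future reward
        best_next = max(q[(new_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = new_state

# The learned greedy policy: the cumulatively best action from each state.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Over the episodes, the agent learns to prefer the action that cumulatively leads to the highest reward (moving right), rather than only the reward in the next state.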
A Network Data Analytics Function (NWDAF) may be understood to be a Network Function (NF) in the 5G Core Network (5GC). The NWDAF may be understood to be designed to collect data from diverse data sources such as User Equipments (UEs), other NFs in 5GC, and Operation, Administration, and Maintenance systems (OAM), Cloud, and Edge networks. The NWDAF may exploit and process the collected data to train ML models which may be used to provide predictions and information from the past to generate analytic reports
and deliver them to an NF. Accordingly, different NFs, called service consumers, may subscribe to the NWDAF. The different NFs may benefit from the capability of the NWDAF and request information about the network state, such as slice load level-related network data, UE-related analytics, and user data congestion analytics.
Figure 1 is a schematic diagram depicting one of the problems that may be encountered. An ML model may be considered, which may be referred to as a source model 1, that may have been trained on data for solving a task. The data may have been obtained from a source domain 2 where there may be access to many data samples 3. A target domain 4 may be considered where there may be limited data samples available, such that merely using these data will not be enough for training an ML model. In such a context, it may be desired to transfer the source model 1 to the target domain 4 and enrich an ML model for the target domain 4, referred to herein as the target model, using the source model 1. A first complication of such a scenario is that the source model 1 may have been trained using data samples that may not comply with the target domain specifications. Specifications may be understood to refer to one or more criteria that data of a certain domain may need to comply with. Specifically, the source domain may contain data samples to which the target domain may not have access, or may contain data samples that may not be supportive of the use case in the target domain; for example, the data samples may correspond to a different version, may be unlabelled, etc. Therefore, the source model 1 cannot be used directly, as it may violate data privacy rights or may not be suitable for the use case in the target domain. Figure 1 schematically depicts how the source model 1 may be trained using a) data samples 5 that may not be in compliance with the target domain specifications, b) data samples 6 that may not comply with the source specifications, and c) data samples 7 that may comply with both the source and target data specifications.
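The three groups of data samples depicted in Figure 1 may be illustrated with the following non-limiting Python sketch, in which the source and target specifications are represented as simple invented predicates over invented sample attributes:

```python
# Non-limiting sketch of partitioning data samples by compliance with the
# source and target domain specifications, as in groups a), b), c) of
# Figure 1. The predicates and sample attributes are hypothetical.
source_spec = lambda s: s["version"] >= 1                    # source criterion
target_spec = lambda s: s["labelled"] and not s["private"]   # target criteria

samples = [
    {"version": 2, "labelled": True,  "private": False},  # complies with both
    {"version": 2, "labelled": False, "private": False},  # unlabelled
    {"version": 0, "labelled": True,  "private": False},  # wrong version
    {"version": 2, "labelled": True,  "private": True},   # privacy-sensitive
]

both       = [s for s in samples if source_spec(s) and target_spec(s)]
not_target = [s for s in samples if source_spec(s) and not target_spec(s)]
not_source = [s for s in samples if not source_spec(s)]

print(len(both), len(not_target), len(not_source))  # -> 1 2 1
```

In terms of Figure 1, `not_target` corresponds to the data samples 5, `not_source` to the data samples 6, and `both` to the data samples 7.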
Figure 2 is a schematic diagram depicting the existing technologies to approach this problem, sample selection and source model selection. According to a first approach schematically depicted in panel a), which may be referred to as sample selection, the samples that do not meet the requirements of the target domain may be removed and the source model may be trained to be personalized for the target domain. According to a second approach schematically depicted in panel b), which may be referred to as source model selection, a source model may be chosen from a dictionary of the source models that may best correspond to the target specifications.
Machine learning according to the above described existing methods may result in waste of time and computing resources, as well as being costly.
SUMMARY
As part of the development of embodiments herein, one or more problems with the existing technology will first be identified and discussed.
There are a couple of problems with the first approach depicted in panel a) of Figure 2. First, retraining a new source model will take time and requires availability of sufficient computational resources at the compute node, as the source model may typically be trained on many samples. Second, it requires availability of all data samples at the compute node or, alternatively, in a cloud implementation, it requires back-and-forth communication between the compute node and the data node from where the data may be obtained.
There are two main problems with the second approach depicted in panel b) of Figure 2. First, it requires maintaining different source models that are trained on data samples with different specifications. The life cycle management of many models can become costly. Second, the chosen source model from the dictionary of source models may still be only partly relevant to the target domain. Also, source models having been trained on data samples that do not comply with the target specifications may be found not suitable for use. According to a third approach, which may be referred to as a target-agnostic model and which is not schematically depicted in Figure 2, a target-agnostic model may be built, that is, a model so generic that it may be suitable for a large class of use cases in the target domain. In practice, however, such models may only be suitable for use cases in the target domain that may be somewhat related to the source domain. Finally, this approach is not applicable when some of the data used in the training of the target-agnostic model contained privacy-sensitive information or data samples that were erroneous.
A second complication may be that the source model may have been trained using a dataset that contains erroneous or biased samples which were not known at the time of training. The existing approach is to remove the samples and retrain the source model from scratch. However, retraining of the source model may be costly, in particular where the data size is large.
According to the foregoing, it is an object of embodiments herein to improve the handling of machine learning models in a computer system.
According to a first aspect of embodiments herein, the object is achieved by a computer- implemented method, performed by a first node. The method is for handling a first machine learning model. The first node operates in a computer system. The first node obtains one or more criteria to be met by first data to be used as input to obtain a first machine learning model to make predictions in a first domain. A second machine learning model has already been trained with second data from a second domain to make predictions in the second domain. The first node obtains information characterizing the second data. The first node then determines first data samples in the second data lacking a level of fulfilment of the one or
more criteria exceeding a first threshold. The determined first data samples have an effect on the second machine learning model. The first node obtains the first machine learning model by performing an unlearning procedure of the second machine learning model. The first node performs the unlearning procedure by reducing the effect of the determined first data samples from the second machine learning model over a second threshold. The first node then provides an indication indicating the obtained first machine learning model.
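The principle of the unlearning procedure, i.e., removing the effect of the determined first data samples without retraining on the remaining data, may be sketched in Python for the special case of a model whose parameters can be updated in closed form, here a simple mean-based model; real unlearning procedures for, e.g., neural networks are considerably more involved, and all data values below are invented:

```python
# Non-limiting sketch of unlearning for a model whose parameters admit a
# closed-form update: a mean (centroid) model. Removing a sample's effect
# then reduces to subtracting its contribution, without retraining.
# The data values are hypothetical illustrative numbers.

def train(samples):
    """'Train' the second ML model: here, the mean of the second data."""
    n = len(samples)
    return sum(samples) / n, n

def unlearn(model, n, samples_to_forget):
    """Reduce the effect of the identified first data samples on the model."""
    total = model * n
    for s in samples_to_forget:
        total -= s
        n -= 1
    return total / n, n

second_data = [1.0, 2.0, 3.0, 100.0]         # 100.0 fails the criteria
model, n = train(second_data)                # second ML model: mean = 26.5
first_model, n = unlearn(model, n, [100.0])  # first ML model: mean = 2.0
print(first_model)  # -> 2.0
```

The adapted model is obtained without revisiting the retained data samples, which mirrors the stated advantage of avoiding retraining from scratch.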
According to a second aspect of embodiments herein, the object is achieved by the first node. The first node may be understood to be for handling the first machine learning model. The first node is configured to operate in the computer system. The first node is configured to obtain the one or more criteria to be met by the first data to be used as input to obtain the first machine learning model to make predictions in the first domain. The second machine learning model has already been trained with the second data from the second domain to make predictions in the second domain. The first node is also configured to obtain the information configured to characterize the second data. The first node is further configured to determine the first data samples in the second data lacking the level of fulfilment of the one or more criteria exceeding the first threshold. The first data samples configured to be determined have the effect on the second machine learning model. The first node is additionally configured to obtain the first machine learning model by performing the unlearning procedure of the second machine learning model by reducing the effect of the determined first data samples from the second machine learning model over the second threshold. The first node is further configured to provide the indication configured to indicate the first machine learning model configured to be obtained.
According to a third aspect of embodiments herein, the object is achieved by a computer-readable storage medium, having stored thereon a computer program comprising instructions which, when executed on at least one processing circuitry, cause the at least one processing circuitry to carry out the method performed by the first node.
According to a fourth aspect of embodiments herein, the object is achieved by a computer program, comprising instructions which, when executed on at least one processing circuitry, cause the at least one processing circuitry to carry out the method performed by the first node.
By obtaining the one or more criteria to be met by first data to be used as input to obtain the first machine learning model to make predictions in the first domain, the first node may then be enabled to check if there may be a suitable machine learning model that may have already been trained with data that may sufficiently comply with the one or more criteria, and that may therefore be adapted to obtain the first machine learning model, instead of having to train a new model from scratch.
By obtaining the information characterizing the second data, the first node may then be enabled to determine whether the second machine learning model may be suitable or not as a starting point of adaptation in order to obtain the first machine learning model to make predictions in the first domain.
By determining the first data samples in the second data lacking the level of fulfilment of the one or more criteria exceeding the first threshold, the first node may be enabled to identify the, e.g., unwanted, noisy, biased, erroneous, and privacy-sensitive samples from the fully trained second machine learning model. The first node may then be enabled to remove the effect of such samples from the second machine learning model while not requiring retraining the second machine learning model by performing an unlearning procedure of the second machine learning model in order to minimize the effect that the first data samples may have had on the second machine learning model. That is, in order to “erase” the effect they may have had on the second machine learning model.
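The determination of the first data samples may be illustrated with the following non-limiting Python sketch, in which each sample is scored by its level of fulfilment of the one or more criteria and compared against a first threshold; the criteria, samples and threshold value are invented for illustration:

```python
# Non-limiting sketch of selecting the first data samples: score each
# sample's level of fulfilment of the one or more criteria and keep the
# samples whose fulfilment falls below a first threshold. The criteria,
# the samples and the threshold are hypothetical.
criteria = [
    lambda s: s["labelled"],          # e.g., must be labelled
    lambda s: not s["private"],       # e.g., must not be privacy-sensitive
    lambda s: s["version"] >= 2,      # e.g., must match a data version
]

def fulfilment(sample):
    """Fraction of the one or more criteria that the sample meets."""
    return sum(c(sample) for c in criteria) / len(criteria)

second_data = [
    {"labelled": True,  "private": False, "version": 2},  # fulfilment 1.0
    {"labelled": False, "private": True,  "version": 1},  # fulfilment 0.0
    {"labelled": True,  "private": False, "version": 1},  # fulfilment ~0.67
]

FIRST_THRESHOLD = 0.5
first_data_samples = [s for s in second_data
                      if fulfilment(s) < FIRST_THRESHOLD]
print(len(first_data_samples))  # -> 1, the sample failing all criteria
```

The selected samples are then the candidates whose effect is reduced from the second machine learning model by the unlearning procedure.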
By obtaining the first machine learning model by performing the unlearning procedure of the second machine learning model, the first node may then be enabled to customize the second machine learning model according to the one or more criteria of the first domain via the model unlearning process. The first node may then be enabled to provide the first machine learning model as an adaptive on-demand customized model. The first node may be enabled to obtain the on-demand customized model in a shorter amount of time than it would otherwise take to retrain the second machine learning model, or to train the first machine learning model from scratch, and hence, without requiring availability of high computational resources. Further, the first node may be enabled to obtain the first machine learning model without requiring availability of all data samples. Hence, back-and-forth communication between the first node and a data source may not be required.
The customized model may be understood to not need to be maintained. As such, a minimal number of source models may need to be maintained and continuously evolved. Hence the expensive storage requirements and maintenance efforts otherwise existing due to high number of models may be advantageously reduced.
As the second machine learning model may be understood to not need retraining, but rather may only need to be maintained, this may be understood to provide a potential energy saving.
A particular advantage may be that the first node may enable requirements to be addressed, e.g., those of the European Union (EU) directive on AI, according to which the right of private persons to have their data removed from databases and systems, e.g., including ML models, may need to be complied with.
The first node may also be further enabled to send the customized model, now without the unwanted representation of the source node.
By providing the indication, the first node may enable the usage of the first machine learning model to, for example, make predictions in the first domain.
BRIEF DESCRIPTION OF THE DRAWINGS
Examples of embodiments herein are described in more detail with reference to the accompanying drawings, according to the following description.
Figure 1 is a schematic diagram illustrating an example of a source model trained using samples that may not be in compliance with the target domain specifications, according to existing methods.
Figure 2 is a schematic diagram illustrating a conceptual visualization of the existing solutions for sample selection and source model selection.
Figure 3 is a schematic diagram illustrating two non-limiting examples, in panels a) and b), of a computer system, according to embodiments herein.
Figure 4 is a flowchart depicting a method in a first node, according to embodiments herein.
Figure 5 is a schematic diagram depicting components of the first node, according to a non-limiting example of embodiments herein.
Figure 6 is a schematic diagram depicting particular aspects of another non-limiting example of the method performed by the first node, according to embodiments herein.
Figure 7 is a schematic diagram depicting a non-limiting example of other particular aspects of another non-limiting example of the method performed by the first node, according to embodiments herein.
Figure 8 is a signalling diagram depicting aspects of a non-limiting example of the method performed by the first node, according to embodiments herein.
Figure 9 is a schematic diagram depicting a non-limiting example of other particular aspects of another non-limiting example of the method performed by the first node, according to embodiments herein.
Figure 10 is a signalling diagram depicting aspects of a non-limiting example of the method performed by the first node, according to embodiments herein.
Figure 11 is a signalling diagram depicting other aspects of a non-limiting example of the method performed by the first node, according to embodiments herein.
Figure 12 is a schematic block diagram illustrating an embodiment of a first node, according to embodiments herein.
DETAILED DESCRIPTION
Certain aspects of the present disclosure and their embodiments address the challenges identified in the Background and Summary sections with the existing methods and provide solutions to the challenges discussed.
Embodiments herein may be understood to relate to a method and apparatus for creating adaptive on-demand customized models from a source model. As a generalized overview, the method may comprise the apparatus, a first node, receiving a request for a customized ML model for a specific target domain. The first node may receive data specifications, or guidelines, from the source domain and the target domain. Next, the first node may determine the out-of-policy data samples and/or irrelevant data samples, which may be understood to be data samples in the source domain that may not meet the specifications of the target domain. The first node may then execute a machine unlearning method that may allow removal of the effect of the out-of-policy and/or irrelevant data samples from the source model and hence construct an adapted version of the source model that may be customized for the needs of the target domain, with the objective that the customized model may follow the data guidelines and specifications from the source and target domains. The new model for the target domain may be communicated to the requesting entity.
Embodiments herein may be understood to relate to how to maintain a minimal number of source models and how to make them customized to a target domain on demand.
Some of the embodiments contemplated will now be described more fully hereinafter with reference to the accompanying drawings, in which examples are shown. In this section, the embodiments herein will be illustrated in more detail by a number of exemplary embodiments. Other embodiments, however, are contained within the scope of the subject matter disclosed herein. The disclosed subject matter should not be construed as limited to only the embodiments set forth herein; rather, these embodiments are provided by way of example to convey the scope of the subject matter to those skilled in the art. It should be noted that the exemplary embodiments herein are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments.
Several embodiments and examples are comprised herein. It should be noted that the embodiments and/or examples herein are not mutually exclusive. Components from one embodiment or example may be tacitly assumed to be present in another embodiment or example and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments and/or examples.
Figure 3 depicts two non-limiting examples, in panels “a” and “b”, respectively, of a computer system 100, in which embodiments herein may be implemented. In some example
implementations, such as that depicted in the non-limiting example of Figure 3a, the computer system 100 may be a computer network. In other example implementations, such as that depicted in the non-limiting example of Figure 3b, the computer system 100 may be implemented in a telecommunications system, sometimes also referred to as a telecommunications network, cellular radio system, cellular network, or wireless communications system. In some examples, the telecommunications system may comprise network nodes which may serve receiving nodes, such as wireless devices. The computer system 100 may for example be a network such as a 5G system, or a newer system supporting similar functionality. The telecommunications system may additionally support other technologies such as, for example, Long-Term Evolution (LTE), e.g., LTE Frequency Division Duplex (FDD), LTE Time Division Duplex (TDD), LTE Half-Duplex Frequency Division Duplex (HD-FDD), or LTE operating in an unlicensed band. The telecommunications system may also support yet other technologies, such as Wideband Code Division Multiple Access (WCDMA), Universal Mobile Telecommunications System Terrestrial Radio Access (UTRA) TDD, Global System for Mobile communications (GSM) network, GSM/Enhanced Data Rate for GSM Evolution (EDGE) Radio Access Network (GERAN) network, Ultra-Mobile Broadband (UMB), EDGE, any combination of Radio Access Technologies (RATs) such as e.g., Multi-Standard Radio (MSR) base stations, multi-RAT base stations etc., any 3rd Generation Partnership Project (3GPP) cellular network, Wireless Local Area Network/s (WLAN) or WiFi network/s, Worldwide Interoperability for Microwave Access (WiMax), IEEE 802.15.4-based low-power short-range networks such as IPv6 over Low-Power Wireless Personal Area Networks (6LoWPAN), Zigbee, Z-Wave, Bluetooth Low Energy (BLE), or any cellular network or system. The telecommunications system may for example support a Low Power Wide Area Network (LPWAN).
LPWAN technologies may comprise Long Range physical layer protocol (LoRa), Haystack, SigFox, LTE-M, and Narrow-Band IoT (NB-IoT).
The computer system 100 comprises a first node 111. The computer system 100 may comprise further nodes. In some embodiments, the computer system 100 may comprise a second node 112. In yet other embodiments, as depicted in the non-limiting examples of Figure 3, the computer system 100 may alternatively or additionally comprise a plurality of third nodes 113. The computer system 100 may comprise additional nodes.
Any of the first node 111, the second node 112 and the plurality of third nodes 113 may be understood, respectively, as a first computer system or server, a second computer system or server and a plurality of third computer systems or servers. Any of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113 may be implemented as a standalone server in e.g., a host computer in the cloud 115, as depicted in the non-limiting example of Figure 3b) for the first node 111, the second node 112 and the plurality of third nodes 113. In other examples, any of the first node 111, the second node 112 and the
third nodes in the plurality of third nodes 113 may be a distributed node or distributed server, such as a virtual node in the cloud 115, and may perform some of its respective functions locally, e.g., by a client manager, and some of its functions in the cloud 115, by e.g., a server manager. In other examples, any of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113 may perform its functions entirely on the cloud 115, or partially, in collaboration or collocated with a radio network node. Yet in other examples, any of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113 may also be implemented as processing resources in a server farm.
Yet in other examples, any of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113 may also be implemented as virtual network functions, e.g., according to a Network Functions Virtualization (NFV) Architecture.
Any of the first node 111, the second node 112 and the plurality of third nodes 113 may be under the ownership or control of a service provider, or may be operated by the service provider or on behalf of the service provider.
In some examples, which are not depicted in Figure 3, any of the first node 111, the second node 112 and any of the third nodes in the plurality of third nodes 113 may be co-located or be the same node. However, in typical embodiments, the first node 111, the second node 112 and the plurality of third nodes 113 may be different nodes.
Any of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113 may have a capability to perform machine-implemented learning procedures, which may also be referred to as "machine learning" (ML).
In some non-limiting examples, any of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113 may have a capability to manage an artificial neural network. The artificial neural network may be understood as a machine learning framework, which may comprise a collection of connected nodes, where in each node, or perceptron, there may be an elementary decision unit. Each such node may have one or more inputs and an output. The input to a node may be from the output of another node or from a data source. Each of the nodes and connections may have certain weights or parameters associated with it. In order to solve a decision task, the weights may be learnt or optimized over a data set which may be representative of the decision task. The most commonly used node may have each input separately weighted, and the sum may be passed through a non-linear function which may be known as an activation function. The nature of the connections and the node may determine the type of the neural network, for example a feedforward network, recurrent neural network, etc. That any of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113 may have the capability to manage the artificial neural network may be understood herein as having the capability to
store the training data set and the models that may result from the machine learning, to train a new model, and once the model may have been trained, to use this model for prediction.
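The perceptron described above, in which each input is separately weighted and the sum is passed through a non-linear activation function, may be sketched in Python as follows; the weights, bias and inputs are arbitrary illustrative values:

```python
# Non-limiting sketch of a single node (perceptron): each input is
# separately weighted, and the sum is passed through an activation
# function. Weights, bias and inputs are hypothetical values.
import math

def perceptron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    total = bias + sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation function

out = perceptron([1.0, 0.5], [0.4, -0.2])
print(out)  # a single node's output, squashed into (0, 1)
```

A network is then formed by feeding such outputs into further nodes; the pattern of connections determines whether the network is, e.g., feedforward or recurrent.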
Any of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113 may, for example, support running Python/Java with TensorFlow, PyTorch, Theano, etc. Any of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113 may also have GPU capabilities.
Any of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113 may, in some examples, be an operator-managed network analytics logical function, that is, a node that may have a capability to handle data collection and analysis from different sources in the computer system 100. Any of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113 may interact with different entities for different purposes, such as: data collection provided by, e.g., Access and Mobility Function (AMF), Session Management Function (SMF), Policy Control Function (PCF), Unified Data Management Function (UDM), Application Function (AF), based on event subscription, directly or via a Network Exposure Function (NEF), and Operations And Management (OAM); retrieval of information from data repositories, e.g., Unified Data Repository (UDR) via UDM for subscriber-related information; retrieval of information about NFs, e.g., NRF for NF-related information, and Network Slice Selection Function (NSSF) for slice-related information; on-demand provision of analytics to consumers; and storage of two types of data in an Analytics Data Repository Function (ADRF): collected data, e.g., Event Exposure data, and analytics reports.
As depicted in Figure 3, a non-limiting example of the first node 111, the second node 112 and the third nodes in the plurality of third nodes 113, wherein the computer system 100 may be a 5G network, may be a respective NWDAF. In particular examples, any of the first node 111, the second node 112 and the plurality of third nodes 113 may be different Model Training Logical Functions (MTLFs), where they may hold data in different regional NWDAFs. In particular examples of embodiments herein, the first node 111 may be implemented at an ML life cycle management (LCM) unit, located for example in a 3GPP NWDAF.
The first node 111 may be a node having a capability to perform an unlearning procedure, as will be described later, e.g., in relation to Action 409.
The second node 112 may be a node which may comprise data in a target domain where it may be interested in making predictions, but which may lack a machine learning model to make such predictions.
The plurality of third nodes 113 may be nodes which may be respectively training, or which may have trained, respective machine learning models, in respective source domains. The respective source domains may or may not be the same domain.
The respective domains may be understood to be different, at least partially, from the target domain where the second node 112 may have to make predictions.
The computer system 100 may in some examples, comprise one or more radio network nodes, such as radio network node 130, depicted in Figure 3 b). The radio network node 130 may be, e.g., comprised in a Radio Access Network of the telecommunications system. That is, the radio network node 130 may be a transmission point such as a radio base station, for example a gNB, an eNB, or any other network node with similar features capable of serving a wireless device, such as a user equipment or a machine type communication device, in the computer system 100. In typical examples, the radio network node 130 may be a base station, such as a gNB or an eNB. In other examples, the radio network node 130 may be a distributed node, such as a virtual node in the cloud 115, and may perform its functions entirely on the cloud 115, or partially, in collaboration with a radio network node.
The telecommunications system may cover a geographical area, which in some embodiments may be divided into cell areas, wherein each cell area may be served by a radio network node 130, although, one radio network node 130 may serve one or several cells. In the example of Figure 3, the cells are not depicted to simplify the figure. The radio network node 130 may be of different classes, such as, e.g., macro eNodeB, home eNodeB or pico base station, based on transmission power and thereby also cell size. In some examples, the radio network node 130 may serve receiving nodes with serving beams. The radio network node 130 may be directly connected to one or more core networks.
Any of the first node 111 and the second node 112, and/or any of the nodes comprised in the computer system 100 may support one or several communication technologies, and its name may depend on the technology and terminology used.
A device 140 may be comprised in the telecommunications network. The device 140 comprised in the computer system 100 may be a wireless communication device such as a 5G UE, or a UE, which may also be known as, e.g., a mobile terminal, wireless terminal and/or mobile station, a Customer Premises Equipment (CPE), a mobile telephone, cellular telephone, or laptop with wireless capability, just to mention some further examples. The device 140 comprised in the telecommunications system may be, for example, portable, pocket-storable, hand-held, computer-comprised, or a vehicle-mounted mobile device, enabled to communicate voice and/or data, via the RAN, with another entity, such as a server, a laptop, a Personal Digital Assistant (PDA), or a tablet, a Machine-to-Machine (M2M) device, a device equipped with a wireless interface, such as a printer or a file storage device, a modem, a sensor, an IoT device, or any other radio network unit capable of communicating over a radio link in a communications system. In typical examples, the device 140 may be, or comprise, a microphone. The device 140 comprised in the telecommunications system may be enabled to communicate wirelessly in the telecommunications system. The communication may be performed, e.g., via a RAN, and possibly the one or more core networks, which may be comprised within the telecommunications system.
It may be understood that the telecommunications network may comprise additional radio network nodes 130 and/or additional devices 140.
The first node 111 may be configured to communicate within the computer system 100 with the second node 112 over a first link 141, e.g., a radio link, or a wired link. The first node 111 may be configured to communicate within the computer system 100 with the plurality of third nodes 113 over a respective second link 142, e.g., a radio link, or a wired link. The first node 111 may be configured to communicate within the computer system 100 with the radio network node 130 over a third link 143, e.g., a radio link, or a wired link. The second node 112 may be configured to communicate within the computer system 100 with the radio network node 130 over a fourth link 144, e.g., a radio link, or a wired link. The plurality of third nodes 113 may be configured to communicate within the computer system 100 with the radio network node 130 over a respective fifth link 145, e.g., a radio link, or a wired link. The radio network node 130 may be configured to communicate within the computer system 100 with the device 140 over a sixth link 146, e.g., a radio link.
Any of the first link 141, the respective second link 142, the third link 143, the fourth link 144, the respective fifth link 145 and the sixth link 146 may be a direct link or may be comprised of a plurality of individual links, wherein it may go via one or more computer systems or one or more core networks in the computer system 100, which are not depicted in Figure 3, or it may go via an optional intermediate network. The intermediate network may be one of, or a combination of more than one of, a public, private or hosted network; the intermediate network, if any, may be a backbone network or the Internet; in particular, the intermediate network may comprise two or more sub-networks, which is not shown in Figure 3.
In general, the usage of “first”, “second”, “third”, “fourth”, “fifth” and/or “sixth” herein may be understood to be an arbitrary way to denote different elements or entities, and may be understood to not confer a cumulative or chronological character to the nouns they modify.
Some of the embodiments contemplated herein will now be described more fully with reference to the accompanying drawings. Other embodiments, however, are contained within the scope of the subject matter disclosed herein; the disclosed subject matter should not be construed as limited to only the embodiments set forth herein; rather, these embodiments are provided by way of example to convey the scope of the subject matter to those skilled in the art.
Embodiments of a computer-implemented method, performed by the first node 111, will now be described with reference to the flowchart depicted in Figure 4. The method is for handling a first machine learning model. The first node 111 operates in the computer system 100.
Several embodiments are comprised herein. In some embodiments all the actions may be performed. In some embodiments, one or more actions may be optional. In Figure 4, optional actions are indicated with dashed lines. It should be noted that the examples herein are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description.
In some embodiments, the computer system 100 may be a 5G system. In some of such embodiments, the first node 111 may be an NWDAF.
Action 401
In this Action 401, the first node 111 may receive, from the second node 112 operating in the computer system 100, a first indication. The first indication may indicate a request for a first machine learning model to make predictions in a first domain.
A domain may be understood as an environment where a machine learning model may be being trained or may have been trained. As used herein, domain may be understood to refer to a data domain, wherein the data domain may have unique characteristics, such as, e.g., data distribution, data features, system features, environment properties, data protection rights, etc. These unique characteristics may differ between data domains. In embodiments herein, the first domain may be understood to be a target domain. That is, a domain where there may be an interest, e.g., by the second node 112, in making predictions.
The receiving in this Action 401 may be performed, e.g., via the first link 141.
In some embodiments wherein the computer system 100 may be a 5G system, the second node 112 may be an NWDAF. In some of such embodiments, the first node 111 may be a first NWDAF and the second node 112 may be a second NWDAF.
The second node 112 may be also referred to as a “target node”, which may be understood to refer to a node from the second, target, domain, which may need the first machine learning model in order to make the predictions in the first domain. The second node 112 may have data, but not the model, to make the desired predictions. In some non-limiting examples wherein the second node 112 may be an NWDAF, the second node 112 may ask yet another node, e.g., a Network Repository Function (NRF), operating in the computer system 100, for a discovery of available NWDAFs before sending the first indication to the first node 111. The NRF may indicate to the second node 112 the availability of the first node 111, which may then trigger the second node 112 to send the first indication to the first node 111 in this Action 401.
By receiving the first indication from the second node 112 in this Action 401, the first node 111 may enable the second node 112 to ultimately obtain a predictive model, the first machine learning model, to make predictions in the first domain, without requiring that the second node 112 may need to train and/or maintain the first machine learning model itself.
Action 402
In this Action 402, the first node 111 obtains one or more criteria to be met by first data to be used as input to obtain the first machine learning model to make predictions in the first domain.
Obtaining in this Action 402 may comprise receiving, e.g., from the second node 112, e.g., via the first link 141.
The first data may be also referred to as target data. Target may be understood to refer herein to where the predictions of the first machine learning model may be desired or needed to be made. In this document, Dt may be used to denote the data from the first domain, that is, the target domain.
The one or more criteria may be understood to be specifications of the first domain. The one or more criteria may therefore also be referred to as target specifications. The one or more criteria may be understood to include meta information about the first domain, such as: type of environment, version, feature attributes, data access rights on the sample level, data access rights on the distribution level, data quality metrics per sample, etc. As a non-limiting illustrative example, the type of environment may be a base station located in a city-centre area; the version may be LTE; the feature attributes may be analytics data related to performance, configuration and location of base stations and also the core network; the data access rights may be that analytics data related to performance, configuration and location of base stations and also the core network related to UE location, behavior and mobility may be restricted; and the data quality may be that analytics data related to performance, configuration and location of base stations and also the core network collected during weekend time may need to be discarded.
In this document, ZDt may be used to denote the specifications of the first data in the first domain, that is, the target domain.
In some non-limiting examples, the obtaining in this Action 402 of the one or more criteria may be performed by a first module, referred to herein as a “target specifications” module, comprised in the first node 111.
In some embodiments, the obtaining in this Action 402 of one or more criteria may be responsive to the received first indication.
In some embodiments, the first node 111 , e.g., via the target specification module, may fetch the latest updates of the one or more criteria regularly.
By obtaining the one or more criteria to be met by first data to be used as input to obtain the first machine learning model to make predictions in the first domain, the first node 111 may then be enabled to check if there may be a suitable machine learning model that may have already been trained with data that may sufficiently comply with the one or more criteria, and that may therefore be adapted to obtain the first machine learning model, instead of having to train a new model from scratch.
Action 403
According to embodiments herein, a second machine learning model has already been trained with second data from a second domain to make predictions in the second domain. The second domain may be referred to herein as a source domain, as it may be understood to be the domain wherein the second machine learning model may have been trained. The second data may be also referred to herein as the source data. The second machine learning model may be referred to as the source model. The second machine learning model may be understood to be a model that may have been trained on data samples from different datasets, each of which with their own specifications for a specific use case. The second machine learning model may be, in some examples, a global model that may continuously evolve as new data may become available. The second machine learning model may belong to any of different classes of machine learning models, including but not limited to neural networks, random forests, linear models, etc. The only requirement of the second machine learning model may be understood to be that the class of model may have to be differentiable or approximately differentiable. Being differentiable may be understood to mean being able to compute derivatives of the objective function of a learning problem with respect to the parameters of the learning model. In this context, this may be understood to mean that models of this class may be differentiable in the space of their parameters, so that the derivative of the learning objective with respect to the parameters of the model may be computed.
In this document, Ds = {Ds^n, n = 1, ..., N} may be used to indicate the second data, that is, the training data in the second domain.
In this document, fθ may denote the second machine learning model trained using the second data, that is, the training data Ds.
In this Action 403, the first node 111 obtains information characterizing the second data. The information characterizing the second data may be also referred to as the source data specifications. In this document, ZDs = {zDs^n, n = 1, ..., N} may be used to indicate the source data specifications, where zDs^n may denote the source specifications for the n-th data sample, Ds^n. The source data specification ZDs may be understood to include meta information about the second domain, such as the type of the data features, feature attributes, data quality metrics per sample, the environment specifications, e.g., type of environment, information such as climate, landscape, temperature, dense/rural, data access rights on the sample level, data access rights on the distribution level, e.g., who may have access rights to the data, the time interval of the collected data, etc. In addition, it may include information about the software and hardware versions, as well as the radio access technology, e.g., 3G, 4G, 5G, etc., and the core network version of the product that the data may be received from. The kind of information that may characterize the second data may be understood to be equivalent to the one or more criteria, but in relation to the second data. Hence, the information obtained in this Action 403 may be understood as second information, that is, source specifications, whereas the one or more criteria may be understood as first information, that is, target specifications. The information characterizing the second data may be at least partially overlapping with the one or more criteria.
In some non-limiting examples, the obtaining in this Action 403 of the information characterizing the second data may be performed by a second module, referred to herein as a “source specifications” module, comprised in the first node 111.
In some embodiments, the first node 111 , e.g., via the source specification module, may fetch the latest updates of the information characterizing the second data regularly.
In some embodiments, the second machine learning model may be one of a plurality of second machine learning models that may have already been trained with respective second data from the second domain to make respective predictions in the second domain. In such embodiments, the obtaining in this Action 403 of the information may comprise obtaining respective information characterizing the respective second data. In other words, if more than one second machine learning model is available, the first node 111 may obtain the information of each of those second machine learning models in this Action 403.
In some embodiments wherein the respective information characterizing the respective second data may be obtained from the plurality of third nodes 113, the computer system 100 may be a 5G network and each of first node 111 , the second node 112 and the plurality of third nodes 113 may be NWDAFs. The plurality of third nodes 113 may be understood as a plurality of source nodes.
The first node 111 may have access to all second machine learning models existing at different nodes, e.g., NWDAFs, e.g., regional ones, and may request additional data specifications from those potential source NWDAFs where a second machine learning model, or a plurality of second machine learning models, trained historically may be stored.
By obtaining the information characterizing the second data in this Action 403, the first node 111 may then be enabled to determine whether the second machine learning model, or at least one of the second machine learning models in the plurality of second machine learning models may be suitable or not as a starting point of adaptation in order to obtain the first machine learning model to make predictions in the first domain, as will be explained in the next Action 404 and Action 405. Moreover, the first node 111 may be enabled to then determine which data samples in the second data may not fulfil the one or more criteria to a sufficient degree, as will be explained in Action 406.
Action 404
In this Action 404, the first node 111 may determine whether or not a level of fulfilment of the one or more criteria by the information characterizing the second data may exceed a first threshold. In other words, the first node 111 may determine if the target specification may differ within an acceptable threshold from the source specification.
The first threshold may be understood to be a configurable, e.g., a user-defined, threshold.
Determining may be understood as calculating, estimating, deriving, or similar, or obtaining or receiving from another node.
In some embodiments wherein the second machine learning model may be one of the plurality of second machine learning models that may have already been trained with respective second data from the second domain to make respective predictions in the second domain, and the obtaining in Action 403 of the information may comprise obtaining the respective information characterizing the respective second data, the determining in this Action 404 may further comprise determining whether or not a respective level of fulfilment of the one or more criteria by the respective information exceeds the first threshold. The information that may exceed the first threshold may be one set of the respective information.
By determining whether or not the level of fulfilment of the one or more criteria by the information characterizing the second data may exceed the first threshold in this Action 404, the first node 111 may be enabled to determine whether or not the first machine learning model may need to be determined from scratch. That is, whether or not it may be worthwhile to derive the first machine learning model to make predictions in the first domain by adapting the second machine learning model. In an alternative implementation, if the one or more criteria of the first domain differ substantially from the information characterizing the second data, the first node 111 may learn a new model tailored for the first domain instead of initiating the unlearning procedure that will be described in Action 409.
In the event that there may be multiple second machine learning models to choose from, with the proviso that the level of fulfilment of the one or more criteria by the information
characterizing the second data may exceed the first threshold, the first node 111 may proceed to determine which second machine learning model of the plurality of second machine learning models may be closest to the one or more criteria, that is, to the target specification, and use that second machine learning model.
With the proviso that the level of fulfilment of the one or more criteria by the information characterizing the second data may fail to exceed the first threshold, the first node 111 may refrain from determining the first machine learning model from the second machine learning model and may instead proceed to determine the first machine learning model from scratch. In other words, if the one or more criteria differ more than the first, predefined, threshold from all the available source specifications, that is, from all the respective information characterizing the respective second data, then a new first machine learning model may be constructed for the target. As stated earlier, the first threshold may be a configurable, e.g., a user-defined, threshold. The first node 111 may then choose to add this newly built model to a dictionary of models.
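As a non-limiting illustrative sketch of the threshold check described above, the following Python snippet compares a target specification against a source specification and decides whether to adapt the source model or construct a new one. All names, specification keys, and the simple key-matching fraction are hypothetical assumptions for this example; they stand in for whatever similarity measure an implementation may actually use.

```python
def fulfilment_level(target_spec, source_spec):
    # Fraction of target specification entries matched by the source
    # specification: a toy stand-in for a real similarity measure.
    if not target_spec:
        return 1.0
    matched = sum(1 for key, value in target_spec.items()
                  if source_spec.get(key) == value)
    return matched / len(target_spec)


def adapt_or_train(target_spec, source_spec, first_threshold):
    # Adapt the source model only when the level of fulfilment exceeds
    # the (configurable, e.g., user-defined) first threshold.
    if fulfilment_level(target_spec, source_spec) > first_threshold:
        return "adapt"
    return "train_from_scratch"


target = {"environment": "city-centre", "version": "LTE", "features": "performance"}
source = {"environment": "city-centre", "version": "LTE", "features": "mobility"}
decision = adapt_or_train(target, source, first_threshold=0.5)  # 2/3 > 0.5
```

With a stricter first threshold, e.g., 0.9, the same pair of specifications would instead lead to a new model being trained for the target.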
Action 405
In some embodiments wherein the second machine learning model may be one of the plurality of second machine learning models, the first node 111 , in this Action 405, may select the second machine learning model out of the plurality of second machine learning models. The selecting in this Action 405 may be based on the respective level of fulfilment of the one or more criteria. That is, the first node 111 may select the second machine learning model that may have a highest level of fulfilment of the one or more criteria, that is, that may be closer to the one or more criteria.
The first node 111 may perform the selecting in this Action 405 with the proviso that the level of fulfilment of the one or more criteria by the respective information characterizing the respective second data may exceed the first threshold for at least two of the second machine learning models in the plurality of second machine learning models.
Based on the respective information characterizing the respective second data and the one or more criteria, that is, based on the data specifications, the first node 111 may perform assessment and decide which second machine learning model may be useful for the first domain and ask the corresponding third node 113 in the second domain, e.g., NWDAF source, to send the second machine learning model and the respective data, or representation, that may not be desired to be represented in the first machine learning model.
In practice, if there is only a single source model, it may necessitate many changes for every request. In the case the change of the second machine learning model may be too dramatic for a particular group of target domain models, the first node 111 may need to spawn and train multiple source models while maintaining a minimal number of such source models.
In some examples of embodiments herein, the first node 111 may create a model dictionary from multiple source models, each of which satisfies certain specifications, that is, each of which has respective information characterizing the respective second data. Once the first node 111 may receive a request from the second node 112 in Action 401, the first node 111 may, in this Action 405, choose a source model from the model dictionary that may be most related to the target in terms of the data specifications and requirements.
The first node 111 may perform the selection in this Action 405 by comparing the one or more criteria, that is, the target specification, against the respective information characterizing the respective second data, that is, the source specifications. The source specification that may be determined to be closest to the one or more criteria may then be chosen and accordingly its corresponding second machine learning model and respective second data.
The first node 111 may then proceed with unlearning as will be described in Action 409. This procedure is schematically depicted later in Figure 9.
By selecting the second machine learning model out of the plurality of second machine learning models based on the respective level of fulfilment of the one or more criteria in this Action 405, the first node 111 may be enabled to select the second machine learning model that may be closer to the one or more criteria, and therefore choose a starting point for the adaptation of the second machine learning model to the first domain that may require the least adaptation, which may be understood to make the process more efficient and the resulting first machine learning model less costly and more accurate.
Action 406
Once the second machine learning model may have been selected, it may be understood that not all the data samples in the second data it may have been trained with may fulfil the one or more criteria at the same level. There may be data samples that may have a good match to the one or more criteria, while other data samples may have a poorer match to the one or more criteria, or not match them at all. In this Action 406, the first node 111 may try to identify which data samples may be the poor matches, or no matches at all.
In this Action 406, the first node 111 determines first data samples in the second data lacking a level of fulfilment of the one or more criteria exceeding the first threshold.
The determined first data samples have an effect on the second machine learning model.
Determining may be understood as calculating, estimating, deriving, or similar, or obtaining or receiving from another node.
The first node 111 may perform the determining in this Action 406 taking as input the one or more criteria, that is, the specifications received in Action 402 from the second node 112, and the information obtained in Action 403, that is, the current set of source specifications ZDs. The first node 111 may then determine which data samples within the second data may not be relevant for the first domain, and may produce a list of data samples Ie that may need to be erased. This Action 406 may be shown as:
Ie ← Assessment(ZDt, ZDs)
The first node 111 may perform the determining in this Action 406 by checking the correspondence between the one or more criteria and the information characterizing the second data, that is, the meta information of the source and target specifications, for example, the type of feature attributes, environment, data access rights, etc.
In some non-limiting examples, the determining in this Action 406 of the first data samples may be performed by a third module, referred to herein as an “assessment module”, comprised in the first node 111.
By determining the first data samples in the second data lacking the level of fulfilment of the one or more criteria exceeding the first threshold in this Action 406, the first node 111 may be enabled to identify the, e.g., unwanted, noisy, biased, erroneous, and privacy-sensitive samples from a fully trained second machine learning model. The first node 111 may then be enabled to remove the effect of such samples from the second machine learning model while not requiring retraining the second machine learning model by performing an unlearning procedure of the second machine learning model in order to minimize the effect that the first data samples may have had on the second machine learning model. That is, in order to “erase” the effect they may have had on the second machine learning model, as will be explained in Action 409.
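The assessment step of this Action 406 may be sketched, in a non-limiting illustrative way, as follows: each data sample's per-sample specification zDs^n is scored against the one or more criteria, and the indices of samples whose score does not exceed the first threshold are collected into the erase list Ie. The function names, specification keys, and the key-matching score are hypothetical assumptions for this example.

```python
def assessment(target_spec, per_sample_specs, first_threshold):
    # Return Ie: indices of source data samples whose per-sample
    # specifications do not fulfil the target criteria above the
    # first threshold. The matching score is a toy stand-in.
    def level(sample_spec):
        matched = sum(1 for key, value in target_spec.items()
                      if sample_spec.get(key) == value)
        return matched / len(target_spec)

    return [n for n, spec in enumerate(per_sample_specs)
            if level(spec) <= first_threshold]


target_spec = {"access": "unrestricted", "quality": "ok"}
zds = [
    {"access": "unrestricted", "quality": "ok"},   # good match: keep
    {"access": "restricted", "quality": "ok"},     # access rights differ: erase
    {"access": "unrestricted", "quality": "bad"},  # poor data quality: erase
]
ie = assessment(target_spec, zds, first_threshold=0.5)  # [1, 2]
```

The resulting list Ie may then be used in the unlearning procedure of Action 409 to remove the effect of the corresponding samples from the second machine learning model.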
Action 407
In this document, fθ may denote the second machine learning model and θ may denote the parameters of the second machine learning model. For example, in the case of a neural network as the choice of the underlying second machine learning model, θ may include all learnable parameters. In order to ultimately adapt the second machine learning model to obtain the first machine learning model, the posterior distribution of the second machine learning model may need to be calculated first. If the second machine learning model is not a probabilistic model expressed by a posterior distribution, then the first node 111 may need to approximate it.
The posterior distribution of the second machine learning model may be denoted by p(θ|Ds). The posterior distribution may be learned from all data Ds. For the probabilistic class of models, e.g., a Gaussian process, the posterior may be immediately available as part of the learning. However, for a non-probabilistic family of models, such as neural networks, the first node 111 may first need to approximate the posterior. The first node 111 may approximate the posterior through this Action 407 and the next Action 408.
In some embodiments wherein the second machine learning model may be a non-probabilistic model, the first node 111 may, in this Action 407, determine second data samples from a posterior function of the second machine learning model using a non-parametric sampling procedure. This Action 407 may be also referred to as nonparametric modelling, wherein the empirical distribution of these samples may follow approximately the true posterior distribution. The difference between the empirical posterior and the true posterior may be understood to be that the latter may be expressed in terms of a probability density function; in other words, it may be understood to have a known functional form. It may be noted that, at this stage, the first node 111 may have only access to the samples and may only know the empirical distribution of the posterior, but not the parametric functional form of the posterior.
Determining may be understood as calculating, estimating, deriving, or similar, or obtaining or receiving from another node.
The first node 111 may perform the determining of the second data samples in this Action 407 by drawing samples that approximately follow the empirical posterior distribution using a non-parametric Markov chain Monte Carlo (MCMC) sampling technique. A well-known technique that may be used may be Hamiltonian Monte Carlo (HMC) sampling. HMC may be understood to take as input the energy function E(θ) and its gradient ∇θE(θ), that is:
θj ← HMC(E(θ), ∇θE(θ)), j = 1, ..., J,
where E(θ) = −log fθ, θj may be understood to be the j-th Monte Carlo sample and J may be understood to be the total number of Monte Carlo samples. The set of posterior samples may be denoted as: Θ = {θj, j = 1, ..., J}.
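The HMC sampling described above may be illustrated by the minimal leapfrog sketch below. It is run on a toy quadratic energy E(θ) = θ²/2, that is, a standard normal posterior, rather than the actual energy −log fθ of any deployed model; the function names and tuning parameters (step size, number of leapfrog steps) are illustrative assumptions, not part of this disclosure.

```python
import numpy as np


def hmc(energy, grad, theta0, n_samples=2000, eps=0.1, n_leapfrog=20, seed=0):
    # Minimal Hamiltonian Monte Carlo: leapfrog integration of the
    # Hamiltonian dynamics followed by a Metropolis acceptance step.
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.normal(size=theta.shape)          # resample momentum
        theta_new, p_new = theta.copy(), p.copy()
        # leapfrog integration
        p_new -= 0.5 * eps * grad(theta_new)
        for _ in range(n_leapfrog - 1):
            theta_new += eps * p_new
            p_new -= eps * grad(theta_new)
        theta_new += eps * p_new
        p_new -= 0.5 * eps * grad(theta_new)
        # Metropolis acceptance on the total energy H = E + kinetic term
        h_old = energy(theta) + 0.5 * np.sum(p ** 2)
        h_new = energy(theta_new) + 0.5 * np.sum(p_new ** 2)
        if rng.random() < np.exp(min(0.0, h_old - h_new)):
            theta = theta_new
        samples.append(theta.copy())
    return np.array(samples)


# Toy energy E(theta) = theta^2 / 2, whose posterior is a standard normal.
samples = hmc(lambda t: 0.5 * np.sum(t ** 2), lambda t: t, theta0=np.zeros(1))
```

The empirical mean and standard deviation of the drawn samples approximate those of the target posterior, which is what the parametric modelling of Action 408 then builds upon.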
In some non-limiting examples, the determining in this Action 407 of the second data samples may be performed by a fourth module, referred to herein as a “posterior module learner”, comprised in the first node 111.
By determining the second data samples from the posterior function of the second machine learning model in this Action 407, the first node 111 may then be enabled, in the next Action 408, to determine a functional form of the posterior function, particularly, to derive a parametric approximation of the posterior function, which may enable the first node 111 to in turn then proceed with the derivation of the first machine learning model in Action 409,
according to embodiments herein, in the event that the second machine learning model may be a non-probabilistic model. That is, by performing this Action 407, as well as the next Action 408, the first node 111 may make the method described herein applicable to instances wherein the second machine learning model may be non-probabilistic.
Action 408
In this Action 408, the first node 111 may determine a parametric approximation of the posterior function using a parametric mixture model and based on the second data.
In some embodiments wherein the second machine learning model may be a non-probabilistic model, the first node 111 may, in this Action 408, determine a parametric approximation of the posterior function using a parametric mixture model and the second data samples determined in Action 407. This Action 408 may be also referred to herein as parametric modelling.
The first node 111 may determine the parametric approximation of the posterior function in this Action 408 as follows. Given the second data samples Θ determined in Action 407, that is, the set of posterior samples, the first node 111 may approximate the functional form of the posterior using the parametric mixture model. In one implementation, the Bayesian Gaussian mixture model may be used to model Θ as:
θj ~ p(θ|φ), j = 1, ..., J,
where φ may be understood to be the parameters of the Gaussian mixture model. These parameters may be optimized using a standard implementation of the Gaussian mixture model. Then, the functional form of the posterior may be given by the predictive posterior distribution. The predictive posterior for an unseen θ may then be understood to be given by the marginal distribution as:
p(θ|Ds) = ∫ p(θ|φ) p(φ|Θ) dφ.
For the Gaussian mixture, the predictive distribution p(θ|Ds) may be understood to become a mixture of Student-t distributions.
In some non-limiting examples, the determining in this Action 408 of the parametric approximation may be performed by the fourth module, that is, the posterior module learner, comprised in the first node 111.
As explained before, by determining the parametric approximation of the posterior function using the parametric mixture model and the determined second data samples in this Action 408, the first node 111 may then be enabled to proceed with the derivation of the first machine learning model in Action 409, according to embodiments herein, in the event that the second machine learning model may be a non-probabilistic model. That is, by performing this
Action 408, the first node 111 may make the method described herein applicable to instances wherein the second machine learning model may be non-probabilistic.
Action 409
In this Action 409, the first node 111 obtains the first machine learning model by performing an unlearning procedure of the second machine learning model. The first node 111 performs the unlearning procedure by reducing the effect of the determined first data samples from the second machine learning model over a second threshold.
The unlearning procedure performed in this Action 409 may use the second machine learning model selected in Action 404, in the embodiments wherein the selection of Action 404 may have been performed, and the respective second data of the selected second machine learning model.
In some examples of these embodiments, the first node 111 may perform the unlearning procedure by minimizing the effect of the determined first data samples from the second machine learning model.
In some embodiments, the unlearning procedure may comprise erasing the determined first data samples from the second data to yield third data.
In this document, Ie may be used to indicate the set of indices of data samples that may need to be erased and De may be used to denote the corresponding erase data. Dr may be used to denote the remaining data samples after removal of the erase data samples, that is, the third data, Dr = Ds \ De.
The first node 111 may use the first data samples determined in Action 406 to form the erase index set Ie and may create two sets of data labelled as De and Dr, where De may correspond to the first data samples that may need to be erased as specified by Ie, and Dr may be understood to be the third data, that is, the remaining data.
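The bookkeeping of Ie, De and Dr described above may be sketched as follows; the dataset contents and indices are illustrative assumptions:

```python
# Minimal sketch: splitting the second data Ds into the erase data De,
# given the erase index set Ie, and the remaining third data Dr = Ds \ De.
def split_erase_set(D_s, I_e):
    """Return (D_e, D_r) for a list-like dataset and a set of indices."""
    I_e = set(I_e)
    D_e = [x for i, x in enumerate(D_s) if i in I_e]
    D_r = [x for i, x in enumerate(D_s) if i not in I_e]
    return D_e, D_r

D_e, D_r = split_erase_set([10, 11, 12, 13, 14], I_e={1, 3})
# D_e == [11, 13], D_r == [10, 12, 14]
```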
In some embodiments, the unlearning procedure may be based on the posterior function of the second machine learning model. Such embodiments may apply to embodiments wherein the second machine learning model may be a non-probabilistic model.
The first node 111 may also receive the current posterior distribution over all data samples, p(θ|Ds). The first node 111 may then unlearn the effect of the posterior from the second machine learning model. The output of this Action 409 may be understood to be the updated posterior distribution shown as q(θ|Dr), where the effect of the erased first data samples may be understood to have been removed from p(θ|Ds).
In one implementation, the first node 111 may perform this determination by minimizing the evidence upper bound, which may be defined as:

EUBO(q) = KL( q(θ|Dr) || p(θ|Ds) ) + E_q(θ|Dr)[ log p(De|θ) ]

where p(De|θ) may be understood to be the conditional probability distribution of the first data samples and q(θ|Dr) may be understood to be the target posterior distribution given the rest of the data, that is, the third data, after removal of the first data samples, that is, the erase data. The minimization may be applied following the description provided in [1].
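In the general case, the minimization of the evidence upper bound proceeds iteratively, as in [1]. As a simplified, hedged illustration of the underlying principle, for a conjugate Gaussian model the unlearned posterior p(θ|Dr) ∝ p(θ|Ds)/p(De|θ) has a closed form: dividing the erased samples' likelihood out of the full posterior recovers exactly the posterior that retraining on the remaining data would give. The prior, noise variance, and data below are illustrative assumptions:

```python
# Illustrative sketch, not the general algorithm of the embodiments: for a
# conjugate Gaussian model (unknown mean theta, known noise variance),
# removing the erased samples' likelihood contribution from the full
# posterior p(theta|Ds) yields, in closed form, the same posterior as
# retraining on the remaining data Dr.
import numpy as np

def gaussian_posterior(x, mu0=0.0, tau0=10.0, sigma=1.0):
    """Posterior over the mean of N(theta, sigma^2): returns (mean, precision)."""
    precision = 1.0 / tau0**2 + len(x) / sigma**2
    mean = (mu0 / tau0**2 + np.sum(x) / sigma**2) / precision
    return mean, precision

def unlearn_gaussian(mean_s, prec_s, x_erase, sigma=1.0):
    """Divide the erase data's likelihood out of the full posterior."""
    prec_r = prec_s - len(x_erase) / sigma**2
    mean_r = (prec_s * mean_s - np.sum(x_erase) / sigma**2) / prec_r
    return mean_r, prec_r

rng = np.random.default_rng(1)
D_s = rng.normal(2.0, 1.0, size=100)
D_e, D_r = D_s[:20], D_s[20:]             # erase data and remaining data

mean_s, prec_s = gaussian_posterior(D_s)          # posterior on all data
mean_u, prec_u = unlearn_gaussian(mean_s, prec_s, D_e)  # unlearned posterior
mean_r, prec_r = gaussian_posterior(D_r)          # retraining from scratch
# mean_u == mean_r and prec_u == prec_r up to rounding: unlearning
# matches retraining without touching D_r again.
```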
The first machine learning model may be understood to be obtained to fit to the third data.
In some non-limiting examples, the obtaining in this Action 409 of the first machine learning model may be performed by a fifth module, referred to herein as the posterior module unlearner, comprised in the first node 111.
By obtaining the first machine learning model by performing the unlearning procedure of the second machine learning model in this Action 409, the first node 111 may then be enabled to customize the second machine learning model according to the one or more criteria of the first domain via the model unlearning process, including the posterior model learning and the posterior model unlearning, if necessary. The first node 111 may then be enabled to provide the first machine learning model as an adaptive on-demand customized model. The first node 111 may be enabled to obtain the on-demand customized model in a shorter amount of time than it would otherwise take to retrain the second machine learning model, or to train the first machine learning model from scratch, and hence, without requiring availability of high computational resources. Further, the first node 111 may be enabled to obtain the first machine learning model without requiring availability of all data samples. Hence, back-and-forth communication between the first node 111 and the third node 113 may not be required.
The customized models may be understood to not need to be maintained. As such, a minimal number of source models may need to be maintained and continuously evolved. Hence the expensive storage requirements and maintenance efforts otherwise existing due to high number of models may be advantageously reduced.
This advantage may be particularly relevant in large language models (LLMs). Such models are very expensive to train. Hence, it may be understood to be advantageous to unlearn from an LLM, instead of learning from scratch, or retraining.
The first node 111 may enable to remove the effect of unwanted, noisy, biased, erroneous, and privacy-sensitive samples from a fully trained second machine learning model while not requiring retraining the second machine learning model. As the second machine learning model may be understood to not need retraining, but rather may only need to be maintained, this may be understood to provide potential energy savings.
A particular advantage of this Action 409 may be that the first node 111 may enable to address requirements, e.g., the European Union (EU) directive on AI, where the right for private persons to have their data removed from databases and systems, e.g., including ML models, may need to be complied with.
The first node 111 may also be further enabled to send the customized model, now without the unwanted representation of the source third node 113, to the second node 112, as will be described in the next Action 410.
Action 410
In this Action 410, the first node 111 provides an indication indicating the obtained first machine learning model.
The indication may indicate the customized posterior over model parameters q(θ|Dr).
In some embodiments wherein the obtaining of one or more criteria in Action 402 may have been responsive to the received first indication, the indication provided in this Action 410 may be understood to be a second indication that may be sent to the second node 112.
In a particular non-limiting example, the first node 111 may, in this Action 410, send the customized posterior over model parameters q(θ|Dr) to the second node 112.
It may be noted that the second node 112 may then use the posterior q(θ|Dr) itself or take point estimates from the posterior for the purpose of transfer learning. A point estimate from the posterior may be given by the mean of the posterior, θ̂ = E[q(θ|Dr)].
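A minimal sketch of such a point estimate, assuming the posterior is represented by samples; the sample values are illustrative assumptions:

```python
# Sketch: a point estimate for transfer learning taken as the mean of
# the (approximate) posterior q(theta|Dr), here represented by samples.
import numpy as np

posterior_samples = np.array([[0.9, 2.1],
                              [1.1, 1.9],
                              [1.0, 2.0]])
theta_hat = posterior_samples.mean(axis=0)   # mean of the posterior
# theta_hat == array([1., 2.])
```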
In some embodiments, the determining in Action 406 of the first data samples, the performing in Action 409 of the unlearning procedure, and the providing in this Action 410 of the indication may be performed with the proviso that the information exceeds the first threshold.
By providing the indication in this Action 410, the first node 111 may enable the usage of the first machine learning model to, for example, make predictions in the first domain.
Action 411
In this Action 411 , the first node 111 may store 411 the second indication.
The accuracy of the first machine learning model may be recorded at the first node 111 along with its specifications, that is, its one or more criteria. By storing the second indication in this Action 411, the first node 111 may then be enabled to use the stored information for similar future requests.
Use case examples
Embodiments herein may have applications in a variety of contexts. A few use case non-limiting examples are described next.
Diversity in radio base stations: Energy-efficient network modernization is being, and will continue to be, performed at a fast pace in mobile networks in view of climate commitments, according to the International Telecommunication Union. The modernization is a combination of hardware modernization, development of new software features, and network optimization techniques in the RAN, as well as in the process of operating the base stations. Today, high volumes of data are being collected from the networks, and the deployment of new advancements to the existing mobile network is expected to cause significant changes in the characteristics of the dataset. For example, a dataset collected on base stations where a power saving feature was not activated would yield significantly different energy consumption patterns as compared to the case where the feature is activated. In addition, when a mobile operator modernizes a network with new hardware equipment, significant efforts may need to be performed for retraining the existing predictive machine learning models with the data collected from new products and software features. This is energy consuming on its own from multiple aspects: time, computation, communication, storage. With the embodiments herein, an ML model may be updated to a more up-to-date version without the need of retraining. There is also a large diversity of base station deployments today, depending on the geographical and consumer profile, with different radio capabilities. In addition, these base stations may be configured with various settings. Ideally, the number of machine learning models to serve these base stations may preferably be kept low to reduce the complexity in Life Cycle Management.
Non-obvious Spatial and Temporal Data distribution Similarity Use cases
There may also be a drift in distribution at the network over time. Hence, a use case may concern transfer of models between different time intervals. For example, data collected from a base station during a rainfall season causing microwave radio signal degradation many years ago may be relevant today, when the phenomenon may repeat itself. Due to climate change, the data collected from a base station may have exhibited a more distinct drift in distribution in recent years. In other words, data may be constantly drifting. It may be the case that at some point in the future the data may be similar to some point in the past; then a pre-existing model may be re-used if available. However, it may also be the case that the data may drift and may not ever become similar to the old data.
Multi-tenant models: The use cases above, or other similar use cases, may be provided to the operators by hardware providers. Some operators may agree to allow the usage of their data to train the source models, but other operators may allow that only on the condition that the knowledge of their data is unlearned before sharing that source model with a third party. Then, it is possible to use all data to train a source model and remove the knowledge from some operators before sharing the source model. This may be understood to advantageously enable to provide a guide to better approaches due to having access to more data, while still removing any sensitive data before sharing.
QoE Optimization Use case in Open RAN (O-RAN): One of the use cases may be QoE optimization, since it may be understood to span an increasing number of applications in 5G, such as augmented reality, telephony, and 4K video, as depicted by the ITU. Each of those application services may have different levels of requirements, such as, e.g., bitrate, continuity, latency. For example, while in 4K video, bitrate and continuity may be important metrics to be satisfied, in Augmented Reality (AR), all three may be understood to be of high importance. This may be understood to necessitate many ML models to be (re)trained and (re)deployed with different distributions of datasets. It may be that this may create many complexities in model management. Even within one application service type, such as AR, there may be multiple sub use cases, such as telephony, personal trainer, and assisted surgery. Therefore, embodiments herein may advantageously enable customization and adaptation of pretrained ML models to different prioritization choices, by removing irrelevant contributions, depending on the application of interest in real time, e.g., AR vs. telephony vs. data download, etc.
Context based dynamic handover management use case in O-RAN: Another O-RAN use case may be about context based handover (HO) management. This use case may preferably necessitate datasets from different entities and domains, such as historical traffic/navigation data, e.g., road conditions; radio/HO data, e.g., about neighboring cells and their network traffic load; as well as a dataset from a UE, e.g., received signal strength, vehicle speed, to enable better handover. It may be in some situations that a particular entity may stop sharing data, e.g., due to changed regulations or technical link failure. In that case, removal of dependencies from the model may need to be performed.
Figure 5 is a schematic diagram depicting a non-limiting example of components the first node 111 may comprise, according to examples of embodiments herein. As depicted in Figure 5, the first node 111 in such examples may comprise the source specifications module 501 , the target specifications module 502, the source model 503, the assessment module 504, the posterior model learner module 505, and the posterior model unlearner module 506.
Figure 6 is a schematic diagram depicting a non-limiting example of aspects of a method performed by the first node 111 according to embodiments herein. In particular, Figure 6 depicts a non-limiting example representation of the output of Action 407, that is, the empirical posterior distribution of the samples. Figure 6 also depicts a non-limiting example of the output of Action 408, where the empirical distribution of the posterior samples has been approximated by a posterior with a parametric functional form. Further particularly, panel (A) schematically depicts the actions involved in the approximation of the posterior distribution p(θ|Ds) given data samples Ds and the second machine learning model, that is, the source model fθ. In some embodiments wherein the second machine learning model may be a non-probabilistic model, the first node 111 may, according to Action 407, in a first step, determine the second data samples, that is, the posterior data samples θ, from the posterior function of the second machine learning model using a non-parametric sampling procedure, such as the Hamiltonian Markov chain, taking the samples Ds and the second machine learning model, that is, the source model fθ, as input. The first node 111 may then, in a second step of parametric sampling, according to Action 408, determine the parametric approximation of the posterior function using the parametric mixture model and the second data samples determined in Action 407. This is indicated in Figure 6 as the predictive posterior p(θ|Ds). As an example, the first node 111 may use a mixture of Gaussian distributions with a predictive distribution represented by a mixture of Student-t distributions. In panel (B), Figure 6 depicts an example for fθ being a logistic regression with θ being the regressor parameters. The horizontal axis indicates the first dimension, that is, the first principal component of the data. The vertical axis indicates the second dimension, that is, the second principal component of the data. The predictive distribution of a Gaussian mixture model may be understood to be a Student-t distribution. In this Figure 6, the samples from the posterior are shown with a contour plot and the approximate posterior is given by the predictive distribution of Student-t distributions, where the modes of the Student-t distributions are indicated by a cross and scaled by their importance.
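A hedged sketch of the non-parametric sampling step for a logistic regression posterior, using random-walk Metropolis in place of Hamiltonian Monte Carlo for brevity; the data, prior, step size, and chain length are illustrative assumptions:

```python
# Sketch of the non-parametric sampling of Action 407: drawing posterior
# samples of logistic-regression parameters theta with random-walk
# Metropolis (a simpler stand-in for Hamiltonian Monte Carlo).
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 2))                     # illustrative features
w_true = np.array([1.5, -1.0])
y = (rng.random(80) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def log_post(w):
    """Log posterior: Bernoulli log-likelihood + standard-normal prior."""
    z = X @ w
    loglik = np.sum(y * z - np.logaddexp(0.0, z))
    return loglik - 0.5 * w @ w

samples, w = [], np.zeros(2)
lp = log_post(w)
for _ in range(3000):
    prop = w + 0.3 * rng.normal(size=2)          # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:      # Metropolis accept step
        w, lp = prop, lp_prop
    samples.append(w)
theta_samples = np.array(samples[1000:])         # discard burn-in
```

The resulting `theta_samples` plays the role of the empirical posterior that the parametric mixture of Action 408 is then fitted to.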
Figure 7 is a schematic block diagram depicting a non-limiting example of aspects of a method performed by the first node 111 according to embodiments herein. Particularly, Figure 7 depicts the inner working of various components of the method. Given the target data specifications ZDt obtained according to Action 402, and the current source data specifications ZDs obtained according to Action 403, the first node 111 may, according to Action 406, apply the assessment module and infer the first data samples, e.g., the erase index set, Ie. The first node 111 may obtain the posterior over model parameters for all data p(θ|Ds). If the posterior over model parameters p(θ|Ds) is unknown, the first node 111 may apply the posterior model learner module and, in accordance with Action 408, infer the posterior over the model parameters, p(θ|Ds). Given the posterior p(θ|Ds) and the erase index set Ie, the first node 111 may, according to Action 409, apply the posterior model unlearner module and output q(θ|Dr), which may be understood to be the posterior over the remaining data Dr after removal of the erase data De. The first node 111 may then send, according to Action 410, the customized posterior over model parameters q(θ|Dr) to the second node 112.
The steps shown in Figure 7 as a block diagram representation are depicted in Figure 8 in a signaling diagram.
Figure 8 is a signalling diagram depicting a non-limiting example of a method performed by the first node 111, according to embodiments herein. In Figure 8, the first node 111 is indicated as an "apparatus node" and the second node 112 is indicated as a target node. The first node 111 may be implemented centralized or distributed, as shown in the signalling diagram. At 1, the first node 111 may, according to Action 401, receive the request from the second node 112 for a machine learning model to make predictions in the first domain, that is, for the first machine learning model as, in this case, the model of the first domain. Together with the request, the first node 111 may receive the one or more criteria, that is, the target data specification ZDt, according to Action 402. At 2, the first node 111 may, according to Action 403, obtain the current source data specifications, that is, the information characterizing the second data. In an alternative implementation, if the data specification of the target domain differs substantially from the source specification, the first node 111 may learn a new model tailored for the target instead of initiating the unlearning procedure. At 3, given the target data specification ZDt and the current source data specifications ZDs, the first node 111 may, according to Action 406, apply the assessment module and infer the erase index set, Ie. At 4, the first node 111 may, according to Action 408, obtain the posterior over the model parameters p(θ|Ds). If the posterior over model parameters p(θ|Ds) is unknown, the first node 111 may apply the posterior model learner module and infer the posterior over the model parameters, p(θ|Ds). At 5, given the posterior p(θ|Ds) and the erase index set Ie, the first node 111 may, according to Action 409, apply the posterior model unlearner module and output q(θ|Dr), which is the posterior over the remaining data Dr after removal of the erase data De.
At 6, the first node 111 may send the customized posterior over the model parameters q(θ|Dr) to the second node 112.
Figure 9 is a schematic diagram depicting aspects of a non-limiting example of a method performed by the first node 111, according to embodiments herein, wherein there may be multiple source models, that is, the plurality of second machine learning models. Panel (A) depicts the plurality of second machine learning models as comprising three second machine learning models which may be stored in a model dictionary. According to Action 404, the target specification may be compared by the first node 111 against the source specifications. Then, according to Action 405, the source specification that may be closest to the target specification may be chosen, and accordingly its corresponding model and data, highlighted in the figure with the superscript *. If the target specification differs more than a predefined threshold from all the available source specifications, then a new source model may be constructed for the target. This step may involve the first threshold, a user-defined threshold. This newly built model may be chosen to be added to the dictionary of the models. Panel (B) depicts the unlearning procedure that the first node 111 may then follow given the selected data and second machine learning model, as described in relation to Figure 7.
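The selection of Actions 404 and 405 described above may be sketched as follows; the Euclidean distance between specification vectors, the dictionary layout, and the threshold value are illustrative assumptions:

```python
# Sketch of source-model selection: pick the source specification
# closest to the target specification, or flag that a new model is
# needed when all distances exceed a user-defined threshold.
import numpy as np

model_dictionary = {                 # illustrative spec vectors per model
    "source_1": np.array([0.0, 1.0]),
    "source_2": np.array([2.0, 2.0]),
    "source_3": np.array([5.0, 5.0]),
}

def select_source(target_spec, dictionary, threshold):
    """Return the closest source name, or None if a new model is needed."""
    name, dist = min(
        ((k, np.linalg.norm(v - target_spec)) for k, v in dictionary.items()),
        key=lambda kv: kv[1],
    )
    return name if dist <= threshold else None

chosen = select_source(np.array([1.8, 2.1]), model_dictionary, threshold=1.0)
# chosen == "source_2" (distance ~ 0.22)
```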
Figure 10 is a signalling diagram depicting a non-limiting example of a method performed by the first node 111 in an embodiment wherein the computer system 100 may comprise the plurality of third nodes 113, and wherein each of the first node 111, the second node 112 and the third nodes 113 may be NWDAFs. In this particular example, the plurality of third nodes 113 may comprise N third nodes 113: a first third node 113 denoted NWDAF Source 1, a second third node 113 denoted NWDAF Source 2, and an Nth third node 113 denoted NWDAF Source N. Any of the first node 111, the second node 112 and the plurality of third nodes 113 depicted in Figure 10 may be different Model Training Logical Functions (MTLFs), where they may hold data in different regional NWDAFs. In the non-limiting example of Figure 10, the second node 112 is depicted as an NWDAF at a target domain with data, but without a model to make predictions in the target domain. At 1, the second node 112, which is denoted as "NWDAF Target" in this example, may ask a corresponding NRF 1001 for a discovery of available NWDAFs by sending a discovery request indicating an identifier (ID) of the analytics. At 2, the NRF 1001 sends a discovery response to the second node 112 indicating the first node 111. The first node 111 is depicted as an NWDAF model learning/unlearning agent that may have access to all models existing at different NWDAFs, e.g., regional ones. At 3, the first node 111 may receive, according to Action 401, the first indication indicating the request for the first machine learning model to make predictions in the first domain from the second node 112, and at 4, the first node 111 may request the one or more criteria, that is, the specifications of the target data, from the second node 112, which the first node 111 may then obtain in accordance with Action 402.
At 5, 6 and 7, the first node 111 may request additional data specifications from the plurality of third nodes 113, that is, those potential source NWDAFs where a model or a plurality of models may be, or may have been, trained historically. The first node 111 may then, according to Action 403, obtain the respective information characterizing the respective second data, that is, the source data specifications, from each of the third nodes 113. At 8, based on the data specifications, the first node 111, in accordance with Action 404, may perform an assessment and, in accordance with Action 405, may decide which source model may be useful for the target domain and ask the corresponding NWDAF source to send the model and the data, or representation, that may not be desired to be represented in the target model. At 9, the first node 111 may send a request to the third node 113 holding the selected second machine learning model and at 10, the NWDAF Source 2 third node 113 may send the pre-trained second machine learning model to the first node 111. At 11, the first node 111 may customize the source model according to the target via the model unlearning process, including the posterior model learning of Action 408, and at 12, the posterior model unlearning of Action 409. At 13, the first node 111 may send the customized first machine learning model, now without the unwanted representation of the source NWDAF, to the target NWDAF, that is, the second node 112, in agreement with Action 410. At 14, the second node 112 may send an indication of the accuracy of the first machine learning model to the first node 111, and at 15, in accordance with Action 411, the accuracy of the target model may be recorded at the first node 111 along with its specifications for similar future requests.
O-RAN Implementation
Embodiments herein may be understood to not be specific to a particular use case, and potentially many Open RAN (O-RAN) use cases, e.g., QoE optimization, context based dynamic handover management for Vehicle-to-everything (V2X), etc., may host the embodiments herein. Particularly, embodiments herein may involve an extension of the existing O-RAN flow diagram described in Figure 4.4.3-1 in the O-RAN Work Group 1, Use Cases and Overall Architecture, Use Cases Detailed Specifications, Technical Specification, O-RAN Alliance, 2023, v. 12, in such a way as to enable contribution removal of a particular data provider, e.g., an application server in this example, when a QoE model may be trained jointly by a plurality of application servers. Figure 11 is a signalling diagram depicting a non-limiting example of such a flow diagram, extended according to embodiments herein. The actions corresponding to the extension according to embodiments herein are framed in a dashed box in panel b). Taking as a reference the O-RAN overall logical architecture described in Figure 1 of the O-RAN Use Cases and Deployment Scenarios, Towards Open and Smart RAN, O-RAN Alliance, White Paper of February 2020, in the QoE optimization use case, the impacted interfaces may be O1, between the non-Real Time RAN Intelligent Controller (non-RT RIC) and the Open-eNB (O-eNB); A1, between the Near Real Time RAN Intelligent Controller (Near-RT RIC) and the non-RT RIC; and E2, between the Near-RT RIC and the Open Centralized Unit Control Plane (O-CU-CP). In Figure 11, the first node 111 may be comprised in a non-RT RIC, and co-localized with the second node 112. The non-RT RIC may be comprised, along with a collector 1101, in a Service Management and Orchestration system 1102. The computer system 100 may further comprise an O-RAN 1103 comprising a Near-RT RIC 1104 and an Open Centralized Unit/Open Distributed Unit (O-CU/O-DU) 1105.
The computer system 100 may further comprise the plurality of third nodes 113 as external application servers 1106: Application Server #1, Application Server #2 and Application Server #N. Starting in panel a), at 1, the O-CU/O-DU 1105 may trigger data collection from the collector 1101. At 2, the collector 1101 may trigger retrieval of the collected data from the Non-RT RIC 111, 112. This may in turn trigger the retrieval of application data by the Non-RT RIC 111, 112 from the Application Server #1 at 3, the Application Server #2 at 4, and the Application Server #N at 5.
At 6, the Non-RT RIC 111, 112 may trigger an ML workflow by training ML models. At 7, the Non-RT RIC 111, 112 may deploy internal ML models. At 8, the Non-RT RIC 111, 112 may deploy AI/ML models at the Near-RT RIC 1104. Next, a performance evaluation and optimization phase may be triggered. At 9, the O-CU/O-DU 1105 may trigger «O1» data collection over the O1 interface from the collector 1101, which may trigger data retrieval at 10 by the collector 1101 from the Non-RT RIC 111, 112. Continuing in panel b), the Non-RT RIC 111, 112 may then start performance monitoring and evaluation at 11. It may be in some situations that a particular entity, in this example the Application Server #2, may stop sharing data, e.g., due to changed regulations or technical link failure. In that case, removal of dependencies from the model may need to be performed. At 12, the link between the Application Server #2 and the Non-RT RIC 111, 112 may carry, along with the dataset that may be intended to be used for ML model training, both for learning and unlearning, and that may have specific characteristics related to the domain, extra information about the intent for deletion of data/contribution. Once this intent is registered, the Non-RT RIC 111, 112 may trigger at 13 the unlearn function to remove that particular client contribution from the source model, in agreement with Action 409. The particular client contribution may be identified with a sample_id_vector. At 14, the Non-RT RIC 111, 112 may continue performing monitoring and evaluation of the new version of the model obtained after unlearning. At 15, the Non-RT RIC 111, 112 may perform model versioning. This may be understood to be performed for storing/check-pointing the unlearned model in a storage for later re-use. The model may be stored with additional corresponding details on the processing steps applied on a previous version of the model.
For example, it may explain which sample ids may have been removed, and the new model performance, size, etc. At 16, the Non-RT RIC 111, 112 may then share, in accordance with Action 410, via the «O1» or «A1» interfaces, updated AI/ML models with the Near-RT RIC 1104. From the contributors/clients, that is, the Application Server in the O-RAN architecture, to the Non-RT RIC 111, 112, the information about the intent or incentive flag, that is, the willingness to contribute or not, may be shared. From the Non-RT RIC 111, 112 to the contributors/clients, that is, the Application Server in the O-RAN architecture, information about the updated model and/or perhaps a notification about the existence of an updated model, e.g., an event, may be sent to an external consumer/producer, e.g., an application server, and/or to the Near-RT RIC.
As a summarized overview of the foregoing, embodiments herein may be understood to relate to a first node 111 that may receive a request for a model from a target node and may create a customized version of the source model, using machine unlearning for the target, with the objective that the customized model may follow the data guidelines and specifications of the target domain. In particular examples of embodiments herein, the method or the first node
111 may be implemented at an ML life cycle management (LCM) unit, located for example in the 3GPP NWDAF.
Certain embodiments herein may provide one or more of the following technical advantage(s). Embodiments herein may be understood to enable to provide adaptive on- demand customized models. The customized models may be understood to not need to be maintained. As such, a minimal number of source models may need to be maintained and continuously evolved. This may be understood to be especially important given that embodiments herein may reduce the expensive storage requirements and maintenance efforts otherwise pertaining to existing methods due to high number of models.
Such benefit may be understood to be particularly advantageous in large language models (LLMs). Such models may be understood to be very expensive to train. Hence, it may be understood to be advantageous to unlearn from an LLM instead of learning from scratch, or retraining.
Embodiments herein may further enable to achieve energy savings, since they may allow for the removal of the effect of unwanted, noisy, biased, erroneous, and privacy-sensitive samples from a fully trained model while not requiring retraining the model. This may be understood to be because the source model may not need retraining, but rather may only need to be maintained.
Embodiments herein may further enable to address the new EU directive on AI where, for example, the right for private persons to have their data removed from databases and systems may be relevant.
Figure 12 depicts an example of the arrangement that the first node 111 may comprise to perform the method described in Figure 4 and/or Figures 5-11. The first node 111 may be understood to be for handling the first machine learning model. The first node 111 is configured to operate in the computer system 100.
Several embodiments are comprised herein. It should be noted that the examples herein are not mutually exclusive. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 , and will thus not be repeated here. For example, the first machine learning model may be also referred to as a target model.
The first node 111 is configured to obtain the one or more criteria to be met by the first data to be used as input to obtain the first machine learning model to make predictions in the first domain. The second machine learning model has already been trained with the second data from the second domain to make predictions in the second domain.
The first node 111 is also configured to obtain the information configured to characterize the second data.
The first node 111 is further configured to determine the first data samples in the second data lacking the level of fulfilment of the one or more criteria exceeding the first threshold. The first data samples configured to be determined have the effect on the second machine learning model.
The first node 111 is additionally configured to obtain the first machine learning model by performing the unlearning procedure of the second machine learning model by reducing the effect of the determined first data samples from the second machine learning model over the second threshold.
The first node 111 is further configured to provide the indication configured to indicate the first machine learning model configured to be obtained.
In some embodiments, the unlearning procedure may be configured to comprise erasing the first data samples configured to be determined from the second data to yield third data. The first machine learning model may be configured to be obtained to fit to the third data.
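As an illustrative sketch only (not the claimed procedure itself), the erase-and-refit variant described in this paragraph may look as follows; the least-squares model, the `violates_criteria` predicate and all other names are hypothetical stand-ins introduced for illustration:

```python
import numpy as np

def erase_and_refit(X, y, violates_criteria):
    """Hedged sketch: drop the first data samples that fail the criteria
    (the 'determined first data samples'), yielding the third data, and
    refit a simple least-squares model to the third data only."""
    keep = np.array([not violates_criteria(x, t) for x, t in zip(X, y)])
    X3, y3 = X[keep], y[keep]                      # the "third data"
    # Refit: the first machine learning model fitted to the third data.
    w, *_ = np.linalg.lstsq(X3, y3, rcond=None)
    return w, keep

# Toy usage: one sample is deemed unwanted and must be unlearned.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])
y = np.array([1.0, 2.0, 3.0, 100.0])               # last sample is an outlier
w, keep = erase_and_refit(X, y, lambda x, t: t > 50.0)
```

Because the refitted weights are obtained from the third data alone, the effect of the erased samples is removed entirely in this variant, rather than merely reduced.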
In some embodiments the unlearning procedure may be configured to be based on the posterior function of the second machine learning model, and the first node 111 may be further configured to determine the parametric approximation of the posterior function using the parametric mixture model and based on the second data configured to be obtained.
In some embodiments the second machine learning model, prior to the unlearning procedure, may be configured to be a non-probabilistic model, the first node 111 may be further configured to determine the second data samples from the posterior function of the second machine learning model using the non-parametric sampling procedure. The determining of the parametric approximation of the posterior function may be configured to use the parametric mixture model and the second data samples configured to be determined.
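One concrete, and purely assumed, realization of the two steps just described — non-parametric sampling from a posterior surrogate of a non-probabilistic model, followed by a parametric mixture approximation — is bootstrap resampling of the second data combined with a small EM fit. The two-component one-dimensional Gaussian mixture and all names below are illustrative choices, not the claimed method:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_weight_samples(X, y, n_draws=200):
    """Non-parametric sampling stand-in: refit a least-squares model on
    bootstrap resamples of the second data; the spread of the refitted
    weights acts as a surrogate for the posterior of the model."""
    n = len(y)
    draws = []
    for _ in range(n_draws):
        idx = rng.integers(0, n, size=n)
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        draws.append(w[0])
    return np.array(draws)

def fit_two_gaussian_mixture(samples, n_iter=50):
    """Parametric mixture approximation: minimal EM for a one-dimensional
    two-component Gaussian mixture over the posterior samples."""
    mu = np.array([samples.min(), samples.max()])
    var = np.full(2, samples.var() + 1e-6)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        d = samples[:, None] - mu[None, :]
        logp = -0.5 * d**2 / var - 0.5 * np.log(2 * np.pi * var) + np.log(pi)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and variances.
        nk = r.sum(axis=0)
        pi = nk / len(samples)
        mu = (r * samples[:, None]).sum(axis=0) / nk
        var = (r * (samples[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return pi, mu, var

# Toy second data: a single-feature model with true weight 2.0.
X = np.linspace(1.0, 3.0, 40).reshape(-1, 1)
y = 2.0 * X[:, 0] + rng.normal(0.0, 0.05, size=40)
samples = bootstrap_weight_samples(X, y)   # the "second data samples" surrogate
pi, mu, var = fit_two_gaussian_mixture(samples)
```

The fitted mixture parameters then play the role of the parametric approximation of the posterior function on which the unlearning procedure may operate.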
In some embodiments, the first node 111 may be further configured to determine whether or not the level of fulfilment of the one or more criteria by the information configured to characterize the second data may exceed the first threshold. The determining of the first data samples, the initiating of the unlearning procedure and the providing of the indication may be configured to be performed with the proviso that the information may exceed the first threshold.
In some embodiments, the second machine learning model may be one of the plurality of second machine learning models that have already been trained with respective second data
from the second domain to make respective predictions in the second domain. In such embodiments, the obtaining of the information may be configured to comprise obtaining respective information configured to characterize the respective second data, and the determining may be further configured to comprise determining whether or not the respective level of fulfilment of the one or more criteria by the respective information may exceed the first threshold. The information that may exceed the first threshold may be configured to be the one set of the respective information.
In some embodiments, the first node 111 may be further configured to select the second machine learning model out of the plurality of second machine learning models. The selecting may be configured to be based on the respective level of fulfilment of the one or more criteria.
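To make the selection step concrete, the following hedged sketch shows one possible way to score the respective information of each candidate second model against the one or more criteria and select the best candidate, subject to the first threshold; the metadata schema, the criteria and the threshold value are assumptions for illustration only:

```python
def fulfilment_level(criteria, info):
    """Fraction of the one or more criteria met by the information
    characterizing a candidate model's training data (hypothetical
    schema: each criterion is a predicate over a metadata dict)."""
    met = sum(1 for c in criteria if c(info))
    return met / len(criteria)

def select_source_model(criteria, candidates, first_threshold):
    """Pick the candidate second model whose respective information best
    fulfils the criteria, provided its level exceeds the first threshold."""
    scored = [(fulfilment_level(criteria, info), name)
              for name, info in candidates.items()]
    best_score, best_name = max(scored)
    if best_score <= first_threshold:
        return None  # no source model qualifies; unlearning not initiated
    return best_name

# Hypothetical criteria on training-data metadata.
criteria = [
    lambda m: m["region"] == "EU",
    lambda m: m["privacy_cleared"],
    lambda m: m["num_samples"] >= 10_000,
]
candidates = {
    "model_a": {"region": "EU", "privacy_cleared": True, "num_samples": 50_000},
    "model_b": {"region": "US", "privacy_cleared": True, "num_samples": 5_000},
}
chosen = select_source_model(criteria, candidates, first_threshold=0.5)
```

With these toy criteria, `model_a` fulfils all three criteria while `model_b` fulfils only one, so `model_a` would be selected as the source model on which the unlearning procedure is performed.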
In some embodiments, the first node 111 may be further configured with at least one of the following two configurations.
In some embodiments, the first node 111 may be further configured to receive, from the second node 112 configured to operate in the computer system 100, the first indication. The first indication may be configured to indicate the request for the first machine learning model to make predictions in the first domain. The obtaining of the one or more criteria may be configured to be responsive to the first indication configured to be received. The indication may be configured to be the second indication that may be configured to be sent to the second node 112.
In some embodiments, the first node 111 may be further configured to store the second indication.
In some embodiments, the respective information configured to characterize the respective second data may be configured to be obtained from the plurality of third nodes 113, the computer system 100 is configured to be the Fifth Generation network, and each of the first node 111, the second node 112 and the plurality of third nodes 113 may be configured to be Network Data Analytics Functions.
The embodiments herein in the first node 111 may be implemented through one or more processors, such as a processing circuitry 1201 in the first node 111 depicted in Figure 12, together with computer program code for performing the functions and actions of the embodiments herein. A processor, as used herein, may be understood to be a hardware component. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the first node 111. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the first node 111.
The first node 111 may further comprise a memory 1202 comprising one or more memory units. The memory 1202 is arranged to be used to store obtained information, store data, configurations, schedulings, and applications etc. to perform the methods herein when being executed in the first node 111.
In some embodiments, the first node 111 may receive information from, e.g., the second node 112, any of the plurality of third nodes 113, the radio network node 130, the device 140, and/or another structure in the computer system 100, through a receiving port 1203. In some embodiments, the receiving port 1203 may be, for example, connected to one or more antennas in the first node 111. In other embodiments, the first node 111 may receive information from another structure in the computer system 100 through the receiving port 1203. Since the receiving port 1203 may be in communication with the processing circuitry 1201, the receiving port 1203 may then send the received information to the processing circuitry 1201. The receiving port 1203 may also be configured to receive other information.
The processing circuitry 1201 in the first node 111 may be further configured to transmit or send information to e.g., the second node 112, any of the plurality of third nodes 113, the radio network node 130, the device 140, and/or another structure in the computer system 100, through a sending port 1204, which may be in communication with the processing circuitry 1201, and the memory 1202.
The first node 111 may be configured to perform any of the Actions described in relation to Figure 4 and/or Figures 5-11 , e.g., by means of the processing circuitry 1201 within the first node 111 , configured to perform any of such actions.
Also, in some embodiments, different units comprised within the first node 111 may be configured to perform the different actions described above, implemented as one or more applications running on one or more processors such as the processing circuitry 1201.
Those skilled in the art will also appreciate that the units comprised within the first node 111 described above as being configured to perform different actions, may refer to a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g., stored in memory, that, when executed by the one or more processors such as the processing circuitry 1201 , perform as described above. One or more of these processors, as well as the other digital hardware, may be included in a single Application-Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC).
Thus, the methods according to the embodiments described herein for the first node 111 may be respectively implemented by means of a computer program 1205 product, comprising instructions, i.e., software code portions, which, when executed on at least one processing circuitry 1201, cause the at least one processing circuitry 1201 to carry out the
actions described herein, as performed by the first node 111. The computer program 1205 product may be stored on a computer-readable storage medium 1206. The computer-readable storage medium 1206, having stored thereon the computer program 1205, may comprise instructions which, when executed on at least one processing circuitry 1201, cause the at least one processing circuitry 1201 to carry out the actions described herein, as performed by the first node 111. In some embodiments, the computer-readable storage medium 1206 may be a non-transitory computer-readable storage medium, such as a CD ROM disc, or a memory stick. In other embodiments, the computer program 1205 product may be stored on a carrier containing the computer program 1205 just described, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 1206, as described above.
The first node 111 may comprise a communication interface configured to facilitate, or an interface unit to facilitate, communications between the first node 111 and other nodes or devices, e.g., the second node 112, the third node, the radio network node 130, the device 140, and/or another structure in the computer system 100. The interface may, for example, include a transceiver configured to transmit and receive radio signals over an air interface in accordance with a suitable standard.
In other embodiments, the first node 111 may comprise a radio circuitry 1207, which may comprise e.g., the receiving port 1203 and the sending port 1204.
The radio circuitry 1207 may be configured to set up and maintain at least a wireless connection with the second node 112, the third node, the radio network node 130, the device 140, and/or another structure in the computer system 100. Circuitry may be understood herein as a hardware component.
Hence, embodiments herein also relate to the first node 111 operative to operate in the computer system 100. The first node 111 may comprise the processing circuitry 1201 and the memory 1202, said memory 1202 containing instructions executable by said processing circuitry 1201 , whereby the first node 111 is further operative to perform the actions described herein in relation to the first node 111 , e.g., in Figure 4 and/or Figures 5-11 .
REFERENCES
1. [Nguyen 2020] Nguyen, Q.P., Low, B., & Jaillet, P. (2020). Variational Bayesian Unlearning. ArXiv, abs/2010.12883.
2. [Ben 2018] Ban, Y., Alameda-Pineda, X., Girin, L., & Horaud, R. (2018). Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 1761-1776.
3. [Ericsson] The evolution toward a smart energy setup at ICT sites - Ericsson.
4. [Pan 2010] Pan, S.J., & Yang, Q. (2010). A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22, 1345-1359.
5. [Yan 2020] Yan, X., Acuna, D., & Fidler, S. (2020). Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3892-3901.
6. [ITU] International Telecommunication Union (ITU) | Department of Economic and Social Affairs.
Claims
1 . A computer-implemented method performed by a first node (111 ), the method being for handling a first machine learning model, the first node (111) operating in a computer system (100), the method comprising:
- obtaining (402) one or more criteria to be met by first data to be used as input to obtain a first machine learning model to make predictions in a first domain, wherein a second machine learning model has already been trained with second data from a second domain to make predictions in the second domain,
- obtaining (403) information characterizing the second data,
- determining (406) first data samples in the second data lacking a level of fulfilment of the one or more criteria exceeding a first threshold, wherein the determined first data samples have an effect on the second machine learning model, and
- obtaining (409) the first machine learning model by performing an unlearning procedure of the second machine learning model by reducing the effect of the determined first data samples from the second machine learning model over a second threshold, and
- providing (410) an indication indicating the obtained first machine learning model.
2. The method according to claim 1 , wherein the unlearning procedure comprises erasing the determined first data samples from the second data to yield third data, and wherein the first machine learning model is obtained to fit to the third data.
3. The method according to any of claims 1 -2, wherein the unlearning procedure is based on a posterior function of the second machine learning model, and wherein the method further comprises:
- determining (408) a parametric approximation of the posterior function using a parametric mixture model and based on the obtained second data.
4. The method according to claim 3, wherein the second machine learning model, prior to the unlearning procedure, is a non-probabilistic model, and wherein the method further comprises:
- determining (407) second data samples from the posterior function of the second machine learning model using a non-parametric sampling procedure, and wherein the determining (408) of the parametric approximation of the
posterior function uses the parametric mixture model and the determined second data samples.
5. The method according to any of claims 1-4, wherein the method further comprises:
- determining (404) whether or not a level of fulfilment of the one or more criteria by the information characterizing the second data exceeds the first threshold, and wherein the determining (406) of the first data samples, the initiating (409) of the unlearning procedure and the providing (410) of the indication are performed with the proviso that the information exceeds the first threshold.
6. The method according to claim 5, wherein the second machine learning model is one of a plurality of second machine learning models that have already been trained with respective second data from the second domain to make respective predictions in the second domain, wherein the obtaining (403) of the information comprises obtaining respective information characterizing the respective second data, and wherein the determining (404) further comprises determining whether or not a respective level of fulfilment of the one or more criteria by the respective information exceeds the first threshold, and wherein the information that exceeds the first threshold is one set of the respective information.
7. The method according to claim 6, further comprising:
- selecting (405) the second machine learning model out of the plurality of second machine learning models, the selecting (405) being based on the respective level of fulfilment of the one or more criteria.
8. The method according to any of claims 1-7, further comprising at least one of:
- receiving (401), from a second node (112) operating in the computer system (100), a first indication, the first indication indicating a request for the first machine learning model to make predictions in the first domain, wherein the obtaining (402) of one or more criteria is responsive to the received first indication, and wherein the indication is a second indication that is sent to the second node (112), and
- storing (411 ) the second indication.
9. The method according to any of claims 6-8, wherein the respective information characterizing the respective second data is obtained from a plurality of third nodes (113), wherein the computer system (100) is a Fifth Generation network and wherein each of the first
node (111), the second node (112) and the plurality of third nodes (113) are Network Data Analytics Functions.
10. A first node (111 ), for handling a first machine learning model, the first node (111) being configured to operate in a computer system (100), the first node (111) being further configured to:
- obtain one or more criteria to be met by first data to be used as input to obtain a first machine learning model to make predictions in a first domain, wherein a second machine learning model has already been trained with second data from a second domain to make predictions in the second domain,
- obtain information configured to characterize the second data,
- determine first data samples in the second data lacking a level of fulfilment of the one or more criteria exceeding a first threshold, wherein the first data samples configured to be determined have an effect on the second machine learning model, and
- obtain the first machine learning model by performing an unlearning procedure of the second machine learning model by reducing the effect of the determined first data samples from the second machine learning model over a second threshold, and
- provide an indication configured to indicate the first machine learning model configured to be obtained.
11 . The first node (111 ) according to claim 10, wherein the unlearning procedure is configured to comprise erasing the first data samples configured to be determined from the second data to yield third data, and wherein the first machine learning model is configured to be obtained to fit to the third data.
12. The first node (111 ) according to any of claims 10-11 , wherein the unlearning procedure is configured to be based on a posterior function of the second machine learning model, and wherein the first node (111) is further configured to:
- determine a parametric approximation of the posterior function using a parametric mixture model and based on the second data configured to be obtained.
13. The first node (111 ) according to claim 12, wherein the second machine learning model, prior to the unlearning procedure, is configured to be a non-probabilistic model, and wherein the first node (111 ) is further configured to:
- determine second data samples from the posterior function of the second machine learning model using a non-parametric sampling procedure, and wherein the determining of the parametric approximation of the posterior function is configured to use the parametric mixture model and the second data samples configured to be determined.
14. The first node (111) according to any of claims 10-13, wherein the first node (111) is further configured to:
- determine whether or not a level of fulfilment of the one or more criteria by the information configured to characterize the second data exceeds the first threshold, and wherein the determining of the first data samples, the initiating of the unlearning procedure and the providing of the indication are configured to be performed with the proviso that the information exceeds the first threshold.
15. The first node (111 ) according to claim 14, wherein the second machine learning model is one of a plurality of second machine learning models that have already been trained with respective second data from the second domain to make respective predictions in the second domain, wherein the obtaining of the information is configured to comprise obtaining respective information configured to characterize the respective second data, and wherein the determining is further configured to comprise determining whether or not a respective level of fulfilment of the one or more criteria by the respective information exceeds the first threshold, and wherein the information that exceeds the first threshold is configured to be one set of the respective information.
16. The first node (111 ) according to claim 15, being further configured to:
- select the second machine learning model out of the plurality of second machine learning models, the selecting being configured to be based on the respective level of fulfilment of the one or more criteria.
17. The first node (111) according to any of claims 10-16, being further configured to at least one of:
- receive, from a second node (112) configured to operate in the computer system (100), a first indication, the first indication being configured to indicate a request for the first machine learning model to make predictions in the first domain, wherein the obtaining of one or more criteria is configured to be responsive to the first indication configured to be received, and wherein the
indication is configured to be a second indication that is configured to be sent to the second node (112), and
- store the second indication.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/SE2024/050169 WO2025178519A1 (en) | 2024-02-21 | 2024-02-21 | First node and methods performed thereby for handling a first machine learning model |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025178519A1 true WO2025178519A1 (en) | 2025-08-28 |
Family
ID=96847656
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/SE2024/050169 Pending WO2025178519A1 (en) | 2024-02-21 | 2024-02-21 | First node and methods performed thereby for handling a first machine learning model |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025178519A1 (en) |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11109283B1 (en) | Handover success rate prediction and management using machine learning for 5G networks | |
| US20230224752A1 (en) | Communication method, apparatus, and system | |
| EP3742669B1 (en) | Machine learning in radio access networks | |
| US20230135872A1 (en) | Power saving in radio access network | |
| US20200371893A1 (en) | System and method for low latency edge computing | |
| US20230216737A1 (en) | Network performance assessment | |
| US20240135247A1 (en) | Method and Apparatus for Selecting Machine Learning Model for Execution in a Resource Constraint Environment | |
| US12107741B2 (en) | Determining spatial-temporal informative patterns for users and devices in data networks | |
| US20220021587A1 (en) | A method and an apparatus for fault prediction in network management | |
| EP4315932A1 (en) | Adaptive learning in distribution shift for ran ai/ml models | |
| WO2023006205A1 (en) | Devices and methods for machine learning model transfer | |
| Koudouridis et al. | An architecture and performance evaluation framework for artificial intelligence solutions in beyond 5G radio access networks | |
| Leppänen et al. | Service modeling for opportunistic edge computing systems with feature engineering | |
| US12302157B2 (en) | Automatic and real-time cell performance examination and prediction in communication networks | |
| Sun et al. | Zero-shot multi-level feature transmission policy powered by semantic knowledge base | |
| US20250292071A1 (en) | Generating model parameters and normalization statistics by utilizing generative artificial intelligence | |
| Kasuluru et al. | On the impact of prb load uncertainty forecasting for sustainable open ran | |
| Ho et al. | Energy efficiency learning closed-loop controls in O-RAN 5G network | |
| CN117076916A (en) | Model selection method, device and network side equipment | |
| WO2025178519A1 (en) | First node and methods performed thereby for handling a first machine learning model | |
| US20250016071A1 (en) | Computerized systems and methods for application prioritization during runtime | |
| US12445956B2 (en) | Computerized systems and methods for an energy aware adaptive network | |
| US20240037409A1 (en) | Transfer models using conditional generative modeling | |
| WO2023147877A1 (en) | Adaptive clustering of time series from geographic locations in a communication network | |
| WO2023110108A1 (en) | Devices and methods for operating machine learning model performance evaluation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 24926189 Country of ref document: EP Kind code of ref document: A1 |