
WO2025208347A1 - Training framework - Google Patents

Training framework

Info

Publication number
WO2025208347A1
Authority
WO
WIPO (PCT)
Prior art keywords
encoder
training
distance
network device
terminal device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/085587
Other languages
French (fr)
Inventor
Yijia Feng
Chen Hui YE
Yihang HUANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Shanghai Bell Co Ltd
Nokia Solutions and Networks Oy
Nokia Technologies Oy
Original Assignee
Nokia Shanghai Bell Co Ltd
Nokia Solutions and Networks Oy
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Shanghai Bell Co Ltd, Nokia Solutions and Networks Oy, Nokia Technologies Oy filed Critical Nokia Shanghai Bell Co Ltd
Priority to PCT/CN2024/085587 priority Critical patent/WO2025208347A1/en
Publication of WO2025208347A1 publication Critical patent/WO2025208347A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L25/00 Baseband systems
    • H04L25/02 Details; arrangements for supplying electrical power along data transmission lines
    • H04L25/0202 Channel estimation
    • H04L25/024 Channel estimation algorithms
    • H04L25/0254 Channel estimation algorithms using neural network algorithms
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0495 Quantised networks; Sparse networks; Compressed networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/098 Distributed learning, e.g. federated learning
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna systems, i.e. transmission or reception using multiple antennas
    • H04B7/04 Using two or more spaced independent antennas
    • H04B7/06 Using two or more spaced independent antennas at the transmitting station
    • H04B7/0613 Using simultaneous transmission
    • H04B7/0615 Using simultaneous transmission of weighted versions of same signal
    • H04B7/0619 Using feedback from receiving side
    • H04B7/0621 Feedback content
    • H04B7/0626 Channel coefficients, e.g. channel state information [CSI]

Definitions

  • Example embodiments of the present disclosure generally relate to the field of communications, and in particular, to a terminal device, a network device, methods, apparatuses, and a computer-readable medium for a training framework.
  • Such communication networks operate in accordance with standards, such as those promulgated by 3GPP (Third Generation Partnership Project) or ETSI (European Telecommunications Standards Institute) .
  • 3GPP: Third Generation Partnership Project
  • ETSI: European Telecommunications Standards Institute
  • 5G: Fifth Generation
  • a network device comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the network device to: determine a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and transmit, to the terminal device, the training dataset and the distance threshold.
  • a method comprises: receiving, at a terminal device and from a network device, a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and training the encoder based on the training dataset and the distance threshold.
  • an apparatus comprising: means for receiving a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and means for training the encoder based on the training dataset and the distance threshold.
  • an apparatus comprising: means for determining a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and means for transmitting the training dataset and the distance threshold.
  • a non-transitory computer-readable storage medium having instructions stored thereon.
  • the instructions, when executed on at least one processor, cause the at least one processor to perform the method of the third or fourth aspect.
  • a computer program comprising instructions, which, when executed by an apparatus, cause the apparatus at least to: receive, from a network device, a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and train the encoder based on the training dataset and the distance threshold.
  • a computer program comprising instructions, which, when executed by an apparatus, cause the apparatus at least to: determine a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and transmit, to the terminal device, the training dataset and the distance threshold.
  • a terminal device comprising: receiving circuitry configured to receive, from a network device, a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and training circuitry configured to train the encoder based on the training dataset and the distance threshold.
  • a network device comprising: determining circuitry configured to determine a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and transmitting circuitry configured to transmit, to the terminal device, the training dataset and the distance threshold.
  • FIG. 1A illustrates an example network environment in which some example embodiments of the present disclosure may be implemented
  • FIG. 1B illustrates a schematic diagram of a relationship between squared generalized cosine similarity (SGCS) performance and different training data sizes in accordance with some embodiments of the present disclosure
  • FIG. 2 illustrates a signaling chart illustrating an example communication process in accordance with some example embodiments of the present disclosure
  • FIG. 3A illustrates a schematic diagram of an evaluation of the relationship between mean squared error (MSE) of the encoder output channel state information (CSI) codeword and the SGCS performance of the reconstructed CSI in accordance with some embodiments of the present disclosure
  • FIG. 3B illustrates a schematic diagram for a converging monotonic relationship between MSE and SGCS in accordance with some embodiments of the present disclosure
  • FIG. 3C illustrates another schematic diagram for a converging monotonic relationship between MSE and SGCS in accordance with some embodiments of the present disclosure
  • FIG. 4 illustrates a signaling chart illustrating an example communication process in accordance with some example embodiments of the present disclosure
  • FIG. 5 illustrates a schematic workflow for deriving an example MSE-SGCS-relationship in accordance with some embodiments of the present disclosure
  • FIG. 6 illustrates a schematic diagram of a relationship between MSE and different training data sizes in accordance with some embodiments of the present disclosure
  • FIG. 7 illustrates a flowchart of another example method implemented at a terminal device in accordance with some embodiments of the present disclosure
  • FIG. 8 illustrates a flowchart of yet another example method implemented at a network device in accordance with some embodiments of the present disclosure
  • FIG. 9 illustrates a simplified block diagram of a device that is suitable for implementing some example embodiments of the present disclosure.
  • FIG. 10 illustrates a block diagram of an example of a computer-readable medium in accordance with some example embodiments of the present disclosure.
  • The terms "first" and "second" etc. may be used herein to describe various elements, but these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and similarly, a second element could be termed a first element, without departing from the scope of example embodiments.
  • the term “and/or” includes any and all combinations of one or more of the listed terms.
  • circuitry may refer to one or more or all of the following:
  • circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware.
  • circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device, or a similar integrated circuit in a server, a cellular network device, or other computing or network device.
  • the term “communication network” refers to a network following any suitable communication standards, such as Long Term Evolution (LTE) , LTE-Advanced (LTE-A) , Wideband Code Division Multiple Access (WCDMA) , High-Speed Packet Access (HSPA) , Narrow Band Internet of Things (NB-IoT) , Wireless Fidelity (WiFi) and so on.
  • LTE: Long Term Evolution
  • LTE-A: LTE-Advanced
  • WCDMA: Wideband Code Division Multiple Access
  • HSPA: High-Speed Packet Access
  • NB-IoT: Narrow Band Internet of Things
  • WiFi: Wireless Fidelity
  • the communications between a terminal device and a network device in the communication network may be performed according to any suitable generation communication protocols, including, but not limited to, the fourth generation (4G) , 4.5G, the future fifth generation (5G) , IEEE 802.11 communication protocols, and/or any other protocols either currently known or to be developed in the future.
  • 4G: fourth generation
  • Embodiments of the present disclosure may be applied in various communication systems. Given the rapid development in communications, there will of course also be future type communication technologies and systems with which the present disclosure may be embodied. It should not be seen as limiting the scope of the present disclosure to only the aforementioned system.
  • the term “network device” refers to a node in a communication network via which a terminal device accesses the network and receives services therefrom.
  • the network device may refer to a base station (BS) or an access point (AP), for example, a node B (NodeB or NB), an evolved NodeB (eNodeB or eNB), an NR NB (also referred to as a gNB), a Remote Radio Unit (RRU), a radio header (RH), a remote radio head (RRH), a WiFi device, a relay, a low power node such as a femto, a pico, and so forth, depending on the applied terminology and technology.
  • BS: base station
  • AP: access point
  • FIG. 1A illustrates an example communication system 100 in which some embodiments of the present disclosure can be implemented.
  • the communication system 100, which is a part of a communication network, includes a terminal device (UE) 110-1, a UE 110-2, and a network device 120.
  • the terminal devices 110-1 and 110-2 may be, for example, Internet of Things (IoT) devices.
  • the network device 120 may be, for example, a radio access network (RAN) device (like an NG-RAN device, also called a gNB), or a communication module thereof.
  • the network device 120 is associated with a cell 121, and provides communication service to terminal devices (like UE 110-1 and 110-2) in the cell 121.
  • the UE 110-1 and the UE 110-2 may also be referred to collectively as UE 110. As illustrated in FIG. 1A, the terminal devices 110-1 and 110-2 are both in connection with the network device 120.
  • a link from the network device 120 to terminal device 110 is referred to as a downlink (DL)
  • a link from terminal device 110 to the network device 120 is referred to as an uplink (UL)
  • the network device 120 is a transmitting (TX) device (or a transmitter)
  • terminal device 110 is a receiving (RX) device (or a receiver)
  • terminal device 110 is a transmitting (TX) device (or a transmitter)
  • the network device 120 is a RX device (or a receiver) .
  • the communications in the communication system 100 may conform to any suitable standards including, but not limited to, Long Term Evolution (LTE) , LTE-Advanced (LTE- A) , Wideband Code Division Multiple Access (WCDMA) , Code Division Multiple Access (CDMA) and Global System for Mobile Communications (GSM) and the like.
  • LTE: Long Term Evolution
  • LTE-A: LTE-Advanced
  • WCDMA: Wideband Code Division Multiple Access
  • CDMA: Code Division Multiple Access
  • GSM: Global System for Mobile Communications
  • the communications may be performed according to any generation communication protocols either currently known or to be developed in the future. Examples of the communication protocols include, but are not limited to, the first generation (1G), the second generation (2G), 2.5G, 2.75G, the third generation (3G), the fourth generation (4G), 4.5G, the fifth generation (5G), 5.5G, 5G-Advanced networks, or the sixth generation (6G) communication protocols.
  • the communication system 100 may include any suitable number of devices adapted for implementing embodiments of the present disclosure.
  • FIG. 1B illustrates a schematic diagram 100B of a relationship between SGCS performance and different training data sizes in accordance with some embodiments of the present disclosure.
  • FIG. 1B shows the SGCS performances of different encoders (here, encoder 1 at UE1, encoder 2 at UE2, and encoder 3 at UE3) of different UEs with different training data sizes.
  • the three encoders (i.e., encoder 1, encoder 2 and encoder 3) are of different model complexities.
  • UE3 may only need a training dataset with a 20K data size for training, while UE1 and UE2 may each need a training dataset with a 40K data size for training to ensure the SGCS requirement. It would therefore be an overhead waste for the gNB to send the same 40K training data to UE3.
  • the network device 120 determines (210) a training dataset and a distance threshold for training an encoder at the terminal device 110.
  • the encoder is associated with the decoder at the network device 120.
  • the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with an E2E similarity performance related to the encoder and the decoder.
  • the distance loss may be an absolute direct distance, a mean squared error (MSE) , or a cubic error.
  • MSE: mean squared error
  • the similarity performance may be a squared generalized cosine similarity (SGCS) performance.
  • the description may take the MSE as an example of the distance loss, and take SGCS performance of the decoder as an example of the E2E similarity performance.
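For concreteness, the two quantities can be sketched as follows. This is a minimal illustration assuming eigenvector-based CSI represented as complex vectors; the function names and array shapes are illustrative, not taken from the disclosure:

```python
import numpy as np

def mse(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Distance loss: mean squared error between codewords."""
    return float(np.mean(np.abs(y - y_hat) ** 2))

def sgcs(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Similarity performance: squared generalized cosine similarity
    between the original and the reconstructed CSI vector."""
    num = np.abs(np.vdot(x, x_hat)) ** 2          # np.vdot conjugates x
    den = (np.linalg.norm(x) * np.linalg.norm(x_hat)) ** 2
    return float(num / den)
```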
  • the network device 120 may train the decoder and a corresponding hypothetical encoder with a first loss function defined as the similarity performance, and generate a training dataset between the hypothetical encoder and the decoder.
  • the network device 120 may train a distance-oriented encoder based on the generated training dataset with a second loss function defined as a distance loss in the training of the distance-oriented encoder, record an average distance value of codewords which are output of the distance-oriented encoder for the training dataset (the average distance value may represent the distance loss) , input the codewords to the decoder which is frozen to derive reconstructed data, and record an average similarity value based on the reconstructed data (the average similarity value may represent the similarity performance) .
  • the network device 120 may determine the relationship between the distance loss and the similarity performance based on the recorded average distance values and the average similarity values. Such training may be conducted using different training models and with different parameters and different data size of the training dataset. Then the relationship between the distance loss and the similarity performance may be plotted (for example, as a curve) based on the recorded average distance values and the average similarity values.
  • the network device 120 may derive an MSE-SGCS-relationship and the MSE threshold for the terminal device 110 at the network device. This will be described in more detail with reference to FIG. 5.
  • the network device 120 may determine the training dataset and the distance threshold either voluntarily (i.e., without a request from the terminal device 110 for the training dataset and the distance threshold) , or in response to a request from the terminal device 110 for the training dataset and the distance threshold (this case will be discussed with reference to FIG. 4) .
  • the network device 120 may determine a similarity performance value (for example, an SGCS performance value) by itself for determining the training dataset and the distance threshold at 210.
  • the terminal device 110 may transmit to the network device 120 a request for the training dataset and the distance threshold, and the request may comprise a similarity performance value (for example, an SGCS performance value) , i.e., indicating the network device 120 to provide the training dataset and the distance threshold which can satisfy the similarity performance value.
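A hypothetical shape for such a request/response exchange is sketched below; the disclosure does not specify a message format, so all class and field names are illustrative:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TrainingRequest:
    target_sgcs: float   # similarity performance value, e.g. an SGCS value

@dataclass
class TrainingResponse:
    mse_threshold: float                # distance threshold derived by the network device
    dataset: List[Tuple[list, list]]    # (X, Y) pairs of CSI and target codewords
```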
  • the network device 120 may determine the distance threshold (for example, MSE threshold) based on a relationship between the distance loss and the similarity performance and based on the similarity performance value.
  • the relationship between the distance loss and the similarity performance may be presented by a table or a formula. Therefore, with the similarity performance value, the network device 120 may look up the table or put the similarity performance value into the formula to obtain the corresponding distance loss as the distance threshold.
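A sketch of such a lookup, assuming the MSE-SGCS-relationship is stored as two monotonic arrays; apart from the (0.0067, 0.7245) pair quoted later in this disclosure, the values below are illustrative placeholders:

```python
import numpy as np

# Pre-evaluated MSE-SGCS-relationship stored as a numerical table at the
# network device. Only the (0.0067, 0.7245) pair comes from the text;
# the other entries are illustrative placeholders.
mse_table  = np.array([0.0200, 0.0100, 0.0087, 0.0067, 0.0050])
sgcs_table = np.array([0.5500, 0.6500, 0.7000, 0.7245, 0.7600])

def mse_threshold_for(target_sgcs: float) -> float:
    """Map a requested SGCS performance value to an MSE threshold.

    SGCS grows as MSE shrinks, so the table is interpolated with SGCS
    as the ascending x-axis.
    """
    return float(np.interp(target_sgcs, sgcs_table, mse_table))

print(mse_threshold_for(0.7245))  # -> 0.0067
```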
  • After the training dataset and the distance threshold are determined, the network device 120 transmits (215) the training dataset and the distance threshold 201 to the terminal device 110.
  • the terminal device 110 receives (217) the training dataset and the distance threshold 201 from the network device 120. With the training dataset and the distance threshold 201, the terminal device 110 trains (220) the encoder based on the training dataset and the distance threshold 201. Based on the distance threshold determined and assigned by the network device 120, the terminal device 110 can monitor the average distance loss of the training dataset and manage its encoder training progress. More specifically, when the average distance loss of the training dataset is lower than the distance threshold, the terminal device 110 can terminate the training process. Alternatively, when the terminal device 110 realizes it is not able to train the encoder using the received training dataset to meet the distance threshold within the maximum training epochs, it may request the network device 120 for a supplementary training dataset for further training of the encoder.
  • the terminal device 110 may determine whether the trained encoder meets the distance threshold after a pre-determined number of training epochs or after a pre-determined maximum number of training epochs.
  • the pre-determined maximum number of training epochs may be, for example, 10000.
  • the terminal device 110 may determine whether the trained encoder meets the distance threshold (for example, whether the distance loss, denoted as an average MSE value for a training epoch, is lower than the distance threshold denoted as an MSE threshold).
  • the terminal device 110 may terminate the training of the encoder.
  • the terminal device 110 may transmit, to the network device 120, a further request message for a further training dataset.
  • the network device 120 may transmit, to the terminal device 110, the supplementary training dataset as a response to the further request message.
  • on receiving the supplementary training dataset, the terminal device 110 may train the encoder based on the further training dataset and the distance threshold, until the trained encoder meets the distance threshold.
  • the E2E reconstruction SGCS performance can be ensured between the terminal device 110 and the network device 120.
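The UE-side training logic described above could be outlined as follows. This is a hedged sketch assuming a PyTorch encoder, a list of (x, y) tensor pairs as the dataset, and a `request_supplementary_dataset` callback standing in for the signaling toward the network device:

```python
import torch

def train_ue_encoder(encoder, dataset, mse_threshold, max_epochs=10000,
                     request_supplementary_dataset=None):
    """Train until the average codeword MSE falls below the NW-assigned threshold."""
    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(encoder.parameters())
    while True:
        for _ in range(max_epochs):
            epoch_losses = []
            for x, y in dataset:              # y: target codeword from the NW dataset
                optimizer.zero_grad()
                loss = loss_fn(encoder(x), y)
                loss.backward()
                optimizer.step()
                epoch_losses.append(loss.item())
            if sum(epoch_losses) / len(epoch_losses) < mse_threshold:
                return encoder                # distance threshold met: terminate
        # Threshold not met within the maximum training epochs: request a
        # supplementary training dataset from the network device and continue.
        if request_supplementary_dataset is None:
            raise RuntimeError("MSE threshold not reached")
        dataset = dataset + request_supplementary_dataset()
```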
  • In the following description, an MSE threshold is used as an example of the distance threshold, and SGCS performance is used as an example of the similarity performance.
  • FIG. 3A illustrates a schematic diagram of an evaluation of the relationship between MSE of the encoder output codeword and the SGCS performance of the reconstructed CSI in accordance with some embodiments of the present disclosure.
  • the upper half of FIG. 3A depicts a NW-first separate training scheme, where no relationship between SGCS and MSE is established, and the lower half of FIG. 3A highlights the MSE-SGCS-relationship evaluation.
  • some embodiments of the present disclosure may be considered an extension and amendment of such an NW-first separate training framework, intended to further augment the framework, including by facilitating performance monitoring during UE-side model training.
  • the NW-first separate training process comprises 3 stages.
  • NW (for example, the network device 120 as illustrated in FIGS. 1A and 2) trains the decoder with a hypothetical encoder using the E2E SGCS loss function, i.e., L = 1 − SGCS(X, X̂), where SGCS(X, X̂) = |X^H X̂|² / (‖X‖² ‖X̂‖²).
  • NW sends the training dataset to UE (for example, the terminal device 110 as illustrated in FIGS. 1A and 2) .
  • the dataset parameters are given in Table 1.
  • As given in Table 1, the evaluation is conducted in a UMa (Urban Macro) simulation scenario, where the distribution of UEs is assumed to be 80% indoor and 20% outdoor, and the UE speeds for indoor and outdoor UEs are assumed to be 3 km/h and 30 km/h, respectively.
  • the carrier frequency used in the simulation is 4 GHz with an SCS of 30 kHz, and the bandwidth is 20 MHz. 7 sites with 3 sectors per site are used in the simulation.
  • the BS antenna height is 25 meters, and the macro-cell inter-site distance is assumed to be 200 meters.
  • the data size of the training data samples for the NW-side decoder may also be 60K or 100K.
  • FIG. 3B illustrates a schematic diagram 300B for a converging monotonic relationship between MSE and SGCS in accordance with some embodiments of the present disclosure.
  • FIG. 3C illustrates another schematic diagram 300C for a converging monotonic relationship between MSE and SGCS in accordance with some embodiments of the present disclosure.
  • the monotonic relationship between the MSE values and the corresponding SGCS values is agnostic to the model (TF in this case) complexity and training dataset volume.
  • this holds for different models (here, TF1, TF2, TF3 and TF4) with the training dataset volume being the same (here, 80K). Therefore, once the NW-side decoder is trained, regardless of the actual encoder's model complexity and training dataset volume, the MSE-SGCS-relationship always converges to the same relationship. For this reason, the MSE-SGCS-relationship may be obtained using a hypothetical encoder and an MSE-oriented hypothetical encoder (described in detail with reference to FIG. 5) to guide UE-side encoder training.
  • NW can pre-evaluate the MSE-SGCS-relationship with any ‘hypothetical UE-side encoder’ using the training dataset.
  • This pre-evaluated MSE-SGCS-relationship can be stored as a numerical table or fit into a parametric formula at NW-side.
  • This MSE-SGCS-relationship can be utilized to assess the E2E performance at UE-side and provide training guidance to UE-side training (like determining whether the encoder has been trained to reach a satisfying performance, indicating whether UE needs more training data to improve the performance) .
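As one possible realization of "fit into a parametric formula" (the disclosure does not fix a functional form, so the quadratic-in-log-MSE model below is purely an assumption):

```python
import numpy as np

# Per-epoch averages recorded while training MSE-oriented hypothetical
# encoders against the frozen decoder (illustrative values, except the
# (0.0067, 0.7245) pair quoted elsewhere in this disclosure).
avg_mse  = np.array([0.0200, 0.0150, 0.0100, 0.0087, 0.0067])
avg_sgcs = np.array([0.5500, 0.6100, 0.6800, 0.7000, 0.7245])

# Fit SGCS as a quadratic in log10(MSE), so only three coefficients need
# to be stored at the NW instead of the full numerical table.
coeffs = np.polyfit(np.log10(avg_mse), avg_sgcs, deg=2)

def sgcs_from_mse(mse_value: float) -> float:
    return float(np.polyval(coeffs, np.log10(mse_value)))
```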
  • the observations can be conducted at either the UE side or the NW side, and the phenomenon is the same irrespective of UE or NW: there is a converging monotonic relationship between the average MSE of the actual encoder's output codeword and the average SGCS of the reconstructed CSI in the training dataset in each training epoch, no matter what model (for example, TF1, TF2, TF3, or TF4) or what training dataset volume (for example, 20K, 40K, or 80K) is used. Therefore, NW may obtain (or determine) the relationship at the NW side and determine a corresponding MSE threshold once an SGCS performance value is decided (for example, indicated by UE, determined by NW, pre-determined, or pre-defined).
  • FIG. 4 illustrates a signaling chart illustrating an example communication process 400 in accordance with some example embodiments of the present disclosure.
  • the communication process 400 may be used for UE-side training guidance and further on-demand supplementary training dataset transmission in NW-first separate training scheme for CSI compression.
  • the communication process 400 will be described with reference to FIG. 1A.
  • the communication process 400 may involve UE 410 which may be an example of the terminal device 110 as illustrated in FIG. 1A and gNB 420 which may be an example of the network device 120 as illustrated in FIG. 1A.
  • the communication process 400 will be described with reference to the UE 410 and gNB 420.
  • gNB 420 maintains a standardized decoder. Meanwhile, gNB 420 also maintains the MSE-SGCS-relationship obtained from training the MSE-oriented encoder.
  • UE 410 transmits a request to gNB 420 for UE-side encoder’s training dataset as well as the MSE threshold for the target SGCS.
  • the operations at 430 may be optional.
  • gNB 420 may also determine the target SGCS by itself and transmit the corresponding training dataset and MSE threshold to UE 410.
  • gNB 420 checks the reference table or uses the pre-stored formula to obtain the corresponding MSE value as the MSE threshold, and then sends it back to UE 410.
  • the reference table is a table in which the relationship between the MSE value and the SGCS performance value (hereafter, also referred to as “MSE-SGCS-relationship” for simplicity) are recorded, and the pre-stored formula is a formula representing the relationship between the MSE value and the SGCS performance value.
  • UE 410 uses the training dataset to train the actual encoder at UE 410.
  • the received MSE threshold works as an indicator to guide the training of UE 410.
  • the training can be terminated once the MSE loss is lower than the received MSE threshold.
  • UE 410 determines whether the trained encoder meets the MSE threshold. If the encoder does not meet the MSE threshold, operations at 450, 455 and 460 are performed.
  • UE 410 transmits a request to gNB 420 for supplementary training dataset.
  • gNB 420 receives the request for supplementary training dataset.
  • gNB 420 transmits the requested supplementary training dataset to UE 410.
  • UE 410 receives the supplementary training dataset from gNB 420. With the received supplementary training dataset at 455, UE 410 further trains its encoder.
  • the operations at 440, 445, 450 and 455 are repeated till the training loss (here, MSE) of the encoder at UE 410 is lower than the MSE threshold.
  • the SGCS desired by UE 410 is 0.7245 and the corresponding MSE threshold is 0.0067, as mentioned above with reference to FIGS. 3B and 3C.
  • assuming the relationship between training dataset volume and MSE is as shown in FIG. 6, if UE 410 adopts TF2 as its encoder and trains it with a 20K training dataset, then UE 410 is only capable of reaching an MSE of 0.0087, while the desired MSE is less than 0.0067. This implies that UE 410 needs more data for training.
  • So UE 410 may request from gNB 420, at 450, a 20K supplementary training dataset for further training of its encoder, and receive the requested 20K supplementary training dataset. When UE 410 further trains the encoder with the requested 20K supplementary training dataset, its MSE will be reduced to 0.0067, which meets the MSE requirement. On the other hand, under the same assumption, if UE 410 adopts TF4 as its encoder and trains it with a 20K training dataset, it is capable of reaching the MSE requirement of 0.0067. In this case, there is no need for a supplementary training dataset, which saves much transmission overhead for UE 410 and gNB 420 compared to a best-effort scheme.
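The decision in this example reduces to a comparison against the volume-to-MSE behavior of FIG. 6. The dictionary below encodes only the values quoted above; any other entry would be hypothetical:

```python
# Reachable MSE per encoder model and training-data volume, taken from the
# example above (TF2 needs 40K to reach 0.0067; TF4 reaches it with 20K).
reachable_mse = {
    ("TF2", 20_000): 0.0087,
    ("TF2", 40_000): 0.0067,
    ("TF4", 20_000): 0.0067,
}

def needs_supplementary_data(model: str, volume: int, mse_threshold: float) -> bool:
    return reachable_mse[(model, volume)] > mse_threshold

print(needs_supplementary_data("TF2", 20_000, 0.0067))  # True: request 20K more
print(needs_supplementary_data("TF4", 20_000, 0.0067))  # False: no extra overhead
```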
  • FIG. 5 illustrates a schematic workflow 500 for deriving an example MSE-SGCS-relationship in accordance with some embodiments of the present disclosure.
  • NW (for example, the network device 120 as illustrated in FIGS. 1A and 2, or gNB 420 as illustrated in FIG. 4) derives the MSE-SGCS-relationship with hypothetical UE-side encoders (it is to be noted that these "hypothetical UE-side encoders" are actually present at NW, and are only "hypothetical" for deriving the MSE-SGCS-relationship), which are trained using the MSE loss function and can be regarded as MSE-oriented encoders.
  • NW trains the decoder (denoted as "DEC") and the corresponding hypothetical encoder (denoted as "Hypo ENC") for CSI compression with the E2E loss function, e.g., L = 1 − SGCS(X, X̂), where X is the original CSI (i.e., input to the Hypo ENC), Y is the codeword between the hypothetical encoder and decoder (i.e., the output of the hypothetical encoder), and X̂ is the reconstructed CSI output by DEC from its input Y.
  • DEC: the decoder
  • Hypo ENC: hypothetical encoder
  • the E2E loss function can be evaluated to assess the E2E similarity performance between the Hypo ENC (which hypothetically represents the encoder at UE) and the DEC (which represents the decoder at NW/gNB).
  • NW generates the training dataset {(X, Y)}, where Y is the codeword between the hypothetical encoder and decoder, as mentioned above.
  • "MSE-oriented hypothetical encoders" are different from the "hypothetical encoder" in STEP-1 in that the "hypothetical encoder" is used for evaluating the E2E similarity loss, i.e., SGCS, while the "MSE-oriented hypothetical encoders" are used for evaluating the distance loss, i.e., here the MSE loss.
  • for the MSE loss, several different MSE-oriented hypothetical encoders may be used, and these MSE-oriented hypothetical encoders may be trained with training datasets of different data sizes. For example, TF1, TF2, TF3 and TF4 as illustrated in FIGS. 3B and 3C may be examples of the MSE-oriented hypothetical encoders, and 80K, 40K and 20K may be examples of different data sizes of the training datasets.
  • NW records the average MSE values of the codewords in the training dataset in each training epoch.
  • the output codewords of the MSE-oriented encoder are input to the frozen decoder to derive the reconstructed CSI.
  • NW records the average SGCS values of the reconstructed CSI in the training dataset in each epoch.
  • NW attains the MSE-SGCS-relationship with which NW can assign the MSE threshold to UE, for example, in response to UE’s required SGCS.
  • This relationship can be stored as a numerical table or fit into a parametric formula.
  • This MSE-SGCS-relationship can be utilized to assess the E2E performance at UE-side and provide training guidance to UE-side training (like determining whether the encoder has been trained to reach a satisfying performance, indicating whether UE needs more training data to boost the performance) .
  • UE can monitor the average MSE of the training dataset and manage its encoder training progress. More specifically, when the average MSE of the training dataset is lower than the MSE threshold at UE side, UE can terminate the training process. Alternatively, when UE realizes it is not able to train the encoder to meet the MSE threshold within the maximum training epochs, it would request NW for a supplementary training dataset for further training.
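Condensing STEP-1 through the relationship recording, a sketch of the NW-side workflow might look as follows. Model classes, tensor shapes, and optimizer choices are assumptions; CSI is treated as real-valued vectors for simplicity; only the ordering of the steps follows the workflow above:

```python
import torch

def sgcs_loss(x, x_hat):
    """E2E loss 1 - SGCS for batched CSI vectors of shape (B, N).

    CSI is treated as real-valued here for simplicity; a complex variant
    would conjugate x in the inner product.
    """
    inner = torch.sum(x * x_hat, dim=-1) ** 2
    norms = (x ** 2).sum(-1) * (x_hat ** 2).sum(-1)
    return 1.0 - (inner / norms).mean()

def derive_mse_sgcs_relationship(hypo_enc, dec, mse_enc, csi_batches, epochs):
    # STEP-1: train DEC and the hypothetical encoder with the E2E SGCS loss.
    opt = torch.optim.Adam(list(hypo_enc.parameters()) + list(dec.parameters()))
    for _ in range(epochs):
        for x in csi_batches:
            opt.zero_grad()
            loss = sgcs_loss(x, dec(hypo_enc(x)))
            loss.backward()
            opt.step()

    # STEP-2: generate the training dataset {(X, Y)} of CSI and codewords.
    with torch.no_grad():
        dataset = [(x, hypo_enc(x)) for x in csi_batches]

    # Freeze the decoder; only the MSE-oriented encoder is trained below.
    for p in dec.parameters():
        p.requires_grad_(False)

    # STEP-3: train an MSE-oriented hypothetical encoder and record, per
    # epoch, the average MSE of its codewords and the average SGCS of the
    # CSI reconstructed by the frozen decoder.
    relationship = []
    opt = torch.optim.Adam(mse_enc.parameters())
    mse_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        epoch_mse, epoch_sgcs = [], []
        for x, y in dataset:
            opt.zero_grad()
            y_hat = mse_enc(x)
            loss = mse_fn(y_hat, y)
            loss.backward()
            opt.step()
            with torch.no_grad():
                epoch_mse.append(loss.item())
                epoch_sgcs.append(1.0 - sgcs_loss(x, dec(y_hat)).item())
        relationship.append((sum(epoch_mse) / len(epoch_mse),
                             sum(epoch_sgcs) / len(epoch_sgcs)))
    return relationship
```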
  • FIG. 6 illustrates a schematic diagram of a relationship between MSE and different training data sizes in accordance with some embodiments of the present disclosure.
  • FIG. 6 shows the MSE of different encoders (here, TF1, TF2, TF3, and TF4) of different UEs, with different training data sizes.
  • the four encoders (i.e., TF1, TF2, TF3, and TF4) are of different model complexities.
  • the UE having the TF4 encoder may only need 20K data for training, while the UE having the TF2 encoder and the UE having the TF3 encoder may need 40K training data to ensure the SGCS requirement. Therefore, it may be an overhead waste for the gNB to send the same 40K training data to the UE having the TF4 encoder.
  • FIG. 7 illustrates a flowchart of an example method 700 implemented at a terminal device (for example, the terminal device 110 as illustrated in FIGS. 1A and 2) in accordance with some other embodiments of the present disclosure.
  • the method 700 will be described from the perspective of the terminal device 110 with reference to FIGS. 1A and 2.
  • the terminal device 110 receives, from a network device (for example, the network device 120 as illustrated in FIGS. 1A and 2) , a training dataset and a distance threshold (for example, the training dataset and distance threshold 201 as illustrated in FIG. 2) for training an encoder at the terminal device 110.
  • the encoder is associated with a decoder at the network device
  • the distance threshold is used to assess a distance loss in the training of the encoder
  • the distance loss is associated with a similarity performance related to the encoder and the decoder.
  • the terminal device 110 trains the encoder based on the training dataset and the distance threshold.
  • the terminal device 110 may further transmit, to the network device, a request for the training dataset and the distance threshold, and then receive the training dataset and the distance threshold from the network device as a response to the request.
  • the terminal device 110 may determine whether the trained encoder meets the distance threshold.
  • the terminal device 110 may terminate the training of the encoder if the terminal device 110 determines that the trained encoder meets the distance threshold.
  • the terminal device 110 may further transmit, to the network device, a further request message for a further training dataset, and receive, from the network device, a further response message comprising the further training dataset; then the terminal device 110 may further train the encoder based on the further training dataset and the distance threshold.
  • the terminal device 110 may determine whether the distance loss is lower than the distance threshold.
  • FIG. 8 illustrates a flowchart of an example method 800 implemented at a network device (for example, the network device 120 as illustrated in FIGS. 1A and 2) in accordance with some other embodiments of the present disclosure.
  • the method 800 will be described from the perspective of the network device 120 with reference to FIGS. 1A and 2.
  • the network device 120 determines a training dataset and a distance threshold for training an encoder at the terminal device.
  • the encoder is associated with a decoder at the network device
  • the distance threshold is used to assess a distance loss in the training of the encoder
  • the distance loss is associated with a similarity performance related to the encoder and the decoder.
  • the network device 120 transmits, to the terminal device, the training dataset and the distance threshold (for example, the training dataset and the distance threshold 201 as illustrated in FIG. 2) .
  • the network device 120 may receive, from the terminal device, a request for the training dataset and the distance threshold, and then transmit the training dataset and the distance threshold to the terminal device as a response to the request.
  • the network device 120 may further determine a similarity performance value.
  • the network device 120 may further receive, from the terminal device, a further request message for a further training dataset, and transmit, to the terminal device, the further training dataset as a response to the further request message.
  • the network device 120 may determine the distance threshold based on a relationship between the distance loss and the similarity performance, and on the similarity performance value.
  • the relationship between the distance loss and the similarity performance may be represented by a table or a formula.
  • the network device 120 may train the decoder and a corresponding hypothetical encoder with a first loss function defined as the similarity performance, and generate a training dataset between the hypothetical encoder and the decoder.
  • the network device 120 may further train a distance-oriented encoder based on the generated training dataset with a second loss function defined as a distance loss in the training of the distance-oriented encoder, record an average distance value of codewords (the average distance value may represent the distance loss) which are output of the distance-oriented encoder for the training dataset, input the codewords to the decoder which is frozen to derive reconstructed data, and record an average similarity value (the average similarity value may represent the similarity performance) based on the reconstructed data. Then the network device 120 may determine the relationship between the distance loss and the similarity performance based on the recorded average distance values and the average similarity values.
  • the distance loss may comprise one of an absolute direct distance, a mean squared error (MSE) , or a cubic error
  • the similarity performance may comprise a squared generalized cosine similarity (SGCS) performance.
  • the apparatus comprises: means for receiving a training dataset and a distance threshold for training an encoder at the terminal device, the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and means for training the encoder based on the training dataset and the distance threshold.
  • the request may comprise a similarity performance value of the similarity performance.
  • the means for training the encoder based on the training dataset and the distance threshold may comprise means for determining whether the trained encoder meets the distance threshold.
  • the apparatus may further comprise means for terminating the training of the encoder.
  • the means for determining whether the trained encoder meets the distance threshold may comprise means for determining whether the distance loss is lower than the distance threshold.
  • the apparatus further comprises means for performing other steps in some embodiments of the method 700.
  • the means comprises at least one processor and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.
  • an apparatus capable of performing the method 800 may comprise means for performing the respective steps of the method 800.
  • the means may be implemented in any suitable form.
  • the means may be implemented in a circuitry or software module.
  • the apparatus comprises: means for determining a training dataset and a distance threshold for training an encoder at the terminal device, the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and means for transmitting the training dataset and the distance threshold.
  • the apparatus may comprise means for receiving, from the terminal device, a request for the training dataset and the distance threshold, and means for transmitting the training dataset and the distance threshold to the terminal device as a response to the request.
  • the request may comprise a similarity performance value of the similarity performance.
  • the apparatus may comprise means for determining a similarity performance value.
  • the apparatus may further comprise means for receiving, from the terminal device, a further request message for a further training dataset, and means for transmitting, to the terminal device, the further training dataset as a response to the further request message.
  • the apparatus may comprise means for determining the distance threshold based on a relationship between the distance loss and the similarity performance and the similarity performance value.
  • the means for determining the relationship between the distance loss and the similarity performance may comprise means for training the decoder and a corresponding hypothetical encoder with a first loss function defined as the similarity performance, and means for generating a training dataset between the hypothetical encoder and the decoder.
  • the means for determining the relationship may further comprise means for training a distance-oriented encoder based on the generated training dataset with a second loss function defined as a distance loss in the training of the distance-oriented encoder, means for recording an average distance value of codewords (the average distance value may represent the distance loss) which are output of the distance-oriented encoder for the training dataset, means for inputting the codewords to the decoder which is frozen to derive reconstructed data, and means for recording an average similarity value (the average similarity value may represent the similarity performance) based on the reconstructed data. Then, with the recorded average distance values and the average similarity values, the apparatus may further comprise means for determining the relationship between the distance loss and the similarity performance.
  • the distance loss may comprise one of an absolute direct distance, a mean squared error (MSE), or a cubic error
  • the similarity performance may comprise a squared generalized cosine similarity (SGCS) performance.
  • the apparatus further comprises means for performing other steps in some embodiments of the method 800.
  • the means comprises at least one processor and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.
  • the communication module 940 is for bidirectional communications.
  • the communication module 940 has at least one antenna to facilitate communication.
  • the communication interface may represent any interface that is necessary for communication with other network elements.
  • the processor 910 may be of any type suitable for the local network and may include one or more of the following: general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multicore processor architecture, as non-limiting examples.
  • the device 900 may have multiple processors, such as an application specific integrated circuit chip that is slaved in time to a clock which synchronizes the main processor.
  • the memory 920 may include one or more non-volatile memories and one or more volatile memories.
  • the non-volatile memories include, but are not limited to, a Read Only Memory (ROM) 924, an electrically programmable read only memory (EPROM) , a flash memory, a hard disk, a compact disc (CD) , a digital video disk (DVD) , and other magnetic storage and/or optical storage.
  • the volatile memories include, but are not limited to, a random access memory (RAM) 922 and other volatile memories that do not persist during power-down.
  • a computer program 930 includes computer executable instructions that are executed by the associated processor 910.
  • the program 930 may be stored in the ROM 924.
  • the processor 910 may perform any suitable actions and processing by loading the program 930 into the RAM 922.
  • the embodiments of the present disclosure may be implemented by means of the program 930 so that the device 900 may perform any process of the disclosure as discussed with reference to FIGS. 2, 4 and 7-8.
  • the embodiments of the present disclosure may also be implemented by hardware or by a combination of software and hardware.
  • the program 930 may be tangibly contained in a computer-readable medium which may be included in the device 900 (such as in the memory 920) or other storage devices that are accessible by the device 900.
  • the device 900 may load the program 930 from the computer-readable medium to the RAM 922 for execution.
  • the computer-readable medium may include any types of tangible non-volatile storage, such as ROM, EPROM, a flash memory, a hard disk, CD, DVD, and the like.
  • FIG. 10 illustrates a block diagram of an example of a computer-readable medium 1000 in accordance with some example embodiments of the present disclosure.
  • the computer-readable medium 1000 has the program 930 stored thereon. It is noted that although the computer-readable medium 1000 is depicted in the form of a CD or DVD in FIG. 10, the computer-readable medium 1000 may be in any other form suitable for carrying or holding the program 930.
  • various embodiments of the present disclosure may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While various aspects of embodiments of the present disclosure are illustrated and described as block diagrams, flowcharts, or using some other pictorial representations, it is to be understood that the block, apparatus, system, technique or method described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • the present disclosure also provides at least one computer program product tangibly stored on a non-transitory computer-readable storage medium.
  • the computer program product includes computer-executable instructions, such as those included in program modules, being executed in a device on a target real or virtual processor, to carry out any of the methods 200, 400 and 700-800 as described above with reference to FIGS. 2, 4 and 7-8.
  • program modules include routines, programs, libraries, objects, classes, components, data structures, or the like that perform particular tasks or implement particular abstract data types.
  • the functionality of the program modules may be combined or split between program modules as desired in various embodiments.
  • Machine-executable instructions for program modules may be executed within a local or distributed device. In a distributed device, program modules may be located in both local and remote storage media.
  • Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • the computer program codes or related data may be carried by any suitable carrier to enable the device, apparatus or processor to perform various processes and operations as described above.
  • Examples of the carrier include a signal, computer-readable medium, and the like.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • a computer-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the computer-readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Power Engineering (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A training framework. In an example method, a terminal device receives, from a network device, a training dataset and a distance threshold for training an encoder at the terminal device. The encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder. Then, the terminal device trains the encoder based on the training dataset and the distance threshold. In this way, end-to-end (E2E) SGCS performance for CSI reconstruction can be ensured at the terminal device and the network device, and overhead waste of training dataset transmission can be avoided or reduced.

Description

TRAINING FRAMEWORK

FIELD
Example embodiments of the present disclosure generally relate to the field of communications, and in particular, to a terminal device, a network device, methods, apparatuses, and a computer-readable medium for a training framework.
BACKGROUND
A communication network can be seen as a facility that enables communications between two or more communication devices, or provides communication devices access to a data network. A mobile or wireless communication network is one example of a communication network.
Such communication networks operate in accordance with standards, such as those promulgated by 3GPP (Third Generation Partnership Project) or ETSI (European Telecommunications Standards Institute) . Examples of such standards include the so-called 5G (5th Generation) standard or other standards promulgated by 3GPP.
SUMMARY
In general, example embodiments of the present disclosure provide a solution for a training framework, especially for an NW-first separate training framework.
In a first aspect, there is provided a terminal device. The terminal device comprises at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the terminal device to: receive, from a network device, a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and train the encoder based on the training dataset and the distance threshold.
In a second aspect, there is provided a network device. The network device comprises at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the network device to: determine a training dataset and a distance threshold for training an encoder at a terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and transmit, to the terminal device, the training dataset and the distance threshold.
In a third aspect, there is provided a method. The method comprises: receiving, at a terminal device and from a network device, a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and training the encoder based on the training dataset and the distance threshold.
In a fourth aspect, there is provided a method. The method comprises: determining, by a network device, a training dataset and a distance threshold for training an encoder at a terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and transmitting, at the network device and to the terminal device, the training dataset and the distance threshold.
In a fifth aspect, there is provided an apparatus. The apparatus comprises: means for receiving a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and means for training the encoder based on the training dataset and the distance threshold.
In a sixth aspect, there is provided an apparatus. The apparatus comprises: means for determining a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and means for transmitting the training dataset and the distance threshold.
In a seventh aspect, there is provided a non-transitory computer-readable storage medium having instructions stored thereon. The instructions, when executed on at least one processor, cause the at least one processor to perform the method of the third or fourth aspect.
In an eighth aspect, there is provided a computer program comprising instructions, which, when executed by an apparatus, cause the apparatus at least to: receive, from a network device, a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and train the encoder based on the training dataset and the distance threshold.
In a ninth aspect, there is provided a computer program comprising instructions, which, when executed by an apparatus, cause the apparatus at least to: determine a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and transmit, to the terminal device, the training dataset and the distance threshold.
In a tenth aspect, there is provided a terminal device. The terminal device comprises: receiving circuitry configured to receive, from a network device, a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and training circuitry configured to train the encoder based on the training dataset and the distance threshold.
In an eleventh aspect, there is provided a network device. The network device comprises: determining circuitry configured to determine a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and transmitting circuitry configured to transmit, to the terminal device, the training dataset and the distance threshold.
It is to be understood that the summary section is not intended to identify key or essential features of embodiments of the present disclosure, nor is it intended to be used to limit the scope of the present disclosure. Other features of the present disclosure will become easily comprehensible through the following description.
BRIEF DESCRIPTION OF THE DRAWINGS
Some example embodiments will now be described with reference to the accompanying drawings, in which:
FIG. 1A illustrates an example network environment in which some example embodiments of the present disclosure may be implemented;
FIG. 1B illustrates a schematic diagram of a relationship between squared generalized cosine similarity (SGCS) performance and different training data sizes in accordance with some embodiments of the present disclosure;
FIG. 2 illustrates a signaling chart illustrating an example communication process in accordance with some example embodiments of the present disclosure;
FIG. 3A illustrates a schematic diagram of an evaluation of the relationship between mean squared error (MSE) of the encoder output channel state information (CSI) codeword and the SGCS performance of the reconstructed CSI in accordance with some embodiments of the present disclosure;
FIG. 3B illustrates a schematic diagram for a converging monotonic relationship between MSE and SGCS in accordance with some embodiments of the present disclosure;
FIG. 3C illustrates another schematic diagram for a converging monotonic relationship between MSE and SGCS in accordance with some embodiments of the present disclosure;
FIG. 4 illustrates a signaling chart illustrating another example communication process in accordance with some example embodiments of the present disclosure;
FIG. 5 illustrates a schematic workflow for deriving an example MSE-SGCS-relationship in accordance with some embodiments of the present disclosure;
FIG. 6 illustrates a schematic diagram of a relationship between MSE and different training data sizes in accordance with some embodiments of the present disclosure;
FIG. 7 illustrates a flowchart of an example method implemented at a terminal device in accordance with some embodiments of the present disclosure;
FIG. 8 illustrates a flowchart of another example method implemented at a network device in accordance with some embodiments of the present disclosure;
FIG. 9 illustrates a simplified block diagram of a device that is suitable for implementing some example embodiments of the present disclosure; and
FIG. 10 illustrates a block diagram of an example of a computer-readable medium in accordance with some example embodiments of the present disclosure.
Throughout the drawings, the same or similar reference numerals represent the same or similar elements.
DETAILED DESCRIPTION
Principles of the present disclosure will now be described with reference to some example embodiments. It is to be understood that these embodiments are described for the purpose of illustration and help those skilled in the art to understand and implement the present disclosure, without suggesting any limitation as to the scope of the disclosure. The disclosure described herein can be implemented in various manners other than the ones described below.
In the following description and claims, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skills in the art to which this disclosure belongs.
References in the present disclosure to “one embodiment, ” “an embodiment, ” “an example embodiment, ” and the like indicate that the embodiment described may include a particular feature, structure, or characteristic, but it is not necessary that every embodiment includes the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
It shall be understood that although the terms “first” and “second” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are used to distinguish one element from another. For example, a first element  could be termed a second element, and similarly, a second element could be termed a first element, without departing from the scope of example embodiments. As used herein, the term “and/or” includes any and all combinations of one or more of the listed terms.
The terminology used herein is for the purpose of describing particular embodiments and is not intended to be limiting of example embodiments. As used herein, the singular forms “a” , “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” , “comprising” , “has” , “having” , “includes” and/or “including” , when used herein, specify the presence of stated features, elements, and/or components etc., but do not preclude the presence or addition of one or more other features, elements, components and/or combinations thereof. As used herein, “at least one of the following: <a list of two or more elements>” and “at least one of <a list of two or more elements>” and similar wording, where the list of two or more elements are joined by “and” or “or” , mean at least any one of the elements, or at least any two or more of the elements, or at least all the elements.
As used in this application, the term “circuitry” may refer to one or more or all of the following:
(a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and
(b) combinations of hardware circuits and software, such as (as applicable) :
(i) a combination of analog and/or digital hardware circuit (s) with software/firmware and
(ii) any portions of hardware processor (s) with software (including digital signal processor (s) ) , software, and memory (ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions; and
(c) hardware circuit (s) and/or processor (s) , such as a microprocessor (s) or a portion of a microprocessor (s) , that requires software (for example, firmware) for operation, but the software may not be present when it is not needed for operation.
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying  software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in server, a cellular network device, or other computing or network device.
As used herein, the term “communication network” refers to a network following any suitable communication standards, such as Long Term Evolution (LTE) , LTE-Advanced (LTE-A) , Wideband Code Division Multiple Access (WCDMA) , High-Speed Packet Access (HSPA) , Narrow Band Internet of Things (NB-IoT) , Wireless Fidelity (WiFi) and so on. Furthermore, the communications between a terminal device and a network device in the communication network may be performed according to any suitable generation communication protocols, including, but not limited to, the fourth generation (4G) , 4.5G, the future fifth generation (5G) , IEEE 802.11 communication protocols, and/or any other protocols either currently known or to be developed in the future. Embodiments of the present disclosure may be applied in various communication systems. Given the rapid development in communications, there will of course also be future type communication technologies and systems with which the present disclosure may be embodied. It should not be seen as limiting the scope of the present disclosure to only the aforementioned system.
As used herein, the term “network device” refers to a node in a communication network via which a terminal device accesses the network and receives services therefrom. The network device may refer to a base station (BS) or an access point (AP) , for example, a node B (NodeB or NB) , an evolved NodeB (eNodeB or eNB) , a NR NB (also referred to as a gNB) , a Remote Radio Unit (RRU) , a radio header (RH) , a remote radio head (RRH) , a WiFi device, a relay, a low power node such as a femto, a pico, and so forth, depending on the applied terminology and technology. In the following description, the terms “network device” , “AP device” , “AP” and “access point” may be used interchangeably.
The term “terminal device” refers to any end device that may be capable of wireless communication. By way of example rather than limitation, a terminal device may also be referred to as a communication device, user equipment (UE) , a Subscriber Station (SS) , a Portable Subscriber Station, a Mobile Station (MS) , a station (STA) or station device, or an Access Terminal (AT) . The terminal device may include, but is not limited to, a mobile phone, a cellular phone, a smart phone, voice over IP (VoIP) phones, wireless local loop phones, a tablet, a wearable terminal device, a personal digital assistant (PDA) , portable computers, a desktop computer, image capture terminal devices such as digital cameras, gaming terminal devices, music storage and playback appliances, vehicle-mounted wireless terminal devices, wireless endpoints, mobile stations, laptop-embedded equipment (LEE) , laptop-mounted equipment (LME) , USB dongles, smart devices, wireless customer-premises equipment (CPE) , an Internet of Things (IoT) device, a watch or other wearable, a VR (virtual reality) device, an XR (eXtended reality) device, a head-mounted display (HMD) , a vehicle, a drone, a medical device and applications (for example, remote surgery) , an industrial device and applications (for example, a robot and/or other wireless devices operating in an industrial and/or an automated processing chain context) , a consumer electronics device, a device operating on commercial and/or industrial wireless networks, and the like. In the following description, the terms “station” , “station device” , “STA” , “terminal device” , “communication device” , “terminal” , “user equipment” and “UE” may be used interchangeably.
Data are communicated between the terminal device and the network device. Channel state information (CSI) is an example of such data. Hereafter, the description may take CSI as an example of the data communicated between the terminal device and the network device to introduce the proposed scheme. To process the data, artificial intelligence/machine learning (AI/ML) models may be trained and deployed at the terminal device and the network device.
Communication performance is of great importance to communications. At the same time, channel state information (CSI) feedback is important for the communication performance between the terminal device and the network device. CSI may be encoded (compressed) at the terminal device, for example, using an AI/ML model (also referred to as an encoder) , and decoded (decompressed) at the network device, for example, using another AI/ML model (also referred to as a decoder) corresponding to the AI/ML model used at the terminal device. In most cases, in a communication network, a network device may serve multiple terminal devices, and the multiple terminal devices may be from different vendors, i.e., the multiple terminal devices may have different encoders to encode the CSI. At the same time, the multiple terminal devices may have different UE capabilities and be in different network conditions, which may influence the training of their encoders. Therefore, CSI feedback enhancement is under study to resolve (or at least alleviate) the inter-vendor training collaboration issues as well as intra-UE network condition differences.
According to the specification in 3GPP TR 38.843 (Rel-18) , for a two-sided model use case like CSI compression, the separate AI/ML model training approach is regarded as Type 3 for model training collaboration. In Type 3 training, the UE-side CSI generation part and the NW-side CSI reconstruction part are trained separately and sequentially. The training process can be initiated by either the UE or the NW (i.e., the gNB) . In Type 3 AI/ML model training collaborations, separate AI/ML model training is performed at the NW side and the UE side: the UE-side CSI generation part is trained at the UE side, and the NW-side CSI reconstruction part is trained at the network side. 3GPP is considering standardizing aspects such as a reference model (for example, a structure and/or parameters) , a reference model structure, a model format, a dataset, a dataset format, etc., to resolve or alleviate issues related to inter-vendor AI/ML model training collaboration. When comparing different standardization options, factors such as complexity, performance and feasibility, among others, need to be considered.
In Type 3 training, as mentioned above, the NW-first separate training process is divided into two phases, i.e., an NW-side model training phase and a UE-side model training phase. At the NW-side model training phase, a similarity metric (like SGCS) may be used to evaluate the performance of the decoder at the network device. The decoder may adopt the transformer (hereafter “TF” for short) structure. At the UE-side model training phase, a distance metric (like MSE) may be used to evaluate the performance of the encoder at the terminal device. The encoder may also adopt the transformer structure. Hereafter, in some embodiments of the present disclosure, it is assumed that the decoder and the encoder, as well as the hypothetical encoder and the MSE-oriented hypothetical encoder, all adopt the transformer structure.
However, it is potentially problematic if the training process of the UE-side encoder is managed by the UE alone, without any explicit mutual alignment between the UE and the NW. The essential reason why the UE cannot guarantee its encoder training performance on its own is the absence of the NW-side decoder in the UE-side training process, which leaves the UE without an overall assessment metric (e.g., SGCS) . Therefore, the UE is incapable of assessing the UE-side encoder’s quality and its degree of matching with the NW-side decoder. On the other hand, since the gNB also lacks the details of the UE-side model, e.g., the number of trainable parameters, it is challenging for the NW to determine the amount of training data. To ensure the E2E SGCS performance of CSI reconstruction, one option for the gNB is to share a sufficiently large dataset for UE-side encoder training, regarded as the best-effort mode, while this large dataset may be larger than a dataset with an appropriate data size, thus causing overhead waste.
Therefore, in view of the above situation, a method to enhance CSI feedback is proposed in some embodiments of the present disclosure. According to this method, at the network side, the network device determines a training dataset and a distance threshold for training an encoder at the terminal device. The encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder. The network device further transmits, to the terminal device, the training dataset and the distance threshold. At the terminal device, the terminal device receives, from the network device, the training dataset and the distance threshold for training the encoder at the terminal device. Then, the terminal device trains the encoder based on the training dataset and the distance threshold. In this way, end-to-end (E2E) SGCS performance for CSI reconstruction can be ensured at the terminal device and the network device, and overhead waste of the training dataset can be avoided or reduced.
FIG. 1A illustrates an example communication system 100 in which some embodiments of the present disclosure can be implemented. The communication system 100, which is a part of a communication network, includes a terminal device (UE) 110-1, a UE 110-2, and a network device 120. The terminal devices 110-1 and 110-2 may be, for example, Internet of Things (IoT) devices. The network device 120 may be, for example, a radio access network (RAN) device (like an NG-RAN device, also referred to as a gNB) , or a communication module thereof. The network device 120 is associated with a cell 121, and provides communication service to terminal devices (like UE 110-1 and 110-2) in the cell 121. The UEs 110-1 and 110-2 may also be collectively referred to as UE 110. As illustrated in FIG. 1A, the terminal devices 110-1 and 110-2 are both in connection with the network device 120.
In the system 100, a link from the network device 120 to terminal device 110 is referred to as a downlink (DL) , while a link from terminal device 110 to the network device 120 is referred to as an uplink (UL) . In the downlink, the network device 120 is a transmitting (TX) device (or a transmitter) and terminal device 110 is a receiving (RX) device (or a receiver) . In the uplink, terminal device 110 is a transmitting (TX) device (or a transmitter) and the network device 120 is an RX device (or a receiver) .
The communications in the communication system 100 may conform to any suitable standards including, but not limited to, Long Term Evolution (LTE) , LTE-Advanced (LTE- A) , Wideband Code Division Multiple Access (WCDMA) , Code Division Multiple Access (CDMA) and Global System for Mobile Communications (GSM) and the like. Furthermore, the communications may be performed according to any generation communication protocols either currently known or to be developed in the future. Examples of the communication protocols include, but not limited to, the first generation (1G) , the second generation (2G) , 2.5G, 2.75G, the third generation (3G) , the fourth generation (4G) , 4.5G, the fifth generation (5G) , 5.5G, 5G-Advanced networks, or the sixth generation (6G) communication protocols.
It is to be understood that the number of devices (including terminal device 110 and the network device 120) and their connection relationships and types shown in FIG. 1A are only for illustrative purposes without suggesting any limitation. The communication system 100 may include any suitable number of devices adapted for implementing embodiments of the present disclosure.
FIG. 1B illustrates a schematic diagram 100B of a relationship between SGCS performance and different training data sizes in accordance with some embodiments of the present disclosure. FIG. 1B shows the SGCS performance of different encoders (here, encoder 1 at UE1, encoder 2 at UE2, and encoder 3 at UE3) of different UEs with different training data sizes. The three encoders (i.e., encoder 1, encoder 2 and encoder 3) are of different model complexities. In this example, if the required SGCS is 0.7245, UE3 may only need a training dataset with a 20K data size for training, while UE1 and UE2 may each need a training dataset with a 40K data size to meet the SGCS requirement. It may be an overhead waste for the gNB to send the same 40K training data to UE3.
FIG. 2 illustrates a signaling chart illustrating an example communication process 200 in accordance with some example embodiments of the present disclosure. For the purpose of discussion, the communication process 200 will be described with reference to FIG. 1A. The communication process 200 may involve a terminal device (for example, an IoT device like the terminal device 110 as illustrated in FIG. 1A) and a network device (for example, a network device 120 as illustrated in FIG. 1A) . In the following, the communication process 200 will be described with reference to the terminal device 110 and network device 120 as illustrated in FIG. 1A. As described above, there may be a decoder (for example, the decoder denoted as “DEC” in FIGS. 3A and 5) in the network device 120 and a corresponding encoder (for example, the encoder denoted as “ENC” in FIG. 3A) at the  terminal device 110.
As illustrated in FIG. 2, the network device 120 determines (210) a training dataset and a distance threshold for training an encoder at the terminal device 110. The encoder is associated with the decoder at the network device 120. The distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with an E2E similarity performance related to the encoder and the decoder. The distance loss may be an absolute direct distance, a mean squared error (MSE) , or a cubic error. The similarity performance may be a squared generalized cosine similarity (SGCS) performance. Hereafter, the description may take the MSE as an example of the distance loss, and take SGCS performance of the decoder as an example of the E2E similarity performance.
In order to determine the relationship between the distance loss and the similarity performance, the network device 120 may train the decoder and a corresponding hypothetical encoder with a first loss function defined as the similarity performance, and generate a training dataset between the hypothetical encoder and the decoder. Then, until the number of training epochs reaches a pre-defined or pre-determined maximum, in each training epoch, the network device 120 may train a distance-oriented encoder based on the generated training dataset with a second loss function defined as a distance loss in the training of the distance-oriented encoder, record an average distance value of the codewords output by the distance-oriented encoder for the training dataset (the average distance value may represent the distance loss) , input the codewords to the decoder, which is frozen, to derive reconstructed data, and record an average similarity value based on the reconstructed data (the average similarity value may represent the similarity performance) . After training for the maximum number of training epochs has finished, the network device 120 may determine the relationship between the distance loss and the similarity performance based on the recorded average distance values and average similarity values. Such training may be conducted using different training models, with different parameters and different data sizes of the training dataset. The relationship between the distance loss and the similarity performance may then be plotted (for example, as a curve) based on the recorded average distance values and average similarity values.
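For illustration only, the following is a minimal PyTorch sketch of this derivation loop. It is not part of the specified framework: the encoder and decoder module classes, the dataset iterable of (X, Y) pairs, and the hyperparameter values are assumptions for the example, and the SGCS is computed in a simplified real-valued form.

```python
# Illustrative sketch only: `encoder` (trainable, distance-oriented) and
# `frozen_decoder` are assumed to be torch.nn.Module instances, and `dataset`
# a re-iterable of (X, Y) tensor pairs produced by the hypothetical encoder.
import torch
import torch.nn.functional as F

def derive_mse_sgcs_points(encoder, frozen_decoder, dataset, max_epochs=200):
    opt = torch.optim.Adam(encoder.parameters(), lr=5e-4)
    frozen_decoder.eval()                     # decoder weights stay frozen
    history = []                              # (avg MSE, avg SGCS) per epoch
    for _ in range(max_epochs):
        mse_vals, sgcs_vals = [], []
        for X, Y in dataset:
            Y_prime = encoder(X)              # codeword from trained encoder
            loss = F.mse_loss(Y_prime, Y)     # second loss: distance loss
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                X_prime = frozen_decoder(Y_prime)        # reconstructed data
                sgcs = (F.cosine_similarity(X, X_prime, dim=-1) ** 2).mean()
                mse_vals.append(loss.item())
                sgcs_vals.append(sgcs.item())
        history.append((sum(mse_vals) / len(mse_vals),     # distance value
                        sum(sgcs_vals) / len(sgcs_vals)))  # similarity value
    return history   # sampled points of the distance-similarity relationship
```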
For example, when the distance loss is defined as MSE and the similarity performance is defined as the SGCS performance, in order to determine the training dataset and the distance threshold, the network device 120 may derive an MSE-SGCS-relationship and the MSE threshold for the terminal device 110 at the network device. This will be  described in more detail with reference to FIG. 5.
The network device 120 may determine the training dataset and the distance threshold either voluntarily (i.e., without a request from the terminal device 110 for the training dataset and the distance threshold) , or in response to a request from the terminal device 110 for the training dataset and the distance threshold (this case will be discussed with reference to FIG. 4) .
In the former case, the network device 120 may determine a similarity performance value (for example, an SGCS performance value) by itself for determining the training dataset and the distance threshold at 210.
In the latter case, before the network device 120 determines the training dataset and the distance threshold, the terminal device 110 may transmit to the network device 120 a request for the training dataset and the distance threshold, and the request may comprise a similarity performance value (for example, an SGCS performance value) , i.e., requesting the network device 120 to provide a training dataset and a distance threshold that can satisfy the similarity performance value.
In both cases, the network device 120 may determine the distance threshold (for example, an MSE threshold) based on a relationship between the distance loss and the similarity performance and based on the similarity performance value. The relationship between the distance loss and the similarity performance may be represented by a table or a formula. Therefore, with the similarity performance value, the network device 120 may look up the table or insert the similarity performance value into the formula to obtain the corresponding distance loss as the distance threshold. As to how to determine the distance threshold based on the relationship and the similarity performance value, more details will be described with reference to FIGS. 3A-3C and 5, with the MSE threshold as an example of the distance threshold and the SGCS performance as an example of the similarity performance.
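As an illustration of the table-based option, the following sketch interpolates a pre-evaluated (similarity, distance) table to obtain the distance threshold; the numerical values are invented for the example and are not taken from the evaluations in this disclosure.

```python
# Illustrative values only; a real table would come from the pre-evaluation
# described above (monotonic: higher SGCS maps to lower MSE).
import numpy as np

sgcs_points = np.array([0.60, 0.65, 0.70, 0.7245, 0.75])     # similarity
mse_points = np.array([0.020, 0.014, 0.009, 0.0067, 0.005])  # distance

def mse_threshold_for(target_sgcs: float) -> float:
    # look up / interpolate the monotonic relationship within the table range
    return float(np.interp(target_sgcs, sgcs_points, mse_points))

print(mse_threshold_for(0.7245))   # -> 0.0067
```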
After the training dataset and the distance threshold are determined by the network device 120, the network device transmits (215) the training dataset and the distance threshold 201 to the terminal device 110.
On the other side of the communication, the terminal device 110 receives (217) the training dataset and the distance threshold 201 from the network device 120. With the training dataset and the distance threshold 201, the terminal device 110 trains (220) the encoder based on the training dataset and the distance threshold 201. Based on the distance threshold determined and assigned by the network device 120, the terminal device 110 can monitor the average distance loss of the training dataset and manage its encoder training progress. More specifically, when the average distance loss of the training dataset is lower than the distance threshold, the terminal device 110 can terminate the training process. Alternatively, when the terminal device 110 realizes that it is not able to train the encoder using the received training dataset to meet the distance threshold within the maximum number of training epochs, it may request from the network device 120 a supplementary training dataset for further training of the encoder.
During the training at 220, for example, the terminal device 110 may determine whether the trained encoder meets the distance threshold after a pre-defined or pre-determined maximum number of training epochs. For example, the maximum number of training epochs may be 10000. In other words, after training with the training dataset for 10000 epochs, the terminal device 110 may determine whether the trained encoder meets the distance threshold (for example, whether the distance loss, denoted as an average MSE value for a training epoch, is lower than the distance threshold, denoted as an MSE threshold) .
In case the trained encoder meets the distance threshold (for example, the distance loss is lower than the distance threshold) , which means the trained encoder is good enough to use, the terminal device 110 may terminate the training of the encoder. In case the trained encoder does not meet the distance threshold (for example, the distance loss is greater than or equal to the distance threshold after the pre-determined maximum number of training epochs) , the terminal device 110 may transmit, to the network device 120, a further request message for a further training dataset.
At the network device 120, on receiving the further request message from the terminal device 110 for the further training dataset, the network device 120 may transmit, to the terminal device 110, the supplementary (further) training dataset as a response to the further request message. At the terminal device 110, on receiving the supplementary training dataset, the terminal device 110 may train the encoder based on the further training dataset and the distance threshold, until the trained encoder meets the distance threshold. Through the example communication process 200, the E2E reconstruction SGCS performance can be ensured between the terminal device 110 and the network device 120.
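The UE-side training management described above can be sketched as follows; `train_one_epoch` and `request_more_data` are hypothetical callables standing in for the actual per-epoch training procedure and the further request message to the network device, so this is an illustrative sketch rather than a definitive implementation.

```python
# Hedged sketch of the UE-side training management; the two callables are
# hypothetical placeholders, not functions defined by this disclosure.
def train_until_threshold(encoder, dataset, distance_threshold,
                          train_one_epoch, request_more_data,
                          max_epochs=10000):
    """train_one_epoch(encoder, dataset) -> average distance loss (e.g. MSE)
    of the epoch; request_more_data() -> supplementary training samples."""
    while True:
        for _ in range(max_epochs):
            avg_loss = train_one_epoch(encoder, dataset)
            if avg_loss < distance_threshold:
                return encoder        # threshold met: terminate the training
        # threshold not met within the maximum number of training epochs:
        # send a further request message for a further training dataset
        dataset = dataset + request_more_data()
```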
Hereinbefore, some examples of the present disclosure have been generally described with reference to the high-level signaling chart of FIG. 2. In the following, some further examples of the present disclosure are described with reference to FIGS. 3A-6. In the following description, the MSE threshold is used as an example of the distance threshold, and the SGCS performance is used as the similarity performance.
FIG. 3A illustrates a schematic diagram of an evaluation of the relationship between the MSE of the encoder output codeword and the SGCS performance of the reconstructed CSI in accordance with some embodiments of the present disclosure. The upper half of FIG. 3A depicts an NW-first separate training scheme in which no relationship between SGCS and MSE is established, and the lower half of FIG. 3A highlights the MSE-SGCS-relationship evaluation. In this sense, some embodiments of the present disclosure may be considered an extension and amendment of such an NW-first separate training framework and are intended to further augment the framework, including by facilitating performance monitoring during UE-side model training.
As shown in FIG. 3A, the NW-first separate training process comprises 3 stages. At the first stage, the NW (for example, the network device 120 as illustrated in FIGS. 1A and 2) trains the decoder with the hypothetical encoder using the E2E SGCS loss function, i.e., Loss1 = 1 -SGCS (X, X′) . At the second stage, the NW sends the training dataset to the UE (for example, the terminal device 110 as illustrated in FIGS. 1A and 2) . At the third stage, the UE trains the actual encoder (i.e., the encoder at the UE side) using the MSE loss function (i.e., Loss2 = MSE (Y, Y′) ) of the encoder’s output codeword.
When training the UE-side actual encoder, based on the trained (frozen) NW-side decoder, the relationship between the average MSE of the encoder’s output codeword (e.g., MSE (Y, Y’) in FIG. 3A) and the average SGCS of the ground truth CSI and the reconstructed CSI (e.g., SGCS (X, X’) in FIG. 3A) over the training dataset in each epoch is evaluated, for example, as in FIGS. 3B and 3C below. The model for calculating the E2E SGCS is derived by cascading the UE-side encoder and the NW-side decoder together.
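For completeness, a small sketch of the E2E SGCS computation on the cascaded model follows; complex-valued CSI vectors are assumed, and `encoder` and `decoder` are hypothetical callables standing in for the cascade, so this is illustrative only.

```python
# Sketch of the E2E SGCS metric on the cascade decoder(encoder(x));
# complex-valued CSI vectors and callable models are assumptions here.
import numpy as np

def sgcs(v, v_hat):
    # squared generalized cosine similarity |v^H v_hat|^2 / (|v|^2 |v_hat|^2)
    num = np.abs(np.vdot(v, v_hat)) ** 2
    den = np.vdot(v, v).real * np.vdot(v_hat, v_hat).real
    return num / den

def e2e_sgcs(encoder, decoder, samples):
    # average SGCS of ground-truth vs reconstructed CSI over the dataset
    return float(np.mean([sgcs(x, decoder(encoder(x))) for x in samples]))
```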
The dataset parameters are given in Table 1. As shown in Table 1, the evaluation is conducted in a simulation scenario of UMa (Urban Macro) , where the distribution of UEs is assumed to be 80% indoor and 20% outdoor, and the UE speeds for indoor UEs and outdoor UEs are assumed to be 3 km/h and 30 km/h, respectively. The carrier frequency used in the simulation is 4 GHz with an SCS of 30 kHz, and the bandwidth is 20 MHz. Seven sites with 3 sectors per site are used in the simulation. The BS antenna height is 25 meters, and the macro-cell inter-site distance is assumed to be 200 meters.
Table 1 Dataset parameters
In the evaluation, the NW-side decoder and its corresponding hypothetical encoder are both transformer-backboned. Since it was observed in the Rel-18 study that a mismatch between an encoder and a decoder that are not transformer-backboned leads to a larger performance degradation in separate training for CSI compression, only transformer-backboned UE-side encoders are evaluated to check the above-mentioned MSE-SGCS relationship. All the cases are evaluated using scalar quantization of 2 bits per element and a CR (compression ratio) of 1/32. The hyperparameters involved in the model training are shown in Table 2.
Table 2 Hyperparameters for model training
More specifically, in the example illustrated in Table 2, the data size of the training data samples for the NW-side decoder is 80K, and the number of trainable parameters of the NW-side decoder is 1.1M. The training samples (training dataset) for the UE-side encoder may have a data size of 20K, 40K or 80K. The UE-side encoder may adopt transformer TF1, TF2, TF3 or TF4, and the numbers of trainable parameters of TF1, TF2, TF3 and TF4 are 0.3M, 0.7M, 1.1M and 4.5M, respectively. The learning rate is set to 5e-4. The maximum number of training epochs is set to 200. The batch size is set to 128, and the loss function is selected to be SGCS.
It is to be noted that the parameters shown in Tables 1 and 2 are only for illustrative purposes, and the model training may also adopt other parameters. For example, the data size of the training data samples for the NW-side decoder may also be 60K or 100K.
During each training epoch of the UE-side encoder, the average MSE values of the training dataset and the corresponding average SGCS values are collected. Four different encoders (i.e., TF1, TF2, TF3 and TF4) and three training data volumes (i.e., 20K, 40K, and 80K) are evaluated. Part of the collected evaluation results is shown in FIGS. 3B and 3C.
FIG. 3B illustrates a schematic diagram 300B for a converging monotonic relationship between MSE and SGCS in accordance with some embodiments of the present disclosure. FIG. 3C illustrates another schematic diagram 300C for a converging monotonic relationship between MSE and SGCS in accordance with some embodiments of the present disclosure.
As can be seen from FIGS. 3B and 3C, there is a converging monotonic relationship between the average MSE, over the training dataset, of the actual encoder’s output codeword and the average SGCS performance of the reconstructed CSI in each training epoch. This convergence is particularly pronounced in cases with high SGCS and can be fitted to a line/curve, as illustrated in FIGS. 3B and 3C.
The monotonic relationship between the MSE values and the corresponding SGCS values is agnostic to the model (TF in this case) complexity and the training dataset volume. In other words, the monotonic relationship between the MSE values and the corresponding SGCS values is irrelevant to the model complexity and the training dataset volume. For example, as shown in FIG. 3B, different models (here, TF1, TF2, TF3 and TF4) exhibit the same monotonic relationship between the MSE values and the corresponding SGCS values, with the training dataset volume being the same (here, 80K) . Therefore, once the NW-side decoder is trained, regardless of the actual encoder’s model complexity and training dataset volume, the MSE-SGCS-relationship always converges to the same relationship. For this reason, the MSE-SGCS-relationship may be obtained using a hypothetical encoder and MSE-oriented hypothetical encoders (described in detail with reference to FIG. 5) to guide UE-side encoder training.
As mentioned above, since this MSE-SGCS-relationship is agnostic (i.e., irrelevant) to the actual encoder’s model complexity and training dataset volume, the NW can pre-evaluate the MSE-SGCS-relationship with any ‘hypothetical UE-side encoder’ using the training dataset. This pre-evaluated MSE-SGCS-relationship can be stored as a numerical table or fitted into a parametric formula at the NW side.
This MSE-SGCS-relationship can be utilized to assess the E2E performance at UE-side and provide training guidance to UE-side training (like determining whether the encoder has been trained to reach a satisfying performance, indicating whether UE needs more training data to improve the performance) .
For the above examples, the observations can be conducted at either the UE side or the NW side, and the phenomenon holds irrespective of UE or NW: there is a converging monotonic relationship between the average MSE of the actual encoder’s output codeword and the average SGCS of the reconstructed CSI over the training dataset in each training epoch, no matter what model (for example, TF1, TF2, TF3, or TF4) or what training dataset volume (for example, 20K, 40K, or 80K) is used. Therefore, the NW may obtain (or determine) the relationship at the NW side and determine a corresponding MSE threshold once an SGCS performance value is decided (for example, indicated by the UE, determined by the NW, or pre-determined or pre-defined) .
FIG. 4 illustrates a signaling chart illustrating an example communication process 400 in accordance with some example embodiments of the present disclosure. The communication process 400 may be used for UE-side training guidance and further on-demand supplementary training dataset transmission in NW-first separate training scheme for CSI compression. For the purpose of discussion, the communication process 400 will be described with reference to FIG. 1A. The communication process 400 may involve UE 410 which may be an example of the terminal device 110 as illustrated in FIG. 1A and gNB  420 which may be an example of the network device 120 as illustrated in FIG. 1A. In the following, the communication process 400 will be described with reference to the UE 410 and gNB 420.
As illustrated in FIG. 4, at 425, gNB 420 maintains a standardized decoder. Meanwhile, gNB 420 also maintains the MSE-SGCS-relationship of training the MSE-oriented encoder.
Then, at 430, UE 410 transmits a request to gNB 420 for the UE-side encoder’s training dataset as well as the MSE threshold for the target SGCS. In other words, in addition to requesting gNB 420 for the transmission of the training dataset, UE 410 also requests gNB 420 for the MSE threshold of its target SGCS.
The operations at 430 may be optional. In other words, gNB 420 may also determine the target SGCS by itself and transmit the corresponding training dataset and MSE threshold to UE 410.
At 435, once gNB 420 has received the target SGCS from UE 410 in the request transmitted at 430, or has determined the target SGCS by itself, gNB 420 checks the reference table or uses the pre-stored formula to obtain the corresponding MSE value as the MSE threshold, and then sends it back to UE 410. Here, the reference table is a table in which the relationship between the MSE value and the SGCS performance value (hereafter, also referred to as the “MSE-SGCS-relationship” for simplicity) is recorded, and the pre-stored formula is a formula representing the relationship between the MSE value and the SGCS performance value. For example, if the SGCS value desired by UE 410 or determined by gNB 420 is 0.7245, gNB 420 may refer to a locally-stored MSE-SGCS-relationship (for example, as shown in FIGS. 3B and 3C) and determine that, when SGCS = 0.7245, 1/log (MSE) = -0.2 (with a natural logarithm) , and thus the corresponding MSE is about 0.0067. In this way, the MSE threshold corresponding to an SGCS value of 0.7245 is determined to be 0.0067.
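The worked numbers can be checked directly; the snippet below assumes the logarithm in the relationship is the natural logarithm, which is consistent with exp (-5) being approximately 0.0067.

```python
# Check of the example above (natural logarithm assumed).
import math

inv_log_mse = -0.2                            # value read off at SGCS = 0.7245
mse_threshold = math.exp(1.0 / inv_log_mse)   # 1/ln(MSE) = -0.2 -> MSE = e^-5
print(round(mse_threshold, 4))                # 0.0067
```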
At 440, UE 410 uses the training dataset to train the actual encoder at UE 410. The received MSE threshold works as an indicator to guide the training of UE 410. The training can be terminated once the MSE loss is lower than the received MSE threshold.
At 445, after a pre-determined or pre-defined maximum number of training epochs, UE 410 determines whether the trained encoder meets the MSE threshold. If the encoder does not meet the MSE threshold, the operations at 450, 455 and 460 are performed.
At 450, UE 410 transmits a request to gNB 420 for a supplementary training dataset. On the other side of the communication, gNB 420 receives the request for the supplementary training dataset.
At 455, in response to the received request, gNB 420 transmits the requested supplementary training dataset to UE 410. On the other side of communication, UE 410 receives the supplementary training dataset from gNB 420. With the received supplementary training dataset at 455, UE 410 further trains its encoder.
At 460, the operations at 440, 445, 450 and 455 are repeated until the training loss (here, MSE) of the encoder at UE 410 is lower than the MSE threshold. For example, it is assumed that the SGCS desired by UE 410 is 0.7245 and the corresponding MSE threshold is 0.0067, as mentioned above with reference to FIGS. 3B and 3C. If the relationship between the training dataset volume and the MSE is as shown in FIG. 6, and UE 410 adopts TF2 as the encoder and trains it with a 20K training dataset, then UE 410 is only capable of reaching an MSE of 0.0087, while the desired MSE is 0.0067 or lower. This implies that UE 410 needs more data for training. UE 410 may therefore request, at 450, a 20K supplementary training dataset from gNB 420 for further training of its encoder, and receive the requested 20K supplementary training dataset. Then, when UE 410 further trains the encoder with the requested 20K supplementary training dataset, its MSE will be reduced to 0.0067, which meets the MSE requirement. On the other hand, with the same assumption, if UE 410 adopts TF4 as the encoder and trains it with a 20K training dataset, it is capable of reaching the MSE requirement of 0.0067. Therefore, in this case, there is no need for a supplementary training dataset, which saves much transmission overhead for UE 410 and gNB 420 compared to a best-effort scheme.
FIG. 5 illustrates a schematic workflow 500 for deriving an example MSE-SGCS-relationship in accordance with some embodiments of the present disclosure. In workflow 500, the NW (for example, the network device 120 as illustrated in FIGS. 1A and 2, or gNB 420 as illustrated in FIG. 4) derives the MSE-SGCS-relationship with hypothetical UE-side encoders (it is to be noted that these “hypothetical UE-side encoders” are actually present at the NW and are “hypothetical” only in the sense that they stand in for UE-side encoders when deriving the MSE-SGCS-relationship) , which are trained using the MSE loss function and can be regarded as MSE-oriented encoders.
As shown in FIG. 5, at STEP-1, the NW trains the decoder (denoted as “DEC” ) and the corresponding hypothetical encoder (denoted as “Hypo ENC” ) for CSI compression with the E2E loss function, e.g., Loss1 = 1 -SGCS (X, X′) , where X is the original CSI (i.e., the input to the Hypo ENC) , Y is the codeword between the hypothetical encoder and the decoder, and X′ is the reconstructed CSI (the output of DEC) derived from its input Y, which is also the output of the hypothetical encoder. In other words, after the reconstructed CSI, i.e., X′, is obtained, the E2E loss function, e.g., Loss1 = 1 -SGCS (X, X′) , can be evaluated to assess the E2E similarity performance between the Hypo ENC (which hypothetically represents the encoder at the UE) and the DEC (which represents the decoder at the NW/gNB) . At STEP-2, the NW generates the training dataset { (X, Y) } , where Y is the codeword between the hypothetical encoder and the decoder, as mentioned above.
At STEP-3, the NW trains MSE-oriented hypothetical encoders based on the generated dataset { (X, Y) } with MSE (Y, Y′) as the loss function, i.e., Loss2 = MSE (Y, Y′) , where Y is the codeword generated by the hypothetical encoder, as mentioned above, and Y′ is the output of the MSE-oriented hypothetical encoder. Note that the “MSE-oriented hypothetical encoders” here are different from the “hypothetical encoder” in STEP-1 in that the “hypothetical encoder” is used for evaluating the E2E similarity loss, i.e., SGCS, while the “MSE-oriented hypothetical encoders” are used for evaluating the distance loss, i.e., here the MSE loss. To obtain the MSE loss, several different MSE-oriented hypothetical encoders may be used, and these MSE-oriented hypothetical encoders may be trained with training datasets of different data sizes. For example, TF1, TF2, TF3 and TF4 as illustrated in FIGS. 3B and 3C may be examples of the MSE-oriented hypothetical encoders, and 80K, 40K and 20K may be examples of the different data sizes of the training datasets. The NW records the average MSE values of the codewords in the training dataset in each training epoch.
At STEP-4, during the same training process, by freezing the decoder, the output codewords of the MSE-oriented encoder are input to the frozen decoder to derive the reconstructed CSI. The NW records the average SGCS values of the CSI in the training dataset in each epoch.
At STEP-5, with the average MSE values and average SGCS values from each training epoch of different MSE-oriented hypothetical encoders trained with training datasets of same or different sizes (also refer to FIGS. 3B and 3C) , NW attains the MSE-SGCS-relationship with which NW can assign the MSE threshold to UE, for example, in response to UE’s required SGCS. This relationship can be stored as a numerical table or fit into a parametric formula. This MSE-SGCS-relationship can be utilized to assess the E2E performance at UE-side and provide training guidance to UE-side training (like determining whether the encoder has been trained to reach a satisfying performance, indicating whether  UE needs more training data to boost the performance) .
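As an illustration of fitting the recorded pairs into a parametric formula, the sketch below assumes a linear model of SGCS against 1/ln (MSE) , matching the axes used in FIGS. 3B and 3C; the model choice is an assumption for the example, not mandated by the disclosure.

```python
# `history` holds the (avg MSE, avg SGCS) pairs recorded in STEP-3 and STEP-4.
import numpy as np

def fit_mse_sgcs(history):
    mse = np.array([m for m, _ in history])
    sgcs = np.array([s for _, s in history])
    a, b = np.polyfit(1.0 / np.log(mse), sgcs, deg=1)  # SGCS ~ a/ln(MSE) + b
    return a, b

def mse_threshold(a, b, target_sgcs):
    # invert SGCS = a/ln(MSE) + b  ->  MSE = exp(a / (SGCS - b))
    return float(np.exp(a / (target_sgcs - b)))
```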
In this way, based on the MSE threshold determined and assigned from NW, UE can monitor the average MSE of the training dataset and manage its encoder training progress. More specifically, when the average MSE of the training dataset is lower than the MSE threshold at UE side, UE can terminate the training process. Alternatively, when UE realizes it is not able to train the encoder to meet the MSE threshold within the maximum training epochs, it would request NW for a supplementary training dataset for further training.
FIG. 6 illustrates a schematic diagram of a relationship between MSE and different training data sizes in accordance with some embodiments of the present disclosure. FIG. 6 shows the MSE of different encoders (here, TF1, TF2, TF3, and TF4) of different UEs, with different training data sizes. The four encoders (i.e., TF1, TF2, TF3, and TF4) are of different model complexities. In this example, if the required SGCS is 0.7245 and the corresponding MSE threshold is 0.0067, the UE having the TF4 encoder may only need 20K data for training, while the UE having the TF2 encoder and the UE having the TF3 encoder may need 40K training data to meet the SGCS requirement. Therefore, it may be an overhead waste for the gNB to send the same 40K training data to the UE having the TF4 encoder.
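The overhead-saving logic of this example can be sketched as a minimal data-size selection; the reachable-MSE numbers below are loosely based on the FIG. 6 discussion and are illustrative only.

```python
# Illustrative reachable average MSE per (encoder, training data size);
# the values are examples loosely based on the FIG. 6 discussion.
reachable_mse = {
    "TF2": {20_000: 0.0087, 40_000: 0.0067},
    "TF4": {20_000: 0.0067},
}

def min_data_size(model: str, threshold: float):
    for n in sorted(reachable_mse[model]):
        if reachable_mse[model][n] <= threshold:
            return n                 # smallest size meeting the threshold
    return None                      # more data than evaluated would be needed

print(min_data_size("TF2", 0.0067))  # -> 40000
print(min_data_size("TF4", 0.0067))  # -> 20000
```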
FIG. 7 illustrates a flowchart of an example method 700 implemented at a terminal device (for example, the terminal device 110 as illustrated in FIGS. 1A and 2) in accordance with some other embodiments of the present disclosure. For the purpose of discussion, the method 700 will be described from the perspective of the terminal device 110 with reference to FIGS. 1A and 2.
As illustrated in FIG. 7, at block 710, the terminal device 110 receives, from a network device (for example, the network device 120 as illustrated in FIGS. 1A and 2) , a training dataset and a distance threshold (for example, the training dataset and distance threshold 201 as illustrated in FIG. 2) for training an encoder at the terminal device 110. The encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder. At block 720, the terminal device 110 trains the encoder based on the training dataset and the distance threshold.
In some example embodiments, in order to receive the training dataset and the distance threshold, the terminal device 110 may further transmit, to the network device, a  request for the training dataset and the distance threshold, and then receive the training dataset and the distance threshold from the network device as a response to the request.
In some example embodiments, the request may comprise a similarity performance value of the similarity performance.
In some example embodiments, in order to train the encoder based on the training dataset and the distance threshold, the terminal device 110 may determine whether the trained encoder meets the distance threshold.
In some example embodiments, if the terminal device 110 determines that the trained encoder meets the distance threshold, the terminal device 110 may terminate the training of the encoder.
In some example embodiments, if the terminal device 110 determines that the trained encoder does not meet the distance threshold after the pre-determined maximum number of training epochs, the terminal device 110 may further transmit, to the network device, a further request message for a further training dataset, and receive, from the network device, a further response message comprising the further training dataset; the terminal device 110 may then further train the encoder based on the further training dataset and the distance threshold.
In some example embodiments, in order to determine whether the trained encoder meets the distance threshold, the terminal device 110 may determine whether the distance loss is lower than the distance threshold.
In some example embodiments, the distance loss may comprise one of an absolute direct distance, a mean squared error (MSE) , or a cubic error, and/or the similarity performance may comprise a squared generalized cosine similarity (SGCS) performance.
FIG. 8 illustrates a flowchart of an example method 800 implemented at a network device (for example, the network device 120 as illustrated in FIGS. 1A and 2) in accordance with some other embodiments of the present disclosure. For the purpose of discussion, the method 800 will be described from the perspective of the network device 120 with reference to FIGS. 1A and 2.
As illustrated in FIG. 8, at block 810, the network device 120 determines a training dataset and a distance threshold for training an encoder at the terminal device. The encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity  performance related to the encoder and the decoder. At block 820, the network device 120 transmits, to the terminal device, the training dataset and the distance threshold (for example, the training dataset and the distance threshold 201 as illustrated in FIG. 2) .
In some example embodiments, prior to transmitting the training dataset and the distance threshold, the network device 120 may receive, from the terminal device, a request for the training dataset and the distance threshold, and then transmit the training dataset and the distance threshold to the terminal device as a response to the request.
In some example embodiments, the request may comprise a similarity performance value of the similarity performance.
In some example embodiments, prior to determining the training dataset and the distance threshold, the network device 120 may further determine a similarity performance value.
In some example embodiments, the network device 120 may further receive, from the terminal device, a further request message for a further training dataset, and transmit, to the terminal device, the further training dataset as a response to the further request message.
In some example embodiments, the network device 120 may determine the distance threshold based on both a relationship between the distance loss and the similarity performance, and the similarity performance value.
In some example embodiments, the relationship between the distance loss and the similarity performance may be represented by a table or a formula.
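For the table form, selecting the threshold could be as simple as the lookup sketched below: given the target similarity performance value, pick the most lenient recorded distance that still meets it. The table entries here are invented placeholders; real entries would come from the recording procedure described next.

# (average distance loss, average SGCS) pairs; placeholder values only.
relationship_table = [(0.10, 0.80), (0.05, 0.90), (0.02, 0.95)]

def distance_threshold_for(target_sgcs, table):
    # Largest recorded distance whose similarity still meets the target.
    feasible = [distance for distance, similarity in table
                if similarity >= target_sgcs]
    return max(feasible) if feasible else None

# Example: a target SGCS of 0.90 maps to a distance threshold of 0.05.
assert distance_threshold_for(0.90, relationship_table) == 0.05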
In some example embodiments, in order to determine the relationship between the distance loss and the similarity performance, the network device 120 may train the decoder and a corresponding hypothetical encoder with a first loss function defined as the similarity performance, and generate a training dataset between the hypothetical encoder and the decoder. Then, until the number of training epochs reaches a pre-determined maximum, in each training epoch the network device 120 may train a distance-oriented encoder based on the generated training dataset with a second loss function defined as a distance loss in the training of the distance-oriented encoder, record an average distance value of the codewords output by the distance-oriented encoder for the training dataset (the average distance value may represent the distance loss), input the codewords to the decoder, which is frozen, to derive reconstructed data, and record an average similarity value based on the reconstructed data (the average similarity value may represent the similarity performance). Finally, the network device 120 may determine the relationship between the distance loss and the similarity performance based on the recorded average distance values and average similarity values.
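A condensed sketch of that two-stage procedure is shown below, again in Python/PyTorch with illustrative names. The disclosure leaves the model architectures and exact loss definitions open, so MSE stands in for the second (distance) loss and SGCS for the similarity metric:

import torch

def sgcs_torch(v, v_hat):
    # Squared generalized cosine similarity, averaged over samples.
    num = torch.abs(torch.sum(torch.conj(v) * v_hat, dim=-1)) ** 2
    den = (torch.sum(torch.abs(v) ** 2, dim=-1)
           * torch.sum(torch.abs(v_hat) ** 2, dim=-1))
    return torch.mean(num / den)

def record_relationship(hypothetical_encoder, decoder, distance_encoder,
                        inputs, max_epochs, lr=1e-3):
    # Stage 1 is assumed done: hypothetical_encoder and decoder were trained
    # with the first (similarity) loss. Generate the (input, codeword) pairs.
    with torch.no_grad():
        target_codewords = hypothetical_encoder(inputs)

    # The decoder stays frozen while the distance-oriented encoder trains.
    for p in decoder.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.Adam(distance_encoder.parameters(), lr=lr)
    records = []
    for _ in range(max_epochs):
        optimizer.zero_grad()
        codewords = distance_encoder(inputs)
        distance_loss = torch.mean(
            torch.abs(codewords - target_codewords) ** 2)  # second loss
        distance_loss.backward()
        optimizer.step()
        with torch.no_grad():
            reconstructed = decoder(distance_encoder(inputs))
            similarity = sgcs_torch(inputs, reconstructed)
        # One (average distance, average similarity) point per epoch.
        records.append((distance_loss.item(), similarity.item()))
    return records  # tabulate, or fit a formula, to obtain the relationship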
In some example embodiments, the distance loss may comprise one of an absolute direct distance, a mean squared error (MSE), or a cubic error, and/or the similarity performance may comprise a squared generalized cosine similarity (SGCS) performance.
In some embodiments, an apparatus capable of performing the method 700 may comprise means for performing the respective steps of the method 700. The means may be implemented in any suitable form. For example, the means may be implemented in a circuitry or software module.
In some example embodiments, the apparatus comprises: means for receiving a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and means for training the encoder based on the training dataset and the distance threshold.
In some example embodiments, the means for receiving the training dataset and the distance threshold may comprise means for transmitting, to the network device, a request for the training dataset and the distance threshold, and means for receiving the training dataset and the distance threshold from the network device as a response to the request.
In some example embodiments, the request may comprise a similarity performance value of the similarity performance.
In some example embodiments, the means for training the encoder based on the training dataset and the distance threshold may comprise means for determining whether the trained encoder meets the distance threshold.
In some example embodiments, if the means for determining determines that the trained encoder meets the distance threshold, the apparatus may further comprise means for terminating the training of the encoder.
In some example embodiments, if the means for determining determines that the trained encoder does not meet the distance threshold after a pre-determined maximum number of training epochs, the apparatus may further comprise means for transmitting, to the network device, a further request message for a further training dataset, and means for receiving, from the network device, a further response message comprising the further training dataset. The apparatus may further comprise means for training the encoder based on the further training dataset and the distance threshold.
In some example embodiments, the means for determining whether the trained encoder meets the distance threshold may comprise means for determining whether the distance loss is lower than the distance threshold.
In some example embodiments, the distance loss may comprise one of an absolute direct distance, a mean squared error (MSE), or a cubic error, and/or the similarity performance may comprise a squared generalized cosine similarity (SGCS) performance.
In some embodiments, the apparatus further comprises means for performing other steps in some embodiments of the method 700. In some embodiments, the means comprises at least one processor and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.
In some embodiments, an apparatus capable of performing the method 800 may comprise means for performing the respective steps of the method 800. The means may be implemented in any suitable form. For example, the means may be implemented in a circuitry or software module.
In some example embodiments, the apparatus comprises: means for determining a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and means for transmitting the training dataset and the distance threshold.
In some example embodiments, prior to transmitting the training dataset and the distance threshold, the apparatus may comprise means for receiving, from the terminal device, a request for the training dataset and the distance threshold, and means for transmitting the training dataset and the distance threshold to the terminal device as a response to the request.
In some example embodiments, the request may comprise a similarity performance value of the similarity performance.
In some example embodiments, prior to determining the training dataset and the distance threshold, the apparatus may comprise means for determining a similarity performance value.
In some example embodiments, the apparatus may further comprise means for receiving, from the terminal device, a further request message for a further training dataset, and means for transmitting, to the terminal device, the further training dataset as a response to the further request message.
In some example embodiments, the apparatus may comprise means for determining the distance threshold based on both a relationship between the distance loss and the similarity performance, and the similarity performance value.
In some example embodiments, the relationship between the distance loss and the similarity performance may be represented by a table or a formula.
In some example embodiments, the means for determining the relationship between the distance loss and the similarity performance may comprise means for training the decoder and a corresponding hypothetical encoder with a first loss function defined as the similarity performance, and means for generating a training dataset between the hypothetical encoder and the decoder. Then, until the number of training epochs reaches a pre-determined maximum, the means for determining the relationship may further comprise means for training a distance-oriented encoder based on the generated training dataset with a second loss function defined as a distance loss in the training of the distance-oriented encoder, means for recording an average distance value of the codewords output by the distance-oriented encoder for the training dataset (the average distance value may represent the distance loss), means for inputting the codewords to the decoder, which is frozen, to derive reconstructed data, and means for recording an average similarity value based on the reconstructed data (the average similarity value may represent the similarity performance). The apparatus may further comprise means for determining the relationship between the distance loss and the similarity performance based on the recorded average distance values and average similarity values.
In some example embodiments, the distance loss may comprise one of an absolute direct distance, a mean squared error (MSE), or a cubic error, and/or the similarity performance may comprise a squared generalized cosine similarity (SGCS) performance.
In some embodiments, the apparatus further comprises means for performing other steps in some embodiments of the method 800. In some embodiments, the means comprises at least one processor and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.
FIG. 9 illustrates a simplified block diagram of a device 900 that is suitable for implementing some example embodiments of the present disclosure. The device 900 may be provided to implement a communication device, for example, the terminal device 110 and the network device 120 as shown in FIGS. 1A and 2. As shown, the device 900 includes one or more processors 910, one or more memories 920 coupled to the processor 910, and one or more communication modules 940 coupled to the processor 910.
The communication module 940 is for bidirectional communications. The communication module 940 has at least one antenna to facilitate communication. The communication module 940 may include a communication interface, which may represent any interface that is necessary for communication with other network elements.
The processor 910 may be of any type suitable for the local network and may include one or more of the following: general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multicore processor architecture, as non-limiting examples. The device 900 may have multiple processors, such as an application specific integrated circuit chip that is slaved in time to a clock which synchronizes the main processor.
The memory 920 may include one or more non-volatile memories and one or more volatile memories. Examples of the non-volatile memories include, but are not limited to, a Read Only Memory (ROM) 924, an electrically programmable read only memory (EPROM), a flash memory, a hard disk, a compact disc (CD), a digital video disk (DVD), and other magnetic storage and/or optical storage. Examples of the volatile memories include, but are not limited to, a random access memory (RAM) 922 and other volatile memories whose contents do not persist during power-down.
A computer program 930 includes computer executable instructions that are executed by the associated processor 910. The program 930 may be stored in the ROM 924. The processor 910 may perform any suitable actions and processing by loading the program 930 into the RAM 922.
The embodiments of the present disclosure may be implemented by means of the program 930 so that the device 900 may perform any process of the disclosure as discussed with reference to FIGS. 2, 4 and 7-8. The embodiments of the present disclosure may also be implemented by hardware or by a combination of software and hardware.
In some example embodiments, the program 930 may be tangibly contained in a computer-readable medium which may be included in the device 900 (such as in the memory 920) or other storage devices that are accessible by the device 900. The device 900 may load the program 930 from the computer-readable medium to the RAM 922 for execution. The computer-readable medium may include any type of tangible non-volatile storage, such as a ROM, an EPROM, a flash memory, a hard disk, a CD, a DVD, and the like.
FIG. 10 illustrates a block diagram of an example of a computer-readable medium 1000 in accordance with some example embodiments of the present disclosure. The computer-readable medium 1000 has the program 930 stored thereon. It is noted that although the computer-readable medium 1000 is depicted in the form of a CD or DVD in FIG. 10, the computer-readable medium 1000 may be in any other form suitable for carrying or holding the program 930.
Generally, various embodiments of the present disclosure may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While various aspects of embodiments of the present disclosure are illustrated and described as block diagrams, flowcharts, or using some other pictorial representations, it is to be understood that the block, apparatus, system, technique or method described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The present disclosure also provides at least one computer program product tangibly stored on a non-transitory computer-readable storage medium. The computer program product includes computer-executable instructions, such as those included in program modules, being executed in a device on a target real or virtual processor, to carry out any of the methods 200, 400, 700 and 800 as described above with reference to FIGS. 2, 4 and 7-8. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, or the like that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Machine-executable instructions for program modules may be executed within a local or distributed device. In a distributed device, program modules may be located in both local and remote storage media.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present disclosure, the computer program codes or related data may be carried by any suitable carrier to enable the device, apparatus or processor to perform various processes and operations as described above. Examples of the carrier include a signal, computer-readable medium, and the like.
The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the computer-readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. The term “non-transitory,” as used herein, is a limitation of the medium itself (i.e., tangible, not a signal) as opposed to a limitation on data storage persistency (e.g., RAM vs. ROM).
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results.  In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are contained in the above discussions, these should not be construed as limitations on the scope of the present disclosure, but rather as descriptions of features that may be specific to particular embodiments. Certain features that are described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the present disclosure has been described in language specific to structural features and/or methodological acts, it is to be understood that the present disclosure defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (22)

  1. A terminal device comprising:
    at least one processor; and
    at least one memory storing instructions that, when executed by the at least one processor, cause the terminal device to:
    receive, from a network device, a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and
    train the encoder based on the training dataset and the distance threshold.
  2. The terminal device of claim 1, wherein the terminal device is caused to receive the training dataset and the distance threshold by:
    transmitting, to the network device, a request for the training dataset and the distance threshold; and
    receiving, from the network device, the training dataset and the distance threshold as a response to the request.
  3. The terminal device of claim 2, wherein the request comprises a similarity performance value of the similarity performance.
  4. The terminal device of any of claims 1 to 3, wherein the terminal device is caused to train the encoder based on the training dataset and the distance threshold by:
    determining whether the trained encoder meets the distance threshold.
  5. The terminal device of claim 4, wherein the terminal device is further caused to:
    based on determining that the trained encoder meets the distance threshold, terminate the training of the encoder.
  6. The terminal device of claim 4, wherein the terminal device is further caused to:
    based on determining that the trained encoder does not meet the distance threshold after a pre-determined maximum number of training epochs,
    transmit, to the network device, a further request message for a further training dataset;
    receive, from the network device, a further response message comprising the further training dataset; and
    train the encoder based on the further training dataset and the distance threshold.
  7. The terminal device of any of claims 4 to 6, wherein the terminal device is caused to determine whether the trained encoder meets the distance threshold by:
    determining whether the distance loss is lower than the distance threshold.
  8. The terminal device of any of claims 1 to 7, wherein at least one of the following:
    the distance loss comprises one of an absolute direct distance, a mean squared error (MSE), or a cubic error; or
    the similarity performance comprises a squared generalized cosine similarity (SGCS) performance.
  9. A network device comprising:
    at least one processor; and
    at least one memory storing instructions that, when executed by the at least one processor, cause the network device to:
    determine a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and
    transmit, to the terminal device, the training dataset and the distance threshold.
  10. The network device of claim 9, wherein the network device is caused to transmit the training dataset and the distance threshold by:
    receiving, from the terminal device, a request for the training dataset and the distance threshold; and
    transmitting, to the terminal device, the training dataset and the distance threshold as a response to the request.
  11. The network device of claim 10, wherein the request comprises a similarity performance value of the similarity performance.
  12. The network device of claim 9, wherein prior to determining the training dataset and the distance threshold, the network device is further caused to:
    determine a similarity performance value.
  13. The network device of any of claims 9 to 12, wherein the network device is further caused to:
    receive, from the terminal device, a further request message for a further training dataset; and
    transmit, to the terminal device, the further training dataset as a response to the further request message.
  14. The network device of any of claims 11 to 13, wherein the network device is caused to determine the distance threshold based on:
    a relationship between the distance loss and the similarity performance; and
    the similarity performance value.
  15. The network device of claim 14, wherein the relationship between the distance loss and the similarity performance is represented by a table or a formula.
  16. The network device of claim 14 or 15, wherein the network device is caused to determine the relationship between the distance loss and the similarity performance by:
    training, at the network device, the decoder and a corresponding hypothetical encoder with a first loss function defined as the similarity performance;
    generating a training dataset between the hypothetical encoder and the decoder;
    until a number of training epochs reaches a pre-determined maximum number of training epochs, in a training epoch at the network device,
    training a distance-oriented encoder based on the generated training dataset with a second loss function defined as a distance loss in the training of the distance-oriented encoder,
    recording an average distance value of codewords which are output of the distance-oriented encoder for the training dataset, the average distance value representing the distance loss,
    inputting the codewords to the decoder which is frozen to derive reconstructed data, and
    recording an average similarity value based on the reconstructed data, the average similarity value representing the similarity performance; and
    determining the relationship between the distance loss and the similarity performance based on the recorded average distance values and the average similarity values.
  17. The network device of any of claims 9 to 16, wherein at least one of the following:
    the distance loss comprises one of an absolute direct distance, a mean squared error (MSE), or a cubic error, or
    the similarity performance comprises a squared generalized cosine similarity (SGCS) performance.
  18. A method comprising:
    receiving, at a terminal device and from a network device, a training dataset and a distance threshold for training an encoder at the terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and
    training the encoder based on the training dataset and the distance threshold.
  19. A method comprising:
    determining, by a network device, a training dataset and a distance threshold for training an encoder at a terminal device, wherein the encoder is associated with a decoder at the network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and
    transmitting, at the network device and to the terminal device, the training dataset and the distance threshold.
  20. An apparatus comprising:
    means for receiving a training dataset and a distance threshold for training an encoder at a terminal device, wherein the encoder is associated with a decoder at a network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and
    means for training the encoder based on the training dataset and the distance threshold.
  21. An apparatus comprising:
    means for determining a training dataset and a distance threshold for training an encoder at a terminal device, wherein the encoder is associated with a decoder at a network device, the distance threshold is used to assess a distance loss in the training of the encoder, and the distance loss is associated with a similarity performance related to the encoder and the decoder; and
    means for transmitting the training dataset and the distance threshold.
  22. A non-transitory computer-readable medium comprising program instructions stored thereon, the instructions, when executed on at least one processor, causing the at least one processor to perform the method of claim 18 or 19.
PCT/CN2024/085587 2024-04-02 2024-04-02 Training framework Pending WO2025208347A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2024/085587 WO2025208347A1 (en) 2024-04-02 2024-04-02 Training framework

Publications (1)

Publication Number Publication Date
WO2025208347A1 true WO2025208347A1 (en) 2025-10-09

Family

ID=97265906

Country Status (1)

Country Link
WO (1) WO2025208347A1 (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114747250A (en) * 2019-11-29 2022-07-12 上海诺基亚贝尔股份有限公司 Feedback of channel state information
EP3869223A1 (en) * 2020-02-22 2021-08-25 Origin Wireless, Inc. System and method for wireless material sensing based on multipath channel information
CN114978413A (en) * 2021-02-24 2022-08-30 华为技术有限公司 Information coding control method and related device
CN113872652A (en) * 2021-06-25 2021-12-31 南京邮电大学 A CSI feedback method based on 3D MIMO time-varying system
WO2023211346A1 (en) * 2022-04-29 2023-11-02 Telefonaktiebolaget Lm Ericsson (Publ) A node and methods for training a neural network encoder for machine learning-based csi
US20230412239A1 (en) * 2022-06-21 2023-12-21 Samsung Electronics Co., Ltd. Method performed by a base station, base station and computer-readable storage medium
US20240097764A1 (en) * 2022-09-21 2024-03-21 Samsung Electronics Co., Ltd. Csi feedback in cellular systems
CN115987339A (en) * 2022-11-15 2023-04-18 北京邮电大学 A Deep Learning-based Encoder-Decoder Decoupling CSI Feedback Method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 24933306
Country of ref document: EP
Kind code of ref document: A1