
WO2024255785A1 - Communication method and communication device - Google Patents

Communication method and communication device (Procédé de communication et appareil de communication)

Info

Publication number
WO2024255785A1
Authority
WO
WIPO (PCT)
Prior art keywords
information, training, data, requirements
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/098890
Other languages
English (en)
Chinese (zh)
Inventor
廉晋
田畅
蔡世杰
刘鹍鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Publication of WO2024255785A1

Classifications

    • H04B7/0619: Diversity systems using two or more spaced independent antennas at the transmitting station, with simultaneous transmission of weighted versions of the same signal using feedback from the receiving side
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06N20/00: Machine learning
    • G06N3/00: Computing arrangements based on biological models
    • H04B7/06: Diversity systems using two or more spaced independent antennas at the transmitting station
    • H04L41/16: Arrangements for maintenance, administration or management of data switching networks using machine learning or artificial intelligence
    • H04W24/06: Testing, supervising or monitoring using simulated traffic

Definitions

  • the present application relates to the field of communication technology, and in particular to a communication method and a communication device.
  • in the field of artificial intelligence (AI), centralized AI model training methods and distributed AI model training methods can be used.
  • centralized AI model training refers to the use of a large amount of information/data collected from multiple sub-nodes and stored in the central node to conduct centralized model training, so that the trained model can achieve good classification/inference effects.
  • the centralized AI model training method also faces many problems: the scale of AI models tends to grow exponentially over time, the training complexity is too high, and the model training time and training convergence are difficult to meet the requirements; considering the limitations of user data privacy protection, centralized AI model training also faces problems such as difficulty in data acquisition and high data interaction overhead.
  • distributed AI model training does not require collecting and aggregating data from multiple sub-nodes to a central node for centralized training. Instead, each sub-node first trains locally based on its local data, and then the intermediate results (such as model parameters or gradients) obtained during training are transferred between the sub-nodes and the central node. Finally, the intermediate results are fused at the central node to complete the training and update of the entire model.
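For illustration only, the following minimal sketch (not taken from this application; the linear model, node count, and federated-averaging fusion rule are assumptions) shows the pattern described above: sub-nodes train locally and a central node fuses the intermediate results.

```python
import numpy as np

# Minimal sketch of distributed (federated-averaging style) training: each
# sub-node trains locally on its own data and only the intermediate results
# (here, updated model parameters) are sent to the central node for fusion.
rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1, epochs=5):
    """One sub-node: a few gradient steps on local data for a linear model."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # MSE gradient
        w = w - lr * grad
    return w

# Three hypothetical sub-nodes with local datasets of different sizes.
true_w = np.array([1.5, -0.7])
datasets = []
for n in (40, 60, 100):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.05 * rng.normal(size=n)
    datasets.append((X, y))

w_global = np.zeros(2)
for round_ in range(10):                          # central training rounds
    local_ws, weights = [], []
    for X, y in datasets:                         # each sub-node trains locally
        local_ws.append(local_update(w_global.copy(), X, y))
        weights.append(len(y))
    # Central node fuses the intermediate results (sample-size weighted average).
    w_global = np.average(local_ws, axis=0, weights=weights)

print("fused parameters:", w_global)              # approaches true_w
```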
  • the present application provides a communication method and a communication device, which can better plan and schedule sub-nodes for distributed training, thereby helping to improve the efficiency of distributed training.
  • the present application provides a communication method, which is performed by a first device.
  • the first device may be a central node of a training network (such as a base station, a terminal, etc.), or a component of a central node (such as a processor, a chip, or a chip system, etc.), or a logic module that can implement all or part of the functions of a central node.
  • the first device sends multiple first messages, and the multiple first messages are used to request multiple second devices to perform distributed training.
  • the first device receives multiple second messages, and any second message of the multiple second messages includes one or more of the following information: training status information of the second device corresponding to the second message, data status information of the second device corresponding to the second message, and first indication information indicating whether the second device corresponding to the second message meets the training requirements.
  • the first device schedules the second device participating in the distributed training according to the multiple second messages.
  • through the information fed back by the second devices (such as the sub-nodes in the training network), the first device (such as the central node in the training network) can better plan and schedule the sub-nodes for distributed training, which helps improve the efficiency of distributed training.
  • the first message includes second indication information
  • the second indication information is used to indicate that the second device corresponding to the request should perform distributed training.
  • the first device can specifically instruct the corresponding (specified) second device to perform distributed training through the second indication information, so as to obtain the training status information and/or data status information of the corresponding second device, which helps the first device better plan and schedule sub-nodes for distributed training.
  • the first message further includes one or more of the following information: full model parameters trained at the current moment, sub-model parameters of the corresponding second device trained at the current moment, and training round information.
  • the training round information includes a first training number for requesting the second device to perform a complete training or a parameter update on all training samples, and/or a second training number for requesting the second device to perform a complete training or a parameter update on part of the training samples.
  • the first message sent by the first device to the second device can carry training-related parameters, so that the second device can feed back corresponding training data, which helps improve the efficiency of distributed training.
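Purely as an illustrative sketch, a first message carrying the parameters listed above might be represented as follows; the field names and types are assumptions, not a message format defined by this application.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FirstMessage:
    """Hypothetical representation of a first message sent by the first device."""
    request_training: bool = True                 # second indication information
    full_model_params: Optional[list] = None      # full model parameters trained at the current moment
    sub_model_params: Optional[list] = None       # sub-model parameters of the corresponding second device
    # Training round information:
    first_training_number: int = 1                # complete trainings/updates over all training samples
    second_training_number: int = 0               # trainings/updates over part of the training samples

# Example: request one full pass plus 50 partial-sample updates on the current model.
msg = FirstMessage(full_model_params=[0.0, 0.0], first_training_number=1, second_training_number=50)
print(msg)
```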
  • the training state information of the second device corresponding to the second message includes convergence information obtained by the second device after training for the first training number and/or the second training number according to the training model at the current moment and the corresponding data; the convergence information includes the cross-validation accuracy or the cross-validation test error.
  • the first device schedules the corresponding second device to participate in the distributed training according to the training state information.
  • the first device may preferentially schedule the second device with lower cross-validation accuracy to participate in the distributed training.
  • the first device can schedule the corresponding second device to participate in distributed training according to the training status information fed back by the second device (such as the training model and the convergence information obtained from training), for example, by giving priority to scheduling a second device with lower cross-validation accuracy, because that device's data has a greater need for model updating and the previous model may not have traversed that device's data.
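A minimal sketch of this scheduling rule, using made-up feedback values: second devices reporting lower cross-validation accuracy are selected first.

```python
# Hypothetical second messages: (device id, reported cross-validation accuracy).
second_messages = [("dev1", 0.92), ("dev2", 0.71), ("dev3", 0.85), ("dev4", 0.64)]

def schedule_by_training_state(messages, num_to_schedule=2):
    """Prefer second devices with lower cross-validation accuracy."""
    ranked = sorted(messages, key=lambda m: m[1])          # ascending accuracy
    return [dev for dev, _ in ranked[:num_to_schedule]]

print(schedule_by_training_state(second_messages))          # ['dev4', 'dev2']
```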
  • the data status information includes one or more of the following information: quality information of the data corresponding to the second device, environmental information of the second device, first feature information of the data corresponding to the second device, second feature information of the data corresponding to the second device, computing power information of the second device, and communication capability information of the second device.
  • the quality information of the data corresponding to the second device includes one or more of the following information: the signal-to-noise ratio of the data corresponding to the second device, the signal-to-interference-to-noise ratio of the data corresponding to the second device, and the number of samples of the data corresponding to the second device.
  • the environmental information of the second device includes the location information and/or environmental parameters of the second device.
  • the first feature information of the data corresponding to the second device includes the first feature distribution of the data corresponding to the second device.
  • the second feature information of the data corresponding to the second device includes the second feature distribution obtained by feature extraction for the data corresponding to the second device.
  • the computing power information of the second device includes one or more of the following information: the calculation delay of the second device, the number of operations that can be performed per unit time, and the number of multiplications/additions that can be performed per unit time.
  • the communication capability information of the second device includes one or more of the following information: the communication delay of the second device, the channel quality indication CQI corresponding to the second device, the rank indication RI of the channel corresponding to the second device, and the reference signal received power RSRP of the channel corresponding to the second device.
  • the first device schedules the corresponding second device to participate in the distributed training according to the data status information. For example, the first device may preferentially schedule the second device whose training samples differ/deviate greatly from the previous distributed training.
  • the first device can schedule the corresponding second device to participate in distributed training based on the data status information fed back by the second device (such as the difference/deviation between the first feature information and the existing training samples of the first device, or the training samples of the previous distributed training), for example, by giving priority to scheduling a second device whose training samples differ/deviate greatly from those of the previous distributed training, because that device's data has a greater need for model updating and the previous model may not have traversed that device's data.
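As an illustrative sketch only (the KL-divergence measure, histograms, and device names are assumptions), the first device could rank second devices by how much their reported feature distribution deviates from the distribution covered in the previous training round.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two discrete feature distributions (histograms)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Feature distribution already covered by the previous distributed training round.
previous_dist = [0.25, 0.25, 0.25, 0.25]

# Hypothetical first feature distributions reported in the second messages.
reported = {"dev1": [0.24, 0.26, 0.25, 0.25],   # very similar to previous data
            "dev2": [0.70, 0.10, 0.10, 0.10],   # deviates strongly
            "dev3": [0.40, 0.30, 0.20, 0.10]}

# Prefer the second devices whose data deviates most from what was already trained on.
ranked = sorted(reported, key=lambda d: kl_divergence(reported[d], previous_dist), reverse=True)
print(ranked)   # e.g. ['dev2', 'dev3', 'dev1']
```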
  • the first message also includes one or more of the following information: quality requirements for data corresponding to the second device, environmental requirements for the second device, minimum computing power requirements for the second device, minimum communication capability requirements for the second device, first characteristic information requirements for data corresponding to the second device, and second characteristic information requirements for data corresponding to the second device.
  • the first device can also obtain information such as the minimum computing power and minimum communication capability requirements of the second device, so as to determine whether the second device meets the training requirements and whether to participate in the current distributed training. This is conducive to ensuring that the distributed training can obtain updated results within the specified time, meet the latency requirements, and ensure the completeness, accuracy, and contribution of the distributed training to the large model.
  • the first indication information includes confirmation indication information or rejection indication information.
  • the confirmation indication information is used to indicate one or more of the following information: the quality information of the data corresponding to the second device meets the quality requirements, the environmental information of the second device meets the environmental requirements, the computing power of the second device meets the minimum computing power requirements, the communication capability information of the second device meets the minimum communication capability requirements, the first feature information of the data corresponding to the second device meets the first feature information requirements, and the second feature information of the data corresponding to the second device meets the second feature information requirements.
  • the rejection indication information is used to indicate one or more of the following information: the quality information of the data corresponding to the second device does not meet the quality requirements, the environmental information of the second device does not meet the environmental requirements, the computing power of the second device does not meet the minimum computing power requirements, the communication capability information of the second device does not meet the minimum communication capability requirements, the first feature information of the data corresponding to the second device does not meet the first feature information requirements, and the second feature information of the data corresponding to the second device does not meet the second feature information requirements.
  • the first device schedules all or part of the second devices whose first indication information is confirmation indication information to participate in distributed training; or, the first device schedules all or part of the second devices whose first indication information is not rejection indication information or which have not fed back rejection indication information to participate in distributed training.
  • the first device can receive confirmation indication information or rejection indication information fed back by the second device and perform scheduling, which is beneficial to reduce the communication and computing overhead of sub-nodes, ensure the completeness and accuracy of distributed training, and contribute to the large model.
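A minimal sketch of this exchange under assumed field names and thresholds: each second device compares its own status with the requirements in the first message and feeds back a confirmation or rejection indication, and the first device schedules only the confirming devices.

```python
# Hypothetical requirements carried in the first message.
requirements = {"min_snr_db": 10.0, "min_ops_per_s": 1e9, "max_comm_delay_ms": 20.0}

# Hypothetical status of several second devices.
devices = {
    "dev1": {"snr_db": 15.0, "ops_per_s": 2e9, "comm_delay_ms": 12.0},
    "dev2": {"snr_db": 8.0,  "ops_per_s": 3e9, "comm_delay_ms": 10.0},   # data quality too low
    "dev3": {"snr_db": 12.0, "ops_per_s": 5e8, "comm_delay_ms": 15.0},   # computing power too low
}

def first_indication(status, req):
    """Second-device side: return confirmation (True) or rejection (False)."""
    return (status["snr_db"] >= req["min_snr_db"]
            and status["ops_per_s"] >= req["min_ops_per_s"]
            and status["comm_delay_ms"] <= req["max_comm_delay_ms"])

# First-device side: schedule all second devices that fed back confirmation indication information.
scheduled = [d for d, s in devices.items() if first_indication(s, requirements)]
print(scheduled)   # ['dev1']
```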
  • the types of the training state information, the data state information, and the first indication information are different.
  • when the first device schedules the second devices participating in the distributed training according to the multiple second messages and the multiple second messages include at least two types of information, the second devices participating in the distributed training are scheduled according to the various types of information included in the multiple second messages and the weights of those types of information.
  • the first device can schedule the corresponding second device based on the information and weights to ensure the completeness, accuracy, and contribution to the large model of the distributed training.
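This weighted combination might be sketched as follows; the particular criteria, normalisation, and weight values are illustrative assumptions.

```python
# Hypothetical per-device metrics derived from the different types of information
# in the second messages (already normalised to [0, 1], higher = more useful).
metrics = {
    "dev1": {"training_need": 0.3, "data_deviation": 0.2, "capability": 0.9},
    "dev2": {"training_need": 0.8, "data_deviation": 0.7, "capability": 0.6},
    "dev3": {"training_need": 0.6, "data_deviation": 0.9, "capability": 0.4},
}

# Weights assigned to each type of information.
weights = {"training_need": 0.4, "data_deviation": 0.4, "capability": 0.2}

def score(m):
    """Weighted sum over the information types."""
    return sum(weights[k] * m[k] for k in weights)

# Schedule the highest-scoring second devices first.
ranked = sorted(metrics, key=lambda d: score(metrics[d]), reverse=True)
print(ranked)   # ['dev2', 'dev3', 'dev1'] with these weights
```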
  • the present application provides a communication method, which is performed by a second device.
  • the second device may be a sub-node of the training network (such as a base station, a terminal, etc.), or a component of a sub-node (such as a processor, a chip, or a chip system, etc.), or a logical module that can implement all or part of the functions of a sub-node.
  • the second device receives a first message, and the first message is used to request the second device to perform distributed training.
  • the second device sends a second message, and the second message includes one or more of the following information: training status information of the second device, data status information of the second device, and first indication information indicating whether the second device meets the training requirements.
  • the second device participates in distributed training according to the scheduling of the first device.
  • the second device can feed back training and/or data-related information to the first device, which is beneficial for the first device to better plan and schedule sub-nodes for distributed training and to improve the efficiency of distributed training.
  • the first message further includes one or more of the following information: full model parameters trained at the current moment, sub-model parameters of the corresponding second device trained at the current moment, and training round information.
  • the training round information includes a first training number for requesting the second device to perform a complete training or a parameter update on all training samples, and/or a second training number for requesting the second device to perform a complete training or a parameter update on part of the training samples.
  • the first message received by the second device includes training-related parameters, so that the second device can feed back corresponding training data, which is beneficial to improving the efficiency of distributed training.
  • the training status information of the second device corresponding to the second message includes convergence information obtained by the second device after training and updating for the first training number and/or the second training number based on the training model at the current moment and the corresponding data; the convergence information includes the cross-validation accuracy or the cross-validation test error.
  • the second device can feed back training status information (such as the training model and the convergence information obtained from training) to the first device, which helps the first device schedule the corresponding second device to participate in distributed training according to the training status information (for example, giving priority to scheduling a second device with lower cross-validation accuracy, because that device's data has a greater need for model updating and the previous model may not have traversed that device's data).
  • the data status information includes one or more of the following information: quality information of the data corresponding to the second device, environmental information of the second device, first feature information of the data corresponding to the second device, second feature information of the data corresponding to the second device, computing power information of the second device, and communication capability information of the second device.
  • the quality information of the data corresponding to the second device includes one or more of the following information: the signal-to-noise ratio of the data corresponding to the second device, the signal-to-interference-to-noise ratio of the data corresponding to the second device, and the number of samples of the data corresponding to the second device.
  • the environmental information of the second device includes the location information and/or environmental parameters of the second device.
  • the first feature information of the data corresponding to the second device includes the first feature distribution of the data corresponding to the second device.
  • the second feature information of the data corresponding to the second device includes the second feature distribution obtained by feature extraction for the data corresponding to the second device.
  • the computing power information of the second device includes one or more of the following information: the calculation delay of the second device, the number of operations that can be performed per unit time, and the number of multiplications/additions that can be performed per unit time.
  • the communication capability information of the second device includes one or more of the following information: the communication delay of the second device, the channel quality indication CQI corresponding to the second device, the rank indication RI of the channel corresponding to the second device, and the reference signal received power RSRP of the channel corresponding to the second device.
  • the second device can feed back data status information to the first device (such as the difference/deviation between the first feature information and the existing training samples of the first device, or the training samples of the previous distributed training), which helps the first device schedule the corresponding second device to participate in the distributed training (for example, giving priority to scheduling a second device whose training samples differ/deviate greatly from those of the previous distributed training, because that device's data has a greater need for model updating and the previous model may not have traversed that device's data).
  • the first message also includes one or more of the following information: quality requirements for data corresponding to the second device, environmental requirements for the second device, minimum computing power requirements for the second device, minimum communication capability requirements for the second device, first characteristic information requirements for data corresponding to the second device, and second characteristic information requirements for data corresponding to the second device.
  • the second device can report the minimum computing power, minimum communication capability requirements and other information to the first device in advance, which is beneficial for the first device to determine whether the second device meets the training requirements and whether to participate in the current distributed training. It is also beneficial to ensure that the distributed training can obtain updated results within the specified time, meet the latency requirements, and ensure the completeness, accuracy, and contribution to the large model of the distributed training.
  • the first indication information includes confirmation indication information or rejection indication information.
  • the confirmation indication information is used to indicate one or more of the following information: the quality information of the data corresponding to the second device meets the quality requirements, the environmental information of the second device meets the environmental requirements, the computing power of the second device meets the minimum computing power requirements, the communication capability information of the second device meets the minimum communication capability requirements, the first feature information of the data corresponding to the second device meets the first feature information requirements, and the second feature information of the data corresponding to the second device meets the second feature information requirements.
  • the rejection indication information is used to indicate one or more of the following information: the quality information of the data corresponding to the second device does not meet the quality requirements, the environmental information of the second device does not meet the environmental requirements, the computing power of the second device does not meet the minimum computing power requirements, the communication capability information of the second device does not meet the minimum communication capability requirements, the first feature information of the data corresponding to the second device does not meet the first feature information requirements, and the second feature information of the data corresponding to the second device does not meet the second feature information requirements.
  • the second device can determine whether to participate in the training based on the training request and feedback confirmation indication information or rejection indication information, which is beneficial to reducing the communication and computing overhead of sub-nodes and ensuring the completeness, accuracy, and contribution to the large model of distributed training.
  • the present application provides a communication device.
  • the communication device may be a central node of a training network, or a device of a central node, or a device that can be used in conjunction with a central node.
  • the communication device may include a functional module, which may be a hardware circuit, or software, or a combination of a hardware circuit and software.
  • the communication device includes a communication unit and a processing unit.
  • the communication unit is used to send multiple first messages, and the multiple first messages are used to request multiple second devices to perform distributed training.
  • the communication unit is also used to receive multiple second messages, and any second message among the multiple second messages includes one or more of the following information: training status information of the second device corresponding to the second message, data status information of the second device corresponding to the second message, and first indication information indicating whether the second device corresponding to the second message meets the training requirements.
  • the processing unit is used to schedule the second devices participating in the distributed training according to the multiple second messages.
  • the first message includes second indication information
  • the second indication information is used to indicate that the second device corresponding to the request should perform distributed training.
  • the first message further includes one or more of the following information: full model parameters trained at the current moment, sub-model parameters of the corresponding second device trained at the current moment, and training round information.
  • the training round information includes a first training number for requesting the second device to perform a complete training or a parameter update on all training samples, and/or a second training number for requesting the second device to perform a complete training or a parameter update on part of the training samples.
  • the training status information of the second device corresponding to the second message includes convergence information obtained by the second device after training and updating for the first training number and/or the second training number based on the training model at the current moment and the corresponding data; the convergence information includes the cross-validation accuracy or the cross-validation test error.
  • the processing unit is used to schedule the second device participating in the distributed training according to the plurality of second messages, including: scheduling the corresponding second device to participate in the distributed training according to the training state information.
  • the first device may preferentially schedule the second device with lower cross-validation accuracy to participate in the distributed training.
  • the data status information includes one or more of the following information: quality information of the data corresponding to the second device, environmental information of the second device, first feature information of the data corresponding to the second device, second feature information of the data corresponding to the second device, computing power information of the second device, and communication capability information of the second device.
  • the quality information of the data corresponding to the second device includes one or more of the following information: the signal-to-noise ratio of the data corresponding to the second device, the signal-to-interference-to-noise ratio of the data corresponding to the second device, and the number of samples of the data corresponding to the second device.
  • the environmental information of the second device includes the location information and/or environmental parameters of the second device.
  • the first feature information of the data corresponding to the second device includes the first feature distribution of the data corresponding to the second device.
  • the second feature information of the data corresponding to the second device includes the second feature distribution obtained by feature extraction for the data corresponding to the second device.
  • the computing power information of the second device includes one or more of the following information: the calculation delay of the second device, the number of operations that can be performed per unit time, and the number of multiplications/additions that can be performed per unit time.
  • the communication capability information of the second device includes one or more of the following information: the communication delay of the second device, the channel quality indication CQI corresponding to the second device, the rank indication RI of the channel corresponding to the second device, and the reference signal received power RSRP of the channel corresponding to the second device.
  • the processing unit is used to schedule the second device participating in the distributed training according to the plurality of second messages, including: scheduling the corresponding second device to participate in the distributed training according to the data status information.
  • the first device may preferentially schedule the second device having a larger difference/deviation from the training sample of the previous distributed training.
  • the first message also includes one or more of the following information: quality requirements for data corresponding to the second device, environmental requirements for the second device, minimum computing power requirements for the second device, minimum communication capability requirements for the second device, first characteristic information requirements for data corresponding to the second device, and second characteristic information requirements for data corresponding to the second device.
  • the first indication information includes confirmation indication information or rejection indication information.
  • the confirmation indication information is used to indicate one or more of the following information: the quality information of the data corresponding to the second device meets the quality requirements, the environmental information of the second device meets the environmental requirements, the computing power of the second device meets the minimum computing power requirements, the communication capability information of the second device meets the minimum communication capability requirements, the first feature information of the data corresponding to the second device meets the first feature information requirements, and the second feature information of the data corresponding to the second device meets the second feature information requirements.
  • the rejection indication information is used to indicate one or more of the following information: the quality information of the data corresponding to the second device does not meet the quality requirements, the environmental information of the second device does not meet the environmental requirements, the computing power of the second device does not meet the minimum computing power requirements, the communication capability information of the second device does not meet the minimum communication capability requirements, the first feature information of the data corresponding to the second device does not meet the first feature information requirements, and the second feature information of the data corresponding to the second device does not meet the second feature information requirements.
  • the processing unit is used to schedule second devices participating in distributed training based on multiple second messages, including: scheduling all or part of the second devices whose first indication information is confirmation indication information to participate in distributed training; or scheduling all or part of the second devices whose first indication information is not rejection indication information or which have not fed back rejection indication information to participate in distributed training.
  • the types of the training state information, the data state information, and the first indication information are different.
  • when the processing unit schedules the second devices participating in the distributed training according to the plurality of second messages and the plurality of second messages include at least two types of information, the processing unit schedules the second devices participating in the distributed training according to the various types of information included in the multiple second messages and the weights of those types of information.
  • the present application provides a communication device.
  • the communication device may be a sub-node of a training network, or a device of a sub-node, or a device that can be used in conjunction with a sub-node.
  • the communication device may include a functional module, which may be a hardware circuit, or software, or a combination of a hardware circuit and software.
  • the communication device includes a communication unit and a processing unit.
  • the communication unit is used to receive a first message, the first message being used to request the second device to perform distributed training.
  • the communication unit is also used to send a second message, the second message including one or more of the following information: training status information of the second device, data status information of the second device, and first indication information indicating whether the second device meets the training requirements.
  • the processing unit is used to participate in the distributed training according to the scheduling of the first device.
  • the first message further includes one or more of the following information: full model parameters trained at the current moment, sub-model parameters of the corresponding second device trained at the current moment, and training round information.
  • the training round information includes a first training number for requesting the second device to perform a complete training or a parameter update on all training samples, and/or a second training number for requesting the second device to perform a complete training or a parameter update on part of the training samples.
  • the training status information of the second device corresponding to the second message includes convergence information obtained by the second device after training and updating for the first training number and/or the second training number based on the training model at the current moment and the corresponding data; the convergence information includes the cross-validation accuracy or the cross-validation test error.
  • the data status information includes one or more of the following information: quality information of the data corresponding to the second device, environmental information of the second device, first feature information of the data corresponding to the second device, second feature information of the data corresponding to the second device, computing power information of the second device, and communication capability information of the second device.
  • the quality information of the data corresponding to the second device includes one or more of the following information: the signal-to-noise ratio of the data corresponding to the second device, the signal-to-interference-to-noise ratio of the data corresponding to the second device, and the number of samples of the data corresponding to the second device.
  • the environmental information of the second device includes the location information and/or environmental parameters of the second device.
  • the first feature information of the data corresponding to the second device includes the first feature distribution of the data corresponding to the second device.
  • the second feature information of the data corresponding to the second device includes the second feature distribution obtained by feature extraction for the data corresponding to the second device.
  • the computing power information of the second device includes one or more of the following information: the calculation delay of the second device, the number of operations that can be performed per unit time, and the number of multiplications/additions that can be performed per unit time.
  • the communication capability information of the second device includes one or more of the following information: the communication delay of the second device, the channel quality indication CQI corresponding to the second device, the rank indication RI of the channel corresponding to the second device, and the reference signal received power RSRP of the channel corresponding to the second device.
  • the first message also includes one or more of the following information: quality requirements for data corresponding to the second device, environmental requirements for the second device, minimum computing power requirements for the second device, minimum communication capability requirements for the second device, first characteristic information requirements for data corresponding to the second device, and second characteristic information requirements for data corresponding to the second device.
  • the first indication information includes confirmation indication information or rejection indication information.
  • the confirmation indication information is used to indicate one or more of the following information: the quality information of the data corresponding to the second device meets the quality requirements, the environmental information of the second device meets the environmental requirements, the computing power of the second device meets the minimum computing power requirements, the communication capability information of the second device meets the minimum communication capability requirements, the first feature information of the data corresponding to the second device meets the first feature information requirements, and the second feature information of the data corresponding to the second device meets the second feature information requirements.
  • the rejection indication information is used to indicate one or more of the following information: the quality information of the data corresponding to the second device does not meet the quality requirements, the environmental information of the second device does not meet the environmental requirements, the computing power of the second device does not meet the minimum computing power requirements, the communication capability information of the second device does not meet the minimum communication capability requirements, the first feature information of the data corresponding to the second device does not meet the first feature information requirements, and the second feature information of the data corresponding to the second device does not meet the second feature information requirements.
  • the present application provides a communication device, comprising: a processor, configured to execute instructions; optionally, the communication device further comprises a memory, the memory being configured to store the instructions, and when the instructions are executed by the processor, the communication device implements the method in the first aspect and any possible implementation of the first aspect.
  • the processor and the memory are coupled.
  • the present application provides a communication device, comprising: a processor, configured to execute instructions; optionally, the communication device further comprises a memory, the memory being configured to store the instructions, and when the instructions are executed by the processor, the communication device implements the method in the second aspect and any possible implementation of the second aspect.
  • the processor and the memory are coupled.
  • the present application provides a communication system, which includes multiple devices or equipment in the above-mentioned third to sixth aspects, so that the devices or equipment execute the methods in the first aspect and the second aspect, as well as in any possible implementation of the first aspect and the second aspect.
  • the present application provides a computer-readable storage medium storing instructions, which, when executed on a computer, enables the computer to execute the method in the first aspect and the second aspect, as well as any possible implementation of the first aspect and the second aspect.
  • the present application provides a chip system, the chip system comprising a processor and an interface, and optionally, a memory.
  • the chip system may be composed of a chip, or may include a chip and other discrete devices.
  • the present application provides a computer program product, comprising instructions, which, when executed on a computer, enable the computer to execute the method in the first aspect and the second aspect, as well as any possible implementation of the first aspect and the second aspect.
  • FIG1 is a schematic diagram of a communication system provided by the present application.
  • FIG2 is a flow chart of a communication method provided by the present application.
  • FIG3 is a schematic diagram of a process of a first device scheduling a corresponding second device to perform distributed training according to training state information provided by the present application;
  • FIG4 is a schematic diagram of a process of a first device scheduling a corresponding second device to perform distributed training according to data status information provided by the present application;
  • FIG5 is a schematic diagram of a process of a first device scheduling a corresponding second device to perform distributed training according to first indication information provided by the present application;
  • FIG6 is a schematic diagram of a communication device provided by the present application.
  • FIG. 7 is a schematic diagram of another communication device provided in the present application.
  • "/" can indicate that the objects associated before and after are in an "or" relationship, for example, A/B can indicate A or B; "and/or" can be used to describe three possible relationships between the associated objects, for example, A and/or B can indicate: A exists alone, A and B exist at the same time, or B exists alone, where A and B can be singular or plural.
  • the words "first" and "second" can be used to distinguish between technical features with the same or similar functions. The words "first" and "second" do not limit the quantity or execution order, and do not necessarily require that the objects they modify be different.
  • the present application provides a communication method and a communication device, which can selectively schedule sub-nodes for distributed training and thus help improve the efficiency of distributed training.
  • the communication method provided in the present application can be applied to the communication system shown in Figure 1, for example, the communication system includes a central node and a sub-node.
  • the central node can be an access network device (including a base station, a macro base station, a micro base station, etc.), a terminal device, etc.
  • the sub-node can be an access network device (including a base station, a macro base station, a micro base station, etc.), a terminal device, etc.
  • the communication method is applicable to 4G, 5G or future mobile communication systems.
  • the communication method is applicable to low-frequency scenarios (sub-6 GHz) as well as high-frequency scenarios (above 6 GHz).
  • the communication method is applicable to single transmit receive point (Single-TRP) or multiple transmit receive point (Multi-TRP) scenarios, as well as any of their derived scenarios.
  • the communication method is applicable to new radio (NR) uplink transmission.
  • terminal equipment also known as user equipment (UE), mobile station (MS), mobile terminal (MT), etc.
  • terminal devices refer to equipment that provides voice and/or data connectivity to users, for example, handheld devices with wireless connection functions, vehicle-mounted devices, etc.
  • some examples of terminal devices are: mobile phones, tablet computers, laptops, PDAs, mobile internet devices (MID), wearable devices, drones, virtual reality (VR) devices, augmented reality (AR) devices, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in remote medical surgery, wireless terminals in smart grids, wireless terminals in transportation safety, wireless terminals in smart cities, wireless terminals in smart homes, terminal devices in 5G networks, terminal devices in future evolved PLMN networks, or terminal devices in future communication systems, etc.
  • access network equipment refers to the radio access network (RAN) node (or equipment) that connects the terminal device to the wireless network, which can also be called a base station.
  • RAN nodes are: next generation NodeB (gNB), transmission reception point (TRP), evolved NodeB (eNB), radio network controller (RNC), NodeB (NB), base station controller (BSC), base transceiver station (BTS), etc.
  • the RAN node may also be a base transceiver station (BTS), a home base station (e.g., home evolved NodeB, or home NodeB, HNB), a baseband unit (BBU), a wireless fidelity (Wi-Fi) access point (AP), a satellite in a satellite communication system, a wireless controller in a cloud radio access network (CRAN) scenario, a wearable device, a drone, a device in an Internet of Vehicles (e.g., vehicle to everything (V2X)) system, or a communication device in device-to-device (D2D) communication.
  • the access network device may include a centralized unit (CU) node, a distributed unit (DU) node, or a RAN device including a CU node and a DU node.
  • the RAN device including the CU node and the DU node splits the protocol layer of the eNB in the long term evolution (LTE) system, places the functions of some protocol layers in the CU for centralized control, and distributes the functions of the remaining part or all of the protocol layers in the DU, and the DU is centrally controlled by the CU.
  • CU can also be divided into CU-control plane (CP) and CU-user plane (UP), etc.
  • the access network device can also be an antenna unit (radio unit, RU), etc.
  • the access network device can also adopt an open radio access network (ORAN) architecture, etc., and the present application does not limit the specific type of access network device.
  • the access network device shown in the embodiment of the present application can be an access network device in ORAN, or a module in the access network device, etc.
  • in an ORAN architecture, the CU can also be called an open (O-) CU, the DU can also be called an O-DU, the CU-DU can also be called an O-CU-DU, the CU-UP can also be called an O-CU-UP, and the RU can also be called an O-RU.
  • "sending" and "receiving" in the embodiments of the present application indicate the direction of signal transmission.
  • sending configuration information to a terminal device can be understood as the destination end of the configuration information being the terminal device, which can include direct sending through the air interface, and also includes indirect sending through the air interface by other units or modules.
  • receiving configuration information from a network device can be understood as the source end of the configuration information being the network device, which can include directly receiving the information from the network device through the air interface, and also includes indirectly receiving it through the air interface via other units or modules.
  • "sending" can also be understood as the "output" of a chip interface, and "receiving" can also be understood as the "input" of a chip interface.
  • sending and receiving can be performed between devices, for example, between a network device and a terminal device, or can be performed within a device, for example, sending or receiving between components, modules, chips, software modules, or hardware modules within the device through a bus, wiring, or interface.
  • information may be processed between the source and destination of information transmission, such as coding, modulation, etc., but the destination can understand the valid information from the source. Similar expressions in this application can be understood similarly and will not be repeated.
  • indication may include direct indication and indirect indication, and may also include explicit indication and implicit indication.
  • the information indicated by a certain information is called information to be indicated.
  • in the specific implementation process, there are many ways to indicate the information to be indicated, such as, but not limited to, directly indicating the information to be indicated, for example, the information to be indicated itself or an index of the information to be indicated.
  • the information to be indicated may also be indirectly indicated by indicating other information, wherein there is an association between the other information and the information to be indicated; it may also be possible to indicate only a part of the information to be indicated, while the other part of the information to be indicated is known or agreed in advance, for example, the indication of specific information can be realized by means of the arrangement order of each information agreed in advance (such as predefined by the protocol), thereby reducing the indication overhead to a certain extent.
  • the present application does not limit the specific method of indication. It is understandable that, for the sender of the indication information, the indication information can be used to indicate the information to be indicated, and for the receiver of the indication information, the indication information can be used to determine the information to be indicated.
  • Machine learning: learning models or rules from raw data. There are many different machine learning methods, such as neural networks, decision trees, support vector machines, etc.
  • AI model: a function model that maps an input of a certain dimension to an output of a certain dimension, and its model parameters are obtained through machine learning training. For example, for a model of the form y = ax + b, a and b correspond to the parameters of the model, which can be obtained through machine learning training.
  • Neural network: for example, an artificial neural network, which is a mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. It is a special form of AI model.
  • Dataset: data used for model training, verification, and testing in machine learning. The quantity and quality of the data will affect the effectiveness of machine learning.
  • Model training: by selecting a suitable loss function and applying an optimization algorithm to train the model parameters, the loss function value is minimized.
  • Loss function: used to measure the difference between the model's predicted value and the true value.
  • Model testing: after training, apply test data to evaluate the model performance.
  • Model application: apply the trained model to solve practical problems.
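A minimal, self-contained sketch that ties these definitions together for a simple linear model y = ax + b; the data, learning rate, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Dataset: samples generated from y = 2x + 1 plus noise, split into train/test.
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=200)
x_train, y_train, x_test, y_test = x[:150], y[:150], x[150:], y[150:]

# Model training: minimise the mean-squared-error loss with gradient descent.
a, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    pred = a * x_train + b
    grad_a = 2 * np.mean((pred - y_train) * x_train)   # dLoss/da
    grad_b = 2 * np.mean(pred - y_train)               # dLoss/db
    a, b = a - lr * grad_a, b - lr * grad_b

# Model testing: evaluate the loss (difference between prediction and truth) on test data.
test_loss = np.mean((a * x_test + b - y_test) ** 2)
print(f"learned a={a:.2f}, b={b:.2f}, test MSE={test_loss:.4f}")

# Model application: use the trained model on a new input.
print("prediction for x=0.5:", a * 0.5 + b)
```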
  • Supervised learning: based on the collected sample values and sample labels, the machine learning algorithm is applied to learn the mapping relationship from sample values to sample labels, and the learned mapping relationship is expressed by the machine learning model.
  • the process of training the machine learning model is the process of learning this mapping relationship. For example, in signal detection, the noisy received signal is the sample, and the real constellation point corresponding to the signal is the label. Machine learning expects to learn the mapping relationship between the sample and the label through training, that is, to make the machine learning model learn a signal detector. During training, the model parameters are optimized by calculating the error between the model's predicted value and the real label. Once the mapping relationship is learned, the learned mapping can be applied to predict the sample label of each new sample.
  • the mapping relationship learned by supervised learning can include linear mapping and nonlinear mapping. According to the type of label, the learning task can be divided into classification task and regression task.
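As a toy illustration of the signal-detection example above (the QPSK constellation, noise level, and nearest-centroid detector are assumptions), the mapping from noisy received samples to constellation labels can be learned from labelled training data as follows.

```python
import numpy as np

rng = np.random.default_rng(2)

# QPSK constellation points (the "labels" are indices into this set).
constellation = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

# Training set: noisy received signals (samples) with their true labels.
labels = rng.integers(0, 4, size=1000)
received = constellation[labels] + 0.2 * (rng.normal(size=1000) + 1j * rng.normal(size=1000))

# "Training": learn one centroid per label from the labelled samples.
centroids = np.array([received[labels == k].mean() for k in range(4)])

def detect(sample):
    """Inference: map a noisy sample to the label of the nearest learned centroid."""
    return int(np.argmin(np.abs(sample - centroids)))

# Apply the learned mapping to predict the labels of new samples.
test_labels = rng.integers(0, 4, size=200)
test_rx = constellation[test_labels] + 0.2 * (rng.normal(size=200) + 1j * rng.normal(size=200))
preds = np.array([detect(s) for s in test_rx])
print("detection accuracy:", np.mean(preds == test_labels))
```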
  • Federated learning is a machine learning framework, and its original intention is to effectively help multiple institutions use data and conduct machine learning modeling while meeting the requirements of user privacy protection and data security.
  • the system architecture of classic federated learning can include a central node and multiple distributed nodes. Multiple distributed nodes are all connected to the central node in communication, and each distributed node includes a distributed data set about the application environment of the distributed node.
  • federated learning can effectively solve the problem of data islands, allowing participants to jointly model without sharing data, and can technically break data islands and realize AI collaboration.
  • federated learning can be divided into three categories: horizontal federated learning, vertical federated learning, and federated transfer learning.
  • horizontal federated learning refers to the case where the user features of two data sets overlap a lot but the users overlap less, the data sets are split horizontally (i.e., user dimension), and the data with the same user features but different users are taken out for training.
  • Vertical federated learning refers to the case where the users of two data sets overlap a lot but the user features overlap less, the data sets are split vertically (i.e., feature dimension), and the data with the same users but different user features are taken out for training.
  • Federated transfer learning refers to the case where the users and user features of two data sets overlap less, the data is not split, and transfer learning can be applied to overcome the situation of insufficient data or labels.
  • Centralized AI model training refers to the use of a large amount of information/data collected from multiple sub-nodes and stored in a central node to conduct centralized model training, so that the trained model can achieve good classification/inference results.
  • the AI model obtained by centralized training has good generalization ability and performance due to the characteristics of extensive data.
  • the centralized AI model training method also faces many problems, for example, collecting and aggregating the raw data of all sub-nodes at the central node consumes considerable air interface resources, and centralizing the data raises data privacy and security concerns.
  • Distributed AI model training: in distributed AI model training, the workload for training the model can be split and shared among multiple microprocessors, which are called sub-nodes. These sub-nodes can work in parallel to accelerate model training. Federated learning is a typical distributed AI model training framework.
  • the distributed AI model training does not need to collect and aggregate data from multiple sub-nodes to the central node for centralized training. Instead, each sub-node is first trained locally based on local data, and then the intermediate results (such as model parameters or gradients, etc.) obtained in the training are transmitted between the sub-nodes and the central node. Finally, the intermediate results obtained are fused at the central node to complete the training and update of the entire model.
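  • As an illustrative aid, the following Python sketch shows one possible form of the intermediate-result fusion described above, using a federated-averaging-style weighted average of sub-node parameters; the linear model, node data sizes, and number of rounds are assumptions.

```python
# Illustrative sketch (assumption: federated-averaging-style fusion). Each
# sub-node trains locally on its own data and reports its updated parameters
# together with its local sample count; the central node fuses them by a
# weighted average and starts the next round from the fused model.
import numpy as np

def local_update(params, X, y, lr=0.1, epochs=1):
    """One or more local gradient steps for a linear regression model."""
    w = params.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fuse(updates, sample_counts):
    """Central-node fusion: parameter average weighted by local data size."""
    weights = np.asarray(sample_counts, dtype=float)
    weights /= weights.sum()
    return sum(wk * uk for wk, uk in zip(weights, updates))

rng = np.random.default_rng(1)
true_w = np.array([1.0, -2.0])
datasets = []
for n in (50, 200, 120):                         # three sub-nodes with local data
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    datasets.append((X, y))

global_w = np.zeros(2)
for round_ in range(20):                          # communication rounds
    updates = [local_update(global_w, X, y) for X, y in datasets]
    global_w = fuse(updates, [len(y) for _, y in datasets])
print("fused parameters:", global_w)
```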
  • the distributed AI model training can effectively reduce the air interface overhead of the sub-node to the central node because only the intermediate results obtained in the training are transmitted by each sub-node (or part of the sub-node).
  • the distributed AI model can effectively approach the performance of the centralized AI model by fully learning the local data characteristics of each sub-node. Therefore, in the wireless communication system, the research and development of distributed AI model training technology is an important technical means to effectively enable network intelligence.
  • however, in distributed AI model training, the central node cannot obtain the training results of all sub-nodes in each iteration; moreover, in the process of fusing the intermediate results, the training results of some sub-nodes may be unsatisfactory, resulting in poor data quality, which ultimately affects the accuracy of model training.
  • Batch size: indicates the number of samples involved in one round of training.
  • Epoch: in the AI training process, one epoch represents a complete round of training over the samples indicated by the batch size. Generally speaking, it takes at least one epoch to complete the training.
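  • As a simple illustration of the relation between batch size, iterations, and epochs (the dataset size and batch size below are assumptions):

```python
# Illustrative only: one epoch consists of ceil(N / batch_size) parameter updates.
import math

num_samples = 1000           # assumed dataset size
batch_size = 64              # assumed batch size
iterations_per_epoch = math.ceil(num_samples / batch_size)
print(iterations_per_epoch)  # 16 mini-batch updates make up one epoch
```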
  • FIG2 is a flow chart of a communication method provided by the present application.
  • the communication method is applied to the communication system shown in FIG1.
  • the communication method can be implemented by interaction between a first device (such as a central node) and a second device (such as a subnode), including the following steps:
  • a first device sends multiple first messages to multiple second devices.
  • the multiple second devices receive the first messages respectively.
  • multiple first messages are used to request multiple second devices to perform distributed training.
  • the distributed training network includes a central node and multiple sub-nodes (such as sub-node 1 and sub-node 2)
  • the central node can send a first message to each of the multiple sub-nodes, and each first message is used to request the corresponding sub-node to perform distributed training.
  • the first message is a distributed training request message.
  • the first message includes second indication information, and the second indication information is used to indicate that the second device corresponding to the request is to perform distributed training.
  • the central node sends a distributed training request message 1 to the child node 1, and the distributed training request message 1 includes the second indication information, which is used to indicate that the child node 1 is requested to perform distributed training.
  • the first message may also carry training requirements.
  • each second device may correspond to different training requirements, and the training requirements of each second device may include one or more of the following information: quality requirements of data corresponding to the second device, environmental requirements of the second device, minimum computing power requirements of the second device, minimum communication capability requirements of the second device, first feature information requirements of data corresponding to the second device, second feature information requirements of data corresponding to the second device, etc.
  • the plurality of second devices send a plurality of second messages to the first device.
  • the first device receives the plurality of second messages from the plurality of second devices.
  • any second message among the multiple second messages includes one or more of the following information: training status information of the second device corresponding to the second message, data status information of the second device corresponding to the second message, and first indication information indicating whether the second device corresponding to the second message meets the training requirements.
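  • For readability, the following sketch models the first and second messages as simple data structures; the field names are illustrative assumptions and are not defined by the application.

```python
# Illustrative data structures for the first/second messages; only the
# information elements listed above are modeled, and field names are assumptions.
from dataclasses import dataclass, field
from typing import Optional, Dict

@dataclass
class FirstMessage:                         # distributed training request
    request_training: bool                  # second indication information
    training_requirements: Dict[str, float] = field(default_factory=dict)

@dataclass
class SecondMessage:                        # sub-node feedback
    training_status: Optional[Dict[str, float]] = None   # e.g. cross-validation error
    data_status: Optional[Dict[str, float]] = None        # e.g. SNR, sample count
    meets_requirements: Optional[bool] = None              # first indication info (ACK/NACK)

req = FirstMessage(request_training=True,
                   training_requirements={"max_comm_delay_ms": 100.0})
rsp = SecondMessage(data_status={"snr_db": 17.5, "num_samples": 4096},
                    meets_requirements=True)
print(req, rsp, sep="\n")
```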
  • the training state information of the second device corresponding to the second message refers to the information obtained by the second device through preliminary training, including but not limited to the following information: the convergence information obtained by the second device through the first training number of training updates according to the training model at the current moment and the corresponding data, the convergence information obtained by the second device through the second training number of training updates according to the training model at the current moment and the corresponding data, etc.
  • the convergence information includes cross-validation accuracy or cross-validation test error, etc.
  • the first training number refers to the epoch of a complete training or a parameter update for all training samples, or the first training number is a preset fixed number
  • the second training number refers to the iteration number of a complete training or a parameter update for part of the training samples, or the second training number is a preset fixed number.
  • the training state information fed back by subnode 1 may include the cross-validation accuracy/cross-validation test error obtained by subnode 1 through the first training number of training updates according to the training model at the current moment and the corresponding data, and the cross-validation accuracy/cross-validation test error obtained by subnode 1 through the second training number of training updates according to the training model at the current moment and the corresponding data.
  • the data status information of the second device corresponding to the second message refers to the current data status of the second device, that is, the second device directly feeds back the current data status without training, including one or more of the following information: quality information of the data corresponding to the second device, environmental information of the second device, first feature information of the data corresponding to the second device, second feature information of the data corresponding to the second device, computing power information of the second device, and communication capability information of the second device.
  • the quality information of the data corresponding to the second device includes one or more of the following information: the signal-to-noise ratio (SNR) of the data corresponding to the second device, the signal-to-interference plus noise ratio (SINR) of the data corresponding to the second device, the number of samples of the data corresponding to the second device, etc.
  • the quality information of the data fed back by subnode 1 may include the SNR/SINR of the data of subnode 1, the number of sample points/batch number/batch-size of the data, etc.
  • the environmental information of the second device includes the location information and/or environmental parameters of the second device.
  • the location information of the second device may be the geographical location information (such as three-dimensional coordinates) and location identifier of the subnode
  • the environmental parameter of the second device may be a channel state information (CSI) environmental map, indicating which specific environment the second device is in (such as an urban environment or a suburban environment, etc.).
  • the first characteristic information of the data corresponding to the second device includes a first characteristic distribution of the data corresponding to the second device.
  • the first characteristic distribution may be a delay distribution, a beam distribution, a K-factor, etc.
  • the delay distribution may be represented by a delay spread of a channel corresponding to the child node.
  • the delay spread (DS) is an indicator that reflects the multipath information/frequency selectivity of the channel.
  • the second feature information of the data corresponding to the second device includes a second feature distribution obtained by extracting features from the data corresponding to the second device.
  • the second device may extract features from the channel data, thereby converting the channel data into parameters of other domains (e.g., binary domains).
  • the computing power information of the second device includes one or more of the following information: computing latency of the second device, the number of operations that can be performed per unit time, the number of multiplications/additions that can be performed per unit time, etc.
  • a child node can determine the computing latency of the child node based on the number of operations that can be performed per unit time, the amount of data, and the model scale.
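  • As a rough illustration (all numbers are assumptions), a child node could derive its computing latency from its per-unit-time operation capability, the amount of local data, and the model scale as follows:

```python
# Illustrative estimate: computing latency of a sub-node derived from its
# throughput, the amount of local data, and the model scale. All values are assumptions.
flops_per_sample = 2.0e6        # model scale: operations per training sample
num_samples = 5000              # local data amount
epochs = 3                      # requested training rounds
ops_per_second = 5.0e9          # operations the sub-node can perform per unit time

compute_latency_s = flops_per_sample * num_samples * epochs / ops_per_second
print(f"estimated computing latency: {compute_latency_s:.2f} s")
```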
  • the communication capability information of the second device includes one or more of the following information: the communication delay of the second device, the channel quality indicator (CQI) of the channel corresponding to the second device, the rank indication (RI), and the reference signal received power (RSRP).
  • the subnode can obtain the CQI information, RI information, RSRP information, etc. of the subnode according to the downlink reference signal.
  • the computing power information and communication capability information of the second device can ensure that the central node obtains the required model update within the specified time limit.
  • the first indication information is used to indicate whether the second device corresponding to the second message meets the training requirements.
  • the child node can evaluate whether the training requirements are met based on the distributed training request message and the data of the child node. If it is met, the first indication information fed back by the child node can be confirmation indication information (such as confirmation (acknowledge, ACK)), indicating that the child node meets the training requirements; if it is not met, the first indication information fed back by the child node can be rejection indication information (such as non-acknowledge (NACK)), indicating that the child node does not meet the training requirements.
  • the first device schedules the second device participating in the distributed training according to the multiple second messages.
  • the first device can schedule the second device to participate in the distributed training based on different types of information.
  • the training status information, the data status information, and the first indication information of the second device are different types of information. When the second message includes the training status information of the second device (such as the cross-validation accuracy/cross-validation test error obtained by the second device after the first training number of training updates based on the training model at the current moment and the corresponding data), the first device can preferentially schedule the second devices with lower cross-validation accuracy/larger cross-validation test error to participate in the distributed training based on the cross-validation accuracy/cross-validation test error.
  • the first device can comprehensively consider these two types of information to schedule the corresponding second device to participate in distributed training. For example, when multiple second messages include at least two types of information, the first device can schedule the second device to participate in distributed training based on the various types of information included in the multiple second messages and the weights of the various types of information. Specifically, the central node can schedule the corresponding sub-node to participate in distributed training based on the training status information and the first weight of the training status information, and the data status information and the second weight of the data status information.
  • the first device can better plan and schedule the sub-nodes to perform distributed training, which is conducive to improving the efficiency of distributed training.
  • the following example takes a distributed training network scenario as an example, assuming that the distributed training network scenario includes a first device (such as a central node) and multiple second devices (such as sub-nodes 1 to X).
  • Example 1 The first device makes a scheduling decision based on the training status information fed back by the second device.
  • FIG3 is a schematic diagram of a process provided by the present application in which a first device schedules a corresponding second device to perform distributed training according to training state information, including the following steps:
  • a first device sends multiple distributed training request messages to multiple second devices.
  • any one of the multiple distributed training request messages includes but is not limited to the following information: second indication information indicating that the corresponding second device is requested to perform distributed training, full model parameters that have been trained at the current moment, sub-model parameters of the corresponding second device that have been trained at the current moment, training round information, etc.
  • the training round information includes the first training number and/or the second training number.
  • the first training number refers to the epoch in which all training samples are fully trained or the parameter is updated once, or the first training number is a preset fixed number;
  • the second training number refers to the iteration number in which part of the training samples are fully trained or the parameter is updated once, or the second training number is a preset fixed number.
  • the central node sends a distributed training request message to the child node X
  • the distributed training request message includes: the second indication information indicating the request for the child node X to perform distributed training, the full model parameters trained at the current moment, the sub-model parameters of the child node X trained at the current moment, the first training number of P epochs for the child node X to be requested, and the second training number of P iterations for the child node X to be requested.
  • the first device may send multiple distributed training request messages to multiple second devices by unicast (such as a central node sending point-to-point to each child node); or may send by broadcast/multicast, thereby reducing transmission overhead (compared to unicast, transmission overhead can be reduced).
  • the distributed training request message may also be delivered in the form of a bitmap.
  • the first device may broadcast an X-bit message, in which a value of 1 in the Nth bit indicates that a distributed training request is issued to child node N, and a value of 0 indicates that no distributed training request is issued to child node N.
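  • A minimal sketch of such a bitmap request, assuming 0-based numbering of the child nodes:

```python
# Illustrative sketch of the bitmap form of the request: bit N set to 1 means a
# distributed training request is issued to child node N (0-based numbering is an assumption).
def encode_request_bitmap(requested_nodes, num_nodes):
    return "".join("1" if n in requested_nodes else "0" for n in range(num_nodes))

def decode_request_bitmap(bitmap):
    return [n for n, bit in enumerate(bitmap) if bit == "1"]

bitmap = encode_request_bitmap({0, 2, 3}, num_nodes=6)
print(bitmap)                          # "101100"
print(decode_request_bitmap(bitmap))   # [0, 2, 3]
```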
  • multiple second devices perform preliminary training according to their respective distributed training request messages and their respective data to obtain their respective training status information.
  • subnode X performs preliminary training based on the distributed training request message (e.g., including the full model parameters trained at the current moment, the sub-model parameters of subnode X trained at the current moment, etc.) and the data at subnode X to obtain the training status information of subnode X (for example, including the cross-validation accuracy or cross-validation test error of child node X for P epochs, the cross-validation accuracy or cross-validation test error of child node X for P iterations, etc.).
  • S203 The multiple second devices send training status information of the multiple second devices to the first device.
  • subnodes 1 to X respectively send their own training state information (such as including training state information 1 to X) to the central node.
  • the first device determines a second device that participates in the distributed training according to the training status information of the plurality of second devices.
  • the central node evaluates the priority order of the cross-validation test errors of sub-nodes 1 to X (for example, a larger cross-validation test error indicates that the data of the sub-node has a greater demand for model updating and has a higher priority for participating in scheduling; a smaller cross-validation test error indicates that the priority for participating in scheduling is lower) to determine the sub-nodes participating in distributed training.
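  • As an illustration of this priority rule, the following sketch ranks sub-nodes by their cross-validation test error and schedules those with the largest error first; the error values and the number of scheduled nodes are assumptions.

```python
# Illustrative sketch: the central node ranks sub-nodes by cross-validation test
# error and schedules the ones with the largest error first.
cv_test_error = {1: 0.12, 2: 0.31, 3: 0.05, 4: 0.22}     # sub-node id -> error (assumed)

def schedule_by_error(errors, num_to_schedule):
    ranked = sorted(errors, key=errors.get, reverse=True)  # larger error = higher priority
    return ranked[:num_to_schedule]

print(schedule_by_error(cv_test_error, 2))   # [2, 4]
```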
  • S205 The first device sends a scheduling request message to the second device participating in the distributed training.
  • the central node may send a scheduling request message to the sub-nodes participating in the distributed training in a point-to-point manner, where the scheduling request message is used to request the corresponding sub-nodes to update the model and information.
  • the central node can broadcast/multicast a scheduling request message to all child nodes and transmit the scheduling information in the form of a bitmap; wherein 0 in the bitmap indicates that the corresponding child node does not participate in this distributed training, and 1 indicates that the corresponding child node participates in this distributed training.
  • Example 2 The first device makes a scheduling decision based on the data status information fed back by the second device.
  • FIG4 is a flow chart of a first device provided in the present application scheduling a corresponding second device to perform distributed training according to data status information, including the following steps:
  • a first device sends multiple distributed training request messages to multiple second devices.
  • the central node sends a distributed training request message to the child node X, and the distributed training request message includes: second indication information indicating that the child node X is requested to perform distributed training.
  • the second device in this embodiment may not perform training first after receiving the distributed training request message
  • the difference from the description in S201 is that the distributed training request message in this embodiment does not carry the full model parameters trained at the current moment, the sub-model parameters of sub-node X trained at the current moment, the first training number of P epochs requested of sub-node X, the second training number of P iterations requested of sub-node X, etc. It can be understood that this embodiment reduces the training delay compared with Example 1.
  • the first device may send multiple distributed training request messages to multiple second devices by unicast (such as a central node sending point-to-point to each child node); or may send by broadcast/multicast, thereby reducing transmission overhead (compared to unicast, transmission overhead can be reduced).
  • the distributed training request message may also be delivered in the form of a bitmap.
  • the first device may broadcast an X-bit message, in which a value of 1 in the Nth bit indicates that a distributed training request is issued to child node N, and a value of 0 indicates that no distributed training request is issued to child node N.
  • the multiple second devices send data status information of the multiple second devices to the first device.
  • subnode X sends the data status information of subnode X to the central node
  • the data status information of subnode X includes one or more of the following information: quality information of the data of subnode X, environmental information of subnode X, first characteristic information of subnode X, second characteristic information of subnode X, computing power information of subnode X, communication capability information of subnode X, etc.
  • the second device may feed back specific data status information to the first device according to a predefined table or in rasterized form. Assuming that the number of bits of reported data status information is set to K, the second device may divide the data status into 2^K groups, each group of data status corresponding to one K-bit value of data status information.
  • Table 1 is a table of delay spread of a channel corresponding to a subnode and corresponding data status information.
  • for example, assuming K = 2, the second device can divide the delay spread of the channel corresponding to the subnode into 4 groups, and each group of delay spreads corresponds to 2 bits of data status information.
  • Table 1 Delay spread of the channel corresponding to the child node and the corresponding data status information
  • Table 2 is a table of calculation delay/communication delay and corresponding data status information.
  • for example, assuming K = 2, the second device can divide the calculation delay/communication delay of the subnode into 4 groups, and each group of delays corresponds to 2 bits of data status information.
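  • Since the contents of Tables 1 and 2 are not reproduced here, the following sketch only illustrates the rasterized feedback principle: a measured quantity is mapped to one of 2^K groups and reported with K bits; the group boundaries are assumptions.

```python
# Illustrative sketch of rasterized feedback: a measured quantity (here the delay
# spread in ns) is mapped to one of 2**K groups and reported with K bits.
def quantize_status(value, boundaries):
    """Return the group index for a measured value given ascending group boundaries."""
    return sum(value >= b for b in boundaries)   # number of boundaries crossed

K = 2
delay_spread_boundaries_ns = [50, 100, 200]      # assumed boundaries: 4 groups for K = 2
group = quantize_status(230, delay_spread_boundaries_ns)
print(f"data status information: {group:0{K}b}")  # e.g. "11"
```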
  • the first device determines a second device that participates in the distributed training based on data status information of multiple second devices.
  • the central node evaluates subnodes 1 to X and determines the subnodes participating in the distributed training. For example, assuming that a communication delay of less than 100 ms is required to meet the training requirements, the first device can schedule according to the delay fed back by subnodes 1 to X (such as scheduling the subnodes whose delay is less than 100 ms). For another example, assuming that a delay spread greater than 200 ns of the channel corresponding to the subnode is required to meet the training requirements, the first device can schedule according to the delay spread fed back by subnodes 1 to X (such as scheduling the subnodes whose delay spread is greater than 200 ns).
  • S304 The first device sends a scheduling request message to the second device participating in the distributed training.
  • S304 may refer to the corresponding description in S205 and will not be repeated here.
  • Example 3 The child node determines whether to perform distributed training based on the training request.
  • FIG5 is a schematic diagram of a process provided by the present application in which a first device schedules a corresponding second device to perform distributed training according to first indication information, including the following steps:
  • a first device sends multiple distributed training request messages to multiple second devices.
  • the distributed training request message includes one or more of the following information: second indication information, quality requirements of data corresponding to the second device, environmental requirements of the second device, minimum computing power requirements of the second device, minimum communication capability requirements of the second device, first characteristic information requirements of data corresponding to the second device, and second characteristic information requirements of data corresponding to the second device.
  • the quality requirements of the data corresponding to the second device are related to the quality information of the data, including but not limited to: SNR/SINR/sample number thresholds, etc.
  • the environmental requirements of the second device are related to the environmental information, including but not limited to: a specified location area and/or a specified CSI environment map, etc.
  • the minimum computing power requirements of the second device are related to the computing power information, including but not limited to a calculation delay threshold, a threshold for the number of operations that can be performed per unit time, a threshold for the number of multiplications/additions that can be performed per unit time, etc.
  • the minimum communication capability requirements of the second device are related to the communication capability information, including but not limited to: a communication delay threshold, a specified CQI, etc.
  • the first feature information requirements of the second device are related to the first feature information, including a specified first feature distribution (such as a specified delay distribution, a specified beam distribution, etc.).
  • the second feature information requirements of the second device are related to the second feature information, including a specified second feature distribution.
  • the first device sends multiple distributed training request messages to multiple second devices by unicast (such as a central node sending point-to-point to each child node); it can also be sent by broadcast/multicast, thereby reducing transmission overhead (compared to unicast, transmission overhead can be reduced).
  • multiple second devices determine whether the training requirements are met based on their respective distributed training request messages and their respective data.
  • child node X can determine whether its own computing power information meets the requirements based on the computing power requirements/delay requirements and other information in the distributed training request message. If so, it is determined that child node X meets the training requirements. For another example, child node X can determine whether its own first feature information/second feature information meets the requirements based on the first feature information requirements/second feature information requirements in the distributed training request message. If not, it is determined that child node X does not meet the training requirements.
  • the multiple second devices send first indication information of the multiple second devices to the first device.
  • the first indication information includes confirmation indication information or rejection indication information.
  • the confirmation indication information is used to indicate one or more of the following information: the quality information of the data corresponding to the second device meets the quality requirements, the environmental information of the second device meets the environmental requirements, the computing power of the second device meets the minimum computing power requirements, the communication capability information of the second device meets the minimum communication capability requirements, the first feature information of the data corresponding to the second device meets the first feature information requirements, and the second feature information of the data corresponding to the second device meets the second feature information requirements.
  • the rejection indication information is used to indicate one or more of the following information: the quality information of the data corresponding to the second device does not meet the quality requirements, the environmental information of the second device does not meet the environmental requirements, the computing power of the second device does not meet the minimum computing power requirements, the communication capability information of the second device does not meet the minimum communication capability requirements, the first feature information of the data corresponding to the second device does not meet the first feature information requirements, or the second feature information of the data corresponding to the second device does not meet the second feature information requirements.
  • the child node X can make a judgment to determine whether it meets the delay requirement. If the delay requirement is met (for example, the communication delay of the child node X is less than 100ms), the child node X can feedback ACK to the central node; if the delay requirement is not met (for example, the communication delay of the child node X is greater than 100ms), the child node X can feedback NACK to the central node.
  • the child nodes make decisions (for example, the child nodes decide whether to perform distributed training), which requires lower computing power for the central node, which is beneficial to reducing uplink and downlink overhead.
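  • A minimal sketch of this sub-node-side decision, with assumed field names and thresholds:

```python
# Illustrative sketch: the sub-node compares its own status against the
# requirements carried in the request and feeds back ACK or NACK.
def evaluate_request(own_status, requirements):
    if own_status["comm_delay_ms"] > requirements.get("max_comm_delay_ms", float("inf")):
        return "NACK"
    if own_status["snr_db"] < requirements.get("min_snr_db", float("-inf")):
        return "NACK"
    return "ACK"

requirements = {"max_comm_delay_ms": 100.0, "min_snr_db": 10.0}
print(evaluate_request({"comm_delay_ms": 80.0, "snr_db": 15.0}, requirements))   # ACK
print(evaluate_request({"comm_delay_ms": 150.0, "snr_db": 15.0}, requirements))  # NACK
```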
  • the first device determines a second device that participates in the distributed training according to first indication information of multiple second devices.
  • the central node schedules all or part of the sub-nodes whose first indication information is the confirmation indication information to participate in the distributed training; or, schedules all or part of the sub-nodes whose first indication information is not the rejection indication information, or which have not fed back the rejection indication information, to participate in the distributed training.
  • S405 The first device sends a scheduling request message to the second device participating in the distributed training.
  • S405 may refer to the corresponding description in S205, which will not be repeated here.
  • the above examples 1 to 3 can be performed separately or in combination.
  • the central node can schedule the corresponding subnode to participate in the distributed training according to the training status information and the first weight of the training status information, and the data status information and the second weight of the data status information.
  • the specific scheduling method can refer to the corresponding description in the previous embodiment, which will not be repeated here.
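  • One possible form of such weighted scheduling is sketched below; the weights, the metrics used, and the per-node values are assumptions.

```python
# Illustrative sketch of combined scheduling: each sub-node gets a priority score
# that weights its training status and data status; higher score = scheduled earlier.
def priority(train_metric, data_metric, w_train=0.7, w_data=0.3):
    """Higher cross-validation error and better data quality raise the priority."""
    return w_train * train_metric + w_data * data_metric

nodes = {
    1: {"cv_error": 0.30, "data_quality": 0.9},
    2: {"cv_error": 0.10, "data_quality": 0.8},
    3: {"cv_error": 0.25, "data_quality": 0.4},
}
scores = {n: priority(s["cv_error"], s["data_quality"]) for n, s in nodes.items()}
scheduled = sorted(scores, key=scores.get, reverse=True)[:2]
print(scores, scheduled)
```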
  • the device or equipment provided by the present application may include a hardware structure and/or a software module, and the above functions are realized in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a certain function in the above functions is executed in the form of a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
  • the division of modules in the present application is schematic, which is only a logical function division, and there may be other division methods in actual implementation.
  • each functional module in each embodiment of the present application can be integrated in a processor, or it can be physically present separately, or two or more modules can be integrated in one module.
  • the above integrated module can be implemented in the form of hardware or in the form of software functional modules.
  • FIG 6 is a schematic diagram of a communication device provided by the present application.
  • the device may include a module corresponding to the method/operation/step/action described in any of the embodiments shown in Figures 2 to 5, and the module may be a hardware circuit, or software, or a combination of a hardware circuit and software.
  • the apparatus 600 includes a communication unit 601 and a processing unit 602, and is used to implement the methods executed by the various devices in the above embodiments.
  • the device is a first device, which may be a central node of a training network (such as a base station, a terminal, etc.), or located in a central node of a training network (such as a base station, a terminal, etc.).
  • the communication unit 601 is used to send multiple first messages, and the multiple first messages are used to request multiple second devices to perform distributed training.
  • the communication unit 601 is also used to receive multiple second messages, and any second message among the multiple second messages includes one or more of the following information: training status information of the second device corresponding to the second message, data status information of the second device corresponding to the second message, and first indication information indicating whether the second device corresponding to the second message meets the training requirements.
  • the processing unit 602 is used to schedule the second devices participating in the distributed training according to the multiple second messages.
  • the specific execution process of the communication unit 601 and the processing unit 602 in this embodiment can refer to the description of the steps executed by the central node of the training network in the previous method embodiment, and the corresponding description in the content of the invention, which will not be repeated here.
  • based on the information fed back by the second devices, the first device (such as the central node in the training network) can better plan and schedule the sub-nodes for distributed training, which is conducive to improving the efficiency of distributed training.
  • the device is a second device, which may be a sub-node of the training network (such as a base station, a terminal, etc.), or located in a sub-node of the training network (such as a base station, a terminal, etc.).
  • the communication unit 601 is used to receive a first message, which is used to request the second device to perform distributed training.
  • the communication unit 601 is also used to send a second message, which includes one or more of the following information: training status information of the second device, data status information of the second device, and first indication information indicating whether the second device meets the training requirements.
  • the processing unit 602 is used to participate in distributed training according to the scheduling of the first device.
  • the specific execution process of the communication unit 601 and the processing unit 602 in this embodiment can refer to the description of the steps executed by the sub-nodes of the training network in the previous method embodiment, and the corresponding description in the content of the invention, which will not be repeated here.
  • the second device can feedback training and/or data-related information to the first device, which is conducive to the first device to better plan and schedule sub-nodes for distributed training, and is conducive to improving the efficiency of distributed training.
  • FIG7 is a schematic diagram of another communication device provided by the present application, which is used to implement the communication methods in the above-mentioned method embodiments.
  • the communication device 700 includes necessary forms such as modules, units, elements, circuits, or interfaces, which are appropriately configured together to perform the methods in the present application.
  • the communication device 700 can be a functional entity such as an enabling server, an enabling client, a transmission server, an application server, etc., or it can be a component (such as a chip) in a functional entity to implement the method described in the foregoing method embodiment.
  • the communication device 700 includes one or more processors 701.
  • the processor 701 can be a general-purpose processor or a dedicated processor, etc.
  • it can be a baseband processor or a central processing unit.
  • the baseband processor can be used to process the communication protocol and communication data
  • the central processing unit can be used to control the communication device, execute the software program, and process the data of the software program.
  • the processor 701 may include a program 702 (sometimes also referred to as code or instruction), and the program 702 may be executed on the processor 701, so that the communication device 700 performs the method described in the foregoing embodiment.
  • the communication device 700 includes a circuit (not shown in FIG. 7 ), which is used to implement the functions of the functional entities such as the enabling server, enabling client, transmission server, and application server in the foregoing embodiment.
  • the communication device 700 may include one or more memories 703 on which a program 704 (sometimes also referred to as code or instruction) is stored.
  • the program 704 may be executed on the processor 701 so that the communication device 700 executes the method described in the above method embodiment.
  • the processor 701 and/or the memory 703 may include AI modules 705 and 706, and the AI modules are used to implement AI-related functions.
  • the AI module may be implemented by software, hardware, or a combination of software and hardware.
  • the AI module may include a RIC module.
  • the AI module may be a near real-time RIC or a non-real-time RIC.
  • the communication device 700 further includes a transceiver 707 and an antenna 708.
  • the transceiver 707 and the antenna 708 can implement transceiver functions, for example, communicate with other devices through a transmission medium, so that the communication device 700 can communicate with other devices.
  • the device is a first device, which may be a central node of a training network (such as a base station, a terminal, etc.), or located in a central node of a training network (such as a base station, a terminal, etc.).
  • the transceiver 707 and the antenna 708 are used to send multiple first messages, and the multiple first messages are used to request multiple second devices to perform distributed training.
  • the transceiver 707 and the antenna 708 are also used to receive multiple second messages, and any second message of the multiple second messages includes one or more of the following information: training status information of the second device corresponding to the second message, data status information of the second device corresponding to the second message, and first indication information indicating whether the second device corresponding to the second message meets the training requirements.
  • the processor 701 is used to schedule the second device participating in the distributed training according to the multiple second messages.
  • the specific execution process of the communication device 700 in this embodiment can refer to the description of the steps executed by the central node of the training network in the previous method embodiment, as well as the corresponding description in the content of the invention, which will not be repeated here.
  • based on the information fed back by the second devices, the first device (such as the central node in the training network) can better plan and schedule the sub-nodes for distributed training, which is conducive to improving the efficiency of distributed training.
  • the device is a second device, which may be a sub-node of the training network (such as a base station, a terminal, etc.), or located in a sub-node of the training network (such as a base station, a terminal, etc.).
  • the transceiver 707 and the antenna 708 are used to receive a first message, which is used to request the second device to perform distributed training.
  • the transceiver 707 and the antenna 708 are also used to send a second message, which includes one or more of the following information: training status information of the second device, data status information of the second device, and first indication information indicating whether the second device meets the training requirements.
  • the processor 701 is used to participate in distributed training according to the scheduling of the first device.
  • the specific execution process of the communication device 700 in this embodiment can refer to the description of the steps executed by the sub-nodes of the training network in the previous method embodiment, as well as the corresponding description in the content of the invention, which will not be repeated here.
  • the second device can feedback training and/or data-related information to the first device, which is conducive to the first device to better plan and schedule sub-nodes for distributed training, and is conducive to improving the efficiency of distributed training.
  • the processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic block diagrams disclosed in this application.
  • a general-purpose processor may be a microprocessor or any conventional processor, etc. The steps of the method disclosed in this application may be directly embodied as being executed by a hardware processor, or may be executed by a combination of hardware and software modules in the processor.
  • the memory may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM).
  • the memory is any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory in the present application may also be a circuit or any other device that can realize a storage function, used to store program instructions and/or data.
  • the present application provides another communication device, which includes a processor and an interface.
  • a processor is coupled to the memory, and the processor is used to read and execute computer instructions stored in the memory to implement the communication method in the embodiments shown in Figures 2 to 5.
  • the present application provides a communication system, which includes one or more of the entities or devices in the embodiments shown in Figures 2 to 5.
  • the present application provides a computer-readable storage medium.
  • the computer-readable storage medium stores a program or instruction.
  • the program or instruction is executed on a computer, the computer executes the communication method in the embodiments shown in FIGS. 2 to 5 .
  • the present application provides a computer program product, which includes instructions, which, when executed on a computer, enable the computer to execute the communication method in the embodiments shown in FIG. 2 to FIG. 5 .
  • the present application provides a chip or a chip system, which includes at least one processor and an interface, the interface and the at least one processor are interconnected through lines, and the at least one processor is used to run computer programs or instructions to execute the communication method in the embodiments shown in Figures 2 to 5.
  • the interface in the chip may be an input/output interface, a pin or a circuit, etc.
  • the above-mentioned chip system can be a system on chip (SOC) or a baseband chip, etc., wherein the baseband chip can include a processor, a channel encoder, a digital signal processor, a modem and an interface module, etc.
  • the chip or chip system described above in the present application further includes at least one memory, in which instructions are stored.
  • the memory may be a storage unit inside the chip, such as a register, a cache, etc., or a storage unit external to the chip (e.g., a read-only memory, a random access memory, etc.).
  • the technical solution provided in this application can be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • software When implemented by software, it can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, a network device, a terminal device or other programmable device.
  • the computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions can be transmitted from a website site, computer, server or data center to another website site, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) mode.
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server or data center that includes one or more available media integrated.
  • the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium, etc.
  • the various embodiments may reference each other, for example, the methods and/or terms between method embodiments may reference each other, for example, the functions and/or terms between device embodiments may reference each other, for example, the functions and/or terms between device embodiments and method embodiments may reference each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The present application relates to a communication method and a communication apparatus. In the method, on the basis of information fed back by second apparatuses (for example, sub-nodes) (for example, feedback of training status information, data status information, etc. of the sub-nodes), a first apparatus (for example, a central node) can better plan and schedule the second apparatuses to perform distributed training, thereby improving the efficiency of distributed training.
PCT/CN2024/098890 2023-06-16 2024-06-13 Procédé de communication et appareil de communication Pending WO2024255785A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310721724.4 2023-06-16
CN202310721724.4A CN119154925A (zh) 2023-06-16 2023-06-16 一种通信方法及通信装置

Publications (1)

Publication Number Publication Date
WO2024255785A1 true WO2024255785A1 (fr) 2024-12-19

Family

ID=93814489

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/098890 Pending WO2024255785A1 (fr) 2023-06-16 2024-06-13 Procédé de communication et appareil de communication

Country Status (2)

Country Link
CN (1) CN119154925A (fr)
WO (1) WO2024255785A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753997A (zh) * 2020-06-28 2020-10-09 北京百度网讯科技有限公司 分布式训练方法、系统、设备及存储介质
WO2022151071A1 (fr) * 2021-01-13 2022-07-21 Oppo广东移动通信有限公司 Procédé et appareil de détermination de nœud de tâche distribuée, dispositif et support
WO2023046269A1 (fr) * 2021-09-22 2023-03-30 Nokia Technologies Oy Amélioration de la sélection de nœuds de formation pour un apprentissage fédéré fiable
CN116010072A (zh) * 2021-10-22 2023-04-25 华为技术有限公司 机器学习模型的训练方法和装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753997A (zh) * 2020-06-28 2020-10-09 北京百度网讯科技有限公司 分布式训练方法、系统、设备及存储介质
WO2022151071A1 (fr) * 2021-01-13 2022-07-21 Oppo广东移动通信有限公司 Procédé et appareil de détermination de nœud de tâche distribuée, dispositif et support
WO2023046269A1 (fr) * 2021-09-22 2023-03-30 Nokia Technologies Oy Amélioration de la sélection de nœuds de formation pour un apprentissage fédéré fiable
CN116010072A (zh) * 2021-10-22 2023-04-25 华为技术有限公司 机器学习模型的训练方法和装置

Also Published As

Publication number Publication date
CN119154925A (zh) 2024-12-17

Similar Documents

Publication Publication Date Title
CN114125785A (zh) 数字孪生网络低时延高可靠传输方法、装置、设备及介质
WO2021244334A1 (fr) Procédé de traitement d'informations et dispositif associé
CN114091679A (zh) 一种更新机器学习模型的方法及通信装置
WO2020168761A1 (fr) Procédé et appareil d'apprentissage de modèle
CN114302421B (zh) 通信网络架构的生成方法、装置、电子设备及介质
CN117768875A (zh) 一种通信方法及装置
US20240311648A1 (en) Model training method and related apparatus
JP2024547061A (ja) 通信方法及び機器
WO2024017001A1 (fr) Procédé de formation de modèle et appareil de communication
Waqas et al. Mobility assisted content transmission for device-to-device communication underlaying cellular networks
WO2023279967A1 (fr) Procédé et dispositif d'entraînement de modèle intelligent
WO2024011376A1 (fr) Procédé et dispositif de planification de tâche pour service de fonction de réseau d'intelligence artificielle (ia)
CN111225384A (zh) 一种上行干扰建模方法、干扰确定方法和装置
CN116887327A (zh) 一种QoS预测方法及装置
WO2024255785A1 (fr) Procédé de communication et appareil de communication
US20250158896A1 (en) Artificial intelligence model processing method and related device
WO2022121671A1 (fr) Procédé, appareil, système et dispositif de gestion de fonctionnement et de maintenance de sous-réseau de tranches de réseau, et support
Chen et al. Minimizing age-upon-decisions in bufferless system: Service scheduling and decision interval
CN118153668A (zh) 一种鲁棒的联邦学习方法
Hu et al. Edge intelligence-based e-health wireless sensor network systems
CN117729520A (zh) 一种基于有效性的半异步联邦学习的车辆轨迹预测方法
WO2023186048A1 (fr) Procédé, appareil et système d'acquisition d'informations de service d'ia
WO2023208043A1 (fr) Dispositif électronique et procédé pour système de communication sans fil, et support de stockage
CN116801389A (zh) 无线能量传输方法、装置、设备及可读存储介质
CN115119315A (zh) 一种车联网资源分配方法、装置及计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24822734

Country of ref document: EP

Kind code of ref document: A1